Message ID: 1370519849-10620-3-git-send-email-namei.unix@gmail.com
State: New
Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
> Just call sd_create_branch() to rollback the image is good enough
>
> Cc: qemu-devel@nongnu.org
> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> Cc: Kevin Wolf <kwolf@redhat.com>
> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Liu Yuan <namei.unix@gmail.com>
> ---
>  block/sheepdog.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index 94218ac..cb5ca4a 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -2072,9 +2072,11 @@ static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>      }
>
>      if (!s->inode.vm_state_size) {
> -        error_report("Invalid snapshot");
> -        ret = -ENOENT;
> -        goto out;
> +        /* qemu-img asks us to rollback, we need to do it right now */
> +        ret = sd_create_branch(s);
> +        if (ret) {
> +            goto out;
> +        }
>      }

I'm not sure how snapshots work internally for Sheepdog, but it seems odd to me that you need to do this only for disk-only snapshots, but not when the snapshot has VM state. (Also, note that 'qemu-img snapshot -a' works on images with a VM state, so the comment doesn't seem to be completely accurate.)

Would you mind explaining to me how this works in detail?

Kevin
On 06/06/2013 08:46 PM, Kevin Wolf wrote:
> Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
>> Just call sd_create_branch() to rollback the image is good enough
>>
>> Cc: qemu-devel@nongnu.org
>> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
>> Cc: Kevin Wolf <kwolf@redhat.com>
>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Liu Yuan <namei.unix@gmail.com>
>> ---
>>  block/sheepdog.c | 8 +++++---
>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/sheepdog.c b/block/sheepdog.c
>> index 94218ac..cb5ca4a 100644
>> --- a/block/sheepdog.c
>> +++ b/block/sheepdog.c
>> @@ -2072,9 +2072,11 @@ static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>>      }
>>
>>      if (!s->inode.vm_state_size) {
>> -        error_report("Invalid snapshot");
>> -        ret = -ENOENT;
>> -        goto out;
>> +        /* qemu-img asks us to rollback, we need to do it right now */
>> +        ret = sd_create_branch(s);
>> +        if (ret) {
>> +            goto out;
>> +        }
>>      }
>
> I'm not sure how snapshots work internally for Sheepdog, but it seems
> odd to me that you need to do this only for disk-only snapshots, but not
> when the snapshot has VM state. (Also, note that 'qemu-img snapshot -a'
> works on images with a VM state, so the comment doesn't seem to be
> completely accurate)
>
> Would you mind explaining to me how this works in detail?

Hmm, the original code wasn't written by me and this snapshot mechanism has existed since day 0; I just hacked it to work now. So I'll try to explain based on my understanding.

When we take a snapshot, e.g. with 'savevm' or 'qemu-img snapshot', the active vdi is marked as a snapshot and a new vdi is created as copy-on-write on top of the previously active vdi; this new vdi then becomes the active vdi. For example:

As1 --> As2 --> A

Here we took a snapshot of vdi A twice, tagged s1 and s2 respectively. I guess this is quite similar to qcow2 snapshots: only an inode object with a bitmap is created.

So when we 'loadvm' or 'qemu-img snapshot -a', the current logic only handles 'loadvm': .bdrv_snapshot_goto merely reloads the inode object. That is, if we 'savevm s1' and mark it as a snapshot, the chain looks like:

As1 --> As2 --> A
 |
 v
just reload As1's inode object

Only when a write comes from the VM do we do the following:
- delete the active vdi A
- create a new inode based on the previously reloaded As1 inode

The chain will then look like:

As1 --> As2
 |
 V
 A

This is how sheepdog handles savevm/loadvm. So for 'qemu-img snapshot -a', we should create the branch in .bdrv_snapshot_goto.

As you pointed out, we need to consider the VM state even for 'snapshot -a', so I need to rework patch 2/2.

Thanks,
Yuan
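Yuan's chain manipulation above can be traced with a small toy model. This is only an illustration: the `Vdi` class and the `take_snapshot` helper are invented names, not Sheepdog's actual data structures or API.

```python
# Toy model of the Sheepdog snapshot chain described above.
# All names (Vdi, take_snapshot) are illustrative, not real Sheepdog code.

class Vdi:
    def __init__(self, name, parent=None, snapshot=False):
        self.name = name
        self.parent = parent      # copy-on-write parent in the chain
        self.snapshot = snapshot  # True once marked as a snapshot

def take_snapshot(active, tag):
    """Mark the active vdi as a snapshot and branch a new active vdi off it."""
    active.snapshot = True
    active.name = tag
    return Vdi("active", parent=active)

# Build the chain  As1 --> As2 --> A  by snapshotting twice.
a = Vdi("active")
a = take_snapshot(a, "s1")   # As1 --> A
a = take_snapshot(a, "s2")   # As1 --> As2 --> A

chain = []
v = a
while v:
    chain.append(v.name)
    v = v.parent
print(chain)  # ['active', 's2', 's1']
```

The key point the model captures is that each snapshot freezes the old inode and the writable head is always a fresh copy-on-write child.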
On 06/06/2013 08:46 PM, Kevin Wolf wrote:
> I'm not sure how snapshots work internally for Sheepdog, but it seems
> odd to me that you need to do this only for disk-only snapshots, but not
> when the snapshot has VM state. (Also, note that 'qemu-img snapshot -a'
> works on images with a VM state, so the comment doesn't seem to be
> completely accurate)

It seems I misunderstood your comment. What my 2/2 patch tries to do is enable 'qemu-img snapshot -a' to roll back only the sheepdog disk state, and that part is correct. So what I need to fix is the comment, right? How about commenting it as 'qemu-img asks us to rollback the disk only; we can't rely on a write request to trigger sd_create_branch, so just call it directly'?

Thanks,
Yuan
Am 06.06.2013 um 15:09 hat Liu Yuan geschrieben:
> On 06/06/2013 08:46 PM, Kevin Wolf wrote:
> > Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
> >> Just call sd_create_branch() to rollback the image is good enough
> >>
> >> Cc: qemu-devel@nongnu.org
> >> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
> >> Cc: Kevin Wolf <kwolf@redhat.com>
> >> Cc: Stefan Hajnoczi <stefanha@redhat.com>
> >> Signed-off-by: Liu Yuan <namei.unix@gmail.com>
> >> ---
> >>  block/sheepdog.c | 8 +++++---
> >>  1 file changed, 5 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/block/sheepdog.c b/block/sheepdog.c
> >> index 94218ac..cb5ca4a 100644
> >> --- a/block/sheepdog.c
> >> +++ b/block/sheepdog.c
> >> @@ -2072,9 +2072,11 @@ static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
> >>      }
> >>
> >>      if (!s->inode.vm_state_size) {
> >> -        error_report("Invalid snapshot");
> >> -        ret = -ENOENT;
> >> -        goto out;
> >> +        /* qemu-img asks us to rollback, we need to do it right now */
> >> +        ret = sd_create_branch(s);
> >> +        if (ret) {
> >> +            goto out;
> >> +        }
> >>      }
> >
> > I'm not sure how snapshots work internally for Sheepdog, but it seems
> > odd to me that you need to do this only for disk-only snapshots, but not
> > when the snapshot has VM state. (Also, note that 'qemu-img snapshot -a'
> > works on images with a VM state, so the comment doesn't seem to be
> > completely accurate)
> >
> > Would you mind explaining to me how this works in detail?
>
> Hmm, the original code isn't written by me and this snapshot mechanism
> exists since day 0. I just hacked it to work now. So I'll try to explain
> based on my understanding.
>
> When we do a snapshot such as 'savevm' or 'qemu-img snapshot', the
> active vdi is snapshotted and marked as snapshot, and a new vdi is
> created as copy-on-write on the previous active vdi; then this new vdi
> becomes the active vdi. For example:
>
> As1 --> As2 --> A
>
> We take a snapshot of vdi A twice, tagged s1 and s2 respectively. I guess
> this is quite similar to qcow2 snapshots; only an inode object with a
> bitmap is created.
>
> So when we 'loadvm' or 'qemu-img snapshot -a' to A, the current logic
> only handles 'loadvm': .bdrv_snapshot_goto only reloads the inode object.
> That is, if we 'savevm s1' and mark it as a snapshot, the chain would
> look like:
>
> As1 --> As2 --> A
>  |
>  v
> just reload As1's inode object
>
> Only when the write comes from the VM do we do the following:
> - delete active vdi A
> - create a new inode based on the previously reloaded As1's inode

Thanks, this is the part that I missed.

I'm not sure however why the actual switch is delayed until the first write. This seems inconsistent with qcow2 snapshots.

Do you know if there is a reason why we can't always do this already during bdrv_snapshot_goto()?

> The chain will look like:
>
> As1 --> As2
>  |
>  V
>  A
>
> This is how sheepdog handles savevm/loadvm.
>
> So for 'qemu-img snapshot -a', we should create the branch in the
> .bdrv_snapshot_goto.
>
> As you pointed out, we need to consider vm state even for 'snapshot -a',
> so I need to rework the patch 2/2.

Yes, the presence of VM state is independent from whether you're using qemu-img or loadvm. And it actually goes both ways: qemu-img can revert to snapshots that have a VM state, and loadvm can be used with images that don't have a VM state (if you have multiple images, only one of them has the VM state).

Kevin
On 06/07/2013 03:31 PM, Kevin Wolf wrote:
> Am 06.06.2013 um 15:09 hat Liu Yuan geschrieben:
>> On 06/06/2013 08:46 PM, Kevin Wolf wrote:
>>> Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
>>>> Just call sd_create_branch() to rollback the image is good enough
>>>>
>>>> Cc: qemu-devel@nongnu.org
>>>> Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
>>>> Cc: Kevin Wolf <kwolf@redhat.com>
>>>> Cc: Stefan Hajnoczi <stefanha@redhat.com>
>>>> Signed-off-by: Liu Yuan <namei.unix@gmail.com>
>>>> ---
>>>>  block/sheepdog.c | 8 +++++---
>>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/block/sheepdog.c b/block/sheepdog.c
>>>> index 94218ac..cb5ca4a 100644
>>>> --- a/block/sheepdog.c
>>>> +++ b/block/sheepdog.c
>>>> @@ -2072,9 +2072,11 @@ static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
>>>>      }
>>>>
>>>>      if (!s->inode.vm_state_size) {
>>>> -        error_report("Invalid snapshot");
>>>> -        ret = -ENOENT;
>>>> -        goto out;
>>>> +        /* qemu-img asks us to rollback, we need to do it right now */
>>>> +        ret = sd_create_branch(s);
>>>> +        if (ret) {
>>>> +            goto out;
>>>> +        }
>>>>      }
>>>
>>> I'm not sure how snapshots work internally for Sheepdog, but it seems
>>> odd to me that you need to do this only for disk-only snapshots, but not
>>> when the snapshot has VM state. (Also, note that 'qemu-img snapshot -a'
>>> works on images with a VM state, so the comment doesn't seem to be
>>> completely accurate)
>>>
>>> Would you mind explaining to me how this works in detail?
>>
>> Hmm, the original code isn't written by me and this snapshot mechanism
>> exists since day 0. I just hacked it to work now. So I'll try to explain
>> based on my understanding.
>>
>> When we do a snapshot such as 'savevm' or 'qemu-img snapshot', the
>> active vdi is snapshotted and marked as snapshot, and a new vdi is
>> created as copy-on-write on the previous active vdi; then this new vdi
>> becomes the active vdi. For example:
>>
>> As1 --> As2 --> A
>>
>> We take a snapshot of vdi A twice, tagged s1 and s2 respectively. I guess
>> this is quite similar to qcow2 snapshots; only an inode object with a
>> bitmap is created.
>>
>> So when we 'loadvm' or 'qemu-img snapshot -a' to A, the current logic
>> only handles 'loadvm': .bdrv_snapshot_goto only reloads the inode object.
>> That is, if we 'savevm s1' and mark it as a snapshot, the chain would
>> look like:
>>
>> As1 --> As2 --> A
>>  |
>>  v
>> just reload As1's inode object
>>
>> Only when the write comes from the VM do we do the following:
>> - delete active vdi A
>> - create a new inode based on the previously reloaded As1's inode
>
> Thanks, this is the part that I missed.
>
> I'm not sure however why the actual switch is delayed until the first
> write. This seems inconsistent with qcow2 snapshots.
>
> Do you know if there is a reason why we can't always do this already
> during bdrv_snapshot_goto()?

I think the reason is that sd_load_vmstate() needs to load the vm state objects with the correct inode object.

I tried to remove

if (!s->inode.vm_state_size)

and make sd_create_branch unconditional. This means the 'loadvm' command will try to call sd_create_branch() inside sd_snapshot_goto(). But it failed, reading the wrong snapshot, because the vdi's inode object is changed by sd_create_branch().

>> The chain will look like:
>>
>> As1 --> As2
>>  |
>>  V
>>  A
>>
>> This is how sheepdog handles savevm/loadvm.
>>
>> So for 'qemu-img snapshot -a', we should create the branch in the
>> .bdrv_snapshot_goto.
>>
>> As you pointed out, we need to consider vm state even for 'snapshot -a',
>> so I need to rework the patch 2/2.
>
> Yes, the presence of VM state is independent from whether you're using
> qemu-img or loadvm. And it actually goes both ways: qemu-img can revert
> to snapshots that have a VM state, and loadvm can be used with images
> that don't have a VM state (if you have multiple images, only one of
> them has the VM state).

Seems not true of the current code. If I 'loadvm' a snapshot without a vmstate, I'll get 'qemu-system-x86_64: This is a disk-only snapshot. Revert to it offline using qemu-img'.

But 'qemu-img snapshot -a' works as you said; it can roll back to the snapshot regardless of vmstate.

Also this is a difference to store vmstate for sheepdog images. *Every* snapshot image can have its own vmstate stored in sheepdog cluster. That is, we can have multiple snapshot with its own private vmstate for sheepdog.

I think my patch did the correct thing, just rolling back the disk state of the snapshot for 'qemu-img snapshot -a'. Anyway, I found a minor issue in the 2/2 patch, so I'll resend the set.

Thanks,
Yuan
Am 07.06.2013 um 15:48 hat Liu Yuan geschrieben:
> On 06/07/2013 03:31 PM, Kevin Wolf wrote:
> > Am 06.06.2013 um 15:09 hat Liu Yuan geschrieben:
> >> On 06/06/2013 08:46 PM, Kevin Wolf wrote:
> >>> Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
> >> Only when the write comes from VM, we do the following stuff
> >> - delete active vdi A
> >> - created a new inode based on the previously reloaded As1's inode
> >
> > Thanks, this is the part that I missed.
> >
> > I'm not sure however why the actual switch is delayed until the first
> > write. This seems inconsistent with qcow2 snapshots.
> >
> > Do you know if there is a reason why we can't always do this already
> > during bdrv_snapshot_goto()?
>
> I think the reason is that sd_load_vmstate() needs to load the vm state
> objects with the correct inode object.
>
> I tried to remove
>
> if (!s->inode.vm_state_size)
>
> and make sd_create_branch unconditional. This means the 'loadvm' command
> will try to call sd_create_branch() inside sd_snapshot_goto(). But it
> failed, reading the wrong snapshot, because the vdi's inode object is
> changed by sd_create_branch().

Ok, I think I start to understand how these things work. Basically, qemu's savevm functionality is designed to work with qcow2 like this:

1. Save the VM state to the active layer
2. create_snapshot saves the active layer including the VM state
3. [ Forget to remove the VM state from the active layer :-) ]
4. loadvm loads the snapshot again, including the VM state
5. VM state is restored from the active layer

So for Sheepdog, the problem is that step 2 doesn't include the VM state, right? So our options are either to change Sheepdog so that step 2 does involve moving the VM state (might end up rather ugly), or you need to swap steps 1 and 2, so that you first switch to the new snapshot and then write the VM state.

Because I think what we're doing currently is not only hard to understand, but actually wrong. Consider this:

1. savevm with a Sheepdog image
2. Exit qemu before any write request is sent
3. Restart qemu with the Sheepdog image and notice how the wrong
   snapshot is active. Oops.

Or doesn't Sheepdog have any notion of an "active snapshot", and is this only a runtime decision?

> >> The chain will look like:
> >>
> >> As1 --> As2
> >>  |
> >>  V
> >>  A
> >>
> >> This is how sheepdog handles savevm/loadvm.
> >>
> >> So for 'qemu-img snapshot -a', we should create the branch in the
> >> .bdrv_snapshot_goto.
> >>
> >> As you pointed out, we need to consider vm state even for 'snapshot -a',
> >> so I need to rework the patch 2/2.
> >
> > Yes, the presence of VM state is independent from whether you're using
> > qemu-img or loadvm. And it actually goes both ways: qemu-img can revert
> > to snapshots that have a VM state, and loadvm can be used with images
> > that don't have a VM state (if you have multiple images, only one of
> > them has the VM state).
>
> Seems not true of current code. If I 'loadvm' a snapshot without a
> vmstate, I'll get 'qemu-system-x86_64: This is a disk-only snapshot.
> Revert to it offline using qemu-img'.

Try again with multiple disks. Only one disk gets the VM state, the other ones get only the disk snapshot.

> But 'qemu-img snapshot -a' works as you said, it can rollback to the
> snapshot regardless of vmstate.
>
> Also this is a difference to store vmstate for sheepdog images. *Every*
> snapshot image can have its own vmstate stored in sheepdog cluster. That
> is, we can have multiple snapshot with its own private vmstate for sheepdog.

Sorry, can't parse. :-(

> I think my patch did the correct thing, just rollback the disk state of
> the snapshot for 'qemu-img snapshot -a'. Anyway, I found a minor issue
> of 2/2 patch, so I'll resend the set.

I think I agree that the problem is less with your patch, but more with the preexisting code. We should get it sorted out anyway.

Kevin
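Kevin's failure scenario can be traced with a short sketch. Everything below (the dict standing in for the persistent cluster state, the function names) is invented for illustration; it only models the delayed-branch behaviour described in the thread, not real Sheepdog or QEMU code.

```python
# Toy trace of the bug: the branch (and hence the persistent switch of the
# active vdi) is delayed until the first guest write, so exiting qemu
# before any write leaves the old active vdi in place.

cluster = {"active_vdi": "C"}            # persistent cluster state (illustrative)
runtime = {"loaded": None}               # in-memory state of the qemu process

def snapshot_goto(tag):
    # current sd_snapshot_goto: only reloads the inode in memory
    runtime["loaded"] = tag

def on_first_write():
    # delete the old active vdi, branch a new one off the loaded snapshot
    cluster["active_vdi"] = runtime["loaded"] + "1"

snapshot_goto("A")                       # 'loadvm A'
# qemu exits here, before any write request ...
stale = cluster["active_vdi"]            # restart sees "C": the wrong snapshot

on_first_write()                         # had a write happened, A1 would be active
print(cluster["active_vdi"])             # "A1"
```

The sketch shows why the switch must not be deferred: the persistent side only learns about the revert once a guest write arrives.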
On 06/07/2013 11:22 PM, Kevin Wolf wrote:
> Am 07.06.2013 um 15:48 hat Liu Yuan geschrieben:
>> On 06/07/2013 03:31 PM, Kevin Wolf wrote:
>>> Am 06.06.2013 um 15:09 hat Liu Yuan geschrieben:
>>>> On 06/06/2013 08:46 PM, Kevin Wolf wrote:
>>>>> Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
>>>> Only when the write comes from VM, we do the following stuff
>>>> - delete active vdi A
>>>> - created a new inode based on the previously reloaded As1's inode
>>>
>>> Thanks, this is the part that I missed.
>>>
>>> I'm not sure however why the actual switch is delayed until the first
>>> write. This seems inconsistent with qcow2 snapshots.
>>>
>>> Do you know if there is a reason why we can't always do this already
>>> during bdrv_snapshot_goto()?
>>
>> I think the reason is that sd_load_vmstate() needs to load the vm state
>> objects with the correct inode object.
>>
>> I tried to remove
>>
>> if (!s->inode.vm_state_size)
>>
>> and make sd_create_branch unconditional. This means the 'loadvm' command
>> will try to call sd_create_branch() inside sd_snapshot_goto(). But it
>> failed, reading the wrong snapshot, because the vdi's inode object is
>> changed by sd_create_branch().
>
> Ok, I think I start to understand how these things work. Basically,
> qemu's savevm functionality is designed to work with qcow2 like this:
>
> 1. Save the VM state to the active layer
> 2. create_snapshot saves the active layer including VM state
> 3. [ Forget to remove the VM state from the active layer :-) ]
> 4. loadvm loads the snapshot again, including VM state
> 5. VM state is restored from the active layer
>
> So for Sheepdog, the problem is that step 2 doesn't include the VM state,
> right? So our options are either to change Sheepdog so that step 2 does
> involve moving the VM state (might end up rather ugly) or you need to
> swap steps 1 and 2, so that you first switch to the new snapshot and
> then write the VM state.

Sorry, I didn't fully understand the above example. When sheepdog takes a snapshot, the vmstate goes with the exact current active disk into the cluster storage. For example:

1. we take two snapshots of the disk 'test', with tags A and B respectively
2. disk(A) + vmstate(A) will be stored, indexed by A's vdi
   disk(B) + vmstate(B) will be stored, indexed by B's vdi

The chain is A --> B --> C, where C is the current active vdi. Note that A, B and C all have different vdi_ids.

The problem shows up when we do 'loadvm A'. The process is:

1. reload the inode of A (in snapshot_goto)
2. read the vmstate via A's vdi_id (loadvm_state)
3. delete C and create C1, reload the inode of C1 (sd_create_branch on write)

So if at stage 1 we call sd_create_branch(), the inode will point to C1, and stage 2 will fail to read the vmstate because the vdi_id is C1's.

This is why we rely on the write to call sd_create_branch(). I am not sure if we can fix sheepdog's loadvm without touching the core code of QEMU.

> Because I think what we're doing currently is not only hard to
> understand, but actually wrong. Consider this:
>
> 1. savevm with a Sheepdog image
> 2. Exit qemu before any write request is sent
> 3. Restart qemu with the Sheepdog image and notice how the wrong
>    snapshot is active. Oops.

Sheepdog indeed has this problem.

> Or doesn't Sheepdog have any notion of an "active snapshot" and this is
> only a runtime decision?
>
>>>> The chain will look like:
>>>>
>>>> As1 --> As2
>>>>  |
>>>>  V
>>>>  A
>>>>
>>>> This is how sheepdog handles savevm/loadvm.
>>>>
>>>> So for 'qemu-img snapshot -a', we should create the branch in the
>>>> .bdrv_snapshot_goto.
>>>>
>>>> As you pointed out, we need to consider vm state even for 'snapshot -a',
>>>> so I need to rework the patch 2/2.
>>>
>>> Yes, the presence of VM state is independent from whether you're using
>>> qemu-img or loadvm. And it actually goes both ways: qemu-img can revert
>>> to snapshots that have a VM state, and loadvm can be used with images
>>> that don't have a VM state (if you have multiple images, only one of
>>> them has the VM state).
>>
>> Seems not true of current code. If I 'loadvm' a snapshot without a
>> vmstate, I'll get 'qemu-system-x86_64: This is a disk-only snapshot.
>> Revert to it offline using qemu-img'.
>
> Try again with multiple disks. Only one disk gets the VM state, the
> other ones get only the disk snapshot.
>
>> But 'qemu-img snapshot -a' works as you said, it can rollback to the
>> snapshot regardless of vmstate.
>>
>> Also this is a difference to store vmstate for sheepdog images. *Every*
>> snapshot image can have its own vmstate stored in sheepdog cluster. That
>> is, we can have multiple snapshot with its own private vmstate for sheepdog.
>
> Sorry, can't parse. :-(

I misunderstood your statements, please ignore it.

Thanks,
Yuan
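The ordering constraint in Yuan's steps 1-3 can be made concrete with a toy model. Everything below (the dict standing in for the cluster, `Inode`, `create_branch`) is invented for illustration and is not the real Sheepdog code; it only demonstrates why branching before reading the vmstate makes the lookup miss.

```python
# Toy illustration of why sd_create_branch() must not run before the VM
# state is read: the vmstate is indexed by the snapshot's vdi_id, and
# branching switches the current inode to a fresh vdi_id (C1's).

vmstate_store = {}           # vdi_id -> vmstate blob (stands in for the cluster)
next_vdi_id = [100]

def new_vdi_id():
    next_vdi_id[0] += 1
    return next_vdi_id[0]

class Inode:
    def __init__(self, vdi_id):
        self.vdi_id = vdi_id

def save_vmstate(inode, blob):
    vmstate_store[inode.vdi_id] = blob

def load_vmstate(inode):
    return vmstate_store.get(inode.vdi_id)

def create_branch(inode):
    """Branch a new active vdi: the current inode gets a fresh vdi_id."""
    inode.vdi_id = new_vdi_id()

# Snapshot A stored its vmstate under its own vdi_id.
a = Inode(new_vdi_id())
save_vmstate(a, b"state-A")

# Working order: reload A's inode, read the vmstate, branch afterwards.
inode = Inode(a.vdi_id)
state = load_vmstate(inode)      # b"state-A"
create_branch(inode)

# Broken order: branch first, then try to read the vmstate -> lookup misses.
inode2 = Inode(a.vdi_id)
create_branch(inode2)
missing = load_vmstate(inode2)   # None: C1's fresh vdi_id has no vmstate
```

This is exactly why the current code defers `sd_create_branch()` to the first write: by then `sd_load_vmstate()` has already read the state through A's vdi_id.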
Am 07.06.2013 um 18:14 hat Liu Yuan geschrieben:
> On 06/07/2013 11:22 PM, Kevin Wolf wrote:
> > Am 07.06.2013 um 15:48 hat Liu Yuan geschrieben:
> >> On 06/07/2013 03:31 PM, Kevin Wolf wrote:
> >>> Am 06.06.2013 um 15:09 hat Liu Yuan geschrieben:
> >>>> On 06/06/2013 08:46 PM, Kevin Wolf wrote:
> >>>>> Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben:
> >>>> Only when the write comes from VM, we do the following stuff
> >>>> - delete active vdi A
> >>>> - created a new inode based on the previously reloaded As1's inode
> >>>
> >>> Thanks, this is the part that I missed.
> >>>
> >>> I'm not sure however why the actual switch is delayed until the first
> >>> write. This seems inconsistent with qcow2 snapshots.
> >>>
> >>> Do you know if there is a reason why we can't always do this already
> >>> during bdrv_snapshot_goto()?
> >>
> >> I think the reason is sd_load_vmstate() need to load vm state objects
> >> with the correct inode object.
> >>
> >> I tried to remove
> >>
> >> if (!s->inode.vm_state_size)
> >>
> >> and make sd_create_branch unconditional. This means 'loadvm' command
> >> will try to call sd_create_branch() inside sd_snapshot_goto(). But
> >> failed with reading the wrong snapshot because the vdi's inode object is
> >> changed by sd_create_branch().
> >
> > Ok, I think I start to understand how these things work. Basically,
> > qemu's savevm functionality is designed to work with qcow2 like this:
> >
> > 1. Save the VM state to the active layer
> > 2. create_snapshot saves the active layer including VM state
> > 3. [ Forget to remove the VM state from the active layer :-) ]
> > 4. loadvm loads the snapshot again, including VM state
> > 5. VM state is restored from the active layer
> >
> > So for Sheepdog, the problem is that step 2 doesn't include the VM state,
> > right? So our options are either to change Sheepdog so that step 2 does
> > involve moving the VM state (might end up rather ugly) or you need to
> > swap steps 1 and 2, so that you first switch to the new snapshot and
> > then write the VM state.
>
> Sorry, I didn't fully understand the above example. If sheepdog takes
> snapshots, vmstate will go with the exact current active disk into the
> cluster storage, for e.g
>
> 1 we take two snapshots on the disk 'test' with tag A and B respectively
> 2 disk(A) + vmstate(A) will be stored as indexed by A's vdi
>   disk(B) + vmstate(B) will be stored as indexed by B's vdi
>
> The chain is A --> B --> C, C is the current active vdi. Note, both A, B
> and C have different vdi_ids.
>
> The problem shows up when we do loadvm A, the process is:
>
> 1 reload inode of A (in snapshot_goto)
> 2 read vmstate via A's vdi_id (loadvm_state)
> 3 delete C and create C1, reload inode of C1 (sd_create_branch on write)
>
> So if at stage 1, we call sd_create_branch(), the inode will point to C1
> and stage 2 will fail to read vmstate because vdi_id is C1's.
>
> This is why we rely on the write to call sd_create_branch(). I am not
> sure if we can fix sheepdog's loadvm without touching the core code of QEMU.

Hm, yes, I was confused. :-)

Anyway, the options stay the same: either C1 must somehow inherit the VM state from A on the Sheepdog level, or we must make sure to get this order:

1. Switch to (read-only) snapshot A
2. Load the VM state
3. Create (writable) snapshot C1 and switch to it

This is a different order than required for qcow2 (where C1 would inherit the VM state from A, so our current "first 1, then 3, then 2" works), but if we can't fix it on the Sheepdog side, we'll have to touch the core code of qemu. I don't much like touching the block.c code for this, but if it's required to fix a bug, let's just do it.

> > Because I think what we're doing currently is not only hard to
> > understand, but actually wrong. Consider this:
> >
> > 1. savevm with a Sheepdog image
> > 2. Exit qemu before any write request is sent
> > 3. Restart qemu with the Sheepdog image and notice how the wrong
> >    snapshot is active. Oops.
>
> Sheepdog indeed has this problem.

Thanks for confirming.

Kevin
On 06/08/2013 12:48 AM, Kevin Wolf wrote:
> Hm, yes, I was confused. :-)
>
> Anyway, the options stay the same: Either C1 must somehow inherit the VM
> state from A on the Sheepdog level, or we must make sure to get this
> order:
>
> 1. Switch to (read-only) snapshot A
> 2. Load the VM state
> 3. Create (writable) snapshot C1 and switch to it
>
> This is a different order as required for qcow2 (where C1 would inherit
> the VM state from A, so our current first 1, then 3, then 2 works), but
> if we can't fix it on the Sheepdog side, we'll have to touch the core
> code of qemu. I don't like much to touch the block.c code for it, but if
> it's required to fix a bug, let's just do it.

Hmm, I found that we can actually let C1 inherit A's vdi_id. I've sent v3, and I think that version fixes the possible bug you pointed out and does loadvm the way the core code assumes.

Thanks,
Yuan
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 94218ac..cb5ca4a 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2072,9 +2072,11 @@ static int sd_snapshot_goto(BlockDriverState *bs, const char *snapshot_id)
     }
 
     if (!s->inode.vm_state_size) {
-        error_report("Invalid snapshot");
-        ret = -ENOENT;
-        goto out;
+        /* qemu-img asks us to rollback, we need to do it right now */
+        ret = sd_create_branch(s);
+        if (ret) {
+            goto out;
+        }
     }
 
     s->is_snapshot = true;
Just call sd_create_branch() to rollback the image is good enough

Cc: qemu-devel@nongnu.org
Cc: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Liu Yuan <namei.unix@gmail.com>
---
 block/sheepdog.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)