Message ID: 1261b29106dbe3ccd08bac7ce737e67bd5ae8577.1347629357.git.jcody@redhat.com
State: New
On 09/14/2012 07:41 AM, Jeff Cody wrote:
> This adds the live commit coroutine. This iteration focuses on the
> commit only below the active layer, and not the active layer itself.
>
> The behaviour is similar to block streaming; the sectors are walked
> through, and anything that exists above 'base' is committed back down
> into base. At the end, intermediate images are deleted, and the
> chain stitched together. Images are restored to their original open
> flags upon completion.
>

> +
> +enum {
> +    /*
> +     * Size of data buffer for populating the image file. This should be large
> +     * enough to process multiple clusters in a single call, so that populating
> +     * contiguous regions of the image is efficient.
> +     */
> +    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */

I'm guessing you will add a followup patch that depends on Paolo's
series for controlling the granularity of this buffer?  Or is it less
important for the commit case?

> +
> +static void coroutine_fn commit_run(void *opaque)
> +{

> +    ret = base_len = bdrv_getlength(base);
> +    if (base_len < 0) {
> +        goto exit_restore_reopen;
> +    }
> +
> +    if (base_len < s->common.len) {
> +        ret = bdrv_truncate(base, s->common.len);
> +        if (ret) {
> +            goto exit_restore_reopen;
> +        }
> +    }

Question: is it valid to have a qcow2 file whose size is smaller than
its backing image?  Suppose I have base[1M] <- mid[2M] <- top[3M] <-
active[3M], and request to commit top into base.  This bdrv_truncate()
means I will now have:

base[3M] <- mid[2M] <- top[3M] <- active[3M].

If I then abort the commit operation at this point, then we have the
situation of 'mid' reporting a smaller size than 'base' - which may make
'mid' invalid.  And even if it is valid, what happens if I now request
to commit 'mid' into 'base', but 'base' already had data written past
the 2M mark before I aborted the first operation?
I'm worried that you may have to bdrv_truncate() the entire chain to keep it consistent, which is more complex because it requires more r/w files.
On 09/14/2012 11:45 AM, Eric Blake wrote:
> On 09/14/2012 07:41 AM, Jeff Cody wrote:
>> This adds the live commit coroutine. This iteration focuses on the
>> commit only below the active layer, and not the active layer itself.
>>
>> The behaviour is similar to block streaming; the sectors are walked
>> through, and anything that exists above 'base' is committed back down
>> into base. At the end, intermediate images are deleted, and the
>> chain stitched together. Images are restored to their original open
>> flags upon completion.
>>
>
>> +
>> +enum {
>> +    /*
>> +     * Size of data buffer for populating the image file. This should be large
>> +     * enough to process multiple clusters in a single call, so that populating
>> +     * contiguous regions of the image is efficient.
>> +     */
>> +    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
>
> I'm guessing you will add a followup patch that depends on Paolo's
> series for controlling the granularity of this buffer?  Or is it less
> important for the commit case?

For the version of commit implemented by these patches, I don't think
controlling the granularity of the commit buffer is important.  However,
for the next stage, when we are committing the active layer, then it may
become more important.  That stage will likely have a lot of code reuse
from Paolo's series, as well.

>
>> +
>> +static void coroutine_fn commit_run(void *opaque)
>> +{
>
>> +    ret = base_len = bdrv_getlength(base);
>> +    if (base_len < 0) {
>> +        goto exit_restore_reopen;
>> +    }
>> +
>> +    if (base_len < s->common.len) {
>> +        ret = bdrv_truncate(base, s->common.len);
>> +        if (ret) {
>> +            goto exit_restore_reopen;
>> +        }
>> +    }
>
> Question: is it valid to have a qcow2 file whose size is smaller than
> its backing image?

I don't think so... however:

> Suppose I have base[1M] <- mid[2M] <- top[3M] <-
> active[3M], and request to commit top into base.  This bdrv_truncate()
> means I will now have:
>
> base[3M] <- mid[2M] <- top[3M] <- active[3M].
>
> If I then abort the commit operation at this point, then we have the
> situation of 'mid' reporting a smaller size than 'base' - which may make
> 'mid' invalid.  And even if it is valid, what happens if I now request
> to commit 'mid' into 'base', but 'base' already had data written past
> the 2M mark before I aborted the first operation?

Once the commit starts, I don't know if you can safely abort it, and
still count on 'mid' being valid.  Ignoring potential size differences,
how would you ever know that what was written from 'top' into 'base' is
compatible with what is present in 'mid'?

Once you begin a commit, your chain as an entirety can stay safe after
an abort, as long as it is accessed from 'top' or above... but I think
you have to consider the intermediate images between 'base' and 'top'
to be invalid as standalone images.

> I'm worried that you may have to bdrv_truncate() the entire chain to
> keep it consistent, which is more complex because it requires more r/w
> files.

Transactional bdrv_truncate()! :)
On 09/14/2012 10:07 AM, Jeff Cody wrote:
>> Question: is it valid to have a qcow2 file whose size is smaller than
>> its backing image?
>
> I don't think so... however:
>
>> Suppose I have base[1M] <- mid[2M] <- top[3M] <-
>> active[3M], and request to commit top into base.  This bdrv_truncate()
>> means I will now have:
>>
>> base[3M] <- mid[2M] <- top[3M] <- active[3M].
>>
>> If I then abort the commit operation at this point, then we have the
>> situation of 'mid' reporting a smaller size than 'base' - which may make
>> 'mid' invalid.  And even if it is valid, what happens if I now request
>> to commit 'mid' into 'base', but 'base' already had data written past
>> the 2M mark before I aborted the first operation?
>
> Once the commit starts, I don't know if you can safely abort it, and
> still count on 'mid' being valid.  Ignoring potential size differences,
> how would you ever know that what was written from 'top' into 'base' is
> compatible with what is present in 'mid'?

We chatted about this some more on IRC, and I'll attempt to summarize
the results of that conversation (correct me if I'm wrong)...

When committing across multiple images, there are four allocation cases
to consider:

1. unallocated in mid or top => nothing to do; base is already correct

2. allocated in mid but not top => copy from mid to base; as long as mid
is in the chain, both mid and top see the version in mid; as soon as mid
is removed from the chain, top sees the version in base

3. allocated in mid and in top => ultimately, we want to copy from top
to base.  We can also do an intermediate copy from mid to base, although
that is less efficient, as long as the copy from top to base happens
last.  As long as the sector remains allocated, then mid always sees its
own version, and top always sees its own version.

4. allocated in top but not mid => we want to copy from top to base, but
the moment we do that, if mid is still in the chain, then we have
invalidated the contents of mid.  However, as long as top remains
allocated, it sees its own version, and even if top is marked
unallocated, it would then see through to base and see correct contents
even though the intermediate file mid is inconsistent.

Use of block-commit has the potential to invalidate all images that are
dropped from the chain (namely, any time allocation scenario 4 is
present anywhere in the image); it is up to users to avoid using commit
if they have any other image chain sharing the part of the chain
discarded by this operation (someday, libvirt might track all storage
chains, and be able to prevent an attempt at a commit if it would strand
someone else's chain; but for now, we just document the issue).

Next, there is a question of whether invalidating the image up front is
acceptable, or whether we must go through gyrations to avoid
invalidation until after the image has been dropped from the chain.
That is, does the invalidation happen the moment the commit starts (and
can't be undone by an early abort), or can it be delayed until the point
that the image is actually dropped from the chain?  As long as the
current running qemu is the only entity using the portion of the chain
being dropped, then the timing does not matter, other than affecting
what optimizations we might be able to perform.

There is also a question of what happens if a commit is started, then
aborted, then restarted.  It is always safe to restart the same commit
from scratch, just not optimal, as the later run will spend time copying
identical content that was already in base on the first run.  The only
way to avoid copying sectors on a second run is to mark them unallocated
on the first run, but then we have the issue of consistency: if a sector
is allocated in both mid and top (scenario 3), and the first run copies
top into base and then marks top unallocated, then a future read of top
would pick up the contents from mid, which is wrong.  Therefore, we
cannot mark sectors unallocated unless we traverse them in a safe order.

I was able to come up with an algorithm that allows for faster restarts
of a commit operation, in order to avoid copying any sector into base
more than once (at least, insofar as top is not also an active image,
but we already deferred committing an active image for a later date).
It requires that every image being trimmed from the chain be r/w
(although only one image has to be r/w at a time), and that the copies
be done in a depth-first manner.  That is, the algorithm first visits
all allocated sectors in 'mid'; if they are not also allocated in top,
then the sector is copied into base and marked unallocated in mid.  When
mid is completed, it is removed from the chain, before proceeding to
top.  Eventually, all sectors will be copied into base, exactly once,
and the algorithm is restartable because it marks sectors unallocated
once base has the correct contents.  But it is more complex to implement.

In conclusion, since this stage of the implementation never marks
sectors unallocated, the use of the top of the chain is never
invalidated even if intermediate files remain in the chain but have
already been invalidated.  I'm okay with this patch going in as a first
approximation, and saving the complications of a depth-first approach
coupled with marking sectors unallocated as an optimization we can add
later (perhaps even by adding a flag to the JSON command to choose
whether to use the optimization, since it requires r/w on all images in
the chain but allows faster restarts; or to skip the optimization, since
it allows for fewer r/w images but slower restarts).  That is, this
patch series invalidates intermediate images at the start of the commit
operation, whereas the proposed optimization would defer invalidating
images until they have been removed from the chain, but it doesn't
affect the correctness of this phase of the patch series.
On 09/14/2012 02:23 PM, Eric Blake wrote:
> On 09/14/2012 10:07 AM, Jeff Cody wrote:
>>> Question: is it valid to have a qcow2 file whose size is smaller than
>>> its backing image?
>>
>> I don't think so... however:
>>
>>> Suppose I have base[1M] <- mid[2M] <- top[3M] <-
>>> active[3M], and request to commit top into base.  This bdrv_truncate()
>>> means I will now have:
>>>
>>> base[3M] <- mid[2M] <- top[3M] <- active[3M].
>>>
>>> If I then abort the commit operation at this point, then we have the
>>> situation of 'mid' reporting a smaller size than 'base' - which may make
>>> 'mid' invalid.  And even if it is valid, what happens if I now request
>>> to commit 'mid' into 'base', but 'base' already had data written past
>>> the 2M mark before I aborted the first operation?
>>
>> Once the commit starts, I don't know if you can safely abort it, and
>> still count on 'mid' being valid.  Ignoring potential size differences,
>> how would you ever know that what was written from 'top' into 'base' is
>> compatible with what is present in 'mid'?
>
> We chatted about this some more on IRC, and I'll attempt to summarize
> the results of that conversation (correct me if I'm wrong)...
>
> When committing across multiple images, there are four allocation cases
> to consider:
>
> 1. unallocated in mid or top => nothing to do; base is already correct
>
> 2. allocated in mid but not top => copy from mid to base; as long as mid
> is in the chain, both mid and top see the version in mid; as soon as mid
> is removed from the chain, top sees the version in base
>
> 3. allocated in mid and in top => ultimately, we want to copy from top
> to base.  We can also do an intermediate copy from mid to base, although
> that is less efficient; as long as the copy from top to base happens
> last.  As long as the sector remains allocated, then mid always sees its
> own version, and top always sees its own version.
>
> 4. allocated in top but not mid => we want to copy from top to base, but
> the moment we do that, if mid is still in the chain, then we have
> invalidated the contents of mid.  However, as long as top remains
> allocated, it sees its own version, and even if top is marked
> unallocated, it would then see through to base and see correct contents
> even though the intermediate file mid is inconsistent.

The above is true for an image chain where mid == top->backing_hd;
however, if there are additional images between mid and top, this is not
strictly true - you could have an allocation in some of the
intermediates, but not others, which leads to additional minor
complications.  See below for an illustration.

> Use of block-commit has the potential to invalidate all images that are
> dropped from the chain (namely, any time allocation scenario 4 is
> present anywhere in the image); it is up to users to avoid using commit
> if they have any other image chain sharing the part of the chain
> discarded by this operation (someday, libvirt might track all storage
> chains, and be able to prevent an attempt at a commit if it would strand
> someone else's chain; but for now, we just document the issue).
>
> Next, there is a question of whether invalidating the image up front is
> acceptable, or whether we must go through gyrations to avoid
> invalidation until after the image has been dropped from the chain.
> That is, does the invalidation happen the moment the commit starts (and
> can't be undone by an early abort), or can it be delayed until the point
> that the image is actually dropped from the chain.  As long as the
> current running qemu is the only entity using the portion of the chain
> being dropped, then the timing does not matter, other than affecting
> what optimizations we might be able to perform.
>
> There is also a question of what happens if a commit is started, then
> aborted, then restarted.  It is always safe to restart the same commit
> from scratch, just not optimal, as the later run will spend time copying
> identical content that was already in base on the first run.  The only
> way to avoid copying sectors on a second run is to mark them unallocated
> on the first run, but then we have the issue of consistency: if a sector
> is allocated in both mid and top (scenario 3), and the first run copies
> top into base and then marks top unallocated, then a future read of top
> would pick up the contents from mid, which is wrong.  Therefore, we
> cannot mark sectors unallocated unless we traverse them in a safe order.
>
> I was able to come up with an algorithm that allows for faster restarts
> of a commit operation, in order to avoid copying any sector into base
> more than once (at least, insofar as top is not also an active image,
> but we already deferred committing an active image for a later date).
> It requires that every image being trimmed from the chain be r/w
> (although only one image has to be r/w at a time), and that the copies
> be done in a depth-first manner.  That is, the algorithm first visits
> all allocated sectors in 'mid'; if they are not also allocated in top,
> then the sector is copied into base and marked unallocated in mid.  When
> mid is completed, it is removed from the chain, before proceeding to
> top.  Eventually, all sectors will be copied into base, exactly once,
> and the algorithm is restartable because it marks sectors unallocated
> once base has the correct contents.  But it is more complex to implement.

Let's take the following example, but this time with 5 images instead of
4.  We are doing a commit with top == 'top', and base == 'base'.  0
marks no allocation, letters mark allocation by a specific layer (the
actual data is somewhat irrelevant; we assume that if a sector is
allocated in a layer, its data is unique to that layer).

Sample scenario:

        1 2 3 4 | perceived data
 -----------------------------------
 act  | e 0 0 0 | (e b d d)
 top  | d 0 d d | (d b d d)
 mid1 | 0 0 0 c | (0 b b c)
 mid2 | 0 b b 0 | (0 b b 0)
 base | 0 0 a 0 | (0 0 a 0)

If we look at allocations from mid1's perspective, all the way through
its chain (mid1->mid2->base), this is what we see:

 mid1: (0 b b c)

Using the algorithm you mentioned above, sector 3 gives us grief.  This
is what happens once we are done with the commit of mid2 into base, and
drop mid2 from the chain:

        1 2 3 4 | perceived data
 -----------------------------------
 act  | e 0 0 0 | (e b d d)
 top  | d 0 d d | (d b d d)
 mid1 | 0 0 0 c | (0 b a c)   <-- invalid as stand-alone image
 base | 0 b a 0 | (0 b a 0)

However, if mid1 still references mid2, then mid1->mid2->base remains ok
afterwards:

        1 2 3 4 | perceived data
 -----------------------------------
 act  | e 0 0 0 | (e b d d)
 top  | d 0 d d | (d b d d)
 mid1 | 0 0 0 c | (0 b b c)
 mid2 | 0 0 b 0 | (0 b b 0)
 base | 0 b a 0 | (0 b a 0)

So, in order to be safe, we can't modify an intermediate image file to
drop its backing file until it is invalidated anyway by the commit of
its own overlay(s) (although QEMU could drop it from its live chain in
RAM, and not modify the image files themselves to reflect the drop).
And since it will be rendered invalid, there is no point in ever
modifying an intermediate's backing file - the only two image files in
which it makes sense to record the actual chain modification and
intermediate drop are top and top's overlay.  (This does bring to mind a
change for this series - currently I only modify top's overlay's backing
file to be base; I should also modify top's backing file to be base,
because a different image chain that uses top as a backing file would
then remain valid.)

Also, this optimization brings your original question back into play,
which is sidestepped by assuming all intermediates are invalid; namely,
if we have to grow the base via bdrv_truncate(), how we handle
intermediates that are smaller than base becomes relevant again.

> In conclusion, since this stage of the implementation never marks
> sectors unallocated, the use of the top of the chain is never
> invalidated even if intermediate files remain in the chain but have
> already been invalidated.  I'm okay with this patch going in as a first
> approximation, and saving the complications of a depth-first approach
> coupled with marking sectors unallocated as an optimization we can add
> later (perhaps even by adding a flag to the JSON command to choose
> whether to use the optimization, since it requires r/w on all images in
> the chain but allows faster restarts; or to skip the optimization, since
> it allows for fewer r/w images but slower restarts).  That is, this
> patch series invalidates intermediate images at the start of the commit
> operation, whereas the proposed optimization would defer invalidating
> images until they have been removed from the chain, but it doesn't
> affect the correctness of this phase of the patch series.

I agree with this as well - we will be adding additional support with
active layer commit later, and can also add additional optimizations
like you suggested.
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..4a136b8 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-obj-y += stream.o
+block-obj-y += commit.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
diff --git a/block/commit.c b/block/commit.c
new file mode 100644
index 0000000..72959a3
--- /dev/null
+++ b/block/commit.c
@@ -0,0 +1,247 @@
+/*
+ * Live block commit
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Jeff Cody <jcody@redhat.com>
+ *  Based on stream.c by Stefan Hajnoczi
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "trace.h"
+#include "block_int.h"
+#include "qemu/ratelimit.h"
+
+enum {
+    /*
+     * Size of data buffer for populating the image file. This should be large
+     * enough to process multiple clusters in a single call, so that populating
+     * contiguous regions of the image is efficient.
+     */
+    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
+};
+
+#define SLICE_TIME 100000000ULL /* ns */
+
+typedef struct CommitBlockJob {
+    BlockJob common;
+    RateLimit limit;
+    BlockDriverState *active;
+    BlockDriverState *top;
+    BlockDriverState *base;
+    BlockErrorAction on_error;
+    int base_flags;
+    int orig_overlay_flags;
+} CommitBlockJob;
+
+static int coroutine_fn commit_populate(BlockDriverState *bs,
+                                        BlockDriverState *base,
+                                        int64_t sector_num, int nb_sectors,
+                                        void *buf)
+{
+    int ret = 0;
+
+    ret = bdrv_read(bs, sector_num, buf, nb_sectors);
+    if (ret) {
+        return ret;
+    }
+
+    ret = bdrv_write(base, sector_num, buf, nb_sectors);
+    if (ret) {
+        return ret;
+    }
+
+    return 0;
+}
+
+static void coroutine_fn commit_run(void *opaque)
+{
+    CommitBlockJob *s = opaque;
+    BlockDriverState *active = s->active;
+    BlockDriverState *top = s->top;
+    BlockDriverState *base = s->base;
+    BlockDriverState *overlay_bs = NULL;
+    int64_t sector_num, end;
+    int ret = 0;
+    int n = 0;
+    void *buf;
+    int bytes_written = 0;
+    int64_t base_len;
+
+    ret = s->common.len = bdrv_getlength(top);
+
+
+    if (s->common.len < 0) {
+        goto exit_restore_reopen;
+    }
+
+    ret = base_len = bdrv_getlength(base);
+    if (base_len < 0) {
+        goto exit_restore_reopen;
+    }
+
+    if (base_len < s->common.len) {
+        ret = bdrv_truncate(base, s->common.len);
+        if (ret) {
+            goto exit_restore_reopen;
+        }
+    }
+
+    overlay_bs = bdrv_find_overlay(active, top);
+
+    end = s->common.len >> BDRV_SECTOR_BITS;
+    buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE);
+
+    for (sector_num = 0; sector_num < end; sector_num += n) {
+        uint64_t delay_ns = 0;
+        bool copy;
+
+wait:
+        /* Note that even when no rate limit is applied we need to yield
+         * with no pending I/O here so that qemu_aio_flush() returns.
+         */
+        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+        if (block_job_is_cancelled(&s->common)) {
+            break;
+        }
+        /* Copy if allocated above the base */
+        ret = bdrv_co_is_allocated_above(top, base, sector_num,
+                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
+                                         &n);
+        copy = (ret == 1);
+        trace_commit_one_iteration(s, sector_num, n, ret);
+        if (copy) {
+            if (s->common.speed) {
+                delay_ns = ratelimit_calculate_delay(&s->limit, n);
+                if (delay_ns > 0) {
+                    goto wait;
+                }
+            }
+            ret = commit_populate(top, base, sector_num, n, buf);
+            bytes_written += n * BDRV_SECTOR_SIZE;
+        }
+        if (ret < 0) {
+            if (s->on_error == BLOCK_ERR_STOP_ANY ||
+                s->on_error == BLOCK_ERR_REPORT ||
+                (s->on_error == BLOCK_ERR_STOP_ENOSPC && ret == -ENOSPC)) {
+                goto exit_free_buf;
+            } else {
+                n = 0;
+                continue;
+            }
+        }
+        /* Publish progress */
+        s->common.offset += n * BDRV_SECTOR_SIZE;
+    }
+
+    ret = 0;
+
+    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
+        /* success */
+        ret = bdrv_drop_intermediate(active, top, base);
+    }
+
+exit_free_buf:
+    qemu_vfree(buf);
+
+exit_restore_reopen:
+    /* restore base open flags here if appropriate (e.g., change the base back
+     * to r/o). These reopens do not need to be atomic, since we won't abort
+     * even on failure here */
+    if (s->base_flags != bdrv_get_flags(base)) {
+        bdrv_reopen(base, s->base_flags, NULL);
+    }
+    if (s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
+        bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
+    }
+
+    block_job_complete(&s->common, ret);
+}
+
+static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
+
+    if (speed < 0) {
+        error_set(errp, QERR_INVALID_PARAMETER, "speed");
+        return;
+    }
+    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+}
+
+static BlockJobType commit_job_type = {
+    .instance_size = sizeof(CommitBlockJob),
+    .job_type      = "commit",
+    .set_speed     = commit_set_speed,
+};
+
+void commit_start(BlockDriverState *bs, BlockDriverState *base,
+                  BlockDriverState *top, int64_t speed,
+                  BlockErrorAction on_error, BlockDriverCompletionFunc *cb,
+                  void *opaque, Error **errp)
+{
+    CommitBlockJob *s;
+    BlockReopenQueue *reopen_queue = NULL;
+    int orig_overlay_flags;
+    int orig_base_flags;
+    BlockDriverState *overlay_bs;
+    Error *local_err = NULL;
+
+    if ((on_error == BLOCK_ERR_STOP_ANY ||
+         on_error == BLOCK_ERR_STOP_ENOSPC) &&
+        !bdrv_iostatus_is_enabled(bs)) {
+        error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
+        return;
+    }
+
+    overlay_bs = bdrv_find_overlay(bs, top);
+
+    if (overlay_bs == NULL) {
+        error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+                  "Could not find overlay image for %s:", top->filename);
+        return;
+    }
+
+    orig_base_flags    = bdrv_get_flags(base);
+    orig_overlay_flags = bdrv_get_flags(overlay_bs);
+
+    /* convert base_bs & overlay_bs to r/w, if necessary */
+    if (!(orig_base_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, base,
+                                         orig_base_flags | BDRV_O_RDWR);
+    }
+    if (!(orig_overlay_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
+                                         orig_overlay_flags | BDRV_O_RDWR);
+    }
+    if (reopen_queue) {
+        bdrv_reopen_multiple(reopen_queue, &local_err);
+        if (local_err != NULL) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
+
+    s = block_job_create(&commit_job_type, bs, speed, cb, opaque, errp);
+    if (!s) {
+        return;
+    }
+
+    s->base   = base;
+    s->top    = top;
+    s->active = bs;
+
+    s->base_flags         = orig_base_flags;
+    s->orig_overlay_flags = orig_overlay_flags;
+
+    s->on_error = on_error;
+    s->common.co = qemu_coroutine_create(commit_run);
+
+    trace_commit_start(bs, base, top, s, s->common.co, opaque);
+    qemu_coroutine_enter(s->common.co, s);
+}
diff --git a/block_int.h b/block_int.h
index e533ca4..1d44e5d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -462,4 +462,20 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
+/**
+ * commit_start:
+ * @bs: Top Block device
+ * @base: Block device that will be written into, and become the new top
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @on_error: The action to take upon error.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @errp: Error object.
+ *
+ */
+void commit_start(BlockDriverState *bs, BlockDriverState *base,
+                  BlockDriverState *top, int64_t speed,
+                  BlockErrorAction on_error, BlockDriverCompletionFunc *cb,
+                  void *opaque, Error **errp);
+
 #endif /* BLOCK_INT_H */
diff --git a/trace-events b/trace-events
index b25ae1c..98a1f97 100644
--- a/trace-events
+++ b/trace-events
@@ -74,6 +74,8 @@ bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t c
 # block/stream.c
 stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
 stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
+commit_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
+commit_start(void *bs, void *base, void *top, void *s, void *co, void *opaque) "bs %p base %p top %p s %p co %p opaque %p"
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
This adds the live commit coroutine. This iteration focuses on the
commit only below the active layer, and not the active layer itself.

The behaviour is similar to block streaming; the sectors are walked
through, and anything that exists above 'base' is committed back down
into base. At the end, intermediate images are deleted, and the
chain stitched together. Images are restored to their original open
flags upon completion.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs |   1 +
 block/commit.c      | 247 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block_int.h         |  16 ++++
 trace-events        |   2 +
 4 files changed, 266 insertions(+)
 create mode 100644 block/commit.c