Message ID | 1430830009-29839-2-git-send-email-famz@redhat.com |
---|---|
State | New |
Headers | show |
On 05/05/2015 14:46, Fam Zheng wrote: > Unsetting dirty globally with discard is not very correct. The discard may zero > out sectors (depending on can_write_zeroes_with_unmap), we should replicate > this change to destinition side to make sure that the guest sees the same data. > > Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator > doesn't expect unsetting of bits after current position. > > So let's do it the opposite way which fixes both problems: set the dirty bits > if we are to discard it. This is not enough, you also have to do the discard in block/mirror.c, otherwise the destination image could even become fully provisioned! Paolo > Reported-by: wangxiaolong@ucloud.cn > Signed-off-by: Fam Zheng <famz@redhat.com> > --- > block/io.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/block/io.c b/block/io.c > index 1ce62c4..809688b 100644 > --- a/block/io.c > +++ b/block/io.c > @@ -2343,8 +2343,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, > return -EROFS; > } > > - bdrv_reset_dirty(bs, sector_num, nb_sectors); > - > /* Do nothing if disabled. */ > if (!(bs->open_flags & BDRV_O_UNMAP)) { > return 0; > @@ -2354,6 +2352,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, > return 0; > } > > + bdrv_set_dirty(bs, sector_num, nb_sectors); > + > max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS); > while (nb_sectors > 0) { > int ret; >
On Tue, 05/05 15:06, Paolo Bonzini wrote: > > > On 05/05/2015 14:46, Fam Zheng wrote: > > Unsetting dirty globally with discard is not very correct. The discard may zero > > out sectors (depending on can_write_zeroes_with_unmap), we should replicate > > this change to destinition side to make sure that the guest sees the same data. > > > > Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator > > doesn't expect unsetting of bits after current position. > > > > So let's do it the opposite way which fixes both problems: set the dirty bits > > if we are to discard it. > > This is not enough, you also have to do the discard in block/mirror.c, > otherwise the destination image could even become fully provisioned! I wasn't sure what if src and dest have different can_write_zeroes_with_unmap value, but your argument is stronger. I will add discard in mirror. Besides that, do we need to compare the can_write_zeroes_with_unmap? Fam > > Paolo > > > Reported-by: wangxiaolong@ucloud.cn > > Signed-off-by: Fam Zheng <famz@redhat.com> > > --- > > block/io.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/block/io.c b/block/io.c > > index 1ce62c4..809688b 100644 > > --- a/block/io.c > > +++ b/block/io.c > > @@ -2343,8 +2343,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, > > return -EROFS; > > } > > > > - bdrv_reset_dirty(bs, sector_num, nb_sectors); > > - > > /* Do nothing if disabled. */ > > if (!(bs->open_flags & BDRV_O_UNMAP)) { > > return 0; > > @@ -2354,6 +2352,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, > > return 0; > > } > > > > + bdrv_set_dirty(bs, sector_num, nb_sectors); > > + > > max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS); > > while (nb_sectors > 0) { > > int ret; > >
On 06/05/2015 03:45, Fam Zheng wrote: >> > This is not enough, you also have to do the discard in block/mirror.c, >> > otherwise the destination image could even become fully provisioned! > I wasn't sure what if src and dest have different can_write_zeroes_with_unmap > value, but your argument is stronger. > > I will add discard in mirror. Besides that, do we need to compare the > can_write_zeroes_with_unmap? Hmm, if can_write_zeroes_with_unmap is set, it's probably better to write zeroes instead of discarding, in case the guest is relying on it. Paolo
On Wed, 05/06 10:59, Paolo Bonzini wrote: > > > On 06/05/2015 03:45, Fam Zheng wrote: > >> > This is not enough, you also have to do the discard in block/mirror.c, > >> > otherwise the destination image could even become fully provisioned! > > I wasn't sure what if src and dest have different can_write_zeroes_with_unmap > > value, but your argument is stronger. > > > > I will add discard in mirror. Besides that, do we need to compare the > > can_write_zeroes_with_unmap? > > Hmm, if can_write_zeroes_with_unmap is set, it's probably better to > write zeroes instead of discarding, in case the guest is relying on it. > I think there are four cases: # src can_write_zeroes_with_unmap target can_write_zeroes_with_unmap -------------------------------------------------------------------------------- 1 true true 2 true false 3 false true 4 false false I think replicating discard only works for 1, because both side would then read 0. For case 2 & 3 it's probably better to mirror the actual reading of source. I'm not sure about 4. What do you think? Fam
On 06/05/2015 11:50, Fam Zheng wrote: > # src can_write_zeroes_with_unmap target can_write_zeroes_with_unmap > -------------------------------------------------------------------------------- > 1 true true > 2 true false > 3 false true > 4 false false 1 should replicate WRITE SAME, in case the unmap granularity of the target is different from that of the source. In that case, a discard on the target might leave some sectors untouched. Writing zeroes would ensure matching data between the source and the target. 2 should _not_ discard: it should write zeroes even at the cost of making the target fully provisioned. Perhaps you can optimize it by looking at bdrv_get_block_status for the target, and checking the answer for BDRV_ZERO. 3 and 4 can use discard on the target. So it looks like only the source setting matters. We need to check the cost of bdrv_co_get_block_status for "raw", too. If it's too expensive, that can be a problem. Paolo > For case 2 & 3 it's probably better to mirror the actual reading of source. > > I'm not sure about 4. Even in case 1, discard could be "UNMAP" and write zeroes could be "WRITE SAME". If the unmap granularity of the target is For unaligned sectors, UNMAP might leave some sectors aside while WRITE SAME will write with zeroes.
On Wed, 05/06 12:21, Paolo Bonzini wrote: > > > On 06/05/2015 11:50, Fam Zheng wrote: > > # src can_write_zeroes_with_unmap target can_write_zeroes_with_unmap > > -------------------------------------------------------------------------------- > > 1 true true > > 2 true false > > 3 false true > > 4 false false > > 1 should replicate WRITE SAME, in case the unmap granularity of the > target is different from that of the source. In that case, a discard on > the target might leave some sectors untouched. Writing zeroes would > ensure matching data between the source and the target. > > 2 should _not_ discard: it should write zeroes even at the cost of > making the target fully provisioned. Perhaps you can optimize it by > looking at bdrv_get_block_status for the target, and checking the answer > for BDRV_ZERO. > > 3 and 4 can use discard on the target. > > So it looks like only the source setting matters. > > We need to check the cost of bdrv_co_get_block_status for "raw", too. > If it's too expensive, that can be a problem. In my test, bdrv_is_allocated_above() takes 10~20us on ext4 raw image on my laptop SATA. And also note that: /* * ... * * 'pnum' is set to the number of sectors (including and immediately following * the specified sector) that are known to be in the same * allocated/unallocated state. * * 'nb_sectors' is the max value 'pnum' should be set to. If nb_sectors goes * beyond the end of the disk image it will be clamped. */ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs, So we currently don't coalesce the sequential dirty sectors. Was that intentional? Fam > > Paolo > > > For case 2 & 3 it's probably better to mirror the actual reading of source. > > > > I'm not sure about 4. > > Even in case 1, discard could be "UNMAP" and write zeroes could be > "WRITE SAME". If the unmap granularity of the target is For unaligned > sectors, UNMAP might leave some sectors aside while WRITE SAME will > write with zeroes.
On 11/05/2015 10:02, Fam Zheng wrote: > > /* > * ... > * > * 'pnum' is set to the number of sectors (including and immediately following > * the specified sector) that are known to be in the same > * allocated/unallocated state. > * > * 'nb_sectors' is the max value 'pnum' should be set to. If nb_sectors goes > * beyond the end of the disk image it will be clamped. > */ > static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs, > > So we currently don't coalesce the sequential dirty sectors. > > Was that intentional? The idea, I think, is that for some drivers bdrv_co_get_block_status is O(nb_sectors). It's okay to let a driver return *pnum > nb_sectors. You do need to audit the callers, though. It would be nice also to add TODOs whenever the code is fine but it could explot *pnum > nb_sectors to get better performance. Paolo
diff --git a/block/io.c b/block/io.c index 1ce62c4..809688b 100644 --- a/block/io.c +++ b/block/io.c @@ -2343,8 +2343,6 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return -EROFS; } - bdrv_reset_dirty(bs, sector_num, nb_sectors); - /* Do nothing if disabled. */ if (!(bs->open_flags & BDRV_O_UNMAP)) { return 0; @@ -2354,6 +2352,8 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, return 0; } + bdrv_set_dirty(bs, sector_num, nb_sectors); + max_discard = MIN_NON_ZERO(bs->bl.max_discard, BDRV_REQUEST_MAX_SECTORS); while (nb_sectors > 0) { int ret;
Unsetting dirty globally with discard is not very correct. The discard may zero out sectors (depending on can_write_zeroes_with_unmap), we should replicate this change to destinition side to make sure that the guest sees the same data. Calling bdrv_reset_dirty also troubles mirror job because the hbitmap iterator doesn't expect unsetting of bits after current position. So let's do it the opposite way which fixes both problems: set the dirty bits if we are to discard it. Reported-by: wangxiaolong@ucloud.cn Signed-off-by: Fam Zheng <famz@redhat.com> --- block/io.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)