
[3/4] block/mirror: support unaligned write in active mirror

Message ID 20190912151338.21225-4-vsementsov@virtuozzo.com
State New
Series: active-mirror: support unaligned guest operations

Commit Message

Vladimir Sementsov-Ogievskiy Sept. 12, 2019, 3:13 p.m. UTC
Prior to 9adc1cb49af8d, do_sync_target_write had a bug: it reset the
aligned-up region in the dirty bitmap, which means that we may leave
some bytes uncopied while assuming them copied, which actually leads to
producing a corrupted target.

So 9adc1cb49af8d forced the dirty bitmap granularity to be
request_alignment for the mirror-top filter, so that we never work with
unaligned requests. However, forcing a large alignment obviously
decreases the performance of unaligned requests.

This commit provides another solution for the problem: if the unaligned
padding is already dirty, we can safely ignore it, as
1. It's dirty, so it will be copied by mirror_iteration anyway.
2. It's dirty, so by skipping it now we don't increase the dirtiness of
   the bitmap and therefore don't damage the "synchronicity" of the
   write-blocking mirror.

If the unaligned padding is not dirty, we just write it; there is no
reason to touch the dirty bitmap if we succeed (on failure we'll set
the whole region of course, but we lose "synchronicity" on failure
anyway).

Note: we need to disable the dirty_bitmap, otherwise do_sync_target_write
will not be able to see the bitmap state as it was before the current
operation. We could of course check the dirty bitmap before the
operation in bdrv_mirror_top_do_write and remember it, but we don't need
an active dirty bitmap for the write-blocking mirror anyway.

The new code path is unused until the following commit reverts
9adc1cb49af8d.

Suggested-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/mirror.c | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)
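
The clamping itself is plain alignment arithmetic. Below is a minimal
stand-alone model of it (hypothetical names; the real logic, including the
qiov_offset bookkeeping for blk_co_pwritev_part, is in do_sync_target_write
in the patch at the bottom):

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Model of the clamping in do_sync_target_write: if the unaligned head
 * or tail of a write falls into an already-dirty bitmap chunk, that
 * padding is skipped and left to the background mirror iteration.
 * is_dirty() stands in for bdrv_dirty_bitmap_get(); granularity is a
 * power of two, as for a real dirty bitmap.
 */
typedef bool (*is_dirty_fn)(int64_t offset);

#define ALIGN_DOWN(n, a) ((n) & ~((int64_t)(a) - 1))
#define ALIGN_UP(n, a)   ALIGN_DOWN((n) + (a) - 1, (a))

/* Returns false if nothing is left to write after clamping. */
static bool clamp_write(int64_t *offset, int64_t *bytes,
                        int64_t granularity, is_dirty_fn is_dirty)
{
    if (*offset % granularity && is_dirty(*offset)) {
        /* Dirty unaligned head: skip up to the next chunk boundary. */
        int64_t head = ALIGN_UP(*offset, granularity) - *offset;
        if (*bytes <= head) {
            return false;
        }
        *offset += head;
        *bytes -= head;
    }

    if ((*offset + *bytes) % granularity && is_dirty(*offset + *bytes - 1)) {
        /* Dirty unaligned tail: shrink back to the previous boundary. */
        int64_t tail = (*offset + *bytes) % granularity;
        if (*bytes <= tail) {
            return false;
        }
        *bytes -= tail;
    }
    return true;
}

static bool always_dirty(int64_t offset)
{
    (void)offset;
    return true;
}

int main(void)
{
    int64_t offset = 1000, bytes = 8000; /* write [1000, 9000), 4K chunks */

    if (clamp_write(&offset, &bytes, 4096, always_dirty)) {
        printf("copy [%" PRId64 ", %" PRId64 ")\n", offset, offset + bytes);
    }
    return 0;
}

With 4096-byte granularity and dirty padding on both sides, the write of
[1000, 9000) is clamped to [4096, 8192): only the fully covered chunk is
copied and reset, and the skipped padding stays dirty for the background
iteration.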

Comments

Max Reitz Oct. 2, 2019, 2:57 p.m. UTC | #1
On 12.09.19 17:13, Vladimir Sementsov-Ogievskiy wrote:
> Prior to 9adc1cb49af8d, do_sync_target_write had a bug: it reset the
> aligned-up region in the dirty bitmap, which means that we may leave
> some bytes uncopied while assuming them copied, which actually leads to
> producing a corrupted target.
> 
> So 9adc1cb49af8d forced the dirty bitmap granularity to be
> request_alignment for the mirror-top filter, so that we never work with
> unaligned requests. However, forcing a large alignment obviously
> decreases the performance of unaligned requests.
> 
> This commit provides another solution for the problem: if the unaligned
> padding is already dirty, we can safely ignore it, as
> 1. It's dirty, so it will be copied by mirror_iteration anyway.
> 2. It's dirty, so by skipping it now we don't increase the dirtiness of
>    the bitmap and therefore don't damage the "synchronicity" of the
>    write-blocking mirror.

But that’s not what active mirror is for.  The point of active mirror is
that it must converge because every guest write will contribute towards
that goal.

If you skip active mirroring for unaligned guest writes, they will not
contribute towards converging, but in fact lead to the opposite.

Max
Vladimir Sementsov-Ogievskiy Oct. 2, 2019, 3:03 p.m. UTC | #2
02.10.2019 17:57, Max Reitz wrote:
> On 12.09.19 17:13, Vladimir Sementsov-Ogievskiy wrote:
>> [...]
> 
> But that’s not what active mirror is for.  The point of active mirror is
> that it must converge because every guest write will contribute towards
> that goal.
> 
> If you skip active mirroring for unaligned guest writes, they will not
> contribute towards converging, but in fact lead to the opposite.
> 

They will not contribute only if the region is already dirty. Actually, after
the first iteration of mirroring (copying the whole disk), all following writes
will contribute, so the whole process must converge. It is a bit similar to
running one mirror loop in normal mode, and then enabling write-blocking.
Vladimir Sementsov-Ogievskiy Oct. 2, 2019, 3:06 p.m. UTC | #3
02.10.2019 18:03, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> 
> They will not contribute only if the region is already dirty. Actually, after
> the first iteration of mirroring (copying the whole disk), all following writes
> will contribute, so the whole process must converge. It is a bit similar to
> running one mirror loop in normal mode, and then enabling write-blocking.
> 


In other words, we don't need "all guest writes contribute" to converge;
"all guest writes don't create new dirty bits" is enough, as we have the
parallel mirror iteration which continuously handles dirty bits.
Max Reitz Oct. 2, 2019, 3:52 p.m. UTC | #4
On 02.10.19 17:06, Vladimir Sementsov-Ogievskiy wrote:
> [...]
>>
>> They will not contribute only if the region is already dirty. Actually, after
>> the first iteration of mirroring (copying the whole disk), all following writes
>> will contribute, so the whole process must converge. It is a bit similar to
>> running one mirror loop in normal mode, and then enabling write-blocking.
> 
> 
> In other words, we don't need "all guest writes contribute" to converge;
> "all guest writes don't create new dirty bits" is enough, as we have the
> parallel mirror iteration which continuously handles dirty bits.

Hm, in a sense.  But it does mean that guest writes will not contribute
to convergence.

And that’s against the current definition of write-blocking, which does
state that “when data is written to the source, write it (synchronously)
to the target as well”.

Max
Vladimir Sementsov-Ogievskiy Oct. 3, 2019, 9:34 a.m. UTC | #5
02.10.2019 18:52, Max Reitz wrote:
> [...]
> 
> Hm, in a sense.  But it does mean that guest writes will not contribute
> to convergence.
> 
> And that’s against the current definition of write-blocking, which does
> state that “when data is written to the source, write it (synchronously)
> to the target as well”.
> 

Hmm, I understand. But IMHO our proposed behavior is better in general.
Do you think it's a problem to change the spec now?
If yes, I'll resend with an option.
Max Reitz Oct. 4, 2019, 12:59 p.m. UTC | #6
On 03.10.19 11:34, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> 
> Hmm, I understand. But IMHO our proposed behavior is better in general.
> Do you think it's a problem to change the spec now?
> If yes, I'll resend with an option.

Well, the thing is that I’d find it weird if write-blocking wasn’t
blocking in all cases.  And in my opinion, it makes more sense for
active mirror if all writes actively contributed to convergence.

Max
Vladimir Sementsov-Ogievskiy Oct. 4, 2019, 1:22 p.m. UTC | #7
04.10.2019 15:59, Max Reitz wrote:
> [...]
> 
> Well, the thing is that I’d find it weird if write-blocking wasn’t
> blocking in all cases.  And in my opinion, it makes more sense for
> active mirror if all writes actively contributed to convergence.
> 

Why? What is the benefit in it?
What does "all writes actively contribute to convergence" mean for the user?

I think for the user there may be the following criteria:

1. Guaranteed convergence, under any guest write load.
Both the current and my proposed variants are OK.

2. Less impact on the guest.
Obviously my proposed variant is better.

3. Total time of mirroring.
It seems the current variant may be a bit better, but I don't think that
unaligned tails have a significant impact.

===

So, assume I want [1]+[2]. And possibly
2.2: Even less impact on the guest: ignore not only unaligned tails when they
are already dirty, but the full synchronous mirror operation when the area is
already dirty.

What should I call this? Should it be a separate mode, or an option for
write-blocking?
Max Reitz Oct. 4, 2019, 2:48 p.m. UTC | #8
On 04.10.19 15:22, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> 
> Why? What is the benefit in it?
> What does "all writes actively contribute to convergence" mean for the user?

One thing I wonder about is whether it’s really guaranteed that the
background job will run under full I/O load, and how often it runs.

I fear that with your model, the background job might starve and the
mirror may take a very long time.  It won’t diverge, but it also won’t
really converge.

The advantage of letting all writes block is that even under full I/O
load, the mirror job will progress at a steady pace.

> [...]
> 
> What should I call this? Should it be a separate mode, or an option for write-blocking?

I don’t know whether it makes sense to add a separate mode or a separate
option just for this difference.  I don’t think anyone would choose the
non-default option.

But I do think there's still quite a bit of difference in how the job
behaves...  I don't know. :-/

Max
Vladimir Sementsov-Ogievskiy Oct. 4, 2019, 3:04 p.m. UTC | #9
04.10.2019 17:48, Max Reitz wrote:
> [...]
>>
>> Why? What is the benefit in it?
>> What does "all writes actively contribute to convergence" mean for the user?
> 
> One thing I wonder about is whether it’s really guaranteed that the
> background job will run under full I/O load, and how often it runs.
> 
> I fear that with your model, the background job might starve and the
> mirror may take a very long time.

Hmmm. I think the mirror job runs in the same context as guest writes, and
goes through the same I/O API... Why would it starve? (I understand that my
words are not evidence...)

>  It won’t diverge, but it also won’t
> really converge.

But the same will be true with the current behavior: the guest is not
guaranteed to write to all parts of the disk. And in most scenarios it
doesn't. So, if the mirror job starves because of huge guest I/O load, we
will not converge anyway.

So, a background process is a necessary thing for convergence anyway.

> 
> The advantage of letting all writes block is that even under full I/O
> load, the mirror job will progress at a steady pace.
> 
>> [...]
> 
> I don’t know whether it makes sense to add a separate mode or a separate
> option just for this difference.  I don’t think anyone would choose the
> non-default option.

At least Virtuozzo will choose :)

Max Reitz Oct. 4, 2019, 3:27 p.m. UTC | #10
On 04.10.19 17:04, Vladimir Sementsov-Ogievskiy wrote:
> [...]
>>
>> One thing I wonder about is whether it’s really guaranteed that the
>> background job will run under full I/O load, and how often it runs.
>>
>> I fear that with your model, the background job might starve and the
>> mirror may take a very long time.
> 
>> Hmmm. I think the mirror job runs in the same context as guest writes, and
>> goes through the same I/O API... Why would it starve? (I understand that my
>> words are not evidence...)

I thought that maybe if the disk is being read and written all the
time, there'd be no time for the mirror coroutine.

>>  It won’t diverge, but it also won’t
>> really converge.
> 
> But the same will be true with the current behavior: the guest is not
> guaranteed to write to all parts of the disk. And in most scenarios it
> doesn't. So, if the mirror job starves because of huge guest I/O load, we
> will not converge anyway.
> 
> So, a background process is a necessary thing for convergence anyway.

Good point.  That convinces me.

>> [...]
>>
>> I don’t know whether it makes sense to add a separate mode or a separate
>> option just for this difference.  I don’t think anyone would choose the
>> non-default option.
> 
> At least Virtuozzo will choose :)

But if Virtuozzo knows exactly what to choose, then we can just
implement only that behavior.

Max
Vladimir Sementsov-Ogievskiy Oct. 4, 2019, 3:38 p.m. UTC | #11
04.10.2019 18:27, Max Reitz wrote:
> [...]
>>
>> At least Virtuozzo will choose :)
> 
> But if Virtuozzo knows exactly what to choose, then we can just
> implement only that behavior.
> 

Then, welcome to this series :)

Still, [2.2] is arguable: if we skip some dirty areas in the synchronous copy,
the guest will be a bit happier, but we miss a chance to spare the background
mirror iteration the read part of the copy. But this may be postponed.
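
For concreteness, [2.2] could look something like the sketch below (a
stand-alone model again, not code from this series; is_dirty() stands in
for bdrv_dirty_bitmap_get() at bitmap granularity):

#include <stdbool.h>
#include <stdint.h>

typedef bool (*is_dirty_fn)(int64_t offset);

/*
 * [2.2] sketch: the synchronous copy may be skipped entirely when every
 * granularity chunk touched by the write is already dirty -- those
 * chunks will be copied by the background iteration anyway, so skipping
 * them adds no new dirty bits.
 */
static bool can_skip_sync_copy(int64_t offset, int64_t bytes,
                               int64_t granularity, is_dirty_fn is_dirty)
{
    int64_t end = offset + bytes;
    int64_t p = offset;

    /* One probe per chunk is enough: at bitmap granularity a chunk is
     * either wholly dirty or wholly clean. */
    while (p < end) {
        if (!is_dirty(p)) {
            return false;
        }
        p = (p / granularity + 1) * granularity; /* next chunk boundary */
    }
    return true;
}

As noted above, the data skipped this way still has to be read back by the
background loop, which is exactly the trade-off being discussed.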
Max Reitz Oct. 4, 2019, 4:31 p.m. UTC | #12
On 12.09.19 17:13, Vladimir Sementsov-Ogievskiy wrote:
> [...]
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index d176bf5920..d192f6a96b 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -1204,6 +1204,39 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
>                       QEMUIOVector *qiov, int flags)
>  {
>      int ret;
> +    size_t qiov_offset = 0;
> +
> +    if (!QEMU_IS_ALIGNED(offset, job->granularity) &&
> +        bdrv_dirty_bitmap_get(job->dirty_bitmap, offset)) {
> +            /*
> +             * Dirty unaligned padding
> +             * 1. It's already dirty, no damage to "actively_synced" if we just
> +             *    skip unaligned part.
> +             * 2. If we copy it, we can't reset corresponding bit in
> +             *    dirty_bitmap as there may be some "dirty" bytes still not
> +             *    copied.
> +             * So, just ignore it.
> +             */
> +            qiov_offset = QEMU_ALIGN_UP(offset, job->granularity) - offset;
> +            if (bytes <= qiov_offset) {
> +                /* nothing to do after shrink */
> +                return;
> +            }
> +            offset += qiov_offset;
> +            bytes -= qiov_offset;
> +    }
> +
> +    if (!QEMU_IS_ALIGNED(offset + bytes, job->granularity) &&
> +        bdrv_dirty_bitmap_get(job->dirty_bitmap, offset + bytes - 1))
> +    {
> +        uint64_t tail = (offset + bytes) % job->granularity;
> +
> +        if (bytes <= tail) {
> +            /* nothing to do after shrink */
> +            return;
> +        }
> +        bytes -= tail;
> +    }
>  
>      bdrv_reset_dirty_bitmap(job->dirty_bitmap, offset, bytes);
>  

The bdrv_set_dirty_bitmap() in the error case below needs to use the
original offset/bytes, I suppose.

Apart from that, looks good to me.

Max
Vladimir Sementsov-Ogievskiy Oct. 11, 2019, 8:33 a.m. UTC | #13
04.10.2019 19:31, Max Reitz wrote:
> [...]
> 
> The bdrv_set_dirty_bitmap() in the error case below needs to use the
> original offset/bytes, I suppose.

No, because we shrink the tail only if it is already dirty. And we've locked
the region for the in-flight operation, so nobody can clear the bitmap in the
meantime.

But still, here is something to do:

for non-shrunk tails, if any, we should:
1. align down for reset
2. align up for set on failure
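
Concretely, that treatment could look roughly like this (a sketch of the
idea only, assuming QEMU's QEMU_ALIGN_DOWN()/QEMU_ALIGN_UP() macros; not
the v2 code itself, which may differ in details):

/*
 * Sketch: any remaining unaligned head/tail is clean (dirty padding was
 * shrunk away above), so on success we may reset only the chunks fully
 * covered by the write (round inward) ...
 */
int64_t reset_start = QEMU_ALIGN_UP(offset, job->granularity);
int64_t reset_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);

if (reset_start < reset_end) {
    bdrv_reset_dirty_bitmap(job->dirty_bitmap, reset_start,
                            reset_end - reset_start);
}

ret = blk_co_pwritev_part(job->target, offset, bytes,
                          qiov, qiov_offset, flags);

if (ret < 0) {
    /* ... and on failure re-dirty every chunk the write touched
     * (round outward), so no partially written chunk stays clean. */
    int64_t set_start = QEMU_ALIGN_DOWN(offset, job->granularity);
    int64_t set_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);

    bdrv_set_dirty_bitmap(job->dirty_bitmap, set_start,
                          set_end - set_start);
}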


> 
> Apart from that, looks good to me.
> 
> Max
>
Max Reitz Oct. 11, 2019, 8:58 a.m. UTC | #14
On 11.10.19 10:33, Vladimir Sementsov-Ogievskiy wrote:
> [...]
>>
>> The bdrv_set_dirty_bitmap() in the error case below needs to use the
>> original offset/bytes, I suppose.
> 
> No, because we shrink the tail only if it is already dirty. And we've locked
> the region for the in-flight operation, so nobody can clear the bitmap in the
> meantime.

True.  But wouldn’t it be simpler to understand to just use the original
offsets?

> But still, here is something to do:
> 
> for non-shrunk tails, if any, we should:
> 1. align down for reset
> 2. align up for set on failure

Well, the align up is done automatically, and I think that’s pretty
self-explanatory.

Max

Vladimir Sementsov-Ogievskiy Oct. 11, 2019, 9:09 a.m. UTC | #15
11.10.2019 11:58, Max Reitz wrote:
> [...]
>> But still, here is something to do:
>>
>> for non-shrunk tails, if any, we should:
>> 1. align down for reset
>> 2. align up for set on failure
> 
> Well, the align up is done automatically, and I think that’s pretty
> self-explanatory.

A patch to make hbitmap_reset very strict (assert on unaligned, except for
the very end of the bitmap, if orig_size is unaligned) has been queued by
John. So, I've made the alignment explicit in v2.

Vladimir Sementsov-Ogievskiy Oct. 11, 2019, 9:10 a.m. UTC | #16
11.10.2019 11:58, Max Reitz wrote:
> [...]
>>> The bdrv_set_dirty_bitmap() in the error case below needs to use the
>>> original offset/bytes, I suppose.
>>
>> No, because we shrink the tail only if it is already dirty. And we've locked
>> the region for the in-flight operation, so nobody can clear the bitmap in
>> the meantime.
> 
> True.  But wouldn’t it be simpler to understand to just use the original
> offsets?

Then we need to store them... And if we do, I'd prefer to add an assertion
that the shrunk tails are still dirty instead.


Patch

diff --git a/block/mirror.c b/block/mirror.c
index d176bf5920..d192f6a96b 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1204,6 +1204,39 @@  do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
                      QEMUIOVector *qiov, int flags)
 {
     int ret;
+    size_t qiov_offset = 0;
+
+    if (!QEMU_IS_ALIGNED(offset, job->granularity) &&
+        bdrv_dirty_bitmap_get(job->dirty_bitmap, offset)) {
+            /*
+             * Dirty unaligned padding
+             * 1. It's already dirty, no damage to "actively_synced" if we just
+             *    skip unaligned part.
+             * 2. If we copy it, we can't reset corresponding bit in
+             *    dirty_bitmap as there may be some "dirty" bytes still not
+             *    copied.
+             * So, just ignore it.
+             */
+            qiov_offset = QEMU_ALIGN_UP(offset, job->granularity) - offset;
+            if (bytes <= qiov_offset) {
+                /* nothing to do after shrink */
+                return;
+            }
+            offset += qiov_offset;
+            bytes -= qiov_offset;
+    }
+
+    if (!QEMU_IS_ALIGNED(offset + bytes, job->granularity) &&
+        bdrv_dirty_bitmap_get(job->dirty_bitmap, offset + bytes - 1))
+    {
+        uint64_t tail = (offset + bytes) % job->granularity;
+
+        if (bytes <= tail) {
+            /* nothing to do after shrink */
+            return;
+        }
+        bytes -= tail;
+    }
 
     bdrv_reset_dirty_bitmap(job->dirty_bitmap, offset, bytes);
 
@@ -1211,7 +1244,8 @@  do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
 
     switch (method) {
     case MIRROR_METHOD_COPY:
-        ret = blk_co_pwritev(job->target, offset, bytes, qiov, flags);
+        ret = blk_co_pwritev_part(job->target, offset, bytes,
+                                  qiov, qiov_offset, flags);
         break;
 
     case MIRROR_METHOD_ZERO:
@@ -1640,6 +1674,9 @@  static BlockJob *mirror_start_job(
     if (!s->dirty_bitmap) {
         goto fail;
     }
+    if (s->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING) {
+        bdrv_disable_dirty_bitmap(s->dirty_bitmap);
+    }
 
     ret = block_job_add_bdrv(&s->common, "source", bs, 0,
                              BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |