kvm: Take into account the unaligned section size when preparing bitmap

Message ID 20201208114013.875-1-yuzenghui@huawei.com
State New
Series kvm: Take into account the unaligned section size when preparing bitmap

Commit Message

Zenghui Yu Dec. 8, 2020, 11:40 a.m. UTC
The kernel KVM_CLEAR_DIRTY_LOG interface has alignment requirements on both
the start and the size of the given range of pages. We have been careful to
handle the unaligned cases when performing CLEAR on one slot, but it seems
that we forgot to take the unaligned *size* case into account when preparing
the bitmap for the interface, and we may end up clearing dirty status for
pages outside of [start, start + size).

If the size is unaligned, let's go through the slow path to manipulate a
temp bitmap for the interface so that we won't bother with those unaligned
bits at the end of the bitmap.

I don't think this can happen in practice, since the upper layer would
provide us with the alignment guarantee, but I'm not sure kvm-all can rely
on it. This patch is mainly intended to address the correctness of the
specific algorithm used inside kvm_log_clear_one_slot().
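
For example, with a 64-page memslot (a rough standalone sketch; the 4 KiB page
size and the rounding arithmetic below are assumptions for illustration, not
the exact code in kvm_log_clear_one_slot()):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t psize = 4096;            /* assumed host page size         */
        uint64_t size  = 32 * psize;      /* caller asks to clear 32 pages  */
        uint64_t start_delta = 0;         /* start is 64-page aligned       */

        /* Pages covered by the bitmap handed to KVM_CLEAR_DIRTY_LOG once
         * the range is rounded up to a multiple of BITS_PER_LONG pages.   */
        uint64_t bmap_npages = ((start_delta + size / psize + 63) / 64) * 64;

        printf("requested %" PRIu64 " pages, ioctl covers %" PRIu64 " pages\n",
               size / psize, bmap_npages);    /* prints 32 vs. 64           */
        return 0;
    }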

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
---
 accel/kvm/kvm-all.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Peter Xu Dec. 8, 2020, 3:16 p.m. UTC | #1
Hi, Zenghui,

On Tue, Dec 08, 2020 at 07:40:13PM +0800, Zenghui Yu wrote:
> The kernel KVM_CLEAR_DIRTY_LOG interface has align requirement on both the
> start and the size of the given range of pages. We have been careful to
> handle the unaligned cases when performing CLEAR on one slot. But it seems
> that we forget to take the unaligned *size* case into account when
> preparing bitmap for the interface, and we may end up clearing dirty status
> for pages outside of [start, start + size).

Thanks for the patch, though my understanding is that this is not a bug.

Please have a look at kvm_memslot_init_dirty_bitmap() where we'll allocate the
dirty bitmap to be aligned to 8 bytes (assuming that's the possible max of the
value sizeof(unsigned long)).  That exactly covers 64 pages.
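
Roughly, the sizing works like the sketch below (illustrative only, with a
made-up helper name rather than the actual QEMU code):

    #include <stdint.h>

    /* Sketch: round the slot's page count up to a multiple of 64 bits
     * before converting to bytes, so the bitmap always ends on an
     * unsigned-long boundary and describes whole 64-page chunks.  A
     * zero-filled allocation keeps the spare bits clear.                 */
    static uint64_t dirty_bmap_bytes(uint64_t memory_size, uint64_t psize)
    {
        uint64_t npages = memory_size / psize;

        return ((npages + 63) / 64) * 64 / 8;
    }
    /* e.g. a 65-page slot gets a 16-byte bitmap that can describe 128
     * pages; the extra bits simply stay zero.                            */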

So here as long as start_delta==0 (so the value of "bmap_npages - size / psize"
won't really matter a lot, imho), then we'll definitely have KVMSlot.dirty_bmap
long enough to cover the range we'd like to clear.

Note that the size of KVMSlot.dirty_bmap can be bigger than the actual size
of the kvm memslot; however, since kvm_memslot_init_dirty_bitmap() initialized
it to all zero, the extra bits will always be zero for the whole lifecycle of
the vm/bitmap.

Thanks,

> 
> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
> ---
>  accel/kvm/kvm-all.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index bed2455ca5..05d323ba1f 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -747,7 +747,7 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>      assert(bmap_start % BITS_PER_LONG == 0);
>      /* We should never do log_clear before log_sync */
>      assert(mem->dirty_bmap);
> -    if (start_delta) {
> +    if (start_delta || bmap_npages - size / psize) {
>          /* Slow path - we need to manipulate a temp bitmap */
>          bmap_clear = bitmap_new(bmap_npages);
>          bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
> @@ -760,7 +760,10 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>          bitmap_clear(bmap_clear, 0, start_delta);
>          d.dirty_bitmap = bmap_clear;
>      } else {
> -        /* Fast path - start address aligns well with BITS_PER_LONG */
> +        /*
> +         * Fast path - both start and size align well with BITS_PER_LONG
> +         * (or the end of memory slot)
> +         */
>          d.dirty_bitmap = mem->dirty_bmap + BIT_WORD(bmap_start);
>      }
>  
> -- 
> 2.19.1
> 
>
Zenghui Yu Dec. 9, 2020, 2:33 a.m. UTC | #2
Hi Peter,

Thanks for having a look at it.

On 2020/12/8 23:16, Peter Xu wrote:
> Hi, Zenghui,
> 
> On Tue, Dec 08, 2020 at 07:40:13PM +0800, Zenghui Yu wrote:
>> The kernel KVM_CLEAR_DIRTY_LOG interface has align requirement on both the
>> start and the size of the given range of pages. We have been careful to
>> handle the unaligned cases when performing CLEAR on one slot. But it seems
>> that we forget to take the unaligned *size* case into account when
>> preparing bitmap for the interface, and we may end up clearing dirty status
>> for pages outside of [start, start + size).
> 
> Thanks for the patch, though my understanding is that this is not a bug.
> 
> Please have a look at kvm_memslot_init_dirty_bitmap() where we'll allocate the
> dirty bitmap to be aligned to 8 bytes (assuming that's the possible max of the
> value sizeof(unsigned long)).  That exactly covers 64 pages.
> 
> So here as long as start_delta==0 (so the value of "bmap_npages - size / psize"
> won't really matter a lot, imho), then we'll definitely have KVMSlot.dirty_bmap
> long enough to cover the range we'd like to clear.

I agree.  But actually I'm not saying that KVMSlot.dirty_bmap is not
long enough.  What I had in mind is something like:

     // psize = qemu_real_host_page_size;
     // slot.start_addr = 0;
     // slot.memory_size = 64 * psize;

     kvm_log_clear_one_slot(slot, as, 0 * psize, 32 * psize);   --> [1]
     kvm_log_clear_one_slot(slot, as, 32 * psize, 32 * psize);  --> [2]

So the @size is not aligned with 64 pages.  Before this patch, we'll
clear dirty status for all pages(0-63) through [1].  It looks to me that
this violates the caller's expectation since we only want to clear
pages(0-31).

As I said, I don't think this will happen in practice -- the migration
code should always provide us with a 64-page aligned section (right?).
I'm just thinking about the correctness of the specific algorithm used
by kvm_log_clear_one_slot().
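
(For reference, with the patch applied case [1] above goes through the slow
path, and the relevant part is:)

    /* The temporary bitmap is zero-allocated and only the first
     * start_delta + size / psize == 32 bits are copied from the slot's
     * dirty bitmap, so bits 32-63 stay zero and KVM leaves pages 32-63
     * alone.                                                             */
    bmap_clear = bitmap_new(bmap_npages);
    bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
                                bmap_start, start_delta + size / psize);
    bitmap_clear(bmap_clear, 0, start_delta);  /* no-op here: start_delta == 0 */
    d.dirty_bitmap = bmap_clear;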

Or maybe I've missed some other obvious points ;-) ?


Thanks,
Zenghui

> Note that the size of KVMSlot.dirty_bmap can be bigger than the actually size
> of the kvm memslot, however since kvm_memslot_init_dirty_bitmap() initialized
> it to all zero so the extra bits will always be zero for the whole lifecycle of
> the vm/bitmap.
> 
> Thanks,
> 
>>
>> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
>> ---
>>   accel/kvm/kvm-all.c | 7 +++++--
>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>> index bed2455ca5..05d323ba1f 100644
>> --- a/accel/kvm/kvm-all.c
>> +++ b/accel/kvm/kvm-all.c
>> @@ -747,7 +747,7 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>       assert(bmap_start % BITS_PER_LONG == 0);
>>       /* We should never do log_clear before log_sync */
>>       assert(mem->dirty_bmap);
>> -    if (start_delta) {
>> +    if (start_delta || bmap_npages - size / psize) {
>>           /* Slow path - we need to manipulate a temp bitmap */
>>           bmap_clear = bitmap_new(bmap_npages);
>>           bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
>> @@ -760,7 +760,10 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>           bitmap_clear(bmap_clear, 0, start_delta);
>>           d.dirty_bitmap = bmap_clear;
>>       } else {
>> -        /* Fast path - start address aligns well with BITS_PER_LONG */
>> +        /*
>> +         * Fast path - both start and size align well with BITS_PER_LONG
>> +         * (or the end of memory slot)
>> +         */
>>           d.dirty_bitmap = mem->dirty_bmap + BIT_WORD(bmap_start);
>>       }
>>   
>> -- 
>> 2.19.1
>>
>>
>
Peter Xu Dec. 9, 2020, 9:09 p.m. UTC | #3
On Wed, Dec 09, 2020 at 10:33:41AM +0800, Zenghui Yu wrote:
> Hi Peter,
> 
> Thanks for having a look at it.
> 
> On 2020/12/8 23:16, Peter Xu wrote:
> > Hi, Zenghui,
> > 
> > On Tue, Dec 08, 2020 at 07:40:13PM +0800, Zenghui Yu wrote:
> > > The kernel KVM_CLEAR_DIRTY_LOG interface has align requirement on both the
> > > start and the size of the given range of pages. We have been careful to
> > > handle the unaligned cases when performing CLEAR on one slot. But it seems
> > > that we forget to take the unaligned *size* case into account when
> > > preparing bitmap for the interface, and we may end up clearing dirty status
> > > for pages outside of [start, start + size).
> > 
> > Thanks for the patch, though my understanding is that this is not a bug.
> > 
> > Please have a look at kvm_memslot_init_dirty_bitmap() where we'll allocate the
> > dirty bitmap to be aligned to 8 bytes (assuming that's the possible max of the
> > value sizeof(unsigned long)).  That exactly covers 64 pages.
> > 
> > So here as long as start_delta==0 (so the value of "bmap_npages - size / psize"
> > won't really matter a lot, imho), then we'll definitely have KVMSlot.dirty_bmap
> > long enough to cover the range we'd like to clear.
> 
> I agree.  But actually I'm not saying that KVMSlot.dirty_bmap is not
> long enough.  What I was having in mind is something like:
> 
>     // psize = qemu_real_host_page_size;
>     // slot.start_addr = 0;
>     // slot.memory_size = 64 * psize;
> 
>     kvm_log_clear_one_slot(slot, as, 0 * psize, 32 * psize);   --> [1]
>     kvm_log_clear_one_slot(slot, as, 32 * psize, 32 * psize);  --> [2]
> 
> So the @size is not aligned with 64 pages.  Before this patch, we'll
> clear dirty status for all pages(0-63) through [1].  It looks to me that
> this violates the caller's expectation since we only want to clear
> pages(0-31).

Now I see; I think you're right. :)

> 
> As I said, I don't think this will happen in practice -- the migration
> code should always provide us with a 64-page aligned section (right?).

Yes, migration is the major consumer, and that should be guaranteed indeed, see
CLEAR_BITMAP_SHIFT_MIN.

Not sure about VGA - that should try to do log clear even without migration,
but I guess that satisfies the 64-page alignment too, since it's not a huge
number (256KB).  The VGA buffer size could be related to screen resolution,
then N*1024*768 could still guarantee a safe use of the fast path.
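
As a quick sanity check with assumed numbers (4 KiB host pages, N == 1):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t bytes = 1024 * 768;     /* one N*1024*768 unit, in bytes   */
        uint64_t pages = bytes / 4096;   /* 192 pages with 4 KiB host pages */

        /* 192 is a multiple of 64, so the buffer clears on whole
         * BITS_PER_LONG words; larger N only scales the multiple.         */
        assert(pages % 64 == 0);
        return 0;
    }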

> I'm just thinking about the correctness of the specific algorithm used
> by kvm_log_clear_one_slot().

Yeah, then I think it's okay to have this, just in case someday we'll hit it.

Acked-by: Peter Xu <peterx@redhat.com>

(It would be nicer if the above example could be squashed into the commit message)

Thanks,
Keqian Zhu Dec. 10, 2020, 1:46 a.m. UTC | #4
Hi,

I see that if start or size is not PAGE aligned, it also clears areas
which are beyond the caller's expectation, so do we also need to consider this?

Thanks,
Keqian

On 2020/12/9 10:33, Zenghui Yu wrote:
> Hi Peter,
> 
> Thanks for having a look at it.
> 
> On 2020/12/8 23:16, Peter Xu wrote:
>> Hi, Zenghui,
>>
>> On Tue, Dec 08, 2020 at 07:40:13PM +0800, Zenghui Yu wrote:
>>> The kernel KVM_CLEAR_DIRTY_LOG interface has align requirement on both the
>>> start and the size of the given range of pages. We have been careful to
>>> handle the unaligned cases when performing CLEAR on one slot. But it seems
>>> that we forget to take the unaligned *size* case into account when
>>> preparing bitmap for the interface, and we may end up clearing dirty status
>>> for pages outside of [start, start + size).
>>
>> Thanks for the patch, though my understanding is that this is not a bug.
>>
>> Please have a look at kvm_memslot_init_dirty_bitmap() where we'll allocate the
>> dirty bitmap to be aligned to 8 bytes (assuming that's the possible max of the
>> value sizeof(unsigned long)).  That exactly covers 64 pages.
>>
>> So here as long as start_delta==0 (so the value of "bmap_npages - size / psize"
>> won't really matter a lot, imho), then we'll definitely have KVMSlot.dirty_bmap
>> long enough to cover the range we'd like to clear.
> 
> I agree.  But actually I'm not saying that KVMSlot.dirty_bmap is not
> long enough.  What I was having in mind is something like:
> 
>     // psize = qemu_real_host_page_size;
>     // slot.start_addr = 0;
>     // slot.memory_size = 64 * psize;
> 
>     kvm_log_clear_one_slot(slot, as, 0 * psize, 32 * psize);   --> [1]
>     kvm_log_clear_one_slot(slot, as, 32 * psize, 32 * psize);  --> [2]
> 
> So the @size is not aligned with 64 pages.  Before this patch, we'll
> clear dirty status for all pages(0-63) through [1].  It looks to me that
> this violates the caller's expectation since we only want to clear
> pages(0-31).
> 
> As I said, I don't think this will happen in practice -- the migration
> code should always provide us with a 64-page aligned section (right?).
> I'm just thinking about the correctness of the specific algorithm used
> by kvm_log_clear_one_slot().
> 
> Or maybe I had missed some other points obvious ;-) ?
> 
> 
> Thanks,
> Zenghui
> 
>> Note that the size of KVMSlot.dirty_bmap can be bigger than the actually size
>> of the kvm memslot, however since kvm_memslot_init_dirty_bitmap() initialized
>> it to all zero so the extra bits will always be zero for the whole lifecycle of
>> the vm/bitmap.
>>
>> Thanks,
>>
>>>
>>> Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
>>> ---
>>>   accel/kvm/kvm-all.c | 7 +++++--
>>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
>>> index bed2455ca5..05d323ba1f 100644
>>> --- a/accel/kvm/kvm-all.c
>>> +++ b/accel/kvm/kvm-all.c
>>> @@ -747,7 +747,7 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>>       assert(bmap_start % BITS_PER_LONG == 0);
>>>       /* We should never do log_clear before log_sync */
>>>       assert(mem->dirty_bmap);
>>> -    if (start_delta) {
>>> +    if (start_delta || bmap_npages - size / psize) {
>>>           /* Slow path - we need to manipulate a temp bitmap */
>>>           bmap_clear = bitmap_new(bmap_npages);
>>>           bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
>>> @@ -760,7 +760,10 @@ static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
>>>           bitmap_clear(bmap_clear, 0, start_delta);
>>>           d.dirty_bitmap = bmap_clear;
>>>       } else {
>>> -        /* Fast path - start address aligns well with BITS_PER_LONG */
>>> +        /*
>>> +         * Fast path - both start and size align well with BITS_PER_LONG
>>> +         * (or the end of memory slot)
>>> +         */
>>>           d.dirty_bitmap = mem->dirty_bmap + BIT_WORD(bmap_start);
>>>       }
>>>   
>>> -- 
>>> 2.19.1
>>>
>>>
>>
> 
> .
>
Peter Xu Dec. 10, 2020, 2:08 a.m. UTC | #5
Keqian,

On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
> Hi,
> 
> I see that if start or size is not PAGE aligned, it also clears areas
> which beyond caller's expectation, so do we also need to consider this?

Could you elaborate?

If start_delta != 0, kvm_log_clear_one_slot() should already go the slow path.

Thanks,
Keqian Zhu Dec. 10, 2020, 2:53 a.m. UTC | #6
On 2020/12/10 10:08, Peter Xu wrote:
> Keqian,
> 
> On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
>> Hi,
>>
>> I see that if start or size is not PAGE aligned, it also clears areas
>> which beyond caller's expectation, so do we also need to consider this?
> 
> Could you elaborate?
> 
> If start_delta != 0, kvm_log_clear_one_slot() should already go the slow path.
> 
> Thanks,
> 

Hi Peter,

start_delta /= psize;

If start is not PAGE aligned, then start_delta is not PAGE aligned, so I
think the above code will implicitly extend our start to be PAGE aligned.

I suggest that we should shrink the start and (start + size) to be PAGE aligned
at the beginning of this function.
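
Something like the sketch below is what I have in mind (QEMU_ALIGN_UP/DOWN
used purely for illustration):

    /* Sketch only: clamp the request to whole host pages up front, so a
     * sub-page @start or @size can never widen the range we clear.       */
    uint64_t first = QEMU_ALIGN_UP(start, psize);           /* round start up */
    uint64_t last  = QEMU_ALIGN_DOWN(start + size, psize);  /* round end down */

    if (last <= first) {
        return 0;               /* no page is fully covered by the request */
    }
    start = first;
    size  = last - first;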

Maybe I'm missing something?

Keqian.
Zenghui Yu Dec. 10, 2020, 3:31 a.m. UTC | #7
On 2020/12/10 10:53, zhukeqian wrote:
> 
> 
> On 2020/12/10 10:08, Peter Xu wrote:
>> Keqian,
>>
>> On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
>>> Hi,
>>>
>>> I see that if start or size is not PAGE aligned, it also clears areas
>>> which beyond caller's expectation, so do we also need to consider this?

Ah, I really don't anticipate this happening but ... The question is
what's the expectation of the caller if a non-PAGE aligned @start is
given. Should we clear dirty state for the page which covers the
unaligned @start? Or just skip it?

>>
>> Could you elaborate?
>>
>> If start_delta != 0, kvm_log_clear_one_slot() should already go the slow path.
>>
>> Thanks,
>>
> 
> Hi Peter,
> 
> start_delta /= psize;
> 
> If start is not PAGE aligned, then start_delta is not PAGE aligned.
> so I think the above code will implicitly extend our start to be PAGE aligned.
> 
> I suggest that we should shrink the start and (start + size) to be PAGE aligned
> at beginning of this function.

I don't object to doing so (though I haven't checked the code carefully).
I also don't think there is much we can do given that the caller
doesn't care about which *exact pages* to clear.


Thanks,
Zenghui
Zenghui Yu Dec. 10, 2020, 4:23 a.m. UTC | #8
On 2020/12/10 5:09, Peter Xu wrote:
> On Wed, Dec 09, 2020 at 10:33:41AM +0800, Zenghui Yu wrote:
>> Hi Peter,
>>
>> Thanks for having a look at it.
>>
>> On 2020/12/8 23:16, Peter Xu wrote:
>>> Hi, Zenghui,
>>>
>>> On Tue, Dec 08, 2020 at 07:40:13PM +0800, Zenghui Yu wrote:
>>>> The kernel KVM_CLEAR_DIRTY_LOG interface has align requirement on both the
>>>> start and the size of the given range of pages. We have been careful to
>>>> handle the unaligned cases when performing CLEAR on one slot. But it seems
>>>> that we forget to take the unaligned *size* case into account when
>>>> preparing bitmap for the interface, and we may end up clearing dirty status
>>>> for pages outside of [start, start + size).
>>>
>>> Thanks for the patch, though my understanding is that this is not a bug.
>>>
>>> Please have a look at kvm_memslot_init_dirty_bitmap() where we'll allocate the
>>> dirty bitmap to be aligned to 8 bytes (assuming that's the possible max of the
>>> value sizeof(unsigned long)).  That exactly covers 64 pages.
>>>
>>> So here as long as start_delta==0 (so the value of "bmap_npages - size / psize"
>>> won't really matter a lot, imho), then we'll definitely have KVMSlot.dirty_bmap
>>> long enough to cover the range we'd like to clear.
>>
>> I agree.  But actually I'm not saying that KVMSlot.dirty_bmap is not
>> long enough.  What I was having in mind is something like:
>>
>>      // psize = qemu_real_host_page_size;
>>      // slot.start_addr = 0;
>>      // slot.memory_size = 64 * psize;
>>
>>      kvm_log_clear_one_slot(slot, as, 0 * psize, 32 * psize);   --> [1]
>>      kvm_log_clear_one_slot(slot, as, 32 * psize, 32 * psize);  --> [2]
>>
>> So the @size is not aligned with 64 pages.  Before this patch, we'll
>> clear dirty status for all pages(0-63) through [1].  It looks to me that
>> this violates the caller's expectation since we only want to clear
>> pages(0-31).
> 
> Now I see; I think you're right. :)
> 
>>
>> As I said, I don't think this will happen in practice -- the migration
>> code should always provide us with a 64-page aligned section (right?).
> 
> Yes, migration is the major consumer, and that should be guaranteed indeed, see
> CLEAR_BITMAP_SHIFT_MIN.
> 
> Not sure about VGA - that should try to do log clear even without migration,
> but I guess that satisfies the 64-page alignment too, since it's not a huge
> number (256KB).  The VGA buffer size could be related to screen resolution,
> then N*1024*768 could still guarantee a safe use of the fast path.
> 
>> I'm just thinking about the correctness of the specific algorithm used
>> by kvm_log_clear_one_slot().
> 
> Yeah, then I think it's okay to have this, just in case someday we'll hit it.
> 
> Acked-by: Peter Xu <peterx@redhat.com>

Thanks for that.

> (It would be nicer if above example could be squashed into commit message)

I'll squash it if v2 is needed.


Thanks,
Zenghui
Peter Xu Dec. 10, 2020, 2:50 p.m. UTC | #9
On Thu, Dec 10, 2020 at 10:53:23AM +0800, zhukeqian wrote:
> 
> 
> On 2020/12/10 10:08, Peter Xu wrote:
> > Keqian,
> > 
> > On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
> >> Hi,
> >>
> >> I see that if start or size is not PAGE aligned, it also clears areas
> >> which beyond caller's expectation, so do we also need to consider this?
> > 
> > Could you elaborate?
> > 
> > If start_delta != 0, kvm_log_clear_one_slot() should already go the slow path.
> > 
> > Thanks,
> > 
> 
> Hi Peter,
> 
> start_delta /= psize;
> 
> If start is not PAGE aligned, then start_delta is not PAGE aligned.
> so I think the above code will implicitly extend our start to be PAGE aligned.
> 
> I suggest that we should shrink the start and (start + size) to be PAGE aligned
> at beginning of this function.

Callers should have the size TARGET_PAGE_SIZE aligned, so at least x86_64
should be pretty safe since host/guest page sizes match.

Though indeed I must confess I don't know how it works in general when host
page size != target page size, at least for migration.  For example, I believe
kvm dirty logging is host page size based, though migration should be migrating
pages at guest page size granularity when it spots a dirty bit set.
Keqian Zhu Dec. 11, 2020, 1:13 a.m. UTC | #10
On 2020/12/10 22:50, Peter Xu wrote:
> On Thu, Dec 10, 2020 at 10:53:23AM +0800, zhukeqian wrote:
>>
>>
>> On 2020/12/10 10:08, Peter Xu wrote:
>>> Keqian,
>>>
>>> On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
>>>> Hi,
>>>>
>>>> I see that if start or size is not PAGE aligned, it also clears areas
>>>> which beyond caller's expectation, so do we also need to consider this?
>>>
>>> Could you elaborate?
>>>
>>> If start_delta != 0, kvm_log_clear_one_slot() should already go the slow path.
>>>
>>> Thanks,
>>>
>>
>> Hi Peter,
>>
>> start_delta /= psize;
>>
>> If start is not PAGE aligned, then start_delta is not PAGE aligned.
>> so I think the above code will implicitly extend our start to be PAGE aligned.
>>
>> I suggest that we should shrink the start and (start + size) to be PAGE aligned
>> at beginning of this function.
> 
> Callers should be with TARGET_PAGE_SIZE aligned on the size, so at least x86_64
> should be pretty safe since host/guest page sizes match.
> 
> Though indeed I must confess I don't know how it worked in general when host
> page size != target page size, at least for migration.  For example, I believe
> kvm dirty logging is host page size based, though migration should be migrating
> pages in guest page size granule when it spots a dirty bit set.
> 
Hi,

Indeed, we handle target_page_size aligned @start and @size in general. Maybe we'd better
add explicit function comments about the alignment requirement, and an explicit alignment
assert on @start and @size?

Keqian.
Peter Xu Dec. 11, 2020, 3:25 p.m. UTC | #11
On Fri, Dec 11, 2020 at 09:13:10AM +0800, zhukeqian wrote:
> 
> On 2020/12/10 22:50, Peter Xu wrote:
> > On Thu, Dec 10, 2020 at 10:53:23AM +0800, zhukeqian wrote:
> >>
> >>
> >> On 2020/12/10 10:08, Peter Xu wrote:
> >>> Keqian,
> >>>
> >>> On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
> >>>> Hi,
> >>>>
> >>>> I see that if start or size is not PAGE aligned, it also clears areas
> >>>> which beyond caller's expectation, so do we also need to consider this?
> >>>
> >>> Could you elaborate?
> >>>
> >>> If start_delta != 0, kvm_log_clear_one_slot() should already go the slow path.
> >>>
> >>> Thanks,
> >>>
> >>
> >> Hi Peter,
> >>
> >> start_delta /= psize;
> >>
> >> If start is not PAGE aligned, then start_delta is not PAGE aligned.
> >> so I think the above code will implicitly extend our start to be PAGE aligned.
> >>
> >> I suggest that we should shrink the start and (start + size) to be PAGE aligned
> >> at beginning of this function.
> > 
> > Callers should be with TARGET_PAGE_SIZE aligned on the size, so at least x86_64
> > should be pretty safe since host/guest page sizes match.
> > 
> > Though indeed I must confess I don't know how it worked in general when host
> > page size != target page size, at least for migration.  For example, I believe
> > kvm dirty logging is host page size based, though migration should be migrating
> > pages in guest page size granule when it spots a dirty bit set.
> > 
> Hi,
> 
> Indeed, we handle target_page_size aligned @start and @size in general. Maybe we'd better
> add explicit function comments about alignment requirement, and explicit alignment assert
> on @start and @size?

Yes we can, but I think it's not strictly necessary.  As Zenghui pointed out,
the callers of memory_region_clear_dirty_bitmap() should always be aware of the
fact that the dirty bitmap is always page size based.

OTOH I'm more worried about the other question of how we handle the guest psize !=
host psize case for migration now...

Thanks,
Keqian Zhu Dec. 14, 2020, 2:14 a.m. UTC | #12
On 2020/12/11 23:25, Peter Xu wrote:
> On Fri, Dec 11, 2020 at 09:13:10AM +0800, zhukeqian wrote:
>>
>> On 2020/12/10 22:50, Peter Xu wrote:
>>> On Thu, Dec 10, 2020 at 10:53:23AM +0800, zhukeqian wrote:
>>>>
>>>>
>>>> On 2020/12/10 10:08, Peter Xu wrote:
>>>>> Keqian,
>>>>>
>>>>> On Thu, Dec 10, 2020 at 09:46:06AM +0800, zhukeqian wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I see that if start or size is not PAGE aligned, it also clears areas
>>>>>> which beyond caller's expectation, so do we also need to consider this?
>>>>>
>>>>> Could you elaborate?
>>>>>
>>>>> If start_delta != 0, kvm_log_clear_one_slot() should already go the slow path.
>>>>>
>>>>> Thanks,
>>>>>
>>>>
>>>> Hi Peter,
>>>>
>>>> start_delta /= psize;
>>>>
>>>> If start is not PAGE aligned, then start_delta is not PAGE aligned.
>>>> so I think the above code will implicitly extend our start to be PAGE aligned.
>>>>
>>>> I suggest that we should shrink the start and (start + size) to be PAGE aligned
>>>> at beginning of this function.
>>>
>>> Callers should be with TARGET_PAGE_SIZE aligned on the size, so at least x86_64
>>> should be pretty safe since host/guest page sizes match.
>>>
>>> Though indeed I must confess I don't know how it worked in general when host
>>> page size != target page size, at least for migration.  For example, I believe
>>> kvm dirty logging is host page size based, though migration should be migrating
>>> pages in guest page size granule when it spots a dirty bit set.
>>>
>> Hi,
>>
>> Indeed, we handle target_page_size aligned @start and @size in general. Maybe we'd better
>> add explicit function comments about alignment requirement, and explicit alignment assert
>> on @start and @size?
> 

Hi Peter,

> Yes we can, but I think it's not strongly necessary.  As Zenghui pointed out,
> the callers of memory_region_clear_dirty_bitmap() should always be aware of the
> fact that dirty bitmap is always page size based.
Agree.

> 
> OTOH I'm more worried on the other question on how we handle guest psize !=
> host psize case for migration now...
I think it does not matter when guest_psize != host_psize, as we only need to interact with
stage2 page tables during migration. Stage2 is enough for tracking guest dirty memory, and even
if the guest disables stage1, we can still do a successful migration.

Please point out if I misunderstood what you meant.

Thanks,
Keqian

> 
> Thanks,
>
Peter Xu Dec. 14, 2020, 3:36 p.m. UTC | #13
On Mon, Dec 14, 2020 at 10:14:11AM +0800, zhukeqian wrote:

[...]

> >>> Though indeed I must confess I don't know how it worked in general when host
> >>> page size != target page size, at least for migration.  For example, I believe
> >>> kvm dirty logging is host page size based, though migration should be migrating
> >>> pages in guest page size granule when it spots a dirty bit set.

[1]

> Hi Peter,

Keqian,

> > OTOH I'm more worried on the other question on how we handle guest psize !=
> > host psize case for migration now...
> I think it does not matter when guest_psize != host_psize, as we only need to interact with
> stage2 page tables during migration. Stage2 is enough to tracking guest dirty memory, and even
> if guest close stage1, we also can do a successful migration.

I don't know why 2-stage matters here, since I believe KVM can track dirty
pages either using two dimensional paging or shadowing; however, it's always
done at host (small) page size.  What confuses me is that there seems to be a
size mismatch between qemu migration and what kvm does [1].  For example, how
does migration work on ARM64 where the host has psize==4K while the guest has psize==64K?

Thanks,
Keqian Zhu Dec. 15, 2020, 7:23 a.m. UTC | #14
On 2020/12/14 23:36, Peter Xu wrote:
> On Mon, Dec 14, 2020 at 10:14:11AM +0800, zhukeqian wrote:
> 
> [...]
> 
>>>>> Though indeed I must confess I don't know how it worked in general when host
>>>>> page size != target page size, at least for migration.  For example, I believe
>>>>> kvm dirty logging is host page size based, though migration should be migrating
>>>>> pages in guest page size granule when it spots a dirty bit set.
> 
> [1]
> 
>> Hi Peter,
> 
> Keqian,
> 
>>> OTOH I'm more worried on the other question on how we handle guest psize !=
>>> host psize case for migration now...
>> I think it does not matter when guest_psize != host_psize, as we only need to interact with
>> stage2 page tables during migration. Stage2 is enough to tracking guest dirty memory, and even
>> if guest close stage1, we also can do a successful migration.
> 
> I don't know why 2-stage matters here, since I believe KVM can track dirty
> pages either using two dimentional paging or shadowing, however it's always
> done in host small page size.  The question I'm confused is there seems to have
> a size mismatch between qemu migration and what kvm does [1].  For example, how
> migration works on ARM64 where host has psize==4K while guest has psize=64K.
> 
Hi Peter,

OK, I got it ;-) Do you mean qemu_real_host_page_size != TARGET_PAGE_SIZE?
After my analysis, I see that when qemu_real_host_page_size != TARGET_PAGE_SIZE,
there are some problems indeed. I have sent out some patches; please check whether they solve this
problem, thanks!

Keqian.

> Thanks,
>
Zenghui Yu Dec. 15, 2020, 7:39 a.m. UTC | #15
Hi Keqian, Peter,

On 2020/12/15 15:23, zhukeqian wrote:
> 
> On 2020/12/14 23:36, Peter Xu wrote:
>> On Mon, Dec 14, 2020 at 10:14:11AM +0800, zhukeqian wrote:
>>
>> [...]
>>
>>>>>> Though indeed I must confess I don't know how it worked in general when host
>>>>>> page size != target page size, at least for migration.  For example, I believe
>>>>>> kvm dirty logging is host page size based, though migration should be migrating
>>>>>> pages in guest page size granule when it spots a dirty bit set.
>>
>> [1]
>>
>>> Hi Peter,
>>
>> Keqian,
>>
>>>> OTOH I'm more worried on the other question on how we handle guest psize !=
>>>> host psize case for migration now...
>>> I think it does not matter when guest_psize != host_psize, as we only need to interact with
>>> stage2 page tables during migration. Stage2 is enough to tracking guest dirty memory, and even
>>> if guest close stage1, we also can do a successful migration.
>>
>> I don't know why 2-stage matters here, since I believe KVM can track dirty
>> pages either using two dimentional paging or shadowing, however it's always
>> done in host small page size.  The question I'm confused is there seems to have
>> a size mismatch between qemu migration and what kvm does [1].  For example, how
>> migration works on ARM64 where host has psize==4K while guest has psize=64K.
>>
> Hi Peter,
> 
> OK, I got it ;-) Do you mean qemu_real_host_page_size != TARGET_PAGE_SIZE?
> After my analysis, I see that when qemu_real_host_page_size != TARGET_PAGE_SIZE,
> there are some problems indeed. I have send out some patches, please check whether they solve this
> problem, thanks!

Now I see what your concern is :) Thanks both for the explanation and
the further fix!


Zenghui

Patch

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index bed2455ca5..05d323ba1f 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -747,7 +747,7 @@  static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
     assert(bmap_start % BITS_PER_LONG == 0);
     /* We should never do log_clear before log_sync */
     assert(mem->dirty_bmap);
-    if (start_delta) {
+    if (start_delta || bmap_npages - size / psize) {
         /* Slow path - we need to manipulate a temp bitmap */
         bmap_clear = bitmap_new(bmap_npages);
         bitmap_copy_with_src_offset(bmap_clear, mem->dirty_bmap,
@@ -760,7 +760,10 @@  static int kvm_log_clear_one_slot(KVMSlot *mem, int as_id, uint64_t start,
         bitmap_clear(bmap_clear, 0, start_delta);
         d.dirty_bitmap = bmap_clear;
     } else {
-        /* Fast path - start address aligns well with BITS_PER_LONG */
+        /*
+         * Fast path - both start and size align well with BITS_PER_LONG
+         * (or the end of memory slot)
+         */
         d.dirty_bitmap = mem->dirty_bmap + BIT_WORD(bmap_start);
     }