Patchwork memory: synchronize dirty bitmap before unmapping a range

login
register
mail settings
Submitter Avi Kivity
Date July 31, 2011, 7:47 p.m.
Message ID <1312141678-5141-1-git-send-email-avi@redhat.com>
Download mbox | patch
Permalink /patch/107630/
State New
Headers show

Comments

Avi Kivity - July 31, 2011, 7:47 p.m.
When a range is being unmapped, ask accelerators (e.g. kvm) to synchronize the
dirty bitmap to avoid losing information forever.

Fixes grub2 screen update.

Signed-off-by: Avi Kivity <avi@redhat.com>
---

Please apply before the PCI batch to avoid bisectability issues (and don't
pull, since that removes ordering).

 memory.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)
Jan Kiszka - Aug. 1, 2011, 7:34 a.m.
On 2011-07-31 21:47, Avi Kivity wrote:
> When a range is being unmapped, ask accelerators (e.g. kvm) to synchronize the
> dirty bitmap to avoid losing information forever.
> 
> Fixes grub2 screen update.

I does.

But something is still broken. As I reported before, the performance of
grub2 startup is an order of magnitude slower than with the existing
code. According to ftrace, we are getting tons of additional
EPT_MISCONFIG exits over the 0xA0000 segment. But I haven't spot the
difference yet. The effective slot setup as communicated to kvm looks
innocent.

Jan
Jan Kiszka - Aug. 1, 2011, 7:52 a.m.
On 2011-08-01 09:34, Jan Kiszka wrote:
> On 2011-07-31 21:47, Avi Kivity wrote:
>> When a range is being unmapped, ask accelerators (e.g. kvm) to synchronize the
>> dirty bitmap to avoid losing information forever.
>>
>> Fixes grub2 screen update.
> 
> I does.
> 
> But something is still broken. As I reported before, the performance of
> grub2 startup is an order of magnitude slower than with the existing
> code. According to ftrace, we are getting tons of additional
> EPT_MISCONFIG exits over the 0xA0000 segment. But I haven't spot the
> difference yet. The effective slot setup as communicated to kvm looks
> innocent.

I take it back: We obviously once in a while resume the guest with the
vga segment unmapped. And that, of course, ends up doing mmio instead of
plain ram accesses.

Jan
Avi Kivity - Aug. 1, 2011, 8:16 a.m.
On 08/01/2011 10:52 AM, Jan Kiszka wrote:
> On 2011-08-01 09:34, Jan Kiszka wrote:
> >  On 2011-07-31 21:47, Avi Kivity wrote:
> >>  When a range is being unmapped, ask accelerators (e.g. kvm) to synchronize the
> >>  dirty bitmap to avoid losing information forever.
> >>
> >>  Fixes grub2 screen update.
> >
> >  I does.
> >
> >  But something is still broken. As I reported before, the performance of
> >  grub2 startup is an order of magnitude slower than with the existing
> >  code. According to ftrace, we are getting tons of additional
> >  EPT_MISCONFIG exits over the 0xA0000 segment. But I haven't spot the
> >  difference yet. The effective slot setup as communicated to kvm looks
> >  innocent.
>
> I take it back: We obviously once in a while resume the guest with the
> vga segment unmapped. And that, of course, ends up doing mmio instead of
> plain ram accesses.
>

qemu-kvm.git 6b5956c573 and its predecessor fix the issue (and I think 
they're even faster than upstream, but perhaps I'm not objective).
Jan Kiszka - Aug. 1, 2011, 9:05 a.m.
On 2011-08-01 10:16, Avi Kivity wrote:
> On 08/01/2011 10:52 AM, Jan Kiszka wrote:
>> On 2011-08-01 09:34, Jan Kiszka wrote:
>> >  On 2011-07-31 21:47, Avi Kivity wrote:
>> >>  When a range is being unmapped, ask accelerators (e.g. kvm) to
>> synchronize the
>> >>  dirty bitmap to avoid losing information forever.
>> >>
>> >>  Fixes grub2 screen update.
>> >
>> >  I does.
>> >
>> >  But something is still broken. As I reported before, the
>> performance of
>> >  grub2 startup is an order of magnitude slower than with the existing
>> >  code. According to ftrace, we are getting tons of additional
>> >  EPT_MISCONFIG exits over the 0xA0000 segment. But I haven't spot the
>> >  difference yet. The effective slot setup as communicated to kvm looks
>> >  innocent.
>>
>> I take it back: We obviously once in a while resume the guest with the
>> vga segment unmapped. And that, of course, ends up doing mmio instead of
>> plain ram accesses.
>>
> 
> qemu-kvm.git 6b5956c573 and its predecessor fix the issue (and I think
> they're even faster than upstream, but perhaps I'm not objective).
> 

Just updated to the latest memory-region branch - how did you test it?
It does not link here due to forgotten rwhandler in Makefile.target.

Anyway, that commit has no impact on the issue I'm seeing. I'm also
carrying transaction changes for cirrus here, but they have no
noticeable impact. That indicates that the new API is not actually slow,
it likely just has some bug.

Jan
Avi Kivity - Aug. 1, 2011, 9:30 a.m.
On 08/01/2011 12:05 PM, Jan Kiszka wrote:
> On 2011-08-01 10:16, Avi Kivity wrote:
> >  On 08/01/2011 10:52 AM, Jan Kiszka wrote:
> >>  On 2011-08-01 09:34, Jan Kiszka wrote:
> >>  >   On 2011-07-31 21:47, Avi Kivity wrote:
> >>  >>   When a range is being unmapped, ask accelerators (e.g. kvm) to
> >>  synchronize the
> >>  >>   dirty bitmap to avoid losing information forever.
> >>  >>
> >>  >>   Fixes grub2 screen update.
> >>  >
> >>  >   I does.
> >>  >
> >>  >   But something is still broken. As I reported before, the
> >>  performance of
> >>  >   grub2 startup is an order of magnitude slower than with the existing
> >>  >   code. According to ftrace, we are getting tons of additional
> >>  >   EPT_MISCONFIG exits over the 0xA0000 segment. But I haven't spot the
> >>  >   difference yet. The effective slot setup as communicated to kvm looks
> >>  >   innocent.
> >>
> >>  I take it back: We obviously once in a while resume the guest with the
> >>  vga segment unmapped. And that, of course, ends up doing mmio instead of
> >>  plain ram accesses.
> >>
> >
> >  qemu-kvm.git 6b5956c573 and its predecessor fix the issue (and I think
> >  they're even faster than upstream, but perhaps I'm not objective).
> >
>
> Just updated to the latest memory-region branch - how did you test it?

opensuse 11.4

> It does not link here due to forgotten rwhandler in Makefile.target.
>

I probably still have it in my tree, will update and re-push.

> Anyway, that commit has no impact on the issue I'm seeing. I'm also
> carrying transaction changes for cirrus here, but they have no
> noticeable impact. That indicates that the new API is not actually slow,
> it likely just has some bug.

It should be visible from the memory map dump.
Avi Kivity - Aug. 1, 2011, 9:32 a.m.
On 08/01/2011 12:30 PM, Avi Kivity wrote:
>
>> It does not link here due to forgotten rwhandler in Makefile.target.
>>
>
> I probably still have it in my tree, will update and re-push.

done now.
Avi Kivity - Aug. 1, 2011, 9:45 a.m.
On 08/01/2011 12:05 PM, Jan Kiszka wrote:
> On 2011-08-01 10:16, Avi Kivity wrote:
> >  On 08/01/2011 10:52 AM, Jan Kiszka wrote:
> >>  On 2011-08-01 09:34, Jan Kiszka wrote:
> >>  >   On 2011-07-31 21:47, Avi Kivity wrote:
> >>  >>   When a range is being unmapped, ask accelerators (e.g. kvm) to
> >>  synchronize the
> >>  >>   dirty bitmap to avoid losing information forever.
> >>  >>
> >>  >>   Fixes grub2 screen update.
> >>  >
> >>  >   I does.
> >>  >
> >>  >   But something is still broken. As I reported before, the
> >>  performance of
> >>  >   grub2 startup is an order of magnitude slower than with the existing
> >>  >   code. According to ftrace, we are getting tons of additional
> >>  >   EPT_MISCONFIG exits over the 0xA0000 segment. But I haven't spot the
> >>  >   difference yet. The effective slot setup as communicated to kvm looks
> >>  >   innocent.
> >>
> >>  I take it back: We obviously once in a while resume the guest with the
> >>  vga segment unmapped. And that, of course, ends up doing mmio instead of
> >>  plain ram accesses.
> >>
> >
> >  qemu-kvm.git 6b5956c573 and its predecessor fix the issue (and I think
> >  they're even faster than upstream, but perhaps I'm not objective).
> >
>
> Just updated to the latest memory-region branch - how did you test it?
> It does not link here due to forgotten rwhandler in Makefile.target.
>
> Anyway, that commit has no impact on the issue I'm seeing. I'm also
> carrying transaction changes for cirrus here, but they have no
> noticeable impact. That indicates that the new API is not actually slow,
> it likely just has some bug.

Here's the log of range changes while in grub2:

adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 40000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 20000 ram 40040000
dropping a0000-affff
adding a0000-affff offset 30000 ram 40040000

Note that drop/add is always paired (i.e. the guest never sees an 
unmapped area), and we always map the full 64k even though cirrus code 
manages each 32k bank individually.  It looks optimal... we're probably 
not testing the same thing (either qemu or guest code).
Jan Kiszka - Aug. 1, 2011, 10:21 a.m.
On 2011-08-01 11:45, Avi Kivity wrote:
> On 08/01/2011 12:05 PM, Jan Kiszka wrote:
>> On 2011-08-01 10:16, Avi Kivity wrote:
>>>  On 08/01/2011 10:52 AM, Jan Kiszka wrote:
>>>>  On 2011-08-01 09:34, Jan Kiszka wrote:
>>>>  >   On 2011-07-31 21:47, Avi Kivity wrote:
>>>>  >>   When a range is being unmapped, ask accelerators (e.g. kvm) to
>>>>  synchronize the
>>>>  >>   dirty bitmap to avoid losing information forever.
>>>>  >>
>>>>  >>   Fixes grub2 screen update.
>>>>  >
>>>>  >   I does.
>>>>  >
>>>>  >   But something is still broken. As I reported before, the
>>>>  performance of
>>>>  >   grub2 startup is an order of magnitude slower than with the existing
>>>>  >   code. According to ftrace, we are getting tons of additional
>>>>  >   EPT_MISCONFIG exits over the 0xA0000 segment. But I haven't spot the
>>>>  >   difference yet. The effective slot setup as communicated to kvm looks
>>>>  >   innocent.
>>>>
>>>>  I take it back: We obviously once in a while resume the guest with the
>>>>  vga segment unmapped. And that, of course, ends up doing mmio instead of
>>>>  plain ram accesses.
>>>>
>>>
>>>  qemu-kvm.git 6b5956c573 and its predecessor fix the issue (and I think
>>>  they're even faster than upstream, but perhaps I'm not objective).
>>>
>>
>> Just updated to the latest memory-region branch - how did you test it?
>> It does not link here due to forgotten rwhandler in Makefile.target.
>>
>> Anyway, that commit has no impact on the issue I'm seeing. I'm also
>> carrying transaction changes for cirrus here, but they have no
>> noticeable impact. That indicates that the new API is not actually slow,
>> it likely just has some bug.
> 
> Here's the log of range changes while in grub2:
> 
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 40000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 20000 ram 40040000
> dropping a0000-affff
> adding a0000-affff offset 30000 ram 40040000

I saw this as well and thought it should be fine. But it does not tell
you what is currently active when the guest runs.

> 
> Note that drop/add is always paired (i.e. the guest never sees an 
> unmapped area), and we always map the full 64k even though cirrus code 
> manages each 32k bank individually.  It looks optimal... we're probably 
> not testing the same thing (either qemu or guest code).

This is what my instrumentation revealed:

map_linear_vram_bank 0
map 0				(actually perform the mapping)
map_linear_vram_bank 1
map 1
4 a0000 0 7fe863a62000 1	(KVM_SET_USER_MEMORY_REGION)
4 a0000 10000 7fe863a72000 1
run				(enter guest)
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
4 a0000 0 7fe863a72000 1
4 a0000 10000 7fe863a62000 1
run
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
4 a0000 0 7fe863a62000 1
run
map_linear_vram_bank 0
map 0
map_linear_vram_bank 1
map 1
run

So we suddenly get out of sync and enter the guest with an unmapped vram
segment. I takes a long time (in number of map changes) until the region
becomes mapped again.

Jan
Avi Kivity - Aug. 1, 2011, 10:27 a.m.
On 08/01/2011 01:21 PM, Jan Kiszka wrote:
> >
> >  Note that drop/add is always paired (i.e. the guest never sees an
> >  unmapped area), and we always map the full 64k even though cirrus code
> >  manages each 32k bank individually.  It looks optimal... we're probably
> >  not testing the same thing (either qemu or guest code).
>
> This is what my instrumentation revealed:
>
> map_linear_vram_bank 0
> map 0				(actually perform the mapping)
> map_linear_vram_bank 1
> map 1
> 4 a0000 0 7fe863a62000 1	(KVM_SET_USER_MEMORY_REGION)
> 4 a0000 10000 7fe863a72000 1
> run				(enter guest)
> map_linear_vram_bank 0
> map 0
> map_linear_vram_bank 1
> map 1
> 4 a0000 0 7fe863a72000 1
> 4 a0000 10000 7fe863a62000 1
> run
> map_linear_vram_bank 0
> map 0
> map_linear_vram_bank 1
> map 1
> 4 a0000 0 7fe863a62000 1
> run
> map_linear_vram_bank 0
> map 0
> map_linear_vram_bank 1
> map 1
> run
>
> So we suddenly get out of sync and enter the guest with an unmapped vram
> segment. I takes a long time (in number of map changes) until the region
> becomes mapped again.

I'll try to reproduce.  Yes, it looks like a bug in the core, perhaps in 
the symmetric-difference code.
Anthony Liguori - Aug. 5, 2011, 4:47 p.m.
On 07/31/2011 02:47 PM, Avi Kivity wrote:
> When a range is being unmapped, ask accelerators (e.g. kvm) to synchronize the
> dirty bitmap to avoid losing information forever.
>
> Fixes grub2 screen update.
>
> Signed-off-by: Avi Kivity<avi@redhat.com>

Applied.  Thanks.

Regards,

Anthony Liguori

> ---
>
> Please apply before the PCI batch to avoid bisectability issues (and don't
> pull, since that removes ordering).
>
>   memory.c |    4 ++++
>   1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/memory.c b/memory.c
> index 5c6e63d..5f20320 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -245,6 +245,10 @@ static void as_memory_range_add(AddressSpace *as, FlatRange *fr)
>
>   static void as_memory_range_del(AddressSpace *as, FlatRange *fr)
>   {
> +    if (fr->dirty_log_mask) {
> +        cpu_physical_sync_dirty_bitmap(fr->addr.start,
> +                                       fr->addr.start + fr->addr.size);
> +    }
>       cpu_register_physical_memory(fr->addr.start, fr->addr.size,
>                                    IO_MEM_UNASSIGNED);
>   }

Patch

diff --git a/memory.c b/memory.c
index 5c6e63d..5f20320 100644
--- a/memory.c
+++ b/memory.c
@@ -245,6 +245,10 @@  static void as_memory_range_add(AddressSpace *as, FlatRange *fr)
 
 static void as_memory_range_del(AddressSpace *as, FlatRange *fr)
 {
+    if (fr->dirty_log_mask) {
+        cpu_physical_sync_dirty_bitmap(fr->addr.start,
+                                       fr->addr.start + fr->addr.size);
+    }
     cpu_register_physical_memory(fr->addr.start, fr->addr.size,
                                  IO_MEM_UNASSIGNED);
 }