Patchwork [V3,WIP,3/3] disable vhost_verify_ring_mappings check

login
register
mail settings
Submitter Michael S. Tsirkin
Date April 2, 2013, 1:27 p.m.
Message ID <20130402132729.GK21545@redhat.com>
Download mbox | patch
Permalink /patch/232993/
State New
Headers show

Comments

Michael S. Tsirkin - April 2, 2013, 1:27 p.m.
On Mon, Apr 01, 2013 at 06:05:47PM -0700, Nicholas A. Bellinger wrote:
> On Fri, 2013-03-29 at 09:14 +0100, Paolo Bonzini wrote: 
> > Il 29/03/2013 03:53, Nicholas A. Bellinger ha scritto:
> > > On Thu, 2013-03-28 at 06:13 -0400, Paolo Bonzini wrote:
> > >>> I think it's the right thing to do, but maybe not the right place
> > >>> to do this, need to reset after all IO is done, before
> > >>> ring memory is write protected.
> > >>
> > >> Our emails are crossing each other unfortunately, but I want to
> > >> reinforce this: ring memory is not write protected.
> > > 
> > > Understood.  However, AFAICT the act of write protecting these ranges
> > > for ROM generates the offending callbacks to vhost_set_memory().
> > > 
> > > The part that I'm missing is if ring memory is not being write protected
> > > by make_bios_readonly_intel(), why are the vhost_set_memory() calls
> > > being invoked..?
> > 
> > Because mappings change for the region that contains the ring.  vhost
> > doesn't know yet that the changes do not affect ring memory,
> > vhost_set_memory() is called exactly to ascertain that.
> > 
> 
> Hi Paolo & Co,
> 
> Here's a bit more information on what is going on with the same
> cpu_physical_memory_map() failure in vhost_verify_ring_mappings()..
> 
> So as before, at the point that seabios is marking memory as readonly
> for ROM in src/shadow.c:make_bios_readonly_intel() with the following
> call:
> 
> Calling pci_config_writeb(0x31): bdf: 0x0000 pam: 0x0000005b
> 
> the memory API update hook triggers back into vhost_region_del() code,
> and following occurs:
> 
> Entering vhost_region_del section: 0x7fd30a213b60 offset_within_region: 0xc0000 size: 2146697216 readonly: 0
> vhost_region_del: is_rom: 0, rom_device: 0
> vhost_region_del: readable: 1
> vhost_region_del: ram_addr 0x0, addr: 0x0 size: 2147483648
> vhost_region_del: name: pc.ram
> Entering vhost_set_memory, section: 0x7fd30a213b60 add: 0, dev->started: 1
> Entering verify_ring_mappings: start_addr 0x00000000000c0000 size: 2146697216
> verify_ring_mappings: ring_phys 0x0 ring_size: 0
> verify_ring_mappings: ring_phys 0x0 ring_size: 0
> verify_ring_mappings: ring_phys 0xed000 ring_size: 5124
> verify_ring_mappings: calling cpu_physical_memory_map ring_phys: 0xed000 l: 5124
> address_space_map: addr: 0xed000, plen: 5124
> address_space_map: l: 4096, len: 5124
> phys_page_find got PHYS_MAP_NODE_NIL >>>>>>>>>>>>>>>>>>>>>>..
> address_space_map: section: 0x7fd30fabaed0 memory_region_is_ram: 0 readonly: 0
> address_space_map: section: 0x7fd30fabaed0 offset_within_region: 0x0 section size: 18446744073709551615
> Unable to map ring buffer for ring 2, l: 4096
> 
> So the interesting part is that phys_page_find() is not able to locate
> the corresponding page for vq->ring_phys: 0xed000 from the
> vhost_region_del() callback with section->offset_within_region:
> 0xc0000..
> 
> Is there any case where this would not be considered a bug..? 
> 
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> Entering vhost_region_add section: 0x7fd30a213aa0 offset_within_region: 0xc0000 size: 32768 readonly: 1
> vhost_region_add: is_rom: 0, rom_device: 0
> vhost_region_add: readable: 1
> vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
> vhost_region_add: name: pc.ram
> Entering vhost_set_memory, section: 0x7fd30a213aa0 add: 1, dev->started: 1
> Entering verify_ring_mappings: start_addr 0x00000000000c0000 size: 32768
> verify_ring_mappings: ring_phys 0x0 ring_size: 0
> verify_ring_mappings: ring_phys 0x0 ring_size: 0
> verify_ring_mappings: ring_phys 0xed000 ring_size: 5124
> verify_ring_mappings: Got !ranges_overlap, skipping
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> Entering vhost_region_add section: 0x7fd30a213aa0 offset_within_region: 0xc8000 size: 2146664448 readonly: 0
> vhost_region_add: is_rom: 0, rom_device: 0
> vhost_region_add: readable: 1
> vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
> vhost_region_add: name: pc.ram
> Entering vhost_set_memory, section: 0x7fd30a213aa0 add: 1, dev->started: 1
> Entering verify_ring_mappings: start_addr 0x00000000000c8000 size: 2146664448
> verify_ring_mappings: ring_phys 0x0 ring_size: 0
> verify_ring_mappings: ring_phys 0x0 ring_size: 0
> verify_ring_mappings: ring_phys 0xed000 ring_size: 5124
> verify_ring_mappings: calling cpu_physical_memory_map ring_phys: 0xed000 l: 5124
> address_space_map: addr: 0xed000, plen: 5124
> address_space_map: l: 4096, len: 5124
> address_space_map: section: 0x7fd30fabb020 memory_region_is_ram: 1 readonly: 0
> address_space_map: section: 0x7fd30fabb020 offset_within_region: 0xc8000 section size: 2146664448
> address_space_map: l: 4096, len: 1028
> address_space_map: section: 0x7fd30fabb020 memory_region_is_ram: 1 readonly: 0
> address_space_map: section: 0x7fd30fabb020 offset_within_region: 0xc8000 section size: 2146664448
> address_space_map: Calling qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
> address_space_map: After qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
> 
> So here the vhost_region_add() callback for
> section->offset_within_region: 0xc8000 for vq->ring_phys: 0xed000 is
> able to locate *section via phys_page_find() within address_space_map(),
> and cpu_physical_memory_map() completes as expected..
> 
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> phys_page_find got PHYS_MAP_NODE_NIL >>>>>>>>>>>>>>>>>>>>>>..
> 
> So while plodding my way through the memory API, the thing that would be
> useful to know is if the offending *section that is missing for the
> first phys_page_find() call is getting removed before the callback makes
> it's way into vhost_verify_ring_mappings() code, or that some other bug
> is occuring..?
> 
> Any idea on how this could be verified..?
> 
> Thanks,
> 
> --nab

Is it possible that what is going on here,
is that we had a region at address 0x0 size 0x80000000,
and now a chunk from it is being made readonly,
and to this end the whole old region is removed
then new ones are added?

If yes maybe the problem is that we don't use the atomic
begin/commit ops in the memory API.
Maybe the following will help?
Completely untested, posting just to give you the idea:

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

---
Nicholas A. Bellinger - April 3, 2013, 4:04 a.m.
On Tue, 2013-04-02 at 16:27 +0300, Michael S. Tsirkin wrote:
> On Mon, Apr 01, 2013 at 06:05:47PM -0700, Nicholas A. Bellinger wrote:
> > On Fri, 2013-03-29 at 09:14 +0100, Paolo Bonzini wrote: 
> > > Il 29/03/2013 03:53, Nicholas A. Bellinger ha scritto:
> > > > On Thu, 2013-03-28 at 06:13 -0400, Paolo Bonzini wrote:
> > > >>> I think it's the right thing to do, but maybe not the right place
> > > >>> to do this, need to reset after all IO is done, before
> > > >>> ring memory is write protected.
> > > >>
> > > >> Our emails are crossing each other unfortunately, but I want to
> > > >> reinforce this: ring memory is not write protected.
> > > > 
> > > > Understood.  However, AFAICT the act of write protecting these ranges
> > > > for ROM generates the offending callbacks to vhost_set_memory().
> > > > 
> > > > The part that I'm missing is if ring memory is not being write protected
> > > > by make_bios_readonly_intel(), why are the vhost_set_memory() calls
> > > > being invoked..?
> > > 
> > > Because mappings change for the region that contains the ring.  vhost
> > > doesn't know yet that the changes do not affect ring memory,
> > > vhost_set_memory() is called exactly to ascertain that.
> > > 
> > 
> > Hi Paolo & Co,
> > 
> > Here's a bit more information on what is going on with the same
> > cpu_physical_memory_map() failure in vhost_verify_ring_mappings()..
> > 
> > So as before, at the point that seabios is marking memory as readonly
> > for ROM in src/shadow.c:make_bios_readonly_intel() with the following
> > call:
> > 
> > Calling pci_config_writeb(0x31): bdf: 0x0000 pam: 0x0000005b
> > 
> > the memory API update hook triggers back into vhost_region_del() code,
> > and following occurs:
> > 
> > Entering vhost_region_del section: 0x7fd30a213b60 offset_within_region: 0xc0000 size: 2146697216 readonly: 0
> > vhost_region_del: is_rom: 0, rom_device: 0
> > vhost_region_del: readable: 1
> > vhost_region_del: ram_addr 0x0, addr: 0x0 size: 2147483648
> > vhost_region_del: name: pc.ram
> > Entering vhost_set_memory, section: 0x7fd30a213b60 add: 0, dev->started: 1
> > Entering verify_ring_mappings: start_addr 0x00000000000c0000 size: 2146697216
> > verify_ring_mappings: ring_phys 0x0 ring_size: 0
> > verify_ring_mappings: ring_phys 0x0 ring_size: 0
> > verify_ring_mappings: ring_phys 0xed000 ring_size: 5124
> > verify_ring_mappings: calling cpu_physical_memory_map ring_phys: 0xed000 l: 5124
> > address_space_map: addr: 0xed000, plen: 5124
> > address_space_map: l: 4096, len: 5124
> > phys_page_find got PHYS_MAP_NODE_NIL >>>>>>>>>>>>>>>>>>>>>>..
> > address_space_map: section: 0x7fd30fabaed0 memory_region_is_ram: 0 readonly: 0
> > address_space_map: section: 0x7fd30fabaed0 offset_within_region: 0x0 section size: 18446744073709551615
> > Unable to map ring buffer for ring 2, l: 4096
> > 
> > So the interesting part is that phys_page_find() is not able to locate
> > the corresponding page for vq->ring_phys: 0xed000 from the
> > vhost_region_del() callback with section->offset_within_region:
> > 0xc0000..
> > 
> > Is there any case where this would not be considered a bug..? 
> > 
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > Entering vhost_region_add section: 0x7fd30a213aa0 offset_within_region: 0xc0000 size: 32768 readonly: 1
> > vhost_region_add: is_rom: 0, rom_device: 0
> > vhost_region_add: readable: 1
> > vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
> > vhost_region_add: name: pc.ram
> > Entering vhost_set_memory, section: 0x7fd30a213aa0 add: 1, dev->started: 1
> > Entering verify_ring_mappings: start_addr 0x00000000000c0000 size: 32768
> > verify_ring_mappings: ring_phys 0x0 ring_size: 0
> > verify_ring_mappings: ring_phys 0x0 ring_size: 0
> > verify_ring_mappings: ring_phys 0xed000 ring_size: 5124
> > verify_ring_mappings: Got !ranges_overlap, skipping
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > Entering vhost_region_add section: 0x7fd30a213aa0 offset_within_region: 0xc8000 size: 2146664448 readonly: 0
> > vhost_region_add: is_rom: 0, rom_device: 0
> > vhost_region_add: readable: 1
> > vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
> > vhost_region_add: name: pc.ram
> > Entering vhost_set_memory, section: 0x7fd30a213aa0 add: 1, dev->started: 1
> > Entering verify_ring_mappings: start_addr 0x00000000000c8000 size: 2146664448
> > verify_ring_mappings: ring_phys 0x0 ring_size: 0
> > verify_ring_mappings: ring_phys 0x0 ring_size: 0
> > verify_ring_mappings: ring_phys 0xed000 ring_size: 5124
> > verify_ring_mappings: calling cpu_physical_memory_map ring_phys: 0xed000 l: 5124
> > address_space_map: addr: 0xed000, plen: 5124
> > address_space_map: l: 4096, len: 5124
> > address_space_map: section: 0x7fd30fabb020 memory_region_is_ram: 1 readonly: 0
> > address_space_map: section: 0x7fd30fabb020 offset_within_region: 0xc8000 section size: 2146664448
> > address_space_map: l: 4096, len: 1028
> > address_space_map: section: 0x7fd30fabb020 memory_region_is_ram: 1 readonly: 0
> > address_space_map: section: 0x7fd30fabb020 offset_within_region: 0xc8000 section size: 2146664448
> > address_space_map: Calling qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
> > address_space_map: After qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
> > 
> > So here the vhost_region_add() callback for
> > section->offset_within_region: 0xc8000 for vq->ring_phys: 0xed000 is
> > able to locate *section via phys_page_find() within address_space_map(),
> > and cpu_physical_memory_map() completes as expected..
> > 
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > register_multipage : d: 0x7fd30f7d0ed0 section: 0x7fd30a2139b0
> > phys_page_find got PHYS_MAP_NODE_NIL >>>>>>>>>>>>>>>>>>>>>>..
> > 
> > So while plodding my way through the memory API, the thing that would be
> > useful to know is if the offending *section that is missing for the
> > first phys_page_find() call is getting removed before the callback makes
> > it's way into vhost_verify_ring_mappings() code, or that some other bug
> > is occuring..?
> > 
> > Any idea on how this could be verified..?
> > 
> > Thanks,
> > 
> > --nab
> 
> Is it possible that what is going on here,
> is that we had a region at address 0x0 size 0x80000000,
> and now a chunk from it is being made readonly,
> and to this end the whole old region is removed
> then new ones are added?

Yes, I believe this is exactly what is happening..

> 
> If yes maybe the problem is that we don't use the atomic
> begin/commit ops in the memory API.
> Maybe the following will help?
> Completely untested, posting just to give you the idea:
> 

Mmmm, one question on how vhost_region_del() + vhost_region_add() +
vhost_commit() should work..

Considering the following when the same seabios code snippet:

   pci_config_writeb(0x31): bdf: 0x0000 pam: 0x0000005b

is executed to mark an pc.ram area 0xc0000 as readonly:

Entering vhost_begin >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Entering vhost_region_del section: 0x7fd037a4bb60 offset_within_region: 0xc0000 size: 2146697216 readonly: 0
vhost_region_del: is_rom: 0, rom_device: 0
vhost_region_del: readable: 1
vhost_region_del: ram_addr 0x0, addr: 0x0 size: 2147483648
vhost_region_del: name: pc.ram
Entering vhost_set_memory, section: 0x7fd037a4bb60 add: 0, dev->started: 1
vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc0000
Entering vhost_region_add section: 0x7fd037a4baa0 offset_within_region: 0xc0000 size: 32768 readonly: 1
vhost_region_add is readonly !!!!!!!!!!!!!!!!!!!
vhost_region_add: is_rom: 0, rom_device: 0
vhost_region_add: readable: 1
vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
vhost_region_add: name: pc.ram
Entering vhost_set_memory, section: 0x7fd037a4baa0 add: 1, dev->started: 1
vhost_dev_assign_memory(); >>>>>>>>>>>>>>>>>>>>>>>>>>>> reg->guest_phys_addr: 0xc0000
vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc0000
Entering vhost_region_add section: 0x7fd037a4baa0 offset_within_region: 0xc8000 size: 2146664448 readonly: 0
vhost_region_add: is_rom: 0, rom_device: 0
vhost_region_add: readable: 1
vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
vhost_region_add: name: pc.ram
Entering vhost_set_memory, section: 0x7fd037a4baa0 add: 1, dev->started: 1
vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc8000
phys_page_find got PHYS_MAP_NODE_NIL >>>>>>>>>>>>>>>>>>>>>>..
Entering vhost_commit >>>>>>>>>>>>>>>>>>>>>>>>>>>

Note that originally we'd see the cpu_physical_memory_map() failure in
vhost_verify_ring_mappings() after the first ->region_del() above.

Adding a hardcoded cpu_physical_memory_map() testcase in vhost_commit()
for phys_addr=0xed000, len=5124 (vq ring) does locate the correct
*section from address_space_map(), which correct points to the section
generated by the last vhost_region_add() above:

Entering vhost_commit >>>>>>>>>>>>>>>>>>>>>>>>>>>
address_space_map: addr: 0xed000, plen: 5124
address_space_map: l: 4096, len: 5124
address_space_map: section: 0x7f41b325f020 memory_region_is_ram: 1 readonly: 0
address_space_map: section: 0x7f41b325f020 offset_within_region: 0xc8000 section size: 2146664448
address_space_map: l: 4096, len: 1028
address_space_map: section: 0x7f41b325f020 memory_region_is_ram: 1 readonly: 0
address_space_map: section: 0x7f41b325f020 offset_within_region: 0xc8000 section size: 2146664448
address_space_map: Calling qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
address_space_map: After qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
cpu_physical_memory_map(0xed000) got l: 5124

So, does using a ->commit callback for MemoryListener  mean that
vhost_verify_ring_mappings() is OK to be called only from the final
->commit callback, and not from each ->region_del + ->region_add
callback..?   Eg: I seem to recall something about
vhost_verify_ring_mappings() being called during each ->region_del()
when dev->started == true was important, no..?

If this OK, then it seems a matter of keeping an updated bit for each of
the regions in vhost_dev->mem_sections[] and performing the
vhost_verify_ring_mappings() on all three above during the final
->commit() call, right..?

WDYT..?

--nab

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ---
> 
> diff --git a/hw/vhost.c b/hw/vhost.c
> index 4d6aee3..716cfaa 100644
> --- a/hw/vhost.c
> +++ b/hw/vhost.c
> @@ -416,6 +416,25 @@ static void vhost_set_memory(MemoryListener *listener,
>          /* Remove old mapping for this memory, if any. */
>          vhost_dev_unassign_memory(dev, start_addr, size);
>      }
> +    dev->memory_changed = true;
> +}
> +
> +static bool vhost_section(MemoryRegionSection *section)
> +{
> +    return memory_region_is_ram(section->mr);
> +}
> +
> +static void vhost_begin(MemoryListener *listener)
> +{
> +}
> +
> +static void vhost_commit(MemoryListener *listener)
> +{
> +    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
> +                                         memory_listener);
> +    if (!dev->memory_changed) {
> +        return;
> +    }
>  
>      if (!dev->started) {
>          return;
> @@ -445,19 +464,7 @@ static void vhost_set_memory(MemoryListener *listener,
>      if (dev->log_size > log_size + VHOST_LOG_BUFFER) {
>          vhost_dev_log_resize(dev, log_size);
>      }
> -}
> -
> -static bool vhost_section(MemoryRegionSection *section)
> -{
> -    return memory_region_is_ram(section->mr);
> -}
> -
> -static void vhost_begin(MemoryListener *listener)
> -{
> -}
> -
> -static void vhost_commit(MemoryListener *listener)
> -{
> +    dev->memory_changed = false;
>  }
>  
>  static void vhost_region_add(MemoryListener *listener,
> @@ -842,6 +849,7 @@ int vhost_dev_init(struct vhost_dev *hdev, int devfd, const char *devpath,
>      hdev->log_size = 0;
>      hdev->log_enabled = false;
>      hdev->started = false;
> +    hdev->memory_changed = false;
>      memory_listener_register(&hdev->memory_listener, &address_space_memory);
>      hdev->force = force;
>      return 0;
> 
> --
> To unsubscribe from this list: send the line "unsubscribe target-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicholas A. Bellinger - April 3, 2013, 4:59 a.m.
On Tue, 2013-04-02 at 21:04 -0700, Nicholas A. Bellinger wrote:
> On Tue, 2013-04-02 at 16:27 +0300, Michael S. Tsirkin wrote:
> > On Mon, Apr 01, 2013 at 06:05:47PM -0700, Nicholas A. Bellinger wrote:
> > > On Fri, 2013-03-29 at 09:14 +0100, Paolo Bonzini wrote: 
> > > > Il 29/03/2013 03:53, Nicholas A. Bellinger ha scritto:
> > > > > On Thu, 2013-03-28 at 06:13 -0400, Paolo Bonzini wrote:
> > > > >>> I think it's the right thing to do, but maybe not the right place
> > > > >>> to do this, need to reset after all IO is done, before
> > > > >>> ring memory is write protected.
> > > > >>
> > > > >> Our emails are crossing each other unfortunately, but I want to
> > > > >> reinforce this: ring memory is not write protected.
> > > > > 
> > > > > Understood.  However, AFAICT the act of write protecting these ranges
> > > > > for ROM generates the offending callbacks to vhost_set_memory().
> > > > > 
> > > > > The part that I'm missing is if ring memory is not being write protected
> > > > > by make_bios_readonly_intel(), why are the vhost_set_memory() calls
> > > > > being invoked..?
> > > > 
> > > > Because mappings change for the region that contains the ring.  vhost
> > > > doesn't know yet that the changes do not affect ring memory,
> > > > vhost_set_memory() is called exactly to ascertain that.
> > > > 

<SNIP>

> > 
> > Is it possible that what is going on here,
> > is that we had a region at address 0x0 size 0x80000000,
> > and now a chunk from it is being made readonly,
> > and to this end the whole old region is removed
> > then new ones are added?
> 
> Yes, I believe this is exactly what is happening..
> 
> > 
> > If yes maybe the problem is that we don't use the atomic
> > begin/commit ops in the memory API.
> > Maybe the following will help?
> > Completely untested, posting just to give you the idea:
> > 
> 
> Mmmm, one question on how vhost_region_del() + vhost_region_add() +
> vhost_commit() should work..
> 
> Considering the following when the same seabios code snippet:
> 
>    pci_config_writeb(0x31): bdf: 0x0000 pam: 0x0000005b
> 
> is executed to mark an pc.ram area 0xc0000 as readonly:
> 
> Entering vhost_begin >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Entering vhost_region_del section: 0x7fd037a4bb60 offset_within_region: 0xc0000 size: 2146697216 readonly: 0
> vhost_region_del: is_rom: 0, rom_device: 0
> vhost_region_del: readable: 1
> vhost_region_del: ram_addr 0x0, addr: 0x0 size: 2147483648
> vhost_region_del: name: pc.ram
> Entering vhost_set_memory, section: 0x7fd037a4bb60 add: 0, dev->started: 1
> vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc0000
> Entering vhost_region_add section: 0x7fd037a4baa0 offset_within_region: 0xc0000 size: 32768 readonly: 1
> vhost_region_add is readonly !!!!!!!!!!!!!!!!!!!
> vhost_region_add: is_rom: 0, rom_device: 0
> vhost_region_add: readable: 1
> vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
> vhost_region_add: name: pc.ram
> Entering vhost_set_memory, section: 0x7fd037a4baa0 add: 1, dev->started: 1
> vhost_dev_assign_memory(); >>>>>>>>>>>>>>>>>>>>>>>>>>>> reg->guest_phys_addr: 0xc0000
> vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc0000
> Entering vhost_region_add section: 0x7fd037a4baa0 offset_within_region: 0xc8000 size: 2146664448 readonly: 0
> vhost_region_add: is_rom: 0, rom_device: 0
> vhost_region_add: readable: 1
> vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
> vhost_region_add: name: pc.ram
> Entering vhost_set_memory, section: 0x7fd037a4baa0 add: 1, dev->started: 1
> vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc8000
> phys_page_find got PHYS_MAP_NODE_NIL >>>>>>>>>>>>>>>>>>>>>>..
> Entering vhost_commit >>>>>>>>>>>>>>>>>>>>>>>>>>>
> 
> Note that originally we'd see the cpu_physical_memory_map() failure in
> vhost_verify_ring_mappings() after the first ->region_del() above.
> 
> Adding a hardcoded cpu_physical_memory_map() testcase in vhost_commit()
> for phys_addr=0xed000, len=5124 (vq ring) does locate the correct
> *section from address_space_map(), which correct points to the section
> generated by the last vhost_region_add() above:
> 
> Entering vhost_commit >>>>>>>>>>>>>>>>>>>>>>>>>>>
> address_space_map: addr: 0xed000, plen: 5124
> address_space_map: l: 4096, len: 5124
> address_space_map: section: 0x7f41b325f020 memory_region_is_ram: 1 readonly: 0
> address_space_map: section: 0x7f41b325f020 offset_within_region: 0xc8000 section size: 2146664448
> address_space_map: l: 4096, len: 1028
> address_space_map: section: 0x7f41b325f020 memory_region_is_ram: 1 readonly: 0
> address_space_map: section: 0x7f41b325f020 offset_within_region: 0xc8000 section size: 2146664448
> address_space_map: Calling qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
> address_space_map: After qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
> cpu_physical_memory_map(0xed000) got l: 5124
> 
> So, does using a ->commit callback for MemoryListener  mean that
> vhost_verify_ring_mappings() is OK to be called only from the final
> ->commit callback, and not from each ->region_del + ->region_add
> callback..?   Eg: I seem to recall something about
> vhost_verify_ring_mappings() being called during each ->region_del()
> when dev->started == true was important, no..?
> 
> If this OK, then it seems a matter of keeping an updated bit for each of
> the regions in vhost_dev->mem_sections[] and performing the
> vhost_verify_ring_mappings() on all three above during the final
> ->commit() call, right..?

Or even better, what about just invoking vhost_verify_ring_mappings()
once from ->commit without section start_addr+size + drop the existing
!rings_overlap check..?

Does the following look like you'd expect for marking 0xc0000 area as
read-only case..?

Entering vhost_begin >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Entering vhost_region_del section: 0x7f16d3a1db60 offset_within_region: 0xc0000 size: 2146697216 readonly: 0
vhost_region_del: is_rom: 0, rom_device: 0
vhost_region_del: readable: 1
vhost_region_del: ram_addr 0x0, addr: 0x0 size: 2147483648
vhost_region_del: name: pc.ram
Entering vhost_set_memory, section: 0x7f16d3a1db60 add: 0, dev->started: 1
vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc0000
Entering vhost_region_add section: 0x7f16d3a1daa0 offset_within_region: 0xc0000 size: 32768 readonly: 1
vhost_region_add is readonly !!!!!!!!!!!!!!!!!!!
vhost_region_add: is_rom: 0, rom_device: 0
vhost_region_add: readable: 1
vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
vhost_region_add: name: pc.ram
Entering vhost_set_memory, section: 0x7f16d3a1daa0 add: 1, dev->started: 1
vhost_dev_assign_memory(); >>>>>>>>>>>>>>>>>>>>>>>>>>>> reg->guest_phys_addr: 0xc0000
vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc0000
Entering vhost_region_add section: 0x7f16d3a1daa0 offset_within_region: 0xc8000 size: 2146664448 readonly: 0
vhost_region_add: is_rom: 0, rom_device: 0
vhost_region_add: readable: 1
vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size: 2147483648
vhost_region_add: name: pc.ram
Entering vhost_set_memory, section: 0x7f16d3a1daa0 add: 1, dev->started: 1
vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc8000
phys_page_find got PHYS_MAP_NODE_NIL >>>>>>>>>>>>>>>>>>>>>>..
Entering vhost_commit >>>>>>>>>>>>>>>>>>>>>>>>>>>
vhost_commit, skipping vhost_verify_ring_mappings without start_addr !!!!!!!!!!!!!
Entering verify_ring_mappings: start_addr 0x0000000000000000 size: 0
verify_ring_mappings: ring_phys 0x0 ring_size: 0
verify_ring_mappings: ring_phys 0x0 ring_size: 0
verify_ring_mappings: ring_phys 0xed000 ring_size: 5124
verify_ring_mappings: calling cpu_physical_memory_map ring_phys: 0xed000 l: 5124
address_space_map: addr: 0xed000, plen: 5124
address_space_map: l: 4096, len: 5124
address_space_map: section: 0x7f16d92c6020 memory_region_is_ram: 1 readonly: 0
address_space_map: section: 0x7f16d92c6020 offset_within_region: 0xc8000 section size: 2146664448
address_space_map: l: 4096, len: 1028
address_space_map: section: 0x7f16d92c6020 memory_region_is_ram: 1 readonly: 0
address_space_map: section: 0x7f16d92c6020 offset_within_region: 0xc8000 section size: 2146664448
address_space_map: Calling qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
address_space_map: After qemu_ram_ptr_length: raddr: 0x           ed000 rlen: 5124
Paolo Bonzini - April 3, 2013, 6:47 a.m.
> > Considering the following when the same seabios code snippet:
> > 
> >    pci_config_writeb(0x31): bdf: 0x0000 pam: 0x0000005b
> > 
> > is executed to mark an pc.ram area 0xc0000 as readonly:
> > 
> > Entering vhost_begin >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > Entering vhost_region_del section: 0x7fd037a4bb60 offset_within_region:
> > 0xc0000 size: 2146697216 readonly: 0
> > vhost_region_del: is_rom: 0, rom_device: 0
> > vhost_region_del: readable: 1
> > vhost_region_del: ram_addr 0x0, addr: 0x0 size: 2147483648
> > vhost_region_del: name: pc.ram
> > Entering vhost_set_memory, section: 0x7fd037a4bb60 add: 0, dev->started: 1
> > vhost_set_memory: Setting dev->memory_changed = true for start_addr:
> > 0xc0000
> > Entering vhost_region_add section: 0x7fd037a4baa0 offset_within_region:
> > 0xc0000 size: 32768 readonly: 1
> > vhost_region_add is readonly !!!!!!!!!!!!!!!!!!!
> > vhost_region_add: is_rom: 0, rom_device: 0
> > vhost_region_add: readable: 1
> > vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0
> > size: 2147483648
> > vhost_region_add: name: pc.ram
> > Entering vhost_set_memory, section: 0x7fd037a4baa0 add: 1, dev->started: 1
> > vhost_dev_assign_memory(); >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > reg->guest_phys_addr: 0xc0000
> > vhost_set_memory: Setting dev->memory_changed = true for start_addr:
> > 0xc0000
> > Entering vhost_region_add section: 0x7fd037a4baa0 offset_within_region:
> > 0xc8000 size: 2146664448 readonly: 0
> > vhost_region_add: is_rom: 0, rom_device: 0
> > vhost_region_add: readable: 1
> > vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0
> > size: 2147483648
> > vhost_region_add: name: pc.ram
> > Entering vhost_set_memory, section: 0x7fd037a4baa0 add: 1, dev->started: 1
> > vhost_set_memory: Setting dev->memory_changed = true for start_addr:
> > 0xc8000
> > phys_page_find got PHYS_MAP_NODE_NIL >>>>>>>>>>>>>>>>>>>>>>..
> > Entering vhost_commit >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > 
> > Note that originally we'd see the cpu_physical_memory_map() failure in
> > vhost_verify_ring_mappings() after the first ->region_del() above.
> > 
> > So, does using a ->commit callback for MemoryListener  mean that
> > vhost_verify_ring_mappings() is OK to be called only from the final
> > ->commit callback, and not from each ->region_del + ->region_add
> > callback..?   Eg: I seem to recall something about
> > vhost_verify_ring_mappings() being called during each ->region_del()
> > when dev->started == true was important, no..?

It is important in the case there are only some deleted regions, and
no added region, or in general when the last callback is a ->region_del().

But it is even better to just call vhost_verify_ring_mappings() once,
from the ->region_commit() callback.

> > If this OK, then it seems a matter of keeping an updated bit for each of
> > the regions in vhost_dev->mem_sections[] and performing the
> > vhost_verify_ring_mappings() on all three above during the final
> > ->commit() call, right..?
> 
> Or even better, what about just invoking vhost_verify_ring_mappings()
> once from ->commit without section start_addr+size + drop the existing
> !rings_overlap check..?

Either drop the ranges_overlap check, or keep the start_addr/end_addr
up-to-date.  Compared to Michael's prototype patch, that would be like
this:

   vhost_begin()
   {
       dev->mem_changed_end_addr = 0;
       dev->mem_changed_start_addr = -1;
   }

   vhost_set_memory()
   {
       ...
       dev->mem_changed_start_addr = MIN(dev->mem_changed_start_addr, start_addr);
       dev->mem_changed_end_addr = MAX(dev->mem_changed_end_addr, start_addr + size - 1);
   }

   vhost_commit()
   {
       if (dev->mem_changed_start_addr > dev->mem_changed_end_addr) {
           return;
       }
       start_addr = dev->mem_changed_start_addr;
       size = dev->mem_changed_end_addr - dev->mem_changed_start_addr + 1;
       ...
   }

Paolo

> Does the following look like you'd expect for marking 0xc0000 area as
> read-only case..?
> 
> Entering vhost_begin >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> Entering vhost_region_del section: 0x7f16d3a1db60 offset_within_region:
> 0xc0000 size: 2146697216 readonly: 0
> vhost_region_del: is_rom: 0, rom_device: 0
> vhost_region_del: readable: 1
> vhost_region_del: ram_addr 0x0, addr: 0x0 size: 2147483648
> vhost_region_del: name: pc.ram
> Entering vhost_set_memory, section: 0x7f16d3a1db60 add: 0, dev->started: 1
> vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc0000
> Entering vhost_region_add section: 0x7f16d3a1daa0 offset_within_region:
> 0xc0000 size: 32768 readonly: 1
> vhost_region_add is readonly !!!!!!!!!!!!!!!!!!!
> vhost_region_add: is_rom: 0, rom_device: 0
> vhost_region_add: readable: 1
> vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size:
> 2147483648
> vhost_region_add: name: pc.ram
> Entering vhost_set_memory, section: 0x7f16d3a1daa0 add: 1, dev->started: 1
> vhost_dev_assign_memory(); >>>>>>>>>>>>>>>>>>>>>>>>>>>> reg->guest_phys_addr:
> 0xc0000
> vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc0000
> Entering vhost_region_add section: 0x7f16d3a1daa0 offset_within_region:
> 0xc8000 size: 2146664448 readonly: 0
> vhost_region_add: is_rom: 0, rom_device: 0
> vhost_region_add: readable: 1
> vhost_region_add: ram_addr 0x0000000000000000, addr: 0x               0 size:
> 2147483648
> vhost_region_add: name: pc.ram
> Entering vhost_set_memory, section: 0x7f16d3a1daa0 add: 1, dev->started: 1
> vhost_set_memory: Setting dev->memory_changed = true for start_addr: 0xc8000
> phys_page_find got PHYS_MAP_NODE_NIL >>>>>>>>>>>>>>>>>>>>>>..
> Entering vhost_commit >>>>>>>>>>>>>>>>>>>>>>>>>>>
> vhost_commit, skipping vhost_verify_ring_mappings without start_addr
> !!!!!!!!!!!!!
> Entering verify_ring_mappings: start_addr 0x0000000000000000 size: 0
> verify_ring_mappings: ring_phys 0x0 ring_size: 0
> verify_ring_mappings: ring_phys 0x0 ring_size: 0
> verify_ring_mappings: ring_phys 0xed000 ring_size: 5124
> verify_ring_mappings: calling cpu_physical_memory_map ring_phys: 0xed000 l:
> 5124
> address_space_map: addr: 0xed000, plen: 5124
> address_space_map: l: 4096, len: 5124
> address_space_map: section: 0x7f16d92c6020 memory_region_is_ram: 1 readonly:
> 0
> address_space_map: section: 0x7f16d92c6020 offset_within_region: 0xc8000
> section size: 2146664448
> address_space_map: l: 4096, len: 1028
> address_space_map: section: 0x7f16d92c6020 memory_region_is_ram: 1 readonly:
> 0
> address_space_map: section: 0x7f16d92c6020 offset_within_region: 0xc8000
> section size: 2146664448
> address_space_map: Calling qemu_ram_ptr_length: raddr: 0x           ed000
> rlen: 5124
> address_space_map: After qemu_ram_ptr_length: raddr: 0x           ed000 rlen:
> 5124
> 
>

Patch

diff --git a/hw/vhost.c b/hw/vhost.c
index 4d6aee3..716cfaa 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -416,6 +416,25 @@  static void vhost_set_memory(MemoryListener *listener,
         /* Remove old mapping for this memory, if any. */
         vhost_dev_unassign_memory(dev, start_addr, size);
     }
+    dev->memory_changed = true;
+}
+
+static bool vhost_section(MemoryRegionSection *section)
+{
+    return memory_region_is_ram(section->mr);
+}
+
+static void vhost_begin(MemoryListener *listener)
+{
+}
+
+static void vhost_commit(MemoryListener *listener)
+{
+    struct vhost_dev *dev = container_of(listener, struct vhost_dev,
+                                         memory_listener);
+    if (!dev->memory_changed) {
+        return;
+    }
 
     if (!dev->started) {
         return;
@@ -445,19 +464,7 @@  static void vhost_set_memory(MemoryListener *listener,
     if (dev->log_size > log_size + VHOST_LOG_BUFFER) {
         vhost_dev_log_resize(dev, log_size);
     }
-}
-
-static bool vhost_section(MemoryRegionSection *section)
-{
-    return memory_region_is_ram(section->mr);
-}
-
-static void vhost_begin(MemoryListener *listener)
-{
-}
-
-static void vhost_commit(MemoryListener *listener)
-{
+    dev->memory_changed = false;
 }
 
 static void vhost_region_add(MemoryListener *listener,
@@ -842,6 +849,7 @@  int vhost_dev_init(struct vhost_dev *hdev, int devfd, const char *devpath,
     hdev->log_size = 0;
     hdev->log_enabled = false;
     hdev->started = false;
+    hdev->memory_changed = false;
     memory_listener_register(&hdev->memory_listener, &address_space_memory);
     hdev->force = force;
     return 0;