Message ID: 20170825083123.47432-1-aik@ozlabs.ru
State: New
On 25/08/2017 10:31, Alexey Kardashevskiy wrote:
> Otherwise old dispatch holds way too much memory before RCU gets
> a chance to free old dispatches.
>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
>
> This is a follow-up to the "Memory use with >100 virtio devices"
> thread.
>
> I assume this is a dirty hack (which fixes the problem though)

This doesn't work.  AddressSpaceDispatch can be accessed outside the big
QEMU lock, which is why it is destroyed via call_rcu.

The solution is to: 1) share the FlatView structures if they refer to
the same root memory region; 2) have one AddressSpaceDispatch per
FlatView instead of one per AddressSpace, so that FlatView reference
counting takes care of clearing the AddressSpaceDispatch too.  Neither
is particularly hard.  The second requires getting rid of the
as->dispatch_listener and "hard coding" the creation of the
AddressSpaceDispatch from the FlatView in memory.c.

Paolo

> and I wonder what the proper solution would be. Thanks.
>
>
> What happens here is that every virtio block device creates 2 address
> spaces - for modern config space (called "virtio-pci-cfg-as") and
> for busmaster (common pci thing, called after the device name,
> in my case "virtio-blk-pci").
>
> Each address_space_init() updates topology for _every_ address space.
> Every topology update (address_space_update_topology()) creates a new
> dispatch tree - AddressSpaceDispatch with nodes (1KB) and
> sections (48KB) and destroys the old one.
>
> However the dispatch destructor is postponed via RCU which does not
> get a chance to execute until the machine is initialized but before
> we get there, memory is not returned to the pool, and this is a lot
> of memory which grows n^2.
>
>
> Interestingly, mem_add() from exec.c is called twice:
> as as->dispatch_listener.region_add() and
> as as->dispatch_listener.region_nop() - I did not understand
> the trick but it does not work if I remove the .region_nop() hook.
> How does it work? :)
>
> ---
>  exec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/exec.c b/exec.c
> index 01ac21e3cd..ea5f3eb209 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -2707,7 +2707,7 @@ static void mem_commit(MemoryListener *listener)
>
>      atomic_rcu_set(&as->dispatch, next);
>      if (cur) {
> -        call_rcu(cur, address_space_dispatch_free, rcu);
> +        address_space_dispatch_free(cur);
>      }
>  }
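The call_rcu() the patch removes follows the classic deferred-free pattern: the old dispatch is unpublished immediately, but lock-free readers may still be using it, so the actual free is queued until a grace period has elapsed. A minimal single-threaded sketch of the idea (a toy model for illustration only, not QEMU's RCU or liburcu; all names here are invented):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy deferred-free queue.  A pointer handed to defer_free() must not
 * be freed right away because a reader may still hold it; it is only
 * reclaimed once drain_grace_period() declares all readers gone. */
typedef struct Deferred {
    void *ptr;
    struct Deferred *next;
} Deferred;

static Deferred *pending;       /* list of not-yet-freed pointers */
static int pending_count;       /* how much is waiting on the queue */

/* Stand-in for call_rcu(): queue the pointer instead of freeing it. */
static void defer_free(void *ptr)
{
    Deferred *d = malloc(sizeof(*d));
    d->ptr = ptr;
    d->next = pending;
    pending = d;
    pending_count++;
}

/* Stand-in for the end of a grace period: no reader can hold any
 * queued pointer anymore, so everything is reclaimed.  Returns the
 * number of objects freed. */
static int drain_grace_period(void)
{
    int freed = 0;
    while (pending) {
        Deferred *d = pending;
        pending = d->next;
        free(d->ptr);
        free(d);
        freed++;
    }
    pending_count = 0;
    return freed;
}
```

The memory-growth problem in the thread corresponds to `pending_count` growing unboundedly while the grace period never ends; the rejected hack corresponds to calling `free()` directly inside `defer_free()`, which is exactly what is unsafe when readers run outside the lock.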
On 25/08/2017 10:31, Alexey Kardashevskiy wrote:
> Interestingly, mem_add() from exec.c is called twice:
> as as->dispatch_listener.region_add() and
> as as->dispatch_listener.region_nop() - I did not understand
> the trick but it does not work if I remove the .region_nop() hook.
> How does it work? :)

Didn't note this.  The hooks are:

- region_add: a new MemoryRegionSection appeared compared to the
  previous FlatView

- region_nop: a region that was in the previous FlatView stayed there

- region_del: a MemoryRegionSection disappeared compared to the
  previous FlatView

Because the AddressSpaceDispatch is rebuilt from scratch, it cares
about both new (region_add) and existing (region_nop) regions.

Paolo
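The three hooks amount to diffing the old flat view against the new one. A toy sketch of that diff (not QEMU code; the types and names here are made up, with regions reduced to bare names):

```c
#include <assert.h>
#include <string.h>

/* Toy flat view: just a set of named regions. */
enum { MAX_REGIONS = 8 };

typedef struct {
    const char *names[MAX_REGIONS];
    int count;
} FlatViewModel;

static int view_contains(const FlatViewModel *v, const char *name)
{
    for (int i = 0; i < v->count; i++) {
        if (strcmp(v->names[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}

typedef struct {
    int added;    /* region_add fired: new in this view   */
    int kept;     /* region_nop fired: stayed in the view */
    int deleted;  /* region_del fired: disappeared        */
} ListenerStats;

/* Diff old vs. new view, counting which callback each region gets.
 * A dispatch rebuilt from scratch must consume both "added" and
 * "kept" regions - which is why dropping the region_nop hook breaks
 * it, as observed in the thread. */
static void diff_views(const FlatViewModel *old_view,
                       const FlatViewModel *new_view,
                       ListenerStats *stats)
{
    for (int i = 0; i < new_view->count; i++) {
        if (view_contains(old_view, new_view->names[i])) {
            stats->kept++;
        } else {
            stats->added++;
        }
    }
    for (int i = 0; i < old_view->count; i++) {
        if (!view_contains(new_view, old_view->names[i])) {
            stats->deleted++;
        }
    }
}
```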
On 25 August 2017 at 09:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
> The solution is to: 1) share the FlatView structures if they refer to
> the same root memory region; 2) have one AddressSpaceDispatch per
> FlatView instead of one per AddressSpace, so that FlatView reference
> counting takes care of clearing the AddressSpaceDispatch too.  Neither
> is particularly hard.

If we did this we could get rid of address_space_init_shareable(),
right?  (It's a bit of a cheesy hack aimed at avoiding having duplicate
address space structures for the same root memory region, but if the
underlying code is better at not duplicating all the data structures
unless necessary then the benefit of having the separate API goes away,
I think.)

thanks
-- PMM
On 25/08/2017 11:22, Peter Maydell wrote:
> On 25 August 2017 at 09:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> The solution is to: 1) share the FlatView structures if they refer to
>> the same root memory region; 2) have one AddressSpaceDispatch per
>> FlatView instead of one per AddressSpace, so that FlatView reference
>> counting takes care of clearing the AddressSpaceDispatch too.  Neither
>> is particularly hard.
> If we did this we could get rid of address_space_init_shareable(),
> right?  (It's a bit of a cheesy hack aimed at avoiding having duplicate
> address space structures for the same root memory region, but if
> the underlying code is better at not duplicating all the data
> structures unless necessary then the benefit of having the
> separate API goes away I think.)

Yes, indeed.

Paolo
On Fri, Aug 25, 2017 at 11:57:26AM +0200, Paolo Bonzini wrote:
> On 25/08/2017 11:22, Peter Maydell wrote:
>> On 25 August 2017 at 09:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> The solution is to: 1) share the FlatView structures if they refer to
>>> the same root memory region; 2) have one AddressSpaceDispatch per
>>> FlatView instead of one per AddressSpace, so that FlatView reference
>>> counting takes care of clearing the AddressSpaceDispatch too.  Neither
>>> is particularly hard.
>> If we did this we could get rid of address_space_init_shareable(),
>> right?  (It's a bit of a cheesy hack aimed at avoiding having duplicate
>> address space structures for the same root memory region, but if
>> the underlying code is better at not duplicating all the data
>> structures unless necessary then the benefit of having the
>> separate API goes away I think.)
>
> Yes, indeed.

Hm.  Why do we need to construct full ASes for virtio-blk, rather than
just MRs?
On 25 August 2017 at 14:19, David Gibson <david@gibson.dropbear.id.au> wrote:
> Hm.  Why do we need to construct full ASes for virtio-blk, rather than
> just MRs?

It's the PCI layer that's doing it, but the overall reason is because
virtio-blk needs to make memory transactions (DMA reads and writes).
The AddressSpace is our construct for "thing you use to be able to
efficiently make memory transactions on a MemoryRegion".  (This is what
the FlatView and AddressSpaceDispatch data structures are for -- they
let you efficiently go from "write to address 0x4000 in this thing" to
the actual destination RAM or device read/write pointers.)

thanks
-- PMM
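The "efficiently go from an address to its destination" job Peter describes can be illustrated with a toy lookup over sorted ranges (the real AddressSpaceDispatch is a multi-level tree over MemoryRegionSections; this sketch only shows the idea, and every name in it is invented):

```c
#include <assert.h>
#include <stdint.h>

/* Toy dispatch table: sorted, non-overlapping [base, base+size)
 * ranges, each mapping to a section index (think: which RAM block or
 * device handles this address). */
typedef struct {
    uint64_t base;
    uint64_t size;
    int section;
} RangeEntry;

/* Binary search for the entry covering addr; returns its section
 * index, or -1 if the address is unmapped. */
static int dispatch_lookup(const RangeEntry *table, int n, uint64_t addr)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (addr < table[mid].base) {
            hi = mid - 1;
        } else if (addr >= table[mid].base + table[mid].size) {
            lo = mid + 1;
        } else {
            return table[mid].section;
        }
    }
    return -1;
}
```

A write to address 0x4000 in the quoted example would go through one such lookup to find the RAM or device callback that should receive it.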
On 25/08/2017 15:19, David Gibson wrote:
> On Fri, Aug 25, 2017 at 11:57:26AM +0200, Paolo Bonzini wrote:
>> On 25/08/2017 11:22, Peter Maydell wrote:
>>> On 25 August 2017 at 09:53, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>> The solution is to: 1) share the FlatView structures if they refer to
>>>> the same root memory region; 2) have one AddressSpaceDispatch per
>>>> FlatView instead of one per AddressSpace, so that FlatView reference
>>>> counting takes care of clearing the AddressSpaceDispatch too.  Neither
>>>> is particularly hard.
>>> If we did this we could get rid of address_space_init_shareable(),
>>> right?  (It's a bit of a cheesy hack aimed at avoiding having duplicate
>>> address space structures for the same root memory region, but if
>>> the underlying code is better at not duplicating all the data
>>> structures unless necessary then the benefit of having the
>>> separate API goes away I think.)
>>
>> Yes, indeed.
>
> Hm.  Why do we need to construct full ASes for virtio-blk, rather than
> just MRs?

It's an artifact of how virtio_address_space_read and
virtio_address_space_write are implemented.

Paolo
On 25/08/17 18:53, Paolo Bonzini wrote:
> On 25/08/2017 10:31, Alexey Kardashevskiy wrote:
>> Otherwise old dispatch holds way too much memory before RCU gets
>> a chance to free old dispatches.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>> This is a follow-up to the "Memory use with >100 virtio devices"
>> thread.
>>
>> I assume this is a dirty hack (which fixes the problem though)
>
> This doesn't work.  AddressSpaceDispatch can be accessed outside the big
> QEMU lock, which is why it is destroyed via call_rcu.
>
> The solution is to: 1) share the FlatView structures if they refer to
> the same root memory region; 2) have one AddressSpaceDispatch per
> FlatView instead of one per AddressSpace, so that FlatView reference
> counting takes care of clearing the AddressSpaceDispatch too.  Neither
> is particularly hard.  The second requires getting rid of the
> as->dispatch_listener and "hard coding" the creation of the
> AddressSpaceDispatch from the FlatView in memory.c.

While I am trying this approach, here is a cheap workaround:

diff --git a/vl.c b/vl.c
index 8e247cc2a2..4d95bc2a6a 100644
--- a/vl.c
+++ b/vl.c
@@ -4655,12 +4655,16 @@ int main(int argc, char **argv, char **envp)
     igd_gfx_passthru();

     /* init generic devices */
+    memory_region_transaction_begin();
+
     rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
     if (qemu_opts_foreach(qemu_find_opts("device"),
                           device_init_func, NULL, NULL)) {
         exit(1);
     }
+    memory_region_transaction_commit();
+
     cpu_synchronize_all_post_init();

     rom_reset_order_override();

Just for self-education - how dirty is this hack?  It effectively
postpones all dispatch tree allocation/disposal until the memory region
transaction depth reaches 0 (which happens in this
memory_region_transaction_commit()).  For 500 virtio devices and a 2GB
guest it reduces the amount of resident RAM from 45GB to 11GB, and it
is good for speed too - the time to build the machine drops from 4:00
to 1:20.

> Paolo
>
>> and I wonder what the proper solution would be. Thanks.
>>
>>
>> What happens here is that every virtio block device creates 2 address
>> spaces - for modern config space (called "virtio-pci-cfg-as") and
>> for busmaster (common pci thing, called after the device name,
>> in my case "virtio-blk-pci").
>>
>> Each address_space_init() updates topology for _every_ address space.
>> Every topology update (address_space_update_topology()) creates a new
>> dispatch tree - AddressSpaceDispatch with nodes (1KB) and
>> sections (48KB) and destroys the old one.
>>
>> However the dispatch destructor is postponed via RCU which does not
>> get a chance to execute until the machine is initialized but before
>> we get there, memory is not returned to the pool, and this is a lot
>> of memory which grows n^2.
>>
>>
>> Interestingly, mem_add() from exec.c is called twice:
>> as as->dispatch_listener.region_add() and
>> as as->dispatch_listener.region_nop() - I did not understand
>> the trick but it does not work if I remove the .region_nop() hook.
>> How does it work? :)
>>
>> ---
>>  exec.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 01ac21e3cd..ea5f3eb209 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -2707,7 +2707,7 @@ static void mem_commit(MemoryListener *listener)
>>
>>      atomic_rcu_set(&as->dispatch, next);
>>      if (cur) {
>> -        call_rcu(cur, address_space_dispatch_free, rcu);
>> +        address_space_dispatch_free(cur);
>>      }
>>  }
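The saving Alexey measures can be sketched with a back-of-the-envelope cost model (a simplification assumed for illustration: each address_space_init() rebuilds one dispatch per address space existing at that point, and a batched transaction rebuilds each dispatch exactly once at commit):

```c
#include <assert.h>

/* Toy rebuild counter, not measured QEMU behaviour.  Creating n
 * address spaces one at a time rebuilds i dispatches at step i, for
 * 1 + 2 + ... + n = n(n+1)/2 total rebuilds - the n^2 growth the
 * cover letter describes. */
static long rebuilds_unbatched(long n_address_spaces)
{
    long total = 0;
    for (long i = 1; i <= n_address_spaces; i++) {
        total += i;   /* i address spaces exist when the i-th is added */
    }
    return total;
}

/* Inside one begin/commit pair, nothing is rebuilt until commit, and
 * commit rebuilds each of the n dispatches once. */
static long rebuilds_batched(long n_address_spaces)
{
    return n_address_spaces;
}
```

With 500 virtio devices at 2 address spaces each (1000 address spaces), the unbatched count is 500500 rebuilds against 1000 batched, which is consistent in shape, if not in exact numbers, with the 45GB-to-11GB and 4:00-to-1:20 improvements reported above.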
diff --git a/exec.c b/exec.c
index 01ac21e3cd..ea5f3eb209 100644
--- a/exec.c
+++ b/exec.c
@@ -2707,7 +2707,7 @@ static void mem_commit(MemoryListener *listener)

     atomic_rcu_set(&as->dispatch, next);
     if (cur) {
-        call_rcu(cur, address_space_dispatch_free, rcu);
+        address_space_dispatch_free(cur);
     }
 }
Otherwise the old dispatch holds way too much memory before RCU gets a
chance to free old dispatches.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---

This is a follow-up to the "Memory use with >100 virtio devices"
thread.

I assume this is a dirty hack (which fixes the problem though) and I
wonder what the proper solution would be.  Thanks.

What happens here is that every virtio block device creates 2 address
spaces - one for the modern config space (called "virtio-pci-cfg-as")
and one for the busmaster (a common PCI thing, named after the device,
in my case "virtio-blk-pci").

Each address_space_init() updates the topology for _every_ address
space.  Every topology update (address_space_update_topology()) creates
a new dispatch tree - an AddressSpaceDispatch with nodes (1KB) and
sections (48KB) - and destroys the old one.

However, the dispatch destructor is postponed via RCU, which does not
get a chance to execute until the machine is initialized; before we get
there, memory is not returned to the pool, and this is a lot of memory,
growing as n^2.

Interestingly, mem_add() from exec.c is called twice: as
as->dispatch_listener.region_add() and as
as->dispatch_listener.region_nop().  I did not understand the trick,
but it does not work if I remove the .region_nop() hook.  How does it
work? :)

---
 exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)