diff mbox

[v1,2/2] reduce qemu's heap Rss size from 12252kB to 2752KB

Message ID 4712D8F4B26E034E80552F30A67BE0B1A19A11@ORSMSX112.amr.corp.intel.com
State New
Headers show

Commit Message

Xu, Anthony March 16, 2017, 8:02 p.m. UTC
> > memory_region_finalize.

> > Let me know if you think otherwise.

> 

> Yes, you can replace memory_region_del_subregion in

> memory_region_finalize

> with special code that does

> 

>     assert(!mr->enabled);

>     assert(subregion->container == mr);

>     subregion->container = NULL;

>     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);

>     memory_region_unref(subregion);

> 

> The last four lines are shared with memory_region_del_subregion, so please

> factor them in a new function, for example

> memory_region_del_subregion_internal.



After adding synchronize_rcu, I saw an infinite recursive call,
  mem_commit-> memory_region_finalize-> mem_commit->
memory_region_finalize-> ......
it caused a segment fault, because 8M stack space is used up, and found when 
memory_region_finalize is called, memory_region_transaction_depth is 0 and 
memory_region_update_pending is true. That's not normal!

As you mentioned in your previous email, that should  not happen.
>" But if memory_region_transaction_depth is == 0, there should be no

>update pending because the loop has never run"


The root cause is,
MEMORY_LISTENER_CALL_GLOBAL(commit, Forward) is called before 
memory_region_clear_pending() in memory_region_transaction_commit.
so when MEMORY_LISTENER_CALL_GLOBAL(commit, Forward) is called,
memory_region_transaction_depth is 0 and memory_region_update_pending is true.
mem_commit may call memory_region_finalize, it causes infinite recursive call.

my previous fix for this is not complete.
We may clear memory_region_update_pending before calling 
MEMORY_LISTENER_CALL_GLOBAL(commit, Forward)

Please review below patch




> > It is not slow, the contention is not high. Yes we can test.

> 

> It's not a matter of contention.  It's a matter of how long an RCU

> critical section can be---not just at startup, but at any point

> during QEMU execution.

The thing is, seems both address_space_translate and address_space_dispatch_free 
are called under the global lock. When synchronize_rcu is called, no other threads 
are in RCU critical section.

Seems RCU is not that useful for address space.


> 

> Have you tried tcmalloc or jemalloc?  They use the brk region

> less and sometimes are more aggressive in releasing mmap-ed memory

> areas.


They may be aggressive. But if memory are freed afterward, it may not help in some cases,
for example, starting a lot of VMs at the same time.


> 

> > Please review below patch, MemoryRegion is protected by RCU.

> 

> Removing transaction begin/commit in memory_region_finalize makes

> little sense because memory_region_del_subregion calls those two

> functions again.  See above for an alternative.  Apart from this,

> the patch is at least correct, though I'm not sure it's a good

> idea (synchronize_rcu needs testing). 

> I'm not sure I understand the

> address_space_write change).


After adding synchronize_rcu, we noticed a RCU dead loop.  synchronize_rcu is called 
inside RCU critical section. It happened when guest OS programmed the PCI bar.
The call trace is like,
address_space_write-> pci_host_config_write_common -> 
memory_region_transaction_commit ->mem_commit-> synchronize_rcu
pci_host_config_write_common is called inside RCU critical section.

The address_space_write change fixed this issue.


Thanks,
Anthony

Comments

Paolo Bonzini March 22, 2017, 11:46 a.m. UTC | #1
On 16/03/2017 21:02, Xu, Anthony wrote:
>>> memory_region_finalize.
>>> Let me know if you think otherwise.
>>
>> Yes, you can replace memory_region_del_subregion in
>> memory_region_finalize
>> with special code that does
>>
>>     assert(!mr->enabled);
>>     assert(subregion->container == mr);
>>     subregion->container = NULL;
>>     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
>>     memory_region_unref(subregion);
>>
>> The last four lines are shared with memory_region_del_subregion, so please
>> factor them in a new function, for example
>> memory_region_del_subregion_internal.
> 
> After adding synchronize_rcu, I saw an infinite recursive call,
>   mem_commit-> memory_region_finalize-> mem_commit->
> memory_region_finalize-> ......
> it caused a segment fault, because 8M stack space is used up, and found when 
> memory_region_finalize is called, memory_region_transaction_depth is 0 and 
> memory_region_update_pending is true. That's not normal!

Ok, this is a bug.  This would fix it:

> Please review below patch
> 
> diff --git a/memory.c b/memory.c
> index 64b0a60..4c95aaf 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -906,12 +906,6 @@ void memory_region_transaction_begin(void)
>      ++memory_region_transaction_depth;
>  }
> 
> -static void memory_region_clear_pending(void)
> -{
> -    memory_region_update_pending = false;
> -    ioeventfd_update_pending = false;
> -}
> -
>  void memory_region_transaction_commit(void)
>  {
>      AddressSpace *as;
> @@ -927,14 +921,14 @@ void memory_region_transaction_commit(void)
>              QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>                  address_space_update_topology(as);
>              }
> -
> +            memory_region_update_pending = false;
>              MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
>          } else if (ioeventfd_update_pending) {
>              QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
>                  address_space_update_ioeventfds(as);
>              }
> +            ioeventfd_update_pending = false;
>          }
> -        memory_region_clear_pending();
>     }
>  }

So please send it to the list with Signed-off-by line.

> The thing is, seems both address_space_translate and address_space_dispatch_free 
> are called under the global lock. When synchronize_rcu is called, no other threads 
> are in RCU critical section.

No, not necessarily.  address_space_write for example is called outside
the global lock by KVM and it calls address_space_translate.

> Seems RCU is not that useful for address space.

I suggest that you study the code more closely...  there is this in
kvm-all.c:

            DPRINTF("handle_mmio\n");
            /* Called outside BQL */
            address_space_rw(&address_space_memory,
                             run->mmio.phys_addr, attrs,
                             run->mmio.data,
                             run->mmio.len,
                             run->mmio.is_write);

and adding a simple assert() would have quickly disproved your theory.

> After adding synchronize_rcu, we noticed a RCU dead loop.  synchronize_rcu is called 
> inside RCU critical section. It happened when guest OS programmed the PCI bar.
> The call trace is like,
> address_space_write-> pci_host_config_write_common -> 
> memory_region_transaction_commit ->mem_commit-> synchronize_rcu
> pci_host_config_write_common is called inside RCU critical section.
> 
> The address_space_write change fixed this issue.

It's not a fix if the code is not thread-safe anymore!  But I think you
have the answer now as to why you cannot use synchronize_rcu.

Paolo
Xu, Anthony March 22, 2017, 6:26 p.m. UTC | #2
> So please send it to the list with Signed-off-by line.

Thanks, 

> 

>             DPRINTF("handle_mmio\n");

>             /* Called outside BQL */

>             address_space_rw(&address_space_memory,

>                              run->mmio.phys_addr, attrs,

>                              run->mmio.data,

>                              run->mmio.len,

>                              run->mmio.is_write);

> 

> and adding a simple assert() would have quickly disproved your theory.

You are right here.
If it is a PCI bar write, a memory region operation(del/add) may 
be called inside address_space_rw, and 
memory_region_transaction_commit will be called.
If address_space_rw is called  without the global lock, 
memory_region_transaction_commit is called with the global lock.
 It conflicts with what you said before.

>No, you don't need to check it.  Most functions (in this case,

>memory_region_transaction_commit) can only be called under the global lock.


And if two vcpus program the different PCI bars at the same time 
(it is unlikely, but QEMU should not assume it), without the global lock, 
region operations may be called at the same time. 
Are memory_region_del_subregion and 
memory_region_add_subregion_overlap thread-safe?

> It's not a fix if the code is not thread-safe anymore!  But I think you

> have the answer now as to why you cannot use synchronize_rcu.


Can you elaborate why the code is not thread-safe?

Thanks
Anthony
diff mbox

Patch

diff --git a/memory.c b/memory.c
index 64b0a60..4c95aaf 100644
--- a/memory.c
+++ b/memory.c
@@ -906,12 +906,6 @@  void memory_region_transaction_begin(void)
     ++memory_region_transaction_depth;
 }

-static void memory_region_clear_pending(void)
-{
-    memory_region_update_pending = false;
-    ioeventfd_update_pending = false;
-}
-
 void memory_region_transaction_commit(void)
 {
     AddressSpace *as;
@@ -927,14 +921,14 @@  void memory_region_transaction_commit(void)
             QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
                 address_space_update_topology(as);
             }
-
+            memory_region_update_pending = false;
             MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
         } else if (ioeventfd_update_pending) {
             QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
                 address_space_update_ioeventfds(as);
             }
+            ioeventfd_update_pending = false;
         }
-        memory_region_clear_pending();
    }
 }