
[RFC] Fake machine for scalability testing

Message ID 4D389209.8070202@codemonkey.ws
State New

Commit Message

Anthony Liguori Jan. 20, 2011, 7:50 p.m. UTC
On 01/20/2011 11:12 AM, Markus Armbruster wrote:
> Anthony Liguori<anthony@codemonkey.ws>  writes:
>
>    
>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>      
>>> The problem: you want to do serious scalability testing (1000s of VMs)
>>> of your management stack.  If each guest eats up a few 100MiB and
>>> competes for CPU, that requires a serious host machine.  Which you don't
>>> have.  You also don't want to modify the management stack at all, if you
>>> can help it.
>>>
>>> The solution: a perfectly normal-looking QEMU that uses minimal
>>> resources.  Ability to execute any guest code is strictly optional ;)
>>>
>>> New option -fake-machine creates a fake machine incapable of running
>>> guest code.  Completely compiled out by default, enable with configure
>>> --enable-fake-machine.
>>>
>>> With -fake-machine, CPU use is negligible, and memory use is rather
>>> modest.
>>>
>>> Non-fake VM running F-14 live, right after boot:
>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>>>
>>> Same VM -fake-machine, after similar time elapsed:
>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>>>
>>> We're using a very similar patch for RHEL scalability testing.
>>>
>>>        
>> Interesting, but:
>>
>>   9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
>> qemu-system-x86
>>
>> That's qemu-system-x86 -m 4
>>      
> Sure you ran qemu-system-x86 -fake-machine?
>    

No, I didn't try it.  My point was that -m 4 is already pretty small.

>> In terms of memory overhead, the largest source is not really going to
>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>      
> git-grep phys_ram_dirty finds nothing.
>    

Yeah, it's now ram_list[i].phys_dirty.

l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages

phys_dirty is mem_size_in_pages bytes.
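To put rough numbers on that, here is a quick back-of-envelope sketch (not exact QEMU figures: the 4 KiB target page size and the 16-byte sizeof(PhysPageDesc) are assumptions for a 64-bit host):

#include <stdio.h>

int main(void)
{
    /* Assumptions: 64-bit host pointers, 4 KiB target pages,
     * sizeof(PhysPageDesc) taken as 16 bytes.  Substitute the real
     * values from your build for exact numbers. */
    const unsigned long long mem_size  = 1ULL << 30;   /* 1 GiB guest RAM */
    const unsigned long long page_size = 4096;
    const unsigned long long pages     = mem_size / page_size;

    const unsigned long long ptr_size       = 8;    /* sizeof(void *)       */
    const unsigned long long phys_page_desc = 16;   /* sizeof(PhysPageDesc) */

    /* l1_phys_map: (sizeof(void *) + sizeof(PhysPageDesc)) per guest page */
    unsigned long long l1_phys_map = (ptr_size + phys_page_desc) * pages;
    /* phys_dirty: one byte per guest page */
    unsigned long long phys_dirty  = pages;

    printf("guest pages: %llu\n", pages);
    printf("l1_phys_map: ~%llu KiB\n", l1_phys_map / 1024);
    printf("phys_dirty:  ~%llu KiB\n", phys_dirty / 1024);
    return 0;
}

Under those assumptions a 1 GiB guest costs roughly 6 MiB for l1_phys_map plus 256 KiB for phys_dirty, which is small per VM but adds up across thousands of them.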

>> I don't really understand the point of not creating a VCPU with KVM.
>> Is there some type of overhead in doing that?
>>      
> I briefly looked at both main loops, TCG's was the first one I happened
> to crack, and I didn't feel like doing both then.  If the general
> approach is okay, I'll gladly investigate how to do it with KVM.
>    

I guess what I don't understand is why do you need to not run guest 
code?  Specifically, if you remove the following, is it any less useful?


Regards,

Anthony Liguori

Comments

Markus Armbruster Jan. 21, 2011, 10:38 a.m. UTC | #1
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
>> Anthony Liguori<anthony@codemonkey.ws>  writes:
>>
>>    
>>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>>      
>>>> The problem: you want to do serious scalability testing (1000s of VMs)
>>>> of your management stack.  If each guest eats up a few 100MiB and
>>>> competes for CPU, that requires a serious host machine.  Which you don't
>>>> have.  You also don't want to modify the management stack at all, if you
>>>> can help it.
>>>>
>>>> The solution: a perfectly normal-looking QEMU that uses minimal
>>>> resources.  Ability to execute any guest code is strictly optional ;)
>>>>
>>>> New option -fake-machine creates a fake machine incapable of running
>>>> guest code.  Completely compiled out by default, enable with configure
>>>> --enable-fake-machine.
>>>>
>>>> With -fake-machine, CPU use is negligible, and memory use is rather
>>>> modest.
>>>>
>>>> Non-fake VM running F-14 live, right after boot:
>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>>>>
>>>> Same VM -fake-machine, after similar time elapsed:
>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>>>>
>>>> We're using a very similar patch for RHEL scalability testing.
>>>>
>>>>        
>>> Interesting, but:
>>>
>>>   9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
>>> qemu-system-x86
>>>
>>> That's qemu-system-x86 -m 4
>>>      
>> Sure you ran qemu-system-x86 -fake-machine?
>>    
>
> No, I didn't try it.  My point was that -m 4 is already pretty small.

Ah!

However, it's not as small as -fake-machine, and eats all the CPU it can
get.

Non-fake VM as above, but with -m 4:
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
armbru   19325  2558 93 39869 17020   1 11:30 pts/3    00:00:42 [...]

And I believe we can make -fake-machine use even less memory than now,
with a little more work.

>>> In terms of memory overhead, the largest source is not really going to
>>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>>      
>> git-grep phys_ram_dirty finds nothing.
>>    
>
> Yeah, it's now ram_list[i].phys_dirty.
>
> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
>
> phys_dirty is mem_size_in_pages bytes.

Thanks.

>>> I don't really understand the point of not creating a VCPU with KVM.
>>> Is there some type of overhead in doing that?
>>>      
>> I briefly looked at both main loops, TCG's was the first one I happened
>> to crack, and I didn't feel like doing both then.  If the general
>> approach is okay, I'll gladly investigate how to do it with KVM.
>>    
>
> I guess what I don't understand is why do you need to not run guest
> code?  Specifically, if you remove the following, is it any less
> useful?
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 8c9fb8b..cd1259a 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>      uint8_t *tc_ptr;
>      unsigned long next_tb;
>
> -    if (cpu_halted(env1) == EXCP_HALTED)
> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
>
>          return EXCP_HALTED;

I don't want 1000s of guests running infinite "not enough memory to do
anything useful, panic!" reboot loops.  Because that's 1000s of guests
competing for CPU.

If you think we can achieve my goals (stated in my first paragraph) in a
different way, I'm all ears.
Daniel P. Berrangé Jan. 21, 2011, 10:43 a.m. UTC | #2
On Thu, Jan 20, 2011 at 01:50:33PM -0600, Anthony Liguori wrote:
> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
> >Anthony Liguori<anthony@codemonkey.ws>  writes:
> >
> >>On 01/18/2011 02:16 PM, Markus Armbruster wrote:
> >>>The problem: you want to do serious scalability testing (1000s of VMs)
> >>>of your management stack.  If each guest eats up a few 100MiB and
> >>>competes for CPU, that requires a serious host machine.  Which you don't
> >>>have.  You also don't want to modify the management stack at all, if you
> >>>can help it.
> >>>
> >>>The solution: a perfectly normal-looking QEMU that uses minimal
> >>>resources.  Ability to execute any guest code is strictly optional ;)
> >>>
> >>>New option -fake-machine creates a fake machine incapable of running
> >>>guest code.  Completely compiled out by default, enable with configure
> >>>--enable-fake-machine.
> >>>
> >>>With -fake-machine, CPU use is negligible, and memory use is rather
> >>>modest.
> >>>
> >>>Non-fake VM running F-14 live, right after boot:
> >>>UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> >>>armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
> >>>
> >>>Same VM -fake-machine, after similar time elapsed:
> >>>UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> >>>armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
> >>>
> >>>We're using a very similar patch for RHEL scalability testing.
> >>>
> >>Interesting, but:
> >>
> >>  9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
> >>qemu-system-x86
> >>
> >>That's qemu-system-x86 -m 4
> >Sure you ran qemu-system-x86 -fake-machine?
> 
> No, I didn't try it.  My point was that -m 4 is already pretty small.

One of the core ideas/requirements behind the "fake QEMU" was
that we wouldn't need to modify applications to adjust the command
line arguments in this kind of way. We want all their machine
definition logic to remain unaffected. In fact my original
prototype did not even require passing an extra '-fake-machine'
argument; it would have just been a plain drop-in alternative
QEMU binary. It also stubbed out much of the KVM codepaths, so
you could "enable" KVM mode without actually having KVM available
on the host.
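
To make the "drop-in binary" idea concrete, here is a rough illustrative sketch of the stubbing approach (the function names and the fake_machine flag below are made up for illustration, not the actual prototype or QEMU's real interfaces):

#include <stdbool.h>
#include <stdio.h>

/* In a drop-in fake binary this would simply be compiled in as true,
 * so no command line change is needed at all. */
static const bool fake_machine = true;

/* Pretend KVM initialisation succeeded, so management stacks that
 * insist on KVM mode stay happy even without /dev/kvm ... */
static int fake_kvm_init(void)
{
    return fake_machine ? 0 : -1;
}

/* ... but never actually create or enter a vCPU, so the "guest"
 * consumes no CPU time. */
static int fake_vcpu_exec(void *cpu_state)
{
    (void)cpu_state;
    return 0;   /* behave as if the vCPU were permanently halted */
}

int main(void)
{
    printf("kvm init:  %d (0 = pretend success)\n", fake_kvm_init());
    printf("vcpu exec: %d (guest code never runs)\n", fake_vcpu_exec(NULL));
    return 0;
}

The real prototype of course hooked this into QEMU's own code paths rather than standalone functions like these.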

> >>In terms of memory overhead, the largest source is not really going to
> >>be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
> >git-grep phys_ram_dirty finds nothing.
> 
> Yeah, it's now ram_list[i].phys_dirty.
> 
> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
> 
> phys_dirty is mem_size_in_pages bytes.
> 
> >>I don't really understand the point of not creating a VCPU with KVM.
> >>Is there some type of overhead in doing that?
> >I briefly looked at both main loops, TCG's was the first one I happened
> >to crack, and I didn't feel like doing both then.  If the general
> >approach is okay, I'll gladly investigate how to do it with KVM.
> 
> I guess what I don't understand is why do you need to not run guest
> code?  Specifically, if you remove the following, is it any less
> useful?

IIUC, if you don't have the following change, then the guest CPUs
will actually execute the kernel/bootable disk configured, causing
host CPU utilization to rise. Even if it only adds 2% load on the
host, this quickly becomes an issue as you get 200 or more VMs on
the host. Ideally we would have the main loop completely disabled,
not merely the CPUs, because this would avoid all possible background
CPU load that any QEMU internal timers might cause.

Basically the desired goal is: make no change to the QEMU command
line arguments, but have zero memory and CPU overhead from running
QEMU. fake-machine doesn't get as close to zero as my original
fake QEMU target managed, but it is still pretty good, considering
how much less code is involved in fake-machine.

> diff --git a/cpu-exec.c b/cpu-exec.c
> index 8c9fb8b..cd1259a 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>      uint8_t *tc_ptr;
>      unsigned long next_tb;
> 
> -    if (cpu_halted(env1) == EXCP_HALTED)
> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
> 
>          return EXCP_HALTED;


Daniel
Anthony Liguori Jan. 21, 2011, 2:43 p.m. UTC | #3
On 01/21/2011 04:43 AM, Daniel P. Berrange wrote:
> On Thu, Jan 20, 2011 at 01:50:33PM -0600, Anthony Liguori wrote:
>    
>> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
>>      
>>> Anthony Liguori<anthony@codemonkey.ws>   writes:
>>>
>>>        
>>>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>>>          
>>>>> The problem: you want to do serious scalability testing (1000s of VMs)
>>>>> of your management stack.  If each guest eats up a few 100MiB and
>>>>> competes for CPU, that requires a serious host machine.  Which you don't
>>>>> have.  You also don't want to modify the management stack at all, if you
>>>>> can help it.
>>>>>
>>>>> The solution: a perfectly normal-looking QEMU that uses minimal
>>>>> resources.  Ability to execute any guest code is strictly optional ;)
>>>>>
>>>>> New option -fake-machine creates a fake machine incapable of running
>>>>> guest code.  Completely compiled out by default, enable with configure
>>>>> --enable-fake-machine.
>>>>>
>>>>> With -fake-machine, CPU use is negligible, and memory use is rather
>>>>> modest.
>>>>>
>>>>> Non-fake VM running F-14 live, right after boot:
>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>>>>>
>>>>> Same VM -fake-machine, after similar time elapsed:
>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>>>>>
>>>>> We're using a very similar patch for RHEL scalability testing.
>>>>>
>>>>>            
>>>> Interesting, but:
>>>>
>>>>   9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
>>>> qemu-system-x86
>>>>
>>>> That's qemu-system-x86 -m 4
>>>>          
>>> Sure you ran qemu-system-x86 -fake-machine?
>>>        
>> No, I didn't try it.  My point was that -m 4 is already pretty small.
>>      
> One of the core ideas/requirements behind the "fake QEMU" was
> that we wouldn't need to modify applications to adjust the command
> line arguments in this kind of way. We want all their machine
> definition logic to remain unaffected. In fact my original
> prototype did not even require passing an extra '-fake-machine'
> argument; it would have just been a plain drop-in alternative
> QEMU binary. It also stubbed out much of the KVM codepaths, so
> you could "enable" KVM mode without actually having KVM available
> on the host.
>
>    
>>>> In terms of memory overhead, the largest source is not really going to
>>>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>>>          
>>> git-grep phys_ram_dirty finds nothing.
>>>        
>> Yeah, it's now ram_list[i].phys_dirty.
>>
>> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
>>
>> phys_dirty is mem_size_in_pages bytes.
>>
>>      
>>>> I don't really understand the point of not creating a VCPU with KVM.
>>>> Is there some type of overhead in doing that?
>>>>          
>>> I briefly looked at both main loops, TCG's was the first one I happened
>>> to crack, and I didn't feel like doing both then.  If the general
>>> approach is okay, I'll gladly investigate how to do it with KVM.
>>>        
>> I guess what I don't understand is why do you need to not run guest
>> code?  Specifically, if you remove the following, is it any less
>> useful?
>>      
> IIUC, if you don't have the following change, then the guest CPUs
> will actually execute the kernel/bootable disk configured, causing
> host CPU utilization to rise. Even if it only adds 2% load on the
> host, this quickly becomes an issue as you get 200 or more VMs on
> the host. Ideally we would have the main loop completely disabled,
> not merely the CPUs, because this would avoid all possible background
> CPU load that any QEMU internal timers might cause.
>
> Basically the desired goal is, make no change to the QEMU command
> line aguments, but have zero memory and CPU overhead by running
> QEMU. fake-machine doesn't get as close to zero as my original
> fake QEMU target managed, but it is still pretty good, considering
> how much less code is involved in fake-machine.
>    

Oh, so what you really want to do is:

#!/bin/sh
/usr/libexec/qemu-kvm -m 4

Ignore all command line parameters and just run a micro guest.  If you
don't specify any kernel/boot disks, you don't need to disable VCPU
execution because it will spin in a HLT loop once the BIOS executes.

Regards,

Anthony Liguori

>    
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 8c9fb8b..cd1259a 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>>       uint8_t *tc_ptr;
>>       unsigned long next_tb;
>>
>> -    if (cpu_halted(env1) == EXCP_HALTED)
>> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
>>
>>           return EXCP_HALTED;
>>      
>
> Daniel
>
>
Anthony Liguori Jan. 21, 2011, 2:45 p.m. UTC | #4
On 01/21/2011 04:38 AM, Markus Armbruster wrote:
> Anthony Liguori<anthony@codemonkey.ws>  writes:
>
>    
>> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
>>      
>>> Anthony Liguori<anthony@codemonkey.ws>   writes:
>>>
>>>
>>>        
>>>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>>>
>>>>          
>>>>> The problem: you want to do serious scalability testing (1000s of VMs)
>>>>> of your management stack.  If each guest eats up a few 100MiB and
>>>>> competes for CPU, that requires a serious host machine.  Which you don't
>>>>> have.  You also don't want to modify the management stack at all, if you
>>>>> can help it.
>>>>>
>>>>> The solution: a perfectly normal-looking QEMU that uses minimal
>>>>> resources.  Ability to execute any guest code is strictly optional ;)
>>>>>
>>>>> New option -fake-machine creates a fake machine incapable of running
>>>>> guest code.  Completely compiled out by default, enable with configure
>>>>> --enable-fake-machine.
>>>>>
>>>>> With -fake-machine, CPU use is negligible, and memory use is rather
>>>>> modest.
>>>>>
>>>>> Non-fake VM running F-14 live, right after boot:
>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>>>>>
>>>>> Same VM -fake-machine, after similar time elapsed:
>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>>>>>
>>>>> We're using a very similar patch for RHEL scalability testing.
>>>>>
>>>>>
>>>>>            
>>>> Interesting, but:
>>>>
>>>>    9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
>>>> qemu-system-x86
>>>>
>>>> That's qemu-system-x86 -m 4
>>>>
>>>>          
>>> Sure you ran qemu-system-x86 -fake-machine?
>>>
>>>        
>> No, I didn't try it.  My point was that -m 4 is already pretty small.
>>      
> Ah!
>
> However, it's not as small as -fake-machine, and eats all the CPU it can
> get.
>
> Non-fake VM as above, but with -m 4:
> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> armbru   19325  2558 93 39869 17020   1 11:30 pts/3    00:00:42 [...]
>
> And I believe we can make -fake-machine use even less memory than now,
> with a little more work.
>
>    
>>>> In terms of memory overhead, the largest source is not really going to
>>>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>>>
>>>>          
>>> git-grep phys_ram_dirty finds nothing.
>>>
>>>        
>> Yeah, it's now ram_list[i].phys_dirty.
>>
>> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
>>
>> phys_dirty is mem_size_in_pages bytes.
>>      
> Thanks.
>
>    
>>>> I don't really understand the point of not creating a VCPU with KVM.
>>>> Is there some type of overhead in doing that?
>>>>
>>>>          
>>> I briefly looked at both main loops, TCG's was the first one I happened
>>> to crack, and I didn't feel like doing both then.  If the general
>>> approach is okay, I'll gladly investigate how to do it with KVM.
>>>
>>>        
>> I guess what I don't understand is why do you need to not run guest
>> code?  Specifically, if you remove the following, is it any less
>> useful?
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 8c9fb8b..cd1259a 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>>       uint8_t *tc_ptr;
>>       unsigned long next_tb;
>>
>> -    if (cpu_halted(env1) == EXCP_HALTED)
>> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
>>
>>           return EXCP_HALTED;
>>      
> I don't want 1000s of guests running infinite "not enough memory to do
> anything useful, panic!" reboot loops.  Because that's 1000s of guests
> competing for CPU.
>    

Hrm, that's not the behavior I see.  With no bootable drive, the BIOS 
will spin in a HLT loop as part of int18.

Regards,

Anthony Liguori

> If you think we can achieve my goals (stated in my first paragraph) in a
> different way, I'm all ears.
>
>
Daniel P. Berrangé Jan. 21, 2011, 2:46 p.m. UTC | #5
On Fri, Jan 21, 2011 at 08:43:20AM -0600, Anthony Liguori wrote:
> On 01/21/2011 04:43 AM, Daniel P. Berrange wrote:
> >On Thu, Jan 20, 2011 at 01:50:33PM -0600, Anthony Liguori wrote:
> >>On 01/20/2011 11:12 AM, Markus Armbruster wrote:
> >>>Anthony Liguori<anthony@codemonkey.ws>   writes:
> >>>
> >>>>On 01/18/2011 02:16 PM, Markus Armbruster wrote:
> >>>>>The problem: you want to do serious scalability testing (1000s of VMs)
> >>>>>of your management stack.  If each guest eats up a few 100MiB and
> >>>>>competes for CPU, that requires a serious host machine.  Which you don't
> >>>>>have.  You also don't want to modify the management stack at all, if you
> >>>>>can help it.
> >>>>>
> >>>>>The solution: a perfectly normal-looking QEMU that uses minimal
> >>>>>resources.  Ability to execute any guest code is strictly optional ;)
> >>>>>
> >>>>>New option -fake-machine creates a fake machine incapable of running
> >>>>>guest code.  Completely compiled out by default, enable with configure
> >>>>>--enable-fake-machine.
> >>>>>
> >>>>>With -fake-machine, CPU use is negligible, and memory use is rather
> >>>>>modest.
> >>>>>
> >>>>>Non-fake VM running F-14 live, right after boot:
> >>>>>UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> >>>>>armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
> >>>>>
> >>>>>Same VM -fake-machine, after similar time elapsed:
> >>>>>UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> >>>>>armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
> >>>>>
> >>>>>We're using a very similar patch for RHEL scalability testing.
> >>>>>
> >>>>Interesting, but:
> >>>>
> >>>>  9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
> >>>>qemu-system-x86
> >>>>
> >>>>That's qemu-system-x86 -m 4
> >>>Sure you ran qemu-system-x86 -fake-machine?
> >>No, I didn't try it.  My point was that -m 4 is already pretty small.
> >One of the core ideas/requirements behind the "fake QEMU" was
> >that we wouldn't need to modify applications to adjust the command
> >line arguments in this kind of way. We want all their machine
> >definition logic to remain unaffected. In fact my original
> >prototype did not even require passing an extra '-fake-machine'
> >argument; it would have just been a plain drop-in alternative
> >QEMU binary. It also stubbed out much of the KVM codepaths, so
> >you could "enable" KVM mode without actually having KVM available
> >on the host.
> >
> >>>>In terms of memory overhead, the largest source is not really going to
> >>>>be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
> >>>git-grep phys_ram_dirty finds nothing.
> >>Yeah, it's now ram_list[i].phys_dirty.
> >>
> >>l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
> >>
> >>phys_dirty is mem_size_in_pages bytes.
> >>
> >>>>I don't really understand the point of not creating a VCPU with KVM.
> >>>>Is there some type of overhead in doing that?
> >>>I briefly looked at both main loops, TCG's was the first one I happened
> >>>to crack, and I didn't feel like doing both then.  If the general
> >>>approach is okay, I'll gladly investigate how to do it with KVM.
> >>I guess what I don't understand is why do you need to not run guest
> >>code?  Specifically, if you remove the following, is it any less
> >>useful?
> >IIUC, if you don't have the following change, then the guest CPUs
> >will actually execute the kernel/bootable disk configured, causing
> >host CPU utilization to rise. Even if it only adds 2% load on the
> >host, this quickly becomes an issue as you get 200 or more VMs on
> >the host. Ideally we would have the main loop completely disabled,
> >not merely the CPUs, because this would avoid all possible background
> >CPU load that any QEMU internal timers might cause.
> >
> >Basically the desired goal is: make no change to the QEMU command
> >line arguments, but have zero memory and CPU overhead from running
> >QEMU. fake-machine doesn't get as close to zero as my original
> >fake QEMU target managed, but it is still pretty good, considering
> >how much less code is involved in fake-machine.
> 
> Oh, so what you really want to do is:
> 
> #!/bin/sh
> /usr/libexec/qemu-kvm -m 4
> 
> Ignore all command line parameters and just run a micro guest.  If
> you don't specify any kernel/boot disks, you don't need to disable
> VCPU execution because it will spin in a HLT loop once the BIOS
> executes.

That's likely going to cause app confusion, because the app will
be specifying 1 GB, but when it talks to the balloon it will only
see / be allowed to set the balloon between 0 and 4 MB.

Daniel
Markus Armbruster Jan. 21, 2011, 4:51 p.m. UTC | #6
Anthony Liguori <anthony@codemonkey.ws> writes:

> On 01/21/2011 04:38 AM, Markus Armbruster wrote:
>> Anthony Liguori<anthony@codemonkey.ws>  writes:
>>
>>    
>>> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
>>>      
>>>> Anthony Liguori<anthony@codemonkey.ws>   writes:
>>>>
>>>>
>>>>        
>>>>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>>>>
>>>>>          
>>>>>> The problem: you want to do serious scalability testing (1000s of VMs)
>>>>>> of your management stack.  If each guest eats up a few 100MiB and
>>>>>> competes for CPU, that requires a serious host machine.  Which you don't
>>>>>> have.  You also don't want to modify the management stack at all, if you
>>>>>> can help it.
>>>>>>
>>>>>> The solution: a perfectly normal-looking QEMU that uses minimal
>>>>>> resources.  Ability to execute any guest code is strictly optional ;)
>>>>>>
>>>>>> New option -fake-machine creates a fake machine incapable of running
>>>>>> guest code.  Completely compiled out by default, enable with configure
>>>>>> --enable-fake-machine.
>>>>>>
>>>>>> With -fake-machine, CPU use is negligible, and memory use is rather
>>>>>> modest.
>>>>>>
>>>>>> Non-fake VM running F-14 live, right after boot:
>>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>>> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>>>>>>
>>>>>> Same VM -fake-machine, after similar time elapsed:
>>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>>> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>>>>>>
>>>>>> We're using a very similar patch for RHEL scalability testing.
>>>>>>
>>>>>>
>>>>>>            
>>>>> Interesting, but:
>>>>>
>>>>>    9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
>>>>> qemu-system-x86
>>>>>
>>>>> That's qemu-system-x86 -m 4
>>>>>
>>>>>          
>>>> Sure you ran qemu-system-x86 -fake-machine?
>>>>
>>>>        
>>> No, I didn't try it.  My point was that -m 4 is already pretty small.
>>>      
>> Ah!
>>
>> However, it's not as small as -fake-machine, and eats all the CPU it can
>> get.
>>
>> Non-fake VM as above, but with -m 4:
>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>> armbru   19325  2558 93 39869 17020   1 11:30 pts/3    00:00:42 [...]
>>
>> And I believe we can make -fake-machine use even less memory than now,
>> with a little more work.
>>
>>    
>>>>> In terms of memory overhead, the largest source is not really going to
>>>>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>>>>
>>>>>          
>>>> git-grep phys_ram_dirty finds nothing.
>>>>
>>>>        
>>> Yeah, it's now ram_list[i].phys_dirty.
>>>
>>> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
>>>
>>> phys_dirty is mem_size_in_pages bytes.
>>>      
>> Thanks.
>>
>>    
>>>>> I don't really understand the point of not creating a VCPU with KVM.
>>>>> Is there some type of overhead in doing that?
>>>>>
>>>>>          
>>>> I briefly looked at both main loops, TCG's was the first one I happened
>>>> to crack, and I didn't feel like doing both then.  If the general
>>>> approach is okay, I'll gladly investigate how to do it with KVM.
>>>>
>>>>        
>>> I guess what I don't understand is why do you need to not run guest
>>> code?  Specifically, if you remove the following, is it any less
>>> useful?
>>>
>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>> index 8c9fb8b..cd1259a 100644
>>> --- a/cpu-exec.c
>>> +++ b/cpu-exec.c
>>> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>>>       uint8_t *tc_ptr;
>>>       unsigned long next_tb;
>>>
>>> -    if (cpu_halted(env1) == EXCP_HALTED)
>>> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
>>>
>>>           return EXCP_HALTED;
>>>      
>> I don't want 1000s of guests running infinite "not enough memory to do
>> anything useful, panic!" reboot loops.  Because that's 1000s of guests
>> competing for CPU.
>>    
>
> Hrm, that's not the behavior I see.  With no bootable drive, the BIOS
> will spin in a HLT loop as part of int18.

Aha.  I used a bootable drive.

Using a non-bootable drive may well curb the CPU use sufficiently.  Not
sure we can always do that in our testing.  The less we have to hack up
the stack for testing, the better.

Patch

diff --git a/cpu-exec.c b/cpu-exec.c
index 8c9fb8b..cd1259a 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -230,7 +230,7 @@  int cpu_exec(CPUState *env1)
      uint8_t *tc_ptr;
      unsigned long next_tb;

-    if (cpu_halted(env1) == EXCP_HALTED)
+    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)

          return EXCP_HALTED;