diff mbox series

[v2,04/12] i386: Register feature bit properties as class properties

Message ID 20201111183823.283752-5-ehabkost@redhat.com
State New
Headers show
Series qom: Convert some properties to class properties | expand

Commit Message

Eduardo Habkost Nov. 11, 2020, 6:38 p.m. UTC
Class properties make QOM introspection simpler and easier, as
they don't require an object to be instantiated.

Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
---
Changes v1 -> v2:
* Remove x86_cpu_release_bit_prop() function, as we will allocate
  BitProperty only once and never free the class property list

---
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: qemu-devel@nongnu.org
---
 target/i386/cpu.c | 40 ++++++++++++++++------------------------
 1 file changed, 16 insertions(+), 24 deletions(-)

Comments

Markus Armbruster Dec. 15, 2020, 2:11 p.m. UTC | #1
Eduardo Habkost <ehabkost@redhat.com> writes:

> Class properties make QOM introspection simpler and easier, as
> they don't require an object to be instantiated.
>
> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

This is significantly more than just "simpler and easier".

The other day, I played with the QMP core to reduce its appetite for
malloc.  I came up with patches that should approximately halve it, and
felt quite pleased with myself.  I looked for a simple test to
demonstrate the effect.  Something with plenty of output.  Hmm, why not
query-cpu-definitions, it produces about 32KiB.  Instrument, instrument,
run, ... whaaaat?!?

My patches save some 7000 allocations (670 KiB total), roughly matching
my expectations.

Turns out this is a drop in the bucket: query-cpu-definitions still
takes some 180,000 allocations (almost 12 MiB total).  They're hiding
behind this line in qmp_query_cpu_definitions():

    g_slist_foreach(list, x86_cpu_definition_entry, &cpu_list);

The line takes more than a quarter second for me.

Hogging the main loop for a quarter second is not good.

Eduardo's patch reduces run time to 0.02 seconds (40,000 allocations, 9
MiB total).  It's a smaller pig now.
Eduardo Habkost Dec. 15, 2020, 3:01 p.m. UTC | #2
On Tue, Dec 15, 2020 at 03:11:06PM +0100, Markus Armbruster wrote:
> Eduardo Habkost <ehabkost@redhat.com> writes:
> 
> > Class properties make QOM introspection simpler and easier, as
> > they don't require an object to be instantiated.
> >
> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
> 
> This is significantly more than just "simpler and easier".
> 
> The other day, I played with the QMP core to reduce its appetite for
> malloc.  I came up with patches that should approximately halve it, and
> felt quite pleased with myself.  I looked for a simple test to
> demonstrate the effect.  Something with plenty of output.  Hmm, why not
> query-cpu-definitions, it produces about 32KiB.  Instrument, instrument,
> run, ... whaaaat?!?
> 
> My patches save some 7000 allocations (670 KiB total), roughly matching
> my expectations.
> 
> Turns out this is a drop in the bucket: query-cpu-definitions still
> takes some 180,000 allocations (almost 12 MiB total).  They're hiding
> behind this line in qmp_query_cpu_definitions():
> 
>     g_slist_foreach(list, x86_cpu_definition_entry, &cpu_list);
> 
> The line takes more than a quarter second for me.
> 
> Hogging the main loop for a quarter second is not good.

Wow!

> 
> Eduardo's patch reduces run time to 0.02 seconds (40,000 allocations, 9
> MiB total).  It's a smaller pig now.

Thanks for investigating this!  I'll amend the commit message with:

  Also, the hundreds of instance properties were having an impact
  on QMP commands that create temporary CPU objects.  On my
  machine, run time of qmp_query_cpu_definitions() patch changed
  from ~200ms to ~16ms after applying this patch.

Numbers were obtained with:

 $ sudo perf probe -v -x ./qemu-system-x86_64 -a 'qmp_query_cpu_definitions%return' -a 'qmp_query_cpu_definitions'
 $ echo -e '{"execute": "qmp_capabilities"}\n{"execute": "query-cpu-definitions"}\n{"execute": "quit"}' | sudo perf trace -e 'probe_qemu:*' ./qemu-system-x86_64 -S -display none -qmp stdio > /dev/null

Before:

     0.000 qemu-system-x8/3103211 probe_qemu:qmp_query_cpu_definitions(__probe_ip: 94851767056275)
   204.072 qemu-system-x8/3103211 probe_qemu:qmp_query_cpu_definitions__return(__probe_func: 94851767056275, __probe_ret_ip: 94851768499362)

After:

     0.000 qemu-system-x8/3105969 probe_qemu:qmp_query_cpu_definitions(__probe_ip: 94554723186579)
    16.445 qemu-system-x8/3105969 probe_qemu:qmp_query_cpu_definitions__return(__probe_func: 94554723186579, __probe_ret_ip: 94554724629631)
Markus Armbruster Dec. 16, 2020, 5:35 a.m. UTC | #3
Eduardo Habkost <ehabkost@redhat.com> writes:

> On Tue, Dec 15, 2020 at 03:11:06PM +0100, Markus Armbruster wrote:
>> Eduardo Habkost <ehabkost@redhat.com> writes:
>> 
>> > Class properties make QOM introspection simpler and easier, as
>> > they don't require an object to be instantiated.
>> >
>> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
>> 
>> This is significantly more than just "simpler and easier".
>> 
>> The other day, I played with the QMP core to reduce its appetite for
>> malloc.  I came up with patches that should approximately halve it, and
>> felt quite pleased with myself.  I looked for a simple test to
>> demonstrate the effect.  Something with plenty of output.  Hmm, why not
>> query-cpu-definitions, it produces about 32KiB.  Instrument, instrument,
>> run, ... whaaaat?!?
>> 
>> My patches save some 7000 allocations (670 KiB total), roughly matching
>> my expectations.
>> 
>> Turns out this is a drop in the bucket: query-cpu-definitions still
>> takes some 180,000 allocations (almost 12 MiB total).  They're hiding
>> behind this line in qmp_query_cpu_definitions():
>> 
>>     g_slist_foreach(list, x86_cpu_definition_entry, &cpu_list);
>> 
>> The line takes more than a quarter second for me.
>> 
>> Hogging the main loop for a quarter second is not good.
>
> Wow!
>
>> 
>> Eduardo's patch reduces run time to 0.02 seconds (40,000 allocations, 9
>> MiB total).  It's a smaller pig now.
>
> Thanks for investigating this!  I'll amend the commit message with:
>
>   Also, the hundreds of instance properties were having an impact
>   on QMP commands that create temporary CPU objects.  On my
>   machine, run time of qmp_query_cpu_definitions() patch changed
>   from ~200ms to ~16ms after applying this patch.
>
> Numbers were obtained with:
>
>  $ sudo perf probe -v -x ./qemu-system-x86_64 -a 'qmp_query_cpu_definitions%return' -a 'qmp_query_cpu_definitions'
>  $ echo -e '{"execute": "qmp_capabilities"}\n{"execute": "query-cpu-definitions"}\n{"execute": "quit"}' | sudo perf trace -e 'probe_qemu:*' ./qemu-system-x86_64 -S -display none -qmp stdio > /dev/null
>
> Before:
>
>      0.000 qemu-system-x8/3103211 probe_qemu:qmp_query_cpu_definitions(__probe_ip: 94851767056275)
>    204.072 qemu-system-x8/3103211 probe_qemu:qmp_query_cpu_definitions__return(__probe_func: 94851767056275, __probe_ret_ip: 94851768499362)
>
> After:
>
>      0.000 qemu-system-x8/3105969 probe_qemu:qmp_query_cpu_definitions(__probe_ip: 94554723186579)
>     16.445 qemu-system-x8/3105969 probe_qemu:qmp_query_cpu_definitions__return(__probe_func: 94554723186579, __probe_ret_ip: 94554724629631)

Looks good, thanks!
diff mbox series

Patch

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 0d8606958e..dbba1151f2 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -6823,29 +6823,23 @@  static void x86_cpu_set_bit_prop(Object *obj, Visitor *v, const char *name,
     cpu->env.user_features[fp->w] |= fp->mask;
 }
 
-static void x86_cpu_release_bit_prop(Object *obj, const char *name,
-                                     void *opaque)
-{
-    BitProperty *prop = opaque;
-    g_free(prop);
-}
-
 /* Register a boolean property to get/set a single bit in a uint32_t field.
  *
  * The same property name can be registered multiple times to make it affect
  * multiple bits in the same FeatureWord. In that case, the getter will return
  * true only if all bits are set.
  */
-static void x86_cpu_register_bit_prop(X86CPU *cpu,
+static void x86_cpu_register_bit_prop(X86CPUClass *xcc,
                                       const char *prop_name,
                                       FeatureWord w,
                                       int bitnr)
 {
+    ObjectClass *oc = OBJECT_CLASS(xcc);
     BitProperty *fp;
     ObjectProperty *op;
     uint64_t mask = (1ULL << bitnr);
 
-    op = object_property_find(OBJECT(cpu), prop_name);
+    op = object_class_property_find(oc, prop_name);
     if (op) {
         fp = op->opaque;
         assert(fp->w == w);
@@ -6854,14 +6848,14 @@  static void x86_cpu_register_bit_prop(X86CPU *cpu,
         fp = g_new0(BitProperty, 1);
         fp->w = w;
         fp->mask = mask;
-        object_property_add(OBJECT(cpu), prop_name, "bool",
-                            x86_cpu_get_bit_prop,
-                            x86_cpu_set_bit_prop,
-                            x86_cpu_release_bit_prop, fp);
+        object_class_property_add(oc, prop_name, "bool",
+                                  x86_cpu_get_bit_prop,
+                                  x86_cpu_set_bit_prop,
+                                  NULL, fp);
     }
 }
 
-static void x86_cpu_register_feature_bit_props(X86CPU *cpu,
+static void x86_cpu_register_feature_bit_props(X86CPUClass *xcc,
                                                FeatureWord w,
                                                int bitnr)
 {
@@ -6880,7 +6874,7 @@  static void x86_cpu_register_feature_bit_props(X86CPU *cpu,
     /* aliases don't use "|" delimiters anymore, they are registered
      * manually using object_property_add_alias() */
     assert(!strchr(name, '|'));
-    x86_cpu_register_bit_prop(cpu, name, w, bitnr);
+    x86_cpu_register_bit_prop(xcc, name, w, bitnr);
 }
 
 #if !defined(CONFIG_USER_ONLY)
@@ -6934,7 +6928,6 @@  static void x86_cpu_initfn(Object *obj)
     X86CPU *cpu = X86_CPU(obj);
     X86CPUClass *xcc = X86_CPU_GET_CLASS(obj);
     CPUX86State *env = &cpu->env;
-    FeatureWord w;
 
     env->nr_dies = 1;
     cpu_set_cpustate_pointers(cpu);
@@ -6946,14 +6939,6 @@  static void x86_cpu_initfn(Object *obj)
                         x86_cpu_get_feature_words,
                         NULL, NULL, (void *)cpu->filtered_features);
 
-    for (w = 0; w < FEATURE_WORDS; w++) {
-        int bitnr;
-
-        for (bitnr = 0; bitnr < 64; bitnr++) {
-            x86_cpu_register_feature_bit_props(cpu, w, bitnr);
-        }
-    }
-
     object_property_add_alias(obj, "sse3", obj, "pni");
     object_property_add_alias(obj, "pclmuldq", obj, "pclmulqdq");
     object_property_add_alias(obj, "sse4-1", obj, "sse4.1");
@@ -7239,6 +7224,7 @@  static void x86_cpu_common_class_init(ObjectClass *oc, void *data)
     X86CPUClass *xcc = X86_CPU_CLASS(oc);
     CPUClass *cc = CPU_CLASS(oc);
     DeviceClass *dc = DEVICE_CLASS(oc);
+    FeatureWord w;
 
     device_class_set_parent_realize(dc, x86_cpu_realizefn,
                                     &xcc->parent_realize);
@@ -7328,6 +7314,12 @@  static void x86_cpu_common_class_init(ObjectClass *oc, void *data)
                               x86_cpu_get_crash_info_qom, NULL, NULL, NULL);
 #endif
 
+    for (w = 0; w < FEATURE_WORDS; w++) {
+        int bitnr;
+        for (bitnr = 0; bitnr < 64; bitnr++) {
+            x86_cpu_register_feature_bit_props(xcc, w, bitnr);
+        }
+    }
 }
 
 static const TypeInfo x86_cpu_type_info = {