Message ID | 20201111183823.283752-5-ehabkost@redhat.com |
---|---|
State | New |
Headers | show |
Series | qom: Convert some properties to class properties | expand |
Eduardo Habkost <ehabkost@redhat.com> writes: > Class properties make QOM introspection simpler and easier, as > they don't require an object to be instantiated. > > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> This is significantly more than just "simpler and easier". The other day, I played with the QMP core to reduce its appetite for malloc. I came up with patches that should approximately halve it, and felt quite pleased with myself. I looked for a simple test to demonstrate the effect. Something with plenty of output. Hmm, why not query-cpu-definitions, it produces about 32KiB. Instrument, instrument, run, ... whaaaat?!? My patches save some 7000 allocations (670 KiB total), roughly matching my expectations. Turns out this is a drop in the bucket: query-cpu-definitions still takes some 180,000 allocations (almost 12 MiB total). They're hiding behind this line in qmp_query_cpu_definitions(): g_slist_foreach(list, x86_cpu_definition_entry, &cpu_list); The line takes more than a quarter second for me. Hogging the main loop for a quarter second is not good. Eduardo's patch reduces run time to 0.02 seconds (40,000 allocations, 9 MiB total). It's a smaller pig now.
On Tue, Dec 15, 2020 at 03:11:06PM +0100, Markus Armbruster wrote: > Eduardo Habkost <ehabkost@redhat.com> writes: > > > Class properties make QOM introspection simpler and easier, as > > they don't require an object to be instantiated. > > > > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> > > This is significantly more than just "simpler and easier". > > The other day, I played with the QMP core to reduce its appetite for > malloc. I came up with patches that should approximately halve it, and > felt quite pleased with myself. I looked for a simple test to > demonstrate the effect. Something with plenty of output. Hmm, why not > query-cpu-definitions, it produces about 32KiB. Instrument, instrument, > run, ... whaaaat?!? > > My patches save some 7000 allocations (670 KiB total), roughly matching > my expectations. > > Turns out this is a drop in the bucket: query-cpu-definitions still > takes some 180,000 allocations (almost 12 MiB total). They're hiding > behind this line in qmp_query_cpu_definitions(): > > g_slist_foreach(list, x86_cpu_definition_entry, &cpu_list); > > The line takes more than a quarter second for me. > > Hogging the main loop for a quarter second is not good. Wow! > > Eduardo's patch reduces run time to 0.02 seconds (40,000 allocations, 9 > MiB total). It's a smaller pig now. Thanks for investigating this! I'll amend the commit message with: Also, the hundreds of instance properties were having an impact on QMP commands that create temporary CPU objects. On my machine, run time of qmp_query_cpu_definitions() patch changed from ~200ms to ~16ms after applying this patch. Numbers were obtained with: $ sudo perf probe -v -x ./qemu-system-x86_64 -a 'qmp_query_cpu_definitions%return' -a 'qmp_query_cpu_definitions' $ echo -e '{"execute": "qmp_capabilities"}\n{"execute": "query-cpu-definitions"}\n{"execute": "quit"}' | sudo perf trace -e 'probe_qemu:*' ./qemu-system-x86_64 -S -display none -qmp stdio > /dev/null Before: 0.000 qemu-system-x8/3103211 probe_qemu:qmp_query_cpu_definitions(__probe_ip: 94851767056275) 204.072 qemu-system-x8/3103211 probe_qemu:qmp_query_cpu_definitions__return(__probe_func: 94851767056275, __probe_ret_ip: 94851768499362) After: 0.000 qemu-system-x8/3105969 probe_qemu:qmp_query_cpu_definitions(__probe_ip: 94554723186579) 16.445 qemu-system-x8/3105969 probe_qemu:qmp_query_cpu_definitions__return(__probe_func: 94554723186579, __probe_ret_ip: 94554724629631)
Eduardo Habkost <ehabkost@redhat.com> writes: > On Tue, Dec 15, 2020 at 03:11:06PM +0100, Markus Armbruster wrote: >> Eduardo Habkost <ehabkost@redhat.com> writes: >> >> > Class properties make QOM introspection simpler and easier, as >> > they don't require an object to be instantiated. >> > >> > Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> >> >> This is significantly more than just "simpler and easier". >> >> The other day, I played with the QMP core to reduce its appetite for >> malloc. I came up with patches that should approximately halve it, and >> felt quite pleased with myself. I looked for a simple test to >> demonstrate the effect. Something with plenty of output. Hmm, why not >> query-cpu-definitions, it produces about 32KiB. Instrument, instrument, >> run, ... whaaaat?!? >> >> My patches save some 7000 allocations (670 KiB total), roughly matching >> my expectations. >> >> Turns out this is a drop in the bucket: query-cpu-definitions still >> takes some 180,000 allocations (almost 12 MiB total). They're hiding >> behind this line in qmp_query_cpu_definitions(): >> >> g_slist_foreach(list, x86_cpu_definition_entry, &cpu_list); >> >> The line takes more than a quarter second for me. >> >> Hogging the main loop for a quarter second is not good. > > Wow! > >> >> Eduardo's patch reduces run time to 0.02 seconds (40,000 allocations, 9 >> MiB total). It's a smaller pig now. > > Thanks for investigating this! I'll amend the commit message with: > > Also, the hundreds of instance properties were having an impact > on QMP commands that create temporary CPU objects. On my > machine, run time of qmp_query_cpu_definitions() patch changed > from ~200ms to ~16ms after applying this patch. > > Numbers were obtained with: > > $ sudo perf probe -v -x ./qemu-system-x86_64 -a 'qmp_query_cpu_definitions%return' -a 'qmp_query_cpu_definitions' > $ echo -e '{"execute": "qmp_capabilities"}\n{"execute": "query-cpu-definitions"}\n{"execute": "quit"}' | sudo perf trace -e 'probe_qemu:*' ./qemu-system-x86_64 -S -display none -qmp stdio > /dev/null > > Before: > > 0.000 qemu-system-x8/3103211 probe_qemu:qmp_query_cpu_definitions(__probe_ip: 94851767056275) > 204.072 qemu-system-x8/3103211 probe_qemu:qmp_query_cpu_definitions__return(__probe_func: 94851767056275, __probe_ret_ip: 94851768499362) > > After: > > 0.000 qemu-system-x8/3105969 probe_qemu:qmp_query_cpu_definitions(__probe_ip: 94554723186579) > 16.445 qemu-system-x8/3105969 probe_qemu:qmp_query_cpu_definitions__return(__probe_func: 94554723186579, __probe_ret_ip: 94554724629631) Looks good, thanks!
diff --git a/target/i386/cpu.c b/target/i386/cpu.c index 0d8606958e..dbba1151f2 100644 --- a/target/i386/cpu.c +++ b/target/i386/cpu.c @@ -6823,29 +6823,23 @@ static void x86_cpu_set_bit_prop(Object *obj, Visitor *v, const char *name, cpu->env.user_features[fp->w] |= fp->mask; } -static void x86_cpu_release_bit_prop(Object *obj, const char *name, - void *opaque) -{ - BitProperty *prop = opaque; - g_free(prop); -} - /* Register a boolean property to get/set a single bit in a uint32_t field. * * The same property name can be registered multiple times to make it affect * multiple bits in the same FeatureWord. In that case, the getter will return * true only if all bits are set. */ -static void x86_cpu_register_bit_prop(X86CPU *cpu, +static void x86_cpu_register_bit_prop(X86CPUClass *xcc, const char *prop_name, FeatureWord w, int bitnr) { + ObjectClass *oc = OBJECT_CLASS(xcc); BitProperty *fp; ObjectProperty *op; uint64_t mask = (1ULL << bitnr); - op = object_property_find(OBJECT(cpu), prop_name); + op = object_class_property_find(oc, prop_name); if (op) { fp = op->opaque; assert(fp->w == w); @@ -6854,14 +6848,14 @@ static void x86_cpu_register_bit_prop(X86CPU *cpu, fp = g_new0(BitProperty, 1); fp->w = w; fp->mask = mask; - object_property_add(OBJECT(cpu), prop_name, "bool", - x86_cpu_get_bit_prop, - x86_cpu_set_bit_prop, - x86_cpu_release_bit_prop, fp); + object_class_property_add(oc, prop_name, "bool", + x86_cpu_get_bit_prop, + x86_cpu_set_bit_prop, + NULL, fp); } } -static void x86_cpu_register_feature_bit_props(X86CPU *cpu, +static void x86_cpu_register_feature_bit_props(X86CPUClass *xcc, FeatureWord w, int bitnr) { @@ -6880,7 +6874,7 @@ static void x86_cpu_register_feature_bit_props(X86CPU *cpu, /* aliases don't use "|" delimiters anymore, they are registered * manually using object_property_add_alias() */ assert(!strchr(name, '|')); - x86_cpu_register_bit_prop(cpu, name, w, bitnr); + x86_cpu_register_bit_prop(xcc, name, w, bitnr); } #if !defined(CONFIG_USER_ONLY) @@ -6934,7 +6928,6 @@ static void x86_cpu_initfn(Object *obj) X86CPU *cpu = X86_CPU(obj); X86CPUClass *xcc = X86_CPU_GET_CLASS(obj); CPUX86State *env = &cpu->env; - FeatureWord w; env->nr_dies = 1; cpu_set_cpustate_pointers(cpu); @@ -6946,14 +6939,6 @@ static void x86_cpu_initfn(Object *obj) x86_cpu_get_feature_words, NULL, NULL, (void *)cpu->filtered_features); - for (w = 0; w < FEATURE_WORDS; w++) { - int bitnr; - - for (bitnr = 0; bitnr < 64; bitnr++) { - x86_cpu_register_feature_bit_props(cpu, w, bitnr); - } - } - object_property_add_alias(obj, "sse3", obj, "pni"); object_property_add_alias(obj, "pclmuldq", obj, "pclmulqdq"); object_property_add_alias(obj, "sse4-1", obj, "sse4.1"); @@ -7239,6 +7224,7 @@ static void x86_cpu_common_class_init(ObjectClass *oc, void *data) X86CPUClass *xcc = X86_CPU_CLASS(oc); CPUClass *cc = CPU_CLASS(oc); DeviceClass *dc = DEVICE_CLASS(oc); + FeatureWord w; device_class_set_parent_realize(dc, x86_cpu_realizefn, &xcc->parent_realize); @@ -7328,6 +7314,12 @@ static void x86_cpu_common_class_init(ObjectClass *oc, void *data) x86_cpu_get_crash_info_qom, NULL, NULL, NULL); #endif + for (w = 0; w < FEATURE_WORDS; w++) { + int bitnr; + for (bitnr = 0; bitnr < 64; bitnr++) { + x86_cpu_register_feature_bit_props(xcc, w, bitnr); + } + } } static const TypeInfo x86_cpu_type_info = {
Class properties make QOM introspection simpler and easier, as they don't require an object to be instantiated. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> --- Changes v1 -> v2: * Remove x86_cpu_release_bit_prop() function, as we will allocate BitProperty only once and never free the class property list --- Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: qemu-devel@nongnu.org --- target/i386/cpu.c | 40 ++++++++++++++++------------------------ 1 file changed, 16 insertions(+), 24 deletions(-)