
[v6,5/5] accel: abort if we fail to load the accelerator plugin

Message ID 20220923232104.28420-6-cfontana@suse.de
State New
Series improve error handling for module load

Commit Message

Claudio Fontana Sept. 23, 2022, 11:21 p.m. UTC
if QEMU is configured with modules enabled, it is possible that the
load of an accelerator module will fail.
Abort in this case, relying on module_object_class_by_name to report
the specific load error if any.

Signed-off-by: Claudio Fontana <cfontana@suse.de>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/accel-softmmu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Claudio Fontana Sept. 26, 2022, 7:58 a.m. UTC | #1
On 9/24/22 14:35, Philippe Mathieu-Daudé wrote:
> On 24/9/22 01:21, Claudio Fontana wrote:
>> if QEMU is configured with modules enabled, it is possible that the
>> load of an accelerator module will fail.
>> Abort in this case, relying on module_object_class_by_name to report
>> the specific load error if any.
>>
>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>>   accel/accel-softmmu.c | 8 +++++++-
>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/accel/accel-softmmu.c b/accel/accel-softmmu.c
>> index 67276e4f52..9fa4849f2c 100644
>> --- a/accel/accel-softmmu.c
>> +++ b/accel/accel-softmmu.c
>> @@ -66,6 +66,7 @@ void accel_init_ops_interfaces(AccelClass *ac)
>>   {
>>       const char *ac_name;
>>       char *ops_name;
>> +    ObjectClass *oc;
>>       AccelOpsClass *ops;
>>   
>>       ac_name = object_class_get_name(OBJECT_CLASS(ac));
>> @@ -73,8 +74,13 @@ void accel_init_ops_interfaces(AccelClass *ac)
>>   
>>       ops_name = g_strdup_printf("%s" ACCEL_OPS_SUFFIX, ac_name);
>>       ops = ACCEL_OPS_CLASS(module_object_class_by_name(ops_name));
>> +    oc = module_object_class_by_name(ops_name);
>> +    if (!oc) {
>> +        error_report("fatal: could not load module for type '%s'", ops_name);
>> +        abort();
> 
> I still think a coredump won't help at all to figure the problem here: a 

I can change this from abort() to exit(1). The issue I am seeing is that when we fail to create or initialize objects,
we usually seem to use abort(); the most prominent examples are in qom/object.c:

static TypeImpl *type_new(const TypeInfo *info)
{
    TypeImpl *ti = g_malloc0(sizeof(*ti));
    int i;

    g_assert(info->name != NULL);

    if (type_table_lookup(info->name) != NULL) {
        fprintf(stderr, "Registering `%s' which already exists\n", info->name);
        abort();
    }

...

void object_initialize(void *data, size_t size, const char *typename)
{
    TypeImpl *type = type_get_by_name(typename);

#ifdef CONFIG_MODULES
    if (!type) {
        Error *local_err = NULL;
        int rv = module_load_qom(typename, &local_err);
        if (rv > 0) {
            type = type_get_by_name(typename);
        } else if (rv < 0) {
            error_report_err(local_err);
        }
    }
#endif
    if (!type) {
        error_report("missing object type '%s'", typename);
        abort();
    }

    object_initialize_with_type(data, size, type);
}


Do you propose to change only the abort() in accel_init_ops_interfaces to exit(1)?

Or the other case in the series as well (i.e. qdev_new() in hw/core/qdev.c)?

Do you propose to change this consistently throughout the codebase, including the object.c snippets above?


> module is missing, we know its name. Anyhow I don't mind much, and this
> can be cleaned later, so:

Sure, this could be fixed later with a series that tries to use exit() vs abort() consistently throughout the codebase when initializing and creating objects.

> 
> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> 

Thanks!

Claudio

>> +    }
>>       g_free(ops_name);
>> -
>> +    ops = ACCEL_OPS_CLASS(oc);
>>       /*
>>        * all accelerators need to define ops, providing at least a mandatory
>>        * non-NULL create_vcpu_thread operation.
> 
>
Kevin Wolf Sept. 26, 2022, 10:56 a.m. UTC | #2
On 26.09.2022 at 09:58, Claudio Fontana wrote:
> On 9/24/22 14:35, Philippe Mathieu-Daudé wrote:
> > On 24/9/22 01:21, Claudio Fontana wrote:
> >> if QEMU is configured with modules enabled, it is possible that the
> >> load of an accelerator module will fail.
> >> Abort in this case, relying on module_object_class_by_name to report
> >> the specific load error if any.
> >>
> >> Signed-off-by: Claudio Fontana <cfontana@suse.de>
> >> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> >> ---
> >>   accel/accel-softmmu.c | 8 +++++++-
> >>   1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/accel/accel-softmmu.c b/accel/accel-softmmu.c
> >> index 67276e4f52..9fa4849f2c 100644
> >> --- a/accel/accel-softmmu.c
> >> +++ b/accel/accel-softmmu.c
> >> @@ -66,6 +66,7 @@ void accel_init_ops_interfaces(AccelClass *ac)
> >>   {
> >>       const char *ac_name;
> >>       char *ops_name;
> >> +    ObjectClass *oc;
> >>       AccelOpsClass *ops;
> >>   
> >>       ac_name = object_class_get_name(OBJECT_CLASS(ac));
> >> @@ -73,8 +74,13 @@ void accel_init_ops_interfaces(AccelClass *ac)
> >>   
> >>       ops_name = g_strdup_printf("%s" ACCEL_OPS_SUFFIX, ac_name);
> >>       ops = ACCEL_OPS_CLASS(module_object_class_by_name(ops_name));
> >> +    oc = module_object_class_by_name(ops_name);
> >> +    if (!oc) {
> >> +        error_report("fatal: could not load module for type '%s'", ops_name);
> >> +        abort();
> > 
> > I still think a coredump won't help at all to figure the problem here: a 
> 
> I can change this from abort to exit(1), the issue I am seeing is, usually when we fail to create or initialize objects
> we seem to be using abort(), the most prominent examples are in qom/object.c:
> 
> static TypeImpl *type_new(const TypeInfo *info)
> {
>     TypeImpl *ti = g_malloc0(sizeof(*ti));
>     int i;
> 
>     g_assert(info->name != NULL);
> 
>     if (type_table_lookup(info->name) != NULL) {
>         fprintf(stderr, "Registering `%s' which already exists\n", info->name);
>         abort();
>     }
> 
> ...
> 
> void object_initialize(void *data, size_t size, const char *typename)
> {
>     TypeImpl *type = type_get_by_name(typename);
> 
> #ifdef CONFIG_MODULES
>     if (!type) {
>         Error *local_err = NULL;
>         int rv = module_load_qom(typename, &local_err);
>         if (rv > 0) {
>             type = type_get_by_name(typename);
>         } else if (rv < 0) {
>             error_report_err(local_err);
>         }
>     }
> #endif
>     if (!type) {
>         error_report("missing object type '%s'", typename);
>         abort();
>     }
> 
>     object_initialize_with_type(data, size, type);
> }
> 
> 
> Do you propose to change only the assert in accel_init_ops_interfaces
> to exit(1)?
> 
> Or the other case as well in the series? (ie hw/core/qdev.c qdev_new()
> ?)
> 
> Do you propose to change this consistently through the codebase
> including the object.c snippets above?

The difference with the snippets above (in the non-module case) is that
calling object_new() with a type that doesn't exist is a bug, it's a
programming error. Calling type_new() twice for the same TypeInfo or for
two TypeInfos with the same name is a programming error, too. abort() is
correct for situations that should never happen in a bug-free QEMU.

Not being able to load a module is generally not a bug in QEMU, it's an
error of external origin. So here abort() is not appropriate.
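
To make the distinction concrete, here is a minimal stand-alone sketch in
plain C. It is not QEMU code: FailureKind, FailureAction, action_for() and
report_and_terminate() are hypothetical names used only for illustration.
The point is that the class of the failure, not the place where it is
detected, selects between abort() and exit().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical classification for this sketch; not a QEMU type. */
typedef enum {
    FAIL_PROGRAMMING_ERROR, /* cannot happen in a bug-free program */
    FAIL_EXTERNAL,          /* e.g. a module file is missing or unloadable */
} FailureKind;

typedef enum {
    ACTION_ABORT, /* dump core: a developer needs to look at this */
    ACTION_EXIT,  /* report and exit: the user can fix the environment */
} FailureAction;

/* Map the class of failure to the way the process should terminate. */
FailureAction action_for(FailureKind kind)
{
    return kind == FAIL_PROGRAMMING_ERROR ? ACTION_ABORT : ACTION_EXIT;
}

/* Print the message, then terminate according to the failure class. */
void report_and_terminate(FailureKind kind, const char *msg)
{
    assert(msg != NULL);
    fprintf(stderr, "%s\n", msg);
    if (action_for(kind) == ACTION_ABORT) {
        abort();
    }
    exit(EXIT_FAILURE);
}
```

Under this reading, a duplicate type registration would be reported with
FAIL_PROGRAMMING_ERROR, while a module that fails to load would be reported
with FAIL_EXTERNAL.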

The CONFIG_MODULES code in object_initialize() is problematic because it
doesn't have a way to deal with an error case that can happen without a
bug in QEMU. Without changing the prototype of the function to actually
allow error returns (which I suspect might be a very invasive change),
maybe the best approach is to just make it a fatal error and leave the
code mostly as it is in current master:

#ifdef CONFIG_MODULES
    if (!type) {
        /* Assuming that module_load_qom_one() returns an error if the
         * module doesn't exist */
        module_load_qom_one(typename, &error_fatal);
        type = type_get_by_name(typename);
    }
#endif
    if (!type) {
        error_report("missing object type '%s'", typename);
        abort();
    }

    object_initialize_with_type(data, size, type);

This makes it print an error message and exit(), which is honestly not
great during runtime because it doesn't properly shut down QEMU, let
alone just fail the operation and keep running, but it is at least
slightly better than abort().
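
The control flow of the snippet above can be modeled as a small decision
function. This is only a sketch under assumptions: Outcome and
lookup_outcome() are hypothetical names, modules_built stands in for
CONFIG_MODULES, and load_ok models module_load_qom_one() returning without
tripping error_fatal (which, as the comment in the snippet says, assumes
that a missing module is reported as an error).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for this sketch; not QEMU identifiers. */
typedef enum {
    OUTCOME_INITIALIZE, /* type found: go on to initialize the object */
    OUTCOME_EXIT,       /* external error: report it and exit() */
    OUTCOME_ABORT,      /* programming error: abort() */
} Outcome;

/*
 * type_known:    the type was registered before any module load
 * modules_built: stands in for CONFIG_MODULES
 * load_ok:       the module load attempt reported no error
 * type_after:    the type is registered once the load attempt is done
 */
Outcome lookup_outcome(bool type_known, bool modules_built,
                       bool load_ok, bool type_after)
{
    /* A type registered before the load attempt stays registered. */
    assert(!type_known || type_after);

    if (type_known) {
        return OUTCOME_INITIALIZE;
    }
    if (modules_built) {
        if (!load_ok) {
            return OUTCOME_EXIT; /* the error_fatal path */
        }
        if (type_after) {
            return OUTCOME_INITIALIZE;
        }
    }
    /* No load error, yet the type does not exist: a bug in the caller. */
    return OUTCOME_ABORT;
}
```

Only the load failure takes the exit() path. A type that is still missing
after an error-free load attempt, or in a build without modules, remains a
programming error and still reaches abort().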

> > module is missing, we know its name. Anyhow I don't mind much, and this
> > can be cleaned later, so:
> 
> Sure this could be fixed later with a series that tries to use exit()
> vs abort() consistently throughout the codebase when initializing and
> creating objects.

This should mean consistently distinguishing programming errors (i.e.
QEMU bugs) from errors of external origin.

Kevin
Claudio Fontana Sept. 26, 2022, 11:21 a.m. UTC | #3
On 9/26/22 12:56, Kevin Wolf wrote:
> On 26.09.2022 at 09:58, Claudio Fontana wrote:
>> On 9/24/22 14:35, Philippe Mathieu-Daudé wrote:
>>> On 24/9/22 01:21, Claudio Fontana wrote:
>>>> if QEMU is configured with modules enabled, it is possible that the
>>>> load of an accelerator module will fail.
>>>> Abort in this case, relying on module_object_class_by_name to report
>>>> the specific load error if any.
>>>>
>>>> Signed-off-by: Claudio Fontana <cfontana@suse.de>
>>>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>>> ---
>>>>   accel/accel-softmmu.c | 8 +++++++-
>>>>   1 file changed, 7 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/accel/accel-softmmu.c b/accel/accel-softmmu.c
>>>> index 67276e4f52..9fa4849f2c 100644
>>>> --- a/accel/accel-softmmu.c
>>>> +++ b/accel/accel-softmmu.c
>>>> @@ -66,6 +66,7 @@ void accel_init_ops_interfaces(AccelClass *ac)
>>>>   {
>>>>       const char *ac_name;
>>>>       char *ops_name;
>>>> +    ObjectClass *oc;
>>>>       AccelOpsClass *ops;
>>>>   
>>>>       ac_name = object_class_get_name(OBJECT_CLASS(ac));
>>>> @@ -73,8 +74,13 @@ void accel_init_ops_interfaces(AccelClass *ac)
>>>>   
>>>>       ops_name = g_strdup_printf("%s" ACCEL_OPS_SUFFIX, ac_name);
>>>>       ops = ACCEL_OPS_CLASS(module_object_class_by_name(ops_name));
>>>> +    oc = module_object_class_by_name(ops_name);
>>>> +    if (!oc) {
>>>> +        error_report("fatal: could not load module for type '%s'", ops_name);
>>>> +        abort();
>>>
>>> I still think a coredump won't help at all to figure the problem here: a 
>>
>> I can change this from abort to exit(1), the issue I am seeing is, usually when we fail to create or initialize objects
>> we seem to be using abort(), the most prominent examples are in qom/object.c:
>>
>> static TypeImpl *type_new(const TypeInfo *info)
>> {
>>     TypeImpl *ti = g_malloc0(sizeof(*ti));
>>     int i;
>>
>>     g_assert(info->name != NULL);
>>
>>     if (type_table_lookup(info->name) != NULL) {
>>         fprintf(stderr, "Registering `%s' which already exists\n", info->name);
>>         abort();
>>     }
>>
>> ...
>>
>> void object_initialize(void *data, size_t size, const char *typename)
>> {
>>     TypeImpl *type = type_get_by_name(typename);
>>
>> #ifdef CONFIG_MODULES
>>     if (!type) {
>>         Error *local_err = NULL;
>>         int rv = module_load_qom(typename, &local_err);
>>         if (rv > 0) {
>>             type = type_get_by_name(typename);
>>         } else if (rv < 0) {
>>             error_report_err(local_err);
>>         }
>>     }
>> #endif
>>     if (!type) {
>>         error_report("missing object type '%s'", typename);
>>         abort();
>>     }
>>
>>     object_initialize_with_type(data, size, type);
>> }
>>
>>
>> Do you propose to change only the assert in accel_init_ops_interfaces
>> to exit(1)?
>>
>> Or the other case as well in the series? (ie hw/core/qdev.c qdev_new()
>> ?)
>>
>> Do you propose to change this consistently through the codebase
>> including the object.c snippets above?
> 
> The difference with the snippets above (in the non-module case) is that
> calling object_new() with a type that doesn't exist is a bug, it's an

I see where you are going. However, in the object initialization case
we still have the possibility that the type is available only after a
module load.

So according to your distinction, we would have to exit() in the module case, and abort() if modules are disabled.


> programming error. Calling type_new() twice for the same TypeInfo or for
> two TypeInfos with the same name is a programming error, too. abort() is
> correct for situations that should never happen in a bug free QEMU.
> 
> Not being able to load a module is generally not a bug in QEMU, it's an
> error of external origin. So here abort() is not appropriate.
> 
> The CONFIG_MODULES code in object_initialize() is problematic because it
> doesn't have a way to deal with an error case that can happen without a
> bug in QEMU. Without changing the prototype of the function to actually
> allow error returns (which I suspect might be a very invasive change),
> maybe the best approach is to just make it a fatal error and leave the
> code mostly as it is in current master:
> 
> #ifdef CONFIG_MODULES
>     if (!type) {
>         /* Assuming that module_load_qom_one() returns an error if the
>          * module doesn't exist */
>         module_load_qom_one(typename, &error_fatal);
>         type = type_get_by_name(typename);
>     }
> #endif
>     if (!type) {
>         error_report("missing object type '%s'", typename);
>         abort();
>     }
> 
>     object_initialize_with_type(data, size, type);
> 
> This makes it print an error message and exit(). Which is honestly not
> great during runtime because it doesn't properly shut down QEMU, let
> alone just fail the operation and keep running, but at least slightly
> better than abort().

OK, so the core of the issue is: with CONFIG_MODULES, we want to exit() rather than abort() if the type is not available.

> 
>>> module is missing, we know its name. Anyhow I don't mind much, and this
>>> can be cleaned later, so:
>>
>> Sure this could be fixed later with a series that tries to use exit()
>> vs abort() consistently throughout the codebase when initializing and
>> creating objects.
> 
> This should mean consistently distinguishing programming errors (i.e.
> QEMU bugs) from errors of external origin.
> 
> Kevin
> 

Thanks,

I'll do another pass with this in mind.

Claudio

Patch

diff --git a/accel/accel-softmmu.c b/accel/accel-softmmu.c
index 67276e4f52..9fa4849f2c 100644
--- a/accel/accel-softmmu.c
+++ b/accel/accel-softmmu.c
@@ -66,6 +66,7 @@  void accel_init_ops_interfaces(AccelClass *ac)
 {
     const char *ac_name;
     char *ops_name;
+    ObjectClass *oc;
     AccelOpsClass *ops;
 
     ac_name = object_class_get_name(OBJECT_CLASS(ac));
@@ -73,8 +74,13 @@  void accel_init_ops_interfaces(AccelClass *ac)
 
     ops_name = g_strdup_printf("%s" ACCEL_OPS_SUFFIX, ac_name);
     ops = ACCEL_OPS_CLASS(module_object_class_by_name(ops_name));
+    oc = module_object_class_by_name(ops_name);
+    if (!oc) {
+        error_report("fatal: could not load module for type '%s'", ops_name);
+        abort();
+    }
     g_free(ops_name);
-
+    ops = ACCEL_OPS_CLASS(oc);
     /*
      * all accelerators need to define ops, providing at least a mandatory
      * non-NULL create_vcpu_thread operation.