Patchwork RFC: double free in qmp_output_visitor_cleanup()

login
register
mail settings
Submitter Laszlo Ersek
Date March 16, 2012, 2:37 p.m.
Message ID <4F635016.7090703@redhat.com>
Download mbox | patch
Permalink /patch/147210/
State New
Headers show

Comments

Laszlo Ersek - March 16, 2012, 2:37 p.m.
Hi,

we seem to have found a double free in qmp_output_visitor_cleanup().
Please read the analysis below (that is based on commit e4e6aa14) and
please tell me if you'd like me to write a patch for solution (a) or
solution (b), as described at the bottom.

Paolo wrote a test case to trigger the problem, I'm attaching it.

Thank you,
Laszlo

-------- Original Message --------
Date: Fri, 16 Mar 2012 13:55:48 +0100
From: Laszlo Ersek <lersek@redhat.com>

[...]

Consider the following example: we want to put an int (I0) in a dict
(D1) in a dict (D0). D0 is the root element.

When we start D0 with qmp_output_start_struct(), the stack is empty, so

qmp_output_start_struct()
-> qmp_output_add_obj(D0)
   -> qmp_output_push_obj(D0)    [1]
-> qmp_output_push(D0)           [2]

[1] After this step completes, we have a single entry (entry#0) in the
stack, and it points to D0. refcnt(D0) equals 1 (still from
qdict_new()). Entry#0 is an *owning* reference to D0 ("aggregation",
contributes to refcounts).

[2] In this step we push D0 again, so entry#1 will point at D0 too.
refcnt(D0) still equals 1. Entry#1 is only a *navigation* reference
(weak reference etc.), it's only here so that we can continue adding
elements to D0 after we finished building a subtree below -- see step [6].

Then we start D1:

qmp_output_start_struct()
-> qmp_output_add(D1)
   -> qdict_put_obj(D0, D1)      [3]
-> qmp_output_push(D1)           [4]

[3] refcnt(D1) == 1, from qdict_new(). qdict_put_obj() transfers
ownership of D1 to D0, without changing refcnt(D1), so that still equals
1. At the end of this step, D1 is *owned* by D0.

[4] Here we push D1 to the stack, without changing refcounts. Entry#2 is
only a *navigation* reference to D1.

Then we add I0:
qmp_output_type_int()
-> qmp_output_add(I0)
   -> qdict_put_obj(D1, I0)      [5]

[5] In this step we transfer ownership of I0 to D1, utilizing the
entry#2 *navigation* link to find D1. refcnt(I0) == 1, from
qint_from_int(). The status quo is as follows:

    {Figure-1}

    entry#0 --- owns [1] --------------+
                                       |
                                       v

    entry#1 --- navigates-to [2] --->  D0 (refcnt=1)

                                       |
                                     owns [3]
                                       |
                                       v

    entry#2 --- navigates-to [4] --->  D1 (refcnt=1)

                                       |
                                     owns [5]
                                       |
                                       v

                                       I0 (refcnt=1)

Then we close D1 and return to D0:

qmp_output_end_struct()
-> qmp_output_pop()              [6]

[6] In this step we simply throw away (release) entry#2. We could
continue adding elements to D0 via the entry#1 navigation link, that was
set up in step [2].

Then we close D0 too:
qmp_output_end_struct()
-> qmp_output_pop()              [7]

We drop entry#1.

We *never* drop entry #0 during this. After we issued an equal number of
pushes and pops, that is, when "we're done", this is how it looks:

    {Figure-2}

    entry#0 --- owns [1] --------------+
                                       |
                                       v

                                       D0 (refcnt=1)

                                       |
                                     owns [3]
                                       |
                                       v

                                       D1 (refcnt=1)

                                       |
                                     owns [5]
                                       |
                                       v

                                       I0 (refcnt=1)


Now, consider running qmp_output_visitor_cleanup() on {Figure-2}. Since
there's only entry#0, we remove it, and issue

qobject_decref(D0)
-> qdict_destroy_obj(D0)
   -> qentry_destroy()
      -> qobject_decref(D1)
         -> qdict_destroy_obj(D1)
            -> qentry_destroy()
               -> qobject_decref(I0)
                  -> qint_destroy_obj(I0)
                     -> qemu_free(I0)
            -> qemu_free(D1)
   -> qemu_free(D0)

Everything's gone.

Consider instead running qmp_output_visitor_cleanup() on {Figure-1},
that is, in a state where there have been more qmp_output_start_struct()
calls than qmp_output_end_struct() calls:

    {Figure-1 again}

    entry#0 --- owns [1] --------------+
                                       |
                                       v

    entry#1 --- navigates-to [2] --->  D0 (refcnt=1)

                                       |
                                     owns [3]
                                       |
                                       v

    entry#2 --- navigates-to [4] --->  D1 (refcnt=1)

                                       |
                                     owns [5]
                                       |
                                       v

                                       I0 (refcnt=1)

qmp_output_visitor_cleanup()'s loop implements "popping order" (both
qmp_output_pop() and this loop starts removing entries at "tqh_first").

We issue a qobject_decref(D1) via entry#2. That kills I0 and D1 (in that
order) for good. Notice that the "owns [3]" link from D0 becomes a
dangling *owning* pointer!

Then we invoke qobject_decref(D0) via entry#1:

qobject_decref(D0)
-> qdict_destroy_obj(D0)
   -> qentry_destroy()
      -> qobject_decref(D1)
         -> "--D1->refcnt" -- BOOM


-- How to fix this:

(a) When pushing, increase the refcount (ie. turn stack entries #1, #2,
and so on from navigation links into contributing ownership links).

(b) Alternatively, in qmp_output_visitor_cleanup(), throw away all
entries except entry#0, and issue a decref on that one alone (because
it's an owning link).

Notice that qmp_ *input* _visitor_cleanup() does exactly (b): it throws
away the stack at once (the navigation links in the array), and issues a
decref on the separately maintained pointer that *owns* the root object.

-- Also, you don't have to have "three levels" to trigger the problem.
As long as you have at least one "layer of bracketing imbalance" when
you call cleanup, that is, you have at least entry#0 (owner) and entry#1
(navigator) on the stack, it will blow up. Entry#1 will kill completely
whatever entry#0 *thinks* it owns.

-- Why would you call qmp_output_visitor_cleanup() halfway into building
the object tree? Because you run into an error. For example,
qmp_output_type_enum() sets an error when an invalid "int" value is
passed to it (ie. one without a corresponding symbolic name). At that
point you probably want to "abort" your half-done object tree.

... In fact this example should provide a good test case. Try to
serialize/send a native (ie. C) struct hierarchy with a deeply nested
*invalid* enum. (In C you can easily set it to any value.)


I hope I'm not hallucinating things... :)

Thanks,
Laszlo
Michael Roth - March 16, 2012, 5 p.m.
On Fri, Mar 16, 2012 at 03:37:10PM +0100, Laszlo Ersek wrote:
> Hi,
> 
> we seem to have found a double free in qmp_output_visitor_cleanup().
> Please read the analysis below (that is based on commit e4e6aa14) and
> please tell me if you'd like me to write a patch for solution (a) or
> solution (b), as described at the bottom.
> 
> Paolo wrote a test case to trigger the problem, I'm attaching it.
> 
> Thank you,
> Laszlo
> 
> -------- Original Message --------
> Date: Fri, 16 Mar 2012 13:55:48 +0100
> From: Laszlo Ersek <lersek@redhat.com>
> 
> [...]
> 
> Consider the following example: we want to put an int (I0) in a dict
> (D1) in a dict (D0). D0 is the root element.
> 
> When we start D0 with qmp_output_start_struct(), the stack is empty, so
> 
> qmp_output_start_struct()
> -> qmp_output_add_obj(D0)
>    -> qmp_output_push_obj(D0)    [1]
> -> qmp_output_push(D0)           [2]
> 
> [1] After this step completes, we have a single entry (entry#0) in the
> stack, and it points to D0. refcnt(D0) equals 1 (still from
> qdict_new()). Entry#0 is an *owning* reference to D0 ("aggregation",
> contributes to refcounts).
> 
> [2] In this step we push D0 again, so entry#1 will point at D0 too.
> refcnt(D0) still equals 1. Entry#1 is only a *navigation* reference
> (weak reference etc.), it's only here so that we can continue adding
> elements to D0 after we finished building a subtree below -- see step [6].
> 
> Then we start D1:
> 
> qmp_output_start_struct()
> -> qmp_output_add(D1)
>    -> qdict_put_obj(D0, D1)      [3]
> -> qmp_output_push(D1)           [4]
> 
> [3] refcnt(D1) == 1, from qdict_new(). qdict_put_obj() transfers
> ownership of D1 to D0, without changing refcnt(D1), so that still equals
> 1. At the end of this step, D1 is *owned* by D0.
> 
> [4] Here we push D1 to the stack, without changing refcounts. Entry#2 is
> only a *navigation* reference to D1.
> 
> Then we add I0:
> qmp_output_type_int()
> -> qmp_output_add(I0)
>    -> qdict_put_obj(D1, I0)      [5]
> 
> [5] In this step we transfer ownership of I0 to D1, utilizing the
> entry#2 *navigation* link to find D1. refcnt(I0) == 1, from
> qint_from_int(). The status quo is as follows:
> 
>     {Figure-1}
> 
>     entry#0 --- owns [1] --------------+
>                                        |
>                                        v
> 
>     entry#1 --- navigates-to [2] --->  D0 (refcnt=1)
> 
>                                        |
>                                      owns [3]
>                                        |
>                                        v
> 
>     entry#2 --- navigates-to [4] --->  D1 (refcnt=1)
> 
>                                        |
>                                      owns [5]
>                                        |
>                                        v
> 
>                                        I0 (refcnt=1)
> 
> Then we close D1 and return to D0:
> 
> qmp_output_end_struct()
> -> qmp_output_pop()              [6]
> 
> [6] In this step we simply throw away (release) entry#2. We could
> continue adding elements to D0 via the entry#1 navigation link, that was
> set up in step [2].
> 
> Then we close D0 too:
> qmp_output_end_struct()
> -> qmp_output_pop()              [7]
> 
> We drop entry#1.
> 
> We *never* drop entry #0 during this. After we issued an equal number of
> pushes and pops, that is, when "we're done", this is how it looks:
> 
>     {Figure-2}
> 
>     entry#0 --- owns [1] --------------+
>                                        |
>                                        v
> 
>                                        D0 (refcnt=1)
> 
>                                        |
>                                      owns [3]
>                                        |
>                                        v
> 
>                                        D1 (refcnt=1)
> 
>                                        |
>                                      owns [5]
>                                        |
>                                        v
> 
>                                        I0 (refcnt=1)
> 
> 
> Now, consider running qmp_output_visitor_cleanup() on {Figure-2}. Since
> there's only entry#0, we remove it, and issue
> 
> qobject_decref(D0)
> -> qdict_destroy_obj(D0)
>    -> qentry_destroy()
>       -> qobject_decref(D1)
>          -> qdict_destroy_obj(D1)
>             -> qentry_destroy()
>                -> qobject_decref(I0)
>                   -> qint_destroy_obj(I0)
>                      -> qemu_free(I0)
>             -> qemu_free(D1)
>    -> qemu_free(D0)
> 
> Everything's gone.
> 
> Consider instead running qmp_output_visitor_cleanup() on {Figure-1},
> that is, in a state where there have been more qmp_output_start_struct()
> calls than qmp_output_end_struct() calls:
> 
>     {Figure-1 again}
> 
>     entry#0 --- owns [1] --------------+
>                                        |
>                                        v
> 
>     entry#1 --- navigates-to [2] --->  D0 (refcnt=1)
> 
>                                        |
>                                      owns [3]
>                                        |
>                                        v
> 
>     entry#2 --- navigates-to [4] --->  D1 (refcnt=1)
> 
>                                        |
>                                      owns [5]
>                                        |
>                                        v
> 
>                                        I0 (refcnt=1)
> 
> qmp_output_visitor_cleanup()'s loop implements "popping order" (both
> qmp_output_pop() and this loop starts removing entries at "tqh_first").
> 
> We issue a qobject_decref(D1) via entry#2. That kills I0 and D1 (in that
> order) for good. Notice that the "owns [3]" link from D0 becomes a
> dangling *owning* pointer!
> 
> Then we invoke qobject_decref(D0) via entry#1:
> 
> qobject_decref(D0)
> -> qdict_destroy_obj(D0)
>    -> qentry_destroy()
>       -> qobject_decref(D1)
>          -> "--D1->refcnt" -- BOOM
> 
> 
> -- How to fix this:
> 
> (a) When pushing, increase the refcount (ie. turn stack entries #1, #2,
> and so on from navigation links into contributing ownership links).
> 
> (b) Alternatively, in qmp_output_visitor_cleanup(), throw away all
> entries except entry#0, and issue a decref on that one alone (because
> it's an owning link).
> 
> Notice that qmp_ *input* _visitor_cleanup() does exactly (b): it throws
> away the stack at once (the navigation links in the array), and issues a
> decref on the separately maintained pointer that *owns* the root object.
> 
> -- Also, you don't have to have "three levels" to trigger the problem.
> As long as you have at least one "layer of bracketing imbalance" when
> you call cleanup, that is, you have at least entry#0 (owner) and entry#1
> (navigator) on the stack, it will blow up. Entry#1 will kill completely
> whatever entry#0 *thinks* it owns.
> 
> -- Why would you call qmp_output_visitor_cleanup() halfway into building
> the object tree? Because you run into an error. For example,
> qmp_output_type_enum() sets an error when an invalid "int" value is
> passed to it (ie. one without a corresponding symbolic name). At that
> point you probably want to "abort" your half-done object tree.
> 
> ... In fact this example should provide a good test case. Try to
> serialize/send a native (ie. C) struct hierarchy with a deeply nested
> *invalid* enum. (In C you can easily set it to any value.)
> 
> 
> I hope I'm not hallucinating things... :)

Nope, it's definitely there, just triggered it easily with simple test case. I
think we've been lucky so far with qmp commands not returning responses that
trigger this, but it's definitely a bug. I'm inclined to go with option
b), just need to look at it some more before sending out a patch. Thanks
for catching this!

> 
> Thanks,
> Laszlo

> diff --git a/qapi-schema-test.json b/qapi-schema-test.json
> index 8c7f9f7..9eae350 100644
> --- a/qapi-schema-test.json
> +++ b/qapi-schema-test.json
> @@ -8,7 +8,7 @@
> 
>  # for testing nested structs
>  { 'type': 'UserDefOne',
> -  'data': { 'integer': 'int', 'string': 'str' } }
> +  'data': { 'integer': 'int', 'string': 'str', '*enum1': 'EnumOne' } }
> 
>  { 'type': 'UserDefTwo',
>    'data': { 'string': 'str',
> diff --git a/test-qmp-output-visitor.c b/test-qmp-output-visitor.c
> index 78e5f2d..4e0ecd7 100644
> --- a/test-qmp-output-visitor.c
> +++ b/test-qmp-output-visitor.c
> @@ -274,6 +274,24 @@ static void test_visitor_out_struct_nested(TestOutputVisitorData *data,
>      qapi_free_UserDefNested(ud2);
>  }
> 
> +static void test_visitor_out_struct_errors(TestOutputVisitorData *data,
> +                                           const void *unused)
> +{
> +    EnumOne bad_values[] = { ENUM_ONE_MAX, -1 };
> +    UserDefOne u = { 0 }, *pu = &u;
> +    Error *errp;
> +    int i;
> +
> +    for (i = 0; i < ARRAY_SIZE(bad_values) ; i++) {
> +        errp = NULL;
> +        u.has_enum1 = true;
> +        u.enum1 = bad_values[i];
> +        visit_type_UserDefOne(data->ov, &pu, "unused", &errp);
> +        g_assert(error_is_set(&errp) == true);
> +        error_free(errp);
> +    }
> +}
> +
>  typedef struct TestStructList
>  {
>      TestStruct *value;
> @@ -444,6 +462,8 @@ int main(int argc, char **argv)
>                              &out_visitor_data, test_visitor_out_struct);
>      output_visitor_test_add("/visitor/output/struct-nested",
>                              &out_visitor_data, test_visitor_out_struct_nested);
> +    output_visitor_test_add("/visitor/output/struct-errors",
> +                            &out_visitor_data, test_visitor_out_struct_errors);
>      output_visitor_test_add("/visitor/output/list",
>                              &out_visitor_data, test_visitor_out_list);
>      output_visitor_test_add("/visitor/output/list-qapi-free",

Patch

diff --git a/qapi-schema-test.json b/qapi-schema-test.json
index 8c7f9f7..9eae350 100644
--- a/qapi-schema-test.json
+++ b/qapi-schema-test.json
@@ -8,7 +8,7 @@ 
 
 # for testing nested structs
 { 'type': 'UserDefOne',
-  'data': { 'integer': 'int', 'string': 'str' } }
+  'data': { 'integer': 'int', 'string': 'str', '*enum1': 'EnumOne' } }
 
 { 'type': 'UserDefTwo',
   'data': { 'string': 'str',
diff --git a/test-qmp-output-visitor.c b/test-qmp-output-visitor.c
index 78e5f2d..4e0ecd7 100644
--- a/test-qmp-output-visitor.c
+++ b/test-qmp-output-visitor.c
@@ -274,6 +274,24 @@  static void test_visitor_out_struct_nested(TestOutputVisitorData *data,
     qapi_free_UserDefNested(ud2);
 }
 
+static void test_visitor_out_struct_errors(TestOutputVisitorData *data,
+                                           const void *unused)
+{
+    EnumOne bad_values[] = { ENUM_ONE_MAX, -1 };
+    UserDefOne u = { 0 }, *pu = &u;
+    Error *errp;
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(bad_values) ; i++) {
+        errp = NULL;
+        u.has_enum1 = true;
+        u.enum1 = bad_values[i];
+        visit_type_UserDefOne(data->ov, &pu, "unused", &errp);
+        g_assert(error_is_set(&errp) == true);
+        error_free(errp);
+    }
+}
+
 typedef struct TestStructList
 {
     TestStruct *value;
@@ -444,6 +462,8 @@  int main(int argc, char **argv)
                             &out_visitor_data, test_visitor_out_struct);
     output_visitor_test_add("/visitor/output/struct-nested",
                             &out_visitor_data, test_visitor_out_struct_nested);
+    output_visitor_test_add("/visitor/output/struct-errors",
+                            &out_visitor_data, test_visitor_out_struct_errors);
     output_visitor_test_add("/visitor/output/list",
                             &out_visitor_data, test_visitor_out_list);
     output_visitor_test_add("/visitor/output/list-qapi-free",