diff mbox

[v6,20/26] qapi: Fix output visitor to return qnull() instead of NULL

Message ID 55F3411B.5070905@redhat.com
State New
Headers show

Commit Message

Eric Blake Sept. 11, 2015, 9:01 p.m. UTC
On 09/11/2015 02:43 PM, Eric Blake wrote:

>> +++ b/tests/test-qmp-output-visitor.c
>> @@ -485,7 +485,7 @@ static void test_visitor_out_empty(TestOutputVisitorData *data,
>>      QObject *arg;
>>  
>>      arg = qmp_output_get_qobject(data->qov);
>> -    g_assert(!arg);
>> +    g_assert(qobject_type(arg) == QTYPE_QNULL);
>>  }
> 
> Missing
>  qobject_decref(arg);
> 
> Ultimately, the global qnull_ starts life with refcnt 1, and after this
> test should be back to 1.  But it is coming back as 3, so even with a
> qobject_decref, that would get it back down to 2.  So we are leaking a
> reference to qnull somewhere.
> 
> I'm still investigating, and may be able to find the patch

Squash this in, and you can have:
Reviewed-by: Eric Blake <eblake@redhat.com>

 static void init_native_list(UserDefNativeListUnion *cvalue)

Comments

Markus Armbruster Sept. 12, 2015, 8:10 a.m. UTC | #1
Eric Blake <eblake@redhat.com> writes:

> On 09/11/2015 02:43 PM, Eric Blake wrote:
>
>>> +++ b/tests/test-qmp-output-visitor.c
>>> @@ -485,7 +485,7 @@ static void test_visitor_out_empty(TestOutputVisitorData *data,
>>>      QObject *arg;
>>>  
>>>      arg = qmp_output_get_qobject(data->qov);
>>> -    g_assert(!arg);
>>> +    g_assert(qobject_type(arg) == QTYPE_QNULL);
>>>  }
>> 
>> Missing
>>  qobject_decref(arg);
>> 
>> Ultimately, the global qnull_ starts life with refcnt 1, and after this
>> test should be back to 1.  But it is coming back as 3, so even with a
>> qobject_decref, that would get it back down to 2.  So we are leaking a
>> reference to qnull somewhere.

Relatively harmless, because the qnull_ singleton is static.  Worth
fixing anyway, of course.

>> I'm still investigating, and may be able to find the patch
>
> Squash this in, and you can have:
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> diff --git i/qapi/qmp-output-visitor.c w/qapi/qmp-output-visitor.c
> index 2d6083e..b96849e 100644
> --- i/qapi/qmp-output-visitor.c
> +++ w/qapi/qmp-output-visitor.c
> @@ -66,11 +66,7 @@ static QObject *qmp_output_first(QmpOutputVisitor *qov)
>  {
>      QStackEntry *e = QTAILQ_LAST(&qov->stack, QStack);
>
> -    if (!e) {
> -        return qnull();
> -    }
> -
> -    return e->value;
> +    return e ? e->value : NULL;
>  }
>
>  static QObject *qmp_output_last(QmpOutputVisitor *qov)
> @@ -190,6 +186,8 @@ QObject *qmp_output_get_qobject(QmpOutputVisitor *qov)
>      QObject *obj = qmp_output_first(qov);
>      if (obj) {
>          qobject_incref(obj);
> +    } else {
> +        obj = qnull();
>      }
>      return obj;
>  }

The visitor code is disgustingly ambiguous / confused / confusing about
NULL.  The whole thing feels like it was hacked up in a rush.  We've
been paying interest for it not having a written contract ever since.

I put the qnull() in qmp_output_first() to avoid (QObject *)0 entirely.
Also because that's where I found the FIXME :)

You lift it into one of two callers.  Impact:

* qmp_output_get_qobject()

  - master: return NULL when !e || !e->value

  - my patch: return qnull() when !e
              return NULL when !e->value

  - your patch: return qnull() when !e || !e->value

* qmp_output_visitor_cleanup()

  - master: root = NULL when when !e || !e->value

  - my patch: root = qnull() when !e
              root = NULL when !e->value

  - your patch: root = NULL when when !e || !e->value

where e = QTAILQ_LAST(&qov->stack, QStack)

Questions:

* How can e become NULL?

  The only place that pushes onto &qov->stack appears to be
  qmp_output_push_obj().  Can obviously push e with !e->value, but can't
  push null e.

* What about qmp_output_last()?

  Why does qmp_output_first() check for !e, but not qmp_output_last()?

  My patch makes them less symmetric (bad smell).  Yours makes them more
  symmetric, but not quite.

* How does this contraption work?

> diff --git i/tests/test-qmp-output-visitor.c
> w/tests/test-qmp-output-visitor.c
> index 256bffd..8bd48f4 100644
> --- i/tests/test-qmp-output-visitor.c
> +++ w/tests/test-qmp-output-visitor.c
> @@ -486,6 +486,9 @@ static void
> test_visitor_out_empty(TestOutputVisitorData *data,
>
>      arg = qmp_output_get_qobject(data->qov);
>      g_assert(qobject_type(arg) == QTYPE_QNULL);
> +    /* Check that qnull reference counting is sane */
> +    g_assert(arg->refcnt == 2);
> +    qobject_decref(arg);
>  }
>
>  static void init_native_list(UserDefNativeListUnion *cvalue)
Eric Blake Sept. 12, 2015, 12:16 p.m. UTC | #2
On 09/12/2015 02:10 AM, Markus Armbruster wrote:

> Relatively harmless, because the qnull_ singleton is static.  Worth
> fixing anyway, of course.
> 
>>> I'm still investigating, and may be able to find the patch
>>
>> Squash this in, and you can have:
>> Reviewed-by: Eric Blake <eblake@redhat.com>

Well, your further questions are spot on; my squash proposal isn't quite
right.


> I put the qnull() in qmp_output_first() to avoid (QObject *)0 entirely.
> Also because that's where I found the FIXME :)
> 
> You lift it into one of two callers.  Impact:
> 
> * qmp_output_get_qobject()
> 
>   - master: return NULL when !e || !e->value
> 
>   - my patch: return qnull() when !e
>               return NULL when !e->value
> 
>   - your patch: return qnull() when !e || !e->value

The leak in your patch was that qnull() increments the count, then
qmp_output_get_object() also increments the count.

I guess I'll have to investigate where e->value can be set to NULL to
see if my idea was safe, or if we'd have to do something even different.

If this were the only caller, then I guess we could always do something
like this in qmp_output_first(), leaving the possibility of returning
NULL for e->value:

if (!e) {
    obj = qnull();
    qobject_decref(obj); /* Caller will again increment refcount */
    return obj;
}

But it's not the only caller.

> 
> * qmp_output_visitor_cleanup()
> 
>   - master: root = NULL when when !e || !e->value
> 
>   - my patch: root = qnull() when !e
>               root = NULL when !e->value
> 
>   - your patch: root = NULL when when !e || !e->value

And calls qobject_decref(root), but that is safe on NULL.

Here, your patch ends up with a net 0 refcnt on qnull() (incremented in
qmp_output_first(), decremented in the cleanup), but my idea above to
pre-decrement would be wrong.

Another option would be to keep your patch to qmp_output_first(), but
then fix qmp_output_get_object() to special case it it has an instance
of QNull (no refcnt change) vs. anything else (qobject_incref).  But
that feels gross.

> 
> where e = QTAILQ_LAST(&qov->stack, QStack)
> 
> Questions:
> 
> * How can e become NULL?
> 
>   The only place that pushes onto &qov->stack appears to be
>   qmp_output_push_obj().  Can obviously push e with !e->value, but can't
>   push null e.

My understanding is that qov->stack corresponds to nesting levels of {}
or [] in the JSON code.  The testsuite shows a case where the stack is
empty as one case where e is NULL, but if e is non-NULL, I'm not sure
whether e->value can ever be NULL.  I'll have to read the code more closely.

> 
> * What about qmp_output_last()?
> 
>   Why does qmp_output_first() check for !e, but not qmp_output_last()?
> 
>   My patch makes them less symmetric (bad smell).  Yours makes them more
>   symmetric, but not quite.

Which is awkward in its own right.

> 
> * How does this contraption work?

Indeed. Without reading further, we're both shooting in the dark for
something that makes tests pass, but without being a clean interface.

How about this: go ahead with your series as proposed, with the squash
hunk to tests/ to avoid the leak in the testsuite, but leaving the leak
in qmp_output_get_object(), and we address the leak in a followup patch.
 refcnt is size_t, so on 32-bit platforms, it could roll over after 4G
repeats of the leak and cause catastrophe, but I don't think we are
outputting qnull() frequently enough for the leak to bite while we wait
for a followup patch.

The followup patch(es) will then have to include some contract
documentation, so that we track what we learned while investigating the
code, and so that the next reader has more than just code to start from.
Markus Armbruster Sept. 12, 2015, 2:11 p.m. UTC | #3
Eric Blake <eblake@redhat.com> writes:

> On 09/12/2015 02:10 AM, Markus Armbruster wrote:
>
>> Relatively harmless, because the qnull_ singleton is static.  Worth
>> fixing anyway, of course.
>> 
>>>> I'm still investigating, and may be able to find the patch
>>>
>>> Squash this in, and you can have:
>>> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> Well, your further questions are spot on; my squash proposal isn't quite
> right.
>
>
>> I put the qnull() in qmp_output_first() to avoid (QObject *)0 entirely.
>> Also because that's where I found the FIXME :)
>> 
>> You lift it into one of two callers.  Impact:
>> 
>> * qmp_output_get_qobject()
>> 
>>   - master: return NULL when !e || !e->value
>> 
>>   - my patch: return qnull() when !e
>>               return NULL when !e->value
>> 
>>   - your patch: return qnull() when !e || !e->value
>
> The leak in your patch was that qnull() increments the count, then
> qmp_output_get_object() also increments the count.
>
> I guess I'll have to investigate where e->value can be set to NULL to
> see if my idea was safe, or if we'd have to do something even different.
>
> If this were the only caller, then I guess we could always do something
> like this in qmp_output_first(), leaving the possibility of returning
> NULL for e->value:
>
> if (!e) {
>     obj = qnull();
>     qobject_decref(obj); /* Caller will again increment refcount */
>     return obj;
> }
>
> But it's not the only caller.
>
>> 
>> * qmp_output_visitor_cleanup()
>> 
>>   - master: root = NULL when when !e || !e->value
>> 
>>   - my patch: root = qnull() when !e
>>               root = NULL when !e->value
>> 
>>   - your patch: root = NULL when when !e || !e->value
>
> And calls qobject_decref(root), but that is safe on NULL.
>
> Here, your patch ends up with a net 0 refcnt on qnull() (incremented in
> qmp_output_first(), decremented in the cleanup), but my idea above to
> pre-decrement would be wrong.
>
> Another option would be to keep your patch to qmp_output_first(), but
> then fix qmp_output_get_object() to special case it it has an instance
> of QNull (no refcnt change) vs. anything else (qobject_incref).  But
> that feels gross.

It does.

>> where e = QTAILQ_LAST(&qov->stack, QStack)
>> 
>> Questions:
>> 
>> * How can e become NULL?
>> 
>>   The only place that pushes onto &qov->stack appears to be
>>   qmp_output_push_obj().  Can obviously push e with !e->value, but can't
>>   push null e.
>
> My understanding is that qov->stack corresponds to nesting levels of {}
> or [] in the JSON code.  The testsuite shows a case where the stack is
> empty as one case where e is NULL, but if e is non-NULL, I'm not sure
> whether e->value can ever be NULL.  I'll have to read the code more closely.
>
>> 
>> * What about qmp_output_last()?
>> 
>>   Why does qmp_output_first() check for !e, but not qmp_output_last()?
>> 
>>   My patch makes them less symmetric (bad smell).  Yours makes them more
>>   symmetric, but not quite.
>
> Which is awkward in its own right.
>
>> 
>> * How does this contraption work?
>
> Indeed. Without reading further, we're both shooting in the dark for
> something that makes tests pass, but without being a clean interface.
>
> How about this: go ahead with your series as proposed, with the squash
> hunk to tests/ to avoid the leak in the testsuite, but leaving the leak
> in qmp_output_get_object(), and we address the leak in a followup patch.

I'll add a FIXME explaining the reference counting bug, and the wider
problem.

What exactly do you want changed in tests?

>  refcnt is size_t, so on 32-bit platforms, it could roll over after 4G
> repeats of the leak and cause catastrophe,

Assertion failure in qnull_destroy_obj(), to be exact.

>                                            but I don't think we are
> outputting qnull() frequently enough for the leak to bite while we wait
> for a followup patch.

Agree.

> The followup patch(es) will then have to include some contract
> documentation, so that we track what we learned while investigating the
> code, and so that the next reader has more than just code to start from.

It's time to retrofit visitors with a proper contract.

Retrofitting a contract is much harder than starting with one, but we
got to play the hand we've been dealt.
Eric Blake Sept. 12, 2015, 2:17 p.m. UTC | #4
On 09/12/2015 08:11 AM, Markus Armbruster wrote:

>> Indeed. Without reading further, we're both shooting in the dark for
>> something that makes tests pass, but without being a clean interface.
>>
>> How about this: go ahead with your series as proposed, with the squash
>> hunk to tests/ to avoid the leak in the testsuite, but leaving the leak
>> in qmp_output_get_object(), and we address the leak in a followup patch.

I've given a couple more responses with my analysis, but up to you how
much you want to take now or defer to later.

> 
> I'll add a FIXME explaining the reference counting bug, and the wider
> problem.
> 
> What exactly do you want changed in tests?

Definitely add the qobject_decref(arg). But the g_assert(refcnt...)
should not be added unless we go with a solution that doesn't leak, and
instead should be replaced with a FIXME, if you don't want my followup
mails now.


>> The followup patch(es) will then have to include some contract
>> documentation, so that we track what we learned while investigating the
>> code, and so that the next reader has more than just code to start from.
> 
> It's time to retrofit visitors with a proper contract.
> 
> Retrofitting a contract is much harder than starting with one, but we
> got to play the hand we've been dealt.

I've already started that work in some of my pending patches:
https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg02597.html

but it could indeed use further documentation in each visitor
implementation.
diff mbox

Patch

diff --git i/qapi/qmp-output-visitor.c w/qapi/qmp-output-visitor.c
index 2d6083e..b96849e 100644
--- i/qapi/qmp-output-visitor.c
+++ w/qapi/qmp-output-visitor.c
@@ -66,11 +66,7 @@  static QObject *qmp_output_first(QmpOutputVisitor *qov)
 {
     QStackEntry *e = QTAILQ_LAST(&qov->stack, QStack);

-    if (!e) {
-        return qnull();
-    }
-
-    return e->value;
+    return e ? e->value : NULL;
 }

 static QObject *qmp_output_last(QmpOutputVisitor *qov)
@@ -190,6 +186,8 @@  QObject *qmp_output_get_qobject(QmpOutputVisitor *qov)
     QObject *obj = qmp_output_first(qov);
     if (obj) {
         qobject_incref(obj);
+    } else {
+        obj = qnull();
     }
     return obj;
 }
diff --git i/tests/test-qmp-output-visitor.c
w/tests/test-qmp-output-visitor.c
index 256bffd..8bd48f4 100644
--- i/tests/test-qmp-output-visitor.c
+++ w/tests/test-qmp-output-visitor.c
@@ -486,6 +486,9 @@  static void
test_visitor_out_empty(TestOutputVisitorData *data,

     arg = qmp_output_get_qobject(data->qov);
     g_assert(qobject_type(arg) == QTYPE_QNULL);
+    /* Check that qnull reference counting is sane */
+    g_assert(arg->refcnt == 2);
+    qobject_decref(arg);
 }