Message ID | 55F3411B.5070905@redhat.com |
---|---|
State | New |
Headers | show |
Eric Blake <eblake@redhat.com> writes: > On 09/11/2015 02:43 PM, Eric Blake wrote: > >>> +++ b/tests/test-qmp-output-visitor.c >>> @@ -485,7 +485,7 @@ static void test_visitor_out_empty(TestOutputVisitorData *data, >>> QObject *arg; >>> >>> arg = qmp_output_get_qobject(data->qov); >>> - g_assert(!arg); >>> + g_assert(qobject_type(arg) == QTYPE_QNULL); >>> } >> >> Missing >> qobject_decref(arg); >> >> Ultimately, the global qnull_ starts life with refcnt 1, and after this >> test should be back to 1. But it is coming back as 3, so even with a >> qobject_decref, that would get it back down to 2. So we are leaking a >> reference to qnull somewhere. Relatively harmless, because the qnull_ singleton is static. Worth fixing anyway, of course. >> I'm still investigating, and may be able to find the patch > > Squash this in, and you can have: > Reviewed-by: Eric Blake <eblake@redhat.com> > > diff --git i/qapi/qmp-output-visitor.c w/qapi/qmp-output-visitor.c > index 2d6083e..b96849e 100644 > --- i/qapi/qmp-output-visitor.c > +++ w/qapi/qmp-output-visitor.c > @@ -66,11 +66,7 @@ static QObject *qmp_output_first(QmpOutputVisitor *qov) > { > QStackEntry *e = QTAILQ_LAST(&qov->stack, QStack); > > - if (!e) { > - return qnull(); > - } > - > - return e->value; > + return e ? e->value : NULL; > } > > static QObject *qmp_output_last(QmpOutputVisitor *qov) > @@ -190,6 +186,8 @@ QObject *qmp_output_get_qobject(QmpOutputVisitor *qov) > QObject *obj = qmp_output_first(qov); > if (obj) { > qobject_incref(obj); > + } else { > + obj = qnull(); > } > return obj; > } The visitor code is disgustingly ambiguous / confused / confusing about NULL. The whole thing feels like it was hacked up in a rush. We've been paying interest for it not having a written contract ever since. I put the qnull() in qmp_output_first() to avoid (QObject *)0 entirely. Also because that's where I found the FIXME :) You lift it into one of two callers. Impact: * qmp_output_get_qobject() - master: return NULL when !e || !e->value - my patch: return qnull() when !e return NULL when !e->value - your patch: return qnull() when !e || !e->value * qmp_output_visitor_cleanup() - master: root = NULL when when !e || !e->value - my patch: root = qnull() when !e root = NULL when !e->value - your patch: root = NULL when when !e || !e->value where e = QTAILQ_LAST(&qov->stack, QStack) Questions: * How can e become NULL? The only place that pushes onto &qov->stack appears to be qmp_output_push_obj(). Can obviously push e with !e->value, but can't push null e. * What about qmp_output_last()? Why does qmp_output_first() check for !e, but not qmp_output_last()? My patch makes them less symmetric (bad smell). Yours makes them more symmetric, but not quite. * How does this contraption work? > diff --git i/tests/test-qmp-output-visitor.c > w/tests/test-qmp-output-visitor.c > index 256bffd..8bd48f4 100644 > --- i/tests/test-qmp-output-visitor.c > +++ w/tests/test-qmp-output-visitor.c > @@ -486,6 +486,9 @@ static void > test_visitor_out_empty(TestOutputVisitorData *data, > > arg = qmp_output_get_qobject(data->qov); > g_assert(qobject_type(arg) == QTYPE_QNULL); > + /* Check that qnull reference counting is sane */ > + g_assert(arg->refcnt == 2); > + qobject_decref(arg); > } > > static void init_native_list(UserDefNativeListUnion *cvalue)
On 09/12/2015 02:10 AM, Markus Armbruster wrote: > Relatively harmless, because the qnull_ singleton is static. Worth > fixing anyway, of course. > >>> I'm still investigating, and may be able to find the patch >> >> Squash this in, and you can have: >> Reviewed-by: Eric Blake <eblake@redhat.com> Well, your further questions are spot on; my squash proposal isn't quite right. > I put the qnull() in qmp_output_first() to avoid (QObject *)0 entirely. > Also because that's where I found the FIXME :) > > You lift it into one of two callers. Impact: > > * qmp_output_get_qobject() > > - master: return NULL when !e || !e->value > > - my patch: return qnull() when !e > return NULL when !e->value > > - your patch: return qnull() when !e || !e->value The leak in your patch was that qnull() increments the count, then qmp_output_get_object() also increments the count. I guess I'll have to investigate where e->value can be set to NULL to see if my idea was safe, or if we'd have to do something even different. If this were the only caller, then I guess we could always do something like this in qmp_output_first(), leaving the possibility of returning NULL for e->value: if (!e) { obj = qnull(); qobject_decref(obj); /* Caller will again increment refcount */ return obj; } But it's not the only caller. > > * qmp_output_visitor_cleanup() > > - master: root = NULL when when !e || !e->value > > - my patch: root = qnull() when !e > root = NULL when !e->value > > - your patch: root = NULL when when !e || !e->value And calls qobject_decref(root), but that is safe on NULL. Here, your patch ends up with a net 0 refcnt on qnull() (incremented in qmp_output_first(), decremented in the cleanup), but my idea above to pre-decrement would be wrong. Another option would be to keep your patch to qmp_output_first(), but then fix qmp_output_get_object() to special case it it has an instance of QNull (no refcnt change) vs. anything else (qobject_incref). But that feels gross. > > where e = QTAILQ_LAST(&qov->stack, QStack) > > Questions: > > * How can e become NULL? > > The only place that pushes onto &qov->stack appears to be > qmp_output_push_obj(). Can obviously push e with !e->value, but can't > push null e. My understanding is that qov->stack corresponds to nesting levels of {} or [] in the JSON code. The testsuite shows a case where the stack is empty as one case where e is NULL, but if e is non-NULL, I'm not sure whether e->value can ever be NULL. I'll have to read the code more closely. > > * What about qmp_output_last()? > > Why does qmp_output_first() check for !e, but not qmp_output_last()? > > My patch makes them less symmetric (bad smell). Yours makes them more > symmetric, but not quite. Which is awkward in its own right. > > * How does this contraption work? Indeed. Without reading further, we're both shooting in the dark for something that makes tests pass, but without being a clean interface. How about this: go ahead with your series as proposed, with the squash hunk to tests/ to avoid the leak in the testsuite, but leaving the leak in qmp_output_get_object(), and we address the leak in a followup patch. refcnt is size_t, so on 32-bit platforms, it could roll over after 4G repeats of the leak and cause catastrophe, but I don't think we are outputting qnull() frequently enough for the leak to bite while we wait for a followup patch. The followup patch(es) will then have to include some contract documentation, so that we track what we learned while investigating the code, and so that the next reader has more than just code to start from.
Eric Blake <eblake@redhat.com> writes: > On 09/12/2015 02:10 AM, Markus Armbruster wrote: > >> Relatively harmless, because the qnull_ singleton is static. Worth >> fixing anyway, of course. >> >>>> I'm still investigating, and may be able to find the patch >>> >>> Squash this in, and you can have: >>> Reviewed-by: Eric Blake <eblake@redhat.com> > > Well, your further questions are spot on; my squash proposal isn't quite > right. > > >> I put the qnull() in qmp_output_first() to avoid (QObject *)0 entirely. >> Also because that's where I found the FIXME :) >> >> You lift it into one of two callers. Impact: >> >> * qmp_output_get_qobject() >> >> - master: return NULL when !e || !e->value >> >> - my patch: return qnull() when !e >> return NULL when !e->value >> >> - your patch: return qnull() when !e || !e->value > > The leak in your patch was that qnull() increments the count, then > qmp_output_get_object() also increments the count. > > I guess I'll have to investigate where e->value can be set to NULL to > see if my idea was safe, or if we'd have to do something even different. > > If this were the only caller, then I guess we could always do something > like this in qmp_output_first(), leaving the possibility of returning > NULL for e->value: > > if (!e) { > obj = qnull(); > qobject_decref(obj); /* Caller will again increment refcount */ > return obj; > } > > But it's not the only caller. > >> >> * qmp_output_visitor_cleanup() >> >> - master: root = NULL when when !e || !e->value >> >> - my patch: root = qnull() when !e >> root = NULL when !e->value >> >> - your patch: root = NULL when when !e || !e->value > > And calls qobject_decref(root), but that is safe on NULL. > > Here, your patch ends up with a net 0 refcnt on qnull() (incremented in > qmp_output_first(), decremented in the cleanup), but my idea above to > pre-decrement would be wrong. > > Another option would be to keep your patch to qmp_output_first(), but > then fix qmp_output_get_object() to special case it it has an instance > of QNull (no refcnt change) vs. anything else (qobject_incref). But > that feels gross. It does. >> where e = QTAILQ_LAST(&qov->stack, QStack) >> >> Questions: >> >> * How can e become NULL? >> >> The only place that pushes onto &qov->stack appears to be >> qmp_output_push_obj(). Can obviously push e with !e->value, but can't >> push null e. > > My understanding is that qov->stack corresponds to nesting levels of {} > or [] in the JSON code. The testsuite shows a case where the stack is > empty as one case where e is NULL, but if e is non-NULL, I'm not sure > whether e->value can ever be NULL. I'll have to read the code more closely. > >> >> * What about qmp_output_last()? >> >> Why does qmp_output_first() check for !e, but not qmp_output_last()? >> >> My patch makes them less symmetric (bad smell). Yours makes them more >> symmetric, but not quite. > > Which is awkward in its own right. > >> >> * How does this contraption work? > > Indeed. Without reading further, we're both shooting in the dark for > something that makes tests pass, but without being a clean interface. > > How about this: go ahead with your series as proposed, with the squash > hunk to tests/ to avoid the leak in the testsuite, but leaving the leak > in qmp_output_get_object(), and we address the leak in a followup patch. I'll add a FIXME explaining the reference counting bug, and the wider problem. What exactly do you want changed in tests? > refcnt is size_t, so on 32-bit platforms, it could roll over after 4G > repeats of the leak and cause catastrophe, Assertion failure in qnull_destroy_obj(), to be exact. > but I don't think we are > outputting qnull() frequently enough for the leak to bite while we wait > for a followup patch. Agree. > The followup patch(es) will then have to include some contract > documentation, so that we track what we learned while investigating the > code, and so that the next reader has more than just code to start from. It's time to retrofit visitors with a proper contract. Retrofitting a contract is much harder than starting with one, but we got to play the hand we've been dealt.
On 09/12/2015 08:11 AM, Markus Armbruster wrote: >> Indeed. Without reading further, we're both shooting in the dark for >> something that makes tests pass, but without being a clean interface. >> >> How about this: go ahead with your series as proposed, with the squash >> hunk to tests/ to avoid the leak in the testsuite, but leaving the leak >> in qmp_output_get_object(), and we address the leak in a followup patch. I've given a couple more responses with my analysis, but up to you how much you want to take now or defer to later. > > I'll add a FIXME explaining the reference counting bug, and the wider > problem. > > What exactly do you want changed in tests? Definitely add the qobject_decref(arg). But the g_assert(refcnt...) should not be added unless we go with a solution that doesn't leak, and instead should be replaced with a FIXME, if you don't want my followup mails now. >> The followup patch(es) will then have to include some contract >> documentation, so that we track what we learned while investigating the >> code, and so that the next reader has more than just code to start from. > > It's time to retrofit visitors with a proper contract. > > Retrofitting a contract is much harder than starting with one, but we > got to play the hand we've been dealt. I've already started that work in some of my pending patches: https://lists.gnu.org/archive/html/qemu-devel/2015-09/msg02597.html but it could indeed use further documentation in each visitor implementation.
diff --git i/qapi/qmp-output-visitor.c w/qapi/qmp-output-visitor.c index 2d6083e..b96849e 100644 --- i/qapi/qmp-output-visitor.c +++ w/qapi/qmp-output-visitor.c @@ -66,11 +66,7 @@ static QObject *qmp_output_first(QmpOutputVisitor *qov) { QStackEntry *e = QTAILQ_LAST(&qov->stack, QStack); - if (!e) { - return qnull(); - } - - return e->value; + return e ? e->value : NULL; } static QObject *qmp_output_last(QmpOutputVisitor *qov) @@ -190,6 +186,8 @@ QObject *qmp_output_get_qobject(QmpOutputVisitor *qov) QObject *obj = qmp_output_first(qov); if (obj) { qobject_incref(obj); + } else { + obj = qnull(); } return obj; } diff --git i/tests/test-qmp-output-visitor.c w/tests/test-qmp-output-visitor.c index 256bffd..8bd48f4 100644 --- i/tests/test-qmp-output-visitor.c +++ w/tests/test-qmp-output-visitor.c @@ -486,6 +486,9 @@ static void test_visitor_out_empty(TestOutputVisitorData *data, arg = qmp_output_get_qobject(data->qov); g_assert(qobject_type(arg) == QTYPE_QNULL); + /* Check that qnull reference counting is sane */ + g_assert(arg->refcnt == 2); + qobject_decref(arg); }