diff mbox

[v14,13/21] qdict: allow qdict_crumple to accept compound types as values

Message ID 1475246744-29302-14-git-send-email-berrange@redhat.com
State New
Headers show

Commit Message

Daniel P. Berrangé Sept. 30, 2016, 2:45 p.m. UTC
Currently qdict_crumple requires a totally flat QDict as its
input. i.e. all values in the QDict must be scalar types.

In order to have backwards compatibility with the OptsVisitor,
qemu_opt_to_qdict() has a new mode where it may return a QList
for values in the QDict, if there was a repeated key. We thus
need to allow compound types to appear as values in the input
dict given to qdict_crumple().

To avoid confusion, we sanity check that the user has not mixed
the old and new syntax at the same time. e.g. these are allowed

   foo=hello,foo=world,foo=wibble
   foo.0=hello,foo.1=world,foo.2=wibble

but this is forbidden

   foo=hello,foo=world,foo.2=wibble

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
---
 qobject/qdict.c     |  45 +++++++++-----
 tests/check-qdict.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 204 insertions(+), 16 deletions(-)

Comments

Markus Armbruster Oct. 13, 2016, 12:35 p.m. UTC | #1
Cc: Kevin for discussion of QemuOpts dotted key convention

"Daniel P. Berrange" <berrange@redhat.com> writes:

> Currently qdict_crumple requires a totally flat QDict as its
> input. i.e. all values in the QDict must be scalar types.
>
> In order to have backwards compatibility with the OptsVisitor,
> qemu_opt_to_qdict() has a new mode where it may return a QList
> for values in the QDict, if there was a repeated key. We thus
> need to allow compound types to appear as values in the input
> dict given to qdict_crumple().
>
> To avoid confusion, we sanity check that the user has not mixed
> the old and new syntax at the same time. e.g. these are allowed
>
>    foo=hello,foo=world,foo=wibble
>    foo.0=hello,foo.1=world,foo.2=wibble
>
> but this is forbidden
>
>    foo=hello,foo=world,foo.2=wibble

I understand the need for foo.bar=val.  It makes it possible to specify
nested dictionaries with QemuOpts.

The case for foo.0=val is less clear.  QemuOpts already supports lists,
by repeating keys.  Why do we need a second, wordier way to specify
them?

Note that this second way creates entirely new failure modes and
restrictions.  Let me show using an example derived from one in
qdict_crumple()'s contract:

    foo.0.bar=bla,foo.eek.bar=blubb

    Without the dotted key convention, this is perfectly fine: key
    "foo.0.bar" has the single value "bla", and key "foo.eek.bar" has
    the single value "blubb".  Equivalent JSON would be

      { "foo.0.bar": "bla", "foo.eek.bar": "blubb" }

    With just the struct convention, it's still fine: it obviously means
    the same as JSON

      { "foo": { "0": { "bar": "bla" }, "eek": { "bar": "blubb" } } }

    Adding the list convention makes it invalid.  It also outlaws a
    bunch of keys that would be just fine in JSON, namely any that get
    recognized as list index.  Raise your hand if you're willing to bet
    real money on your predictions of what will be recognized as list
    index, without looking at the code.  I'm not.

I'm afraid I have growing doubts regarding the QemuOpts dotted key
convention in general.

The convention makes '.' a special character in keys, but only
sometimes.  If the key gets consumed by something that uses dotted key
convention, '.' is special, and to get a non-special '.', you need to
escape it by doubling.  Else, it's not.

Since the same key can be used differently by different code, the same
'.' could in theory be both special and non-special.  In practice, this
would be madness.

Adopting the dotted key convention for an existing QemuOpts option, say
-object [PATCH 15], *breaks* existing command line usage of keys
containing '.', because you now have to escape the '.'.  Dan, I'm afraid
you need to show that no such keys exist, or if they exist they don't
matter.

I know we have keys containing '.' elsewhere, e.g. device "macio-ide"
property "ide.0".  Our chronic inability to consistently restrict names
in ABI to something sane is beyond foolish.

It's probably too late to back out the dotted key convention completely.
Kevin?

Can we still back out the list part of the convention, and use repeated
keys instead?

If we're stuck with some form of the dotted key convention, can we at
least make it a more integral part of QemuOpts rather than something
bolted on as an afterthought?  Here's my thinking on how that might be
done:

* Have a QemuOptsList flag @flat.

* If @flat, QemuOpts behaves as it always has: the special characters
  are ',' and '=', and parsing a key=value,... string produces a list
  where each element represents one key=value from the string, in the
  same order.

* If not @flat, '.' becomes an additional special character, and parsing
  a key=value,... string produces a dictionary, similar to the one you
  get now by converting with qemu_opts_to_qdict() and filtering through
  qdict_crumple().

The difference to now is that you either always crumple, or not at all,
and the meaning of '.' is unambiguous.

I wish we had refrained from saddling QemuOpts with even more magic.
Compared to this swamp, use of JSON on the command line looks rather
appealing to me.
Kevin Wolf Oct. 13, 2016, 2:46 p.m. UTC | #2
Am 13.10.2016 um 14:35 hat Markus Armbruster geschrieben:
> Cc: Kevin for discussion of QemuOpts dotted key convention
> 
> "Daniel P. Berrange" <berrange@redhat.com> writes:
> 
> > Currently qdict_crumple requires a totally flat QDict as its
> > input. i.e. all values in the QDict must be scalar types.
> >
> > In order to have backwards compatibility with the OptsVisitor,
> > qemu_opt_to_qdict() has a new mode where it may return a QList
> > for values in the QDict, if there was a repeated key. We thus
> > need to allow compound types to appear as values in the input
> > dict given to qdict_crumple().
> >
> > To avoid confusion, we sanity check that the user has not mixed
> > the old and new syntax at the same time. e.g. these are allowed
> >
> >    foo=hello,foo=world,foo=wibble
> >    foo.0=hello,foo.1=world,foo.2=wibble
> >
> > but this is forbidden
> >
> >    foo=hello,foo=world,foo.2=wibble
> 
> I understand the need for foo.bar=val.  It makes it possible to specify
> nested dictionaries with QemuOpts.
> 
> The case for foo.0=val is less clear.  QemuOpts already supports lists,
> by repeating keys.  Why do we need a second, wordier way to specify
> them?

Probably primarily because someone didn't realise this when introducing
the dotted syntax. Also because flat QDicts can't represent this.

But even if I realised that QemuOpts support this syntax, I think we
would still have to use the dotted syntax because it's explicit about
the index and we need that because the list can contains dicts.

Compare this:

    driver=quorum,
    child.0.driver=file,child.0.filename=disk1.img,
    child.1.driver=host_device,child.1.filename=/dev/sdb,
    child.2.driver=nbd,child.2.host=localhost

And this:

    driver=quorum,
    child.driver=file,child.filename=disk1.img,
    child.driver=host_device,child.filename=/dev/sdb,
    child.driver=nbd,child.host=localhost

For starters, can we really trust the order in QemuOpts so that the
right driver and filename are associated with each other?

We would also have code to associate the third child.driver with the
first child.host (because file and host_device don't have a host
property). And this isn't even touching optional arguments yet. Would
you really want to implement or review this?

> Note that this second way creates entirely new failure modes and
> restrictions.  Let me show using an example derived from one in
> qdict_crumple()'s contract:
> 
>     foo.0.bar=bla,foo.eek.bar=blubb
> 
>     Without the dotted key convention, this is perfectly fine: key
>     "foo.0.bar" has the single value "bla", and key "foo.eek.bar" has
>     the single value "blubb".  Equivalent JSON would be
> 
>       { "foo.0.bar": "bla", "foo.eek.bar": "blubb" }
> 
>     With just the struct convention, it's still fine: it obviously means
>     the same as JSON
> 
>       { "foo": { "0": { "bar": "bla" }, "eek": { "bar": "blubb" } } }
> 
>     Adding the list convention makes it invalid.  It also outlaws a
>     bunch of keys that would be just fine in JSON, namely any that get
>     recognized as list index.  Raise your hand if you're willing to bet
>     real money on your predictions of what will be recognized as list
>     index, without looking at the code.  I'm not.

If you're adding dict keys to the schema that could be reasonably parsed
as a list index, I think you're doing something wrong. I would agree
that this is a problem if we were talking about user-supplied values,
but it's just keys that are defined by qemu.

> I'm afraid I have growing doubts regarding the QemuOpts dotted key
> convention in general.
> 
> The convention makes '.' a special character in keys, but only
> sometimes.  If the key gets consumed by something that uses dotted key
> convention, '.' is special, and to get a non-special '.', you need to
> escape it by doubling.  Else, it's not.

Do we really implement escaping by doubling dots? This would be news to
me.

Do we even have a reason to escape dots, i.e. are there any options in
the schema whose keys contain a dot? I think we took care to avoid such
things.

> Since the same key can be used differently by different code, the same
> '.' could in theory be both special and non-special.  In practice, this
> would be madness.
> 
> Adopting the dotted key convention for an existing QemuOpts option, say
> -object [PATCH 15], *breaks* existing command line usage of keys
> containing '.', because you now have to escape the '.'.  Dan, I'm afraid
> you need to show that no such keys exist, or if they exist they don't
> matter.
> 
> I know we have keys containing '.' elsewhere, e.g. device "macio-ide"
> property "ide.0".  Our chronic inability to consistently restrict names
> in ABI to something sane is beyond foolish.

I wanted to have a look at this example, but I can only find the string
"ide.0" used as a bus name in the sources, that is, a value rather than
a key.

Do you have a pointer to the property definition that you mean?

> It's probably too late to back out the dotted key convention completely.
> Kevin?

Yes, far too late.

Also, I haven't yet seen even just an idea for an alternative solution.

> Can we still back out the list part of the convention, and use repeated
> keys instead?

See above, repeated keys are messy and can't provide the same thing.

Also it's been in use since quorum was introduced in commit c88a1de51,
February 2014. Pretty sure that this means no anyway.

> If we're stuck with some form of the dotted key convention, can we at
> least make it a more integral part of QemuOpts rather than something
> bolted on as an afterthought?  Here's my thinking on how that might be
> done:
> 
> * Have a QemuOptsList flag @flat.
> 
> * If @flat, QemuOpts behaves as it always has: the special characters
>   are ',' and '=', and parsing a key=value,... string produces a list
>   where each element represents one key=value from the string, in the
>   same order.
> 
> * If not @flat, '.' becomes an additional special character, and parsing
>   a key=value,... string produces a dictionary, similar to the one you
>   get now by converting with qemu_opts_to_qdict() and filtering through
>   qdict_crumple().
> 
> The difference to now is that you either always crumple, or not at all,
> and the meaning of '.' is unambiguous.

Integrating qdict_crumple() functionality in QemuOpts sounds like a
reasonable next step after Dan's series.

> I wish we had refrained from saddling QemuOpts with even more magic.
> Compared to this swamp, use of JSON on the command line looks rather
> appealing to me.

I'd bet most users would disagree with you because they use only simple
cases.

In more complicated setups, dotted syntax still works (unlike repeated
options), but becomes a bit verbose because of the repetition of long
prefixes. If we wanted to attack this, we would have to introduce two
more special characters (brackets) for a more concise nesting syntax.

Kevin
Markus Armbruster Oct. 17, 2016, 2:50 p.m. UTC | #3
Kevin Wolf <kwolf@redhat.com> writes:

> Am 13.10.2016 um 14:35 hat Markus Armbruster geschrieben:
>> Cc: Kevin for discussion of QemuOpts dotted key convention
>> 
>> "Daniel P. Berrange" <berrange@redhat.com> writes:
>> 
>> > Currently qdict_crumple requires a totally flat QDict as its
>> > input. i.e. all values in the QDict must be scalar types.
>> >
>> > In order to have backwards compatibility with the OptsVisitor,
>> > qemu_opt_to_qdict() has a new mode where it may return a QList
>> > for values in the QDict, if there was a repeated key. We thus
>> > need to allow compound types to appear as values in the input
>> > dict given to qdict_crumple().
>> >
>> > To avoid confusion, we sanity check that the user has not mixed
>> > the old and new syntax at the same time. e.g. these are allowed
>> >
>> >    foo=hello,foo=world,foo=wibble
>> >    foo.0=hello,foo.1=world,foo.2=wibble
>> >
>> > but this is forbidden
>> >
>> >    foo=hello,foo=world,foo.2=wibble
>> 
>> I understand the need for foo.bar=val.  It makes it possible to specify
>> nested dictionaries with QemuOpts.
>> 
>> The case for foo.0=val is less clear.  QemuOpts already supports lists,
>> by repeating keys.  Why do we need a second, wordier way to specify
>> them?
>
> Probably primarily because someone didn't realise this when introducing
> the dotted syntax.

Can't even blame "someone" for that; it's an obscure, underdocumented
feature of an interface that's collapsing under its load of obscure,
underdocumented features.

On the other hand, that's not exactly a state that allows for *more*
obscure features.

>                    Also because flat QDicts can't represent this.

Explain?

> But even if I realised that QemuOpts support this syntax, I think we
> would still have to use the dotted syntax because it's explicit about
> the index and we need that because the list can contains dicts.
>
> Compare this:
>
>     driver=quorum,
>     child.0.driver=file,child.0.filename=disk1.img,
>     child.1.driver=host_device,child.1.filename=/dev/sdb,
>     child.2.driver=nbd,child.2.host=localhost
>
> And this:
>
>     driver=quorum,
>     child.driver=file,child.filename=disk1.img,
>     child.driver=host_device,child.filename=/dev/sdb,
>     child.driver=nbd,child.host=localhost

Aside: both are about equally illegible, to be honest.

> For starters, can we really trust the order in QemuOpts so that the
> right driver and filename are associated with each other?

The order is trustworthy, but...

> We would also have code to associate the third child.driver with the
> first child.host (because file and host_device don't have a host
> property). And this isn't even touching optional arguments yet. Would
> you really want to implement or review this?

... you're right, doing lists by repeating keys breaks down when
combined with the dotted key convention's use of repetition to do
structured values.

Permit me to digress.

QemuOpts wasn't designed for list-values keys.  Doing lists by
repetition was clever.

QemuOpts wasn't designed for structured values.  Doing structured values
by a dotted key convention plus repetition was clever.

And there's the problem: too much cleverness, not enough "this is being
pushed way beyond its design limits, time to replace it".

For me, a replacement should do structured values by providing suitable
*value* syntax instead of hacking it into the keys:

    { "driver": "quorum",
      "child": [ { "driver": "file", "filename": "disk1.img" },
                 { "driver": "host_device", "filename=/dev/sdb" },
                 { "driver": "nbd", "host": "localhost" } ] }

Note how the structure is obvious.  It isn't with dotted keys, not even
if you order them sensibly (which users inevitably won't).

Also not that the value needs to be parsed by QemuOpts!  You can't leave
it to the consumer of the QemuOpts, because if you did, you'd have to
escape the commas.

If you'd rather invent syntax closer to QemuOpts than reuse JSON, you
could try

    driver=quorum,
    child=[{ driver=file, filename=disk1.img },
           { driver=host_device, filename=/dev/sdb },
           { driver=nbd, host=localhost } ]

I'd go with some existing syntax, though.  The one we already use is
JSON.

End of digression.

>> Note that this second way creates entirely new failure modes and
>> restrictions.  Let me show using an example derived from one in
>> qdict_crumple()'s contract:
>> 
>>     foo.0.bar=bla,foo.eek.bar=blubb
>> 
>>     Without the dotted key convention, this is perfectly fine: key
>>     "foo.0.bar" has the single value "bla", and key "foo.eek.bar" has
>>     the single value "blubb".  Equivalent JSON would be
>> 
>>       { "foo.0.bar": "bla", "foo.eek.bar": "blubb" }
>> 
>>     With just the struct convention, it's still fine: it obviously means
>>     the same as JSON
>> 
>>       { "foo": { "0": { "bar": "bla" }, "eek": { "bar": "blubb" } } }
>> 
>>     Adding the list convention makes it invalid.  It also outlaws a
>>     bunch of keys that would be just fine in JSON, namely any that get
>>     recognized as list index.  Raise your hand if you're willing to bet
>>     real money on your predictions of what will be recognized as list
>>     index, without looking at the code.  I'm not.
>
> If you're adding dict keys to the schema that could be reasonably parsed
> as a list index, I think you're doing something wrong. I would agree
> that this is a problem if we were talking about user-supplied values,
> but it's just keys that are defined by qemu.

Your argument would be stronger if we didn't have such a disappointing
track record on sensible naming.  We've failed at picking a consistent
rules for names, let alone enforcing them.  Instead, we have local
rules, enforced to varying degrees, and of course inconsistent with each
other.  We even have code to mangle names, so that they conform to some
other set of rules.

We should not extend an interface in a way that makes it rely on certain
rules for names without writing down and enforcing these rules.

You did it *locally* for block stuff, and got away with it, because you
control the keys (hopefully), and nobody (including probably you and
very much me) wants to think about this particular mess unless it's
absolutely, obviously, embarrassingly necessary.

Which is what he get to do now, because now we're discussing to adopt
your local thing for general use, with keys no single person really
knows, let alone controls.

Your dotted key convention requires two rules: 1. names must not look
like integers, and 2. names must not contain '.'.

We can avoid rule 2 by requiring '.' to be escaped.  Dan's
qdict_crumple() currently does that, to your surprise.  Adding the
escaping to existing options is a compatibility break, however.  So, if
names with '.' already exist, we can either break compatibility by
renaming them, or break it by requiring the '.' to be escaped.

The names in question are the names declared in QemuOptDesc (easy enough
to find) plus any names that the code may use with empty QemuOptDesc
(harder).  Empty QemuOptDesc are used with the following option groups:

* "drive", qemu_drive_opts in blockdev.c
  Also empty_opts in qemu-io.c and qemu_drive_opts in test-replication.c

* "numa", qemu_numa_opts in numa.c

* "device", qemu_device_opts in qdev-monitor.c

  This one pulls in qdev properties.  Properties violating rule 2 exist.

* "object", qemu_object_opts in vl.c
  Also qemu_object_opts in qemu-img.c, qemu-io.c and qemu-nbd.c (WTF?)

  This one pulls in properties of user-creatable objects.

* "source", qemu_soource_opts (sic!) in qemu-img.c

* "reopen", reopen_opts in qemu-io-cmds.c

* "file", file_opts in qemu-io.c and qemu-nbd.c

* "machine", qemu_machine_opts in vl.c

  This one pulls in machine properties.

* "tpmdev", qemu_tpmdev_opts in vl.c

* "netdev", qemu_netdev_opts in net.c

* "net", qemu_net_opts in net.c

* "userdef", userdef_opts in tests-opts-visitor.c

* "opts_list_03", opts_list_03 in test-qemu-opts.c

* "acpi", qemu_acpi_opts in hw/acpi/core.c

* "smbios", qemu_smbios_opts in smbios.c

Empty QemuOptDesc is used this widely because QemuOpts is too primitive
to support the needs of its users.  Unlike a QAPI schema.  Just sayin'.

Finding the names and assessing how they're impacted by the dotted key
convention is going to be a substantial chunk of work.

>> I'm afraid I have growing doubts regarding the QemuOpts dotted key
>> convention in general.
>> 
>> The convention makes '.' a special character in keys, but only
>> sometimes.  If the key gets consumed by something that uses dotted key
>> convention, '.' is special, and to get a non-special '.', you need to
>> escape it by doubling.  Else, it's not.
>
> Do we really implement escaping by doubling dots? This would be news to
> me.

Dan's PATCH 02/21 does.

> Do we even have a reason to escape dots, i.e. are there any options in
> the schema whose keys contain a dot? I think we took care to avoid such
> things.

I'm afraid we did not.

>> Since the same key can be used differently by different code, the same
>> '.' could in theory be both special and non-special.  In practice, this
>> would be madness.
>> 
>> Adopting the dotted key convention for an existing QemuOpts option, say
>> -object [PATCH 15], *breaks* existing command line usage of keys
>> containing '.', because you now have to escape the '.'.  Dan, I'm afraid
>> you need to show that no such keys exist, or if they exist they don't
>> matter.
>> 
>> I know we have keys containing '.' elsewhere, e.g. device "macio-ide"
>> property "ide.0".  Our chronic inability to consistently restrict names
>> in ABI to something sane is beyond foolish.
>
> I wanted to have a look at this example, but I can only find the string
> "ide.0" used as a bus name in the sources, that is, a value rather than
> a key.
>
> Do you have a pointer to the property definition that you mean?

We've gotten better at hiding property definitions...

"qemu-system-ppc -device macio-ide,help" shows the property:

    macio-ide.ide.0=child<IDE>

The device is not marked no-user, but the obvious attempt to use it
fails:

    $ qemu-system-ppc -device macio-ide
    qemu-system-ppc: Option '-device macio-ide' cannot be handled by this machine

This is the Alex's dynamic system bus thingy, which put most of the
system bus devices in help even when they cannot be used.

Perhaps you can make a case why this property cannot be used via
QemuOpts.  If so, great!  There are about three dozen more for you to
check, followed by fourteen more option groups with empty QemuOptDesc,
and a bunch of non-empty QemuOptDesc ;-P

>> It's probably too late to back out the dotted key convention completely.
>> Kevin?
>
> Yes, far too late.

If we're stuck with the existing use of dotted key convention, we're
stuck with it.  But that's no excuse for adding more uses.

If we decide we want to use dotted key convention more widely, we must
document and enforce the naming conventions it requires at least in all
the places where we use it.  As discussed above, this might involve
compatibility breaks, hopefully minor.

> Also, I haven't yet seen even just an idea for an alternative solution.

See digression above.

>> Can we still back out the list part of the convention, and use repeated
>> keys instead?
>
> See above, repeated keys are messy and can't provide the same thing.
>
> Also it's been in use since quorum was introduced in commit c88a1de51,
> February 2014. Pretty sure that this means no anyway.
>
>> If we're stuck with some form of the dotted key convention, can we at
>> least make it a more integral part of QemuOpts rather than something
>> bolted on as an afterthought?  Here's my thinking on how that might be
>> done:
>> 
>> * Have a QemuOptsList flag @flat.
>> 
>> * If @flat, QemuOpts behaves as it always has: the special characters
>>   are ',' and '=', and parsing a key=value,... string produces a list
>>   where each element represents one key=value from the string, in the
>>   same order.
>> 
>> * If not @flat, '.' becomes an additional special character, and parsing
>>   a key=value,... string produces a dictionary, similar to the one you
>>   get now by converting with qemu_opts_to_qdict() and filtering through
>>   qdict_crumple().
>> 
>> The difference to now is that you either always crumple, or not at all,
>> and the meaning of '.' is unambiguous.
>
> Integrating qdict_crumple() functionality in QemuOpts sounds like a
> reasonable next step after Dan's series.
>
>> I wish we had refrained from saddling QemuOpts with even more magic.
>> Compared to this swamp, use of JSON on the command line looks rather
>> appealing to me.
>
> I'd bet most users would disagree with you because they use only simple
> cases.

Use of JSON on the command line when and where you need nesting, not
everywhere.  The simple cases can stay exactly as they are.

> In more complicated setups, dotted syntax still works (unlike repeated
> options), but becomes a bit verbose because of the repetition of long
> prefixes. If we wanted to attack this, we would have to introduce two
> more special characters (brackets) for a more concise nesting syntax.
>
> Kevin
Paolo Bonzini Oct. 17, 2016, 3:43 p.m. UTC | #4
> For me, a replacement should do structured values by providing suitable
> *value* syntax instead of hacking it into the keys:
> 
>     { "driver": "quorum",
>       "child": [ { "driver": "file", "filename": "disk1.img" },
>                  { "driver": "host_device", "filename=/dev/sdb" },
>                  { "driver": "nbd", "host": "localhost" } ] }
> 
> Note how the structure is obvious.  It isn't with dotted keys, not even
> if you order them sensibly (which users inevitably won't).
> 
> Also not that the value needs to be parsed by QemuOpts!  You can't leave
> it to the consumer of the QemuOpts, because if you did, you'd have to
> escape the commas.
> 
> If you'd rather invent syntax closer to QemuOpts than reuse JSON, you
> could try
> 
>     driver=quorum,
>     child=[{ driver=file, filename=disk1.img },
>            { driver=host_device, filename=/dev/sdb },
>            { driver=nbd, host=localhost } ]
> 
> I'd go with some existing syntax, though.  The one we already use is
> JSON.

In fact there is already "filename=json:{...}" support in the block layer.
By the way, abuse of QemuOpts dates back to http://wiki.qemu.org/Features/QCFG.

> Your dotted key convention requires two rules: 1. names must not look
> like integers, and 2. names must not contain '.'.
> 
> We can avoid rule 2 by requiring '.' to be escaped.  Dan's
> qdict_crumple() currently does that, to your surprise.  Adding the
> escaping to existing options is a compatibility break, however.  So, if
> names with '.' already exist, we can either break compatibility by
> renaming them, or break it by requiring the '.' to be escaped.

> * "device", qemu_device_opts in qdev-monitor.c
> 
>   This one pulls in qdev properties.  Properties violating rule 2 exist.

Which are they?  Only bus names?

> * "object", qemu_object_opts in vl.c
> 
>   This one pulls in properties of user-creatable objects.
> 
> * "machine", qemu_machine_opts in vl.c
> 
>   This one pulls in machine properties.

> > > I know we have keys containing '.' elsewhere, e.g. device "macio-ide"
> > > property "ide.0".  Our chronic inability to consistently restrict names
> > > in ABI to something sane is beyond foolish.
> >
> > I wanted to have a look at this example, but I can only find the string
> > "ide.0" used as a bus name in the sources, that is, a value rather than
> > a key.
> >
> > Do you have a pointer to the property definition that you mean?
> 
> We've gotten better at hiding property definitions...
> 
> "qemu-system-ppc -device macio-ide,help" shows the property:
> 
>     macio-ide.ide.0=child<IDE>

It is a bug that this property is shown in the help, because it's not
assignable (same for all other child<> properties).  I'd rather declare
other occurrences of "." in user-accessible property names to be bugs,
and break the ABI if there are any.

Paolo
Eric Blake Oct. 17, 2016, 5:38 p.m. UTC | #5
On 10/17/2016 09:50 AM, Markus Armbruster wrote:
>> But even if I realised that QemuOpts support this syntax, I think we
>> would still have to use the dotted syntax because it's explicit about
>> the index and we need that because the list can contains dicts.
>>
>> Compare this:
>>
>>     driver=quorum,
>>     child.0.driver=file,child.0.filename=disk1.img,
>>     child.1.driver=host_device,child.1.filename=/dev/sdb,
>>     child.2.driver=nbd,child.2.host=localhost
>>
>> And this:
>>
>>     driver=quorum,
>>     child.driver=file,child.filename=disk1.img,
>>     child.driver=host_device,child.filename=/dev/sdb,
>>     child.driver=nbd,child.host=localhost
> 
> Aside: both are about equally illegible, to be honest.

> Permit me to digress.
> 
> QemuOpts wasn't designed for list-values keys.  Doing lists by
> repetition was clever.
> 
> QemuOpts wasn't designed for structured values.  Doing structured values
> by a dotted key convention plus repetition was clever.
> 
> And there's the problem: too much cleverness, not enough "this is being
> pushed way beyond its design limits, time to replace it".
> 
> For me, a replacement should do structured values by providing suitable
> *value* syntax instead of hacking it into the keys:
> 
>     { "driver": "quorum",
>       "child": [ { "driver": "file", "filename": "disk1.img" },
>                  { "driver": "host_device", "filename=/dev/sdb" },
>                  { "driver": "nbd", "host": "localhost" } ] }

Possible hack solution:

QemuOpts already special-cases id=.  What if we ALSO make it
special-case a leading json=?  Shown here with shell quoting, the above
example of creating a Quorum -drive argument could then be:

-drive json='
    { "driver": "quorum",
      "child": [ { "driver": "file", "filename": "disk1.img" },
                 { "driver": "host_device", "filename=/dev/sdb" },
                 { "driver": "nbd", "host": "localhost" } ] }
'

As far as I know, we don't have 'json' as any existing QemuOpts key (do
we? A full audit may be better than my quick git grep '"json"').  Thus,
if QemuOpts sees a leading json=, it hands off the rest of the string to
the same parser as we use for QMP, where we no longer have to escape
commas (even nicer than the drive hack where we support
filename=json:{...} but have to double up all commas to make it through
the QemuOpts layer).  Encountering json= as anything other than the
first option would be an error, and you would be unable to combine a
json= option with any other old-style option.  In other words, the use
of leading json= would be the switch for whether to do old-style parsing
or to use a saner syntax for everything else that needs structure, on a
per-argument basis.
Markus Armbruster Oct. 17, 2016, 5:48 p.m. UTC | #6
Paolo Bonzini <pbonzini@redhat.com> writes:

>> For me, a replacement should do structured values by providing suitable
>> *value* syntax instead of hacking it into the keys:
>> 
>>     { "driver": "quorum",
>>       "child": [ { "driver": "file", "filename": "disk1.img" },
>>                  { "driver": "host_device", "filename=/dev/sdb" },
>>                  { "driver": "nbd", "host": "localhost" } ] }
>> 
>> Note how the structure is obvious.  It isn't with dotted keys, not even
>> if you order them sensibly (which users inevitably won't).
>> 
>> Also not that the value needs to be parsed by QemuOpts!  You can't leave
>> it to the consumer of the QemuOpts, because if you did, you'd have to
>> escape the commas.
>> 
>> If you'd rather invent syntax closer to QemuOpts than reuse JSON, you
>> could try
>> 
>>     driver=quorum,
>>     child=[{ driver=file, filename=disk1.img },
>>            { driver=host_device, filename=/dev/sdb },
>>            { driver=nbd, host=localhost } ]
>> 
>> I'd go with some existing syntax, though.  The one we already use is
>> JSON.
>
> In fact there is already "filename=json:{...}" support in the block layer.

Yes.

> By the way, abuse of QemuOpts dates back to http://wiki.qemu.org/Features/QCFG.

That page is from 2011, when QAPI was being implemented.  The idea to
bring the power of the QAPI schema to bear on the command line is sound,
but QCFG never made it past the ideas stage, I'm afraid.  It uses either
dotted keys or braces for nested structs.  It doesn't cover lists.

>> Your dotted key convention requires two rules: 1. names must not look
>> like integers, and 2. names must not contain '.'.
>> 
>> We can avoid rule 2 by requiring '.' to be escaped.  Dan's
>> qdict_crumple() currently does that, to your surprise.  Adding the
>> escaping to existing options is a compatibility break, however.  So, if
>> names with '.' already exist, we can either break compatibility by
>> renaming them, or break it by requiring the '.' to be escaped.
>
>> * "device", qemu_device_opts in qdev-monitor.c
>> 
>>   This one pulls in qdev properties.  Properties violating rule 2 exist.
>
> Which are they?  Only bus names?

Finding properties is difficult, because we (foolishly!) define them in
code rather than data.  My testing with "info qdm" for all targets
finds:

    = xlnx.axi-dma =
    xlnx.axi-dma[0]
    = xlnx.xps-ethernetlite =
    xlnx.xps-ethernetlite[0]
    = xlnx.xps-intc =
    xlnx.xps-intc[0]
    = xlnx.xps-uartlite =
    xlnx.xps-uartlite[0]
    = xlnx.axi-dma =
    xlnx.axi-dma[0]
    = xlnx.xps-ethernetlite =
    xlnx.xps-ethernetlite[0]
    = xlnx.xps-intc =
    xlnx.xps-intc[0]
    = xlnx.xps-uartlite =
    xlnx.xps-uartlite[0]
    = cuda =
    adb.0
    = raven-pcihost =
    pci.0
    = macio-ide =
    ide.0
    = mpc8544-guts =
    mpc8544.guts[0]
    = xlnx.xps-ethernetlite =
    xlnx.xps-ethernetlite[0]
    = xlnx.xps-intc =
    xlnx.xps-intc[0]
    = xlnx.xps-uartlite =
    xlnx.xps-uartlite[0]
    = cuda =
    adb.0
    = raven-pcihost =
    pci.0
    = macio-ide =
    ide.0
    = mpc8544-guts =
    mpc8544.guts[0]
    = xlnx.xps-ethernetlite =
    xlnx.xps-ethernetlite[0]
    = xlnx.xps-intc =
    xlnx.xps-intc[0]
    = xlnx.xps-uartlite =
    xlnx.xps-uartlite[0]
    = s390-sclp-event-facility =
    s390-sclp-events-bus.0
    = cgthree =
    cg3.reg[0]
    cg3.prom[0]
    = SUNW,tcx =
    tcx.thc[0]
    tcx.rblit[0]
    tcx.dac[0]
    tcx.alt[0]
    tcx.tec[0]
    tcx.rstip[0]
    tcx.blit[0]
    tcx.dhc[0]
    tcx.prom[0]
    tcx.stip[0]

>> * "object", qemu_object_opts in vl.c
>> 
>>   This one pulls in properties of user-creatable objects.
>> 
>> * "machine", qemu_machine_opts in vl.c
>> 
>>   This one pulls in machine properties.
>
>> > > I know we have keys containing '.' elsewhere, e.g. device "macio-ide"
>> > > property "ide.0".  Our chronic inability to consistently restrict names
>> > > in ABI to something sane is beyond foolish.
>> >
>> > I wanted to have a look at this example, but I can only find the string
>> > "ide.0" used as a bus name in the sources, that is, a value rather than
>> > a key.
>> >
>> > Do you have a pointer to the property definition that you mean?
>> 
>> We've gotten better at hiding property definitions...
>> 
>> "qemu-system-ppc -device macio-ide,help" shows the property:
>> 
>>     macio-ide.ide.0=child<IDE>
>
> It is a bug that this property is shown in the help, because it's not
> assignable (same for all other child<> properties).

Let's fix it then.

>                                                      I'd rather declare
> other occurrences of "." in user-accessible property names to be bugs,
> and break the ABI if there are any.

I propose to adopt QAPI's rules on permissible names: "All names must
begin with a letter, and contain only ASCII letters, digits, dash, and
underscore" (docs/qapi-code-gen.txt).  QAPI's naming rules get adopted
anyway on QAPIfication, so why wait? :)

Note that this may affect names generated by automatic arrayification
(commit 3396590), because those contain [ and ].
Kevin Wolf Oct. 18, 2016, 9:34 a.m. UTC | #7
Am 17.10.2016 um 16:50 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 13.10.2016 um 14:35 hat Markus Armbruster geschrieben:
> >> Cc: Kevin for discussion of QemuOpts dotted key convention
> >> 
> >> "Daniel P. Berrange" <berrange@redhat.com> writes:
> >> 
> >> > Currently qdict_crumple requires a totally flat QDict as its
> >> > input. i.e. all values in the QDict must be scalar types.
> >> >
> >> > In order to have backwards compatibility with the OptsVisitor,
> >> > qemu_opt_to_qdict() has a new mode where it may return a QList
> >> > for values in the QDict, if there was a repeated key. We thus
> >> > need to allow compound types to appear as values in the input
> >> > dict given to qdict_crumple().
> >> >
> >> > To avoid confusion, we sanity check that the user has not mixed
> >> > the old and new syntax at the same time. e.g. these are allowed
> >> >
> >> >    foo=hello,foo=world,foo=wibble
> >> >    foo.0=hello,foo.1=world,foo.2=wibble
> >> >
> >> > but this is forbidden
> >> >
> >> >    foo=hello,foo=world,foo.2=wibble
> >> 
> >> I understand the need for foo.bar=val.  It makes it possible to specify
> >> nested dictionaries with QemuOpts.
> >> 
> >> The case for foo.0=val is less clear.  QemuOpts already supports lists,
> >> by repeating keys.  Why do we need a second, wordier way to specify
> >> them?
> >
> > Probably primarily because someone didn't realise this when introducing
> > the dotted syntax.
> 
> Can't even blame "someone" for that; it's an obscure, underdocumented
> feature of an interface that's collapsing under its load of obscure,
> underdocumented features.
> 
> On the other hand, that's not exactly a state that allows for *more*
> obscure features.

I don't really think we're introducing more obscure features here, but
rather trying to implement a single, and rather straightforward, way as
the standard.

Dotted syntax for hierarchy has actually plenty of precedence in qemu if
you look a bit closer (the block layer, -global, -device foo,help, even
the bus names you mentioned below are really just flattened lists), so
we're only making things more consistent.

> >                    Also because flat QDicts can't represent this.
> 
> Explain?

Repeated options are parsed into QLists. If you keep it at that without
flattening you have at least a QDict that contains a QList that contains
simple types. This is not flat any more.

Of course, you could argue that flat QDicts are the wrong data structure
in the first place and instead of flatting everything we should have
done the equivalent of qdict_crumple from the beginning, but they were
the natural extension of what was already there and happened to work
good enough, so this is what we're currently using.

> > But even if I realised that QemuOpts support this syntax, I think we
> > would still have to use the dotted syntax because it's explicit about
> > the index and we need that because the list can contains dicts.
> >
> > Compare this:
> >
> >     driver=quorum,
> >     child.0.driver=file,child.0.filename=disk1.img,
> >     child.1.driver=host_device,child.1.filename=/dev/sdb,
> >     child.2.driver=nbd,child.2.host=localhost
> >
> > And this:
> >
> >     driver=quorum,
> >     child.driver=file,child.filename=disk1.img,
> >     child.driver=host_device,child.filename=/dev/sdb,
> >     child.driver=nbd,child.host=localhost
> 
> Aside: both are about equally illegible, to be honest.

Not sure about equally, but let's agree on "both are illegible".

> > For starters, can we really trust the order in QemuOpts so that the
> > right driver and filename are associated with each other?
> 
> The order is trustworthy, but...
> 
> > We would also have code to associate the third child.driver with the
> > first child.host (because file and host_device don't have a host
> > property). And this isn't even touching optional arguments yet. Would
> > you really want to implement or review this?
> 
> ... you're right, doing lists by repeating keys breaks down when
> combined with the dotted key convention's use of repetition to do
> structured values.
> 
> Permit me to digress.
> 
> QemuOpts wasn't designed for list-values keys.  Doing lists by
> repetition was clever.
> 
> QemuOpts wasn't designed for structured values.  Doing structured values
> by a dotted key convention plus repetition was clever.
> 
> And there's the problem: too much cleverness, not enough "this is being
> pushed way beyond its design limits, time to replace it".
> 
> For me, a replacement should do structured values by providing suitable
> *value* syntax instead of hacking it into the keys:
> 
>     { "driver": "quorum",
>       "child": [ { "driver": "file", "filename": "disk1.img" },
>                  { "driver": "host_device", "filename=/dev/sdb" },
>                  { "driver": "nbd", "host": "localhost" } ] }
> 
> Note how the structure is obvious.  It isn't with dotted keys, not even
> if you order them sensibly (which users inevitably won't).
> 
> Also not that the value needs to be parsed by QemuOpts!  You can't leave
> it to the consumer of the QemuOpts, because if you did, you'd have to
> escape the commas.

Okay, so I like JSON. It's a great format for our monitor protocol. We
even have pretty printers so that it's more or less readable as output
for human users. However, there is one thing for which it is horrible:
Getting user input.

Yes, getting rid of the comma escaping is a first step, but typing JSON
on the command line remains a PITA. Mostly because of the quotes (which
you probably have to escape), but also things like the useless outer
brackets.

> If you'd rather invent syntax closer to QemuOpts than reuse JSON, you
> could try
> 
>     driver=quorum,
>     child=[{ driver=file, filename=disk1.img },
>            { driver=host_device, filename=/dev/sdb },
>            { driver=nbd, host=localhost } ]

This looks a lot more agreeable as something that I have to type on the
command line.

What's more, this is a direct extension of the existing syntax. You
complained about how the step from simple configurations with -hda to
more complex ones with -drive completely changes the syntax (and rightly
so). Going from simple QemuOpts syntax to doing JSON as soon as you get
structured values and lists would be another similar step. In contrast,
the new syntax you're suggesting above is a natural extension of what's
there.

> I'd go with some existing syntax, though.  The one we already use is
> JSON.

The one we already use in this context is comma separated key/value
pairs as parsed by QemuOpts.

> Your dotted key convention requires two rules: 1. names must not look
> like integers, and 2. names must not contain '.'.
> 
> We can avoid rule 2 by requiring '.' to be escaped.  Dan's
> qdict_crumple() currently does that, to your surprise.  Adding the
> escaping to existing options is a compatibility break, however.  So, if
> names with '.' already exist, we can either break compatibility by
> renaming them, or break it by requiring the '.' to be escaped.

I don't think we should support any escaping and rather forbid '.'
completely in names.

> The names in question are the names declared in QemuOptDesc (easy enough
> to find) plus any names that the code may use with empty QemuOptDesc
> (harder).  Empty QemuOptDesc are used with the following option groups:
> 
> * "drive", qemu_drive_opts in blockdev.c
>   Also empty_opts in qemu-io.c and qemu_drive_opts in test-replication.c
> 
> * "numa", qemu_numa_opts in numa.c
> 
> * "device", qemu_device_opts in qdev-monitor.c
> 
>   This one pulls in qdev properties.  Properties violating rule 2 exist.

You posted a list of them in another email. Apart from not being
user-configurable, I think it actually makes sense to interpret these as
being structured.

A lot of them are really array/list properties with a single entry. It
seems we have some inconsistency there as to whether it the syntax is
'list.index' or 'list[index]'. The former matches the dotted syntax
rule and is used in external interfaces (e.g. in bus=...), and I seem to
remember the latter is a newer QOM thing.

Would it be too late to change the latter into the former? In other
words, is the difference externally visible outside of qom-*, which we
declared a non-ABI, as far as I know?

> * "object", qemu_object_opts in vl.c
>   Also qemu_object_opts in qemu-img.c, qemu-io.c and qemu-nbd.c (WTF?)
> 
>   This one pulls in properties of user-creatable objects.
> 
> * "source", qemu_soource_opts (sic!) in qemu-img.c
> 
> * "reopen", reopen_opts in qemu-io-cmds.c
> 
> * "file", file_opts in qemu-io.c and qemu-nbd.c
> 
> * "machine", qemu_machine_opts in vl.c
> 
>   This one pulls in machine properties.
> 
> * "tpmdev", qemu_tpmdev_opts in vl.c
> 
> * "netdev", qemu_netdev_opts in net.c
> 
> * "net", qemu_net_opts in net.c
> 
> * "userdef", userdef_opts in tests-opts-visitor.c
> 
> * "opts_list_03", opts_list_03 in test-qemu-opts.c
> 
> * "acpi", qemu_acpi_opts in hw/acpi/core.c
> 
> * "smbios", qemu_smbios_opts in smbios.c
> 
> Empty QemuOptDesc is used this widely because QemuOpts is too primitive
> to support the needs of its users.  Unlike a QAPI schema.  Just sayin'.

I don't think you have to convince anyone that QemuOpts is too limited.
But today it's still a piece that is needed to get from the command line
to something superior like QAPI. This is the whole reason why we're
doing all of this.

Completely QAPIfication of everything and going directly from the
command line to QAPI objects would be nice, but somehow I feel it's not
a realistic hope to get this as a short term solution.

> Finding the names and assessing how they're impacted by the dotted key
> convention is going to be a substantial chunk of work.

Possibly.

The other option would be to take it step by step and let every user of
QemuOpts specify on creation whether dots are used for structure or
should be taken literally. Then you can convert user by user and assess
them one at a time (which is still a substantial chunk for -object and
-device, but at least you can concentrate on only these for now).

If you're concerned about compatibility issues if we find a dot in a
name only in one of the later users: We can handle it. If you then
stumble across an obscure "weird.name" property where the dot is really
part of the name and not already representing structure, you can still
artificially reinterpret it as a nested structure with a single field
even if logically that's not what it is... Might be a bit ugly, but
keeps it working.

Kevin
Markus Armbruster Oct. 18, 2016, 10:59 a.m. UTC | #8
Eric Blake <eblake@redhat.com> writes:

> On 10/17/2016 09:50 AM, Markus Armbruster wrote:
>>> But even if I realised that QemuOpts support this syntax, I think we
>>> would still have to use the dotted syntax because it's explicit about
>>> the index and we need that because the list can contains dicts.
>>>
>>> Compare this:
>>>
>>>     driver=quorum,
>>>     child.0.driver=file,child.0.filename=disk1.img,
>>>     child.1.driver=host_device,child.1.filename=/dev/sdb,
>>>     child.2.driver=nbd,child.2.host=localhost
>>>
>>> And this:
>>>
>>>     driver=quorum,
>>>     child.driver=file,child.filename=disk1.img,
>>>     child.driver=host_device,child.filename=/dev/sdb,
>>>     child.driver=nbd,child.host=localhost
>> 
>> Aside: both are about equally illegible, to be honest.
>
>> Permit me to digress.
>> 
>> QemuOpts wasn't designed for list-values keys.  Doing lists by
>> repetition was clever.
>> 
>> QemuOpts wasn't designed for structured values.  Doing structured values
>> by a dotted key convention plus repetition was clever.
>> 
>> And there's the problem: too much cleverness, not enough "this is being
>> pushed way beyond its design limits, time to replace it".
>> 
>> For me, a replacement should do structured values by providing suitable
>> *value* syntax instead of hacking it into the keys:
>> 
>>     { "driver": "quorum",
>>       "child": [ { "driver": "file", "filename": "disk1.img" },
>>                  { "driver": "host_device", "filename=/dev/sdb" },
>>                  { "driver": "nbd", "host": "localhost" } ] }
>
> Possible hack solution:
>
> QemuOpts already special-cases id=.  What if we ALSO make it
> special-case a leading json=?  Shown here with shell quoting, the above
> example of creating a Quorum -drive argument could then be:
>
> -drive json='
>     { "driver": "quorum",
>       "child": [ { "driver": "file", "filename": "disk1.img" },
>                  { "driver": "host_device", "filename=/dev/sdb" },
>                  { "driver": "nbd", "host": "localhost" } ] }
> '
>
> As far as I know, we don't have 'json' as any existing QemuOpts key (do
> we? A full audit may be better than my quick git grep '"json"').  Thus,
> if QemuOpts sees a leading json=, it hands off the rest of the string to
> the same parser as we use for QMP, where we no longer have to escape
> commas (even nicer than the drive hack where we support
> filename=json:{...} but have to double up all commas to make it through
> the QemuOpts layer).  Encountering json= as anything other than the
> first option would be an error, and you would be unable to combine a
> json= option with any other old-style option.  In other words, the use
> of leading json= would be the switch for whether to do old-style parsing
> or to use a saner syntax for everything else that needs structure, on a
> per-argument basis.

Slight variation: omit the 'json=' and recognize the '{':

    -drive '{ "driver": "quorum",
              "child": [ { "driver": "file", "filename": "disk1.img" },
                         { "driver": "host_device", "filename=/dev/sdb" },
                         { "driver": "nbd", "host": "localhost" } ] }'

As always when extending haphazardly defined syntax, the question to ask
is whether this makes the syntax (more) ambiguous.

So what is the option argument syntax now?  The abstract syntax is
simple enough: "list of (key, value) pairs".  For the concrete syntax,
we need to study opts_do_parse().

Each iteration of its loop accepts an abstract (key, value).  It first
looks for the leftmost '=' and ','.  Cases:

1. There is no '=', or it is to the right of the leftmost ','.  In other
words, this (key, value) can only be key with an implied value or a
value with an implied key.

1a. If this is the first iteration, and we accept an implied key, this
is a value for the implied key.  We consume everything up to the first
non-escaped ','.  This may be more than the leftmost ',' we found above.
The consumed string with escapes processed is the value.

1b. Else, this is a key with an implied value.  We consume everything up
to the leftmost ',' (no escaping here).

1b1. If the consumed string starts with "no", the key is everything after
the "no" and the value is "off".

1b2. Else the key is the string and the value is "on".

2. This is a key and a value.  We first consume everything up to the
leftmost '=' (no escaping here).  The consumed string is the key.  We
then consume '='.  Finally, we consume everything up to the first
non-escaped ','The consumed string with escapes processed is the value.

Thus, the option argument starts either with a key (case 1b1, 2), "no"
(case 1b2) or a value (case 1a).

Adding JSON object syntax (which always starts with '{') is ambiguous
when a key can start with '{' (case 1b1, 2) or when a value can (case
1a).

Keys starting with '{' are basically foolish.  Let's outlaw them by
adopting QAPI's name rules.

Values starting with '{' are possible.  The implied keys I can remember
don't have such values, though.  If we interpret '{' as JSON, then users
can't omit the key when the value starts with '{'.  Not pretty, but I'd
say it's tolerable.
Markus Armbruster Oct. 18, 2016, 3:35 p.m. UTC | #9
Kevin Wolf <kwolf@redhat.com> writes:

> Am 17.10.2016 um 16:50 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 13.10.2016 um 14:35 hat Markus Armbruster geschrieben:
>> >> Cc: Kevin for discussion of QemuOpts dotted key convention
>> >> 
>> >> "Daniel P. Berrange" <berrange@redhat.com> writes:
>> >> 
>> >> > Currently qdict_crumple requires a totally flat QDict as its
>> >> > input. i.e. all values in the QDict must be scalar types.
>> >> >
>> >> > In order to have backwards compatibility with the OptsVisitor,
>> >> > qemu_opt_to_qdict() has a new mode where it may return a QList
>> >> > for values in the QDict, if there was a repeated key. We thus
>> >> > need to allow compound types to appear as values in the input
>> >> > dict given to qdict_crumple().
>> >> >
>> >> > To avoid confusion, we sanity check that the user has not mixed
>> >> > the old and new syntax at the same time. e.g. these are allowed
>> >> >
>> >> >    foo=hello,foo=world,foo=wibble
>> >> >    foo.0=hello,foo.1=world,foo.2=wibble
>> >> >
>> >> > but this is forbidden
>> >> >
>> >> >    foo=hello,foo=world,foo.2=wibble
>> >> 
>> >> I understand the need for foo.bar=val.  It makes it possible to specify
>> >> nested dictionaries with QemuOpts.
>> >> 
>> >> The case for foo.0=val is less clear.  QemuOpts already supports lists,
>> >> by repeating keys.  Why do we need a second, wordier way to specify
>> >> them?
>> >
>> > Probably primarily because someone didn't realise this when introducing
>> > the dotted syntax.
>> 
>> Can't even blame "someone" for that; it's an obscure, underdocumented
>> feature of an interface that's collapsing under its load of obscure,
>> underdocumented features.
>> 
>> On the other hand, that's not exactly a state that allows for *more*
>> obscure features.
>
> I don't really think we're introducing more obscure features here, but
> rather trying to implement a single, and rather straightforward, way as
> the standard.
>
> Dotted syntax for hierarchy has actually plenty of precedence in qemu if
> you look a bit closer (the block layer, -global, -device foo,help, even
> the bus names you mentioned below are really just flattened lists), so
> we're only making things more consistent.
>
>> >                    Also because flat QDicts can't represent this.
>> 
>> Explain?
>
> Repeated options are parsed into QLists. If you keep it at that without
> flattening you have at least a QDict that contains a QList that contains
> simple types. This is not flat any more.

*All* options are parsed into a (non-QList) list.  That's what is flat.

Only when you start crumpling things go beyond flat, and QDict/QList
come into play.

> Of course, you could argue that flat QDicts are the wrong data structure
> in the first place and instead of flatting everything we should have
> done the equivalent of qdict_crumple from the beginning, but they were
> the natural extension of what was already there and happened to work
> good enough, so this is what we're currently using.

We didn't "flatten everything", because QemuOpts is and has always been
flat to begin with.  Rather we've started to crumple a few things, and
in a separate layer.

I now think this is the wrong approach.  We have clearly outgrown flat
options.  We should redo QemuOpts to support what we need instead of
bolting on an optional crumpling layer.

I guess I reached the place you described, just on a different path :)

>> > But even if I realised that QemuOpts support this syntax, I think we
>> > would still have to use the dotted syntax because it's explicit about
>> > the index and we need that because the list can contains dicts.
>> >
>> > Compare this:
>> >
>> >     driver=quorum,
>> >     child.0.driver=file,child.0.filename=disk1.img,
>> >     child.1.driver=host_device,child.1.filename=/dev/sdb,
>> >     child.2.driver=nbd,child.2.host=localhost
>> >
>> > And this:
>> >
>> >     driver=quorum,
>> >     child.driver=file,child.filename=disk1.img,
>> >     child.driver=host_device,child.filename=/dev/sdb,
>> >     child.driver=nbd,child.host=localhost
>> 
>> Aside: both are about equally illegible, to be honest.
>
> Not sure about equally, but let's agree on "both are illegible".
>
>> > For starters, can we really trust the order in QemuOpts so that the
>> > right driver and filename are associated with each other?
>> 
>> The order is trustworthy, but...
>> 
>> > We would also have code to associate the third child.driver with the
>> > first child.host (because file and host_device don't have a host
>> > property). And this isn't even touching optional arguments yet. Would
>> > you really want to implement or review this?
>> 
>> ... you're right, doing lists by repeating keys breaks down when
>> combined with the dotted key convention's use of repetition to do
>> structured values.
>> 
>> Permit me to digress.
>> 
>> QemuOpts wasn't designed for list-values keys.  Doing lists by
>> repetition was clever.
>> 
>> QemuOpts wasn't designed for structured values.  Doing structured values
>> by a dotted key convention plus repetition was clever.
>> 
>> And there's the problem: too much cleverness, not enough "this is being
>> pushed way beyond its design limits, time to replace it".
>> 
>> For me, a replacement should do structured values by providing suitable
>> *value* syntax instead of hacking it into the keys:
>> 
>>     { "driver": "quorum",
>>       "child": [ { "driver": "file", "filename": "disk1.img" },
>>                  { "driver": "host_device", "filename=/dev/sdb" },
>>                  { "driver": "nbd", "host": "localhost" } ] }
>> 
>> Note how the structure is obvious.  It isn't with dotted keys, not even
>> if you order them sensibly (which users inevitably won't).
>> 
>> Also not that the value needs to be parsed by QemuOpts!  You can't leave
>> it to the consumer of the QemuOpts, because if you did, you'd have to
>> escape the commas.
>
> Okay, so I like JSON. It's a great format for our monitor protocol. We
> even have pretty printers so that it's more or less readable as output
> for human users. However, there is one thing for which it is horrible:
> Getting user input.
>
> Yes, getting rid of the comma escaping is a first step, but typing JSON
> on the command line remains a PITA. Mostly because of the quotes (which
> you probably have to escape), but also things like the useless outer
> brackets.

As long as you don't need "'" in your JSON, you can simply enclose in
"'" and be done.  Since "'" can only occur in JSON strings, and the same
strings would be present in any other syntax, any other syntax would
be pretty much the same PITA.

>> If you'd rather invent syntax closer to QemuOpts than reuse JSON, you
>> could try
>> 
>>     driver=quorum,
>>     child=[{ driver=file, filename=disk1.img },
>>            { driver=host_device, filename=/dev/sdb },
>>            { driver=nbd, host=localhost } ]
>
> This looks a lot more agreeable as something that I have to type on the
> command line.
>
> What's more, this is a direct extension of the existing syntax. You
> complained about how the step from simple configurations with -hda to
> more complex ones with -drive completely changes the syntax (and rightly
> so). Going from simple QemuOpts syntax to doing JSON as soon as you get
> structured values and lists would be another similar step. In contrast,
> the new syntax you're suggesting above is a natural extension of what's
> there.

Point taken.  I just don't like inventing syntax, because as a rule, way
too much syntax gets invented, and almost always badly.

>> I'd go with some existing syntax, though.  The one we already use is
>> JSON.
>
> The one we already use in this context is comma separated key/value
> pairs as parsed by QemuOpts.

Which is flat, and flat doesn't cut it anymore.

If you make it non-flat with dotted key convention, the dotted key
convention becomes part of the syntax.  Counts as inventing syntax in my
book.

>> Your dotted key convention requires two rules: 1. names must not look
>> like integers, and 2. names must not contain '.'.
>> 
>> We can avoid rule 2 by requiring '.' to be escaped.  Dan's
>> qdict_crumple() currently does that, to your surprise.  Adding the
>> escaping to existing options is a compatibility break, however.  So, if
>> names with '.' already exist, we can either break compatibility by
>> renaming them, or break it by requiring the '.' to be escaped.
>
> I don't think we should support any escaping and rather forbid '.'
> completely in names.

I think we should adopt QAPI's naming rules.

>> The names in question are the names declared in QemuOptDesc (easy enough
>> to find) plus any names that the code may use with empty QemuOptDesc
>> (harder).  Empty QemuOptDesc are used with the following option groups:
>> 
>> * "drive", qemu_drive_opts in blockdev.c
>>   Also empty_opts in qemu-io.c and qemu_drive_opts in test-replication.c
>> 
>> * "numa", qemu_numa_opts in numa.c
>> 
>> * "device", qemu_device_opts in qdev-monitor.c
>> 
>>   This one pulls in qdev properties.  Properties violating rule 2 exist.
>
> You posted a list of them in another email. Apart from not being
> user-configurable, I think it actually makes sense to interpret these as
> being structured.
>
> A lot of them are really array/list properties with a single entry. It
> seems we have some inconsistency there as to whether it the syntax is
> 'list.index' or 'list[index]'.

Yes.

>                                The former matches the dotted syntax
> rule and is used in external interfaces (e.g. in bus=...), and I seem to
> remember the latter is a newer QOM thing.

Commit 3396590 "qom: Add automatic arrayification to
object_property_add()".  I tried to strangle it in the crib,
unsuccessfully:
Message-Id: <1415812130-2592-1-git-send-email-armbru@redhat.com>
http://lists.gnu.org/archive/html/qemu-devel/2014-11/msg01905.html

> Would it be too late to change the latter into the former? In other
> words, is the difference externally visible outside of qom-*, which we
> declared a non-ABI, as far as I know?

Paolo?

>> * "object", qemu_object_opts in vl.c
>>   Also qemu_object_opts in qemu-img.c, qemu-io.c and qemu-nbd.c (WTF?)
>> 
>>   This one pulls in properties of user-creatable objects.
>>
>> * "source", qemu_soource_opts (sic!) in qemu-img.c
>> 
>> * "reopen", reopen_opts in qemu-io-cmds.c
>> 
>> * "file", file_opts in qemu-io.c and qemu-nbd.c
>> 
>> * "machine", qemu_machine_opts in vl.c
>> 
>>   This one pulls in machine properties.
>> 
>> * "tpmdev", qemu_tpmdev_opts in vl.c
>> 
>> * "netdev", qemu_netdev_opts in net.c
>> 
>> * "net", qemu_net_opts in net.c
>> 
>> * "userdef", userdef_opts in tests-opts-visitor.c
>> 
>> * "opts_list_03", opts_list_03 in test-qemu-opts.c
>> 
>> * "acpi", qemu_acpi_opts in hw/acpi/core.c
>> 
>> * "smbios", qemu_smbios_opts in smbios.c
>> 
>> Empty QemuOptDesc is used this widely because QemuOpts is too primitive
>> to support the needs of its users.  Unlike a QAPI schema.  Just sayin'.
>
> I don't think you have to convince anyone that QemuOpts is too limited.
> But today it's still a piece that is needed to get from the command line
> to something superior like QAPI. This is the whole reason why we're
> doing all of this.
>
> Completely QAPIfication of everything and going directly from the
> command line to QAPI objects would be nice, but somehow I feel it's not
> a realistic hope to get this as a short term solution.

No argument.  However, we should try hard to avoid short term hacks that
make a QAPI-based solution harder.  I believe the hard part will be
retaining all the special cases that got in for convenience, or by
accident, then were repurposed for whatever.  And that's why I'm
resisting additional syntactic/semantic cleverness *especially* when
it's used only sometimes.  Stuff that's used only sometimes can easily
conflict with stuff used at other times.

I don't think we understand the existing cleverness sufficiently to
predict interactions with new cleverness with confidence.

>> Finding the names and assessing how they're impacted by the dotted key
>> convention is going to be a substantial chunk of work.
>
> Possibly.
>
> The other option would be to take it step by step and let every user of
> QemuOpts specify on creation whether dots are used for structure or
> should be taken literally. Then you can convert user by user and assess
> them one at a time (which is still a substantial chunk for -object and
> -device, but at least you can concentrate on only these for now).

I'm wary of protracted iterative conversion.  We're better at starting
these than finishing them.  The started-but-not-finished state is
exactly the state that scares me here: still more special cases
basically nobody fully understands.

> If you're concerned about compatibility issues if we find a dot in a
> name only in one of the later users: We can handle it. If you then
> stumble across an obscure "weird.name" property where the dot is really
> part of the name and not already representing structure, you can still
> artificially reinterpret it as a nested structure with a single field
> even if logically that's not what it is... Might be a bit ugly, but
> keeps it working.

Yes, that's how we'd do backward compatibility, if needed.

So, what can we do?  Perhaps something like this:

* Pick a concrete syntax that supports nested stuff.  Proposals on the
  table so far are

  1. Dotted key convention as used in the block layer

  2. Array syntax as used in QOM's automatic arrayification (not a
     complete solution, mentioned because it conflicts with the
     previous)

  3. JSON

  4. JSON with ':' replaced by '=', double-quotes and the outermost pair
     of braces dropped.

* Replace the QemuOpts internal representation: QDict instead of list.

* Replace the QemuOpts parser.

* Add suitable struct- and list-valued accessors.

  qdict_crumple(qemu_opt_to_qdict(...), true, &errp) gets replaced by a
  trivial "get the whole dictionary" operation.

  Other uses of qemu_opt_to_qdict() need to be rewritten.

* "Repetition builds a list" backward compatibility.  It's used in only
  a few places.

* Possibly "keys with funny characters" backward compatibility.

Feels doable to me.  2.8 would be ambitious, though: soft freeze in in
less than two weeks.  However, getting this series into 2.8 has started
to feel ambitious to me, too.
Kevin Wolf Oct. 19, 2016, 9:25 a.m. UTC | #10
Am 18.10.2016 um 17:35 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> > Of course, you could argue that flat QDicts are the wrong data structure
> > in the first place and instead of flatting everything we should have
> > done the equivalent of qdict_crumple from the beginning, but they were
> > the natural extension of what was already there and happened to work
> > good enough, so this is what we're currently using.
> 
> We didn't "flatten everything", because QemuOpts is and has always been
> flat to begin with.  Rather we've started to crumple a few things, and
> in a separate layer.

That's the QemuOpts point of view.

I was talking from a block layer view. There we get data in two
different formats, QMP and the command line. The first is structured,
the second is flat. We need to make both uniform before we can pass them
to the actual block layer functions.

The current code chose to flatten what we get from QMP blockdev-add and
use that throughout the block layer. We could instead have decided that
we leave the blockdev-add input as it is, crumple what we get from
QemuOpts and use nested QObjects throughout the block layer.

> I now think this is the wrong approach.  We have clearly outgrown flat
> options.  We should redo QemuOpts to support what we need instead of
> bolting on an optional crumpling layer.
> 
> I guess I reached the place you described, just on a different path :)

Yes, I think the conclusion is easy to agree on.

> > Okay, so I like JSON. It's a great format for our monitor protocol. We
> > even have pretty printers so that it's more or less readable as output
> > for human users. However, there is one thing for which it is horrible:
> > Getting user input.
> >
> > Yes, getting rid of the comma escaping is a first step, but typing JSON
> > on the command line remains a PITA. Mostly because of the quotes (which
> > you probably have to escape), but also things like the useless outer
> > brackets.
> 
> As long as you don't need "'" in your JSON, you can simply enclose in
> "'" and be done.  Since "'" can only occur in JSON strings, and the same
> strings would be present in any other syntax, any other syntax would
> be pretty much the same PITA.

I've written enough scripts (qemu-iotests cases) that produce JSON with
shell variables in it, so if anything you can use "" quoting for the
shell and make use of qemu's extension that '' is accepted in JSON, too.

Anyway, the quotes aren't only nasty because of the escaping, but also
just because I have to type them.

> >> If you'd rather invent syntax closer to QemuOpts than reuse JSON, you
> >> could try
> >> 
> >>     driver=quorum,
> >>     child=[{ driver=file, filename=disk1.img },
> >>            { driver=host_device, filename=/dev/sdb },
> >>            { driver=nbd, host=localhost } ]
> >
> > This looks a lot more agreeable as something that I have to type on the
> > command line.
> >
> > What's more, this is a direct extension of the existing syntax. You
> > complained about how the step from simple configurations with -hda to
> > more complex ones with -drive completely changes the syntax (and rightly
> > so). Going from simple QemuOpts syntax to doing JSON as soon as you get
> > structured values and lists would be another similar step. In contrast,
> > the new syntax you're suggesting above is a natural extension of what's
> > there.
> 
> Point taken.  I just don't like inventing syntax, because as a rule, way
> too much syntax gets invented, and almost always badly.
> 
> >> I'd go with some existing syntax, though.  The one we already use is
> >> JSON.
> >
> > The one we already use in this context is comma separated key/value
> > pairs as parsed by QemuOpts.
> 
> Which is flat, and flat doesn't cut it anymore.
> 
> If you make it non-flat with dotted key convention, the dotted key
> convention becomes part of the syntax.  Counts as inventing syntax in my
> book.

Yes, it is. Though in the context of command line options, dotted syntax
is an invention already made, whereas JSON would be a new invention.
(Well, not completely, because for block devices we already have json: -
but that works a little different again.)

> >> Your dotted key convention requires two rules: 1. names must not look
> >> like integers, and 2. names must not contain '.'.
> >> 
> >> We can avoid rule 2 by requiring '.' to be escaped.  Dan's
> >> qdict_crumple() currently does that, to your surprise.  Adding the
> >> escaping to existing options is a compatibility break, however.  So, if
> >> names with '.' already exist, we can either break compatibility by
> >> renaming them, or break it by requiring the '.' to be escaped.
> >
> > I don't think we should support any escaping and rather forbid '.'
> > completely in names.
> 
> I think we should adopt QAPI's naming rules.

Which includes what I said, so fine with me.

> > If you're concerned about compatibility issues if we find a dot in a
> > name only in one of the later users: We can handle it. If you then
> > stumble across an obscure "weird.name" property where the dot is really
> > part of the name and not already representing structure, you can still
> > artificially reinterpret it as a nested structure with a single field
> > even if logically that's not what it is... Might be a bit ugly, but
> > keeps it working.
> 
> Yes, that's how we'd do backward compatibility, if needed.
> 
> So, what can we do?  Perhaps something like this:
> 
> * Pick a concrete syntax that supports nested stuff.  Proposals on the
>   table so far are
> 
>   1. Dotted key convention as used in the block layer
> 
>   2. Array syntax as used in QOM's automatic arrayification (not a
>      complete solution, mentioned because it conflicts with the
>      previous)
> 
>   3. JSON
> 
>   4. JSON with ':' replaced by '=', double-quotes and the outermost pair
>      of braces dropped.

Biased block layer view on what we ideally want to have in the end:
Anthony's QCFG proposal plus a [] syntax for lists.

This is more or less what you describe in option 4, except that the
existing dotted syntax is still supported. It is part of the ABI by now,
so we can't get rid of it anyway, and supporting it everywhere is not
only more consistent but probably also easier than making it a special
case.

The tricky part here is the {} and [] syntax for dicts and lists:
Parsing this requires a schema if you don't want to restrict string
values to not contain any of these characters. Such a restriction would
be incompatible, so we probably can't make it.

Essentially this means that we're back to "QAPIfy everything" before we
can make this work. Leaves us with implementing only the dotted syntax
subset for now, which is equally powerful, even if a lot more verbose.

> * Replace the QemuOpts internal representation: QDict instead of list.
> 
> * Replace the QemuOpts parser.
> 
> * Add suitable struct- and list-valued accessors.
> 
>   qdict_crumple(qemu_opt_to_qdict(...), true, &errp) gets replaced by a
>   trivial "get the whole dictionary" operation.
> 
>   Other uses of qemu_opt_to_qdict() need to be rewritten.
> 
> * "Repetition builds a list" backward compatibility.  It's used in only
>   a few places.
> 
> * Possibly "keys with funny characters" backward compatibility.
> 
> Feels doable to me.  2.8 would be ambitious, though: soft freeze in in
> less than two weeks.  However, getting this series into 2.8 has started
> to feel ambitious to me, too.

Let's merge qdict_crumple() at least, this would allow us to add the SSH
and NBD blockdev-add support for 2.8, possibly NFS to. No command line
syntax involved at all there. I can take the patch through my tree if
nobody objects.

With -blockdev I'll have to wait for the final decision, but if we
decide to use a syntax that includes dotted syntax (possibly as a
subset), I think this series would provide the right external interface
even if there is still some infrastructure cleanup work to do
internally.

Kevin
Daniel P. Berrangé Oct. 19, 2016, 9:42 a.m. UTC | #11
On Wed, Oct 19, 2016 at 11:25:27AM +0200, Kevin Wolf wrote:
> Am 18.10.2016 um 17:35 hat Markus Armbruster geschrieben:
> > Kevin Wolf <kwolf@redhat.com> writes:
> > > Of course, you could argue that flat QDicts are the wrong data structure
> > > in the first place and instead of flatting everything we should have
> > > done the equivalent of qdict_crumple from the beginning, but they were
> > > the natural extension of what was already there and happened to work
> > > good enough, so this is what we're currently using.
> > 
> > We didn't "flatten everything", because QemuOpts is and has always been
> > flat to begin with.  Rather we've started to crumple a few things, and
> > in a separate layer.
> 
> That's the QemuOpts point of view.
> 
> I was talking from a block layer view. There we get data in two
> different formats, QMP and the command line. The first is structured,
> the second is flat. We need to make both uniform before we can pass them
> to the actual block layer functions.
> 
> The current code chose to flatten what we get from QMP blockdev-add and
> use that throughout the block layer. We could instead have decided that
> we leave the blockdev-add input as it is, crumple what we get from
> QemuOpts and use nested QObjects throughout the block layer.

I would very much like to see BlockdevOptions structs passed around
the block drivers, instead of QemuOpts - I think it'd lead to much
clearer code than the QemuOpts stuff we have today IMHO

Regards,
Daniel
Eric Blake Oct. 19, 2016, 1:31 p.m. UTC | #12
On 10/18/2016 10:35 AM, Markus Armbruster wrote:

>>>> But even if I realised that QemuOpts support this syntax, I think we
>>>> would still have to use the dotted syntax because it's explicit about
>>>> the index and we need that because the list can contains dicts.
>>>>
>>>> Compare this:
>>>>
>>>>     driver=quorum,
>>>>     child.0.driver=file,child.0.filename=disk1.img,
>>>>     child.1.driver=host_device,child.1.filename=/dev/sdb,
>>>>     child.2.driver=nbd,child.2.host=localhost
>>>>
>>>> And this:
>>>>
>>>>     driver=quorum,
>>>>     child.driver=file,child.filename=disk1.img,
>>>>     child.driver=host_device,child.filename=/dev/sdb,
>>>>     child.driver=nbd,child.host=localhost
>>>
>>> Aside: both are about equally illegible, to be honest.

Furthermore, both of these currently require really long command lines.
Overnight, I was thinking about whether it would be worth improving
QemuOpts to ignore whitespace after (unquoted) commas, so you can do:

-drive 'driver=quorum,
        child.0.driver=file,        child.0.filename=disk1.img,
        child.1.driver=host_device, child.1.filename=/dev/sdb,
        child.2.driver=nbd,         child.2.host=localhost
'

I think this one should be just fine - we don't have ANY keys that start
with leading whitespace, so ignoring space will still let us parse out
key names, but allow for much more legibility in the command lines.  And
this is true whether we use dotted notation...

>>> If you'd rather invent syntax closer to QemuOpts than reuse JSON, you
>>> could try
>>>
>>>     driver=quorum,
>>>     child=[{ driver=file, filename=disk1.img },
>>>            { driver=host_device, filename=/dev/sdb },
>>>            { driver=nbd, host=localhost } ]
>>
>> This looks a lot more agreeable as something that I have to type on the
>> command line.

...or new syntax (where of course the new syntax has already
demonstrated that we want to support strategic ignoring of whitespace).

So I'll probably propose a patch along those lines soon, as it seems
like an orthogonal improvement.
Daniel P. Berrangé Oct. 20, 2016, 2:46 p.m. UTC | #13
On Thu, Oct 13, 2016 at 02:35:38PM +0200, Markus Armbruster wrote:
> Cc: Kevin for discussion of QemuOpts dotted key convention
> 
> "Daniel P. Berrange" <berrange@redhat.com> writes:
> 
> > Currently qdict_crumple requires a totally flat QDict as its
> > input. i.e. all values in the QDict must be scalar types.
> >
> > In order to have backwards compatibility with the OptsVisitor,
> > qemu_opt_to_qdict() has a new mode where it may return a QList
> > for values in the QDict, if there was a repeated key. We thus
> > need to allow compound types to appear as values in the input
> > dict given to qdict_crumple().
> >
> > To avoid confusion, we sanity check that the user has not mixed
> > the old and new syntax at the same time. e.g. these are allowed
> >
> >    foo=hello,foo=world,foo=wibble
> >    foo.0=hello,foo.1=world,foo.2=wibble
> >
> > but this is forbidden
> >
> >    foo=hello,foo=world,foo.2=wibble
> 
> I understand the need for foo.bar=val.  It makes it possible to specify
> nested dictionaries with QemuOpts.
> 
> The case for foo.0=val is less clear.  QemuOpts already supports lists,
> by repeating keys.  Why do we need a second, wordier way to specify
> them?

Two reasons I did this. First blockdev already uses this foo.0=val
syntax, and I wanted to be compatible with blockdev, so it could be
converted to use this new code.

Second, using foo.0 syntax means that you can unambigously determine
whether a key is going to be a scalar or a list. This lets the
qdict_crumple() method convert the QemuOpts to a QDict without
needing to know anything about the QAPI schema.

Of course I later had to add hacks to the visitor to cope with
the bare repeated key syntax, so I lost some of that benefit.

Personally I still prefer the unambiguous syntax as it lets us
give clear error messages when users do unexpected things, instead
of say, silently ignoring all but the last key.

> Note that this second way creates entirely new failure modes and
> restrictions.  Let me show using an example derived from one in
> qdict_crumple()'s contract:
> 
>     foo.0.bar=bla,foo.eek.bar=blubb
> 
>     Without the dotted key convention, this is perfectly fine: key
>     "foo.0.bar" has the single value "bla", and key "foo.eek.bar" has
>     the single value "blubb".  Equivalent JSON would be
> 
>       { "foo.0.bar": "bla", "foo.eek.bar": "blubb" }
> 
>     With just the struct convention, it's still fine: it obviously means
>     the same as JSON
> 
>       { "foo": { "0": { "bar": "bla" }, "eek": { "bar": "blubb" } } }
> 
>     Adding the list convention makes it invalid.  It also outlaws a
>     bunch of keys that would be just fine in JSON, namely any that get
>     recognized as list index.  Raise your hand if you're willing to bet
>     real money on your predictions of what will be recognized as list
>     index, without looking at the code.  I'm not.
> 
> I'm afraid I have growing doubts regarding the QemuOpts dotted key
> convention in general.
> 
> The convention makes '.' a special character in keys, but only
> sometimes.  If the key gets consumed by something that uses dotted key
> convention, '.' is special, and to get a non-special '.', you need to
> escape it by doubling.  Else, it's not.
> 
> Since the same key can be used differently by different code, the same
> '.' could in theory be both special and non-special.  In practice, this
> would be madness.
> 
> Adopting the dotted key convention for an existing QemuOpts option, say
> -object [PATCH 15], *breaks* existing command line usage of keys
> containing '.', because you now have to escape the '.'.  Dan, I'm afraid
> you need to show that no such keys exist, or if they exist they don't
> matter.

I checked the things that I converted (eg -net, -object, -numa, etc),
but I didn't check -device since that's processed using different code.

> 
> I know we have keys containing '.' elsewhere, e.g. device "macio-ide"
> property "ide.0".  Our chronic inability to consistently restrict names
> in ABI to something sane is beyond foolish.
> 
> It's probably too late to back out the dotted key convention completely.
> Kevin?
> 
> Can we still back out the list part of the convention, and use repeated
> keys instead?
> 
> If we're stuck with some form of the dotted key convention, can we at
> least make it a more integral part of QemuOpts rather than something
> bolted on as an afterthought?  Here's my thinking on how that might be
> done:

The only issue with dropping the dotted list convention is compat
with the block layer code - we couldn't easily use this new visitor
logic to turn -drive into a QAPI BlockOptions object.  Kevin's new
-blockdev arg would potentially be ok with it since its a new arg,
but IIUC, we would have to do some cleanup inside various block
driver impls, since block layer doesn't use the QAPI objects
internally - they all get converted back into QemuOpts :-(

> 
> * Have a QemuOptsList flag @flat.
> 
> * If @flat, QemuOpts behaves as it always has: the special characters
>   are ',' and '=', and parsing a key=value,... string produces a list
>   where each element represents one key=value from the string, in the
>   same order.
> 
> * If not @flat, '.' becomes an additional special character, and parsing
>   a key=value,... string produces a dictionary, similar to the one you
>   get now by converting with qemu_opts_to_qdict() and filtering through
>   qdict_crumple().
> 
> The difference to now is that you either always crumple, or not at all,
> and the meaning of '.' is unambiguous.
> 
> I wish we had refrained from saddling QemuOpts with even more magic.
> Compared to this swamp, use of JSON on the command line looks rather
> appealing to me.

Regards,
Daniel
diff mbox

Patch

diff --git a/qobject/qdict.c b/qobject/qdict.c
index c38e90e..83384b2 100644
--- a/qobject/qdict.c
+++ b/qobject/qdict.c
@@ -814,15 +814,16 @@  static int qdict_is_list(QDict *maybe_list, Error **errp)
 
 /**
  * qdict_crumple:
- * @src: the original flat dictionary (only scalar values) to crumple
+ * @src: the original dictionary to crumple
  * @recursive: true to recursively crumple nested dictionaries
  *
- * Takes a flat dictionary whose keys use '.' separator to indicate
- * nesting, and values are scalars, and crumples it into a nested
- * structure. If the @recursive parameter is false, then only the
- * first level of structure implied by the keys will be crumpled. If
- * @recursive is true, then the input will be recursively crumpled to
- * expand all levels of structure in the keys.
+ * Takes a dictionary whose keys use '.' separator to indicate
+ * nesting, crumples it into a nested structure. The values in
+ * @src are permitted to be scalars or compound typee. If the
+ * @recursive parameter is false, then only the first level of
+ * structure implied by the keys will be crumpled. If @recursive
+ * is true, then the input will be recursively crumpled to expand
+ * all levels of structure in the keys.
  *
  * To include a literal '.' in a key name, it must be escaped as '..'
  *
@@ -843,7 +844,10 @@  static int qdict_is_list(QDict *maybe_list, Error **errp)
  * The following scenarios in the input dict will result in an
  * error being returned:
  *
- *  - Any values in @src are non-scalar types
+ *  - If a key in @src has a value that is non-scalar, and
+ *    a later key implies creation of a compound type. e.g.
+ *    if 'foo' point to a 'QList' or 'QDict', then it is
+ *    not permitted to also have 'foo.NNN' keys later.
  *  - If keys in @src imply that a particular level is both a
  *    list and a dict. e.g., "foo.0.bar" and "foo.eek.bar".
  *  - If keys in @src imply that a particular level is a list,
@@ -864,21 +868,26 @@  QObject *qdict_crumple(const QDict *src, bool recursive, Error **errp)
     char *prefix = NULL;
     const char *suffix = NULL;
     int is_list;
+    QDict *compound;
 
+    compound = qdict_new();
     two_level = qdict_new();
 
     /* Step 1: split our totally flat dict into a two level dict */
     for (ent = qdict_first(src); ent != NULL; ent = qdict_next(src, ent)) {
-        if (qobject_type(ent->value) == QTYPE_QDICT ||
-            qobject_type(ent->value) == QTYPE_QLIST) {
-            error_setg(errp, "Value %s is not a scalar",
-                       ent->key);
-            goto error;
-        }
-
         qdict_split_flat_key(ent->key, &prefix, &suffix);
 
         child = qdict_get(two_level, prefix);
+
+        if (qdict_haskey(compound, prefix)) {
+            error_setg(errp, "Key %s is already set as a list/dict", prefix);
+            goto error;
+        }
+        if (qobject_type(ent->value) == QTYPE_QLIST ||
+            qobject_type(ent->value) == QTYPE_QDICT) {
+            qobject_incref(ent->value);
+            qdict_put_obj(compound, prefix, ent->value);
+        }
         if (suffix) {
             if (child) {
                 if (qobject_type(child) != QTYPE_QDICT) {
@@ -913,7 +922,8 @@  QObject *qdict_crumple(const QDict *src, bool recursive, Error **errp)
         for (ent = qdict_first(two_level); ent != NULL;
              ent = qdict_next(two_level, ent)) {
 
-            if (qobject_type(ent->value) == QTYPE_QDICT) {
+            if (qobject_type(ent->value) == QTYPE_QDICT &&
+                !qdict_haskey(compound, ent->key)) {
                 child = qdict_crumple(qobject_to_qdict(ent->value),
                                       recursive, errp);
                 if (!child) {
@@ -961,10 +971,13 @@  QObject *qdict_crumple(const QDict *src, bool recursive, Error **errp)
         dst = QOBJECT(multi_level);
     }
 
+    QDECREF(compound);
+
     return dst;
 
  error:
     g_free(prefix);
+    QDECREF(compound);
     QDECREF(multi_level);
     QDECREF(two_level);
     qobject_decref(dst);
diff --git a/tests/check-qdict.c b/tests/check-qdict.c
index 64c33ab..29d7362 100644
--- a/tests/check-qdict.c
+++ b/tests/check-qdict.c
@@ -844,6 +844,173 @@  static void qdict_crumple_test_bad_inputs(void)
     QDECREF(src);
 }
 
+static void qdict_crumple_test_non_flat_list_ok(void)
+{
+    QDict *src, *dst, *vnc, *listen;
+    QObject *child, *res;
+    QList *list;
+    QString *match;
+
+    src = qdict_new();
+    list = qlist_new();
+    qlist_append(list, qstring_from_str("fred"));
+    qlist_append(list, qstring_from_str("bob"));
+
+    qdict_put(src, "vnc.listen.addr", qstring_from_str("127.0.0.1"));
+    qdict_put(src, "vnc.listen.port", qstring_from_str("5901"));
+    qdict_put(src, "match", list);
+
+    res = qdict_crumple(src, true, &error_abort);
+
+    g_assert_cmpint(qobject_type(res), ==, QTYPE_QDICT);
+
+    dst = qobject_to_qdict(res);
+
+    g_assert_cmpint(qdict_size(dst), ==, 2);
+
+    child = qdict_get(dst, "vnc");
+    g_assert_cmpint(qobject_type(child), ==, QTYPE_QDICT);
+    vnc = qobject_to_qdict(child);
+
+    child = qdict_get(vnc, "listen");
+    g_assert_cmpint(qobject_type(child), ==, QTYPE_QDICT);
+    listen = qobject_to_qdict(child);
+    g_assert_cmpstr("127.0.0.1", ==, qdict_get_str(listen, "addr"));
+    g_assert_cmpstr("5901", ==, qdict_get_str(listen, "port"));
+
+    child = qdict_get(dst, "match");
+    g_assert_cmpint(qobject_type(child), ==, QTYPE_QLIST);
+    list = qobject_to_qlist(child);
+
+    g_assert_cmpint(qlist_size(list), ==, 2);
+
+    child = qlist_pop(list);
+    g_assert(child);
+    match = qobject_to_qstring(child);
+    g_assert_cmpstr("fred", ==, qstring_get_str(match));
+    qobject_decref(child);
+
+    child = qlist_pop(list);
+    g_assert(child);
+    match = qobject_to_qstring(child);
+    g_assert_cmpstr("bob", ==, qstring_get_str(match));
+    qobject_decref(child);
+
+    child = qlist_pop(list);
+    g_assert(!child);
+
+    QDECREF(src);
+    QDECREF(dst);
+}
+
+
+static void qdict_crumple_test_non_flat_list_bad(void)
+{
+    QDict *src;
+    QObject *res;
+    QList *list;
+    Error *err = NULL;
+
+    src = qdict_new();
+    list = qlist_new();
+    qlist_append(list, qstring_from_str("fred"));
+    qlist_append(list, qstring_from_str("bob"));
+
+    qdict_put(src, "vnc.listen.addr", qstring_from_str("127.0.0.1"));
+    qdict_put(src, "vnc.listen.port", qstring_from_str("5901"));
+    qdict_put(src, "match.0", qstring_from_str("frank"));
+    qdict_put(src, "match", list);
+
+    res = qdict_crumple(src, false, &err);
+    g_assert(!res);
+    error_free_or_abort(&err);
+
+    QDECREF(src);
+}
+
+
+static void qdict_crumple_test_non_flat_dict_ok(void)
+{
+    QDict *src, *dst, *rule, *vnc, *acl;
+    QObject *child, *res;
+    QList *rules;
+
+    vnc = qdict_new();
+    qdict_put(vnc, "listen.addr", qstring_from_str("127.0.0.1"));
+    qdict_put(vnc, "listen.port", qstring_from_str("5901"));
+
+    src = qdict_new();
+    qdict_put(src, "vnc", vnc);
+    qdict_put(src, "acl.rules.0.match", qstring_from_str("fred"));
+    qdict_put(src, "acl.rules.0.policy", qstring_from_str("allow"));
+    qdict_put(src, "acl.rules.1.match", qstring_from_str("bob"));
+    qdict_put(src, "acl.rules.1.policy", qstring_from_str("deny"));
+
+    res = qdict_crumple(src, true, &error_abort);
+
+    g_assert_cmpint(qobject_type(res), ==, QTYPE_QDICT);
+
+    dst = qobject_to_qdict(res);
+
+    g_assert_cmpint(qdict_size(dst), ==, 2);
+
+    child = qdict_get(dst, "vnc");
+    g_assert_cmpint(qobject_type(child), ==, QTYPE_QDICT);
+    vnc = qobject_to_qdict(child);
+
+    g_assert_cmpstr("127.0.0.1", ==, qdict_get_str(vnc, "listen.addr"));
+    g_assert_cmpstr("5901", ==, qdict_get_str(vnc, "listen.port"));
+
+    child = qdict_get(dst, "acl");
+    g_assert_cmpint(qobject_type(child), ==, QTYPE_QDICT);
+    acl = qobject_to_qdict(child);
+
+    child = qdict_get(acl, "rules");
+    g_assert_cmpint(qobject_type(child), ==, QTYPE_QLIST);
+    rules = qobject_to_qlist(child);
+    g_assert_cmpint(qlist_size(rules), ==, 2);
+
+    rule = qobject_to_qdict(qlist_pop(rules));
+    g_assert_cmpint(qdict_size(rule), ==, 2);
+    g_assert_cmpstr("fred", ==, qdict_get_str(rule, "match"));
+    g_assert_cmpstr("allow", ==, qdict_get_str(rule, "policy"));
+    QDECREF(rule);
+
+    rule = qobject_to_qdict(qlist_pop(rules));
+    g_assert_cmpint(qdict_size(rule), ==, 2);
+    g_assert_cmpstr("bob", ==, qdict_get_str(rule, "match"));
+    g_assert_cmpstr("deny", ==, qdict_get_str(rule, "policy"));
+    QDECREF(rule);
+
+    QDECREF(src);
+    QDECREF(dst);
+}
+
+
+static void qdict_crumple_test_non_flat_dict_bad(void)
+{
+    QDict *src, *vnc;
+    QObject *res;
+    Error *err = NULL;
+
+    vnc = qdict_new();
+    qdict_put(vnc, "listen.addr", qstring_from_str("127.0.0.1"));
+
+    src = qdict_new();
+    qdict_put(src, "vnc", vnc);
+    qdict_put(src, "vnc.listen.port", qstring_from_str("5901"));
+    qdict_put(src, "acl.rules.0.match", qstring_from_str("fred"));
+    qdict_put(src, "acl.rules.0.policy", qstring_from_str("allow"));
+    qdict_put(src, "acl.rules.1.match", qstring_from_str("bob"));
+    qdict_put(src, "acl.rules.1.policy", qstring_from_str("deny"));
+
+    res = qdict_crumple(src, true, &err);
+    g_assert(!res);
+    error_free_or_abort(&err);
+
+    QDECREF(src);
+}
+
 
 /*
  * Errors test-cases
@@ -1002,6 +1169,14 @@  int main(int argc, char **argv)
                     qdict_crumple_test_empty);
     g_test_add_func("/public/crumple/bad_inputs",
                     qdict_crumple_test_bad_inputs);
+    g_test_add_func("/public/crumple/non-flat-list-ok",
+                    qdict_crumple_test_non_flat_list_ok);
+    g_test_add_func("/public/crumple/non-flat-list-bad",
+                    qdict_crumple_test_non_flat_list_bad);
+    g_test_add_func("/public/crumple/non-flat-dict-ok",
+                    qdict_crumple_test_non_flat_dict_ok);
+    g_test_add_func("/public/crumple/non-flat-dict-bad",
+                    qdict_crumple_test_non_flat_dict_bad);
 
     /* The Big one */
     if (g_test_slow()) {