Message ID | 1455900463-16007-2-git-send-email-berrange@redhat.com |
---|---|
State | New |
Headers | show |
On 02/19/2016 09:47 AM, Daniel P. Berrange wrote: > The qdict_flatten() method will take a dict whose elements are > further nested dicts/lists and flatten them by concatenating > keys. > > The qdict_crumple() method aims todo the reverse, taking a flat s/todo/to do/ > qdict, and turning it into a set of nested dicts/lists. It will > apply nesting based on the key name, with a '.' indicating a > new level in the hierarchy. If the keys in the nested structure > are all numeric, it will create a list, otherwise it will create > a dict. > Interesting idea. I haven't closely reviewed the patch yet, but do have a comment on the commit message: > > If the key is intended to contain a literal '.', then it must > be escaped as '..'. ie a flat dict Hmm. We forbid the use of '.' inside QAPI key names in most places, but have an exception that downstream extensions use '.'. So, I guess the only place you'd really use this is trying to target a downstream-extension key name: > > { > 'foo..bar': 'wizz', > 'bar.foo..bar': 'eek', as in '__com..example_bar': 'eek' to set the downstream key-name of '__com.example_bar'. I'm not sure the special casing of '.' is worth it, particularly since qdict_flatten() doesn't have it in the reverse direction. > The intent of this function is that it allows a set of QemuOpts > to be turned into a nested data structure that mirrors the nested > used when the same object is defined over QMP. > > Signed-off-by: Daniel P. Berrange <berrange@redhat.com> > --- I hope to offer a more detailed review next week (I'm trying to get some of my own patches polished today).
On Fri, Feb 19, 2016 at 10:01:07AM -0700, Eric Blake wrote: > On 02/19/2016 09:47 AM, Daniel P. Berrange wrote: > > The qdict_flatten() method will take a dict whose elements are > > further nested dicts/lists and flatten them by concatenating > > keys. > > > > The qdict_crumple() method aims todo the reverse, taking a flat > > s/todo/to do/ > > > qdict, and turning it into a set of nested dicts/lists. It will > > apply nesting based on the key name, with a '.' indicating a > > new level in the hierarchy. If the keys in the nested structure > > are all numeric, it will create a list, otherwise it will create > > a dict. > > > > Interesting idea. I haven't closely reviewed the patch yet, but do have > a comment on the commit message: > > > > > If the key is intended to contain a literal '.', then it must > > be escaped as '..'. ie a flat dict > > Hmm. We forbid the use of '.' inside QAPI key names in most places, but > have an exception that downstream extensions use '.'. So, I guess the > only place you'd really use this is trying to target a > downstream-extension key name: I originally put code into object_property_add() to forbid use of a '.' in a property name, but quickly found we have alot of such usage already :-( On a x86_64 default machine I get: Property sse4.1 on class qemu64-x86_64-cpu Property sse4.2 on class qemu64-x86_64-cpu Property pc.ram[*] on class container Property pc.ram[0] on class container Property pc.bios[*] on class container Property pc.bios[0] on class container Property pc.rom[*] on class container Property pc.rom[0] on class container Property fwcfg.dma[*] on class fw_cfg_io Property fwcfg.dma[0] on class fw_cfg_io Property pci.0 on class i440FX-pcihost Property isa.0 on class PIIX3 Property vga.vram[*] on class VGA Property vga.vram[0] on class VGA Property vga.mmio[*] on class container Property vga.mmio[0] on class container Property vga.rom[*] on class VGA Property vga.rom[0] on class VGA Property e1000.rom[*] on class e1000 Property e1000.rom[0] on class e1000 Property ide.0 on class piix3-ide Property ide.1 on class piix3-ide Now none of those are implementing the UserCreatable interface right now, so wouldn't be a problem for -object usage, but I felt if we allow '.' in property names for devices, we ought to make sure the code deals with it everything. So it seems forbiding '.' in property names won't fly :-( Regards, Daniel
On 19.02.2016 17:47, Daniel P. Berrange wrote: > The qdict_flatten() method will take a dict whose elements are > further nested dicts/lists and flatten them by concatenating > keys. > > The qdict_crumple() method aims todo the reverse, taking a flat > qdict, and turning it into a set of nested dicts/lists. It will > apply nesting based on the key name, with a '.' indicating a > new level in the hierarchy. If the keys in the nested structure > are all numeric, it will create a list, otherwise it will create > a dict. > > If the keys are a mixture of numeric and non-numeric, or the > numeric keys are not in strictly ascending order, an error will > be reported. > > As an example, a flat dict containing > > { > 'foo.0.bar': 'one', > 'foo.0.wizz': '1', > 'foo.1.bar': 'two', > 'foo.1.wizz': '2' > } > > will get turned into a dict with one element 'foo' whose > value is a list. The list elements will each in turn be > dicts. > > { > 'foo' => [ > { 'bar': 'one', 'wizz': '1' } > { 'bar': 'two', 'wizz': '2' } > ], > } > > If the key is intended to contain a literal '.', then it must > be escaped as '..'. ie a flat dict > > { > 'foo..bar': 'wizz', > 'bar.foo..bar': 'eek', > 'bar.hello': 'world' > } > > Will end up as > > { > 'foo.bar': 'wizz', > 'bar': { > 'foo.bar': 'eek', > 'hello': 'world' > } > } > > The intent of this function is that it allows a set of QemuOpts > to be turned into a nested data structure that mirrors the nested > used when the same object is defined over QMP. > > Signed-off-by: Daniel P. Berrange <berrange@redhat.com> > --- > include/qapi/qmp/qdict.h | 1 + > qobject/qdict.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++ > tests/check-qdict.c | 39 ++++++++++ > 3 files changed, 220 insertions(+) > > diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h > index 6c2a0e5..baf45d5 100644 > --- a/include/qapi/qmp/qdict.h > +++ b/include/qapi/qmp/qdict.h > @@ -75,6 +75,7 @@ void qdict_flatten(QDict *qdict); > void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start); > void qdict_array_split(QDict *src, QList **dst); > int qdict_array_entries(QDict *src, const char *subqdict); > +QObject *qdict_crumple(QDict *src, Error **errp); > > void qdict_join(QDict *dest, QDict *src, bool overwrite); > > diff --git a/qobject/qdict.c b/qobject/qdict.c > index 9833bd0..ff4caf8 100644 > --- a/qobject/qdict.c > +++ b/qobject/qdict.c > @@ -608,6 +608,7 @@ void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start) > } > } > > + > static int qdict_count_prefixed_entries(const QDict *src, const char *start) > { > const QDictEntry *entry; This hunk seems unintended. > @@ -682,6 +683,185 @@ void qdict_array_split(QDict *src, QList **dst) > } > } > > + > +/** > + * qdict_crumple: > + * > + * Reverses the flattening done by qdict_flatten by > + * crumpling the dicts into a nested structure. Similar > + * qdict_array_split, but copes with arbitrary nesting > + * of dicts & arrays, not meely one level of arrays > + * > + * { 'foo.0.bar': 'one', 'foo.0.wizz': '1', > + * 'foo.1.bar': 'two', 'foo.1.wizz': '2' } > + * > + * => > + * > + * { > + * 'foo' => [ > + * { 'bar': 'one', 'wizz': '1' } > + * { 'bar': 'two', 'wizz': '2' } > + * ], > + * } > + * > + */ > +QObject *qdict_crumple(QDict *src, Error **errp) > +{ > + const QDictEntry *entry, *next; > + const char *p = NULL; > + QDict *tmp1 = NULL, *tmp2 = NULL; > + QObject *dst = NULL, *child; > + bool isList = false; > + ssize_t listMax = -1; > + size_t listLen = 0; As far as I'm aware we don't have any strict policy on names, but in general structs use UpperCamelCase and variables and functions use c_style_naming. > + size_t i, j; > + int64_t val; > + char *key; > + > + tmp1 = qdict_new(); I guess I'd like clearer variable names. > + entry = qdict_first(src); > + > + /* Step 1: extract everything as nested dicts */ > + while (entry != NULL) { > + next = qdict_next(src, entry); > + qobject_incref(entry->value); > + > + /* Find first '.' separator, but treat '..' as > + * an escape sequence */ > + p = NULL; How about at least "dot" or "separator"? > + do { > + if (p) { > + p += 2; > + } else { > + p = entry->key; > + } > + p = strchr(p, '.'); > + } while (p && *(p + 1) == '.'); > + > + if (p) { > + key = g_strndup(entry->key, > + p - entry->key); > + } else { > + key = g_strdup(entry->key); > + } > + > + for (i = 0, j = 0; key[i] != '\0'; i++, j++) { > + if (key[i] == '.' && > + key[i + 1] == '.') { > + i++; > + } > + key[j] = key[i]; > + } > + key[j] = '\0'; I general I think it would be nice to put this escaping code into an own static function which returns a strdup'd key for the QDict and this entry's key in that QDict. > + > + if (p) { > + tmp2 = qdict_get_qdict(tmp1, key); > + p++; > + if (!tmp2) { > + tmp2 = qdict_new(); > + qdict_put(tmp1, key, tmp2); This... > + } > + qdict_put_obj(tmp2, p, entry->value); > + } else { > + qdict_put_obj(tmp1, key, entry->value); ...and this may silently overwrite entries, e.g. when crumpling { "x": 42, "x.y": 23 } > + } key is leaked. > + > + entry = next; > + } > + > + /* Step 2: crumple the new dicts we just created */ > + tmp2 = qdict_new(); Please use more variables with clearer names. > + entry = qdict_first(tmp1); > + while (entry != NULL) { > + next = qdict_next(tmp1, entry); > + > + if (qobject_type(entry->value) == QTYPE_QDICT) { > + child = qdict_crumple((QDict *)entry->value, errp); The cast works, but I'd prefer qobject_to_qdict() nonetheless. > + if (!child) { > + goto error; > + } > + > + qdict_put_obj(tmp2, entry->key, child); > + } else { > + qobject_incref(entry->value); > + qdict_put_obj(tmp2, entry->key, entry->value); > + } > + > + entry = next; > + } > + QDECREF(tmp1); > + > + /* Step 3: detect if we need to turn our dict into list */ You could use qdict_array_entries(tmp2, "") > 0 for this. If you do want to make { "0": 42, "2": 23 } and { "0": 42, "x": 23 } errors (my qdict_unflatten() just kept those QDicts), then you'd have to check on qdict_array_entries() error whether the QDict contained an integer key, but that would still be simpler than the following loop and the check in step 4. > + entry = qdict_first(tmp2); > + while (entry != NULL) { > + next = qdict_next(tmp2, entry); > + > + errno = 0; > + if (qemu_strtoll(entry->key, NULL, 10, &val) == 0) { > + if (!dst) { > + dst = (QObject *)qlist_new(); > + isList = true; > + } else if (!isList) { > + error_setg(errp, > + "Key '%s' is for a list, but previous key is " > + "for a dict", entry->key); > + goto error; > + } > + listLen++; > + if (val > listMax) { > + listMax = val; > + } > + } else { > + if (!dst) { > + dst = (QObject *)tmp2; > + qobject_incref(dst); > + isList = false; > + } else if (isList) { > + error_setg(errp, > + "Key '%s' is for a dict, but previous key is " > + "for a list", entry->key); > + goto error; > + } > + } > + > + entry = next; > + } > + > + /* Step 4: Turn the dict into a list */ Why not just use qdict_array_split(tmp2, &dst);? Max > + if (isList) { > + if (listLen != (listMax + 1)) { > + error_setg(errp, "List indexes are not continuous, " > + "saw %zu elements but %zu largest index", > + listLen, listMax); > + goto error; > + } > + > + for (i = 0; i < listLen; i++) { > + char *key = g_strdup_printf("%zu", i); > + > + child = qdict_get(tmp2, key); > + g_free(key); > + if (!child) { > + error_setg(errp, "Unexpected missing list entry %zu", i); > + goto error; > + } > + > + qobject_incref(child); > + qlist_append_obj((QList *)dst, child); > + } > + } > + QDECREF(tmp2); > + > + return dst; > + > + error: > + QDECREF(tmp2); > + QDECREF(tmp1); > + qobject_decref(dst); > + return NULL; > +} > + > + > /** > * qdict_array_entries(): Returns the number of direct array entries if the > * sub-QDict of src specified by the prefix in subqdict (or src itself for > diff --git a/tests/check-qdict.c b/tests/check-qdict.c > index a43056c..fb8184c 100644 > --- a/tests/check-qdict.c > +++ b/tests/check-qdict.c > @@ -596,6 +596,43 @@ static void qdict_join_test(void) > QDECREF(dict2); > } > > + > +static void qdict_crumple_test(void) > +{ > + QDict *src, *dst, *rule, *eek; > + QList *rules; > + > + src = qdict_new(); > + qdict_put(src, "rule.0.match", qstring_from_str("fred")); > + qdict_put(src, "rule.0.policy", qstring_from_str("allow")); > + qdict_put(src, "rule.1.match", qstring_from_str("bob")); > + qdict_put(src, "rule.1.policy", qstring_from_str("deny")); > + qdict_put(src, "foo..bar", qstring_from_str("wibble")); > + qdict_put(src, "eek.foo..bar", qstring_from_str("wizz")); > + > + dst = (QDict *)qdict_crumple(src, &error_abort); > + > + g_assert_cmpint(qdict_size(dst), ==, 3); > + > + rules = qdict_get_qlist(dst, "rule"); > + > + g_assert_cmpint(qlist_size(rules), ==, 2); > + > + rule = qobject_to_qdict(qlist_pop(rules)); > + g_assert_cmpstr("fred", ==, qdict_get_str(rule, "match")); > + g_assert_cmpstr("allow", ==, qdict_get_str(rule, "policy")); > + > + rule = qobject_to_qdict(qlist_pop(rules)); > + g_assert_cmpstr("bob", ==, qdict_get_str(rule, "match")); > + g_assert_cmpstr("deny", ==, qdict_get_str(rule, "policy")); > + > + g_assert_cmpstr("wibble", ==, qdict_get_str(dst, "foo.bar")); > + > + eek = qdict_get_qdict(dst, "eek"); > + g_assert_cmpstr("wizz", ==, qdict_get_str(eek, "foo.bar")); > +} > + > + > /* > * Errors test-cases > */ > @@ -743,6 +780,8 @@ int main(int argc, char **argv) > g_test_add_func("/errors/put_exists", qdict_put_exists_test); > g_test_add_func("/errors/get_not_exists", qdict_get_not_exists_test); > > + g_test_add_func("/public/crumple", qdict_crumple_test); > + > /* The Big one */ > if (g_test_slow()) { > g_test_add_func("/stress/test", qdict_stress_test); >
On Wed, Mar 02, 2016 at 05:13:59PM +0100, Max Reitz wrote: > On 19.02.2016 17:47, Daniel P. Berrange wrote: > > The qdict_flatten() method will take a dict whose elements are > > further nested dicts/lists and flatten them by concatenating > > keys. > > > > The qdict_crumple() method aims todo the reverse, taking a flat > > qdict, and turning it into a set of nested dicts/lists. It will > > apply nesting based on the key name, with a '.' indicating a > > new level in the hierarchy. If the keys in the nested structure > > are all numeric, it will create a list, otherwise it will create > > a dict. > > > > If the keys are a mixture of numeric and non-numeric, or the > > numeric keys are not in strictly ascending order, an error will > > be reported. > > > > As an example, a flat dict containing > > > > { > > 'foo.0.bar': 'one', > > 'foo.0.wizz': '1', > > 'foo.1.bar': 'two', > > 'foo.1.wizz': '2' > > } > > > > will get turned into a dict with one element 'foo' whose > > value is a list. The list elements will each in turn be > > dicts. > > > > { > > 'foo' => [ > > { 'bar': 'one', 'wizz': '1' } > > { 'bar': 'two', 'wizz': '2' } > > ], > > } > > > > If the key is intended to contain a literal '.', then it must > > be escaped as '..'. ie a flat dict > > > > { > > 'foo..bar': 'wizz', > > 'bar.foo..bar': 'eek', > > 'bar.hello': 'world' > > } > > > > Will end up as > > > > { > > 'foo.bar': 'wizz', > > 'bar': { > > 'foo.bar': 'eek', > > 'hello': 'world' > > } > > } > > > > The intent of this function is that it allows a set of QemuOpts > > to be turned into a nested data structure that mirrors the nested > > used when the same object is defined over QMP. > > > > Signed-off-by: Daniel P. Berrange <berrange@redhat.com> > > --- > > include/qapi/qmp/qdict.h | 1 + > > qobject/qdict.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++ > > tests/check-qdict.c | 39 ++++++++++ > > 3 files changed, 220 insertions(+) > > > > +QObject *qdict_crumple(QDict *src, Error **errp) > > +{ > > + const QDictEntry *entry, *next; > > + const char *p = NULL; > > + QDict *tmp1 = NULL, *tmp2 = NULL; > > + QObject *dst = NULL, *child; > > + bool isList = false; > > + ssize_t listMax = -1; > > + size_t listLen = 0; > > As far as I'm aware we don't have any strict policy on names, but in > general structs use UpperCamelCase and variables and functions use > c_style_naming. Yep, ok > > + size_t i, j; > > + int64_t val; > > + char *key; > > + > > + tmp1 = qdict_new(); > > I guess I'd like clearer variable names. Sure > > > + entry = qdict_first(src); > > + > > + /* Step 1: extract everything as nested dicts */ > > + while (entry != NULL) { > > + next = qdict_next(src, entry); > > + qobject_incref(entry->value); > > + > > + /* Find first '.' separator, but treat '..' as > > + * an escape sequence */ > > + p = NULL; > > How about at least "dot" or "separator"? Sure. > > + do { > > + if (p) { > > + p += 2; > > + } else { > > + p = entry->key; > > + } > > + p = strchr(p, '.'); > > + } while (p && *(p + 1) == '.'); > > + > > + if (p) { > > + key = g_strndup(entry->key, > > + p - entry->key); > > + } else { > > + key = g_strdup(entry->key); > > + } > > + > > + for (i = 0, j = 0; key[i] != '\0'; i++, j++) { > > + if (key[i] == '.' && > > + key[i + 1] == '.') { > > + i++; > > + } > > + key[j] = key[i]; > > + } > > + key[j] = '\0'; > > I general I think it would be nice to put this escaping code into an own > static function which returns a strdup'd key for the QDict and this > entry's key in that QDict. Yep, good idea. > > > + > > + if (p) { > > + tmp2 = qdict_get_qdict(tmp1, key); > > + p++; > > + if (!tmp2) { > > + tmp2 = qdict_new(); > > + qdict_put(tmp1, key, tmp2); > > This... > > > + } > > + qdict_put_obj(tmp2, p, entry->value); > > + } else { > > + qdict_put_obj(tmp1, key, entry->value); > > ...and this may silently overwrite entries, e.g. when crumpling > > { "x": 42, "x.y": 23 } > > > + } > > key is leaked. Oh that should result in an error being raised as it is a contradictory input. > > + > > + entry = next; > > + } > > + > > + /* Step 2: crumple the new dicts we just created */ > > + tmp2 = qdict_new(); > > Please use more variables with clearer names. > > > + entry = qdict_first(tmp1); > > + while (entry != NULL) { > > + next = qdict_next(tmp1, entry); > > + > > + if (qobject_type(entry->value) == QTYPE_QDICT) { > > + child = qdict_crumple((QDict *)entry->value, errp); > > The cast works, but I'd prefer qobject_to_qdict() nonetheless. Ok > > > + if (!child) { > > + goto error; > > + } > > + > > + qdict_put_obj(tmp2, entry->key, child); > > + } else { > > + qobject_incref(entry->value); > > + qdict_put_obj(tmp2, entry->key, entry->value); > > + } > > + > > + entry = next; > > + } > > + QDECREF(tmp1); > > + > > + /* Step 3: detect if we need to turn our dict into list */ > > You could use qdict_array_entries(tmp2, "") > 0 for this. > > If you do want to make { "0": 42, "2": 23 } and { "0": 42, "x": 23 } > errors (my qdict_unflatten() just kept those QDicts), then you'd have to > check on qdict_array_entries() error whether the QDict contained an > integer key, but that would still be simpler than the following loop and > the check in step 4. If I use qdict_array_entries() then it does 2 O(N) iterations of the input qdict, and to check the errors, I'll have todo yet another iteration. My inlined code here does everything in a single O(N) iteration instead of 3. I think that's compelling enough to reason to not use that function. > > + entry = qdict_first(tmp2); > > + while (entry != NULL) { > > + next = qdict_next(tmp2, entry); > > + > > + errno = 0; > > + if (qemu_strtoll(entry->key, NULL, 10, &val) == 0) { > > + if (!dst) { > > + dst = (QObject *)qlist_new(); > > + isList = true; > > + } else if (!isList) { > > + error_setg(errp, > > + "Key '%s' is for a list, but previous key is " > > + "for a dict", entry->key); > > + goto error; > > + } > > + listLen++; > > + if (val > listMax) { > > + listMax = val; > > + } > > + } else { > > + if (!dst) { > > + dst = (QObject *)tmp2; > > + qobject_incref(dst); > > + isList = false; > > + } else if (isList) { > > + error_setg(errp, > > + "Key '%s' is for a dict, but previous key is " > > + "for a list", entry->key); > > + goto error; > > + } > > + } > > + > > + entry = next; > > + } > > + > > + /* Step 4: Turn the dict into a list */ > > Why not just use qdict_array_split(tmp2, &dst);? Again qdict_array_split() is a pretty inefficiently written function. It does multiple nested iterations over its input giving it O(n^2) time, compared to O(n) for my code here. The only user of qdict_array_entries() and qdict_array_split() is the block quorum driver. From a brief look at that code I think it could probably be changed to just call this new qdict_crumple() method, and those 2 inefficient functions could be deleted, so we don't have duplication of code. > > + if (isList) { > > + if (listLen != (listMax + 1)) { > > + error_setg(errp, "List indexes are not continuous, " > > + "saw %zu elements but %zu largest index", > > + listLen, listMax); > > + goto error; > > + } > > + > > + for (i = 0; i < listLen; i++) { > > + char *key = g_strdup_printf("%zu", i); > > + > > + child = qdict_get(tmp2, key); > > + g_free(key); > > + if (!child) { > > + error_setg(errp, "Unexpected missing list entry %zu", i); > > + goto error; > > + } > > + > > + qobject_incref(child); > > + qlist_append_obj((QList *)dst, child); > > + } > > + } > > + QDECREF(tmp2); > > + > > + return dst; > > + > > + error: > > + QDECREF(tmp2); > > + QDECREF(tmp1); > > + qobject_decref(dst); > > + return NULL; > > +} > > + > > + Regards, Daniel
On 03.03.2016 12:01, Daniel P. Berrange wrote: > On Wed, Mar 02, 2016 at 05:13:59PM +0100, Max Reitz wrote: >> On 19.02.2016 17:47, Daniel P. Berrange wrote: >>> The qdict_flatten() method will take a dict whose elements are >>> further nested dicts/lists and flatten them by concatenating >>> keys. >>> >>> The qdict_crumple() method aims todo the reverse, taking a flat >>> qdict, and turning it into a set of nested dicts/lists. It will >>> apply nesting based on the key name, with a '.' indicating a >>> new level in the hierarchy. If the keys in the nested structure >>> are all numeric, it will create a list, otherwise it will create >>> a dict. >>> >>> If the keys are a mixture of numeric and non-numeric, or the >>> numeric keys are not in strictly ascending order, an error will >>> be reported. [...] >>> + if (!child) { >>> + goto error; >>> + } >>> + >>> + qdict_put_obj(tmp2, entry->key, child); >>> + } else { >>> + qobject_incref(entry->value); >>> + qdict_put_obj(tmp2, entry->key, entry->value); >>> + } >>> + >>> + entry = next; >>> + } >>> + QDECREF(tmp1); >>> + >>> + /* Step 3: detect if we need to turn our dict into list */ >> >> You could use qdict_array_entries(tmp2, "") > 0 for this. >> >> If you do want to make { "0": 42, "2": 23 } and { "0": 42, "x": 23 } >> errors (my qdict_unflatten() just kept those QDicts), then you'd have to >> check on qdict_array_entries() error whether the QDict contained an >> integer key, but that would still be simpler than the following loop and >> the check in step 4. > > If I use qdict_array_entries() then it does 2 O(N) iterations > of the input qdict, and to check the errors, I'll have todo > yet another iteration. My inlined code here does everything > in a single O(N) iteration instead of 3. I think that's > compelling enough to reason to not use that function. O(3N) = O(N) :-) Making qdict_array_entries() O(N) (pretending that O(N) and O(2N) were different things) for this case is trivial: Just omit the second loop if "" has been passed as the subqdict. So then you'd end up with "O(2N)", if you do the error checking. So you are right, there is a reason not to use qdict_array_entries(), but I'd personally still use it. I only have very limited knowledge of the whole qemu code base, but it doesn't appear to me as if QDict functions are ever used in a hot path or as if QDicts ever contain a large number of elements (with large being more than, say, 1000), especially not the kind of QDicts you'd want to crumple. Because of that, I chose to use these existing functions, even while knowing that they are probably not optimal for this case. You did propose removing qdict_array_entries() and qdict_array_split() in favor of just using this function in their stead (see my reply below), so if we do so, reusing them would obviously not be ideal any more. In that case, I'd still put this into its own static function if possible. tl;dr: I don't think runtime complexity is an issue here, but if we are going to drop qdict_array_entries() and qdict_array_split() anyway, then using them is a bad idea, obviously. It would then still be nice to put this into an own function. >>> + entry = qdict_first(tmp2); >>> + while (entry != NULL) { >>> + next = qdict_next(tmp2, entry); >>> + >>> + errno = 0; >>> + if (qemu_strtoll(entry->key, NULL, 10, &val) == 0) { >>> + if (!dst) { >>> + dst = (QObject *)qlist_new(); >>> + isList = true; >>> + } else if (!isList) { >>> + error_setg(errp, >>> + "Key '%s' is for a list, but previous key is " >>> + "for a dict", entry->key); >>> + goto error; >>> + } >>> + listLen++; >>> + if (val > listMax) { >>> + listMax = val; >>> + } >>> + } else { >>> + if (!dst) { >>> + dst = (QObject *)tmp2; >>> + qobject_incref(dst); >>> + isList = false; >>> + } else if (isList) { >>> + error_setg(errp, >>> + "Key '%s' is for a dict, but previous key is " >>> + "for a list", entry->key); >>> + goto error; >>> + } >>> + } >>> + >>> + entry = next; >>> + } >>> + >>> + /* Step 4: Turn the dict into a list */ >> >> Why not just use qdict_array_split(tmp2, &dst);? > > Again qdict_array_split() is a pretty inefficiently written > function. It does multiple nested iterations over its input > giving it O(n^2) time, compared to O(n) for my code here. Strictly speaking, it's not O(n^2) but O(nm), where n is qdict_size(src) and m is the size of the resulting QList. (Side note: This function only is O(n) if qdict_get() is O(1), which it is in best case. In worst case, it's O(n), and then this code becomes O(n^2).) Again, I don't think that O(nm) is much worse than O(n) for the use case at hand, but if you are going to drop qdict_array_split() anyway, then, again, it doesn't make sense to call it at all. You could again put the code inside of the "if (isList) {}" block into an own function, but it's not that long so I'd be fine with it staying here (although it is certainly self-contained enough to look good in an own function :-)). > The only user of qdict_array_entries() and qdict_array_split() > is the block quorum driver. From a brief look at that code > I think it could probably be changed to just call this new > qdict_crumple() method, and those 2 inefficient functions > could be deleted, so we don't have duplication of code. I'm not sure. First, you do not want to crumple recursively there, so you'd have to re-flatten all the QList elements after you have crumpled them. Could be solved by adding a "bool recurse" parameter to this function. Second, yes, qdict_array_entries() is used by quorum. However, it does (no longer) use qdict_array_split(); the users of that are util/qemu-config.c and blockdev.c. Replacing those with a non-recursing call to qdict_crumple() should be trivial, but quorum is going to be a bit more difficult. You could make it just call qdict_crumple(), get the size of the resulting list and then discard it, but that seems a bit over the top. All in all, I think you're right in that this function can replace the (potentially) worse qdict_array_split() function (if the caller can stop it from recursing), but I don't know whether we can get rid of qdict_array_entries() so easily. Max >>> + if (isList) { >>> + if (listLen != (listMax + 1)) { >>> + error_setg(errp, "List indexes are not continuous, " >>> + "saw %zu elements but %zu largest index", >>> + listLen, listMax); >>> + goto error; >>> + } >>> + >>> + for (i = 0; i < listLen; i++) { >>> + char *key = g_strdup_printf("%zu", i); >>> + >>> + child = qdict_get(tmp2, key); >>> + g_free(key); >>> + if (!child) { >>> + error_setg(errp, "Unexpected missing list entry %zu", i); >>> + goto error; >>> + } >>> + >>> + qobject_incref(child); >>> + qlist_append_obj((QList *)dst, child); >>> + } >>> + } >>> + QDECREF(tmp2); >>> + >>> + return dst; >>> + >>> + error: >>> + QDECREF(tmp2); >>> + QDECREF(tmp1); >>> + qobject_decref(dst); >>> + return NULL; >>> +} >>> + >>> + > > Regards, > Daniel >
On Sat, Mar 05, 2016 at 04:15:59PM +0100, Max Reitz wrote: > On 03.03.2016 12:01, Daniel P. Berrange wrote: > > On Wed, Mar 02, 2016 at 05:13:59PM +0100, Max Reitz wrote: > >> On 19.02.2016 17:47, Daniel P. Berrange wrote: > >>> The qdict_flatten() method will take a dict whose elements are > >>> further nested dicts/lists and flatten them by concatenating > >>> keys. > >>> > >>> The qdict_crumple() method aims todo the reverse, taking a flat > >>> qdict, and turning it into a set of nested dicts/lists. It will > >>> apply nesting based on the key name, with a '.' indicating a > >>> new level in the hierarchy. If the keys in the nested structure > >>> are all numeric, it will create a list, otherwise it will create > >>> a dict. > >>> > >>> If the keys are a mixture of numeric and non-numeric, or the > >>> numeric keys are not in strictly ascending order, an error will > >>> be reported. > > [...] > > >>> + if (!child) { > >>> + goto error; > >>> + } > >>> + > >>> + qdict_put_obj(tmp2, entry->key, child); > >>> + } else { > >>> + qobject_incref(entry->value); > >>> + qdict_put_obj(tmp2, entry->key, entry->value); > >>> + } > >>> + > >>> + entry = next; > >>> + } > >>> + QDECREF(tmp1); > >>> + > >>> + /* Step 3: detect if we need to turn our dict into list */ > >> > >> You could use qdict_array_entries(tmp2, "") > 0 for this. > >> > >> If you do want to make { "0": 42, "2": 23 } and { "0": 42, "x": 23 } > >> errors (my qdict_unflatten() just kept those QDicts), then you'd have to > >> check on qdict_array_entries() error whether the QDict contained an > >> integer key, but that would still be simpler than the following loop and > >> the check in step 4. > > > > If I use qdict_array_entries() then it does 2 O(N) iterations > > of the input qdict, and to check the errors, I'll have todo > > yet another iteration. My inlined code here does everything > > in a single O(N) iteration instead of 3. I think that's > > compelling enough to reason to not use that function. > > O(3N) = O(N) :-) > > Making qdict_array_entries() O(N) (pretending that O(N) and O(2N) were > different things) for this case is trivial: Just omit the second loop if > "" has been passed as the subqdict. > > So then you'd end up with "O(2N)", if you do the error checking. > > So you are right, there is a reason not to use qdict_array_entries(), > but I'd personally still use it. I only have very limited knowledge of > the whole qemu code base, but it doesn't appear to me as if QDict > functions are ever used in a hot path or as if QDicts ever contain a > large number of elements (with large being more than, say, 1000), > especially not the kind of QDicts you'd want to crumple. > > Because of that, I chose to use these existing functions, even while > knowing that they are probably not optimal for this case. > > You did propose removing qdict_array_entries() and qdict_array_split() > in favor of just using this function in their stead (see my reply > below), so if we do so, reusing them would obviously not be ideal any > more. In that case, I'd still put this into its own static function if > possible. > > tl;dr: I don't think runtime complexity is an issue here, but if we are > going to drop qdict_array_entries() and qdict_array_split() anyway, then > using them is a bad idea, obviously. It would then still be nice to put > this into an own function. So looking at the current usage of qdict_flatten, qdict_extract_subqdict, qdict_array_split and qdict_array_entries, it is basically only the block layer. The root cause of this usage all stems from that fact that historically the block layer was driven off QemuOpts CLI args which are a flat data structure. Over time we've tried to bolt on recursive data structure support, but the main APIs are still accepting QDicts whose contents are flat. Increasingly I see the internal code is based on QAPI which is an inherantly nested data structure, as a result the block layer is getting more & more places where it has to process the flat data structure to extract nested bits (qdict_extract_subqdict, qdict_array_entries and qdict_array_split). Conversely when fed data coming from QMP it has to flatten the data structure with qdict_flatten. With this new qdict_crumple() method (or equivalently you qdict_unflatten) it strikes me we can significantly simplify the block layer to always use a nested QDict internally. All that would be required is to call the qdict_crumple() method in the places which have the flat QemuOpts derived QDict. At that point we would potentially be able to delete all of qdict_flatten, qdict_extract_subqdict, qdict_array_split and qdict_array_entries Of course I'm not suggesting we do that for 2.6, but it actually doesn't look like it would be that hard todo this conversion. > > The only user of qdict_array_entries() and qdict_array_split() > > is the block quorum driver. From a brief look at that code > > I think it could probably be changed to just call this new > > qdict_crumple() method, and those 2 inefficient functions > > could be deleted, so we don't have duplication of code. > > I'm not sure. First, you do not want to crumple recursively there, so > you'd have to re-flatten all the QList elements after you have crumpled > them. Could be solved by adding a "bool recurse" parameter to this function. > > Second, yes, qdict_array_entries() is used by quorum. However, it does > (no longer) use qdict_array_split(); the users of that are > util/qemu-config.c and blockdev.c. Replacing those with a non-recursing > call to qdict_crumple() should be trivial, but quorum is going to be a > bit more difficult. You could make it just call qdict_crumple(), get the > size of the resulting list and then discard it, but that seems a bit > over the top. > > All in all, I think you're right in that this function can replace the > (potentially) worse qdict_array_split() function (if the caller can stop > it from recursing), but I don't know whether we can get rid of > qdict_array_entries() so easily. Yeah, that would require the big block layer conversion to always use a nested QDict and never use a flat QDict. Regards, Daniel
On 03/07/2016 08:06 AM, Daniel P. Berrange wrote: > So looking at the current usage of qdict_flatten, qdict_extract_subqdict, > qdict_array_split and qdict_array_entries, it is basically only the block > layer. > > The root cause of this usage all stems from that fact that historically > the block layer was driven off QemuOpts CLI args which are a flat data > structure. Over time we've tried to bolt on recursive data structure > support, but the main APIs are still accepting QDicts whose contents are > flat. Increasingly I see the internal code is based on QAPI which is an > inherantly nested data structure, as a result the block layer is getting > more & more places where it has to process the flat data structure to > extract nested bits (qdict_extract_subqdict, qdict_array_entries and > qdict_array_split). Conversely when fed data coming from QMP it has to > flatten the data structure with qdict_flatten. > > With this new qdict_crumple() method (or equivalently you qdict_unflatten) > it strikes me we can significantly simplify the block layer to always use > a nested QDict internally. All that would be required is to call the > qdict_crumple() method in the places which have the flat QemuOpts derived > QDict. At that point we would potentially be able to delete all of > qdict_flatten, qdict_extract_subqdict, qdict_array_split and > qdict_array_entries > > Of course I'm not suggesting we do that for 2.6, but it actually doesn't > look like it would be that hard todo this conversion. Agreed that it's too late for 2.6, but would make a good project for 2.7. For that matter, rather than passing a QDict around, I'd like to see if we could just directly use the QAPI types instead (the way you just recently converted from QDict to SocketAddress).
diff --git a/include/qapi/qmp/qdict.h b/include/qapi/qmp/qdict.h index 6c2a0e5..baf45d5 100644 --- a/include/qapi/qmp/qdict.h +++ b/include/qapi/qmp/qdict.h @@ -75,6 +75,7 @@ void qdict_flatten(QDict *qdict); void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start); void qdict_array_split(QDict *src, QList **dst); int qdict_array_entries(QDict *src, const char *subqdict); +QObject *qdict_crumple(QDict *src, Error **errp); void qdict_join(QDict *dest, QDict *src, bool overwrite); diff --git a/qobject/qdict.c b/qobject/qdict.c index 9833bd0..ff4caf8 100644 --- a/qobject/qdict.c +++ b/qobject/qdict.c @@ -608,6 +608,7 @@ void qdict_extract_subqdict(QDict *src, QDict **dst, const char *start) } } + static int qdict_count_prefixed_entries(const QDict *src, const char *start) { const QDictEntry *entry; @@ -682,6 +683,185 @@ void qdict_array_split(QDict *src, QList **dst) } } + +/** + * qdict_crumple: + * + * Reverses the flattening done by qdict_flatten by + * crumpling the dicts into a nested structure. Similar + * qdict_array_split, but copes with arbitrary nesting + * of dicts & arrays, not meely one level of arrays + * + * { 'foo.0.bar': 'one', 'foo.0.wizz': '1', + * 'foo.1.bar': 'two', 'foo.1.wizz': '2' } + * + * => + * + * { + * 'foo' => [ + * { 'bar': 'one', 'wizz': '1' } + * { 'bar': 'two', 'wizz': '2' } + * ], + * } + * + */ +QObject *qdict_crumple(QDict *src, Error **errp) +{ + const QDictEntry *entry, *next; + const char *p = NULL; + QDict *tmp1 = NULL, *tmp2 = NULL; + QObject *dst = NULL, *child; + bool isList = false; + ssize_t listMax = -1; + size_t listLen = 0; + size_t i, j; + int64_t val; + char *key; + + tmp1 = qdict_new(); + entry = qdict_first(src); + + /* Step 1: extract everything as nested dicts */ + while (entry != NULL) { + next = qdict_next(src, entry); + qobject_incref(entry->value); + + /* Find first '.' separator, but treat '..' as + * an escape sequence */ + p = NULL; + do { + if (p) { + p += 2; + } else { + p = entry->key; + } + p = strchr(p, '.'); + } while (p && *(p + 1) == '.'); + + if (p) { + key = g_strndup(entry->key, + p - entry->key); + } else { + key = g_strdup(entry->key); + } + + for (i = 0, j = 0; key[i] != '\0'; i++, j++) { + if (key[i] == '.' && + key[i + 1] == '.') { + i++; + } + key[j] = key[i]; + } + key[j] = '\0'; + + if (p) { + tmp2 = qdict_get_qdict(tmp1, key); + p++; + if (!tmp2) { + tmp2 = qdict_new(); + qdict_put(tmp1, key, tmp2); + } + qdict_put_obj(tmp2, p, entry->value); + } else { + qdict_put_obj(tmp1, key, entry->value); + } + + entry = next; + } + + /* Step 2: crumple the new dicts we just created */ + tmp2 = qdict_new(); + entry = qdict_first(tmp1); + while (entry != NULL) { + next = qdict_next(tmp1, entry); + + if (qobject_type(entry->value) == QTYPE_QDICT) { + child = qdict_crumple((QDict *)entry->value, errp); + if (!child) { + goto error; + } + + qdict_put_obj(tmp2, entry->key, child); + } else { + qobject_incref(entry->value); + qdict_put_obj(tmp2, entry->key, entry->value); + } + + entry = next; + } + QDECREF(tmp1); + + /* Step 3: detect if we need to turn our dict into list */ + entry = qdict_first(tmp2); + while (entry != NULL) { + next = qdict_next(tmp2, entry); + + errno = 0; + if (qemu_strtoll(entry->key, NULL, 10, &val) == 0) { + if (!dst) { + dst = (QObject *)qlist_new(); + isList = true; + } else if (!isList) { + error_setg(errp, + "Key '%s' is for a list, but previous key is " + "for a dict", entry->key); + goto error; + } + listLen++; + if (val > listMax) { + listMax = val; + } + } else { + if (!dst) { + dst = (QObject *)tmp2; + qobject_incref(dst); + isList = false; + } else if (isList) { + error_setg(errp, + "Key '%s' is for a dict, but previous key is " + "for a list", entry->key); + goto error; + } + } + + entry = next; + } + + /* Step 4: Turn the dict into a list */ + if (isList) { + if (listLen != (listMax + 1)) { + error_setg(errp, "List indexes are not continuous, " + "saw %zu elements but %zu largest index", + listLen, listMax); + goto error; + } + + for (i = 0; i < listLen; i++) { + char *key = g_strdup_printf("%zu", i); + + child = qdict_get(tmp2, key); + g_free(key); + if (!child) { + error_setg(errp, "Unexpected missing list entry %zu", i); + goto error; + } + + qobject_incref(child); + qlist_append_obj((QList *)dst, child); + } + } + QDECREF(tmp2); + + return dst; + + error: + QDECREF(tmp2); + QDECREF(tmp1); + qobject_decref(dst); + return NULL; +} + + /** * qdict_array_entries(): Returns the number of direct array entries if the * sub-QDict of src specified by the prefix in subqdict (or src itself for diff --git a/tests/check-qdict.c b/tests/check-qdict.c index a43056c..fb8184c 100644 --- a/tests/check-qdict.c +++ b/tests/check-qdict.c @@ -596,6 +596,43 @@ static void qdict_join_test(void) QDECREF(dict2); } + +static void qdict_crumple_test(void) +{ + QDict *src, *dst, *rule, *eek; + QList *rules; + + src = qdict_new(); + qdict_put(src, "rule.0.match", qstring_from_str("fred")); + qdict_put(src, "rule.0.policy", qstring_from_str("allow")); + qdict_put(src, "rule.1.match", qstring_from_str("bob")); + qdict_put(src, "rule.1.policy", qstring_from_str("deny")); + qdict_put(src, "foo..bar", qstring_from_str("wibble")); + qdict_put(src, "eek.foo..bar", qstring_from_str("wizz")); + + dst = (QDict *)qdict_crumple(src, &error_abort); + + g_assert_cmpint(qdict_size(dst), ==, 3); + + rules = qdict_get_qlist(dst, "rule"); + + g_assert_cmpint(qlist_size(rules), ==, 2); + + rule = qobject_to_qdict(qlist_pop(rules)); + g_assert_cmpstr("fred", ==, qdict_get_str(rule, "match")); + g_assert_cmpstr("allow", ==, qdict_get_str(rule, "policy")); + + rule = qobject_to_qdict(qlist_pop(rules)); + g_assert_cmpstr("bob", ==, qdict_get_str(rule, "match")); + g_assert_cmpstr("deny", ==, qdict_get_str(rule, "policy")); + + g_assert_cmpstr("wibble", ==, qdict_get_str(dst, "foo.bar")); + + eek = qdict_get_qdict(dst, "eek"); + g_assert_cmpstr("wizz", ==, qdict_get_str(eek, "foo.bar")); +} + + /* * Errors test-cases */ @@ -743,6 +780,8 @@ int main(int argc, char **argv) g_test_add_func("/errors/put_exists", qdict_put_exists_test); g_test_add_func("/errors/get_not_exists", qdict_get_not_exists_test); + g_test_add_func("/public/crumple", qdict_crumple_test); + /* The Big one */ if (g_test_slow()) { g_test_add_func("/stress/test", qdict_stress_test);
The qdict_flatten() method will take a dict whose elements are further nested dicts/lists and flatten them by concatenating keys. The qdict_crumple() method aims todo the reverse, taking a flat qdict, and turning it into a set of nested dicts/lists. It will apply nesting based on the key name, with a '.' indicating a new level in the hierarchy. If the keys in the nested structure are all numeric, it will create a list, otherwise it will create a dict. If the keys are a mixture of numeric and non-numeric, or the numeric keys are not in strictly ascending order, an error will be reported. As an example, a flat dict containing { 'foo.0.bar': 'one', 'foo.0.wizz': '1', 'foo.1.bar': 'two', 'foo.1.wizz': '2' } will get turned into a dict with one element 'foo' whose value is a list. The list elements will each in turn be dicts. { 'foo' => [ { 'bar': 'one', 'wizz': '1' } { 'bar': 'two', 'wizz': '2' } ], } If the key is intended to contain a literal '.', then it must be escaped as '..'. ie a flat dict { 'foo..bar': 'wizz', 'bar.foo..bar': 'eek', 'bar.hello': 'world' } Will end up as { 'foo.bar': 'wizz', 'bar': { 'foo.bar': 'eek', 'hello': 'world' } } The intent of this function is that it allows a set of QemuOpts to be turned into a nested data structure that mirrors the nested used when the same object is defined over QMP. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> --- include/qapi/qmp/qdict.h | 1 + qobject/qdict.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++ tests/check-qdict.c | 39 ++++++++++ 3 files changed, 220 insertions(+)