Re: [PATCH 01/10] Introduce qmisc module

Message ID 4AD87424.3010000@redhat.com
State New

Commit Message

Paolo Bonzini Oct. 16, 2009, 1:24 p.m. UTC
On 10/15/2009 08:34 PM, Anthony Liguori wrote:
> Luiz Capitulino wrote:
>> Not the right context but I was going to post about this soon, so
>> I think this is a good opportunity to talk about it.
>>
>> I didn't look at all available parsers from json.org, but this one:
>>
>> http://fara.cs.uni-potsdam.de/~jsg/json_parser/
>>
>> Seems interesting.
>>
>> Anthony, are you ok in using external implementations like that
>> if they meet our requirements?
>
> Otherwise, pulling the code into the tree isn't so bad provided that
> it's not huge.

It's 36k, and pulling it in gives the opportunity to customize it.  For
example, the attached patch allows parsing a "%BLAH" extension to JSON
that is passed to the callback (since the parsing is done
character-by-character, the callback can consume whatever it wants after
the % sign).  Formatting with asprintf and then parsing the result as JSON
unfortunately isn't enough, because you'd need to escape all strings.
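
To illustrate the escaping problem: a value such as say "hi", pasted
into a format like { "name": "%s" }, produces invalid JSON unless it is
quoted first.  The following is only a minimal sketch of that quoting
step; json_quote is a hypothetical helper and is not part of the
attached patch or of the parser under discussion.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (an assumption for illustration): write src into
 * dst as a quoted JSON string, escaping quotes, backslashes and control
 * characters.  Returns the number of bytes written, excluding the NUL,
 * or 0 if dst is too small.  */
static size_t json_quote(char *dst, size_t dstsz, const char *src)
{
    size_t n = 0;

    if (dstsz < 3)              /* room for "" plus the NUL */
        return 0;
    dst[n++] = '"';
    for (; *src; src++) {
        unsigned char c = (unsigned char)*src;

        /* Worst case is \u00XX (6 bytes), plus closing quote and NUL.  */
        if (n + 8 > dstsz)
            return 0;
        if (c == '"' || c == '\\') {
            dst[n++] = '\\';
            dst[n++] = (char)c;
        } else if (c < 0x20) {
            n += (size_t)snprintf(dst + n, dstsz - n, "\\u%04x", (unsigned)c);
        } else {
            dst[n++] = (char)c;
        }
    }
    dst[n++] = '"';
    dst[n] = '\0';
    return n;
}
```

Every string argument would have to go through something like this
before being handed to asprintf, which is the extra work a % escape
handled by the parser callback avoids.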

Paolo
From aa76cd652d740f535a7dfb47d2c4ecbd5d28d47a Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri, 16 Oct 2009 15:22:01 +0200
Subject: [PATCH] add a % escape that is passed verbatim to the callback

This could be used to parse something like { %s : %s } and
fetch the values for the placeholders from an external va_list.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 JSON_parser.c |  104 +++++++++++++++++++++++++++++++++++----------------------
 JSON_parser.h |    2 +
 comments.json |    6 +++-
 main.c        |   11 ++++++
 4 files changed, 82 insertions(+), 41 deletions(-)

Comments

Anthony Liguori Oct. 16, 2009, 1:45 p.m. UTC | #1
Paolo Bonzini wrote:
> It's 36k, and pulling it in gives the opportunity to customize it.  
> For example, the attached patch allows to parse a "%BLAH" extension to 
> JSON that is passed to the callback (since the parsing is done 
> character-by-character, the callback can consume whatever it wants 
> after the % sign).  Asprintf+parse JSON unfortunately isn't enough 
> because you'd need to escape all strings.

What's the state of this library's upstream?  Should we be pushing these 
changes there and then attempting to package it?

I'd rather pull this in a submodule, try to get it packaged properly, 
and then eventually drop the submodule.  I don't want us to fork the 
library unless we have to.

Regards,

Anthony Liguori
Paolo Bonzini Oct. 16, 2009, 5:35 p.m. UTC | #2
On 10/16/2009 03:45 PM, Anthony Liguori wrote:
> Paolo Bonzini wrote:
>> It's 36k, and pulling it in gives the opportunity to customize it. For
>> example, the attached patch allows to parse a "%BLAH" extension to
>> JSON that is passed to the callback (since the parsing is done
>> character-by-character, the callback can consume whatever it wants
>> after the % sign). Asprintf+parse JSON unfortunately isn't enough
>> because you'd need to escape all strings.
>
> What's the state of this library's upstream? Should we be pushing these
> changes there and then attempting to package it?

There's no repository, there's no mention of it in the author's blog, it 
has seen six changes in two years according to the file's heading.  The 
only reference on da Internet is at 
http://tech.groups.yahoo.com/group/json/message/928.

On the other hand, it's to the point (it has no object model of 
its own), and it is fully asynchronous since it works 
character-by-character, which makes it easier to extend as in my patch above.
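
As a rough illustration of what "character-by-character" buys: all
parser state lives in a struct, so input can arrive in chunks of any
size and be split at any point.  The sketch below is not the library's
code; depth_scanner and its functions are hypothetical names, and it
only tracks bracket nesting and string state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical push-style scanner (an assumption for illustration).  */
struct depth_scanner {
    int  depth;       /* current nesting of [ ] and { } */
    bool in_string;   /* inside a "..." literal */
    bool escaped;     /* previous char was a backslash inside a string */
};

static void depth_scanner_init(struct depth_scanner *s)
{
    s->depth = 0;
    s->in_string = false;
    s->escaped = false;
}

/* Feed one character; returns false on an unbalanced closing bracket.  */
static bool depth_scanner_char(struct depth_scanner *s, char ch)
{
    if (s->in_string) {
        if (s->escaped)
            s->escaped = false;
        else if (ch == '\\')
            s->escaped = true;
        else if (ch == '"')
            s->in_string = false;
        return true;
    }
    switch (ch) {
    case '"':
        s->in_string = true;
        return true;
    case '[': case '{':
        s->depth++;
        return true;
    case ']': case '}':
        if (s->depth == 0)
            return false;
        s->depth--;
        return true;
    default:
        return true;
    }
}

/* Feed a chunk; the caller may split the input wherever it likes.  */
static bool depth_scanner_feed(struct depth_scanner *s,
                               const char *buf, size_t n)
{
    while (n--)
        if (!depth_scanner_char(s, *buf++))
            return false;
    return true;
}
```

Feeding one chunk that ends inside a string literal and then a second
chunk that closes it leaves the scanner in the same state as feeding
the whole text at once.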

Paolo
Anthony Liguori Oct. 16, 2009, 5:38 p.m. UTC | #3
Paolo Bonzini wrote:
> On 10/16/2009 03:45 PM, Anthony Liguori wrote:
>> Paolo Bonzini wrote:
>>> It's 36k, and pulling it in gives the opportunity to customize it. For
>>> example, the attached patch allows to parse a "%BLAH" extension to
>>> JSON that is passed to the callback (since the parsing is done
>>> character-by-character, the callback can consume whatever it wants
>>> after the % sign). Asprintf+parse JSON unfortunately isn't enough
>>> because you'd need to escape all strings.
>>
>> What's the state of this library's upstream? Should we be pushing these
>> changes there and then attempting to package it?
>
> There's no repository, there's no mention of it in the author's blog, 
> it has seen six changes in two years according to the file's heading.  
> The only reference on da Internet is at 
> http://tech.groups.yahoo.com/group/json/message/928.
>
> On the other hand, it's down to the point (it has no object model of 
> it's own), and it is fully asynchronous since it works 
> character-by-character which makes it easier to extend as in my patch 
> above.

Ugh!  I hate people trying to be clever.  The copyright is:

/*
Copyright (c) 2005 JSON.org

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to 
deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included 
in all
copies or substantial portions of the Software.

The Software shall be used for Good, not Evil.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS 
IN THE
SOFTWARE.
*/

"The Software shall be used for Good, not Evil." is added as part of the 
licensing text.  That screws up the otherwise X11 license and is highly 
unlikely to be GPL compatible.

We can't pull this into the tree or even link against it as a library.  
Try contacting the authors and see about getting that silliness removed.

Regards,

Anthony Liguori
Paolo Bonzini Oct. 16, 2009, 7:36 p.m. UTC | #4
On 10/16/2009 07:38 PM, Anthony Liguori wrote:
>
> Ugh!  I hate people trying to be clever.

Grrr, good catch.

I wrote this to the guy via Yahoo! but I cannot see his email address, 
so I've no clue if the email will actually reach him.

 > Hi, the QEMU project is discussing using your JSON parser.  However, 
 > the sentence "The Software shall be used for Good, not Evil" that
 > appears in the file is too clever and (even though the humorous
 > intent is obvious) it could be considered GPL-incompatible (or
 > any-other-license-incompatible for that matter).
 >
 > Would you consider removing that sentence from http://fara.cs.uni-
 > potsdam.de/~jsg/json_parser/JSON_parser.c?  If you cannot, you can
 > send it to me and CC anthony@codemonkey.ws.
 >
 > Thanks in advance.
 >
 > Paolo Bonzini

There can always be a plan B---Dan Berrange found a parser with a 
similar interface and if the weather doesn't improve I may even give a 
shot at writing one over the weekend.

Paolo
Anthony Liguori Oct. 16, 2009, 9:37 p.m. UTC | #5
Paolo Bonzini wrote:
>
> > Thanks in advance.
> >
> > Paolo Bonzini
>
> There can always be a plan B---Dan Berrange found a parser with a 
> similar interface and if the weather doesn't improve I may even give a 
> shot at writing one over the weekend.

I already am :-)  Stay tuned, I should have a patch later this afternoon.

I'd like to move all of the QObject/json code to a shared library too so 
that other tools like libvirt can just use that code.  Ideally, we would 
also provide a higher level monitor API too.

> Paolo

Regards,

Anthony Liguori
Paolo Bonzini Oct. 17, 2009, 12:32 a.m. UTC | #6
On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>
> I already am :-)  Stay tuned, I should have a patch later this afternoon.

Was it a race?  (Seriously, sorry I didn't notice a couple of hours ago).

This one is ~5% slower than the "Evil" one, but half the size.  Tested 
against the comments.json file from the "Evil" parser and with valgrind 
too.  Does all the funky Unicode stuff too.
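
For reference, the Unicode handling mostly comes down to turning a
\uXXXX\uXXXX surrogate pair into one code point and then encoding that
as UTF-8.  A standalone sketch of the arithmetic, with hypothetical
function names that do not appear in the attached file:

```c
#include <stddef.h>
#include <stdint.h>

/* Combine a UTF-16 surrogate pair into a code point.  Assumes hi is in
 * 0xD800..0xDBFF and lo is in 0xDC00..0xDFFF (not checked here).  */
static uint32_t combine_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000 + (((uint32_t)hi - 0xD800) << 10)
                   + ((uint32_t)lo - 0xDC00);
}

/* Encode a code point as UTF-8.  Returns the number of bytes written
 * (1 to 4), or 0 if cp is above U+10FFFF.  */
static size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 63));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 63));
        out[2] = (uint8_t)(0x80 | (cp & 63));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (uint8_t)(0xF0 | (cp >> 18));
        out[1] = (uint8_t)(0x80 | ((cp >> 12) & 63));
        out[2] = (uint8_t)(0x80 | ((cp >> 6) & 63));
        out[3] = (uint8_t)(0x80 | (cp & 63));
        return 4;
    }
    return 0;
}
```

For example, \uD83D\uDE00 combines to U+1F600, which encodes as the
four bytes F0 9F 98 80.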

Paolo
/*
 * An event-based, asynchronous JSON parser.
 *
 * Copyright (C) 2009 Red Hat Inc.
 *
 * Authors:
 *  Paolo Bonzini <pbonzini@redhat.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */


#include "json.h"
#include <string.h>
#include <stdlib.h>

/* Common character classes.  */

#define CASE_XDIGIT \
        case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': \
        case 'A': case 'B': case 'C': case 'D': case 'E': case 'F'

#define CASE_DIGIT \
        case '0': case '1': case '2': case '3': case '4': \
        case '5': case '6': case '7': case '8': case '9'

/* Helper function to go from \uXXXX-encoded UTF-16 to UTF-8.  */

static bool hex_to_utf8 (char *buf, char **dest, char *src)
{
    int i, n;
    uint8_t *p;

    for (i = n = 0; i < 4; i++) {
        n <<= 4;
        switch (src[i])
        {
        CASE_DIGIT: n |= src[i] - '0'; break;
        CASE_XDIGIT: n |= (src[i] & ~32) - 'A' + 10; break;
        default: return false;
        }
    }

    p = (uint8_t *)*dest;
    if (n < 128) {
        *p++ = n;
    } else if (n < 2048) {
        *p++ = 0xC0 | (n >> 6);
        *p++ = 0x80 | (n & 63);
    } else if (n < 0xDC00 || n > 0xDFFF) {
        *p++ = 0xE0 | (n >> 12);
        *p++ = 0x80 | ((n >> 6) & 63);
        *p++ = 0x80 | (n & 63);
    } else {
        /* Merge with preceding high surrogate.  */
        if (p - (uint8_t *)buf < 3
            || p[-3] != 0xED
            || p[-2] < 0xA0 || p[-2] > 0xAF) /* 0xD800..0xDBFF */
            return false;

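        n += 0x10000 - 0xDC00;
        /* Add (not OR) the high surrogate's payload: its top bits can
         * overlap the 0x10000 offset, e.g. for \uD840\uDC00.  */
        n += ((p[-2] & 15) << 16) | ((p[-1] & 63) << 10);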

        /* Overwrite high surrogate.  */
        p[-3] = 0xF0 | (n >> 18);
        p[-2] = 0x80 | ((n >> 12) & 63);
        p[-1] = 0x80 | ((n >> 6) & 63);
        *p++ = 0x80 | (n & 63);
    }
    *dest = (char *)p;
    return true;
}

struct json_parser {
    struct    json_parser_config c;
    size_t    n, alloc;
    char      *buf;
    size_t    sp;
    uint32_t  state, stack[128];
    char      start_buffer[4];
};

/* Managing the state stack.  */

static inline uint32_t *push_state (struct json_parser *p)
{
    p->stack[p->sp++] = p->state;
    return &p->state;
}

static inline void pop_state (struct json_parser *p)
{
    p->state = p->stack[--p->sp];
}


/* Managing the string/number buffer.  */

static inline void clear_buffer (struct json_parser *p)
{
    p->n = 0;
}

static inline void push_buffer (struct json_parser *p, char c)
{
    if (p->n == p->alloc) {
        size_t new_alloc = p->alloc * 2;
        if (p->buf == p->start_buffer) {
            p->buf = malloc (new_alloc);
            memcpy (p->buf, p->start_buffer, p->alloc);
        } else {
            p->buf = realloc (p->buf, new_alloc);
        }
        p->alloc = new_alloc;
    }
    p->buf[p->n++] = c;
}


/*
 * Parser states are organized like this:
 *   bit 0-7:   enum parser_state
 *   bit 8-15:  for IN_KEYWORD, index in keyword table
 *   bit 16-31: additional substate (enum parser_cookies)
 */

enum parser_state {
    START_PARSE,                /* at start of parsing */
    IN_KEYWORD,                 /* parsing keyword (match exactly) */
    START_KEY,                  /* expecting key */
    END_KEY,                    /* expecting colon */
    START_VALUE,                /* expecting value */
    END_VALUE,                  /* expecting comma or closing parenthesis */
    IN_NUMBER,                  /* parsing number (up to whitespace) */
    IN_STRING,                  /* parsing string */
    IN_STRING_BACKSLASH,        /* parsing string, copy one char verbatim */
    IN_COMMENT,                 /* comment mini-scanner */
};

enum parser_cookies {
    IN_UNUSED,

    IN_TRUE,                    /* for IN_KEYWORD */
    IN_FALSE,
    IN_NULL,

    IN_ARRAY,                   /* for {START,END}_{KEY,VALUE} */
    IN_DICT,

    IN_KEY,                     /* for IN_STRING */
    IN_VALUE,
};

#define STATE(state, cookie) \
    (((cookie) << 16) | (state))

#define STATE_KEYWORD(n, cookie) \
    (((cookie) << 16) | ((n) << 8) | IN_KEYWORD)

static const char keyword_table[] = "rue\0alse\0ull";
enum keyword_indices {
    KW_TRUE = 0,
    KW_FALSE = 4,
    KW_NULL = 9,
};



/* Parser actions.  These transfer to the appropriate state,
 * and invoke the callbacks.
 *
 * If there is a begin/end pair, begin pushes a state
 * and end pops it.
 */

static inline bool array_begin (struct json_parser *p)
{
    *push_state (p) = STATE (START_VALUE, IN_ARRAY); 
    return !p->c.array_begin || p->c.array_begin (p->c.data);
}

static inline bool array_end (struct json_parser *p)
{
    int state_cookie = (p->state >> 16);
    if (state_cookie != IN_ARRAY) return false;
    pop_state (p);
    return !p->c.array_end || p->c.array_end (p->c.data);
}


static inline bool object_begin (struct json_parser *p)
{
    *push_state (p) = STATE (START_KEY, IN_DICT);
    return !p->c.object_begin || p->c.object_begin (p->c.data);
}

static inline bool object_end (struct json_parser *p)
{
    int state_cookie = (p->state >> 16);
    if (state_cookie != IN_DICT) return false;
    pop_state (p);
    return !p->c.object_end || p->c.object_end (p->c.data);
}


static inline bool key_user (struct json_parser *p)
{
    return p->c.key && p->c.key (p->c.data, NULL, 0);
}


static inline bool number_begin (struct json_parser *p, char ch)
{
    *push_state (p) = IN_NUMBER;
    push_buffer (p, ch);
    return true;
}

static inline bool number_end (struct json_parser *p)
{
    char *end;
    bool result;
    long long ll;
    double d;

    pop_state (p);
    push_buffer (p, 0);
    ll = strtoll (p->buf, &end, 0);
    if (!*end)
        result = (!p->c.value_integer || p->c.value_integer (p->c.data, ll));
    else {
        d = strtod (p->buf, &end);
        result = (!*end &&
                  (!p->c.value_float || p->c.value_float (p->c.data, d)));
    }

    clear_buffer(p);
    return result;
}


static inline bool value_null (struct json_parser *p)
{
    return !p->c.value_null || p->c.value_null (p->c.data);
}


static inline bool value_boolean (struct json_parser *p, int n)
{
    return !p->c.value_boolean || p->c.value_boolean (p->c.data, n);
}


static inline bool string_begin (struct json_parser *p, int cookie)
{
    *push_state (p) = STATE (IN_STRING, cookie);
    return true;
}

static inline bool string_end (struct json_parser *p, int cookie)
{
    bool result;
    char *buf, *src, *dest;
    size_t n;

    pop_state (p); 
    push_buffer (p, 0);

    /* Unescape in place.  */
    for (n = p->n, buf = src = dest = p->buf; n > 0; n--) {
        if (*src != '\\') {
            *dest++ = *src++;
            continue;
        }
        if (n < 2)
            return false;

        src++;
        n--;
        switch (*src++) {
        case 'b': *dest++ = '\b'; continue;
        case 'f': *dest++ = '\f'; continue;
        case 'n': *dest++ = '\n'; continue;
        case 'r': *dest++ = '\r'; continue;
        case 't': *dest++ = '\t'; continue;

        case 'U': case 'u': 
            /* The [uU] has not been removed from n yet, hence subtract 5.  */
            if (n < 5 || !hex_to_utf8 (buf, &dest, src))
                return false;
            src += 4;
            n -= 4;
            continue;

        default: *dest++ = src[-1]; continue;
        }
    }

    buf = p->buf;
    n = dest - buf;
    if (cookie == IN_KEY)
        result = !p->c.key || p->c.key (p->c.data, buf, n);
    else
        result = !p->c.value_string || p->c.value_string (p->c.data, buf, n);
    clear_buffer(p);
    return result;
}


static inline bool value_user (struct json_parser *p)
{
    return p->c.value_user && p->c.value_user (p->c.data);
}


static inline bool comment (struct json_parser *p)
{
    return !p->c.comment || p->c.comment (p->c.data, p->buf, p->n);
}


bool json_parser_char(struct json_parser *p, char ch)
{
    for (;;) {
        int state = p->state & 255;
        int state_data = (p->state >> 8) & 255;
        int state_cookie = (p->state >> 16);
        // printf ("%d %d | %d %d\n", state, ch, state_cookie, p->sp);

        /* The big ugly parser.  Each case will always return or
         * continue, and we want to check this at link time if
         * possible.  */
#ifndef __OPTIMIZE__
#define link_error abort
#endif
        extern void link_error (void);

        switch (state)
        {
        /* First, however, a helpful definition...  */
#define SKIP_WHITE \
            switch (ch) { \
            case '/': goto do_start_comment; \
            case ' ': case '\t': case '\n': case '\r': case '\f': \
                return true; \
            default: \
                break; \
            }

        /* Unlike START_VALUE, this only accepts compound values.  */
        case START_PARSE:
            SKIP_WHITE;
            p->state = STATE (END_VALUE, state_cookie); 
            switch (ch)
            {
            case '[': return array_begin (p);
            case '{': return object_begin (p);
            default: return false;
            }
            link_error ();

        /* Only strings and user values are accepted here.  */
        case START_KEY:
            SKIP_WHITE;
            p->state = STATE (END_KEY, IN_DICT);
            switch (ch)
            {
            case '"': return string_begin (p, IN_KEY);
            case '%': return key_user (p);
            case '}': return object_end (p);
            default: return false;
            }
            link_error ();

        /* Accept any JavaScript literal.  Checking p->sp ensures that
         * something like "[] []" is rejected (the first array is parsed
         * from START_PARSE).  */
        case START_VALUE:
            SKIP_WHITE;
            if (p->sp == 0)
                return false;
            p->state = STATE (END_VALUE, state_cookie); 
            switch (ch)
            {
            case 't': *push_state (p) = STATE_KEYWORD(KW_TRUE, IN_TRUE); return true;
            case 'f': *push_state (p) = STATE_KEYWORD(KW_FALSE, IN_FALSE); return true;
            case 'n': *push_state (p) = STATE_KEYWORD(KW_NULL, IN_NULL); return true;
            case '"': return string_begin (p, IN_VALUE);
            case '-':
            CASE_DIGIT: return number_begin (p, ch);
            case '[': return array_begin (p);
            case '{': return object_begin (p);
            case '%': return value_user (p);
            case ']': return array_end (p);
            default: return false;
            }
            link_error ();

        /* End of a key, look for a colon.  */
        case END_KEY:
            SKIP_WHITE;
            p->state = STATE (START_VALUE, IN_DICT);
            return (ch == ':');

        /* End of a value, look for a comma or closing parenthesis.  */
        case END_VALUE:
            SKIP_WHITE;
            p->state = STATE (state_cookie == IN_DICT ? START_KEY : START_VALUE,
                              state_cookie);
            switch (ch)
            {
            case ',': return true;
            case '}': return object_end (p);
            case ']': return array_end (p);
            default: return false;
            }
            link_error ();

        /* Table-driven keyword scanner.  Advance until mismatch or end
         * of keyword.  */
        case IN_KEYWORD:
            if (ch != keyword_table[state_data])
                return false;
            if (keyword_table[state_data + 1] != 0) {
                p->state = STATE_KEYWORD(state_data + 1, state_cookie);
                return true;
            }

            pop_state (p);
            switch (state_cookie) {
            case IN_TRUE: return value_boolean (p, 1);
            case IN_FALSE: return value_boolean (p, 0);
            case IN_NULL: return value_null (p);
            default: abort ();
            }
            link_error ();

        /* Eat until closing quote (special-casing \"). */
        case IN_STRING:
            switch (ch) {
            case '"': return string_end (p, state_cookie);
            case '\\': p->state = STATE (IN_STRING_BACKSLASH, state_cookie);
                /* Fall through: the backslash itself is buffered too.  */
            default: push_buffer (p, ch); return true;
            }
            link_error ();

        /* Eat any character */
        case IN_STRING_BACKSLASH:
            push_buffer (p, ch); 
            p->state = STATE (IN_STRING, state_cookie);
            return true;

        /* Eat until a "bad" character is found, then we refine with
         * strtod/strtoll.  The character we end on is reprocessed in
         * the new state!  */
        case IN_NUMBER:
            switch (ch) {
            case '+':
            case '-':
            case '.':
            CASE_DIGIT:
            CASE_XDIGIT: push_buffer (p, ch); return true;
            default: if (!number_end (p)) return false; continue;
            }
            link_error ();

        /* Parse until '*' '/', then convert the whole comment to a
         * single blank and rescan. */
        do_start_comment:
            *push_state(p) = IN_COMMENT;
            if (p->c.comment) push_buffer(p, ch);
            return true;

        case IN_COMMENT:
            if (p->c.comment) push_buffer(p, ch);

            if      (state_cookie == 0 && ch != '*') return false;
            else if (state_cookie == 0             ) state_cookie = 1;
            else if (state_cookie == 1 && ch == '*') state_cookie = 2;
            else if (state_cookie == 2 && ch == '*') state_cookie = 2;
            else if (state_cookie == 2 && ch == '/') state_cookie = 3;
            else                                     state_cookie = 1;

            if (state_cookie < 3) {
                p->state = STATE(state, state_cookie);
                return true;
            } else {
                comment (p);
                pop_state (p);
                ch = ' ';
                continue;
            }
            link_error ();

        default:
            abort ();
        }

        link_error ();
    }
}

bool json_parser_string(struct json_parser *p, char *s, size_t n)
{
    while (n--)
        if (!json_parser_char(p, *s++))
            return false;
    return true;
}

struct json_parser *json_parser_new(struct json_parser_config *config)
{
    struct json_parser *p;
    p = malloc (sizeof *p);
    memcpy (&p->c, config, sizeof *config);
    p->n = 0;
    p->alloc = sizeof p->start_buffer;
    p->state = START_PARSE;
    p->buf = p->start_buffer;
    p->sp = 0;
    return p;
}

bool json_parser_destroy(struct json_parser *p)
{
    bool result = (p->state == END_VALUE) && (p->sp == 0);
    if (p->buf != p->start_buffer)
        free (p->buf);
    free (p);
    return result;
}
/*
 * An event-based, asynchronous JSON parser.
 *
 * Copyright (C) 2009 Red Hat Inc.
 *
 * Authors:
 *  Paolo Bonzini <pbonzini@redhat.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 * SOFTWARE.
 */


#ifndef JSON_H
#define JSON_H

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

struct json_parser_config {
    bool (*array_begin) (void *);
    bool (*array_end) (void *);
    bool (*object_begin) (void *);
    bool (*object_end) (void *);
    bool (*key) (void *, const char *, size_t);
    bool (*value_integer) (void *, long long);
    bool (*value_float) (void *, double);
    bool (*value_null) (void *);
    bool (*value_boolean) (void *, int);
    bool (*value_string) (void *, const char *, size_t);
    bool (*value_user) (void *);
    bool (*comment) (void *, const char *, size_t);
    void *data;
};

struct json_parser;

struct json_parser *json_parser_new(struct json_parser_config *config);
bool json_parser_destroy(struct json_parser *p);
bool json_parser_char(struct json_parser *p, char ch);
bool json_parser_string(struct json_parser *p, char *buf, size_t n);

#endif /* JSON_H */
/* main.c */

/*
    This program demonstrates a simple application of JSON_parser. It reads
    a JSON text from STDIN, producing an error message if the text is rejected.

        % JSON_parser <test/pass1.json
*/

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include <locale.h>

#include "json.h"

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

static int level = 0;
static int got_key = 0;

static void print_indent()
{
    printf ("%*s", 2 * level, "");
}
 
static bool array_begin (void *data)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("[\n");
    ++level;
    return true;
}

static bool array_end (void *data)
{
    --level;
    print_indent ();
    printf ("]\n");
    return true;
}

static bool object_begin (void *data)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("{\n");
    ++level;
    return true;
}

static bool object_end (void *data)
{
    --level;
    print_indent ();
    printf ("}\n");
    return true;
}

static bool key (void *data, const char *buf, size_t n)
{
    got_key = 1;
    print_indent ();
    if (buf)
	printf ("key = '%s', value = ", buf);
    else
	printf ("user key = %%%c, value = ", getchar());
    return true;
}

static bool value_integer (void *data, long long ll)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("integer: %lld\n", ll);
    return true;
}

static bool value_float (void *data, double d)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("float: %f\n", d);
    return true;
}

static bool value_null (void *data)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("null\n");
    return true;
}

static bool value_boolean (void *data, int val)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("%s\n", val ? "true" : "false");
    return true;
}

static bool value_string (void *data, const char *buf, size_t n)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("string: '%s'\n", buf);
    return true;
}

static bool value_user (void *data)
{
    if (!got_key) print_indent(); else got_key = 0;
    printf ("user: %%%c\n", getchar());
    return true;
}



int main(int argc, char* argv[]) {
    static struct json_parser_config parser_config = {
        .array_begin = array_begin,
        .array_end = array_end,
        .object_begin = object_begin,
        .object_end = object_end,
        .key = key,
        .value_integer = value_integer,
        .value_float = value_float,
        .value_null = value_null,
        .value_boolean = value_boolean,
        .value_string = value_string,
        .value_user = value_user,
    };

    struct json_parser *p = json_parser_new(&parser_config);
    int count = 0;
    int ch;
    while ((ch = getchar ()) != EOF && json_parser_char (p, ch))
	count++;

    if (ch != EOF) {
	fprintf (stderr, "error at character %d\n", count);
	exit (1);
    }
    if (!json_parser_destroy (p)) {
	fprintf (stderr, "error at end of file\n");
	exit (1);
    }

    exit (0);
}
malc Oct. 17, 2009, 12:38 a.m. UTC | #7
On Sat, 17 Oct 2009, Paolo Bonzini wrote:

> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
> > 
> > I already am :-)  Stay tuned, I should have a patch later this afternoon.
> 
> Was it a race?  (Seriously, sorry I didn't notice a couple of hours ago).
> 
> This one is ~5% slower than the "Evil" one, but half the size.  Tested against
> the comments.json file from the "Evil" parser and with valgrind too.  Does all
> the funky Unicode stuff too.
> 

Just from cursory glance:

a. allocation can fail
b. strtod is locale dependent
Paolo Bonzini Oct. 17, 2009, 12:46 a.m. UTC | #8
On 10/17/2009 02:38 AM, malc wrote:
> a. allocation can fail

s/malloc/qemu_malloc/ etc. when it is time to merge.

> b. strtod is locale dependent

Right, but qemu probably would prefer to always do setlocale 
(LC_NUMERIC, "C"), or add a c_strtod function like

http://git.sv.gnu.org/gitweb/?p=gnulib.git;a=blob_plain;f=lib/c-strtod.c
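
A minimal sketch of the first option, assuming a single-threaded caller
(setlocale changes process-wide state); c_locale_strtod is a
hypothetical name and this is not the gnulib implementation:

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption for illustration): parse a double
 * with LC_NUMERIC temporarily set to "C", then restore the old value.
 * Not thread-safe.  */
static double c_locale_strtod(const char *s, char **end)
{
    char saved[64];
    const char *cur = setlocale(LC_NUMERIC, NULL);
    double d;

    /* Copy the name: the returned string may be overwritten by the
     * next setlocale call.  */
    if (cur && strlen(cur) < sizeof saved)
        strcpy(saved, cur);
    else
        saved[0] = '\0';        /* unknown or too long: do not restore */

    setlocale(LC_NUMERIC, "C");
    d = strtod(s, end);
    if (saved[0])
        setlocale(LC_NUMERIC, saved);
    return d;
}
```

With this, "1.5" parses as 1.5 regardless of the user's locale, and a
decimal comma is never accepted.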

Thanks for the review, any additional pair of eyes can only help.

Paolo
Anthony Liguori Oct. 17, 2009, 1:49 a.m. UTC | #9
Paolo Bonzini wrote:
> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>>
>> I already am :-)  Stay tuned, I should have a patch later this 
>> afternoon.
>
> Was it a race?  (Seriously, sorry I didn't notice a couple of hours ago).
>
> This one is ~5% slower than the "Evil" one, but half the size.  Tested 
> against the comments.json file from the "Evil" parser and with 
> valgrind too.  Does all the funky Unicode stuff too.
>
> Paolo
> /*
>  * An event-based, asynchronous JSON parser.
>  *
>  * Copyright (C) 2009 Red Hat Inc.
>  *
>  * Authors:
>  *  Paolo Bonzini <pbonzini@redhat.com>
>  *
>  * Permission is hereby granted, free of charge, to any person obtaining a copy
>  * of this software and associated documentation files (the "Software"), to deal
>  * in the Software without restriction, including without limitation the rights
>  * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>  * copies of the Software, and to permit persons to whom the Software is
>  * furnished to do so, subject to the following conditions:
>  * 
>  * The above copyright notice and this permission notice shall be included in
>  * all copies or substantial portions of the Software.
>  *
>  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
>  * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>  * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
>  * SOFTWARE.
>  */
>
>
> #include "json.h"
> #include <string.h>
> #include <stdlib.h>
>
> /* Common character classes.  */
>
> #define CASE_XDIGIT \
>         case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': \
>         case 'A': case 'B': case 'C': case 'D': case 'E': case 'F'
>
> #define CASE_DIGIT \
>         case '0': case '1': case '2': case '3': case '4': \
>         case '5': case '6': case '7': case '8': case '9'
>
> /* Helper function to go from \uXXXX-encoded UTF-16 to UTF-8.  */
>
> static bool hex_to_utf8 (char *buf, char **dest, char *src)
> {
>     int i, n;
>     uint8_t *p;
>
>     for (i = n = 0; i < 4; i++) {
>         n <<= 4;
>         switch (src[i])
>         {
>         CASE_DIGIT: n |= src[i] - '0'; break;
>         CASE_XDIGIT: n |= (src[i] & ~32) - 'A' + 10; break;
>         default: return false;
>         }
>     }
>
>     p = (uint8_t *)*dest;
>     if (n < 128) {
>         *p++ = n;
>     } else if (n < 2048) {
>         *p++ = 0xC0 | (n >> 6);
>         *p++ = 0x80 | (n & 63);
>     } else if (n < 0xDC00 || n > 0xDFFF) {
>         *p++ = 0xE0 | (n >> 12);
>         *p++ = 0x80 | ((n >> 6) & 63);
>         *p++ = 0x80 | (n & 63);
>     } else {
>         /* Merge with preceding high surrogate.  */
>         if (p - (uint8_t *)buf < 3
>             || p[-3] != 0xED
>             || p[-2] < 0xA0 || p[-2] > 0xAF) /* 0xD800..0xDBFF */
>             return false;
>
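>         /* Add rather than OR: the 0x10000 offset and the high
>          * surrogate's top bits can both occupy bit 16.  */
>         n += 0x10000 - 0xDC00;
>         n += ((p[-2] & 15) << 16) | ((p[-1] & 63) << 10);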
>
>         /* Overwrite high surrogate.  */
>         p[-3] = 0xF0 | (n >> 18);
>         p[-2] = 0x80 | ((n >> 12) & 63);
>         p[-1] = 0x80 | ((n >> 6) & 63);
>         *p++ = 0x80 | (n & 63);
>     }
>     *dest = (char *)p;
>     return true;
> }
>
> struct json_parser {
>     struct    json_parser_config c;
>     size_t    n, alloc;
>     char      *buf;
>     size_t    sp;
>     uint32_t  state, stack[128];
>     char      start_buffer[4];
> };
>   

Having an explicit stack is unnecessary I think.  You can use a very 
simple scheme to detect the end of messages by simply counting {}, [], 
and being aware of the lexical rules.
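
A counting scheme of that kind could be sketched as below. This is a minimal sketch under stated assumptions, not the code from any of the posted patches: struct msg_scanner and msg_scanner_feed are hypothetical names, only double-quoted strings are handled, and nothing is validated beyond nesting depth.

```c
#include <stddef.h>

/* Hypothetical scanner state; all names are illustrative. */
struct msg_scanner {
    int depth;      /* current {} / [] nesting depth */
    int in_string;  /* nonzero while inside a "..." string */
    int escaped;    /* nonzero if the previous string char was a backslash */
};

/* Feed a chunk of input. Returns the number of bytes up to and including
 * the end of the first complete top-level object or array, or 0 if the
 * message is not finished yet. State carries over between calls. */
static size_t msg_scanner_feed(struct msg_scanner *s,
                               const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        char c = buf[i];

        if (s->in_string) {
            /* Lexical rule: brackets inside strings do not count. */
            if (s->escaped) {
                s->escaped = 0;
            } else if (c == '\\') {
                s->escaped = 1;
            } else if (c == '"') {
                s->in_string = 0;
            }
            continue;
        }

        switch (c) {
        case '"':
            s->in_string = 1;
            break;
        case '{': case '[':
            s->depth++;
            break;
        case '}': case ']':
            if (s->depth > 0 && --s->depth == 0) {
                return i + 1;
            }
            break;
        default:
            break;
        }
    }
    return 0;
}
```

Because the counter does not distinguish {} from [], it finds message boundaries but accepts mismatched input such as "[}"; rejecting that is left to the parser proper.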

Regards,

Anthony Liguori
Anthony Liguori Oct. 17, 2009, 1:50 a.m. UTC | #10
Paolo Bonzini wrote:
> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>>
>> I already am :-)  Stay tuned, I should have a patch later this 
>> afternoon.
>
> Was it a race?  (Seriously, sorry I didn't notice a couple of hours ago).
>
> This one is ~5% slower than the "Evil" one, but half the size.  Tested 
> against the comments.json file from the "Evil" parser and with 
> valgrind too.  Does all the funky Unicode stuff too.

I haven't benchmarked mine.  While yours came out an hour earlier, I 
included a full test suite, output QObjects, and support vararg parsing 
so I think I win :-)

Regards,

Anthony Liguori
Paolo Bonzini Oct. 17, 2009, 7:48 a.m. UTC | #11
On 10/17/2009 03:50 AM, Anthony Liguori wrote:
> Paolo Bonzini wrote:
>> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>>>
>>> I already am :-) Stay tuned, I should have a patch later this afternoon.
>>
>> Was it a race? (Seriously, sorry I didn't notice a couple of hours ago).
>>
>> This one is ~5% slower than the "Evil" one, but half the size. Tested
>> against the comments.json file from the "Evil" parser and with
>> valgrind too. Does all the funky Unicode stuff too.
>
> I haven't benchmarked mine. While yours came out an hour earlier, I
> included a full test suite, output QObjects, and support vararg parsing
> so I think I win :-)

Heh, Luiz and I had talked offlist and he'd take care of the rest 
(except the test suite) :-).

 > Having an explicit stack is unnecessary I think.

I'm curious to see yours now---the stack is used to detect things like 
[{"a":"b"},"c":"d"].  You could do that in the event handlers of course, 
but that kind of breaks the interface between the parser and event handlers.
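
To make the point concrete, here is a minimal sketch of the kind of check a context stack enables. It is not the parser quoted above: nesting_ok and CTX_MAX are hypothetical names, and the function only checks bracket matching and that a ':' appears directly inside an object.

```c
#include <stddef.h>

#define CTX_MAX 128  /* illustrative limit, same depth as stack[128] above */

/* Returns 1 if brackets match and every ':' outside a string sits directly
 * inside an object, 0 otherwise. This is not a full JSON validator. */
static int nesting_ok(const char *s)
{
    char stack[CTX_MAX];
    size_t sp = 0;
    int in_string = 0, escaped = 0;

    for (; *s != '\0'; s++) {
        char c = *s;

        if (in_string) {
            if (escaped) {
                escaped = 0;
            } else if (c == '\\') {
                escaped = 1;
            } else if (c == '"') {
                in_string = 0;
            }
            continue;
        }

        switch (c) {
        case '"':
            in_string = 1;
            break;
        case '{': case '[':
            if (sp == CTX_MAX) {
                return 0;           /* nested too deeply */
            }
            stack[sp++] = c;        /* remember which container we entered */
            break;
        case '}':
            if (sp == 0 || stack[--sp] != '{') {
                return 0;
            }
            break;
        case ']':
            if (sp == 0 || stack[--sp] != '[') {
                return 0;
            }
            break;
        case ':':
            /* A key/value separator is only legal inside an object. */
            if (sp == 0 || stack[sp - 1] != '{') {
                return 0;
            }
            break;
        default:
            break;
        }
    }
    return sp == 0 && !in_string;
}
```

A plain depth counter would accept [{"a":"b"},"c":"d"], since the brackets balance; the stack is what tells the parser that the second ':' is inside an array.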

Paolo
Vincent Hanquez Oct. 17, 2009, 10:01 a.m. UTC | #12
Anthony Liguori wrote:
> Paolo Bonzini wrote:
>> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
>>>
>>> I already am :-)  Stay tuned, I should have a patch later this 
>>> afternoon.
>>
>> Was it a race?  (Seriously, sorry I didn't notice a couple of hours 
>> ago).
>>
>> This one is ~5% slower than the "Evil" one, but half the size.  
>> Tested against the comments.json file from the "Evil" parser and with 
>> valgrind too.  Does all the funky Unicode stuff too.
>
> I haven't benchmarked mine.  While yours came out an hour earlier, I 
> included a full test suite, output QObjects, and support vararg 
> parsing so I think I win :-)
ar.. got mine too; I've been working on it slowly for the last 3 weeks.

It has a raw/pretty printer and an interruptible parser (the same idea 
as JSON_parser.c), and it's faster than JSON_parser.c [1].
It's completely generic (more like a library than an embedded thing), 
fully JSON compliant (it has a test suite too), and supports
user-supplied alloc functions. The callbacks for integer/float don't 
convert their data automatically, which means
that the user of the library can use whatever they want to support 
unlimited-size JSON numbers (or just return errors, for users that want 
the limit).

The library by itself is 39K with -g, last time I looked.

The library also comes with a jsonlint binary that's equivalent to 
xmllint (formatting and verification).

I'll package things up and post a link to it on Monday.
Luiz Capitulino Oct. 18, 2009, 2:06 p.m. UTC | #13
On Sat, 17 Oct 2009 11:01:33 +0100
Vincent Hanquez <vincent@snarc.org> wrote:

> Anthony Liguori wrote:
> > Paolo Bonzini wrote:
> >> On 10/16/2009 11:37 PM, Anthony Liguori wrote:
> >>>
> >>> I already am :-)  Stay tuned, I should have a patch later this 
> >>> afternoon.
> >>
> >> Was it a race?  (Seriously, sorry I didn't notice a couple of hours 
> >> ago).
> >>
> >> This one is ~5% slower than the "Evil" one, but half the size.  
> >> Tested against the comments.json file from the "Evil" parser and with 
> >> valgrind too.  Does all the funky Unicode stuff too.
> >
> > I haven't benchmarked mine.  While yours came out an hour earlier, I 
> > included a full test suite, output QObjects, and support vararg 
> > parsing so I think I win :-)
> ar.. got mine too, i've been doing for the last 3 weeks slowly;

 Very nice to see all these contributions.

> it got a raw/pretty printer, an interruptible parser (on the same idea 
> as JSON_parser.c), it's faster than JSON_parser.c [1],
> it's completely generic (more like a library than an embedded thing), 
> fully JSON compliant (got a test suite too), support
> user supplied alloc functions, and callback for integer/float doesn't 
> have their data converted automatically which means
> that the user of the library can use whatever it want to support the 
> non-limited size JSON number (or just return errors for user that want 
> the limit).
> 
> the library by itself is 39K with -g last time i've looked.

 Integration with QObjects is a killer feature; I think it's the
strongest argument against grabbing one from the internet.
Paolo Bonzini Oct. 18, 2009, 2:08 p.m. UTC | #14
On 10/18/2009 04:06 PM, Luiz Capitulino wrote:
>   Integration with QObjects is a killer feature, I think it's the
> stronger argument against grabbing one from the internet.

Yeah, I'd say let's go with Anthony's stuff.  I'll rebase the encoder on 
top of it soonish (I still think it's best if JSON encoding lives in 
QObject like a kind of toString).  If we need the asynchronous 
parsing later, we can easily replace it with mine or Vincent's.

Paolo
Anthony Liguori Oct. 18, 2009, 2:49 p.m. UTC | #15
Paolo Bonzini wrote:
> On 10/18/2009 04:06 PM, Luiz Capitulino wrote:
>>   Integration with QObjects is a killer feature, I think it's the
>> stronger argument against grabbing one from the internet.
>
> Yeah, I'd say let's go with Anthony's stuff.  I'll rebase the encoder 
> on top of it soonish (I still think it's best if JSON encoding lies in 
> QObject like a kind of toString).  If we'll need the asynchronous 
> parsing later, we can easily replace it with mine or Vincent's.

One thing I want to add as a feature to the 0.12 release is a nice 
client API.  To have this, we'll need message boundary identification 
and a JSON encoder.  I'll focus on the message boundary identification 
today.

I'd strongly suggest making the JSON encoder live outside of QObject.  
There are many possible ways to represent a QObject.  Think of JSON as a 
view of the QObject model.  The human monitor mode representation is a 
different view.

Regards,

Anthony Liguori

> Paolo
Vincent Hanquez Oct. 18, 2009, 3:06 p.m. UTC | #16
Luiz Capitulino wrote:
>> it got a raw/pretty printer, an interruptible parser (on the same idea 
>> as JSON_parser.c), it's faster than JSON_parser.c [1],
>> it's completely generic (more like a library than an embedded thing), 
>> fully JSON compliant (got a test suite too), support
>> user supplied alloc functions, and callback for integer/float doesn't 
>> have their data converted automatically which means
>> that the user of the library can use whatever it want to support the 
>> non-limited size JSON number (or just return errors for user that want 
>> the limit).
>>
>> the library by itself is 39K with -g last time i've looked.
>>     
>
>  Integration with QObjects is a killer feature, I think it's the
> stronger argument against grabbing one from the internet.
>   
I can't think of any reason why integration with QObject would take more 
than 50 lines of C on the user side of the library.
Since the API is completely SAX-like (I call it SAJ for obvious reasons), 
you get callbacks on entering/leaving an object/array
and a callback for every value (string, int, float, null, true, false) as 
a char * + length. For exactly the same reason, integration with glib 
would take the same 50 lines of "effort".

Note that, FTR, obviously I'd like to have my library used, but I'm 
happy as long as any library that is *fully* JSON compliant is used (no 
extensions, however, since you're obviously losing the benefit of using 
JSON if you create extensions).
Luiz Capitulino Oct. 18, 2009, 3:18 p.m. UTC | #17
On Sun, 18 Oct 2009 09:49:55 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> Paolo Bonzini wrote:
> > On 10/18/2009 04:06 PM, Luiz Capitulino wrote:
> >>   Integration with QObjects is a killer feature, I think it's the
> >> stronger argument against grabbing one from the internet.
> >
> > Yeah, I'd say let's go with Anthony's stuff.  I'll rebase the encoder 
> > on top of it soonish (I still think it's best if JSON encoding lies in 
> > QObject like a kind of toString).  If we'll need the asynchronous 
> > parsing later, we can easily replace it with mine or Vincent's.
> 
> One thing I want to add as a feature to the 0.12 release is a nice 
> client API.  To have this, we'll need message boundary identification 
> and a JSON encoder.  I'll focus on the message boundary identification 
> today.
> 
> I'd strongly suggest making the JSON encoder live outside of QObject.  
> There are many possible ways to represent a QObject.  Think of JSON as a 
> view of the QObject model.  The human monitor mode representation is a 
> different view.

 I agree.

 QObject's methods should only be used/needed by the object layer itself,
if the problem at hand handles high level data types (QInt, QDict, etc)
then we need a new type.

 The right way to have what Paolo is suggesting, would be to have a
toString() method in the object layer and allow it to be overridden.
Paolo Bonzini Oct. 18, 2009, 3:25 p.m. UTC | #18
>> I'd strongly suggest making the JSON encoder live outside of QObject.
>> There are many possible ways to represent a QObject.  Think of JSON as a
>> view of the QObject model.  The human monitor mode representation is a
>> different view.

My rationale was that since QObject is tailored over JSON, we might as 
well declare JSON to be "the" preferred view of the QObject model.

The human monitor representation would be provided by qstring_format in 
my patches (and a QError method would call qstring_format in the 
appropriate way, returning a C string with the result).

I think the difference of opinion is also due to different backgrounds; mine 
is in Smalltalk where class extensions---aka monkeypatching---are done 
in a different style than for example in Python.  Adding a "write as 
escaped JSON" method to QString would be akin to monkeypatching.

>   I agree.
>
>   QObject's methods should only be used/needed by the object layer itself,
> if the problem at hand handles high level data types (QInt, QDict, etc)
> then we need a new type.
>
>   The right way to have what Paolo is suggesting, would be to have a
> toString() method in the object layer and allow it to be overridden.

That's exactly what I did in my patches, except I called it encode_json 
rather than toString.

Paolo
Luiz Capitulino Oct. 18, 2009, 3:35 p.m. UTC | #19
On Sun, 18 Oct 2009 16:06:29 +0100
Vincent Hanquez <vincent@snarc.org> wrote:

> Luiz Capitulino wrote:
> >> it got a raw/pretty printer, an interruptible parser (on the same idea 
> >> as JSON_parser.c), it's faster than JSON_parser.c [1],
> >> it's completely generic (more like a library than an embedded thing), 
> >> fully JSON compliant (got a test suite too), support
> >> user supplied alloc functions, and callback for integer/float doesn't 
> >> have their data converted automatically which means
> >> that the user of the library can use whatever it want to support the 
> >> non-limited size JSON number (or just return errors for user that want 
> >> the limit).
> >>
> >> the library by itself is 39K with -g last time i've looked.
> >>     
> >
> >  Integration with QObjects is a killer feature, I think it's the
> > stronger argument against grabbing one from the internet.
> >   
> I can't think of any reason why integration with qobject would take more 
> than 50 lines of C on the user side of the library.
> since the API is completely SAX like (i call it SAJ for obvious reason), 
> you get callback entering/leaving object/array
> and callback for every values (string, int, float, null, true, false) as 
> a char * + length. for exactly the same reason, integration with glib 
> would take the same 50 lines "effort".

 No lines is a lot better than 50. :)

 The real problem though is that the parsers I looked at had their own
"object model"; some of them are quite simple, others are more sophisticated
than QObject. Making no use of any kind of intermediate representation like
this is a feature, as things get simpler.

 Also, don't get me wrong, but if we were to consider your parser we
would have to consider the other two or three that are listed on
json.org and have a compatible license.

> note that FTR, obviously i'ld like to have my library used, but i'm 
> happy that any library that is *fully* JSON compliant is used (no 
> extensions however since you're obviously loosing the benefit of using 
> JSON if you create extensions).

 This is already settled, I hope.
Paolo Bonzini Oct. 18, 2009, 3:39 p.m. UTC | #20
On 10/18/2009 05:35 PM, Luiz Capitulino wrote:
>> (no
>>  extensions however since you're obviously loosing the benefit of using
>>  JSON if you create extensions).
>   This is already settled, I hope.

I think he's referring to things such as single-quoted strings, or 
% escapes for formatting.  I have no qualms with that as long as what 
goes on the wire is 100% JSON.

Paolo
Luiz Capitulino Oct. 18, 2009, 4:05 p.m. UTC | #21
On Sun, 18 Oct 2009 17:25:47 +0200
Paolo Bonzini <bonzini@gnu.org> wrote:

> 
> >> I'd strongly suggest making the JSON encoder live outside of QObject.
> >> There are many possible ways to represent a QObject.  Think of JSON as a
> >> view of the QObject model.  The human monitor mode representation is a
> >> different view.
> 
> My rationale was that since QObject is tailored over JSON, we might as 
> well declare JSON to be "the" preferred view of the QObject model.

 Maybe this makes sense today as the Monitor is the only heavy user
of QObjects, but I don't think we should count on that.

 As things evolve, I believe more subsystems will start using QObjects
and any "particular" view of it will make little sense.

 To be honest I don't know if this is good, I fear we will end up
enhancing QObjects to the extreme to do OOP in QEMU...

> The human monitor representation would be provided by qstring_format in 
> my patches (and a QError method would call qstring_format in the 
> appropriate way, returning a C string with the result).
> 
> I think the different opinions is also due to different background; mine 
> is in Smalltalk where class extensions---aka monkeypatching---are done 
> in a different style than for example in Python.  Adding a "write as 
> escaped JSON" method to QString would be akin to monkeypatching.

 True.

> >   I agree.
> >
> >   QObject's methods should only be used/needed by the object layer itself,
> > if the problem at hand handles high level data types (QInt, QDict, etc)
> > then we need a new type.
> >
> >   The right way to have what Paolo is suggesting, would be to have a
> > toString() method in the object layer and allow it to be overridden.
> 
> That's exactly what I did in my patches, except I called it encode_json 
> rather than toString.

 Okay, I just took a quick look at them and am looking at Anthony's
right now.

 Anyway, my brainstorm on this would be to have to_string() and have
default methods on all types to return a simple string representation.
The QJson type could override to_string() if needed; this way the
JSON-specific bits stay inside the json module.

 But I see that Anthony has added a qjson type already..
Anthony Liguori Oct. 18, 2009, 4:26 p.m. UTC | #22
Anthony Liguori wrote:
> Paolo Bonzini wrote:
>> On 10/18/2009 04:06 PM, Luiz Capitulino wrote:
>>>   Integration with QObjects is a killer feature, I think it's the
>>> stronger argument against grabbing one from the internet.
>>
>> Yeah, I'd say let's go with Anthony's stuff.  I'll rebase the encoder 
>> on top of it soonish (I still think it's best if JSON encoding lies 
>> in QObject like a kind of toString).  If we'll need the asynchronous 
>> parsing later, we can easily replace it with mine or Vincent's.
>
> One thing I want to add as a feature to the 0.12 release is a nice 
> client API.  To have this, we'll need message boundary identification 
> and a JSON encoder.  I'll focus on the message boundary identification 
> today.

Here's a first pass.  I'll clean up this afternoon and post a proper 
patch.  It turned out to work pretty well.

Regards,

Anthony Liguori
Anthony Liguori Oct. 18, 2009, 4:29 p.m. UTC | #23
Vincent Hanquez wrote: 
> I can't think of any reason why integration with qobject would take 
> more than 50 lines of C on the user side of the library.
> since the API is completely SAX like (i call it SAJ for obvious 
> reason), you get callback entering/leaving object/array
> and callback for every values (string, int, float, null, true, false) 
> as a char * + length. for exactly the same reason, integration with 
> glib would take the same 50 lines "effort".
>
> note that FTR, obviously i'ld like to have my library used, but i'm 
> happy that any library that is *fully* JSON compliant is used (no 
> extensions however since you're obviously loosing the benefit of using 
> JSON if you create extensions).

We need two sets of extensions for use within qemu.  Single quoted 
strings and varargs support.  While single quoted strings would be easy 
to add to any library, vararg support is a bit more tricky as you need 
to carefully consider where you pop from the varargs list.  A simple 
sprintf() isn't sufficient for embedding QObjects.

When generating on-the-wire response traffic, we shouldn't use any of 
the extensions so it will be 100% json.org compliant.

I'm pretty sure if you tried to duplicate the functionality of my 
patches, it would be much more than 50 lines.  That's not saying it's a 
better json parser, just that we're looking for very particular features 
from it.

Regards,

Anthony Liguori
Anthony Liguori Oct. 18, 2009, 4:32 p.m. UTC | #24
Luiz Capitulino wrote:
>  Okay, I just took a quick look at them and am looking at Anthony's
> right now.
>
>  Anyway, my brainstorm on this would be to have to_string() and have
> default methods on all types to return a simple string representation.
>   

What's the value of integrating into the objects versus having a 
separate function that can apply it to the objects?

Prototype languages are very different and it's not typically a good 
idea to mix styles like this.

Regards,

Anthony Liguori
Vincent Hanquez Oct. 18, 2009, 4:46 p.m. UTC | #25
Anthony Liguori wrote:
> Vincent Hanquez wrote:
>> I can't think of any reason why integration with qobject would take 
>> more than 50 lines of C on the user side of the library.
>> since the API is completely SAX like (i call it SAJ for obvious 
>> reason), you get callback entering/leaving object/array
>> and callback for every values (string, int, float, null, true, false) 
>> as a char * + length. for exactly the same reason, integration with 
>> glib would take the same 50 lines "effort".
>>
>> note that FTR, obviously i'ld like to have my library used, but i'm 
>> happy that any library that is *fully* JSON compliant is used (no 
>> extensions however since you're obviously loosing the benefit of 
>> using JSON if you create extensions).
>
> We need two sets of extensions for use within qemu.  Single quoted 
> strings and varargs support.  While single quoted strings would be 
> easy to add to any library, vararg support is a bit more tricky as you 
> need to carefully consider where you pop from the varargs list.  A 
> simple sprintf() isn't sufficient for embedding QObjects.
Care to explain what single-quoted strings and varargs support mean 
in your context? (Maybe just a simple example?)
> When generating on-the-wire response traffic, we shouldn't use any of 
> the extensions so it will be 100% json.org compliant.
great.
> I'm pretty sure if you tried to duplicate the functionality of my 
> patches, it would be much more than 50 lines.  That's not saying it's 
> a better json parser, just that we're looking for very particular 
> features from it.
Since it doesn't appear to be linked to JSON particularly, I don't 
understand why it's a feature of the parser, though. Couldn't any parser 
grow the support you need on top of it?
Vincent Hanquez Oct. 18, 2009, 4:56 p.m. UTC | #26
Luiz Capitulino wrote:
>> I can't think of any reason why integration with qobject would take more 
>> than 50 lines of C on the user side of the library.
>> since the API is completely SAX like (i call it SAJ for obvious reason), 
>> you get callback entering/leaving object/array
>> and callback for every values (string, int, float, null, true, false) as 
>> a char * + length. for exactly the same reason, integration with glib 
>> would take the same 50 lines "effort".
>>     
>
>  No lines is a lot better than 50. :)
>   
Well, it all depends on how you see things. Whilst I'm happy to help with 
all sorts of integration (qemu in this case), my library has been made for 
integrating with absolutely any object model. So 50 lines seems like a 
win to me, because I could do the same thing on a project that uses glib, 
or some Qt model, using exactly the same engine. Hence the reason why I'm 
packaging it as a .a/.so library (not that I particularly object to an 
embedded use case either).

I think it's a win in the end when people can just reuse wheels 
instead of designing new ones to cater for special needs.
>  The real problem though is that the parsers I looked at had their own
> "object model", some of them are quite simple others are more sophisticated
> than QObject. Making no use of any kind of intermediate representation like
> this is a feature, as things get simpler.
>
>  Also, don't get me wrong, but if we would consider your parser we
> would have to consider the others two or three that are listed in
> json.org and have a compatible license.
>   
Most of the parsers there are either weirdly licensed, have an object 
model integrated, are not interruptible, 
or are quite complex for no apparent reason. I carefully read all of 
them before choosing to reimplement one from scratch.
Vincent Hanquez Oct. 18, 2009, 5:32 p.m. UTC | #27
Anthony Liguori wrote:
> Here's a first pass.  I'll clean up this afternoon and post a proper 
> patch.  It turned out to work pretty well.
It doesn't seem to validate anything? Or is it just a lexer?
You're also including ' as a string escape value (is that the single 
quote thing you were talking about?), which strikes me as invalid JSON.
Paolo Bonzini Oct. 18, 2009, 5:59 p.m. UTC | #28
On 10/18/2009 06:46 PM, Vincent Hanquez wrote:
> care to explain what's a single quoted string and varargs support means
> in your context ? (just a simple example you do maybe ?)

single-quoted string: Being able to parse 'name' in addition to "name", 
which is convenient because in C the latter would be \"name\".

varargs: Being able to call some external function when a %+letter 
sequence is found, which would fetch the key or value from an external 
source (for example a varargs list so that you can do a printf-style 
QObject factory function, where the template is itself written in 
JSON-like syntax).
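
As a rough illustration of both extensions together, the sketch below expands a template into strict JSON text. It is a simplified stand-in, not the code from the patches (which build QObjects rather than strings): json_template and the outbuf helpers are hypothetical names, only %s and %d are supported, every ' in the template is turned into ", and control characters in string arguments are not escaped.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bounded output buffer. */
struct outbuf {
    char *p;
    size_t cap, len;
    int overflow;
};

static void ob_putc(struct outbuf *o, char c)
{
    if (o->len + 1 < o->cap) {
        o->p[o->len++] = c;     /* always leave room for the final NUL */
    } else {
        o->overflow = 1;
    }
}

/* Emit s as a JSON string, escaping quotes and backslashes. This is the
 * step a plain sprintf() of the template would skip. */
static void ob_put_escaped(struct outbuf *o, const char *s)
{
    ob_putc(o, '"');
    for (; *s != '\0'; s++) {
        if (*s == '"' || *s == '\\') {
            ob_putc(o, '\\');
        }
        ob_putc(o, *s);
    }
    ob_putc(o, '"');
}

/* Expand a JSON-like template: 'x' becomes "x", %s takes a const char *
 * from the argument list, %d takes an int. Returns 0 on success, -1 if
 * the output did not fit. */
static int json_template(char *out, size_t cap, const char *fmt, ...)
{
    struct outbuf o = { out, cap, 0, 0 };
    char num[32];
    const char *q;
    va_list ap;

    if (cap == 0) {
        return -1;
    }
    va_start(ap, fmt);
    for (; *fmt != '\0'; fmt++) {
        if (*fmt == '\'') {
            ob_putc(&o, '"');
        } else if (*fmt == '%' && fmt[1] == 's') {
            ob_put_escaped(&o, va_arg(ap, const char *));
            fmt++;
        } else if (*fmt == '%' && fmt[1] == 'd') {
            snprintf(num, sizeof(num), "%d", va_arg(ap, int));
            for (q = num; *q != '\0'; q++) {
                ob_putc(&o, *q);
            }
            fmt++;
        } else {
            ob_putc(&o, *fmt);
        }
    }
    va_end(ap);
    out[o.len] = '\0';
    return o.overflow ? -1 : 0;
}
```

With this sketch, json_template(buf, sizeof(buf), "{ 'name': %s }", "a\"b") writes { "name": "a\"b" } into buf, with the embedded quote escaped.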

The important thing anyway is that the encoder is conservative (i.e. 
100% valid JSON) in what it emits.  This is something everybody totally 
agrees on.

Paolo
Paolo Bonzini Oct. 18, 2009, 6:04 p.m. UTC | #29
On 10/18/2009 06:32 PM, Anthony Liguori wrote:
>
> What's the value of integrating into the objects verses having a
> separate function that can apply it to the objects?

That's just different style.  Of course you could do a 
switch(qobject_type(qobject)) instead of using polymorphism.  It would 
be nicer in some ways, and uglier in other ways.  toString however seems 
pervasive enough that it could deserve a place as a QObject virtual method.
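
The two styles can be put side by side in a small sketch. The types below are hypothetical stand-ins, not the real QObject API; they only show where the per-type code lives in each approach.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical miniature object model, for illustration only. */
enum obj_type { OBJ_INT, OBJ_STR };

struct obj;

struct obj_ops {
    int (*to_string)(const struct obj *o, char *buf, size_t cap);
};

struct obj {
    enum obj_type type;
    const struct obj_ops *ops;
    union {
        long i;
        const char *s;
    } u;
};

/* Style 1: one external function that switches on the type. All the
 * formatting code lives outside the object layer. */
static int obj_to_string_switch(const struct obj *o, char *buf, size_t cap)
{
    switch (o->type) {
    case OBJ_INT:
        return snprintf(buf, cap, "%ld", o->u.i);
    case OBJ_STR:
        return snprintf(buf, cap, "%s", o->u.s);
    }
    return -1;
}

/* Style 2: a virtual method. Each type supplies its own to_string, and
 * callers dispatch through the ops table. */
static int int_to_string(const struct obj *o, char *buf, size_t cap)
{
    return snprintf(buf, cap, "%ld", o->u.i);
}

static int str_to_string(const struct obj *o, char *buf, size_t cap)
{
    return snprintf(buf, cap, "%s", o->u.s);
}

static const struct obj_ops int_ops = { int_to_string };
static const struct obj_ops str_ops = { str_to_string };

static int obj_to_string_virtual(const struct obj *o, char *buf, size_t cap)
{
    return o->ops->to_string(o, buf, cap);
}
```

Adding a new type means touching the switch in the first style and only adding a new ops table in the second; adding a new view (say, a human monitor format) is the other way around.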

Anyway, I probably won't have much code in QEMU in the end, so there's 
no value in arguing when anyway a very nice design is emerging.  It 
looks like Anthony has most of the JSON plumbing in his brain, so it's 
better if he keeps the flow going.  Feel free to steal my code.

Once your stuff is settled I'll see what's missing and rebase/resend.

Paolo, at one point tempted to s/encode_json/to_string/ and resubmit :-)
Anthony Liguori Oct. 18, 2009, 9:24 p.m. UTC | #30
On Sun, Oct 18, 2009 at 12:32 PM, Vincent Hanquez <vincent@snarc.org> wrote:
> Anthony Liguori wrote:
>>
>> Here's a first pass.  I'll clean up this afternoon and post a proper
>> patch.  It turned out to work pretty well.
>
> It doesn't seems to validate anything ?? or is it just a lexer ?

That's just a lexer.  I posted a parser earlier.  However, now I'm
thinking I should update the parser to use the lexer.

> you're also including ' as a string escape value (is that the single quote
> thing you were talking about ?) which strikes me as invalid JSON.

It's a compatible extension.  We accept strings with those escapes but
our encoder won't generate them.

> --
> Vincent
>
Luiz Capitulino Oct. 18, 2009, 10 p.m. UTC | #31
On Sun, 18 Oct 2009 11:32:11 -0500
Anthony Liguori <anthony@codemonkey.ws> wrote:

> Luiz Capitulino wrote:
> >  Okay, I just took a quick look at them and am looking at Anthony's
> > right now.
> >
> >  Anyway, my brainstorm on this would be to have to_string() and have
> > default methods on all types to return a simple string representation.
> >   
> 
> What's the value of integrating into the objects verses having a 
> separate function that can apply it to the objects?

 Right now it doesn't have any real value, besides being a different
style which seems to fit well with the QObject design.

 In the future it might be needed though, common code might want to
change certain methods' behavior before passing QObjects down a call
stack.

Patch

diff --git a/JSON_parser.c b/JSON_parser.c
index 93e98c8..4c360be 100644
--- a/JSON_parser.c
+++ b/JSON_parser.c
@@ -151,6 +151,7 @@  enum classes {
     C_E,      /* E */
     C_ETC,    /* everything else */
     C_STAR,   /* * */   
+    C_PCT,    /* % - user escape */
     NR_CLASSES
 };
 
@@ -165,7 +166,7 @@  static int ascii_class[128] = {
     __,      __,      __,      __,      __,      __,      __,      __,
     __,      __,      __,      __,      __,      __,      __,      __,
 
-    C_SPACE, C_ETC,   C_QUOTE, C_ETC,   C_ETC,   C_ETC,   C_ETC,   C_ETC,
+    C_SPACE, C_ETC,   C_QUOTE, C_ETC,   C_ETC,   C_PCT,   C_ETC,   C_ETC,
     C_ETC,   C_ETC,   C_STAR,   C_PLUS,  C_COMMA, C_MINUS, C_POINT, C_SLASH,
     C_ZERO,  C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT, C_DIGIT,
     C_DIGIT, C_DIGIT, C_COLON, C_ETC,   C_ETC,   C_ETC,   C_ETC,   C_ETC,
@@ -239,7 +240,8 @@  enum actions
     ZX = -19, /* integer detected by zero */
     IX = -20, /* integer detected by 1-9 */
     EX = -21, /* next char is escaped */
-    UC = -22  /* Unicode character read */
+    UC = -22, /* Unicode character read */
+    XC = -23, /* Escape to callback */
 };
 
 
@@ -251,43 +253,43 @@  static int state_transition_table[NR_STATES][NR_CLASSES] = {
     state is OK and if the mode is MODE_DONE.
 
                  white                                      1-9                                   ABCDF  etc
-             space |  {  }  [  ]  :  ,  "  \  /  +  -  .  0  |  a  b  c  d  e  f  l  n  r  s  t  u  |  E  |  * */
-/*start  GO*/ {GO,GO,-6,__,-5,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*ok     OK*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*object OB*/ {OB,OB,__,-9,__,__,__,__,SB,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*key    KE*/ {KE,KE,__,__,__,__,__,__,SB,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*colon  CO*/ {CO,CO,__,__,__,__,-2,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*value  VA*/ {VA,VA,-6,__,-5,__,__,__,SB,__,CB,__,MX,__,ZX,IX,__,__,__,__,__,FA,__,NU,__,__,TR,__,__,__,__,__},
-/*array  AR*/ {AR,AR,-6,__,-5,-7,__,__,SB,__,CB,__,MX,__,ZX,IX,__,__,__,__,__,FA,__,NU,__,__,TR,__,__,__,__,__},
-/*string ST*/ {ST,__,ST,ST,ST,ST,ST,ST,-4,EX,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST},
-/*escape ES*/ {__,__,__,__,__,__,__,__,ST,ST,ST,__,__,__,__,__,__,ST,__,__,__,ST,__,ST,ST,__,ST,U1,__,__,__,__},
-/*u1     U1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U2,U2,U2,U2,U2,U2,U2,U2,__,__,__,__,__,__,U2,U2,__,__},
-/*u2     U2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U3,U3,U3,U3,U3,U3,U3,U3,__,__,__,__,__,__,U3,U3,__,__},
-/*u3     U3*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U4,U4,U4,U4,U4,U4,U4,U4,__,__,__,__,__,__,U4,U4,__,__},
-/*u4     U4*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,UC,UC,UC,UC,UC,UC,UC,UC,__,__,__,__,__,__,UC,UC,__,__},
-/*minus  MI*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,ZE,IT,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*zero   ZE*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,DF,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*int    IT*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,DF,IT,IT,__,__,__,__,DE,__,__,__,__,__,__,__,__,DE,__,__},
-/*frac   FR*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,__,FR,FR,__,__,__,__,E1,__,__,__,__,__,__,__,__,E1,__,__},
-/*e      E1*/ {__,__,__,__,__,__,__,__,__,__,__,E2,E2,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*ex     E2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*exp    E3*/ {OK,OK,__,-8,__,-7,__,-3,__,__,__,__,__,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*tr     T1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,T2,__,__,__,__,__,__,__},
-/*tru    T2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,T3,__,__,__,__},
-/*true   T3*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__,__},
-/*fa     F1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F2,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*fal    F2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F3,__,__,__,__,__,__,__,__,__},
-/*fals   F3*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F4,__,__,__,__,__,__},
-/*false  F4*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__,__},
-/*nu     N1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,N2,__,__,__,__},
-/*nul    N2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,N3,__,__,__,__,__,__,__,__,__},
-/*null   N3*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__},
-/*/      C1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,C2},
-/*/*     C2*/ {C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C3},
-/**      C3*/ {C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,CE,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C3},
-/*_.     FX*/ {OK,OK,__,-8,__,-7,__,-3,__,__,__,__,__,__,FR,FR,__,__,__,__,E1,__,__,__,__,__,__,__,__,E1,__,__},
-/*\      D1*/ {__,__,__,__,__,__,__,__,__,D2,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
-/*\      D2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,U1,__,__,__,__},
+             space |  {  }  [  ]  :  ,  "  \  /  +  -  .  0  |  a  b  c  d  e  f  l  n  r  s  t  u  |  E  |  *  % */
+/*start  GO*/ {GO,GO,-6,__,-5,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,XC},
+/*ok     OK*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
+/*object OB*/ {OB,OB,__,-9,__,__,__,__,SB,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,XC},
+/*key    KE*/ {KE,KE,__,__,__,__,__,__,SB,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,XC},
+/*colon  CO*/ {CO,CO,__,__,__,__,-2,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
+/*value  VA*/ {VA,VA,-6,__,-5,__,__,__,SB,__,CB,__,MX,__,ZX,IX,__,__,__,__,__,FA,__,NU,__,__,TR,__,__,__,__,__,XC},
+/*array  AR*/ {AR,AR,-6,__,-5,-7,__,__,SB,__,CB,__,MX,__,ZX,IX,__,__,__,__,__,FA,__,NU,__,__,TR,__,__,__,__,__,XC},
+/*string ST*/ {ST,__,ST,ST,ST,ST,ST,ST,-4,EX,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST,ST},
+/*escape ES*/ {__,__,__,__,__,__,__,__,ST,ST,ST,__,__,__,__,__,__,ST,__,__,__,ST,__,ST,ST,__,ST,U1,__,__,__,__,__},
+/*u1     U1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U2,U2,U2,U2,U2,U2,U2,U2,__,__,__,__,__,__,U2,U2,__,__,__},
+/*u2     U2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U3,U3,U3,U3,U3,U3,U3,U3,__,__,__,__,__,__,U3,U3,__,__,__},
+/*u3     U3*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,U4,U4,U4,U4,U4,U4,U4,U4,__,__,__,__,__,__,U4,U4,__,__,__},
+/*u4     U4*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,UC,UC,UC,UC,UC,UC,UC,UC,__,__,__,__,__,__,UC,UC,__,__,__},
+/*minus  MI*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,ZE,IT,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
+/*zero   ZE*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,DF,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
+/*int    IT*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,DF,IT,IT,__,__,__,__,DE,__,__,__,__,__,__,__,__,DE,__,__,__},
+/*frac   FR*/ {OK,OK,__,-8,__,-7,__,-3,__,__,CB,__,__,__,FR,FR,__,__,__,__,E1,__,__,__,__,__,__,__,__,E1,__,__,__},
+/*e      E1*/ {__,__,__,__,__,__,__,__,__,__,__,E2,E2,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
+/*ex     E2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
+/*exp    E3*/ {OK,OK,__,-8,__,-7,__,-3,__,__,__,__,__,__,E3,E3,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
+/*tr     T1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,T2,__,__,__,__,__,__,__,__},
+/*tru    T2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,T3,__,__,__,__,__},
+/*true   T3*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__,__,__},
+/*fa     F1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F2,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
+/*fal    F2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F3,__,__,__,__,__,__,__,__,__,__},
+/*fals   F3*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,F4,__,__,__,__,__,__,__},
+/*false  F4*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__,__,__},
+/*nu     N1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,N2,__,__,__,__,__},
+/*nul    N2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,N3,__,__,__,__,__,__,__,__,__,__},
+/*null   N3*/ {__,__,__,__,__,__,__,__,__,__,CB,__,__,__,__,__,__,__,__,__,__,__,OK,__,__,__,__,__,__,__,__,__,__},
+/*/      C1*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,C2,__},
+/*/*     C2*/ {C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C3,C2},
+/**      C3*/ {C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,CE,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C2,C3,C2},
+/*_.     FX*/ {OK,OK,__,-8,__,-7,__,-3,__,__,__,__,__,__,FR,FR,__,__,__,__,E1,__,__,__,__,__,__,__,__,E1,__,__,__},
+/*\      D1*/ {__,__,__,__,__,__,__,__,__,D2,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__},
+/*\      D2*/ {__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,__,U1,__,__,__,__,__},
 };
 
 
@@ -680,7 +682,6 @@  JSON_parser_char(JSON_parser jc, int next_char)
             return false;
         }
     }
-    
     add_char_to_parse_buffer(jc, next_char, next_class);
     
 /*
@@ -818,6 +819,29 @@  JSON_parser_char(JSON_parser jc, int next_char)
             jc->state = C1;
             jc->comment = 1;
             break;
+
+/* external callback: '%' seen, the user callback consumes what follows it */
+        case XC:
+            parse_buffer_pop_back_char(jc);
+            switch (jc->stack[jc->top])
+            {
+            case MODE_KEY:
+                jc->type = JSON_T_NONE;
+                jc->state = CO;
+                if (!(*jc->callback)(jc->ctx, JSON_T_USER_KEY, NULL)) {
+                    return false;
+                }
+                break;
+            default:
+                jc->state = OK;
+                jc->type = JSON_T_NONE;
+                if (!(*jc->callback)(jc->ctx, JSON_T_USER, NULL)) {
+                    return false;
+                }
+                break;
+            }
+            break;
+
 /* empty } */
         case -9:        
             parse_buffer_clear(jc);
diff --git a/JSON_parser.h b/JSON_parser.h
index 3780aae..50cec2d 100644
--- a/JSON_parser.h
+++ b/JSON_parser.h
@@ -47,6 +47,8 @@  typedef enum
     JSON_T_FALSE,
     JSON_T_STRING,
     JSON_T_KEY,
+    JSON_T_USER,
+    JSON_T_USER_KEY,
     JSON_T_MAX
 } JSON_type;
 
diff --git a/comments.json b/comments.json
index 244f5e3..ad79ab0 100644
--- a/comments.json
+++ b/comments.json
@@ -113,4 +113,8 @@ 
 0.1e1,
 1e-1,
 1e00,2e+00,2e-00
-,"rosebud", "\u005C"]/**                       ******/
\ No newline at end of file
+,"rosebud", "\u005C",/**                  %%%     *%%%***%**/
+[%s],
+{%d:%s},
+{"name":%s},
+%s]/**                       ******/
diff --git a/main.c b/main.c
index 6651e12..226b125 100644
--- a/main.c
+++ b/main.c
@@ -29,6 +29,7 @@  int main(int argc, char* argv[]) {
     
     config.depth                  = 20;
     config.callback               = &print;
+    config.callback_ctx           = &input;
     config.allow_comments         = 1;
     config.handle_floats_manually = 0;
     
@@ -142,6 +143,16 @@  static int print(void* ctx, int type, const JSON_value* value)
         s_IsKey = 0;
         printf("string: '%s'\n", value->vu.str.value);
         break;
+    case JSON_T_USER_KEY:
+        s_IsKey = 1;
+        print_indention();
+        printf("user key = %%%c, value = ", fgetc(*(FILE**) ctx));
+        break;
+    case JSON_T_USER:
+        if (!s_IsKey) print_indention();
+        s_IsKey = 0;
+        printf("user: %%%c\n", fgetc(*(FILE**) ctx));
+        break;
     default:
         assert(0);
         break;
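
For illustration only, here is a rough sketch of the va_list idea from the
commit message: a callback that, when told about a '%' token, reads the
conversion character itself and pulls the matching argument from the caller.
It does not use JSON_parser at all. placeholder_ctx, on_user_token and
scan_template are hypothetical names, and the loop in scan_template only
stands in for the parser feeding characters and invoking the callback with
JSON_T_USER or JSON_T_USER_KEY.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

#define MAX_VALUES 8

typedef enum { VAL_STRING, VAL_INT } val_type;

typedef struct {
    val_type    type;
    const char *s;   /* valid when type == VAL_STRING */
    int         i;   /* valid when type == VAL_INT */
} value;

/* Hypothetical callback context: input position plus the caller's arguments. */
typedef struct {
    const char *cursor;           /* points just past the '%' that was seen */
    va_list     args;             /* values for the placeholders */
    value       vals[MAX_VALUES]; /* values fetched so far */
    size_t      nvals;
} placeholder_ctx;

/* Stand-in for the JSON_T_USER / JSON_T_USER_KEY branch of a callback:
   consume one conversion character and fetch the matching argument.
   Returns 0 to abort, like a parser callback returning false. */
int on_user_token(void *ctx)
{
    placeholder_ctx *pc = ctx;
    value *v;

    assert(pc != NULL);
    if (pc->nvals == MAX_VALUES)
        return 0;
    v = &pc->vals[pc->nvals];

    switch (*pc->cursor) {
    case 's':
        v->type = VAL_STRING;
        v->s = va_arg(pc->args, char *); /* stored as-is, no escaping needed */
        break;
    case 'd':
        v->type = VAL_INT;
        v->i = va_arg(pc->args, int);
        break;
    default:
        return 0;                        /* unknown or missing conversion */
    }
    pc->cursor++;                        /* the conversion character is consumed */
    pc->nvals++;
    return 1;
}

/* Stand-in for the parse loop: every '%' is handed to the callback. */
int scan_template(placeholder_ctx *pc, const char *tmpl, ...)
{
    int ok = 1;

    pc->nvals = 0;
    va_start(pc->args, tmpl);
    for (pc->cursor = tmpl; ok && *pc->cursor != '\0'; ) {
        if (*pc->cursor++ == '%')
            ok = on_user_token(pc);
    }
    va_end(pc->args);
    return ok;
}
```

Because the string argument is stored as a value rather than spliced into the
JSON text, quotes and backslashes inside it need no escaping, which is the
problem with the asprintf approach mentioned earlier in the thread.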