From patchwork Wed Nov 11 17:29:01 2009
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Anthony Liguori
X-Patchwork-Id: 38150
From: Anthony Liguori
To: qemu-devel@nongnu.org
Cc: Anthony Liguori, Luiz Capitulino
Date: Wed, 11 Nov 2009 11:29:01 -0600
Message-Id: <1257960543-26373-9-git-send-email-aliguori@us.ibm.com>
In-Reply-To: <1257960543-26373-1-git-send-email-aliguori@us.ibm.com>
References: <1257960543-26373-1-git-send-email-aliguori@us.ibm.com>
X-Mailer: git-send-email 1.6.2.5
Subject: [Qemu-devel] [PATCH 09/11] Add a JSON parser
List-Id: qemu-devel.nongnu.org

This is the third and final stage of the JSON parser.  It parses lexical tokens,
performing grammar validation and creating the final QObject representation.
It uses a recursive descent parser.
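For readers new to the technique, here is a minimal sketch of recursive descent, assuming a flat array of token kinds rather than the QList of QDict tokens the patch consumes. All names (TokKind, sketch_value, sketch_array) are hypothetical and do not appear in the patch. Each grammar rule is one function that either consumes tokens or reports no match, and the value rule tries its alternatives in order, much as parse_value() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds; the real lexer emits QDict tokens instead. */
typedef enum {
    TOK_LBRACKET, TOK_RBRACKET, TOK_COMMA, TOK_INT, TOK_END
} TokKind;

/* Each rule returns the number of tokens consumed, or 0 if it did not match.
 * The token array must be terminated by TOK_END. */
static size_t sketch_value(const TokKind *t);

/* array: '[' ']' | '[' value (',' value)* ']' */
static size_t sketch_array(const TokKind *t)
{
    size_t pos = 0, n;

    if (t[pos] != TOK_LBRACKET) {
        return 0;
    }
    pos++;

    if (t[pos] == TOK_RBRACKET) {
        return pos + 1;
    }

    for (;;) {
        n = sketch_value(t + pos);      /* recurse into the element rule */
        if (n == 0) {
            return 0;
        }
        pos += n;

        if (t[pos] == TOK_RBRACKET) {
            return pos + 1;
        }
        if (t[pos] != TOK_COMMA) {
            return 0;
        }
        pos++;
    }
}

/* value: array | integer; alternatives are tried in order */
static size_t sketch_value(const TokKind *t)
{
    size_t n;

    assert(t != NULL);

    n = sketch_array(t);
    if (n == 0 && t[0] == TOK_INT) {
        n = 1;
    }
    return n;
}
```

The sketch only validates and counts tokens. The patch additionally builds a QObject per rule and works on a copy of the token list, so that a failed alternative leaves the caller's list untouched.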

Signed-off-by: Anthony Liguori
---
 Makefile      |    2 +-
 json-parser.c |  560 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 json-parser.h |   22 +++
 3 files changed, 583 insertions(+), 1 deletions(-)
 create mode 100644 json-parser.c
 create mode 100644 json-parser.h

diff --git a/Makefile b/Makefile
index a2aab60..6d68a1f 100644
--- a/Makefile
+++ b/Makefile
@@ -136,7 +136,7 @@ obj-y += qemu-char.o aio.o savevm.o
 obj-y += msmouse.o ps2.o
 obj-y += qdev.o qdev-properties.o
 obj-y += qint.o qstring.o qdict.o qlist.o qfloat.o qbool.o json-lexer.o
-obj-y += json-streamer.o
+obj-y += json-streamer.o json-parser.o
 obj-y += qemu-config.o
 
 obj-$(CONFIG_BRLAPI) += baum.o
diff --git a/json-parser.c b/json-parser.c
new file mode 100644
index 0000000..a0c0dca
--- /dev/null
+++ b/json-parser.c
@@ -0,0 +1,560 @@
+/*
+ * JSON Parser
+ *
+ * Copyright IBM, Corp. 2009
+ *
+ * Authors:
+ *  Anthony Liguori
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include <stdarg.h>
+
+#include "qemu-common.h"
+#include "qstring.h"
+#include "qint.h"
+#include "qdict.h"
+#include "qlist.h"
+#include "qfloat.h"
+#include "qbool.h"
+#include "json-parser.h"
+#include "json-lexer.h"
+
+typedef struct JSONParserContext
+{
+} JSONParserContext;
+
+#define BUG_ON(cond) assert(!(cond))
+
+/**
+ * TODO
+ *
+ * 0) make errors meaningful again
+ * 1) add geometry information to tokens
+ * 3) should we return a parsed size?
+ * 4) deal with premature EOI
+ */
+
+static QObject *parse_value(JSONParserContext *ctxt, QList **tokens, va_list *ap);
+
+/**
+ * Token manipulators
+ *
+ * tokens are dictionaries that contain a type, a string value, and geometry information
+ * about a token identified by the lexer.  These are routines that make working with
+ * these objects a bit easier.
+ */
+static const char *token_get_value(QObject *obj)
+{
+    return qdict_get_str(qobject_to_qdict(obj), "token");
+}
+
+static JSONTokenType token_get_type(QObject *obj)
+{
+    return qdict_get_int(qobject_to_qdict(obj), "type");
+}
+
+static int token_is_operator(QObject *obj, char op)
+{
+    const char *val;
+
+    if (token_get_type(obj) != JSON_OPERATOR) {
+        return 0;
+    }
+
+    val = token_get_value(obj);
+
+    return (val[0] == op) && (val[1] == 0);
+}
+
+static int token_is_keyword(QObject *obj, const char *value)
+{
+    if (token_get_type(obj) != JSON_KEYWORD) {
+        return 0;
+    }
+
+    return strcmp(token_get_value(obj), value) == 0;
+}
+
+static int token_is_escape(QObject *obj, const char *value)
+{
+    if (token_get_type(obj) != JSON_ESCAPE) {
+        return 0;
+    }
+
+    return (strcmp(token_get_value(obj), value) == 0);
+}
+
+/**
+ * Error handler
+ */
+static void parse_error(JSONParserContext *ctxt, QObject *token, const char *msg, ...)
+{
+    fprintf(stderr, "parse error: %s\n", msg);
+}
+
+/**
+ * String helpers
+ *
+ * These helpers are used to unescape strings.
+ */
+static void wchar_to_utf8(uint16_t wchar, char *buffer, size_t buffer_length)
+{
+    if (wchar <= 0x007F) {
+        BUG_ON(buffer_length < 2);
+
+        buffer[0] = wchar & 0x7F;
+        buffer[1] = 0;
+    } else if (wchar <= 0x07FF) {
+        BUG_ON(buffer_length < 3);
+
+        buffer[0] = 0xC0 | ((wchar >> 6) & 0x1F);
+        buffer[1] = 0x80 | (wchar & 0x3F);
+        buffer[2] = 0;
+    } else {
+        BUG_ON(buffer_length < 4);
+
+        buffer[0] = 0xE0 | ((wchar >> 12) & 0x0F);
+        buffer[1] = 0x80 | ((wchar >> 6) & 0x3F);
+        buffer[2] = 0x80 | (wchar & 0x3F);
+        buffer[3] = 0;
+    }
+}
+
+static int hex2decimal(char ch)
+{
+    if (ch >= '0' && ch <= '9') {
+        return (ch - '0');
+    } else if (ch >= 'a' && ch <= 'f') {
+        return 10 + (ch - 'a');
+    } else if (ch >= 'A' && ch <= 'F') {
+        return 10 + (ch - 'A');
+    }
+
+    return -1;
+}
+
+/**
+ * parse_string(): Parse a json string and return a QObject
+ *
+ *  string
+ *      ""
+ *      " chars "
+ *  chars
+ *      char
+ *      char chars
+ *  char
+ *      any-Unicode-character-
+ *          except-"-or-\-or-
+ *          control-character
+ *      \"
+ *      \\
+ *      \/
+ *      \b
+ *      \f
+ *      \n
+ *      \r
+ *      \t
+ *      \u four-hex-digits
+ */
+static QString *qstring_from_escaped_str(JSONParserContext *ctxt, QObject *token)
+{
+    const char *ptr = token_get_value(token);
+    QString *str;
+    int double_quote = 1;
+
+    if (*ptr == '"') {
+        double_quote = 1;
+    } else {
+        double_quote = 0;
+    }
+    ptr++;
+
+    str = qstring_new();
+    while (*ptr &&
+           ((double_quote && *ptr != '"') || (!double_quote && *ptr != '\''))) {
+        if (*ptr == '\\') {
+            ptr++;
+
+            switch (*ptr) {
+            case '"':
+                qstring_append(str, "\"");
+                ptr++;
+                break;
+            case '\'':
+                qstring_append(str, "'");
+                ptr++;
+                break;
+            case '\\':
+                qstring_append(str, "\\");
+                ptr++;
+                break;
+            case '/':
+                qstring_append(str, "/");
+                ptr++;
+                break;
+            case 'b':
+                qstring_append(str, "\b");
+                ptr++;
+                break;
+            case 'n':
+                qstring_append(str, "\n");
+                ptr++;
+                break;
+            case 'r':
+                qstring_append(str, "\r");
+                ptr++;
+                break;
+            case 't':
+                qstring_append(str, "\t");
+                ptr++;
+                break;
+            case 'u': {
+                uint16_t unicode_char = 0;
+                char utf8_char[4];
+                int i = 0;
+
+                ptr++;
+
+                for (i = 0; i < 4; i++) {
+                    if (qemu_isxdigit(*ptr)) {
+                        unicode_char |= hex2decimal(*ptr) << ((3 - i) * 4);
+                    } else {
+                        parse_error(ctxt, token,
+                                    "invalid hex escape sequence in string");
+                        goto out;
+                    }
+                    ptr++;
+                }
+
+                wchar_to_utf8(unicode_char, utf8_char, sizeof(utf8_char));
+                qstring_append(str, utf8_char);
+            }   break;
+            default:
+                parse_error(ctxt, token, "invalid escape sequence in string");
+                goto out;
+            }
+        } else {
+            char dummy[2];
+
+            dummy[0] = *ptr++;
+            dummy[1] = 0;
+
+            qstring_append(str, dummy);
+        }
+    }
+
+    ptr++;
+
+    return str;
+
+out:
+    QDECREF(str);
+    return NULL;
+}
+
+/**
+ * Parsing rules
+ */
+static int parse_pair(JSONParserContext *ctxt, QDict *dict, QList **tokens, va_list *ap)
+{
+    QObject *key, *token = NULL, *value, *peek;
+    QList *working = qlist_copy(*tokens);
+
+    peek = qlist_peek(working);
+    key = parse_value(ctxt, &working, ap);
+    if (qobject_type(key) != QTYPE_QSTRING) {
+        parse_error(ctxt, peek, "key is not a string in object");
+        goto out;
+    }
+
+    token = qlist_pop(working);
+    if (!token_is_operator(token, ':')) {
+        parse_error(ctxt, token, "missing : in object pair");
+        goto out;
+    }
+
+    value = parse_value(ctxt, &working, ap);
+    if (value == NULL) {
+        parse_error(ctxt, token, "Missing value in dict");
+        goto out;
+    }
+
+    qdict_put_obj(dict, qstring_get_str(qobject_to_qstring(key)), value);
+
+    qobject_decref(token);
+    qobject_decref(key);
+    QDECREF(*tokens);
+    *tokens = working;
+
+    return 0;
+
+out:
+    qobject_decref(token);
+    qobject_decref(key);
+    QDECREF(working);
+
+    return -1;
+}
+
+static QObject *parse_object(JSONParserContext *ctxt, QList **tokens, va_list *ap)
+{
+    QDict *dict = NULL;
+    QObject *token, *peek;
+    QList *working = qlist_copy(*tokens);
+
+    token = qlist_pop(working);
+    if (!token_is_operator(token, '{')) {
+        goto out;
+    }
+    qobject_decref(token);
+    token = NULL;
+
+    dict = qdict_new();
+
+    peek = qlist_peek(working);
+    if (!token_is_operator(peek, '}')) {
+        if (parse_pair(ctxt, dict, &working, ap) == -1) {
+            goto out;
+        }
+
+        token = qlist_pop(working);
+        while (!token_is_operator(token, '}')) {
+            if (!token_is_operator(token, ',')) {
+                parse_error(ctxt, token, "expected separator in dict");
+                goto out;
+            }
+            qobject_decref(token);
+            token = NULL;
+
+            if (parse_pair(ctxt, dict, &working, ap) == -1) {
+                goto out;
+            }
+
+            token = qlist_pop(working);
+        }
+        qobject_decref(token);
+        token = NULL;
+    }
+
+    QDECREF(*tokens);
+    *tokens = working;
+
+    return QOBJECT(dict);
+
+out:
+    qobject_decref(token);
+    QDECREF(working);
+    QDECREF(dict);
+    return NULL;
+}
+
+static QObject *parse_array(JSONParserContext *ctxt, QList **tokens, va_list *ap)
+{
+    QList *list = NULL;
+    QObject *token, *peek;
+    QList *working = qlist_copy(*tokens);
+
+    token = qlist_pop(working);
+    if (!token_is_operator(token, '[')) {
+        goto out;
+    }
+    qobject_decref(token);
+    token = NULL;
+
+    list = qlist_new();
+
+    peek = qlist_peek(working);
+    if (!token_is_operator(peek, ']')) {
+        QObject *obj;
+
+        obj = parse_value(ctxt, &working, ap);
+        if (obj == NULL) {
+            parse_error(ctxt, token, "expecting value");
+            goto out;
+        }
+
+        qlist_append_obj(list, obj);
+
+        token = qlist_pop(working);
+        while (!token_is_operator(token, ']')) {
+            if (!token_is_operator(token, ',')) {
+                parse_error(ctxt, token, "expected separator in list");
+                goto out;
+            }
+
+            qobject_decref(token);
+            token = NULL;
+
+            obj = parse_value(ctxt, &working, ap);
+            if (obj == NULL) {
+                parse_error(ctxt, token, "expecting value");
+                goto out;
+            }
+
+            qlist_append_obj(list, obj);
+
+            token = qlist_pop(working);
+        }
+
+        qobject_decref(token);
+        token = NULL;
+    }
+
+    QDECREF(*tokens);
+    *tokens = working;
+
+    return QOBJECT(list);
+
+out:
+    qobject_decref(token);
+    QDECREF(working);
+    QDECREF(list);
+    return NULL;
+}
+
+static QObject *parse_keyword(JSONParserContext *ctxt, QList **tokens)
+{
+    QObject *token, *ret;
+    QList *working = qlist_copy(*tokens);
+
+    token = qlist_pop(working);
+
+    if (token_get_type(token) != JSON_KEYWORD) {
+        goto out;
+    }
+
+    if (token_is_keyword(token, "true")) {
+        ret = QOBJECT(qbool_from_int(true));
+    } else if (token_is_keyword(token, "false")) {
+        ret = QOBJECT(qbool_from_int(false));
+    } else {
+        parse_error(ctxt, token, "invalid keyword `%s'", token_get_value(token));
+        goto out;
+    }
+
+    qobject_decref(token);
+    QDECREF(*tokens);
+    *tokens = working;
+
+    return ret;
+
+out:
+    qobject_decref(token);
+    QDECREF(working);
+
+    return NULL;
+}
+
+static QObject *parse_escape(JSONParserContext *ctxt, QList **tokens, va_list *ap)
+{
+    QObject *token = NULL, *obj;
+    QList *working = qlist_copy(*tokens);
+
+    if (ap == NULL) {
+        goto out;
+    }
+
+    token = qlist_pop(working);
+
+    if (token_is_escape(token, "%p")) {
+        obj = va_arg(*ap, QObject *);
+    } else if (token_is_escape(token, "%i")) {
+        obj = QOBJECT(qbool_from_int(va_arg(*ap, int)));
+    } else if (token_is_escape(token, "%d")) {
+        obj = QOBJECT(qint_from_int(va_arg(*ap, int)));
+    } else if (token_is_escape(token, "%ld")) {
+        obj = QOBJECT(qint_from_int(va_arg(*ap, long)));
+    } else if (token_is_escape(token, "%lld")) {
+        obj = QOBJECT(qint_from_int(va_arg(*ap, long long)));
+    } else if (token_is_escape(token, "%s")) {
+        obj = QOBJECT(qstring_from_str(va_arg(*ap, const char *)));
+    } else if (token_is_escape(token, "%f")) {
+        obj = QOBJECT(qfloat_from_double(va_arg(*ap, double)));
+    } else {
+        goto out;
+    }
+
+    qobject_decref(token);
+    QDECREF(*tokens);
+    *tokens = working;
+
+    return obj;
+
+out:
+    qobject_decref(token);
+    QDECREF(working);
+
+    return NULL;
+}
+
+static QObject *parse_literal(JSONParserContext *ctxt, QList **tokens)
+{
+    QObject *token, *obj;
+    QList *working = qlist_copy(*tokens);
+
+    token = qlist_pop(working);
+    switch (token_get_type(token)) {
+    case JSON_STRING:
+        obj = QOBJECT(qstring_from_escaped_str(ctxt, token));
+        break;
+    case JSON_INTEGER:
+        obj = QOBJECT(qint_from_int(strtoll(token_get_value(token), NULL, 10)));
+        break;
+    case JSON_FLOAT:
+        /* FIXME dependent on locale */
+        obj = QOBJECT(qfloat_from_double(strtod(token_get_value(token), NULL)));
+        break;
+    default:
+        goto out;
+    }
+
+    qobject_decref(token);
+    QDECREF(*tokens);
+    *tokens = working;
+
+    return obj;
+
+out:
+    qobject_decref(token);
+    QDECREF(working);
+
+    return NULL;
+}
+
+static QObject *parse_value(JSONParserContext *ctxt, QList **tokens, va_list *ap)
+{
+    QObject *obj;
+
+    obj = parse_object(ctxt, tokens, ap);
+    if (obj == NULL) {
+        obj = parse_array(ctxt, tokens, ap);
+    }
+    if (obj == NULL) {
+        obj = parse_escape(ctxt, tokens, ap);
+    }
+    if (obj == NULL) {
+        obj = parse_keyword(ctxt, tokens);
+    }
+    if (obj == NULL) {
+        obj = parse_literal(ctxt, tokens);
+    }
+
+    return obj;
+}
+
+QObject *json_parser_parse(QList *tokens, va_list *ap)
+{
+    JSONParserContext ctxt = {};
+    QList *working = qlist_copy(tokens);
+    QObject *result;
+
+    result = parse_value(&ctxt, &working, ap);
+
+    QDECREF(working);
+
+    return result;
+}
diff --git a/json-parser.h b/json-parser.h
new file mode 100644
index 0000000..97f43f6
--- /dev/null
+++ b/json-parser.h
@@ -0,0 +1,22 @@
+/*
+ * JSON Parser
+ *
+ * Copyright IBM, Corp. 2009
+ *
+ * Authors:
+ *  Anthony Liguori
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef QEMU_JSON_PARSER_H
+#define QEMU_JSON_PARSER_H
+
+#include "qemu-common.h"
+#include "qlist.h"
+
+QObject *json_parser_parse(QList *tokens, va_list *ap);
+
+#endif
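
The \u handling in qstring_from_escaped_str() can be tried in isolation. Below is a standalone sketch, not part of the patch, that assumes nothing from QEMU. The names sketch_hex_value, sketch_decode_u4 and sketch_utf8_encode are hypothetical stand-ins for hex2decimal(), the four-digit loop and wchar_to_utf8(). Like the patch, it handles a single 16-bit code unit and does not combine surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: value of one hex digit, or -1 if ch is not one. */
static int sketch_hex_value(char ch)
{
    if (ch >= '0' && ch <= '9') {
        return ch - '0';
    } else if (ch >= 'a' && ch <= 'f') {
        return 10 + (ch - 'a');
    } else if (ch >= 'A' && ch <= 'F') {
        return 10 + (ch - 'A');
    }
    return -1;
}

/* Decode the four hex digits that follow "\u" into a 16-bit code unit.
 * Returns 0 on success, -1 if any of the four characters is not a hex
 * digit (a premature NUL also fails, so short strings are safe). */
static int sketch_decode_u4(const char *hex, uint16_t *out)
{
    uint16_t value = 0;
    int i;

    for (i = 0; i < 4; i++) {
        int digit = sketch_hex_value(hex[i]);

        if (digit < 0) {
            return -1;
        }
        value = (uint16_t)((value << 4) | digit);
    }

    *out = value;
    return 0;
}

/* Encode one code unit as NUL-terminated UTF-8 (1 to 3 bytes plus NUL).
 * Returns the number of bytes written, not counting the terminator. */
static size_t sketch_utf8_encode(uint16_t cp, char buffer[4])
{
    size_t len;

    if (cp <= 0x007F) {
        buffer[0] = (char)cp;
        len = 1;
    } else if (cp <= 0x07FF) {
        buffer[0] = (char)(0xC0 | ((cp >> 6) & 0x1F));
        buffer[1] = (char)(0x80 | (cp & 0x3F));
        len = 2;
    } else {
        buffer[0] = (char)(0xE0 | ((cp >> 12) & 0x0F));
        buffer[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        buffer[2] = (char)(0x80 | (cp & 0x3F));
        len = 3;
    }

    buffer[len] = 0;
    return len;
}
```

Calling sketch_decode_u4("00e9", &cp) followed by sketch_utf8_encode(cp, buf) leaves the two bytes 0xC3 0xA9 in buf. Because the result is NUL-terminated, \u0000 would come out as an empty string, which is also how the patch's string-append approach behaves.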