From patchwork Mon Aug 27 07:00:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Markus Armbruster X-Patchwork-Id: 962348 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41zN8p2b6Kz9s2P for ; Mon, 27 Aug 2018 17:01:29 +1000 (AEST) Received: from localhost ([::1]:51690 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBWe-0005uR-W1 for incoming@patchwork.ozlabs.org; Mon, 27 Aug 2018 03:01:25 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57809) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBVv-0005ti-5c for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:39 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fuBVs-00067q-Fb for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:39 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:40126 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fuBVs-00067e-7f for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:36 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id DD21940216E9; Mon, 27 Aug 2018 07:00:35 +0000 (UTC) Received: from blackfin.pond.sub.org (ovpn-116-97.ams2.redhat.com [10.36.116.97]) by smtp.corp.redhat.com (Postfix) with ESMTPS id F13F7A30A1; Mon, 27 Aug 2018 07:00:32 +0000 (UTC) Received: by blackfin.pond.sub.org (Postfix, from userid 1000) id EE34711385F9; Mon, 27 Aug 2018 09:00:21 +0200 (CEST) From: Markus Armbruster To: qemu-devel@nongnu.org Date: Mon, 27 Aug 2018 09:00:16 +0200 Message-Id: <20180827070021.11931-2-armbru@redhat.com> In-Reply-To: <20180827070021.11931-1-armbru@redhat.com> References: <20180827070021.11931-1-armbru@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'armbru@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH 1/6] json: Fix lexer for lookahead character beyond '\x7F' X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" The lexer fails to end a valid token when the lookahead character is beyond '\x7F'. For instance, input true\xC2\xA2 produces the tokens JSON_ERROR true\xC2 JSON_ERROR \xA2 The first token should be JSON_KEYWORD true instead. The culprit is #define TERMINAL(state) [0 ... 0x7F] = (state) It leaves [0x80..0xFF] zero, i.e. IN_ERROR. Has always been broken. Fix it to initialize the complete array. Signed-off-by: Markus Armbruster Reviewed-by: Eric Blake --- qobject/json-lexer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c index e1745a3d95..4867839f66 100644 --- a/qobject/json-lexer.c +++ b/qobject/json-lexer.c @@ -123,7 +123,7 @@ enum json_lexer_state { QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START_INTERP); QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1); -#define TERMINAL(state) [0 ... 0x7F] = (state) +#define TERMINAL(state) [0 ... 0xFF] = (state) /* Return whether TERMINAL is a terminal state and the transition to it from OLD_STATE required lookahead. This happens whenever the table From patchwork Mon Aug 27 07:00:17 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Markus Armbruster X-Patchwork-Id: 962350 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41zNCw4HdWz9s1c for ; Mon, 27 Aug 2018 17:04:12 +1000 (AEST) Received: from localhost ([::1]:51704 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBZJ-0007lB-Pw for incoming@patchwork.ozlabs.org; Mon, 27 Aug 2018 03:04:09 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57812) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBVv-0005tl-5c for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fuBVr-00067W-Tq for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:39 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:53036 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fuBVr-00066t-LT for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:35 -0400 Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.rdu2.redhat.com [10.11.54.4]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 270CB4023461; Mon, 27 Aug 2018 07:00:35 +0000 (UTC) Received: from blackfin.pond.sub.org (ovpn-116-97.ams2.redhat.com [10.36.116.97]) by smtp.corp.redhat.com (Postfix) with ESMTPS id F12B22026E18; Mon, 27 Aug 2018 07:00:32 +0000 (UTC) Received: by blackfin.pond.sub.org (Postfix, from userid 1000) id F0F931138603; Mon, 27 Aug 2018 09:00:21 +0200 (CEST) From: Markus Armbruster To: qemu-devel@nongnu.org Date: Mon, 27 Aug 2018 09:00:17 +0200 Message-Id: <20180827070021.11931-3-armbru@redhat.com> In-Reply-To: <20180827070021.11931-1-armbru@redhat.com> References: <20180827070021.11931-1-armbru@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.6]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'armbru@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH 2/6] json: Clean up how lexer consumes "end of input" X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" When the lexer isn't in its start state at the end of input, it's working on a token. To flush it out, it needs to transit to its start state on "end of input" lookahead. There are two ways to the start state, depending on the current state: * If the lexer is in a TERMINAL(JSON_FOO) state, it can emit a JSON_FOO token. * Else, it can go to IN_ERROR state, and emit a JSON_ERROR token. There are complications, however: * The transition to IN_ERROR state consumes the input character and adds it to the JSON_ERROR token. The latter is inappropriate for the "end of input" character, so we suppress that. See also recent commit "json: Fix lexer to include the bad character in JSON_ERROR token". * The transition to a TERMINAL(JSON_FOO) state doesn't consume the input character. In that case, the lexer normally loops until it is consumed. We have to suppress that for the "end of input" input character. If we didn't, the lexer would consume it by entering IN_ERROR state, emitting a bogus JSON_ERROR token. We fixed that in commit bd3924a33a6. However, simply breaking the loop this way assumes that the lexer needs exactly one state transition to reach its start state. That assumption is correct now, but it's unclean, and I'll soon break it. Clean up: instead of breaking the loop after one iteration, break it after it reached the start state. Signed-off-by: Markus Armbruster Reviewed-by: Eric Blake --- qobject/json-lexer.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c index 4867839f66..ec3aec726f 100644 --- a/qobject/json-lexer.c +++ b/qobject/json-lexer.c @@ -261,7 +261,8 @@ void json_lexer_init(JSONLexer *lexer, bool enable_interpolation) static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush) { - int char_consumed, new_state; + int new_state; + bool char_consumed = false; lexer->x++; if (ch == '\n') { @@ -269,11 +270,12 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush) lexer->y++; } - do { + while (flush ? lexer->state != lexer->start_state : !char_consumed) { assert(lexer->state <= ARRAY_SIZE(json_lexer)); new_state = json_lexer[lexer->state][(uint8_t)ch]; - char_consumed = !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state); - if (char_consumed && !flush) { + char_consumed = !flush + && !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state); + if (char_consumed) { g_string_append_c(lexer->token, ch); } @@ -318,7 +320,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush) break; } lexer->state = new_state; - } while (!char_consumed && !flush); + } /* Do not let a single token grow to an arbitrarily large size, * this is a security consideration. @@ -342,9 +344,8 @@ void json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size) void json_lexer_flush(JSONLexer *lexer) { - if (lexer->state != lexer->start_state) { - json_lexer_feed_char(lexer, 0, true); - } + json_lexer_feed_char(lexer, 0, true); + assert(lexer->state == lexer->start_state); json_message_process_token(lexer, lexer->token, JSON_END_OF_INPUT, lexer->x, lexer->y); } From patchwork Mon Aug 27 07:00:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Markus Armbruster X-Patchwork-Id: 962346 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41zN8p2dZZz9s47 for ; Mon, 27 Aug 2018 17:01:29 +1000 (AEST) Received: from localhost ([::1]:51693 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBWg-0005wD-EQ for incoming@patchwork.ozlabs.org; Mon, 27 Aug 2018 03:01:26 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57827) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBVv-0005tn-Ti for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fuBVr-00067I-RW for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:39 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:55140 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fuBVr-00066u-Kv for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:35 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 42CC8814FDD2; Mon, 27 Aug 2018 07:00:35 +0000 (UTC) Received: from blackfin.pond.sub.org (ovpn-116-97.ams2.redhat.com [10.36.116.97]) by smtp.corp.redhat.com (Postfix) with ESMTPS id F18052166B41; Mon, 27 Aug 2018 07:00:32 +0000 (UTC) Received: by blackfin.pond.sub.org (Postfix, from userid 1000) id 004AA113860E; Mon, 27 Aug 2018 09:00:21 +0200 (CEST) From: Markus Armbruster To: qemu-devel@nongnu.org Date: Mon, 27 Aug 2018 09:00:18 +0200 Message-Id: <20180827070021.11931-4-armbru@redhat.com> In-Reply-To: <20180827070021.11931-1-armbru@redhat.com> References: <20180827070021.11931-1-armbru@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'armbru@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH 3/6] json: Make lexer's "character consumed" logic less confusing X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" The lexer uses macro TERMINAL_NEEDED_LOOKAHEAD() to decide whether a state transition consumes the input character. It returns true when the state transition is defined with the TERMINAL() macro. To detect that, it checks whether input '\0' would have resulted in the same state transition, and the new state is not IN_ERROR. Why does that even work? For all states, the new state on input '\0' is either IN_ERROR or defined with TERMINAL(). If the state transition equals the one we'd get for input '\0', it goes to IN_ERROR or to the argument of TERMINAL(). We never use TERMINAL(IN_ERROR), because it makes no sense. Thus, if it doesn't go to IN_ERROR, it must be defined with TERMINAL(). Since this isn't quite confusing enough, we negate the result to get @char_consumed, and ignore it when @flush is true. Instead of deriving the lookahead bit from the state transition, make it explicit. This is easier to understand, and a bit more flexible, too. Signed-off-by: Markus Armbruster Reviewed-by: Eric Blake --- qobject/json-lexer.c | 27 ++++++++++++++++----------- qobject/json-parser-int.h | 1 + 2 files changed, 17 insertions(+), 11 deletions(-) diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c index ec3aec726f..28582e17d9 100644 --- a/qobject/json-lexer.c +++ b/qobject/json-lexer.c @@ -121,15 +121,11 @@ enum json_lexer_state { }; QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START_INTERP); +QEMU_BUILD_BUG_ON(JSON_MAX >= 0x80); QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1); -#define TERMINAL(state) [0 ... 0xFF] = (state) - -/* Return whether TERMINAL is a terminal state and the transition to it - from OLD_STATE required lookahead. This happens whenever the table - below uses the TERMINAL macro. */ -#define TERMINAL_NEEDED_LOOKAHEAD(old_state, terminal) \ - (terminal != IN_ERROR && json_lexer[(old_state)][0] == (terminal)) +#define LOOKAHEAD 0x80 +#define TERMINAL(state) [0 ... 0xFF] = ((state) | LOOKAHEAD) static const uint8_t json_lexer[][256] = { /* Relies on default initialization to IN_ERROR! */ @@ -251,6 +247,17 @@ static const uint8_t json_lexer[][256] = { [IN_START_INTERP]['%'] = IN_INTERP, }; +static inline uint8_t next_state(JSONLexer *lexer, char ch, bool flush, + bool *char_consumed) +{ + uint8_t next; + + assert(lexer->state <= ARRAY_SIZE(json_lexer)); + next = json_lexer[lexer->state][(uint8_t)ch]; + *char_consumed = !flush && !(next & LOOKAHEAD); + return next & ~LOOKAHEAD; +} + void json_lexer_init(JSONLexer *lexer, bool enable_interpolation) { lexer->start_state = lexer->state = enable_interpolation @@ -271,11 +278,9 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush) } while (flush ? lexer->state != lexer->start_state : !char_consumed) { - assert(lexer->state <= ARRAY_SIZE(json_lexer)); - new_state = json_lexer[lexer->state][(uint8_t)ch]; - char_consumed = !flush - && !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state); + new_state = next_state(lexer, ch, flush, &char_consumed); if (char_consumed) { + assert(!flush); g_string_append_c(lexer->token, ch); } diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h index ceaa890ec6..abeec63af5 100644 --- a/qobject/json-parser-int.h +++ b/qobject/json-parser-int.h @@ -33,6 +33,7 @@ typedef enum json_token_type { JSON_SKIP, JSON_ERROR, JSON_END_OF_INPUT, + JSON_MAX = JSON_END_OF_INPUT } JSONTokenType; typedef struct JSONToken JSONToken; From patchwork Mon Aug 27 07:00:19 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Markus Armbruster X-Patchwork-Id: 962352 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41zNG80X66z9s1c for ; Mon, 27 Aug 2018 17:06:08 +1000 (AEST) Received: from localhost ([::1]:51715 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBbB-0000jX-QZ for incoming@patchwork.ozlabs.org; Mon, 27 Aug 2018 03:06:05 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57811) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBVv-0005tk-5f for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:40 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fuBVr-00067Q-T6 for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:39 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:54312 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fuBVr-00066z-LE for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:35 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 42CE640241D8; Mon, 27 Aug 2018 07:00:35 +0000 (UTC) Received: from blackfin.pond.sub.org (ovpn-116-97.ams2.redhat.com [10.36.116.97]) by smtp.corp.redhat.com (Postfix) with ESMTPS id F147A1049481; Mon, 27 Aug 2018 07:00:32 +0000 (UTC) Received: by blackfin.pond.sub.org (Postfix, from userid 1000) id 040A811385CB; Mon, 27 Aug 2018 09:00:22 +0200 (CEST) From: Markus Armbruster To: qemu-devel@nongnu.org Date: Mon, 27 Aug 2018 09:00:19 +0200 Message-Id: <20180827070021.11931-5-armbru@redhat.com> In-Reply-To: <20180827070021.11931-1-armbru@redhat.com> References: <20180827070021.11931-1-armbru@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.7]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC) for IP:'10.11.54.3' DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'armbru@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH 4/6] json: Nicer recovery from lexical errors X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" When the lexer chokes on an input character, it consumes the character, emits a JSON error token, and enters its start state. This can lead to suboptimal error recovery. For instance, input 0123 , produces the tokens JSON_ERROR 01 JSON_INTEGER 23 JSON_COMMA , Make the lexer skip characters after a lexical error until a structural character ('[', ']', '{', '}', ':', ','), an ASCII control character, or '\xFE', or '\xFF'. Note that we must not skip ASCII control characters, '\xFE', '\xFF', because those are documented to force the JSON parser into known-good state, by docs/interop/qmp-spec.txt. The lexer now produces JSON_ERROR 01 JSON_COMMA , Update qmp-test for the nicer error recovery: QMP now report just one error for input %p instead of two. Also drop the newline after %p; it was needed to tease out the second error. Signed-off-by: Markus Armbruster Reviewed-by: Eric Blake --- qobject/json-lexer.c | 43 +++++++++++++++++++++++++++++-------------- tests/qmp-test.c | 6 +----- 2 files changed, 30 insertions(+), 19 deletions(-) diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c index 28582e17d9..39c7ce7adc 100644 --- a/qobject/json-lexer.c +++ b/qobject/json-lexer.c @@ -101,6 +101,7 @@ enum json_lexer_state { IN_ERROR = 0, /* must really be 0, see json_lexer[] */ + IN_RECOVERY, IN_DQ_STRING_ESCAPE, IN_DQ_STRING, IN_SQ_STRING_ESCAPE, @@ -130,6 +131,28 @@ QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1); static const uint8_t json_lexer[][256] = { /* Relies on default initialization to IN_ERROR! */ + /* error recovery */ + [IN_RECOVERY] = { + /* + * Skip characters until a structural character, an ASCII + * control character other than '\t', or impossible UTF-8 + * bytes '\xFE', '\xFF'. Structural characters and line + * endings are promising resynchronization points. Clients + * may use the others to force the JSON parser into known-good + * state; see docs/interop/qmp-spec.txt. + */ + [0 ... 0x1F] = IN_START | LOOKAHEAD, + [0x20 ... 0xFD] = IN_RECOVERY, + [0xFE ... 0xFF] = IN_START | LOOKAHEAD, + ['\t'] = IN_RECOVERY, + ['['] = IN_START | LOOKAHEAD, + [']'] = IN_START | LOOKAHEAD, + ['{'] = IN_START | LOOKAHEAD, + ['}'] = IN_START | LOOKAHEAD, + [':'] = IN_START | LOOKAHEAD, + [','] = IN_START | LOOKAHEAD, + }, + /* double quote string */ [IN_DQ_STRING_ESCAPE] = { [0x20 ... 0xFD] = IN_DQ_STRING, @@ -301,26 +324,18 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush) /* fall through */ case JSON_SKIP: g_string_truncate(lexer->token, 0); + /* fall through */ + case IN_START: new_state = lexer->start_state; break; case IN_ERROR: - /* XXX: To avoid having previous bad input leaving the parser in an - * unresponsive state where we consume unpredictable amounts of - * subsequent "good" input, percolate this error state up to the - * parser by emitting a JSON_ERROR token, then reset lexer state. - * - * Also note that this handling is required for reliable channel - * negotiation between QMP and the guest agent, since chr(0xFF) - * is placed at the beginning of certain events to ensure proper - * delivery when the channel is in an unknown state. chr(0xFF) is - * never a valid ASCII/UTF-8 sequence, so this should reliably - * induce an error/flush state. - */ json_message_process_token(lexer, lexer->token, JSON_ERROR, lexer->x, lexer->y); + new_state = IN_RECOVERY; + /* fall through */ + case IN_RECOVERY: g_string_truncate(lexer->token, 0); - lexer->state = lexer->start_state; - return; + break; default: break; } diff --git a/tests/qmp-test.c b/tests/qmp-test.c index 4ae2245484..04ad7648d2 100644 --- a/tests/qmp-test.c +++ b/tests/qmp-test.c @@ -93,11 +93,7 @@ static void test_malformed(QTestState *qts) g_assert(recovered(qts)); /* lexical error: interpolation */ - qtest_qmp_send_raw(qts, "%%p\n"); - /* two errors, one for "%", one for "p" */ - resp = qtest_qmp_receive(qts); - g_assert_cmpstr(get_error_class(resp), ==, "GenericError"); - qobject_unref(resp); + qtest_qmp_send_raw(qts, "%%p"); resp = qtest_qmp_receive(qts); g_assert_cmpstr(get_error_class(resp), ==, "GenericError"); qobject_unref(resp); From patchwork Mon Aug 27 07:00:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Markus Armbruster X-Patchwork-Id: 962353 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41zNJL4VBCz9s1c for ; Mon, 27 Aug 2018 17:08:02 +1000 (AEST) Received: from localhost ([::1]:51723 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBd2-0002es-A3 for incoming@patchwork.ozlabs.org; Mon, 27 Aug 2018 03:08:00 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57859) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBVx-0005uS-O6 for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fuBVw-0006CU-PF for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:41 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:55150 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fuBVw-0006BK-FR for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:40 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9FF59814FDD2; Mon, 27 Aug 2018 07:00:39 +0000 (UTC) Received: from blackfin.pond.sub.org (ovpn-116-97.ams2.redhat.com [10.36.116.97]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 74C6E2166B41; Mon, 27 Aug 2018 07:00:39 +0000 (UTC) Received: by blackfin.pond.sub.org (Postfix, from userid 1000) id 077861138539; Mon, 27 Aug 2018 09:00:22 +0200 (CEST) From: Markus Armbruster To: qemu-devel@nongnu.org Date: Mon, 27 Aug 2018 09:00:20 +0200 Message-Id: <20180827070021.11931-6-armbru@redhat.com> In-Reply-To: <20180827070021.11931-1-armbru@redhat.com> References: <20180827070021.11931-1-armbru@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Mon, 27 Aug 2018 07:00:39 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.8]); Mon, 27 Aug 2018 07:00:39 +0000 (UTC) for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'armbru@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH 5/6] json: Eliminate lexer state IN_ERROR X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" Signed-off-by: Markus Armbruster Reviewed-by: Eric Blake --- qobject/json-lexer.c | 9 +++++---- qobject/json-parser-int.h | 8 ++++---- 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c index 39c7ce7adc..2a5561c917 100644 --- a/qobject/json-lexer.c +++ b/qobject/json-lexer.c @@ -100,8 +100,7 @@ */ enum json_lexer_state { - IN_ERROR = 0, /* must really be 0, see json_lexer[] */ - IN_RECOVERY, + IN_RECOVERY = 1, IN_DQ_STRING_ESCAPE, IN_DQ_STRING, IN_SQ_STRING_ESCAPE, @@ -121,6 +120,8 @@ enum json_lexer_state { IN_START_INTERP, /* must be IN_START + 1 */ }; +QEMU_BUILD_BUG_ON(JSON_ERROR != 0); +QEMU_BUILD_BUG_ON(IN_RECOVERY != JSON_ERROR + 1); QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START_INTERP); QEMU_BUILD_BUG_ON(JSON_MAX >= 0x80); QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1); @@ -176,7 +177,7 @@ static const uint8_t json_lexer[][256] = { /* Zero */ [IN_ZERO] = { TERMINAL(JSON_INTEGER), - ['0' ... '9'] = IN_ERROR, + ['0' ... '9'] = JSON_ERROR, ['.'] = IN_MANTISSA, }, @@ -328,7 +329,7 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush) case IN_START: new_state = lexer->start_state; break; - case IN_ERROR: + case JSON_ERROR: json_message_process_token(lexer, lexer->token, JSON_ERROR, lexer->x, lexer->y); new_state = IN_RECOVERY; diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h index abeec63af5..57cb8e79d3 100644 --- a/qobject/json-parser-int.h +++ b/qobject/json-parser-int.h @@ -16,10 +16,11 @@ #include "qapi/qmp/json-parser.h" - typedef enum json_token_type { - JSON_MIN = 100, - JSON_LCURLY = JSON_MIN, + JSON_ERROR = 0, /* must be zero, see json_lexer[] */ + /* Gap for lexer states */ + JSON_LCURLY = 100, + JSON_MIN = JSON_LCURLY, JSON_RCURLY, JSON_LSQUARE, JSON_RSQUARE, @@ -31,7 +32,6 @@ typedef enum json_token_type { JSON_STRING, JSON_INTERP, JSON_SKIP, - JSON_ERROR, JSON_END_OF_INPUT, JSON_MAX = JSON_END_OF_INPUT } JSONTokenType; From patchwork Mon Aug 27 07:00:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Markus Armbruster X-Patchwork-Id: 962351 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11; helo=lists.gnu.org; envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=redhat.com Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11]) (using TLSv1 with cipher AES256-SHA (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41zNCw4HfZz9s2P for ; Mon, 27 Aug 2018 17:04:12 +1000 (AEST) Received: from localhost ([::1]:51705 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBZJ-0007lQ-Nq for incoming@patchwork.ozlabs.org; Mon, 27 Aug 2018 03:04:09 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57860) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fuBVx-0005uc-Os for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fuBVw-0006Cd-QJ for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:41 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:50374 helo=mx1.redhat.com) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1fuBVw-0006BJ-FC for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:40 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 9C5018573A; Mon, 27 Aug 2018 07:00:39 +0000 (UTC) Received: from blackfin.pond.sub.org (ovpn-116-97.ams2.redhat.com [10.36.116.97]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 746621010405; Mon, 27 Aug 2018 07:00:39 +0000 (UTC) Received: by blackfin.pond.sub.org (Postfix, from userid 1000) id 0B034113853C; Mon, 27 Aug 2018 09:00:22 +0200 (CEST) From: Markus Armbruster To: qemu-devel@nongnu.org Date: Mon, 27 Aug 2018 09:00:21 +0200 Message-Id: <20180827070021.11931-7-armbru@redhat.com> In-Reply-To: <20180827070021.11931-1-armbru@redhat.com> References: <20180827070021.11931-1-armbru@redhat.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Mon, 27 Aug 2018 07:00:39 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.2]); Mon, 27 Aug 2018 07:00:39 +0000 (UTC) for IP:'10.11.54.3' DOMAIN:'int-mx03.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'armbru@redhat.com' RCPT:'' X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 66.187.233.73 Subject: [Qemu-devel] [PATCH 6/6] json: Eliminate lexer state IN_WHITESPACE, pseudo-token JSON_SKIP X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" The lexer ignores whitespace like this: on whitespace on non-ws spontaneously IN_START --> IN_WHITESPACE --> JSON_SKIP --> IN_START ^ | \__/ on whitespace This accumulates a whitespace token in state IN_WHITESPACE, only to throw it away on the transition via JSON_SKIP to the start state. Wasteful. Go from IN_START to IN_START on whitspace directly, dropping the whitespace character. Signed-off-by: Markus Armbruster Reviewed-by: Eric Blake --- qobject/json-lexer.c | 22 +++++----------------- qobject/json-parser-int.h | 1 - 2 files changed, 5 insertions(+), 18 deletions(-) diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c index 2a5561c917..a7df2093aa 100644 --- a/qobject/json-lexer.c +++ b/qobject/json-lexer.c @@ -115,7 +115,6 @@ enum json_lexer_state { IN_SIGN, IN_KEYWORD, IN_INTERP, - IN_WHITESPACE, IN_START, IN_START_INTERP, /* must be IN_START + 1 */ }; @@ -228,15 +227,6 @@ static const uint8_t json_lexer[][256] = { ['a' ... 'z'] = IN_KEYWORD, }, - /* whitespace */ - [IN_WHITESPACE] = { - TERMINAL(JSON_SKIP), - [' '] = IN_WHITESPACE, - ['\t'] = IN_WHITESPACE, - ['\r'] = IN_WHITESPACE, - ['\n'] = IN_WHITESPACE, - }, - /* interpolation */ [IN_INTERP] = { TERMINAL(JSON_INTERP), @@ -263,10 +253,10 @@ static const uint8_t json_lexer[][256] = { [','] = JSON_COMMA, [':'] = JSON_COLON, ['a' ... 'z'] = IN_KEYWORD, - [' '] = IN_WHITESPACE, - ['\t'] = IN_WHITESPACE, - ['\r'] = IN_WHITESPACE, - ['\n'] = IN_WHITESPACE, + [' '] = IN_START, + ['\t'] = IN_START, + ['\r'] = IN_START, + ['\n'] = IN_START, }, [IN_START_INTERP]['%'] = IN_INTERP, }; @@ -323,10 +313,8 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush) json_message_process_token(lexer, lexer->token, new_state, lexer->x, lexer->y); /* fall through */ - case JSON_SKIP: - g_string_truncate(lexer->token, 0); - /* fall through */ case IN_START: + g_string_truncate(lexer->token, 0); new_state = lexer->start_state; break; case JSON_ERROR: diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h index 57cb8e79d3..16a25d00bb 100644 --- a/qobject/json-parser-int.h +++ b/qobject/json-parser-int.h @@ -31,7 +31,6 @@ typedef enum json_token_type { JSON_KEYWORD, JSON_STRING, JSON_INTERP, - JSON_SKIP, JSON_END_OF_INPUT, JSON_MAX = JSON_END_OF_INPUT } JSONTokenType;