From patchwork Mon Aug 27 07:00:18 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Markus Armbruster
X-Patchwork-Id: 962346
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=pass (mailfrom)
 smtp.mailfrom=nongnu.org (client-ip=2001:4830:134:3::11;
 helo=lists.gnu.org;
 envelope-from=qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org;
 receiver=)
Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none)
 header.from=redhat.com
Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:4830:134:3::11])
 (using TLSv1 with cipher AES256-SHA (256/256 bits))
 (No client certificate requested)
 by ozlabs.org (Postfix) with ESMTPS id 41zN8p2dZZz9s47
 for ; Mon, 27 Aug 2018 17:01:29 +1000 (AEST)
Received: from localhost ([::1]:51693 helo=lists.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1fuBWg-0005wD-EQ
 for incoming@patchwork.ozlabs.org; Mon, 27 Aug 2018 03:01:26 -0400
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57827)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1fuBVv-0005tn-Ti
 for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:40 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1fuBVr-00067I-RW
 for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:39 -0400
Received: from mx3-rdu2.redhat.com ([66.187.233.73]:55140 helo=mx1.redhat.com)
 by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32)
 (Exim 4.71) (envelope-from ) id 1fuBVr-00066u-Kv
 for qemu-devel@nongnu.org; Mon, 27 Aug 2018 03:00:35 -0400
Received: from smtp.corp.redhat.com
 (int-mx06.intmail.prod.int.rdu2.redhat.com [10.11.54.6])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id 42CC8814FDD2;
 Mon, 27 Aug 2018 07:00:35 +0000 (UTC)
Received: from blackfin.pond.sub.org
 (ovpn-116-97.ams2.redhat.com [10.36.116.97])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id F18052166B41;
 Mon, 27 Aug 2018 07:00:32 +0000 (UTC)
Received: by blackfin.pond.sub.org (Postfix, from userid 1000)
 id 004AA113860E; Mon, 27 Aug 2018 09:00:21 +0200 (CEST)
From: Markus Armbruster
To: qemu-devel@nongnu.org
Date: Mon, 27 Aug 2018 09:00:18 +0200
Message-Id: <20180827070021.11931-4-armbru@redhat.com>
In-Reply-To: <20180827070021.11931-1-armbru@redhat.com>
References: <20180827070021.11931-1-armbru@redhat.com>
X-Scanned-By: MIMEDefang 2.78 on 10.11.54.6
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.11.55.8]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC)
X-Greylist: inspected by milter-greylist-4.5.16
 (mx1.redhat.com [10.11.55.8]); Mon, 27 Aug 2018 07:00:35 +0000 (UTC)
 for IP:'10.11.54.6' DOMAIN:'int-mx06.intmail.prod.int.rdu2.redhat.com'
 HELO:'smtp.corp.redhat.com' FROM:'armbru@redhat.com' RCPT:''
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 66.187.233.73
Subject: [Qemu-devel] [PATCH 3/6] json: Make lexer's "character consumed" logic less confusing
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: marcandre.lureau@redhat.com, mdroth@linux.vnet.ibm.com
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: "Qemu-devel"

The lexer uses macro TERMINAL_NEEDED_LOOKAHEAD() to decide whether a
state transition consumes the input character. It returns true when
the state transition is defined with the TERMINAL() macro. To detect
that, it checks whether input '\0' would have resulted in the same
state transition, and the new state is not IN_ERROR.

Why does that even work? For all states, the new state on input '\0'
is either IN_ERROR or defined with TERMINAL().
If the state transition equals the one we'd get for input '\0', it
goes to IN_ERROR or to the argument of TERMINAL(). We never use
TERMINAL(IN_ERROR), because it makes no sense. Thus, if it doesn't go
to IN_ERROR, it must be defined with TERMINAL().

Since this isn't quite confusing enough, we negate the result to get
@char_consumed, and ignore it when @flush is true.

Instead of deriving the lookahead bit from the state transition, make
it explicit. This is easier to understand, and a bit more flexible,
too.

Signed-off-by: Markus Armbruster
Reviewed-by: Eric Blake
---
 qobject/json-lexer.c      | 27 ++++++++++++++++-----------
 qobject/json-parser-int.h |  1 +
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/qobject/json-lexer.c b/qobject/json-lexer.c
index ec3aec726f..28582e17d9 100644
--- a/qobject/json-lexer.c
+++ b/qobject/json-lexer.c
@@ -121,15 +121,11 @@ enum json_lexer_state {
 };
 
 QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START_INTERP);
+QEMU_BUILD_BUG_ON(JSON_MAX >= 0x80);
 QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1);
 
-#define TERMINAL(state) [0 ... 0xFF] = (state)
-
-/* Return whether TERMINAL is a terminal state and the transition to it
-   from OLD_STATE required lookahead. This happens whenever the table
-   below uses the TERMINAL macro. */
-#define TERMINAL_NEEDED_LOOKAHEAD(old_state, terminal) \
-    (terminal != IN_ERROR && json_lexer[(old_state)][0] == (terminal))
+#define LOOKAHEAD 0x80
+#define TERMINAL(state) [0 ... 0xFF] = ((state) | LOOKAHEAD)
 
 static const uint8_t json_lexer[][256] = {
     /* Relies on default initialization to IN_ERROR! */
@@ -251,6 +247,17 @@ static const uint8_t json_lexer[][256] = {
     [IN_START_INTERP]['%'] = IN_INTERP,
 };
 
+static inline uint8_t next_state(JSONLexer *lexer, char ch, bool flush,
+                                 bool *char_consumed)
+{
+    uint8_t next;
+
+    assert(lexer->state <= ARRAY_SIZE(json_lexer));
+    next = json_lexer[lexer->state][(uint8_t)ch];
+    *char_consumed = !flush && !(next & LOOKAHEAD);
+    return next & ~LOOKAHEAD;
+}
+
 void json_lexer_init(JSONLexer *lexer, bool enable_interpolation)
 {
     lexer->start_state = lexer->state = enable_interpolation
@@ -271,11 +278,9 @@ static void json_lexer_feed_char(JSONLexer *lexer, char ch, bool flush)
     }
 
     while (flush ? lexer->state != lexer->start_state : !char_consumed) {
-        assert(lexer->state <= ARRAY_SIZE(json_lexer));
-        new_state = json_lexer[lexer->state][(uint8_t)ch];
-        char_consumed = !flush
-            && !TERMINAL_NEEDED_LOOKAHEAD(lexer->state, new_state);
+        new_state = next_state(lexer, ch, flush, &char_consumed);
         if (char_consumed) {
+            assert(!flush);
             g_string_append_c(lexer->token, ch);
         }
 
diff --git a/qobject/json-parser-int.h b/qobject/json-parser-int.h
index ceaa890ec6..abeec63af5 100644
--- a/qobject/json-parser-int.h
+++ b/qobject/json-parser-int.h
@@ -33,6 +33,7 @@ typedef enum json_token_type {
     JSON_SKIP,
     JSON_ERROR,
     JSON_END_OF_INPUT,
+    JSON_MAX = JSON_END_OF_INPUT
 } JSONTokenType;
 
 typedef struct JSONToken JSONToken;
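
For readers who want to see the explicit lookahead bit in isolation, here
is a minimal standalone sketch. It is not QEMU's code: the toy_* names, the
digits-only state machine, and the runtime table setup (used here in place
of the range designators the patch relies on) are assumptions made purely
for illustration. Only the LOOKAHEAD flag and the shape of next_state()
mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy lexer that only recognizes runs of decimal digits. */
enum toy_state {
    TOY_ERROR = 0,      /* zero-initialized table entries land here */
    TOY_START,
    TOY_IN_DIGITS,
    TOY_INTEGER,        /* terminal state: a complete token */
    TOY_STATE_COUNT
};

/* Flag bit stored alongside the state: the input character is NOT consumed. */
#define LOOKAHEAD 0x80

/* Compile-time check that no state value collides with the flag bit. */
typedef char toy_states_fit_below_flag[(TOY_STATE_COUNT <= LOOKAHEAD) ? 1 : -1];

/* Relies on static zero initialization to TOY_ERROR. */
uint8_t toy_table[TOY_STATE_COUNT][256];

void toy_table_init(void)
{
    int c;

    for (c = '0'; c <= '9'; c++) {
        toy_table[TOY_START][c] = TOY_IN_DIGITS;
    }
    /* Any byte ends the number and is left for the next token ... */
    for (c = 0; c < 256; c++) {
        toy_table[TOY_IN_DIGITS][c] = TOY_INTEGER | LOOKAHEAD;
    }
    /* ... except further digits, which are consumed. */
    for (c = '0'; c <= '9'; c++) {
        toy_table[TOY_IN_DIGITS][c] = TOY_IN_DIGITS;
    }
}

/* Look up the transition, report whether ch was consumed, strip the flag. */
uint8_t toy_next_state(uint8_t state, char ch, bool flush,
                       bool *char_consumed)
{
    uint8_t next;

    assert(state < TOY_STATE_COUNT);
    next = toy_table[state][(uint8_t)ch];
    *char_consumed = !flush && !(next & LOOKAHEAD);
    return (uint8_t)(next & ~LOOKAHEAD);
}
```

After toy_table_init(), feeding '4' in TOY_START moves to TOY_IN_DIGITS
with the character consumed, while feeding ',' in TOY_IN_DIGITS moves to
TOY_INTEGER and leaves the comma for the next token. The caller no longer
has to compare against the transition for input '\0' to find that out.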