From patchwork Fri Mar 25 19:47:48 2011
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Roth
X-Patchwork-Id: 88411
From: Michael Roth
To: qemu-devel@nongnu.org
Date: Fri, 25 Mar 2011 14:47:48 -0500
Message-Id: <1301082479-4058-2-git-send-email-mdroth@linux.vnet.ibm.com>
X-Mailer: git-send-email 1.7.0.4
In-Reply-To: <1301082479-4058-1-git-send-email-mdroth@linux.vnet.ibm.com>
References: <1301082479-4058-1-git-send-email-mdroth@linux.vnet.ibm.com>
Cc: aliguori@linux.vnet.ibm.com, agl@linux.vnet.ibm.com,
 mdroth@linux.vnet.ibm.com, Jes.Sorensen@redhat.com
Subject: [Qemu-devel] [RFC][PATCH v1 01/12] json-lexer: make lexer
 error-recovery more deterministic

Currently, when we reach an error state we effectively flush everything
fed to the lexer, which can leave us feeding tokens into the parser at
arbitrary offsets in the stream. This makes it difficult for the
lexer/tokenizer/parser to get back in sync when the client sends bad
input.

With these changes we emit an error token up to the tokenizer as soon as
we reach an error state, and continue processing any data passed in
rather than bailing out. The error token will be used to reset the
tokenizer and parser, so that they recover as soon as the lexer begins
generating valid token sequences again.

We also map chr(0xFF) to an error state here, since the byte 0xFF never
appears in valid UTF-8.
The QMP guest proxy/agent uses this to force a flush/reset of previous
input for reliable delivery of certain events, so we also document that
thoroughly here.

Signed-off-by: Michael Roth
---
 json-lexer.c |   22 ++++++++++++++++++----
 json-lexer.h |    1 +
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/json-lexer.c b/json-lexer.c
index 3462c89..21aa03a 100644
--- a/json-lexer.c
+++ b/json-lexer.c
@@ -105,7 +105,7 @@ static const uint8_t json_lexer[][256] = {
         ['u'] = IN_DQ_UCODE0,
     },
     [IN_DQ_STRING] = {
-        [1 ... 0xFF] = IN_DQ_STRING,
+        [1 ... 0xFE] = IN_DQ_STRING,
         ['\\'] = IN_DQ_STRING_ESCAPE,
         ['"'] = JSON_STRING,
     },
@@ -144,7 +144,7 @@ static const uint8_t json_lexer[][256] = {
         ['u'] = IN_SQ_UCODE0,
     },
     [IN_SQ_STRING] = {
-        [1 ... 0xFF] = IN_SQ_STRING,
+        [1 ... 0xFE] = IN_SQ_STRING,
         ['\\'] = IN_SQ_STRING_ESCAPE,
         ['\''] = JSON_STRING,
     },
@@ -305,10 +305,25 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
             new_state = IN_START;
             break;
         case ERROR:
+            /* XXX: To avoid having previous bad input leaving the parser in an
+             * unresponsive state where we consume unpredictable amounts of
+             * subsequent "good" input, percolate this error state up to the
+             * tokenizer/parser by forcing a NULL object to be emitted, then
+             * reset state.
+             *
+             * Also note that this handling is required for reliable channel
+             * negotiation between QMP and the guest agent, since chr(0xFF)
+             * is placed at the beginning of certain events to ensure proper
+             * delivery when the channel is in an unknown state. chr(0xFF) is
+             * never a valid ASCII/UTF-8 sequence, so this should reliably
+             * induce an error/flush state.
+             */
+            lexer->emit(lexer, lexer->token, JSON_ERROR, lexer->x, lexer->y);
             QDECREF(lexer->token);
             lexer->token = qstring_new();
             new_state = IN_START;
-            return -EINVAL;
+            lexer->state = new_state;
+            return 0;
         default:
             break;
         }
@@ -334,7 +349,6 @@ int json_lexer_feed(JSONLexer *lexer, const char *buffer, size_t size)
 
     for (i = 0; i < size; i++) {
         int err;
-
         err = json_lexer_feed_char(lexer, buffer[i]);
         if (err < 0) {
             return err;
diff --git a/json-lexer.h b/json-lexer.h
index 3b50c46..10bc0a7 100644
--- a/json-lexer.h
+++ b/json-lexer.h
@@ -25,6 +25,7 @@ typedef enum json_token_type {
     JSON_STRING,
     JSON_ESCAPE,
     JSON_SKIP,
+    JSON_ERROR,
 } JSONTokenType;
 
 typedef struct JSONLexer JSONLexer;
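
For readers who want to see the recovery behaviour in isolation, here is a
minimal, self-contained sketch of the same idea. It is not the QEMU lexer:
the ToyLexer type, the TOY_* names and the whitespace-separated "word"
grammar are all hypothetical, chosen only to keep the example short. The one
thing carried over from the patch is the policy: on a 0xFF byte the lexer
emits an error token, resets to its start state, and keeps consuming input
instead of returning an error to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 64  /* hypothetical cap on token length for this sketch */

/* Hypothetical token kinds; the real patch adds JSON_ERROR to JSONTokenType. */
typedef enum { TOY_WORD, TOY_ERROR } ToyKind;

typedef void (*ToyEmit)(void *opaque, ToyKind kind,
                        const char *text, size_t len);

typedef struct {
    char buf[TOY_MAX];  /* bytes of the token being built */
    size_t len;
    ToyEmit emit;       /* consumer callback (tokenizer/parser stand-in) */
    void *opaque;
} ToyLexer;

/* Hand the buffered bytes to the consumer, then return to the start state. */
static void toy_flush(ToyLexer *lx, ToyKind kind)
{
    lx->emit(lx->opaque, kind, lx->buf, lx->len);
    lx->len = 0;
}

/* Feed one byte. Bad input becomes a TOY_ERROR token, never a failure. */
static void toy_feed_char(ToyLexer *lx, unsigned char ch)
{
    if (ch == 0xFF) {               /* never valid in UTF-8: report and reset */
        toy_flush(lx, TOY_ERROR);
    } else if (ch == ' ' || ch == '\n') {
        if (lx->len > 0) {
            toy_flush(lx, TOY_WORD);
        }
    } else if (lx->len == TOY_MAX) {
        toy_flush(lx, TOY_ERROR);   /* overlong token: also report and reset */
    } else {
        lx->buf[lx->len++] = (char)ch;
    }
}

static void toy_feed(ToyLexer *lx, const char *data, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        toy_feed_char(lx, (unsigned char)data[i]);
    }
}

/* Example consumer: counts tokens and remembers the last good word. */
typedef struct {
    int words;
    int errors;
    char last[TOY_MAX + 1];
} ToyStats;

static void toy_count(void *opaque, ToyKind kind, const char *text, size_t len)
{
    ToyStats *st = opaque;

    if (kind == TOY_ERROR) {
        st->errors++;               /* a real parser would reset itself here */
        return;
    }
    assert(len <= TOY_MAX);
    st->words++;
    memcpy(st->last, text, len);
    st->last[len] = '\0';
}

/* Convenience wrapper: lex a whole buffer and return the statistics. */
static ToyStats toy_run(const char *data, size_t size)
{
    ToyStats st = {0};
    ToyLexer lx = { .len = 0, .emit = toy_count, .opaque = &st };

    toy_feed(&lx, data, size);
    return st;
}
```

Calling toy_run() on a buffer such as "gar\xff" "ok done\n" should report
one error for the partial "gar" plus the 0xFF byte, and then tokenize "ok"
and "done" normally, which is the resynchronization property the patch is
after. With the old behaviour (return -EINVAL from the feed function), the
caller would have stopped at the 0xFF byte and the trailing good input would
never have been lexed.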