Patchwork [08/11] json-lexer: reset the lexer state on an invalid token

Submitter Anthony Liguori
Date March 11, 2011, 9 p.m.
Message ID <1299877249-13433-9-git-send-email-aliguori@us.ibm.com>
Permalink /patch/86458/
State New

Comments

Anthony Liguori - March 11, 2011, 9 p.m.
Not everything handles errors from json parsing gracefully.  By at least
resetting the lexer, we'll start generating valid tokens again and hopefully
recover the stream.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Luiz Capitulino - March 14, 2011, 7:22 p.m.
On Fri, 11 Mar 2011 15:00:46 -0600
Anthony Liguori <aliguori@us.ibm.com> wrote:

> Not everything handles errors from json parsing gracefully.  By at least
> resetting the lexer, we'll start generating valid tokens again and hopefully
> recover the stream.
> 
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> 
> diff --git a/json-lexer.c b/json-lexer.c
> index c736f42..834d7af 100644
> --- a/json-lexer.c
> +++ b/json-lexer.c
> @@ -303,6 +303,9 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
>              new_state = IN_START;
>              break;
>          case ERROR:
> +            QDECREF(lexer->token);
> +            lexer->token = qstring_new();
> +            new_state = IN_START;
>              return -EINVAL;

This makes the parser accept broken input like:

  { "execute": xxxxx }
  {"return": {}}

  { "execute": _ }
  {"return": {}}

Today, it handles this kind of input correctly:

  { "execute": xxxxx }
  {"error": {"class": "JSONParsing", "desc": "Invalid JSON syntax", "data": {}}}

Although it also accepts broken stuff today, like:

 { "execute": ___"query-block" }


>          default:
>              break;
Anthony Liguori - March 14, 2011, 7:43 p.m.
On 03/14/2011 02:22 PM, Luiz Capitulino wrote:
> On Fri, 11 Mar 2011 15:00:46 -0600
> Anthony Liguori<aliguori@us.ibm.com>  wrote:
>
>> Not everything handles errors from json parsing gracefully.  By at least
>> resetting the lexer, we'll start generating valid tokens again and hopefully
>> recover the stream.
>>
>> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com>
>>
>> diff --git a/json-lexer.c b/json-lexer.c
>> index c736f42..834d7af 100644
>> --- a/json-lexer.c
>> +++ b/json-lexer.c
>> @@ -303,6 +303,9 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
>>               new_state = IN_START;
>>               break;
>>           case ERROR:
>> +            QDECREF(lexer->token);
>> +            lexer->token = qstring_new();
>> +            new_state = IN_START;
>>               return -EINVAL;
> This makes the parser accept broken input like:
>
>    { "execute": xxxxx }
>    {"return": {}}

This is a bug in the current QMP server.  Here's how my new QMP server 
responds:

{"QMP": {"version": {"qemu": {"micro": 50, "minor": 13, "major": 0}, 
"package": ""}, "capabilities": []}}
{"error": {"class": "JSONParseError", "data": {"message": "Missing value 
in dict"}}}

>    { "execute": _ }
>    {"return": {}}

Likewise, the new QMP server does not respond to this at all (which 
confuses me TBH).

> Today, it handles this kind of input correctly:
>
>    { "execute": xxxxx }
>    {"error": {"class": "JSONParsing", "desc": "Invalid JSON syntax", "data": {}}}

The parser rejects this versus trying to get what it can out of it and 
passing that to QMP.  The idea here is to be more graceful in dealing 
with bad input and trying to recover.

> Although it also accepts broken stuff today, like:
>
>   { "execute": ___"query-block" }

This is really the server, not the parser.  The new server doesn't 
accept this.

I guess QMP today just ignores the incoming QObject in capabilities mode 
and always returns {}.  You'll see the same thing with:

{ "execute": "not-a-valid-command" }
{"return": {}}

But once you're in command mode, it does the right thing.

Regards,

Anthony Liguori

>>           default:
>>               break;
Luiz Capitulino - March 14, 2011, 8:12 p.m.
On Mon, 14 Mar 2011 14:43:48 -0500
Anthony Liguori <aliguori@us.ibm.com> wrote:

> On 03/14/2011 02:22 PM, Luiz Capitulino wrote:
> > On Fri, 11 Mar 2011 15:00:46 -0600
> > Anthony Liguori<aliguori@us.ibm.com>  wrote:
> >
> >> Not everything handles errors from json parsing gracefully.  By at least
> >> resetting the lexer, we'll start generating valid tokens again and hopefully
> >> recover the stream.
> >>
> >> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com>
> >>
> >> diff --git a/json-lexer.c b/json-lexer.c
> >> index c736f42..834d7af 100644
> >> --- a/json-lexer.c
> >> +++ b/json-lexer.c
> >> @@ -303,6 +303,9 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
> >>               new_state = IN_START;
> >>               break;
> >>           case ERROR:
> >> +            QDECREF(lexer->token);
> >> +            lexer->token = qstring_new();
> >> +            new_state = IN_START;
> >>               return -EINVAL;
> > This makes the parser accept broken input like:
> >
> >    { "execute": xxxxx }
> >    {"return": {}}
> 
> This is a bug in the current QMP server.  Here's how my new QMP server 
> responds:
> 
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 13, "major": 0}, 
> "package": ""}, "capabilities": []}}
> {"error": {"class": "JSONParseError", "data": {"message": "Missing value 
> in dict"}}}

How do you handle it? Do you check the return of json_message_parser_feed()?

If that's the case, then the real problem in the current server is that we
use qemu's chardev interface and its read handler doesn't allow for
signaling errors. I did not consider not using it.

By looking at your branch I have the impression you wrote your own stuff,
am I right? If yes, doesn't it duplicate the chardev implementation?

> 
> >    { "execute": _ }
> >    {"return": {}}
> 
> Likewise, the new QMP server does not respond to this at all (which 
> confuses me TBH).
> 
> > Today, it handles this kind of input correctly:
> >
> >    { "execute": xxxxx }
> >    {"error": {"class": "JSONParsing", "desc": "Invalid JSON syntax", "data": {}}}
> 
> The parser rejects this versus trying to get what it can out of it and 
> passing that to QMP.  The idea here is to be more graceful in dealing 
> with bad input and trying to recover.

I'm all for trying to recover, but we can't have varied responses for
bad input. It seems easier to just fail.
> 
> > Although it also accepts broken stuff today, like:
> >
> >   { "execute": ___"query-block" }
> 
> This is really the server, not the parser.  The new server doesn't 
> accept this.

This is probably the same as the xxxxx above.

> I guess QMP today just ignores the incoming QObject in capabilities mode 
> and always returns {}.  You'll see the same thing with:
> 
> { "execute": "not-a-valid-command" }
> {"return": {}}
> 
> But once you're in command mode, it does the right thing.

I can't reproduce it w/o this series applied:

{"QMP": {"version": {"qemu": {"micro": 50, "minor": 14, "major": 0}, "package": ""}, "capabilities": []}}
{ "execute": "not-a-valid-command" }
{"error": {"class": "CommandNotFound", "desc": "The command not-a-valid-command has not been found", "data": {"name": "not-a-valid-command"}}}
Anthony Liguori - March 14, 2011, 8:30 p.m.
On 03/14/2011 03:12 PM, Luiz Capitulino wrote:
> On Mon, 14 Mar 2011 14:43:48 -0500
> Anthony Liguori<aliguori@us.ibm.com>  wrote:
>
>> On 03/14/2011 02:22 PM, Luiz Capitulino wrote:
>>> On Fri, 11 Mar 2011 15:00:46 -0600
>>> Anthony Liguori<aliguori@us.ibm.com>   wrote:
>>>
>>>> Not everything handles errors from json parsing gracefully.  By at least
>>>> resetting the lexer, we'll start generating valid tokens again and hopefully
>>>> recover the stream.
>>>>
>>>> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com>
>>>>
>>>> diff --git a/json-lexer.c b/json-lexer.c
>>>> index c736f42..834d7af 100644
>>>> --- a/json-lexer.c
>>>> +++ b/json-lexer.c
>>>> @@ -303,6 +303,9 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
>>>>                new_state = IN_START;
>>>>                break;
>>>>            case ERROR:
>>>> +            QDECREF(lexer->token);
>>>> +            lexer->token = qstring_new();
>>>> +            new_state = IN_START;
>>>>                return -EINVAL;
>>> This makes the parser accept broken input like:
>>>
>>>     { "execute": xxxxx }
>>>     {"return": {}}
>> This is a bug in the current QMP server.  Here's how my new QMP server
>> responds:
>>
>> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 13, "major": 0},
>> "package": ""}, "capabilities": []}}
>> {"error": {"class": "JSONParseError", "data": {"message": "Missing value
>> in dict"}}}
> How do you handle it? Do you check the return of json_message_parser_feed()?
>
> If that's the case, then the real problem in the current server is that we
> use qemu's chardev interface and its read handler doesn't allow for
> signaling errors. I did not consider not using it.
>
> By looking at your branch I have the impression you wrote your own stuff,
> am I right? If yes, doesn't it duplicate the chardev implementation?

No, that test was with the chardev interface.  There is both a chardev 
server and a unix domain socket server.

I'm not really sure why the current server isn't working correctly.  I'd 
have to investigate.

>>>     { "execute": _ }
>>>     {"return": {}}
>> Likewise, the new QMP server does not respond to this at all (which
>> confuses me TBH).
>>
>>> Today, it handles this kind of input correctly:
>>>
>>>     { "execute": xxxxx }
>>>     {"error": {"class": "JSONParsing", "desc": "Invalid JSON syntax", "data": {}}}
>> The parser rejects this versus trying to get what it can out of it and
>> passing that to QMP.  The idea here is to be more graceful in dealing
>> with bad input and trying to recover.
> I'm all for trying to recover, but we can't have varied responses for
> bad input. It seems easier to just fail.

I think we need to make sure that we don't ever succeed in the face of 
bad input, right?

So far, none of the test cases (against the new QMP server) succeed 
given bad input.

>> I guess QMP today just ignores the incoming QObject in capabilities mode
>> and always returns {}.  You'll see the same thing with:
>>
>> { "execute": "not-a-valid-command" }
>> {"return": {}}
>>
>> But once you're in command mode, it does the right thing.
> I can't reproduce it w/o this series applied:
>
> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 14, "major": 0}, "package": ""}, "capabilities": []}}
> { "execute": "not-a-valid-command" }
> {"error": {"class": "CommandNotFound", "desc": "The command not-a-valid-command has not been found", "data": {"name": "not-a-valid-command"}}}

Curious, maybe I'm remembering this wrong then.  Let me dig in a bit.

Regards,

Anthony Liguori
Luiz Capitulino - March 14, 2011, 8:43 p.m.
On Mon, 14 Mar 2011 15:30:33 -0500
Anthony Liguori <aliguori@us.ibm.com> wrote:

> On 03/14/2011 03:12 PM, Luiz Capitulino wrote:
> > On Mon, 14 Mar 2011 14:43:48 -0500
> > Anthony Liguori<aliguori@us.ibm.com>  wrote:
> >
> >> On 03/14/2011 02:22 PM, Luiz Capitulino wrote:
> >>> On Fri, 11 Mar 2011 15:00:46 -0600
> >>> Anthony Liguori<aliguori@us.ibm.com>   wrote:
> >>>
> >>>> Not everything handles errors from json parsing gracefully.  By at least
> >>>> resetting the lexer, we'll start generating valid tokens again and hopefully
> >>>> recover the stream.
> >>>>
> >>>> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com>
> >>>>
> >>>> diff --git a/json-lexer.c b/json-lexer.c
> >>>> index c736f42..834d7af 100644
> >>>> --- a/json-lexer.c
> >>>> +++ b/json-lexer.c
> >>>> @@ -303,6 +303,9 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch)
> >>>>                new_state = IN_START;
> >>>>                break;
> >>>>            case ERROR:
> >>>> +            QDECREF(lexer->token);
> >>>> +            lexer->token = qstring_new();
> >>>> +            new_state = IN_START;
> >>>>                return -EINVAL;
> >>> This makes the parser accept broken input like:
> >>>
> >>>     { "execute": xxxxx }
> >>>     {"return": {}}
> >> This is a bug in the current QMP server.  Here's how my new QMP server
> >> responds:
> >>
> >> {"QMP": {"version": {"qemu": {"micro": 50, "minor": 13, "major": 0},
> >> "package": ""}, "capabilities": []}}
> >> {"error": {"class": "JSONParseError", "data": {"message": "Missing value
> >> in dict"}}}
> > How do you handle it? Do you check the return of json_message_parser_feed()?
> >
> > If that's the case, then the real problem in the current server is that we
> > use qemu's chardev interface and its read handler doesn't allow for
> > signaling errors. I did not consider not using it.
> >
> > By looking at your branch I have the impression you wrote your own stuff,
> > am I right? If yes, doesn't it duplicate the chardev implementation?
> 
> No, that test was with the chardev interface.  There is both a chardev 
> server and a unix domain socket server.
> 
> I'm not really sure why the current server isn't working correctly.  I'd 
> have to investigate.

That's the only problem wrt bad input that should be handled by the current
server but isn't afaik.

> >>>     { "execute": _ }
> >>>     {"return": {}}
> >> Likewise, the new QMP server does not respond to this at all (which
> >> confuses me TBH).
> >>
> >>> Today, it handles this kind of input correctly:
> >>>
> >>>     { "execute": xxxxx }
> >>>     {"error": {"class": "JSONParsing", "desc": "Invalid JSON syntax", "data": {}}}
> >> The parser rejects this versus trying to get what it can out of it and
> >> passing that to QMP.  The idea here is to be more graceful in dealing
> >> with bad input and trying to recover.
> > I'm all for trying to recover, but we can't have varied responses for
> > bad input. It seems easier to just fail.
> 
> I think we need to make sure that we don't ever succeed in the face of 
> bad input, right?

Right.

> So far, none of the test cases (against the new QMP server) succeed 
> given bad input.

But the server has to return an error, not responding is also wrong.

Also note that I'm not aware of this kind of thing happening with the
current server either (only with this series applied).

> >> I guess QMP today just ignores the incoming QObject in capabilities mode
> >> and always returns {}.  You'll see the same thing with:
> >>
> >> { "execute": "not-a-valid-command" }
> >> {"return": {}}
> >>
> >> But once you're in command mode, it does the right thing.
> > I can't reproduce it w/o this series applied:
> >
> > {"QMP": {"version": {"qemu": {"micro": 50, "minor": 14, "major": 0}, "package": ""}, "capabilities": []}}
> > { "execute": "not-a-valid-command" }
> > {"error": {"class": "CommandNotFound", "desc": "The command not-a-valid-command has not been found", "data": {"name": "not-a-valid-command"}}}
> 
> Curious, maybe I'm remembering this wrong then.  Let me dig in a bit.
> 
> Regards,
> 
> Anthony Liguori
> 
>

Patch

diff --git a/json-lexer.c b/json-lexer.c
index c736f42..834d7af 100644
--- a/json-lexer.c
+++ b/json-lexer.c
@@ -303,6 +303,9 @@  static int json_lexer_feed_char(JSONLexer *lexer, char ch)
             new_state = IN_START;
             break;
         case ERROR:
+            QDECREF(lexer->token);
+            lexer->token = qstring_new();
+            new_state = IN_START;
             return -EINVAL;
         default:
             break;
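
For readers who want to try the reset-on-error idea outside of QEMU, here is
a minimal standalone sketch.  It is not the json-lexer code: the toy_* names,
the fixed-size buffer, and the "lowercase words separated by spaces" grammar
are all assumptions made up for illustration.  The only thing it shares with
the patch is the approach: on an invalid character, drop the partial token,
go back to the start state, and still return -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy lexer states; not the states used by json-lexer.c. */
enum toy_state { TOY_IN_START, TOY_IN_WORD };

struct toy_lexer {
    enum toy_state state;
    char token[32];   /* partial token text accumulated so far */
    size_t len;       /* number of characters in token */
    int emitted;      /* number of complete tokens produced */
};

static void toy_lexer_init(struct toy_lexer *lx)
{
    assert(lx != NULL);
    memset(lx, 0, sizeof(*lx));
    lx->state = TOY_IN_START;
}

/* Drop any partial token and return to the start state. */
static void toy_lexer_reset(struct toy_lexer *lx)
{
    lx->len = 0;
    lx->token[0] = '\0';
    lx->state = TOY_IN_START;
}

/*
 * Feed one character.  Lowercase letters extend the current word, a space
 * ends it, and anything else is invalid.  On invalid input the lexer is
 * reset before the error is returned, so the next character starts clean.
 */
static int toy_lexer_feed_char(struct toy_lexer *lx, char ch)
{
    if (ch >= 'a' && ch <= 'z') {
        if (lx->len + 1 >= sizeof(lx->token)) {
            toy_lexer_reset(lx);
            return -EINVAL;   /* token too long for the toy buffer */
        }
        lx->token[lx->len++] = ch;
        lx->token[lx->len] = '\0';
        lx->state = TOY_IN_WORD;
        return 0;
    }
    if (ch == ' ') {
        if (lx->state == TOY_IN_WORD) {
            lx->emitted++;    /* a complete word was seen */
        }
        toy_lexer_reset(lx);
        return 0;
    }
    toy_lexer_reset(lx);      /* invalid input: reset, then report */
    return -EINVAL;
}

/* Feed a whole string; return how many characters were rejected. */
static int toy_lexer_feed(struct toy_lexer *lx, const char *s)
{
    int errors = 0;

    for (; *s != '\0'; s++) {
        if (toy_lexer_feed_char(lx, *s) < 0) {
            errors++;
        }
    }
    return errors;
}
```

Feeding "ab#cd " to this toy rejects the '#', silently discards the partial
"ab", and then emits "cd" as a normal token.  That is the recovery behaviour
the patch is after, and it also shows the concern raised in the review: a
caller that ignores the error count goes on to process the rest of the
stream as if nothing had gone wrong.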