Message ID | 1274303733-3700-3-git-send-email-lcapitulino@redhat.com |
---|---|
State | New |
Headers | show |
On 05/19/2010 04:15 PM, Luiz Capitulino wrote: > The JSON escape sequence "\/" and "\\" are valid and should be > handled. > > Signed-off-by: Luiz Capitulino<lcapitulino@redhat.com> > Good catch. > --- > json-lexer.c | 4 ++++ > 1 files changed, 4 insertions(+), 0 deletions(-) > > diff --git a/json-lexer.c b/json-lexer.c > index 0b145d1..5cc7e6c 100644 > --- a/json-lexer.c > +++ b/json-lexer.c > @@ -97,6 +97,8 @@ static const uint8_t json_lexer[][256] = { > ['n'] = IN_DQ_STRING, > ['r'] = IN_DQ_STRING, > ['t'] = IN_DQ_STRING, > + ['/'] = IN_DQ_STRING, > + ['\\'] = IN_DQ_STRING, > ['\''] = IN_DQ_STRING, > ['\"'] = IN_DQ_STRING, > ['u'] = IN_DQ_UCODE0, > @@ -134,6 +136,8 @@ static const uint8_t json_lexer[][256] = { > ['n'] = IN_SQ_STRING, > ['r'] = IN_SQ_STRING, > ['t'] = IN_SQ_STRING, > + ['/'] = IN_DQ_STRING, > + ['\\'] = IN_DQ_STRING, > ['\''] = IN_SQ_STRING, > ['\"'] = IN_SQ_STRING, > ['u'] = IN_SQ_UCODE0, >
On Wed, 19 May 2010 16:44:47 -0500 Anthony Liguori <anthony@codemonkey.ws> wrote: > On 05/19/2010 04:15 PM, Luiz Capitulino wrote: > > The JSON escape sequence "\/" and "\\" are valid and should be > > handled. > > > > Signed-off-by: Luiz Capitulino<lcapitulino@redhat.com> > > > > Good catch. I think there's another issue in the handling of strings. The spec says that valid unescaped chars are in the following range: unescaped = %x20-21 / %x23-5B / %x5D-10FFFF But we do: [IN_DQ_STRING] = { [1 ... 0xFF] = IN_DQ_STRING, ['\\'] = IN_DQ_STRING_ESCAPE, ['"'] = IN_DONE_STRING, }, Shouldn't we cover 0x20 .. 0xFF instead?
On 05/20/2010 03:44 PM, Luiz Capitulino wrote: > I think there's another issue in the handling of strings. > > The spec says that valid unescaped chars are in the following range: > > unescaped = %x20-21 / %x23-5B / %x5D-10FFFF > > But we do: > > [IN_DQ_STRING] = { > [1 ... 0xFF] = IN_DQ_STRING, > ['\\'] = IN_DQ_STRING_ESCAPE, > ['"'] = IN_DONE_STRING, > }, > > Shouldn't we cover 0x20 .. 0xFF instead? If it's the lexer, isn't just it being liberal in what it accepts? paolo
On Thu, 20 May 2010 17:16:01 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 05/20/2010 03:44 PM, Luiz Capitulino wrote: > > I think there's another issue in the handling of strings. > > > > The spec says that valid unescaped chars are in the following range: > > > > unescaped = %x20-21 / %x23-5B / %x5D-10FFFF > > > > But we do: > > > > [IN_DQ_STRING] = { > > [1 ... 0xFF] = IN_DQ_STRING, > > ['\\'] = IN_DQ_STRING_ESCAPE, > > ['"'] = IN_DONE_STRING, > > }, > > > > Shouldn't we cover 0x20 .. 0xFF instead? > > If it's the lexer, isn't just it being liberal in what it accepts? Yes, it's the lexer, but you meant that the fix should be in somewhere else?
On 05/20/2010 05:25 PM, Luiz Capitulino wrote: > On Thu, 20 May 2010 17:16:01 +0200 > Paolo Bonzini<pbonzini@redhat.com> wrote: > >> On 05/20/2010 03:44 PM, Luiz Capitulino wrote: >>> I think there's another issue in the handling of strings. >>> >>> The spec says that valid unescaped chars are in the following range: >>> >>> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF >>> >>> But we do: >>> >>> [IN_DQ_STRING] = { >>> [1 ... 0xFF] = IN_DQ_STRING, >>> ['\\'] = IN_DQ_STRING_ESCAPE, >>> ['"'] = IN_DONE_STRING, >>> }, >>> >>> Shouldn't we cover 0x20 .. 0xFF instead? >> >> If it's the lexer, isn't just it being liberal in what it accepts? > > Yes, it's the lexer, but you meant that the fix should be in > somewhere else? I meant that we're just accepting some invalid JSON and that's not a big deal. Paolo
On Thu, 20 May 2010 17:26:03 +0200 Paolo Bonzini <pbonzini@redhat.com> wrote: > On 05/20/2010 05:25 PM, Luiz Capitulino wrote: > > On Thu, 20 May 2010 17:16:01 +0200 > > Paolo Bonzini<pbonzini@redhat.com> wrote: > > > >> On 05/20/2010 03:44 PM, Luiz Capitulino wrote: > >>> I think there's another issue in the handling of strings. > >>> > >>> The spec says that valid unescaped chars are in the following range: > >>> > >>> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF > >>> > >>> But we do: > >>> > >>> [IN_DQ_STRING] = { > >>> [1 ... 0xFF] = IN_DQ_STRING, > >>> ['\\'] = IN_DQ_STRING_ESCAPE, > >>> ['"'] = IN_DONE_STRING, > >>> }, > >>> > >>> Shouldn't we cover 0x20 .. 0xFF instead? > >> > >> If it's the lexer, isn't just it being liberal in what it accepts? > > > > Yes, it's the lexer, but you meant that the fix should be in > > somewhere else? > > I meant that we're just accepting some invalid JSON and that's not a big > deal. It can become a big deal if clients rely on it and for some reason we decide we should drop it. Ie. after QMP is declared stable such changes won't be allowed. Yes, I know, the chances of someone relying on this kind of thing is probably almost zero. At the same time I think we should be very conservative if there's no good reason to do otherwise.
On 05/20/2010 10:16 AM, Paolo Bonzini wrote: > On 05/20/2010 03:44 PM, Luiz Capitulino wrote: >> I think there's another issue in the handling of strings. >> >> The spec says that valid unescaped chars are in the following range: >> >> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in strings. Any parser that didn't accept that would be broken. >> >> But we do: >> >> [IN_DQ_STRING] = { >> [1 ... 0xFF] = IN_DQ_STRING, >> ['\\'] = IN_DQ_STRING_ESCAPE, >> ['"'] = IN_DONE_STRING, >> }, >> >> Shouldn't we cover 0x20 .. 0xFF instead? > > If it's the lexer, isn't just it being liberal in what it accepts? I believe the parser correctly rejects invalid UTF-8 sequences. Regards, Anthony Liguori > paolo
On 05/20/2010 10:35 AM, Luiz Capitulino wrote: >> I meant that we're just accepting some invalid JSON and that's not a big >> deal. >> > It can become a big deal if clients rely on it and for some reason we > decide we should drop it. Ie. after QMP is declared stable such changes > won't be allowed. > Clients should only rely on standard JSON. Anything else is a bug in the client. Regards, Anthony Liguori > Yes, I know, the chances of someone relying on this kind of thing is > probably almost zero. At the same time I think we should be very > conservative if there's no good reason to do otherwise. >
On Thu, 20 May 2010 10:50:41 -0500 Anthony Liguori <anthony@codemonkey.ws> wrote: > On 05/20/2010 10:16 AM, Paolo Bonzini wrote: > > On 05/20/2010 03:44 PM, Luiz Capitulino wrote: > >> I think there's another issue in the handling of strings. > >> > >> The spec says that valid unescaped chars are in the following range: > >> > >> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF > > That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in > strings. Any parser that didn't accept that would be broken. Honestly, I had the impression this should be encoded as: %x5C %x74, but if you're right, wouldn't this be true for other sequences as well? > >> > >> But we do: > >> > >> [IN_DQ_STRING] = { > >> [1 ... 0xFF] = IN_DQ_STRING, > >> ['\\'] = IN_DQ_STRING_ESCAPE, > >> ['"'] = IN_DONE_STRING, > >> }, > >> > >> Shouldn't we cover 0x20 .. 0xFF instead? > > > > If it's the lexer, isn't just it being liberal in what it accepts? > > I believe the parser correctly rejects invalid UTF-8 sequences. Will check.
On Thu, 20 May 2010 10:54:42 -0500 Anthony Liguori <anthony@codemonkey.ws> wrote: > On 05/20/2010 10:35 AM, Luiz Capitulino wrote: > >> I meant that we're just accepting some invalid JSON and that's not a big > >> deal. > >> > > It can become a big deal if clients rely on it and for some reason we > > decide we should drop it. Ie. after QMP is declared stable such changes > > won't be allowed. > > > > Clients should only rely on standard JSON. Anything else is a bug in > the client. I feel this is like a trap, why exposing it if don't want clients to use them?
On 05/20/2010 11:27 AM, Luiz Capitulino wrote: > On Thu, 20 May 2010 10:50:41 -0500 > Anthony Liguori<anthony@codemonkey.ws> wrote: > > >> On 05/20/2010 10:16 AM, Paolo Bonzini wrote: >> >>> On 05/20/2010 03:44 PM, Luiz Capitulino wrote: >>> >>>> I think there's another issue in the handling of strings. >>>> >>>> The spec says that valid unescaped chars are in the following range: >>>> >>>> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF >>>> >> That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in >> strings. Any parser that didn't accept that would be broken. >> > Honestly, I had the impression this should be encoded as: %x5C %x74, but > if you're right, wouldn't this be true for other sequences as well? > I don't think most reasonable clients are going to quote tabs as '\t'. Regards, Anthony Liguori >>>> But we do: >>>> >>>> [IN_DQ_STRING] = { >>>> [1 ... 0xFF] = IN_DQ_STRING, >>>> ['\\'] = IN_DQ_STRING_ESCAPE, >>>> ['"'] = IN_DONE_STRING, >>>> }, >>>> >>>> Shouldn't we cover 0x20 .. 0xFF instead? >>>> >>> If it's the lexer, isn't just it being liberal in what it accepts? >>> >> I believe the parser correctly rejects invalid UTF-8 sequences. >> > Will check. >
On Thu, 20 May 2010 11:55:00 -0500 Anthony Liguori <anthony@codemonkey.ws> wrote: > On 05/20/2010 11:27 AM, Luiz Capitulino wrote: > > On Thu, 20 May 2010 10:50:41 -0500 > > Anthony Liguori<anthony@codemonkey.ws> wrote: > > > > > >> On 05/20/2010 10:16 AM, Paolo Bonzini wrote: > >> > >>> On 05/20/2010 03:44 PM, Luiz Capitulino wrote: > >>> > >>>> I think there's another issue in the handling of strings. > >>>> > >>>> The spec says that valid unescaped chars are in the following range: > >>>> > >>>> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF > >>>> > >> That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in > >> strings. Any parser that didn't accept that would be broken. > >> > > Honestly, I had the impression this should be encoded as: %x5C %x74, but > > if you're right, wouldn't this be true for other sequences as well? > > > > I don't think most reasonable clients are going to quote tabs as '\t'. That would be a bug, wouldn't it? Python example: >>> json.dumps('\t') '"\\t"' >>> YAJL example (inlined below): /tmp/ ./teste 0x22 0x5c 0x74 0x22 /tmp/ I think we should strictly conform to the spec, quirks should only be added when really needed. #include <stdio.h> #include <yajl/yajl_gen.h> int main(void) { yajl_gen g; unsigned int i, len = 0; const unsigned char *str = NULL; yajl_gen_config conf = { 0, " " }; g = yajl_gen_alloc(&conf, NULL); if (yajl_gen_string(g, (unsigned char *) "\t", 1) != yajl_gen_status_ok) return 1; if (yajl_gen_get_buf(g, &str, &len) != yajl_gen_status_ok) return 1; for (i = 0; i < len; i++) printf("0x%x ", str[i]); printf("\n"); return 0; }
On 05/20/2010 01:47 PM, Luiz Capitulino wrote: > On Thu, 20 May 2010 11:55:00 -0500 > Anthony Liguori<anthony@codemonkey.ws> wrote: > > >> On 05/20/2010 11:27 AM, Luiz Capitulino wrote: >> >>> On Thu, 20 May 2010 10:50:41 -0500 >>> Anthony Liguori<anthony@codemonkey.ws> wrote: >>> >>> >>> >>>> On 05/20/2010 10:16 AM, Paolo Bonzini wrote: >>>> >>>> >>>>> On 05/20/2010 03:44 PM, Luiz Capitulino wrote: >>>>> >>>>> >>>>>> I think there's another issue in the handling of strings. >>>>>> >>>>>> The spec says that valid unescaped chars are in the following range: >>>>>> >>>>>> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF >>>>>> >>>>>> >>>> That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in >>>> strings. Any parser that didn't accept that would be broken. >>>> >>>> >>> Honestly, I had the impression this should be encoded as: %x5C %x74, but >>> if you're right, wouldn't this be true for other sequences as well? >>> >>> >> I don't think most reasonable clients are going to quote tabs as '\t'. >> > That would be a bug, wouldn't it? > Tabs are valid in JavaScript strings and I don't think it's reasonable to expect that a valid JavaScript string is not a valid JSON string. Regards, Anthony Liguori
On Thu, 20 May 2010 13:52:08 -0500 Anthony Liguori <anthony@codemonkey.ws> wrote: > On 05/20/2010 01:47 PM, Luiz Capitulino wrote: > > On Thu, 20 May 2010 11:55:00 -0500 > > Anthony Liguori<anthony@codemonkey.ws> wrote: > > > > > >> On 05/20/2010 11:27 AM, Luiz Capitulino wrote: > >> > >>> On Thu, 20 May 2010 10:50:41 -0500 > >>> Anthony Liguori<anthony@codemonkey.ws> wrote: > >>> > >>> > >>> > >>>> On 05/20/2010 10:16 AM, Paolo Bonzini wrote: > >>>> > >>>> > >>>>> On 05/20/2010 03:44 PM, Luiz Capitulino wrote: > >>>>> > >>>>> > >>>>>> I think there's another issue in the handling of strings. > >>>>>> > >>>>>> The spec says that valid unescaped chars are in the following range: > >>>>>> > >>>>>> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF > >>>>>> > >>>>>> > >>>> That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in > >>>> strings. Any parser that didn't accept that would be broken. > >>>> > >>>> > >>> Honestly, I had the impression this should be encoded as: %x5C %x74, but > >>> if you're right, wouldn't this be true for other sequences as well? > >>> > >>> > >> I don't think most reasonable clients are going to quote tabs as '\t'. > >> > > That would be a bug, wouldn't it? > > > > Tabs are valid in JavaScript strings and I don't think it's reasonable > to expect that a valid JavaScript string is not a valid JSON string. IMO, we should do what the spec says and what bug free clients expect, what we consider reasonable or unreasonable is a different matter. I would be with you if the spec was proved wrong, specially if reference implementations out there didn't follow it either, but everything I found so far shows this is not the case. Another example: http://www.json.org/json2.js Search for 'character substitutions'.
On 05/20/2010 02:22 PM, Luiz Capitulino wrote: > On Thu, 20 May 2010 13:52:08 -0500 > Anthony Liguori<anthony@codemonkey.ws> wrote: > > >> On 05/20/2010 01:47 PM, Luiz Capitulino wrote: >> >>> On Thu, 20 May 2010 11:55:00 -0500 >>> Anthony Liguori<anthony@codemonkey.ws> wrote: >>> >>> >>> >>>> On 05/20/2010 11:27 AM, Luiz Capitulino wrote: >>>> >>>> >>>>> On Thu, 20 May 2010 10:50:41 -0500 >>>>> Anthony Liguori<anthony@codemonkey.ws> wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> On 05/20/2010 10:16 AM, Paolo Bonzini wrote: >>>>>> >>>>>> >>>>>> >>>>>>> On 05/20/2010 03:44 PM, Luiz Capitulino wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> I think there's another issue in the handling of strings. >>>>>>>> >>>>>>>> The spec says that valid unescaped chars are in the following range: >>>>>>>> >>>>>>>> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in >>>>>> strings. Any parser that didn't accept that would be broken. >>>>>> >>>>>> >>>>>> >>>>> Honestly, I had the impression this should be encoded as: %x5C %x74, but >>>>> if you're right, wouldn't this be true for other sequences as well? >>>>> >>>>> >>>>> >>>> I don't think most reasonable clients are going to quote tabs as '\t'. >>>> >>>> >>> That would be a bug, wouldn't it? >>> >>> >> Tabs are valid in JavaScript strings and I don't think it's reasonable >> to expect that a valid JavaScript string is not a valid JSON string. >> > IMO, we should do what the spec says and what bug free clients expect, > what we consider reasonable or unreasonable is a different matter. > How we encode strings is one thing, what we accept is something else. Why shouldn't we be liberal in what we accept? It doesn't violate the spec to accept more than it requires so why shouldn't we? Regards, Anthony Liguori > I would be with you if the spec was proved wrong, specially if reference > implementations out there didn't follow it either, but everything I found > so far shows this is not the case. > > Another example: > > http://www.json.org/json2.js > > Search for 'character substitutions'. > > >
On Mon, 24 May 2010 14:29:58 -0500 Anthony Liguori <anthony@codemonkey.ws> wrote: > On 05/20/2010 02:22 PM, Luiz Capitulino wrote: > > On Thu, 20 May 2010 13:52:08 -0500 > > Anthony Liguori<anthony@codemonkey.ws> wrote: > > > > > >> On 05/20/2010 01:47 PM, Luiz Capitulino wrote: > >> > >>> On Thu, 20 May 2010 11:55:00 -0500 > >>> Anthony Liguori<anthony@codemonkey.ws> wrote: > >>> > >>> > >>> > >>>> On 05/20/2010 11:27 AM, Luiz Capitulino wrote: > >>>> > >>>> > >>>>> On Thu, 20 May 2010 10:50:41 -0500 > >>>>> Anthony Liguori<anthony@codemonkey.ws> wrote: > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>> On 05/20/2010 10:16 AM, Paolo Bonzini wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>>> On 05/20/2010 03:44 PM, Luiz Capitulino wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> I think there's another issue in the handling of strings. > >>>>>>>> > >>>>>>>> The spec says that valid unescaped chars are in the following range: > >>>>>>>> > >>>>>>>> unescaped = %x20-21 / %x23-5B / %x5D-10FFFF > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>> That's a spec bug IMHO. Tab is %x09. Surely you can include tabs in > >>>>>> strings. Any parser that didn't accept that would be broken. > >>>>>> > >>>>>> > >>>>>> > >>>>> Honestly, I had the impression this should be encoded as: %x5C %x74, but > >>>>> if you're right, wouldn't this be true for other sequences as well? > >>>>> > >>>>> > >>>>> > >>>> I don't think most reasonable clients are going to quote tabs as '\t'. > >>>> > >>>> > >>> That would be a bug, wouldn't it? > >>> > >>> > >> Tabs are valid in JavaScript strings and I don't think it's reasonable > >> to expect that a valid JavaScript string is not a valid JSON string. > >> > > IMO, we should do what the spec says and what bug free clients expect, > > what we consider reasonable or unreasonable is a different matter. > > > > How we encode strings is one thing, what we accept is something else. True. > Why shouldn't we be liberal in what we accept? It doesn't violate the > spec to accept more than it requires so why shouldn't we? For the reasons outlined by Avi, not sure how this serious this is though.
diff --git a/json-lexer.c b/json-lexer.c index 0b145d1..5cc7e6c 100644 --- a/json-lexer.c +++ b/json-lexer.c @@ -97,6 +97,8 @@ static const uint8_t json_lexer[][256] = { ['n'] = IN_DQ_STRING, ['r'] = IN_DQ_STRING, ['t'] = IN_DQ_STRING, + ['/'] = IN_DQ_STRING, + ['\\'] = IN_DQ_STRING, ['\''] = IN_DQ_STRING, ['\"'] = IN_DQ_STRING, ['u'] = IN_DQ_UCODE0, @@ -134,6 +136,8 @@ static const uint8_t json_lexer[][256] = { ['n'] = IN_SQ_STRING, ['r'] = IN_SQ_STRING, ['t'] = IN_SQ_STRING, + ['/'] = IN_DQ_STRING, + ['\\'] = IN_DQ_STRING, ['\''] = IN_SQ_STRING, ['\"'] = IN_SQ_STRING, ['u'] = IN_SQ_UCODE0,
The JSON escape sequence "\/" and "\\" are valid and should be handled. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> --- json-lexer.c | 4 ++++ 1 files changed, 4 insertions(+), 0 deletions(-)