Message ID | 1299877249-13433-10-git-send-email-aliguori@us.ibm.com |
---|---|
State | New |
Headers | show |
On Fri, 11 Mar 2011 15:00:47 -0600 Anthony Liguori <aliguori@us.ibm.com> wrote: > This is a security consideration. We don't want a client to cause an arbitrary > amount of memory to be allocated in QEMU. For now, we use a limit of 64MB > which should be large enough for any reasonably sized token. > > This is important for parsing JSON from untrusted sources. > > Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> > > diff --git a/json-lexer.c b/json-lexer.c > index 834d7af..3462c89 100644 > --- a/json-lexer.c > +++ b/json-lexer.c > @@ -18,6 +18,8 @@ > #include "qemu-common.h" > #include "json-lexer.h" > > +#define MAX_TOKEN_SIZE (64ULL << 20) > + > /* > * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\" > * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*' > @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch) > } > lexer->state = new_state; > } while (!char_consumed); > + > + /* Do not let a single token grow to an arbitrarily large size, > + * this is a security consideration. > + */ > + if (lexer->token->length > MAX_TOKEN_SIZE) { > + lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y); > + QDECREF(lexer->token); > + lexer->token = qstring_new(); > + lexer->state = IN_START; > + } Entering an invalid token is an error, we should fail here. Which brings two features: 1. A test code could trigger this condition and check for the specific error code 2. Developers will know when they hit the limit. Although I don't expect expect this to happen, there was talking about adding base64 support to transfer something (I can't remember what, but we never know how the protocol will evolve). Also, by testing this I found that the parser seems to get confused when the limit is reached: it stops responding. > + > return 0; > } >
On 03/14/2011 02:25 PM, Luiz Capitulino wrote: > On Fri, 11 Mar 2011 15:00:47 -0600 > Anthony Liguori<aliguori@us.ibm.com> wrote: > >> This is a security consideration. We don't want a client to cause an arbitrary >> amount of memory to be allocated in QEMU. For now, we use a limit of 64MB >> which should be large enough for any reasonably sized token. >> >> This is important for parsing JSON from untrusted sources. >> >> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com> >> >> diff --git a/json-lexer.c b/json-lexer.c >> index 834d7af..3462c89 100644 >> --- a/json-lexer.c >> +++ b/json-lexer.c >> @@ -18,6 +18,8 @@ >> #include "qemu-common.h" >> #include "json-lexer.h" >> >> +#define MAX_TOKEN_SIZE (64ULL<< 20) >> + >> /* >> * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\" >> * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*' >> @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch) >> } >> lexer->state = new_state; >> } while (!char_consumed); >> + >> + /* Do not let a single token grow to an arbitrarily large size, >> + * this is a security consideration. >> + */ >> + if (lexer->token->length> MAX_TOKEN_SIZE) { >> + lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y); >> + QDECREF(lexer->token); >> + lexer->token = qstring_new(); >> + lexer->state = IN_START; >> + } > Entering an invalid token is an error, we should fail here. It's not so clear to me. I think of it like GCC. GCC doesn't bail out on the first invalid character the lexer encounters. Instead, it records that an error has occurred and tries its best to recover. The result is that instead of getting an error message about the first error in your code, you'll get a long listing of all the mistakes you made (usually). One thing that makes this more difficult in our case is that when you're testing, we don't have a clear EOI to flush things out. So a bad sequence of inputs might make the message parser wait for an a character that you're not necessarily inputting which makes the session appear hung. Usually, if you throw a couple extra brackets around, you'll get back to valid input. > Which brings > two features: > > 1. A test code could trigger this condition and check for the specific > error code > > 2. Developers will know when they hit the limit. Although I don't expect > expect this to happen, there was talking about adding base64 support > to transfer something (I can't remember what, but we never know how the > protocol will evolve). > > Also, by testing this I found that the parser seems to get confused when > the limit is reached: it stops responding. Actually, it does respond. The lexer just takes an incredibly long time to process a large token because qstring_append_ch is incredibly slow :-) If you drop the token size down to 64k instead of 64mb, it'll seem a lot more reasonable. Regards, Anthony Liguori >> + >> return 0; >> } >> >
On Mon, 14 Mar 2011 15:18:57 -0500 Anthony Liguori <anthony@codemonkey.ws> wrote: > On 03/14/2011 02:25 PM, Luiz Capitulino wrote: > > On Fri, 11 Mar 2011 15:00:47 -0600 > > Anthony Liguori<aliguori@us.ibm.com> wrote: > > > >> This is a security consideration. We don't want a client to cause an arbitrary > >> amount of memory to be allocated in QEMU. For now, we use a limit of 64MB > >> which should be large enough for any reasonably sized token. > >> > >> This is important for parsing JSON from untrusted sources. > >> > >> Signed-off-by: Anthony Liguori<aliguori@us.ibm.com> > >> > >> diff --git a/json-lexer.c b/json-lexer.c > >> index 834d7af..3462c89 100644 > >> --- a/json-lexer.c > >> +++ b/json-lexer.c > >> @@ -18,6 +18,8 @@ > >> #include "qemu-common.h" > >> #include "json-lexer.h" > >> > >> +#define MAX_TOKEN_SIZE (64ULL<< 20) > >> + > >> /* > >> * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\" > >> * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*' > >> @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch) > >> } > >> lexer->state = new_state; > >> } while (!char_consumed); > >> + > >> + /* Do not let a single token grow to an arbitrarily large size, > >> + * this is a security consideration. > >> + */ > >> + if (lexer->token->length> MAX_TOKEN_SIZE) { > >> + lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y); > >> + QDECREF(lexer->token); > >> + lexer->token = qstring_new(); > >> + lexer->state = IN_START; > >> + } > > Entering an invalid token is an error, we should fail here. > > It's not so clear to me. > > I think of it like GCC. GCC doesn't bail out on the first invalid > character the lexer encounters. Instead, it records that an error has > occurred and tries its best to recover. > > The result is that instead of getting an error message about the first > error in your code, you'll get a long listing of all the mistakes you > made (usually). I'm not a compiler expert, but I think compilers do this because it would be very problematic from a usability perspective to report one error at time: by reporting most errors per run, the compiler gives the user to fix as much as possible programming mistakes w/o having to re-run. Our target are not humans, though. > One thing that makes this more difficult in our case is that when you're > testing, we don't have a clear EOI to flush things out. So a bad > sequence of inputs might make the message parser wait for an a character > that you're not necessarily inputting which makes the session appear > hung. Usually, if you throw a couple extra brackets around, you'll get > back to valid input. Not sure if this is easy to do, but shouldn't we fail if we don't get the expected character? > > Which brings > > two features: > > > > 1. A test code could trigger this condition and check for the specific > > error code > > > > 2. Developers will know when they hit the limit. Although I don't expect > > expect this to happen, there was talking about adding base64 support > > to transfer something (I can't remember what, but we never know how the > > protocol will evolve). > > > > Also, by testing this I found that the parser seems to get confused when > > the limit is reached: it stops responding. > > Actually, it does respond. The lexer just takes an incredibly long time > to process a large token because qstring_append_ch is incredibly slow > :-) If you drop the token size down to 64k instead of 64mb, it'll seem > a lot more reasonable. Actually, I dropped it down to 20 :) This is supposed to work like your 64k suggestion, right? > > Regards, > > Anthony Liguori > > >> + > >> return 0; > >> } > >> > > >
diff --git a/json-lexer.c b/json-lexer.c index 834d7af..3462c89 100644 --- a/json-lexer.c +++ b/json-lexer.c @@ -18,6 +18,8 @@ #include "qemu-common.h" #include "json-lexer.h" +#define MAX_TOKEN_SIZE (64ULL << 20) + /* * \"([^\\\"]|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*\" * '([^\\']|(\\\"\\'\\\\\\/\\b\\f\\n\\r\\t\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F]))*' @@ -312,6 +314,17 @@ static int json_lexer_feed_char(JSONLexer *lexer, char ch) } lexer->state = new_state; } while (!char_consumed); + + /* Do not let a single token grow to an arbitrarily large size, + * this is a security consideration. + */ + if (lexer->token->length > MAX_TOKEN_SIZE) { + lexer->emit(lexer, lexer->token, lexer->state, lexer->x, lexer->y); + QDECREF(lexer->token); + lexer->token = qstring_new(); + lexer->state = IN_START; + } + return 0; }
This is a security consideration. We don't want a client to cause an arbitrary amount of memory to be allocated in QEMU. For now, we use a limit of 64MB which should be large enough for any reasonably sized token. This is important for parsing JSON from untrusted sources. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>