Message ID | 20191214080944.GR10088@tucnak |
---|---|
State | New |
Headers | show |
Series | Fix out of bounds array access in the preprocessor (PR preprocessor/92919) | expand |
On 12/14/19 3:09 AM, Jakub Jelinek wrote: > Hi! > > wide_str_to_charconst function relies on the string passed to it > having at least two wide characters, the one we are looking for and > the terminating NUL. The empty wide character literal like > L'' or u'' or U'' is handled earlier and will not reach this function, > but unfortunately for > const char16_t p = u'\U00110003'; > while we do emit an error wide_str_to_charconst is called with a string > that contains just the NUL terminator and nothing else. That is because > U110003 is too large and can't be represented even as a surrogate pair > in char16_t, but the handling of it doesn't give up on the whole string, > because other wide characters could be fine. Say u'a\U00110003' would > be passed to wide_str_to_charconst after diagnosing an error > because the too large char would be thrown away and we'd end up with > u'a'. > The following patch fixes it by just checking for this condition and > punting. I think it is undesirable to print further error. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? OK. > 2019-12-13 Jakub Jelinek <jakub@redhat.com> > > PR preprocessor/92919 > * charset.c (wide_str_to_charconst): If str contains just the > NUL terminator, punt quietly. > > --- libcpp/charset.c.jj 2019-12-10 00:56:07.552291870 +0100 > +++ libcpp/charset.c 2019-12-13 12:23:59.096150225 +0100 > @@ -1970,6 +1970,17 @@ wide_str_to_charconst (cpp_reader *pfile > size_t off, i; > cppchar_t result = 0, c; > > + if (str.len <= nbwc) > + { > + /* Error recovery, if no errors have been diagnosed previously, > + there should be at least two wide characters. Empty literals > + are diagnosed earlier and we can get just the zero terminator > + only if there were errors diagnosed during conversion. */ > + *pchars_seen = 0; > + *unsignedp = 0; > + return 0; > + } > + > /* This is finicky because the string is in the target's byte order, > which may not be our byte order. Only the last character, ignoring > the NUL terminator, is relevant. */ > > Jakub >
--- libcpp/charset.c.jj 2019-12-10 00:56:07.552291870 +0100 +++ libcpp/charset.c 2019-12-13 12:23:59.096150225 +0100 @@ -1970,6 +1970,17 @@ wide_str_to_charconst (cpp_reader *pfile size_t off, i; cppchar_t result = 0, c; + if (str.len <= nbwc) + { + /* Error recovery, if no errors have been diagnosed previously, + there should be at least two wide characters. Empty literals + are diagnosed earlier and we can get just the zero terminator + only if there were errors diagnosed during conversion. */ + *pchars_seen = 0; + *unsignedp = 0; + return 0; + } + /* This is finicky because the string is in the target's byte order, which may not be our byte order. Only the last character, ignoring the NUL terminator, is relevant. */