Fix out of bounds array access in the preprocessor (PR preprocessor/92919)
diff mbox series

Message ID 20191214080944.GR10088@tucnak
State New
Headers show
Series
  • Fix out of bounds array access in the preprocessor (PR preprocessor/92919)
Related show

Commit Message

Jakub Jelinek Dec. 14, 2019, 8:09 a.m. UTC
Hi!

wide_str_to_charconst function relies on the string passed to it
having at least two wide characters, the one we are looking for and
the terminating NUL.  The empty wide character literal like
L'' or u'' or U'' is handled earlier and will not reach this function,
but unfortunately for
const char16_t p = u'\U00110003';
while we do emit an error wide_str_to_charconst is called with a string
that contains just the NUL terminator and nothing else.  That is because
U110003 is too large and can't be represented even as a surrogate pair
in char16_t, but the handling of it doesn't give up on the whole string,
because other wide characters could be fine.  Say u'a\U00110003' would
be passed to wide_str_to_charconst after diagnosing an error 
because the too large char would be thrown away and we'd end up with
u'a'.
The following patch fixes it by just checking for this condition and
punting.  I think it is undesirable to print further error.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-12-13  Jakub Jelinek  <jakub@redhat.com>

	PR preprocessor/92919
	* charset.c (wide_str_to_charconst): If str contains just the
	NUL terminator, punt quietly.


	Jakub

Comments

Jason Merrill Dec. 14, 2019, 3:15 p.m. UTC | #1
On 12/14/19 3:09 AM, Jakub Jelinek wrote:
> Hi!
> 
> wide_str_to_charconst function relies on the string passed to it
> having at least two wide characters, the one we are looking for and
> the terminating NUL.  The empty wide character literal like
> L'' or u'' or U'' is handled earlier and will not reach this function,
> but unfortunately for
> const char16_t p = u'\U00110003';
> while we do emit an error wide_str_to_charconst is called with a string
> that contains just the NUL terminator and nothing else.  That is because
> U110003 is too large and can't be represented even as a surrogate pair
> in char16_t, but the handling of it doesn't give up on the whole string,
> because other wide characters could be fine.  Say u'a\U00110003' would
> be passed to wide_str_to_charconst after diagnosing an error
> because the too large char would be thrown away and we'd end up with
> u'a'.
> The following patch fixes it by just checking for this condition and
> punting.  I think it is undesirable to print further error.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2019-12-13  Jakub Jelinek  <jakub@redhat.com>
> 
> 	PR preprocessor/92919
> 	* charset.c (wide_str_to_charconst): If str contains just the
> 	NUL terminator, punt quietly.
> 
> --- libcpp/charset.c.jj	2019-12-10 00:56:07.552291870 +0100
> +++ libcpp/charset.c	2019-12-13 12:23:59.096150225 +0100
> @@ -1970,6 +1970,17 @@ wide_str_to_charconst (cpp_reader *pfile
>     size_t off, i;
>     cppchar_t result = 0, c;
>   
> +  if (str.len <= nbwc)
> +    {
> +      /* Error recovery, if no errors have been diagnosed previously,
> +	 there should be at least two wide characters.  Empty literals
> +	 are diagnosed earlier and we can get just the zero terminator
> +	 only if there were errors diagnosed during conversion.  */
> +      *pchars_seen = 0;
> +      *unsignedp = 0;
> +      return 0;
> +    }
> +
>     /* This is finicky because the string is in the target's byte order,
>        which may not be our byte order.  Only the last character, ignoring
>        the NUL terminator, is relevant.  */
> 
> 	Jakub
>

Patch
diff mbox series

--- libcpp/charset.c.jj	2019-12-10 00:56:07.552291870 +0100
+++ libcpp/charset.c	2019-12-13 12:23:59.096150225 +0100
@@ -1970,6 +1970,17 @@  wide_str_to_charconst (cpp_reader *pfile
   size_t off, i;
   cppchar_t result = 0, c;
 
+  if (str.len <= nbwc)
+    {
+      /* Error recovery, if no errors have been diagnosed previously,
+	 there should be at least two wide characters.  Empty literals
+	 are diagnosed earlier and we can get just the zero terminator
+	 only if there were errors diagnosed during conversion.  */
+      *pchars_seen = 0;
+      *unsignedp = 0;
+      return 0;
+    }
+
   /* This is finicky because the string is in the target's byte order,
      which may not be our byte order.  Only the last character, ignoring
      the NUL terminator, is relevant.  */