Go patch committed: Reject surrogate pairs converting int to string

Submitter Ian Taylor
Date Sept. 22, 2012, 6:52 a.m.
Message ID <mcrk3vmo8gq.fsf@google.com>
Permalink /patch/186099/
State New

Comments

Ian Taylor - Sept. 22, 2012, 6:52 a.m.
This patch to the Go frontend and libgo rejects surrogate code points
(0xd800 through 0xdfff) when converting an int to a string, turning
them into the replacement character 0xfffd.  Surrogates cannot be
encoded as valid UTF-8.  The patch also makes the runtime reject a
negative int; the compiler already rejected negative ints, but the
runtime did not.  Bootstrapped and ran Go testsuite on
x86_64-unknown-linux-gnu.  Committed to mainline and 4.7 branch.

Ian
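
The behaviour described above can be sketched as a standalone C function.
This is only an illustration under stated assumptions, not the libgo code:
the name encode_code_point, its signature, and the caller-supplied buffer
are hypothetical, and the three checks (negative, above 0x10ffff,
surrogate) are folded into one test here rather than placed where the
patch puts them.

```c
#include <stddef.h>

/* Hypothetical helper, not the libgo function: encode V as UTF-8 into
   BUF, which must hold at least 4 bytes, and return the byte count.  */
size_t
encode_code_point (long v, unsigned char *buf)
{
  /* Negative values, values above 0x10ffff, and surrogates
     (0xd800 through 0xdfff) all become the replacement character.  */
  if (v < 0 || v > 0x10ffff || (v >= 0xd800 && v < 0xe000))
    v = 0xfffd;

  if (v <= 0x7f)
    {
      /* One byte: 0xxxxxxx.  */
      buf[0] = v;
      return 1;
    }
  if (v <= 0x7ff)
    {
      /* Two bytes: 110xxxxx 10xxxxxx.  */
      buf[0] = 0xc0 + (v >> 6);
      buf[1] = 0x80 + (v & 0x3f);
      return 2;
    }
  if (v <= 0xffff)
    {
      /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.  */
      buf[0] = 0xe0 + (v >> 12);
      buf[1] = 0x80 + ((v >> 6) & 0x3f);
      buf[2] = 0x80 + (v & 0x3f);
      return 3;
    }
  /* Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.  */
  buf[0] = 0xf0 + (v >> 18);
  buf[1] = 0x80 + ((v >> 12) & 0x3f);
  buf[2] = 0x80 + ((v >> 6) & 0x3f);
  buf[3] = 0x80 + (v & 0x3f);
  return 4;
}
```

With this sketch, both 0xd800 and -1 produce the three bytes 0xef 0xbf
0xbd, which is the UTF-8 encoding of 0xfffd.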

Patch

diff -r f16ad4ccc868 go/lex.cc
--- a/go/lex.cc	Fri Sep 21 23:32:36 2012 -0700
+++ b/go/lex.cc	Fri Sep 21 23:42:31 2012 -0700
@@ -1312,6 +1312,12 @@ 
 	  // Turn it into the "replacement character".
 	  v = 0xfffd;
 	}
+      if (v >= 0xd800 && v < 0xe000)
+	{
+	  warning_at(location, 0,
+		     "unicode code point 0x%x is invalid surrogate pair", v);
+	  v = 0xfffd;
+	}
       if (v <= 0xffff)
 	{
 	  buf[0] = 0xe0 + (v >> 12);
diff -r f16ad4ccc868 libgo/runtime/go-int-to-string.c
--- a/libgo/runtime/go-int-to-string.c	Fri Sep 21 23:32:36 2012 -0700
+++ b/libgo/runtime/go-int-to-string.c	Fri Sep 21 23:42:31 2012 -0700
@@ -17,6 +17,11 @@ 
   unsigned char *retdata;
   struct __go_string ret;
 
+  /* A negative value is not valid UTF-8; turn it into the replacement
+     character.  */
+  if (v < 0)
+    v = 0xfffd;
+
   if (v <= 0x7f)
     {
       buf[0] = v;
@@ -34,6 +39,10 @@ 
 	 "replacement character".  */
       if (v > 0x10ffff)
 	v = 0xfffd;
+      /* If the value is a surrogate pair, which is invalid in UTF-8,
+	 turn it into the replacement character.  */
+      if (v >= 0xd800 && v < 0xe000)
+	v = 0xfffd;
 
       if (v <= 0xffff)
 	{