From patchwork Tue Aug 31 18:33:21 2010
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ian Lance Taylor <iant@google.com>
X-Patchwork-Id: 63312
From: Ian Lance Taylor <iant@google.com>
To: gcc-patches@gcc.gnu.org, gofrontend-dev@googlegroups.com
Subject: [gccgo] Don't permit Unicode surrogate pairs in escape sequences
Date: Tue, 31 Aug 2010 11:33:21 -0700
Message-ID: <mcroccieioe.fsf@google.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1 (gnu/linux)

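This patch changes gccgo to reject Unicode surrogate code points
(0xd800 through 0xdfff) in \u and \U escape sequences, and to reject \U
values above 0x10ffff.  Surrogates are only used in UTF-16, but in Go
escape sequences always generate UTF-8.  An invalid value is reported
as an error and replaced with the replacement character 0xfffd.
Committed to gccgo branch.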

Ian
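
For readers who want the range check in isolation, here is a minimal
sketch of the rule the patch enforces.  The helper names
is_valid_code_point and sanitize_code_point are hypothetical and do not
appear in go/lex.cc; only the numeric bounds and the 0xfffd fallback
are taken from the diff.

```cpp
#include <cstdint>

// Hypothetical helper (not part of go/lex.cc): true when v is at most
// 0x10ffff and does not fall in the surrogate range 0xd800..0xdfff.
static bool
is_valid_code_point(uint32_t v)
{
  return v <= 0x10ffff && !(v >= 0xd800 && v < 0xe000);
}

// Hypothetical helper: mirror the patch's recovery step by substituting
// the replacement character 0xfffd for an invalid value.
static uint32_t
sanitize_code_point(uint32_t v)
{
  return is_valid_code_point(v) ? v : 0xfffd;
}
```

Under these assumptions, sanitize_code_point(0xd800) yields 0xfffd,
while sanitize_code_point(0xd7ff) and sanitize_code_point(0x10ffff)
return their argument unchanged.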

diff -r 79786d3fc04a go/lex.cc
--- a/go/lex.cc	Tue Aug 31 11:08:59 2010 -0700
+++ b/go/lex.cc	Tue Aug 31 11:30:44 2010 -0700
@@ -1173,9 +1173,16 @@
 			+ (hex_value(p[2]) << 8)
 			+ (hex_value(p[3]) << 4)
 			+ hex_value(p[4]));
+	      if (*value >= 0xd800 && *value < 0xe000)
+		{
+		  error_at(this->location(),"invalid unicode code point 0x%x",
+			   *value);
+		  // Use the replacement character.
+		  *value = 0xfffd;
+		}
 	      return p + 5;
 	    }
-	  this->error("invalid little unicode character");
+	  this->error("invalid little unicode code point");
 	  return p + 1;
 
 	case 'U':
@@ -1192,9 +1199,17 @@
 			+ (hex_value(p[6]) << 8)
 			+ (hex_value(p[7]) << 4)
 			+ hex_value(p[8]));
+	      if (*value > 0x10ffff
+		  || (*value >= 0xd800 && *value < 0xe000))
+		{
+		  error_at(this->location(), "invalid unicode code point 0x%x",
+			   *value);
+		  // Use the replacement character.
+		  *value = 0xfffd;
+		}
 	      return p + 9;
 	    }
-	  this->error("invalid big unicode character");
+	  this->error("invalid big unicode code point");
 	  return p + 1;
 
 	default:
@@ -1231,7 +1246,7 @@
       if (v > 0x10ffff)
 	{
 	  warning_at(location, 0,
-		     "unicode character 0x%x out of range in string", v);
+		     "unicode code point 0x%x out of range in string", v);
 	  // Turn it into the "replacement character".
 	  v = 0xfffd;
 	}
Index: gcc/testsuite/go.test/test/char_lit.go
===================================================================
--- gcc/testsuite/go.test/test/char_lit.go	(revision 163682)
+++ gcc/testsuite/go.test/test/char_lit.go	(working copy)
@@ -30,15 +30,15 @@ func main() {
 		'\xFE' +
 		'\u0123' +
 		'\ubabe' +
-		'\U0123ABCD' +
-		'\Ucafebabe'
+		'\U0010FFFF' +
+		'\U000ebabe'
 		;
-	if '\Ucafebabe' != 0xcafebabe {
-		print("cafebabe wrong\n");
+	if '\U000ebabe' != 0x000ebabe {
+		print("ebabe wrong\n");
 		os.Exit(1)
 	}
-	if i != 0xcc238de1 {
-		print("number is ", i, " should be ", 0xcc238de1, "\n");
+	if i != 0x20e213 {
+		print("number is ", i, " should be ", 0x20e213, "\n");
 		os.Exit(1)
 		}
 }
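
The new expected total in char_lit.go can be cross-checked from the
hunk alone.  Only two literals in the sum changed, so the contribution
of the unchanged literals must be the same whether it is derived from
the old total or the new one.  The sketch below does that arithmetic;
the constant names are hypothetical, and the unchanged literals
themselves are not shown in the hunk, so only their combined
contribution is computed.

```cpp
#include <cstdint>

// Sum of the two literals the patch removed from the test.
constexpr uint64_t old_terms = 0x0123ABCDull + 0xcafebabeull;
// Sum of the two literals the patch added in their place.
constexpr uint64_t new_terms = 0x0010FFFFull + 0x000ebabeull;

// Contribution of the unchanged literals, derived from each total.
constexpr uint64_t rest_from_old = 0xcc238de1ull - old_terms;
constexpr uint64_t rest_from_new = 0x20e213ull - new_terms;

// Both derivations must agree if the new total is consistent.
static_assert(rest_from_old == rest_from_new,
	      "old and new totals disagree about the unchanged terms");
```

Both derivations give 0x12756, so the replacement total 0x20e213 is
consistent with the two changed literals.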