From: Ian Lance Taylor
To: gcc-patches@gcc.gnu.org, gofrontend-dev@googlegroups.com
Subject: [gccgo] Don't permit Unicode surrogate pairs in escape sequences
Date: Tue, 31 Aug 2010 11:33:21 -0700
X-Patchwork-Id: 63312

This patch changes gccgo to not permit Unicode surrogate pairs in
Unicode escape sequences.  Surrogate pairs are only used in UTF-16,
but in Go, escape sequences always generate UTF-8.  Committed to gccgo
branch.

Ian

diff -r 79786d3fc04a go/lex.cc
--- a/go/lex.cc	Tue Aug 31 11:08:59 2010 -0700
+++ b/go/lex.cc	Tue Aug 31 11:30:44 2010 -0700
@@ -1173,9 +1173,16 @@
                     + (hex_value(p[2]) << 8)
                     + (hex_value(p[3]) << 4)
                     + hex_value(p[4]));
+          if (*value >= 0xd800 && *value < 0xe000)
+            {
+              error_at(this->location(),"invalid unicode code point 0x%x",
+                       *value);
+              // Use the replacement character.
+              *value = 0xfffd;
+            }
           return p + 5;
         }
-      this->error("invalid little unicode character");
+      this->error("invalid little unicode code point");
       return p + 1;
 
     case 'U':
@@ -1192,9 +1199,17 @@
                     + (hex_value(p[6]) << 8)
                     + (hex_value(p[7]) << 4)
                     + hex_value(p[8]));
+          if (*value > 0x10ffff
+              || (*value >= 0xd800 && *value < 0xe000))
+            {
+              error_at(this->location(), "invalid unicode code point 0x%x",
+                       *value);
+              // Use the replacement character.
+              *value = 0xfffd;
+            }
           return p + 9;
         }
-      this->error("invalid big unicode character");
+      this->error("invalid big unicode code point");
       return p + 1;
 
     default:
@@ -1231,7 +1246,7 @@
       if (v > 0x10ffff)
         {
           warning_at(location, 0,
-                     "unicode character 0x%x out of range in string", v);
+                     "unicode code point 0x%x out of range in string", v);
           // Turn it into the "replacement character".
           v = 0xfffd;
         }
Index: gcc/testsuite/go.test/test/char_lit.go
===================================================================
--- gcc/testsuite/go.test/test/char_lit.go	(revision 163682)
+++ gcc/testsuite/go.test/test/char_lit.go	(working copy)
@@ -30,15 +30,15 @@ func main() {
 		'\xFE' +
 		'\u0123' +
 		'\ubabe' +
-		'\U0123ABCD' +
-		'\Ucafebabe'
+		'\U0010FFFF' +
+		'\U000ebabe'
 		;
-	if '\Ucafebabe' != 0xcafebabe {
-		print("cafebabe wrong\n");
+	if '\U000ebabe' != 0x000ebabe {
+		print("ebabe wrong\n");
 		os.Exit(1)
 	}
-	if i != 0xcc238de1 {
-		print("number is ", i, " should be ", 0xcc238de1, "\n");
+	if i != 0x20e213 {
+		print("number is ", i, " should be ", 0x20e213, "\n");
 		os.Exit(1)
 	}
 }