From patchwork Fri Aug 5 20:11:33 2016
X-Patchwork-Submitter: Ian Lance Taylor
X-Patchwork-Id: 656291
From: Ian Lance Taylor
Date: Fri, 5 Aug 2016 13:11:33 -0700
Subject: Go patch committed: Avoid non-ASCII characters in asm identifiers
To: gcc-patches , "gofrontend-dev@googlegroups.com"

PR 72812 points out that Go can generate non-ASCII characters in
assembly code.  This happens because Go permits identifiers to contain
non-ASCII Unicode code points.  The GNU assembler doesn't seem to
mind, but the Solaris assembler does.  This patch changes the GCC Go
interface to encode non-ASCII characters (and any other character
outside the set [A-Za-z0-9_.$/]) in identifier names by setting
DECL_ASSEMBLER_NAME to an encoded value.
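To make the scheme concrete, here is a minimal standalone sketch of the idea, assuming well-formed UTF-8 input.  It follows the rules in the patch below: a name made only of ASCII letters, digits, '_', '.', '$' and '/' is used unchanged, and otherwise every other code point is written as "$U", its lowercase hex value, and a closing "$", while the original name is kept for debug info.  The Decl struct and the is_safe, encode and make_decl functions are hypothetical stand-ins for illustration only; they are not GCC or gofrontend APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical model of a declaration (not a GCC type): NAME plays the
// role of DECL_NAME, which debug info keeps, and ASM_NAME the role of
// DECL_ASSEMBLER_NAME, the symbol written to the assembly file.
struct Decl
{
  std::string name;
  std::string asm_name;
};

// The same assembler-safe byte set that the patch's char_needs_encoding
// accepts.
static bool
is_safe(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
         || (c >= '0' && c <= '9')
         || c == '_' || c == '.' || c == '$' || c == '/';
}

// Rewrite every code point outside the safe set as "$U<hex>$".
// Assumes S is well-formed UTF-8.
static std::string
encode(const std::string& s)
{
  std::string out;
  std::size_t i = 0;
  while (i < s.size())
    {
      unsigned char b = static_cast<unsigned char>(s[i]);
      // Sequence length from the lead byte: 1 for ASCII, else 2 to 4.
      std::size_t len = b < 0x80 ? 1 : b >= 0xf0 ? 4 : b >= 0xe0 ? 3 : 2;
      assert(i + len <= s.size() && "truncated UTF-8 sequence");
      if (len == 1 && is_safe(b))
        out += static_cast<char>(b);
      else
        {
          // For ASCII the code point is the byte itself; otherwise the
          // lead byte keeps its low 7 - len bits and each continuation
          // byte adds 6 more.
          unsigned int cp = len == 1 ? b : b & (0x7fu >> len);
          for (std::size_t k = 1; k < len && i + k < s.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3fu);
          char buf[16];
          std::snprintf(buf, sizeof buf, "$U%x$", cp);
          out += buf;
        }
      i += len;
    }
  return out;
}

// Keep the source name, and only set an encoded assembler name when some
// byte is unsafe, mirroring the needs_encoding/encode_id pairing.
static Decl
make_decl(const std::string& name)
{
  Decl d{name, name};
  for (unsigned char c : name)
    if (!is_safe(c))
      {
        d.asm_name = encode(name);
        break;
      }
  return d;
}
```

With these definitions, make_decl("main.h\xc3\xa9llo") (that is, main.héllo) keeps the name unchanged for debug info and gets the assembler name main.h$Ue9$llo, while a plain name such as main.foo is left alone.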
Because only DECL_ASSEMBLER_NAME changes, this fixes the problem with
the Solaris assembler without disturbing the debug info.

Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Ran reflect
package tests on Solaris and confirmed that they now pass.  Committed
to mainline.

Ian

2016-08-05  Ian Lance Taylor

	PR go/72812
	* go-gcc.cc (char_needs_encoding): New static function.
	(needs_encoding, fetch_utf8_char): New static functions.
	(encode_id): New static function.
	(Gcc_backend::global_variable): Set asm name if the name is not
	simple ASCII.
	(Gcc_backend::implicit_variable): Likewise.
	(Gcc_backend::implicit_variable_reference): Likewise.
	(Gcc_backend::immutable_struct): Likewise.
	(Gcc_backend::immutable_struct_reference): Likewise.
	(Gcc_backend::function): Likewise.

Index: gcc/go/go-gcc.cc
===================================================================
--- gcc/go/go-gcc.cc	(revision 238653)
+++ gcc/go/go-gcc.cc	(working copy)
@@ -541,7 +541,7 @@ private:
   std::map builtin_functions_;
 };
 
-// A helper function.
+// A helper function to create a GCC identifier from a C++ string.
 
 static inline tree
 get_identifier_from_string(const std::string& str)
@@ -549,6 +549,102 @@ get_identifier_from_string(const std::st
   return get_identifier_with_length(str.data(), str.length());
 }
 
+// Return whether the character c is not OK to use in the assembler.
+
+static bool
+char_needs_encoding(char c)
+{
+  switch (c)
+    {
+    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
+    case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
+    case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
+    case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
+    case 'Y': case 'Z':
+    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
+    case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
+    case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
+    case 's': case 't': case 'u': case 'v': case 'w': case 'x':
+    case 'y': case 'z':
+    case '0': case '1': case '2': case '3': case '4':
+    case '5': case '6': case '7': case '8': case '9':
+    case '_': case '.': case '$': case '/':
+      return false;
+    default:
+      return true;
+    }
+}
+
+// Return whether the identifier needs to be translated because it
+// contains non-ASCII characters.
+
+static bool
+needs_encoding(const std::string& str)
+{
+  for (std::string::const_iterator p = str.begin();
+       p != str.end();
+       ++p)
+    if (char_needs_encoding(*p))
+      return true;
+  return false;
+}
+
+// Pull the next UTF-8 character out of P and store it in *PC.  Return
+// the number of bytes read.
+
+static size_t
+fetch_utf8_char(const char* p, unsigned int* pc)
+{
+  unsigned char c = *p;
+  if ((c & 0x80) == 0)
+    {
+      *pc = c;
+      return 1;
+    }
+  size_t len = 0;
+  while ((c & 0x80) != 0)
+    {
+      ++len;
+      c <<= 1;
+    }
+  unsigned int rc = *p & ((1 << (7 - len)) - 1);
+  for (size_t i = 1; i < len; i++)
+    {
+      unsigned int u = p[i];
+      rc <<= 6;
+      rc |= u & 0x3f;
+    }
+  *pc = rc;
+  return len;
+}
+
+// Encode an identifier using ASCII characters.
+
+static std::string
+encode_id(const std::string id)
+{
+  std::string ret;
+  const char* p = id.c_str();
+  const char* pend = p + id.length();
+  while (p < pend)
+    {
+      unsigned int c;
+      size_t len = fetch_utf8_char(p, &c);
+      if (len == 1 && !char_needs_encoding(c))
+	ret += c;
+      else
+	{
+	  ret += "$U";
+	  char buf[30];
+	  snprintf(buf, sizeof buf, "%x", c);
+	  ret += buf;
+	  ret += "$";
+	}
+      p += len;
+    }
+  return ret;
+}
+
 // Define the built-in functions that are exposed to GCCGo.
 
 Gcc_backend::Gcc_backend()
@@ -2454,8 +2550,14 @@ Gcc_backend::global_variable(const std::
       std::string asm_name(pkgpath);
       asm_name.push_back('.');
       asm_name.append(name);
+      if (needs_encoding(asm_name))
+	asm_name = encode_id(asm_name);
       SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(asm_name));
     }
+  else if (needs_encoding(var_name))
+    SET_DECL_ASSEMBLER_NAME(decl,
+			    get_identifier_from_string(encode_id(var_name)));
+
   TREE_USED(decl) = 1;
 
   if (in_unique_section)
@@ -2690,6 +2792,8 @@ Gcc_backend::implicit_variable(const std
       SET_DECL_ALIGN(decl, alignment * BITS_PER_UNIT);
       DECL_USER_ALIGN(decl) = 1;
     }
+  if (needs_encoding(name))
+    SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(encode_id(name)));
 
   go_preserve_from_gc(decl);
   return new Bvariable(decl);
@@ -2742,6 +2846,8 @@ Gcc_backend::implicit_variable_reference
   TREE_PUBLIC(decl) = 1;
   TREE_STATIC(decl) = 1;
   DECL_ARTIFICIAL(decl) = 1;
+  if (needs_encoding(name))
+    SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(encode_id(name)));
   go_preserve_from_gc(decl);
   return new Bvariable(decl);
 }
@@ -2766,6 +2872,8 @@ Gcc_backend::immutable_struct(const std:
   DECL_ARTIFICIAL(decl) = 1;
   if (!is_hidden)
     TREE_PUBLIC(decl) = 1;
+  if (needs_encoding(name))
+    SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(encode_id(name)));
 
   // When the initializer for one immutable_struct refers to another,
   // it needs to know the visibility of the referenced struct so that
@@ -2840,6 +2948,8 @@ Gcc_backend::immutable_struct_reference(
   DECL_ARTIFICIAL(decl) = 1;
   TREE_PUBLIC(decl) = 1;
   DECL_EXTERNAL(decl) = 1;
+  if (needs_encoding(name))
+    SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(encode_id(name)));
   go_preserve_from_gc(decl);
   return new Bvariable(decl);
 }
@@ -2931,6 +3041,8 @@ Gcc_backend::function(Btype* fntype, con
   tree decl = build_decl(location.gcc_location(), FUNCTION_DECL, id, functype);
   if (!asm_name.empty())
     SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(asm_name));
+  else if (needs_encoding(name))
+    SET_DECL_ASSEMBLER_NAME(decl, get_identifier_from_string(encode_id(name)));
   if (is_visible)
     TREE_PUBLIC(decl) = 1;
   if (is_declaration)
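The bit arithmetic in fetch_utf8_char follows the UTF-8 layout: the number of leading 1 bits in the first byte is the sequence length, the first byte contributes its low 7 - len bits (the mask (1 << (7 - len)) - 1), and each continuation byte contributes its low 6 bits.  The standalone restatement below (decode_utf8 is a hypothetical name, not part of the patch) makes the unsigned char conversions explicit.  As an extra check that is not in the patch, it asserts that the length is at most 4: for a lead byte of 0xFF the loop counts 8 one bits, and 7 - len would no longer be a valid shift count.  Like the patch, it assumes the whole sequence is present in the buffer.

```cpp
#include <cassert>
#include <cstddef>

// Decode one UTF-8 sequence starting at P, using the same bit layout as
// the patch's fetch_utf8_char.  Stores the code point in *PC and returns
// the number of bytes consumed.  Assumes the sequence is complete.
static std::size_t
decode_utf8(const char* p, unsigned int* pc)
{
  unsigned char c = static_cast<unsigned char>(*p);
  if ((c & 0x80) == 0)
    {
      *pc = c;  // Plain ASCII: one byte.
      return 1;
    }
  // Count the leading 1 bits: 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
  std::size_t len = 0;
  while ((c & 0x80) != 0)
    {
      ++len;
      c = static_cast<unsigned char>(c << 1);
    }
  // Sketch-only guard: longer sequences are not valid UTF-8.
  assert(len <= 4);
  // The lead byte contributes its low (7 - len) bits...
  unsigned int rc = static_cast<unsigned char>(*p) & ((1u << (7 - len)) - 1);
  // ...and each continuation byte (10xxxxxx) contributes 6 more.
  for (std::size_t i = 1; i < len; ++i)
    rc = (rc << 6) | (static_cast<unsigned char>(p[i]) & 0x3fu);
  *pc = rc;
  return len;
}
```

For example, decoding the bytes C3 A9 consumes 2 bytes and yields the code point 0xE9 (é), which encode_id then writes as $Ue9$ because len is not 1.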