From patchwork Wed Aug 14 17:22:38 2019
X-Patchwork-Submitter: Eduard-Mihai Burtescu
X-Patchwork-Id: 1147139
Message-Id: <98de4121-72c8-4a87-bf44-ab917f6c2055@www.fastmail.com>
Date: Wed, 14 Aug 2019 20:22:38 +0300
From: "Eduard-Mihai Burtescu"
To: gcc-patches@gcc.gnu.org
Cc: ian@airs.com, iant@google.com
Subject: [PATCH] Simplify and generalize rust-demangle's unescaping logic.

Previously, rust-demangle.c was special-casing a fixed number of
'$uXY$' escapes, but 'XY' can technically be any hex value,
representing some Unicode codepoint.
This patch adds more general support for '$u...$' escapes, similar to
https://github.com/alexcrichton/rustc-demangle/pull/29, but only for
the ASCII subset.
More complete Unicode support may come at a later time, but right now
I want to keep it simple.

Escapes that decode to ASCII control codes are considered invalid,
as the Rust compiler should never emit them, and to avoid any
undesirable effects from accidentally outputting a control code.

Additionally, the switch statements, which had one case for each
alphanumeric character, were replaced with if-else chains.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

2019-08-14  Eduard-Mihai Burtescu

libiberty/ChangeLog:
        * rust-demangle.c (unescape): Remove.
        (parse_lower_hex_nibble): New function.
        (parse_legacy_escape): New function.
        (is_prefixed_hash): Use parse_lower_hex_nibble.
        (looks_like_rust): Use parse_legacy_escape.
        (rust_demangle_sym): Use parse_legacy_escape.
        * testsuite/rust-demangle-expected: Add 'llv$u6d$' test.

diff --git a/libiberty/rust-demangle.c b/libiberty/rust-demangle.c
index 2302db45b6f..da591902db1 100644
--- a/libiberty/rust-demangle.c
+++ b/libiberty/rust-demangle.c
@@ -50,7 +50,7 @@ extern void *memset(void *s, int c, size_t n);
 #include "rust-demangle.h"
 
 
-/* Mangled Rust symbols look like this:
+/* Mangled (legacy) Rust symbols look like this:
      _$LT$std..sys..fd..FileDesc$u20$as$u20$core..ops..Drop$GT$::drop::hc68340e1baa4987a
 
    The original symbol is:
@@ -74,16 +74,7 @@ extern void *memset(void *s, int c, size_t n);
      ">"  =>  $GT$
      "("  =>  $LP$
      ")"  =>  $RP$
-     " "  =>  $u20$
-     "\"" =>  $u22$
-     "'"  =>  $u27$
-     "+"  =>  $u2b$
-     ";"  =>  $u3b$
-     "["  =>  $u5b$
-     "]"  =>  $u5d$
-     "{"  =>  $u7b$
-     "}"  =>  $u7d$
-     "~"  =>  $u7e$
+     "\u{XY}"  =>  $uXY$
 
    A double ".." means "::" and a single "." means "-".
 
@@ -95,7 +86,8 @@ static const size_t hash_len = 16;
 
 static int is_prefixed_hash (const char *start);
 static int looks_like_rust (const char *sym, size_t len);
-static int unescape (const char **in, char **out, const char *seq, char value);
+static int parse_lower_hex_nibble (char nibble);
+static char parse_legacy_escape (const char **in);
 
 /* INPUT: sym: symbol that has been through C++ (gnu v3) demangling
 
@@ -149,7 +141,7 @@ is_prefixed_hash (const char *str)
   const char *end;
   char seen[16];
   size_t i;
-  int count;
+  int count, nibble;
 
   if (strncmp (str, hash_prefix, hash_prefix_len))
     return 0;
@@ -157,12 +149,12 @@ is_prefixed_hash (const char *str)
 
   memset (seen, 0, sizeof(seen));
   for (end = str + hash_len; str < end; str++)
-    if (*str >= '0' && *str <= '9')
-      seen[*str - '0'] = 1;
-    else if (*str >= 'a' && *str <= 'f')
-      seen[*str - 'a' + 10] = 1;
-    else
-      return 0;
+    {
+      nibble = parse_lower_hex_nibble (*str);
+      if (nibble < 0)
+        return 0;
+      seen[nibble] = 1;
+    }
 
   /* Count how many distinct digits seen */
   count = 0;
@@ -179,57 +171,17 @@ looks_like_rust (const char *str, size_t len)
   const char *end = str + len;
 
   while (str < end)
-    switch (*str)
-      {
-      case '$':
-        if (!strncmp (str, "$C$", 3))
-          str += 3;
-        else if (!strncmp (str, "$SP$", 4)
-                 || !strncmp (str, "$BP$", 4)
-                 || !strncmp (str, "$RF$", 4)
-                 || !strncmp (str, "$LT$", 4)
-                 || !strncmp (str, "$GT$", 4)
-                 || !strncmp (str, "$LP$", 4)
-                 || !strncmp (str, "$RP$", 4))
-          str += 4;
-        else if (!strncmp (str, "$u20$", 5)
-                 || !strncmp (str, "$u22$", 5)
-                 || !strncmp (str, "$u27$", 5)
-                 || !strncmp (str, "$u2b$", 5)
-                 || !strncmp (str, "$u3b$", 5)
-                 || !strncmp (str, "$u5b$", 5)
-                 || !strncmp (str, "$u5d$", 5)
-                 || !strncmp (str, "$u7b$", 5)
-                 || !strncmp (str, "$u7d$", 5)
-                 || !strncmp (str, "$u7e$", 5))
-          str += 5;
-        else
-          return 0;
-        break;
-      case '.':
-        /* Do not allow three or more consecutive dots */
-        if (!strncmp (str, "...", 3))
-          return 0;
-        /* Fall through */
-      case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
-      case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
-      case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
-      case 's': case 't': case 'u': case 'v': case 'w': case 'x':
-      case 'y': case 'z':
-      case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
-      case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
-      case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
-      case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
-      case 'Y': case 'Z':
-      case '0': case '1': case '2': case '3': case '4': case '5':
-      case '6': case '7': case '8': case '9':
-      case '_':
-      case ':':
-        str++;
-        break;
-      default:
-        return 0;
-      }
+    {
+      if (*str == '$')
+        {
+          if (!parse_legacy_escape (&str))
+            return 0;
+        }
+      else if (*str == '.' || *str == '_' || *str == ':' || ISALNUM (*str))
+        str++;
+      else
+        return 0;
+    }
 
   return 1;
 }
@@ -246,6 +198,7 @@ rust_demangle_sym (char *sym)
   const char *in;
   char *out;
   const char *end;
+  char unescaped;
 
   if (!sym)
     return;
@@ -255,75 +208,49 @@ rust_demangle_sym (char *sym)
   end = sym + strlen (sym) - (hash_prefix_len + hash_len);
 
   while (in < end)
-    switch (*in)
-      {
-      case '$':
-        if (!(unescape (&in, &out, "$C$", ',')
-              || unescape (&in, &out, "$SP$", '@')
-              || unescape (&in, &out, "$BP$", '*')
-              || unescape (&in, &out, "$RF$", '&')
-              || unescape (&in, &out, "$LT$", '<')
-              || unescape (&in, &out, "$GT$", '>')
-              || unescape (&in, &out, "$LP$", '(')
-              || unescape (&in, &out, "$RP$", ')')
-              || unescape (&in, &out, "$u20$", ' ')
-              || unescape (&in, &out, "$u22$", '\"')
-              || unescape (&in, &out, "$u27$", '\'')
-              || unescape (&in, &out, "$u2b$", '+')
-              || unescape (&in, &out, "$u3b$", ';')
-              || unescape (&in, &out, "$u5b$", '[')
-              || unescape (&in, &out, "$u5d$", ']')
-              || unescape (&in, &out, "$u7b$", '{')
-              || unescape (&in, &out, "$u7d$", '}')
-              || unescape (&in, &out, "$u7e$", '~'))) {
-          /* unexpected escape sequence, not looks_like_rust. */
-          goto fail;
-        }
-        break;
-      case '_':
-        /* If this is the start of a path component and the next
-           character is an escape sequence, ignore the underscore. The
-           mangler inserts an underscore to make sure the path
-           component begins with a XID_Start character. */
-        if ((in == sym || in[-1] == ':') && in[1] == '$')
-          in++;
-        else
-          *out++ = *in++;
-        break;
-      case '.':
-        if (in[1] == '.')
-          {
-            /* ".." becomes "::" */
-            *out++ = ':';
-            *out++ = ':';
-            in += 2;
-          }
-        else
-          {
-            /* "." becomes "-" */
-            *out++ = '-';
-            in++;
-          }
-        break;
-      case 'a': case 'b': case 'c': case 'd': case 'e': case 'f':
-      case 'g': case 'h': case 'i': case 'j': case 'k': case 'l':
-      case 'm': case 'n': case 'o': case 'p': case 'q': case 'r':
-      case 's': case 't': case 'u': case 'v': case 'w': case 'x':
-      case 'y': case 'z':
-      case 'A': case 'B': case 'C': case 'D': case 'E': case 'F':
-      case 'G': case 'H': case 'I': case 'J': case 'K': case 'L':
-      case 'M': case 'N': case 'O': case 'P': case 'Q': case 'R':
-      case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
-      case 'Y': case 'Z':
-      case '0': case '1': case '2': case '3': case '4': case '5':
-      case '6': case '7': case '8': case '9':
-      case ':':
-        *out++ = *in++;
-        break;
-      default:
-        /* unexpected character in symbol, not looks_like_rust. */
-        goto fail;
-      }
+    {
+      if (*in == '$')
+        {
+          unescaped = parse_legacy_escape (&in);
+          if (unescaped)
+            *out++ = unescaped;
+          else
+            /* unexpected escape sequence, not looks_like_rust. */
+            goto fail;
+        }
+      else if (*in == '_')
+        {
+          /* If this is the start of a path component and the next
+             character is an escape sequence, ignore the underscore. The
+             mangler inserts an underscore to make sure the path
+             component begins with a XID_Start character. */
+          if ((in == sym || in[-1] == ':') && in[1] == '$')
+            in++;
+          else
+            *out++ = *in++;
+        }
+      else if (*in == '.')
+        {
+          if (in[1] == '.')
+            {
+              /* ".." becomes "::" */
+              *out++ = ':';
+              *out++ = ':';
+              in += 2;
+            }
+          else
+            {
+              /* "." becomes "-" */
+              *out++ = '-';
+              in++;
+            }
+        }
+      else if (*in == ':' || ISALNUM (*in))
+        *out++ = *in++;
+      else
+        /* unexpected character in symbol, not looks_like_rust. */
+        goto fail;
+    }
   goto done;
 
 fail:
@@ -332,18 +259,78 @@ done:
   *out = '\0';
 }
 
+/* Return a 0x0-0xf value if the char is 0-9a-f, and -1 otherwise. */
 static int
-unescape (const char **in, char **out, const char *seq, char value)
+parse_lower_hex_nibble (char nibble)
 {
-  size_t len = strlen (seq);
+  if ('0' <= nibble && nibble <= '9')
+    return nibble - '0';
+  if ('a' <= nibble && nibble <= 'f')
+    return 0xa + (nibble - 'a');
+  return -1;
+}
 
-  if (strncmp (*in, seq, len))
-    return 0;
+/* Return the unescaped character for a "$...$" escape, or 0 if invalid. */
+static char
+parse_legacy_escape (const char **in)
+{
+  char c = 0;
+  const char *e;
+  size_t escape_len = 0;
+  int lo_nibble = -1, hi_nibble = -1;
 
-  **out = value;
+  if ((*in)[0] != '$')
+    return 0;
 
-  *in += len;
-  *out += 1;
+  e = *in + 1;
+
+  if (e[0] == 'C')
+    {
+      escape_len = 1;
+
+      c = ',';
+    }
+  else
+    {
+      escape_len = 2;
+
+      if (e[0] == 'S' && e[1] == 'P')
+        c = '@';
+      else if (e[0] == 'B' && e[1] == 'P')
+        c = '*';
+      else if (e[0] == 'R' && e[1] == 'F')
+        c = '&';
+      else if (e[0] == 'L' && e[1] == 'T')
+        c = '<';
+      else if (e[0] == 'G' && e[1] == 'T')
+        c = '>';
+      else if (e[0] == 'L' && e[1] == 'P')
+        c = '(';
+      else if (e[0] == 'R' && e[1] == 'P')
+        c = ')';
+      else if (e[0] == 'u')
+        {
+          escape_len = 3;
+
+          hi_nibble = parse_lower_hex_nibble (e[1]);
+          if (hi_nibble < 0)
+            return 0;
+          lo_nibble = parse_lower_hex_nibble (e[2]);
+          if (lo_nibble < 0)
+            return 0;
+
+          /* Only allow non-control ASCII characters. */
+          if (hi_nibble > 7)
+            return 0;
+          c = (hi_nibble << 4) | lo_nibble;
+          if (c < 0x20)
+            return 0;
+        }
+    }
+
+  if (!c || e[escape_len] != '$')
+    return 0;
 
-  return 1;
+  *in += 2 + escape_len;
+  return c;
 }
diff --git a/libiberty/testsuite/rust-demangle-expected b/libiberty/testsuite/rust-demangle-expected
index 0b4288fc37d..c3b03f9f02d 100644
--- a/libiberty/testsuite/rust-demangle-expected
+++ b/libiberty/testsuite/rust-demangle-expected
@@ -159,3 +159,7 @@ _ZN68_$LT$core..nonzero..NonZero$LT$T$GT$$u20$as$u20$core..ops..Deref$GT$5deref1
 --format=rust
 _ZN63_$LT$core..ptr..Unique$LT$T$GT$$u20$as$u20$core..ops..Deref$GT$5deref17h19f2ad4920655e85E
 <core::ptr::Unique<T> as core::ops::Deref>::deref
+#
+--format=rust
+_ZN11issue_609253foo37Foo$LT$issue_60925..llv$u6d$..Foo$GT$3foo17h059a991a004536adE
+issue_60925::foo::Foo<issue_60925::llvm::Foo>::foo
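
For anyone who wants to try the '$uXY$' rule outside libiberty, here is a
minimal standalone sketch of the decoding described in the patch. It is not
part of the patch: the sketch_* names are hypothetical, only '$uXY$' escapes
are handled (not '$C$', '$LT$' and friends), and the accepted range (0x20
through 0x7f) is my reading of the checks in parse_legacy_escape.

```c
/* Illustrative sketch only; the sketch_* names are hypothetical and
   do not exist in libiberty.  */

/* Return 0x0-0xf for a lowercase hex digit, and -1 otherwise.  */
static int
sketch_hex_nibble (char c)
{
  if ('0' <= c && c <= '9')
    return c - '0';
  if ('a' <= c && c <= 'f')
    return 0xa + (c - 'a');
  return -1;
}

/* Decode one "$uXY$" escape starting at *IN.  On success, advance *IN
   past the escape and return the decoded character.  On failure,
   return 0 and leave *IN untouched.  */
static char
sketch_decode_u_escape (const char **in)
{
  const char *s = *in;
  int hi, lo;
  char c;

  if (s[0] != '$' || s[1] != 'u')
    return 0;
  /* Each check returns early, so we never read past the terminating NUL.  */
  hi = sketch_hex_nibble (s[2]);
  if (hi < 0 || hi > 7)          /* Reject anything outside ASCII.  */
    return 0;
  lo = sketch_hex_nibble (s[3]);
  if (lo < 0)
    return 0;
  c = (char) ((hi << 4) | lo);
  if (c < 0x20 || s[4] != '$')   /* Reject control codes and unterminated escapes.  */
    return 0;
  *in = s + 5;
  return c;
}

/* Copy IN to OUT, decoding "$uXY$" escapes along the way.  OUT must
   have room for strlen (IN) + 1 bytes.  Return 1 on success, 0 if an
   invalid escape is found.  */
static int
sketch_unescape (const char *in, char *out)
{
  while (*in)
    {
      if (*in == '$')
        {
          char c = sketch_decode_u_escape (&in);
          if (!c)
            return 0;
          *out++ = c;
        }
      else
        *out++ = *in++;
    }
  *out = '\0';
  return 1;
}
```

Under these assumptions, sketch_unescape on "llv$u6d$" should write "llvm"
(0x6d is 'm'), which is the case the new testsuite entry exercises, while
"$u0a$" (a control code) and "$u80$" (non-ASCII) are rejected.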