Go patch committed: Use SHA1 hash for long gcbits symbols

Message ID	CAOyqgcVdeiEcqhkuNvHUDG7EFxP6wuzNFe3vRgtKpcyQJzQ6iw@mail.gmail.com
State	New
Headers	Return-Path: <gcc-patches-return-501023-incoming=patchwork.ozlabs.org@gcc.gnu.org> DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:content-type; q= dns; s=default; b=ZZ1bn/qpE+4Ik8fw2+HlC4GF9gNXd0SGW9KWfPrOFmqDHZ NCtUu4du+ZpCij0OTNpUhAk5Q0Ul3AXvvJY5b+bBn9m0/KZiO/fNnX6Vr5GoVxxv fIupQKzM1jxkApEtQY6P9I0NWvabi/GdARxiMpnMYGvtBvz3LbWAKe8TloHBs= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org MIME-Version: 1.0 From: Ian Lance Taylor <iant@golang.org> Date: Fri, 17 May 2019 06:48:32 -0700 Message-ID: <CAOyqgcVdeiEcqhkuNvHUDG7EFxP6wuzNFe3vRgtKpcyQJzQ6iw@mail.gmail.com> Subject: Go patch committed: Use SHA1 hash for long gcbits symbols To: gcc-patches <gcc-patches@gcc.gnu.org>, gofrontend-dev <gofrontend-dev@googlegroups.com> Content-Type: multipart/mixed; boundary="00000000000014af91058915a457"
Series	Go patch committed: Use SHA1 hash for long gcbits symbols

Commit Message

Ian Lance Taylor May 17, 2019, 1:48 p.m. UTC

This patch to the Go frontend by Than McIntosh uses a SHA1 hash for
the symbol name of long gcbits symbols.  The current scheme used by
the compiler for "gcbits" symbols generates a symbol name from an
encoding of the bits data that uses only 32 distinct characters.
This scheme works well in most cases but can generate very long
symbol names in rare cases.  To avoid such long symbol names, switch
to an encoding based on the SHA1 digest of the payload when the bits
data is longer than 128 bytes.  This fixes
https://golang.org/issue/32083.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian

Index: gcc/go/gofrontend/MERGE
===================================================================
--- gcc/go/gofrontend/MERGE	(revision 271310)
+++ gcc/go/gofrontend/MERGE	(working copy)
@@ -1,4 +1,4 @@ 
-b5ab7b419d6328f5126ba8d6795280129eaf6e79
+54aacecc8167bfba8420cb7b245787ff80bde61b
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===================================================================
--- gcc/go/gofrontend/types.cc	(revision 271310)
+++ gcc/go/gofrontend/types.cc	(working copy)
@@ -12,6 +12,7 @@ 
 #include "gogo.h"
 #include "go-diagnostics.h"
 #include "go-encode-id.h"
+#include "go-sha1.h"
 #include "operator.h"
 #include "expressions.h"
 #include "statements.h"
@@ -2776,22 +2777,43 @@  Ptrmask::set_from(Gogo* gogo, Type* type
     }
 }
 
-// Return a symbol name for this ptrmask.  This is used to coalesce
-// identical ptrmasks, which are common.  The symbol name must use
-// only characters that are valid in symbols.  It's nice if it's
-// short.  We convert it to a string that uses only 32 characters,
-// avoiding digits and u and U.
-
+// Return a symbol name for this ptrmask. This is used to coalesce identical
+// ptrmasks, which are common. The symbol name must use only characters that are
+// valid in symbols. It's nice if it's short. For smaller ptrmasks, we convert
+// it to a string that uses only 32 characters, avoiding digits and u and U. For
+// longer pointer masks, apply the same process to the SHA1 digest of the bits,
+// so as to avoid pathologically long symbol names (see related Go issues #32083
+// and #11583 for more on this). To avoid collisions between the two encoding
+// schemes, use a prefix ("X") for the SHA form to disambiguate.
 std::string
 Ptrmask::symname() const
 {
+  const std::vector<unsigned char>* bits(&this->bits_);
+  std::vector<unsigned char> shabits;
+  std::string prefix;
+
+  if (this->bits_.size() > 128)
+    {
+      // Produce a SHA1 digest of the data.
+      Go_sha1_helper* sha1_helper = go_create_sha1_helper();
+      sha1_helper->process_bytes(&this->bits_[0], this->bits_.size());
+      std::string digest = sha1_helper->finish();
+      delete sha1_helper;
+
+      // Redirect the bits vector to the digest, and update the prefix.
+      prefix = "X";
+      for (char c : digest)
+        shabits.push_back((unsigned char) c);
+      bits = &shabits;
+    }
+
   const char chars[33] = "abcdefghijklmnopqrstvwxyzABCDEFG";
   go_assert(chars[32] == '\0');
-  std::string ret;
+  std::string ret(prefix);
   unsigned int b = 0;
   int remaining = 0;
-  for (std::vector<unsigned char>::const_iterator p = this->bits_.begin();
-       p != this->bits_.end();
+  for (std::vector<unsigned char>::const_iterator p = bits->begin();
+       p != bits->end();
        ++p)
     {
       b |= *p << remaining;
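
The naming scheme in the hunk above can be tried outside the compiler with a small standalone sketch.  The names Bytes, Digest_fn, encode32 and symname_sketch are hypothetical and are not part of the frontend.  The digest is passed in as a function so that any SHA1 implementation can be plugged in; the sketch does not implement SHA1 itself.  The tail of the encoding loop is not visible in the hunk, so the way the final partial group is flushed here is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Hypothetical stand-in for the frontend's SHA1 helper: any function
// that maps the raw bytes to a fixed-size digest can be supplied.
typedef std::function<Bytes(const Bytes&)> Digest_fn;

// Pack the bytes into 5-bit groups, low bits first, and map each group
// to one of 32 symbol-safe characters (no digits, no 'u' or 'U').
std::string
encode32(const Bytes& bits)
{
  const char chars[33] = "abcdefghijklmnopqrstvwxyzABCDEFG";
  assert(chars[32] == '\0');
  std::string ret;
  unsigned int b = 0;
  int remaining = 0;
  for (Bytes::const_iterator p = bits.begin(); p != bits.end(); ++p)
    {
      b |= static_cast<unsigned int>(*p) << remaining;
      remaining += 8;
      while (remaining >= 5)
        {
          ret += chars[b & 0x1f];
          b >>= 5;
          remaining -= 5;
        }
    }
  // Assumption: leftover bits (fewer than 5) are emitted as one last
  // character.  This part of the loop is not shown in the patch hunk.
  if (remaining > 0)
    ret += chars[b & 0x1f];
  return ret;
}

// Pick the encoding by payload size, as the patch does: payloads of up
// to 128 bytes are encoded directly, longer ones via their digest with
// an "X" prefix.
std::string
symname_sketch(const Bytes& bits, const Digest_fn& digest)
{
  if (bits.size() > 128)
    return "X" + encode32(digest(bits));
  return encode32(bits);
}
```

Because the 32-character alphabet contains no uppercase "X", a prefixed name cannot be confused with a directly encoded one.  A 20-byte SHA1 digest is 160 bits, which is exactly 32 five-bit groups, so under this sketch the hashed form is always 33 characters long, while a directly encoded 128-byte mask takes 205 characters.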
