From patchwork Mon May 9 17:17:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 1628743 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=AHXqbkHA; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=8.43.85.97; helo=sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Received: from sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4KxnzH3x33z9sGJ for ; Tue, 10 May 2022 03:22:03 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4A2C63857362 for ; Mon, 9 May 2022 17:22:01 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 4A2C63857362 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652116921; bh=YLd15L5JZOlcw/RzhfsxJ/xKBz/Ern40Lubw0BHZfWw=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=AHXqbkHA8oN/vb6HlIosNLXOw1YH4lNYjOOqXNp2t92RGn3BuZ9F4BgAvhaS+K04n +BBPXfH7BGNm5A48OvFoSyTLXaUK6JWHjOZeKuE/kBAC+O/XfIwmPOFAnY77M0XFue W/TSSvNXIdXJX20qikdKBPy+fLymOz2VrGJF1aL4= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-io1-xd2d.google.com (mail-io1-xd2d.google.com [IPv6:2607:f8b0:4864:20::d2d]) by sourceware.org (Postfix) with ESMTPS id A90423856DE5 for ; Mon, 9 May 2022 17:18:04 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org A90423856DE5 Received: by mail-io1-xd2d.google.com with SMTP id f2so16048251ioh.7 for ; Mon, 09 May 2022 10:18:04 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=YLd15L5JZOlcw/RzhfsxJ/xKBz/Ern40Lubw0BHZfWw=; b=65vfAZepscnB2TV6UuHpdrnGZ5MSmkJ6GiQ+/jLbemWCQx3ZwDW9Mm6ARo5y6zW+wS u5odhhkPTrtrgru4idn4/MkDXUlxiewvNCcBiiXZEAzjV6DPYmfeUGinQ4QwrdpGduIo D+7RbhRJ8WvwCzs2Errdu9ey6bJhK2b978vB1uA6+WXmaVAy5jCEJy6PQC1Y7ik89Edj Xu9W12y9P4EoGkX+MNsdkNEfC//urgK6kAZldALvnjJUVQKlV6UBG4rJSfmTnTM9WXWg ZYpywa7A/E0jH5OTeTf4Hq0dSAvJdzhZlFpsjyxOZMVyY7MBmElwKn4LuXZ+newcsOwb /C8w== X-Gm-Message-State: AOAM533arJvWt8EkGut6UboMul6aGopzOrvQy4EsTrjPqAglVGCeag96 B/4F97I+n8XvHPGIISfwCrnhhU5Xn3o= X-Google-Smtp-Source: ABdhPJygtow4SVZiGZ5+Zwn2q3VQFRkI564qsxpq5CRQUL3ajr7aVF2eh7z24ISdgSUuqUVHIgdaKw== X-Received: by 2002:a05:6638:2493:b0:32b:ef71:454d with SMTP id x19-20020a056638249300b0032bef71454dmr3973167jat.250.1652116683831; Mon, 09 May 2022 10:18:03 -0700 (PDT) Received: from noah-tgl.. ([2600:1008:b051:8883:74c9:7302:c07a:dbde]) by smtp.gmail.com with ESMTPSA id r6-20020a056638044600b0032b3a781756sm3734491jap.26.2022.05.09.10.18.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 09 May 2022 10:18:03 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v5 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Date: Mon, 9 May 2022 12:17:47 -0500 Message-Id: <20220509171747.4153703-6-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220509171747.4153703-1-goldstein.w.n@gmail.com> References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220509171747.4153703-1-goldstein.w.n@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.0 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Cc: Alexander Monakov Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha" Unroll slightly and enforce good instruction scheduling. This improves performance on out-of-order machines. Note the unrolling allows for pipelined multiplies which helps a bit, but most of the gain is from enforcing better instruction scheduling for more ILP. Unrolling further started to induce slowdowns for sizes [0, 4] but can help the loop so if larger sizes are the target further unrolling can be beneficial. Results for _dl_new_hash Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz Time as Geometric Mean of N=25 runs Geometric of all benchmark New / Old: 0.791 type, length, New Time, Old Time, New Time / Old Time fixed, 0, 0.641, 0.658, 0.974 fixed, 1, 1.888, 1.883, 1.003 fixed, 2, 2.712, 2.833, 0.957 fixed, 3, 3.314, 3.739, 0.886 fixed, 4, 4.316, 4.866, 0.887 fixed, 5, 5.16, 5.966, 0.865 fixed, 6, 5.986, 7.241, 0.827 fixed, 7, 7.264, 8.435, 0.861 fixed, 8, 8.052, 9.846, 0.818 fixed, 9, 9.369, 11.316, 0.828 fixed, 10, 10.256, 12.925, 0.794 fixed, 11, 12.191, 14.546, 0.838 fixed, 12, 12.667, 15.92, 0.796 fixed, 13, 14.442, 17.465, 0.827 fixed, 14, 14.808, 18.981, 0.78 fixed, 15, 16.244, 20.565, 0.79 fixed, 16, 17.166, 22.044, 0.779 fixed, 32, 35.447, 50.558, 0.701 fixed, 64, 86.479, 134.529, 0.643 fixed, 128, 155.453, 287.527, 0.541 fixed, 256, 302.57, 593.64, 0.51 random, 2, 11.168, 10.61, 1.053 random, 4, 13.308, 13.53, 0.984 random, 8, 16.579, 19.437, 0.853 random, 16, 21.292, 24.776, 0.859 random, 32, 30.56, 35.906, 0.851 random, 64, 49.249, 68.577, 0.718 random, 128, 81.845, 140.664, 0.582 random, 256, 152.517, 292.204, 0.522 Co-authored-by: Alexander Monakov --- elf/dl-new-hash.h | 50 ++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 45 insertions(+), 5 deletions(-) diff --git a/elf/dl-new-hash.h b/elf/dl-new-hash.h index 40d88c81f9..70891d374c 100644 --- a/elf/dl-new-hash.h +++ b/elf/dl-new-hash.h @@ -20,15 +20,55 @@ #define _DL_NEW_HASH_H 1 #include +/* For __glibc_unlikely. */ +#include static inline uint32_t __attribute__ ((unused)) -_dl_new_hash (const char *s) +_dl_new_hash (const char *signed_s) { - uint32_t h = 5381; - for (unsigned char c = *s; c != '\0'; c = *++s) - h = h * 33 + c; - return h; + const unsigned char *s = signed_s; + unsigned int h = 5381; + unsigned int c0, c1; + for (;;) + { + c0 = (unsigned int) *s; + /* Unlikely length zero string so evens will be slightly less + common. */ + if (__glibc_unlikely (c0 == 0)) + { + return h; + } + + c1 = (unsigned int) *(s + 1); + if (c1 == 0) + { + c0 += h; + /* Ideal instruction scheduling is: + c0 += h; + h *= 32; + h += c0; + The asm statement ensures the compiler can't mess that up. */ + asm("" : "+r"(h) : "r"(c0)); + h = h * 32 + c0; + return h; + } + + /* Ideal instruction scheduling is: + c1 += c0; + h *= 33 * 33; + c0 *= 32; + c1 += c0; + h += c1; + The asm statements ensures the compiler can't mess that up. */ + c1 += c0; + asm("" : "+r"(c1), "+r"(c0)); + h *= 33 * 33; + c1 += c0 * 32; + asm("" : "+r"(c1)); + h += c1; + s += 2; + } }