From patchwork Tue May 10 23:30:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Noah Goldstein X-Patchwork-Id: 1629384 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.a=rsa-sha256 header.s=default header.b=eY4TuS05; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4KyZBW6Rvsz9sG6 for ; Wed, 11 May 2022 09:34:27 +1000 (AEST) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id BB5193856258 for ; Tue, 10 May 2022 23:34:25 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org BB5193856258 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1652225665; bh=GS1gok3gbtFzh4ET1bCn/qAZ7EOQZUVVfC6ZJNOyZao=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc: From; b=eY4TuS05FttsnHeMmMrR0PI44GyTdTAaQewvT3mgzXQ46XTXtJVewbqEJYV5aIc7O 95qK8GhS73lA9KbIY0VhI57kZoPNZB1hmn019VEBIJdMrwyjV7nLZi/UVnd6xsHUov xI0tfRVpyxz7wTgC5QwDA/f2PE2LA7sRn+iJ8ahQ= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-io1-xd36.google.com (mail-io1-xd36.google.com [IPv6:2607:f8b0:4864:20::d36]) by sourceware.org (Postfix) with ESMTPS id 637AA385735E for ; Tue, 10 May 2022 23:30:41 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 637AA385735E Received: by mail-io1-xd36.google.com with SMTP id r27so421734iot.1 for ; Tue, 10 May 2022 16:30:41 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=GS1gok3gbtFzh4ET1bCn/qAZ7EOQZUVVfC6ZJNOyZao=; b=bPA9WVQJgFLUmlN1wgT53B3rSDoccZTD3UPAC+R7QTrZZNmO8DhcOLTUhlX9L6/R09 p0qaBdT4r8DkQRgpvGaOo/y9PLbfH+4MH+H+eEQSjMaDHgBsAKxv9rrQ3+/9oXuua7zQ 3C+N9rr18ZUIJZuolWnfku5o6VqaW9+4kcDA16xWWdtk+xgZLIbsfqF85lC5BKrQhI2h BL5m5hIyZyDH2sC5mYVRKGAUNyFjLViQo96cdHsjaCJk0cTPgpr1FvzaOddum40WZSTY 71P3jjejE+lnFBbtd6jmQfc5qEWBasShT1IdabDZ4ojMsHRDcxTCI5L+d66dA+ay8HqU UJqQ== X-Gm-Message-State: AOAM5302YWlLsc38wWqZ4zkN9kjSfNf++/Qxm717VrDNpFHU9Qi9yWvX nMWxiVc75ijKqSVmUpirTIVJuu19feI= X-Google-Smtp-Source: ABdhPJwF3g321c0L62zr2xK4epfa+eK1fBsrPSOOkPqn27SbfahlvQ+RqEKuSbAEO2k3nC7GwBcWyg== X-Received: by 2002:a6b:fd0a:0:b0:657:d01f:87a5 with SMTP id c10-20020a6bfd0a000000b00657d01f87a5mr9922262ioi.201.1652225440348; Tue, 10 May 2022 16:30:40 -0700 (PDT) Received: from noah-tgl.. (node-17-161.flex.volo.net. [76.191.17.161]) by smtp.gmail.com with ESMTPSA id x6-20020a056638026600b0032b3a7817bcsm159972jaq.128.2022.05.10.16.30.39 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 10 May 2022 16:30:39 -0700 (PDT) To: libc-alpha@sourceware.org Subject: [PATCH v7 6/6] elf: Optimize _dl_new_hash in dl-new-hash.h Date: Tue, 10 May 2022 18:30:29 -0500 Message-Id: <20220510233029.637071-6-goldstein.w.n@gmail.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220510233029.637071-1-goldstein.w.n@gmail.com> References: <20220414041231.926415-1-goldstein.w.n@gmail.com> <20220510233029.637071-1-goldstein.w.n@gmail.com> MIME-Version: 1.0 X-Spam-Status: No, score=-12.3 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, FREEMAIL_FROM, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Noah Goldstein via Libc-alpha From: Noah Goldstein Reply-To: Noah Goldstein Cc: Alexander Monakov Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org Sender: "Libc-alpha" Unroll slightly and enforce good instruction scheduling. This improves performance on out-of-order machines. Note the unrolling allows for pipelined multiplies which helps a bit, but most of the gain is from enforcing better instruction scheduling for more ILP. Unrolling further started to induce slowdowns for sizes [0, 4] but can help the loop so if larger sizes are the target further unrolling can be beneficial. Results for _dl_new_hash Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz Time as Geometric Mean of N=30 runs Geometric of all benchmark New / Old: 0.674 type, length, New Time, Old Time, New Time / Old Time fixed, 0, 2.865, 2.72, 1.053 fixed, 1, 3.567, 2.489, 1.433 fixed, 2, 2.577, 3.649, 0.706 fixed, 3, 3.644, 5.983, 0.609 fixed, 4, 4.211, 6.833, 0.616 fixed, 5, 4.741, 9.372, 0.506 fixed, 6, 5.415, 9.561, 0.566 fixed, 7, 6.649, 10.789, 0.616 fixed, 8, 8.081, 11.808, 0.684 fixed, 9, 8.427, 12.935, 0.651 fixed, 10, 8.673, 14.134, 0.614 fixed, 11, 10.69, 15.408, 0.694 fixed, 12, 10.789, 16.982, 0.635 fixed, 13, 12.169, 18.411, 0.661 fixed, 14, 12.659, 19.914, 0.636 fixed, 15, 13.526, 21.541, 0.628 fixed, 16, 14.211, 23.088, 0.616 fixed, 32, 29.412, 52.722, 0.558 fixed, 64, 65.41, 142.351, 0.459 fixed, 128, 138.505, 295.625, 0.469 fixed, 256, 291.707, 601.983, 0.485 random, 2, 12.698, 12.849, 0.988 random, 4, 16.065, 15.857, 1.013 random, 8, 19.564, 21.105, 0.927 random, 16, 23.919, 26.823, 0.892 random, 32, 31.987, 39.591, 0.808 random, 64, 49.282, 71.487, 0.689 random, 128, 82.23, 145.364, 0.566 random, 256, 152.209, 298.434, 0.51 Co-authored-by: Alexander Monakov --- elf/dl-new-hash.h | 66 +++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 61 insertions(+), 5 deletions(-) diff --git a/elf/dl-new-hash.h b/elf/dl-new-hash.h index 40d88c81f9..c6d96a0452 100644 --- a/elf/dl-new-hash.h +++ b/elf/dl-new-hash.h @@ -20,15 +20,71 @@ #define _DL_NEW_HASH_H 1 #include +/* For __glibc_unlikely. */ +#include + +/* The simplest implementation of _dl_new_hash is: + +_dl_new_hash (const char *s) +{ + uint32_t h = 5381; + for (unsigned char c = *s; c != '\0'; c = *++s) + h = h * 33 + c; + return h; +} + + We can get better performance, however, by slightly unrolling the + loop and explicitly specifying order of operations to prevent + reassosiation of instructions and ensure ideal scheduling. */ static inline uint32_t __attribute__ ((unused)) -_dl_new_hash (const char *s) +_dl_new_hash (const char *str) { - uint32_t h = 5381; - for (unsigned char c = *s; c != '\0'; c = *++s) - h = h * 33 + c; - return h; + const unsigned char *s = (const unsigned char *) str; + unsigned int h = 5381; + unsigned int c0, c1; + for (;;) + { + c0 = (unsigned int) *s; + /* Unlikely length zero string so evens will be slightly less + common. */ + if (__glibc_unlikely (c0 == 0)) + return h; + + c1 = (unsigned int) *(s + 1); + if (c1 == 0) + { + c0 += h; + /* Ideal instruction scheduling is: + c0 += h; + h *= 32; + h += c0; + + The asm statements are to prevent reassosiation that would result in + more instruction interdependencies and worse scheduling. */ + asm("" : "+r"(h) : "r"(c0)); + h = h * 32 + c0; + return h; + } + + /* Ideal instruction scheduling is: + c1 += c0; + h *= 33 * 33; + c0 *= 32; + c1 += c0; + h += c1; + + The asm statements are to prevent reassosiation that would result in + more instruction interdependencies and worse scheduling. */ + c1 += c0; + asm("" : "+r"(c1), "+r"(c0)); + h *= 33 * 33; + c1 += c0 * 32; + asm("" : "+r"(c1)); + h += c1; + s += 2; + } }