From patchwork Mon Oct 15 20:46:04 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Fran=C3=A7ois_Dumont?= X-Patchwork-Id: 984375 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-487601-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="niqzKzep"; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="ty+Jc/md"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 42Yr8D5w6Hz9s9J for ; Tue, 16 Oct 2018 07:46:36 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:message-id:date:mime-version:content-type; q=dns; s= default; b=ODnmeG5BnOeS16kHx2s9ABS2GEU99eJfFEV70gPVkRY2oTapf6bOh QwO3hyWca0X9pfI/8NhfNPHgFsmJFHfuE78E+2PMMU12vlq2MA9HgPRQizy9Er7n xSK2If+q7GVmefMqt6N9+xFgt2OmwHAmNwkB+YYd9N4lpC+6xL50j8= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender:from :subject:to:message-id:date:mime-version:content-type; s= default; bh=393RN0l/dPgPQjogiizYG2fLSUM=; b=niqzKzep6EmkJbeMn4jT lmzyEGPDFjTsyiXP7weJUprP35SKshsf1BP6sx4l47eyFOiD6QiKssPwav2+kQFy i9NEgIYmwGahVHp7zwAfHr/czgkPLIUaIfvMaDTSi9z/QDFo0/kedVm9cIVuDvZY rB2j5TqqLoTWwPLCfxrdLnU= Received: (qmail 2992 invoked by alias); 15 Oct 2018 20:46:28 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 1533 invoked by uid 89); 15 Oct 2018 20:46:13 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-25.3 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.2 spammy=1u, non-empty, computing, wider X-HELO: mail-wm1-f42.google.com Received: from mail-wm1-f42.google.com (HELO mail-wm1-f42.google.com) (209.85.128.42) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 15 Oct 2018 20:46:08 +0000 Received: by mail-wm1-f42.google.com with SMTP id z25-v6so23848531wmf.1; Mon, 15 Oct 2018 13:46:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:subject:to:message-id:date:user-agent:mime-version :content-language; bh=dbbUAyG3F44UG4zWhwwPYCOkR5Bpqm4Vi8m/q+1aKVs=; b=ty+Jc/mdMbUVhdFcyabcpgr8jR+IAE5xU6/pJq498tSU8v5Os78w7WgS8yutVizHDB wtgl/Ef8dy2cS8J5kKuUoNj+RC/rigxs905Vs3WcFIQUOlc3+si4EOmINah3CSxRym24 mskMBpacco4814fGBk9prOa7G1oo7Xf7liCU7YZ+68TIjUR26tKxPcgCRaZQdfYxmdup j3TMimdHrGpNxJL1y+CMfn3nXAzeFiZ7CHRd3Fs47v/j5ZEpx5x6xf2Pm5uTnLWIf5iq qy6w7QugJVEEk/QI2CDpuP92jFKVqGXKV8wZ7dXeNIEiGDEwFA/lTUruXcN3hILaoJ4J UzIw== Received: from [192.168.0.22] (arf62-1-82-237-250-248.fbx.proxad.net. [82.237.250.248]) by smtp.googlemail.com with ESMTPSA id t194-v6sm11920034wmd.48.2018.10.15.13.46.05 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 15 Oct 2018 13:46:05 -0700 (PDT) From: =?utf-8?q?Fran=C3=A7ois_Dumont?= Subject: Hashtable Small size optimization To: "libstdc++@gcc.gnu.org" , gcc-patches Message-ID: Date: Mon, 15 Oct 2018 22:46:04 +0200 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1 MIME-Version: 1.0 I started considering PR libstdc++/68303. First thing was to write a dedicated performance test case, it is the unordered_small_size.cc I'd like to add with this patch. The first runs show a major difference between tr1 and std implementations, tr1 being much better: std::tr1::unordered_set without hash code cached: 1st insert       9r    9u    1s  14725920mem    0pf std::tr1::unordered_set with hash code cached: 1st insert       8r    9u    0s  14719680mem    0pf std::unordered_set without hash code cached: 1st insert      17r   17u    0s  16640080mem    0pf std::unordered_set with hash code cached: 1st insert      14r   14u    0s  16638656mem    0pf I had a look in gdb to find out why and the answer was quite obvious. For 20 insertions tr1 implementation bucket count goes through [11, 23] whereas for std it is [2, 5, 11, 23], so 2 more expensive rehash. As unordered containers are dedicated to rather important number of elements I propose to review the rehash policy with this patch so that std also starts at 11 on the 1st insertion. After the patch figures are: std::tr1::unordered_set without hash code cached: 1st insert       9r    9u    0s  14725920mem    0pf std::tr1::unordered_set with hash code cached: 1st insert       8r    8u    0s  14719680mem    0pf std::unordered_set without hash code cached: 1st insert      15r   15u    0s  16640128mem    0pf std::unordered_set with hash code cached: 1st insert      12r   12u    0s  16638688mem    0pf Moreover I noticed that performance tests are built with -O2, is that intentional ? The std implementation uses more abstractions than tr1, looks like building with -O3 optimizes away most of those abstractions making tr1 and std implementation much closer: std::tr1::unordered_set without hash code cached: 1st insert       2r    1u    1s  14725952mem    0pf std::tr1::unordered_set with hash code cached: 1st insert       2r    1u    0s  14719536mem    0pf std::unordered_set without hash code cached: 1st insert       2r    2u    0s  16640064mem    0pf std::unordered_set with hash code cached: 1st insert       2r    2u    0s  16638608mem    0pf Note that this patch also rework the alternative rehash policy based on powers of 2 so that it also starts with a larger number of bucket (16) and respects LWG2156. Last I had to wider the memory column so that alignment is preserved even when memory diff is negative. Tested under Linux x86_64. Ok to commit ? François diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index 66fbfbe5f21..70d3c5c8194 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -536,6 +536,13 @@ namespace __detail std::size_t _M_next_bkt(std::size_t __n) noexcept { + if (__n < 0x11) + { + _M_next_resize = + __builtin_floor(0x10 * (long double)_M_max_load_factor); + return 0x10; + } + const auto __max_width = std::min(sizeof(size_t), 8); const auto __max_bkt = size_t(1) << (__max_width * __CHAR_BIT__ - 1); std::size_t __res = __clp2(__n); @@ -553,7 +560,7 @@ namespace __detail _M_next_resize = std::size_t(-1); else _M_next_resize - = __builtin_ceil(__res * (long double)_M_max_load_factor); + = __builtin_floor(__res * (long double)_M_max_load_factor); return __res; } @@ -571,7 +578,7 @@ namespace __detail _M_need_rehash(std::size_t __n_bkt, std::size_t __n_elt, std::size_t __n_ins) noexcept { - if (__n_elt + __n_ins >= _M_next_resize) + if (__n_elt + __n_ins > _M_next_resize) { long double __min_bkts = (__n_elt + __n_ins) / (long double)_M_max_load_factor; diff --git a/libstdc++-v3/src/c++11/hashtable_c++0x.cc b/libstdc++-v3/src/c++11/hashtable_c++0x.cc index b9b11ff4385..6df9775b954 100644 --- a/libstdc++-v3/src/c++11/hashtable_c++0x.cc +++ b/libstdc++-v3/src/c++11/hashtable_c++0x.cc @@ -46,14 +46,11 @@ namespace __detail { // Optimize lookups involving the first elements of __prime_list. // (useful to speed-up, eg, constructors) - static const unsigned char __fast_bkt[] - = { 2, 2, 2, 3, 5, 5, 7, 7, 11, 11, 11, 11, 13, 13 }; - - if (__n < sizeof(__fast_bkt)) + if (__n < 12) { _M_next_resize = - __builtin_floor(__fast_bkt[__n] * (long double)_M_max_load_factor); - return __fast_bkt[__n]; + __builtin_floor(11 * (long double)_M_max_load_factor); + return 11; } // Number of primes (without sentinel). @@ -66,7 +63,7 @@ namespace __detail constexpr auto __last_prime = __prime_list + __n_primes - 1; const unsigned long* __next_bkt = - std::lower_bound(__prime_list + 6, __last_prime, __n); + std::lower_bound(__prime_list + 5, __last_prime, __n); if (__next_bkt == __last_prime) // Set next resize to the max value so that we never try to rehash again diff --git a/libstdc++-v3/testsuite/performance/23_containers/insert_erase/unordered_small_size.cc b/libstdc++-v3/testsuite/performance/23_containers/insert_erase/unordered_small_size.cc new file mode 100644 index 00000000000..a2f44c07082 --- /dev/null +++ b/libstdc++-v3/testsuite/performance/23_containers/insert_erase/unordered_small_size.cc @@ -0,0 +1,269 @@ +// { dg-do run { target c++11 } } + +// Copyright (C) 2018 Free Software Foundation, Inc. +// +// This file is part of the GNU ISO C++ Library. This library is free +// software; you can redistribute it and/or modify it under the +// terms of the GNU General Public License as published by the +// Free Software Foundation; either version 3, or (at your option) +// any later version. + +// This library is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. + +// You should have received a copy of the GNU General Public License along +// with this library; see the file COPYING3. If not see +// . + +#include +#include +#include +#include +#include +namespace +{ + const int nb = 20; + const int nb_insts = 10000; + + // Bench using an unordered_set. Hash functor for int is quite + // predictable so it helps bench very specific use cases. + template + void bench(const char* desc) + { + using namespace __gnu_test; + + time_counter time; + resource_counter resource; + + std::vector<_ContType> insts(nb_insts); + for (int j = 0; j != nb_insts; ++j) + insts.emplace_back(); + + start_counters(time, resource); + + for (auto& us : insts) + for (int i = 0; i != nb; ++i) + us.insert(i); + + stop_counters(time, resource); + + std::ostringstream ostr; + ostr << desc << " 1st insert"; + report_performance(__FILE__, ostr.str().c_str(), time, resource); + + start_counters(time, resource); + + // Here is the worst erase use case when hashtable implementation was + // something like vector>. Erasing from the end was very + // costly because we need to return the iterator following the erased + // one, as the hashtable is getting emptier at each step there are + // more and more empty bucket to loop through to reach the end of the + // container and find out that it was in fact the last element. + for (auto& us : insts) + for (int i = nb - 1; i >= 0; --i) + { + auto it = us.find(i); + if (it != us.end()) + us.erase(it); + } + + stop_counters(time, resource); + + ostr.str(""); + ostr << desc << " erase it "; + report_performance(__FILE__, ostr.str().c_str(), time, resource); + + start_counters(time, resource); + + // This is a worst insertion use case for the current implementation as + // we insert an element at the beginning of the hashtable and then we + // insert starting at the end so that each time we need to seek up to the + // first bucket to find the first non-empty one. + for (auto& us : insts) + { + us.insert(0); + for (int i = nb - 1; i >= 0; --i) + us.insert(i); + } + + stop_counters(time, resource); + ostr.str(""); + ostr << desc << " 2nd insert"; + report_performance(__FILE__, ostr.str().c_str(), time, resource); + + start_counters(time, resource); + + for (auto& us : insts) + for (int j = nb - 1; j >= 0; --j) + us.erase(j); + + stop_counters(time, resource); + ostr.str(""); + ostr << desc << " erase key "; + report_performance(__FILE__, ostr.str().c_str(), time, resource); + } + + // Bench using unordered_set that show how important it is to cache + // hash code as computing string hash code is quite expensive compared to + // computing it for int. + template + void bench_str(const char* desc) + { + using namespace __gnu_test; + + time_counter time; + resource_counter resource; + + // First generate once strings that are going to be used throughout the + // bench: + std::ostringstream ostr; + std::vector strs; + { + strs.reserve(nb); + for (int i = 0; i != nb; ++i) + { + ostr.str(""); + ostr << "string #" << i; + strs.push_back(ostr.str()); + } + } + + std::vector<_ContType> insts(nb_insts); + for (int j = 0; j != nb_insts; ++j) + insts.emplace_back(); + + start_counters(time, resource); + + for (auto& us : insts) + for (int i = 0; i != nb; ++i) + us.insert(strs[i]); + + stop_counters(time, resource); + + ostr.str(""); + ostr << desc << " 1st insert"; + report_performance(__FILE__, ostr.str().c_str(), time, resource); + + start_counters(time, resource); + + for (auto& us : insts) + for (int j = nb - 1; j >= 0; --j) + { + auto it = us.find(strs[j]); + if (it != us.end()) + us.erase(it); + } + + stop_counters(time, resource); + + ostr.str(""); + ostr << desc << " erase iter"; + report_performance(__FILE__, ostr.str().c_str(), time, resource); + + start_counters(time, resource); + + for (auto& us : insts) + { + us.insert(strs[0]); + for (int i = nb - 1; i >= 0; --i) + us.insert(strs[i]); + } + + stop_counters(time, resource); + ostr.str(""); + ostr << desc << " 2nd insert"; + report_performance(__FILE__, ostr.str().c_str(), time, resource); + + start_counters(time, resource); + + for (auto& us : insts) + for (int j = nb - 1; j >= 0; --j) + us.erase(strs[j]); + + stop_counters(time, resource); + ostr.str(""); + ostr << desc << " erase key "; + report_performance(__FILE__, ostr.str().c_str(), time, resource); + } +} + +template + using __uset = + std::__uset_hashtable, std::equal_to, + std::allocator, + std::__uset_traits>; + +template + using __tr1_uset = + std::tr1::__unordered_set, std::equal_to, + std::allocator, + cache>; + +template + using __uset2 = + std::_Hashtable, + std::__detail::_Identity, + std::equal_to, std::hash, + std::__detail::_Mask_range_hashing, + std::__detail::_Default_ranged_hash, + std::__detail::_Power2_rehash_policy, + std::__uset_traits>; + +template + using __str_uset = + std::__uset_hashtable, + std::equal_to, + std::allocator, + std::__uset_traits>; + +template + using __tr1_str_uset = + std::tr1::__unordered_set, + std::equal_to, + std::allocator, + cache>; + +template + using __str_uset2 = + std::_Hashtable, + std::__detail::_Identity, + std::equal_to, std::hash, + std::__detail::_Mask_range_hashing, + std::__detail::_Default_ranged_hash, + std::__detail::_Power2_rehash_policy, + std::__uset_traits>; + +int main() +{ + bench<__tr1_uset>( + "std::tr1::unordered_set without hash code cached: "); + bench<__tr1_uset>( + "std::tr1::unordered_set with hash code cached: "); + bench<__uset>( + "std::unordered_set without hash code cached: "); + bench<__uset>( + "std::unordered_set with hash code cached: "); + bench>( + "std::unordered_set default cache: "); + bench<__uset2>( + "std::unordered_set2 without hash code cached: "); + bench<__uset2>( + "std::unordered_set2 with hash code cached: "); + bench_str<__tr1_str_uset>( + "std::tr1::unordered_set without hash code cached:"); + bench_str<__tr1_str_uset>( + "std::tr1::unordered_set with hash code cached: "); + bench_str<__str_uset>( + "std::unordered_set without hash code cached: "); + bench_str<__str_uset>( + "std::unordered_set with hash code cached: "); + bench_str>( + "std::unordered_set default cache: "); + bench_str<__str_uset2>( + "std::unordered_set2 without hash code cached: "); + bench_str<__str_uset2>( + "std::unordered_set2 with hash code cached: "); + return 0; +} diff --git a/libstdc++-v3/testsuite/util/testsuite_performance.h b/libstdc++-v3/testsuite/util/testsuite_performance.h index 3dfd471221d..7ebf2d5b849 100644 --- a/libstdc++-v3/testsuite/util/testsuite_performance.h +++ b/libstdc++-v3/testsuite/util/testsuite_performance.h @@ -224,7 +224,7 @@ namespace __gnu_test out << std::setw(4) << t.real_time() << "r" << space; out << std::setw(4) << t.user_time() << "u" << space; out << std::setw(4) << t.system_time() << "s" << space; - out << std::setw(8) << r.allocated_memory() << "mem" << space; + out << std::setw(9) << r.allocated_memory() << "mem" << space; out << std::setw(4) << r.hard_page_fault() << "pf" << space; out << std::endl;