From patchwork Fri Aug 30 14:27:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Antony Polukhin X-Patchwork-Id: 1155946 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=gcc.gnu.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=gcc-patches-return-508037-incoming=patchwork.ozlabs.org@gcc.gnu.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=gcc.gnu.org header.i=@gcc.gnu.org header.b="ARJslEqv"; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="Mp3BktWv"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 46Khds4NzJz9s5b for ; Sat, 31 Aug 2019 00:27:44 +1000 (AEST) DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:content-type; q= dns; s=default; b=QrenJI3vHPUjisrA5kO613Tqi+LJ8ZzDiSXmTTj13HY8N+ w/DC+GUYT3zIxUBHHFNR7LmUkUJW8q3c5JZLr4F4/A13FLd7YXmAt8OtT3S206cQ 06NyHfom1jDYPDckijO5SUc6gyodYHuYoU3hy61cei6zZnVaacJv3sPe7qemg= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:content-type; s= default; bh=tW+IXzORDvtyNwm02iDVsrW6WvA=; b=ARJslEqvw+I7cVOUEZx1 2j0KccQGcEo7vxzI9U1TkG4rNLGn8dkphC1eElTHuzDgXe6WiMpki5J7mQ0aRJW3 AJgqex4Qlcqs9oRxKv7UUYMtynMgxkJLceW0v79vglMVz46zxEtKJ4/jtlWfLj6m grqIneUAqEiICG+IawQOQd0= Received: (qmail 72587 invoked by alias); 30 Aug 2019 14:27:38 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Delivered-To: mailing list gcc-patches@gcc.gnu.org Received: (qmail 72573 invoked by uid 89); 30 Aug 2019 14:27:38 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-15.6 required=5.0 tests=AWL, BAYES_00, FREEMAIL_FROM, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_LOTSOFHASH, RCVD_IN_DNSWL_NONE, SPF_PASS autolearn=ham version=3.3.1 spammy=micro, Leading, 010, locality X-HELO: mail-qk1-f169.google.com Received: from mail-qk1-f169.google.com (HELO mail-qk1-f169.google.com) (209.85.222.169) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 30 Aug 2019 14:27:35 +0000 Received: by mail-qk1-f169.google.com with SMTP id m2so6250340qkd.10; Fri, 30 Aug 2019 07:27:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:from:date:message-id:subject:to; bh=6JC8iXcHX6iUIqZMHoOjPQe7INiHWfdGPtS9nTZxP2Q=; b=Mp3BktWvKgr/NgGMZxYnWBSQTy1FcZFTw0fLOTmcFAbKxK3ArUQ25fTBVRVWUCLYA4 oNsPKdFNUcVd033deiUoZ7ThL6/JL8mjOE03u+A31vKmhYkP5V9kzIeixP9YVgzrrlwQ fYr4J9p3zhv+eLTS824EBo3oEmgEIPI1u8Wo4sjqtypK5ibH1/s1Q62l/xt0MOg4RZ/V 2tsN1/yZPuL8pQBcS8U3f7Lbo5W1do7oI3sasSGtcEoLRLUoz2oyT4FQdFjRs42c/Vm4 oSfjtoO9ZH8KMDWmak2PGiEvSk0JdL/rF1UWCHQGOWz+GN6c2YrydPENCwyFuZhRdkhl pU5g== MIME-Version: 1.0 From: Antony Polukhin Date: Fri, 30 Aug 2019 17:27:22 +0300 Message-ID: Subject: [PATCH] Optimize to_chars To: "libstdc++" , gcc-patches List Bunch of micro optimizations for std::to_chars: * For base == 8 replacing the lookup in __digits table with arithmetic computations leads to a same CPU cycles for a loop (exchanges two movzx with 3 bit ops https://godbolt.org/z/RTui7m ). However this saves 129 bytes of data and totally avoids a chance of cache misses on __digits. * For base == 16 replacing the lookup in __digits table with arithmetic computations leads to a few additional instructions, but totally avoids a chance of cache misses on __digits (- ~9 cache misses for worst case) and saves 513 bytes of const data. * Replacing __first[pos] and __first[pos - 1] with __first[1] and __first[0] on final iterations saves ~2% of code size. * Removing trailing '\0' from arrays of digits allows the linker to merge the symbols (so that "0123456789abcdefghijklmnopqrstuvwxyz" and "0123456789abcdef" could share the same address). This improves data locality and reduces binary sizes. * Using __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len makes the operation O(1) instead of O(N). It also makes the code two times shorter ( https://godbolt.org/z/Peq_PG) . In sum: this significantly reduces the size of a binary (for about 4KBs only for base-8 conversion https://godbolt.org/z/WPKijS ), deals with latency (CPU cache misses) without changing the iterations count and without adding costly instructions into the loops. Changelog: * include/std/charconv (__detail::__to_chars_8, __detail::__to_chars_16): Replace array of precomputed digits with arithmetic operations to avoid CPU cache misses. Remove zero termination from array of digits to allow symbol merge with generic implementation of __detail::__to_chars. Replace final offsets with constants. Use __detail::__to_chars_len_2 instead of a generic __detail::__to_chars_len. * include/std/charconv (__detail::__to_chars): Remove zero termination from array of digits. * include/std/charconv (__detail::__to_chars_2): Leading digit is always '1'. diff --git a/libstdc++-v3/ChangeLog b/libstdc++-v3/ChangeLog index eaa6f74..35706d0 100644 --- a/libstdc++-v3/ChangeLog +++ b/libstdc++-v3/ChangeLog @@ -1,3 +1,17 @@ +2019-08-30 Antony Polukhin + + * include/std/charconv (__detail::__to_chars_8, + __detail::__to_chars_16): Replace array of precomputed digits + with arithmetic operations to avoid CPU cache misses. Remove + zero termination from array of digits to allow symbol merge with + generic implementation of __detail::__to_chars. Replace final + offsets with constants. Use __detail::__to_chars_len_2 instead + of a generic __detail::__to_chars_len. + * include/std/charconv (__detail::__to_chars): Remove + zero termination from array of digits. + * include/std/charconv (__detail::__to_chars_2): Leading digit + is always '1'. + 2019-08-29 Jonathan Wakely PR libstdc++/91067 diff --git a/libstdc++-v3/include/std/charconv b/libstdc++-v3/include/std/charconv index 53aa63e..4e94c39 100644 --- a/libstdc++-v3/include/std/charconv +++ b/libstdc++-v3/include/std/charconv @@ -131,7 +131,7 @@ namespace __detail : 1u; } else - return __to_chars_len(__value, 8); + return (__to_chars_len_2(__value) + 2) / 3; } // Generic implementation for arbitrary bases. @@ -155,8 +155,12 @@ namespace __detail unsigned __pos = __len - 1; - static constexpr char __digits[] - = "0123456789abcdefghijklmnopqrstuvwxyz"; + static constexpr char __digits[] = { + '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', + 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', + 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', + 'u', 'v', 'w', 'x', 'y', 'z' + }; while (__val >= __base) { @@ -181,7 +185,7 @@ namespace __detail to_chars_result __res; - const unsigned __len = __to_chars_len(__val, 0x10); + const unsigned __len = (__to_chars_len_2(__val) + 3) / 4; if (__builtin_expect((__last - __first) < __len, 0)) { @@ -190,32 +194,30 @@ namespace __detail return __res; } - static constexpr char __digits[513] = - "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f" - "202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f" - "404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f" - "606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f" - "808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9f" - "a0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf" - "c0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedf" - "e0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff"; + static constexpr char __digits[] = { + '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', + 'a', 'b', 'c', 'd', 'e', 'f' + }; unsigned __pos = __len - 1; while (__val >= 0x100) { - auto const __num = (__val % 0x100) * 2; - __val /= 0x100; - __first[__pos] = __digits[__num + 1]; + auto __num = __val & 0xF; + __val >>= 4; + __first[__pos] = __digits[__num]; + __num = __val & 0xF; + __val >>= 4; __first[__pos - 1] = __digits[__num]; __pos -= 2; } if (__val >= 0x10) { - auto const __num = __val * 2; - __first[__pos] = __digits[__num + 1]; - __first[__pos - 1] = __digits[__num]; + const auto __num = __val & 0xF; + __val >>= 4; + __first[1] = __digits[__num]; + __first[0] = __digits[__val]; } else - __first[__pos] = "0123456789abcdef"[__val]; + __first[0] = __digits[__val]; __res.ptr = __first + __len; __res.ec = {}; return __res; @@ -263,28 +265,26 @@ namespace __detail return __res; } - static constexpr char __digits[129] = - "00010203040506071011121314151617" - "20212223242526273031323334353637" - "40414243444546475051525354555657" - "60616263646566677071727374757677"; unsigned __pos = __len - 1; while (__val >= 0100) { - auto const __num = (__val % 0100) * 2; - __val /= 0100; - __first[__pos] = __digits[__num + 1]; - __first[__pos - 1] = __digits[__num]; + auto __num = __val & 7; + __val >>= 3; + __first[__pos] = '0' + __num; + __num = __val & 7; + __val >>= 3; + __first[__pos - 1] = '0' + __num; __pos -= 2; } if (__val >= 010) { - auto const __num = __val * 2; - __first[__pos] = __digits[__num + 1]; - __first[__pos - 1] = __digits[__num]; + auto const __num = __val & 7; + __val >>= 3; + __first[1] = '0' + __num; + __first[0] = '0' + __val; } else - __first[__pos] = '0' + __val; + __first[0] = '0' + __val; __res.ptr = __first + __len; __res.ec = {}; return __res; @@ -315,7 +315,10 @@ namespace __detail __first[__pos--] = '0' + (__val & 1); __val >>= 1; } - *__first = '0' + (__val & 1); + // First digit is always '1' because __to_chars_len_2 skips + // leading zero bits and std::to_chars handles zero values + // directly. + __first[0] = '1'; __res.ptr = __first + __len; __res.ec = {};