From patchwork Wed Feb  1 17:03:47 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 1735786
To: libc-alpha@sourceware.org, Richard Henderson, Noah Goldstein,
 Jeff Law, Xi Ruoyao
Subject: [PATCH v11 10/29] string: Improve generic stpcpy
Date: Wed,  1 Feb 2023 14:03:47 -0300
Message-Id: <20230201170406.303978-11-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20230201170406.303978-1-adhemerval.zanella@linaro.org>
References: <20230201170406.303978-1-adhemerval.zanella@linaro.org>
List-Id: Libc-alpha mailing list
X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha
From: Adhemerval Zanella Netto
Reply-To: Adhemerval Zanella

It follows the strategy:

  - Align the destination on word boundary using byte operations.

  - If source is also word aligned, read a word per time, check for
    null (using has_zero from string-fzb.h), and write the remaining
    bytes.

  - If source is not word aligned, loop by aligning the source, and
    merging the result of two reads.  Similar to aligned case,
    check for null with has_zero, and write the remaining bytes if
    null is found.

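The aligned case in the strategy above can be sketched outside of glibc as
follows.  This is only an illustration under stated assumptions: the names
sketch_has_zero and sketch_stpcpy_aligned are hypothetical, the word is
fixed at 64 bits instead of op_t, and the zero-byte test is the well-known
bit trick rather than whatever has_zero in string-fzb.h expands to on a
given architecture.  Both pointers are assumed to be 8-byte aligned, and the
source buffer is assumed readable up to the end of the word that holds the
terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for has_zero: returns 1 if any byte of X is 0.
   A byte's high bit survives the mask only if that byte was zero or a
   borrow from a lower zero byte reached it, so the result is nonzero
   exactly when at least one zero byte exists.  */
static int
sketch_has_zero (uint64_t x)
{
  const uint64_t ones = 0x0101010101010101ULL;
  const uint64_t highs = 0x8080808080808080ULL;
  return ((x - ones) & ~x & highs) != 0;
}

/* Hypothetical aligned-only copy: whole words while no terminator is
   seen, then a byte loop for the final partial word.  Returns a pointer
   to the terminating '\0' written to DST.  */
static char *
sketch_stpcpy_aligned (char *dst, const char *src)
{
  assert ((uintptr_t) dst % sizeof (uint64_t) == 0);
  assert ((uintptr_t) src % sizeof (uint64_t) == 0);

  for (;;)
    {
      uint64_t word;
      /* memcpy keeps the load free of aliasing problems; compilers
         usually turn it into a single aligned load.  */
      memcpy (&word, src, sizeof word);
      if (sketch_has_zero (word))
        break;
      memcpy (dst, &word, sizeof word);
      dst += sizeof word;
      src += sizeof word;
    }

  /* The terminator is somewhere in the current word; finish bytewise.  */
  while ((*dst = *src++) != '\0')
    dst++;
  return dst;
}
```

The byte loop at the end plays the role that write_byte_from_word has in
the patch: once a word is known to contain the terminator, only the bytes up
to and including it are stored.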
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc64-linux-gnu,
and powerpc-linux-gnu by removing the arch-specific assembly
implementation and disabling multi-arch (it covers both LE and BE
for 64 and 32 bits).

Reviewed-by: Richard Henderson
---
 string/stpcpy.c | 92 +++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 86 insertions(+), 6 deletions(-)

diff --git a/string/stpcpy.c b/string/stpcpy.c
index 8df5065cfe..dd0fef12ef 100644
--- a/string/stpcpy.c
+++ b/string/stpcpy.c
@@ -15,12 +15,12 @@
    License along with the GNU C Library; if not, see .  */
 
-#ifdef HAVE_CONFIG_H
-# include
-#endif
-
 #define NO_MEMPCPY_STPCPY_REDIRECT
 #include
+#include
+#include
+#include
+#include
 
 #undef __stpcpy
 #undef stpcpy
@@ -29,12 +29,92 @@
 # define STPCPY __stpcpy
 #endif
 
+static __always_inline char *
+write_byte_from_word (op_t *dest, op_t word)
+{
+  char *d = (char *) dest;
+  for (size_t i = 0; i < OPSIZ; i++, ++d)
+    {
+      char c = extractbyte (word, i);
+      *d = c;
+      if (c == '\0')
+        break;
+    }
+  return d;
+}
+
+static __always_inline char *
+stpcpy_aligned_loop (op_t *restrict dst, const op_t *restrict src)
+{
+  op_t word;
+  while (1)
+    {
+      word = *src++;
+      if (has_zero (word))
+        break;
+      *dst++ = word;
+    }
+
+  return write_byte_from_word (dst, word);
+}
+
+static __always_inline char *
+stpcpy_unaligned_loop (op_t *restrict dst, const op_t *restrict src,
+                       uintptr_t ofs)
+{
+  op_t w2a = *src++;
+  uintptr_t sh_1 = ofs * CHAR_BIT;
+  uintptr_t sh_2 = OPSIZ * CHAR_BIT - sh_1;
+
+  op_t w2 = MERGE (w2a, sh_1, (op_t)-1, sh_2);
+  if (!has_zero (w2))
+    {
+      op_t w2b;
+
+      /* Unaligned loop.  The invariant is that W2B, which is "ahead" of W1,
+         does not contain end-of-string.  Therefore it is safe (and necessary)
+         to read another word from each while we do not have a difference.  */
+      while (1)
+        {
+          w2b = *src++;
+          w2 = MERGE (w2a, sh_1, w2b, sh_2);
+          /* Check if there is zero on w2a.  */
+          if (has_zero (w2))
+            goto out;
+          *dst++ = w2;
+          if (has_zero (w2b))
+            break;
+          w2a = w2b;
+        }
+
+      /* Align the final partial of P2.  */
+      w2 = MERGE (w2b, sh_1, 0, sh_2);
+    }
+
+out:
+  return write_byte_from_word (dst, w2);
+}
+
+
 /* Copy SRC to DEST, returning the address of the terminating '\0' in DEST.  */
 char *
 STPCPY (char *dest, const char *src)
 {
-  size_t len = strlen (src);
-  return memcpy (dest, src, len + 1) + len;
+  /* Copy just a few bytes to make DEST aligned.  */
+  size_t len = (-(uintptr_t) dest) % OPSIZ;
+  for (; len != 0; len--, ++dest)
+    {
+      char c = *src++;
+      *dest = c;
+      if (c == '\0')
+        return dest;
+    }
+
+  /* DEST is now aligned to op_t, SRC may or may not be.  */
+  uintptr_t ofs = (uintptr_t) src % OPSIZ;
+  return ofs == 0 ? stpcpy_aligned_loop ((op_t*) dest, (const op_t *) src)
+                  : stpcpy_unaligned_loop ((op_t*) dest,
+                                           (const op_t *) (src - ofs) , ofs);
 }
 weak_alias (__stpcpy, stpcpy)
 libc_hidden_def (__stpcpy)
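
The merge step used by stpcpy_unaligned_loop can be looked at in isolation.
The sketch below is an assumption-laden illustration, not the glibc code:
the helper names are hypothetical, the word is fixed at 64 bits, and the
shift directions are those of a little-endian layout (a big-endian build
would swap them).  Words are assembled from bytes explicitly so the example
behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 64-bit word from 8 bytes, with the byte
   at the lowest address in the least significant position.  */
static uint64_t
sketch_load_le64 (const unsigned char *p)
{
  uint64_t w = 0;
  for (unsigned i = 0; i < 8; i++)
    w |= (uint64_t) p[i] << (8 * i);
  return w;
}

/* Little-endian flavour of a MERGE-style operation: drop the low SH_1
   bits of W0 and fill the vacated high bits from the low end of W1.  */
static uint64_t
sketch_merge_le (uint64_t w0, unsigned sh_1, uint64_t w1, unsigned sh_2)
{
  return (w0 >> sh_1) | (w1 << sh_2);
}

/* Rebuild the word that starts OFS bytes into BASE using only the two
   "aligned" words at BASE and BASE + 8.  OFS must be in 1..7 so that
   both shift counts stay below the word width.  */
static uint64_t
sketch_unaligned_word (const unsigned char *base, unsigned ofs)
{
  assert (ofs > 0 && ofs < 8);
  uint64_t w0 = sketch_load_le64 (base);
  uint64_t w1 = sketch_load_le64 (base + 8);
  unsigned sh_1 = ofs * 8;
  unsigned sh_2 = 64 - sh_1;
  return sketch_merge_le (w0, sh_1, w1, sh_2);
}
```

In the patch the very first merge passes (op_t)-1 as the second word, so
the bytes that have not been read yet are all ones and cannot be mistaken
for a terminator by has_zero.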