From patchwork Thu Feb 8 13:08:38 2024
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 1896610
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J .
Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v3 1/3] x86: Fix Zen3/Zen4 ERMS selection (BZ 30994)
Date: Thu, 8 Feb 2024 10:08:38 -0300
Message-Id: <20240208130840.533348-2-adhemerval.zanella@linaro.org>
In-Reply-To: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
References: <20240208130840.533348-1-adhemerval.zanella@linaro.org>

The REP MOVSB usage on memcpy/memmove does not show much performance
improvement on Zen3/Zen4 cores compared to the vectorized loops. Also,
as reported in BZ 30994, if the source is aligned and the destination is
not, the performance can be 20x slower.

The performance difference is noticeable with small buffer sizes, close
to the lower bound at which memcpy/memmove starts to use ERMS. The
performance of REP MOVSB is similar to the vectorized instructions at
the upper size limit (the L2 cache size). Also, there is no drawback to
multiple cores sharing the cache.

Checked on x86_64-linux-gnu on Zen3.

Reviewed-by: H.J. Lu
---
 sysdeps/x86/dl-cacheinfo.h | 38 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index d5101615e3..f34d12846c 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -791,7 +791,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   long int data = -1;
   long int shared = -1;
   long int shared_per_thread = -1;
-  long int core = -1;
   unsigned int threads = 0;
   unsigned long int level1_icache_size = -1;
   unsigned long int level1_icache_linesize = -1;
@@ -809,7 +808,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
       shared_per_thread = shared;

@@ -822,7 +820,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
           = handle_intel (_SC_LEVEL1_DCACHE_ASSOC, cpu_features);
       level1_dcache_linesize
           = handle_intel (_SC_LEVEL1_DCACHE_LINESIZE, cpu_features);
-      level2_cache_size = core;
+      level2_cache_size
+        = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
       level2_cache_assoc
           = handle_intel (_SC_LEVEL2_CACHE_ASSOC, cpu_features);
       level2_cache_linesize
@@ -835,12 +834,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level4_cache_size
           = handle_intel (_SC_LEVEL4_CACHE_SIZE, cpu_features);

-      get_common_cache_info (&shared, &shared_per_thread, &threads, core);
+      get_common_cache_info (&shared, &shared_per_thread, &threads,
+                             level2_cache_size);
     }
   else if (cpu_features->basic.kind == arch_kind_zhaoxin)
     {
       data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
       shared_per_thread = shared;

@@ -849,19 +848,19 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level1_dcache_size = data;
       level1_dcache_assoc = handle_zhaoxin (_SC_LEVEL1_DCACHE_ASSOC);
       level1_dcache_linesize = handle_zhaoxin (_SC_LEVEL1_DCACHE_LINESIZE);
-      level2_cache_size = core;
+      level2_cache_size = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
       level2_cache_assoc = handle_zhaoxin (_SC_LEVEL2_CACHE_ASSOC);
       level2_cache_linesize = handle_zhaoxin (_SC_LEVEL2_CACHE_LINESIZE);
       level3_cache_size = shared;
       level3_cache_assoc = handle_zhaoxin (_SC_LEVEL3_CACHE_ASSOC);
       level3_cache_linesize = handle_zhaoxin (_SC_LEVEL3_CACHE_LINESIZE);

-      get_common_cache_info (&shared, &shared_per_thread, &threads, core);
+      get_common_cache_info (&shared, &shared_per_thread, &threads,
+                             level2_cache_size);
     }
   else if (cpu_features->basic.kind == arch_kind_amd)
     {
       data = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);

       level1_icache_size = handle_amd (_SC_LEVEL1_ICACHE_SIZE);
@@ -869,7 +868,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       level1_dcache_size = data;
       level1_dcache_assoc = handle_amd (_SC_LEVEL1_DCACHE_ASSOC);
       level1_dcache_linesize = handle_amd (_SC_LEVEL1_DCACHE_LINESIZE);
-      level2_cache_size = core;
+      level2_cache_size = handle_amd (_SC_LEVEL2_CACHE_SIZE);
       level2_cache_assoc = handle_amd (_SC_LEVEL2_CACHE_ASSOC);
       level2_cache_linesize = handle_amd (_SC_LEVEL2_CACHE_LINESIZE);
       level3_cache_size = shared;
@@ -880,12 +879,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
       if (shared <= 0)
         {
           /* No shared L3 cache.  All we have is the L2 cache.  */
-          shared = core;
+          shared = level2_cache_size;
         }
       else if (cpu_features->basic.family < 0x17)
         {
           /* Account for exclusive L2 and L3 caches.  */
-          shared += core;
+          shared += level2_cache_size;
         }

       shared_per_thread = shared;
@@ -987,6 +986,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
   if (CPU_FEATURE_USABLE_P (cpu_features, FSRM))
     rep_movsb_threshold = 2112;

+  /* For AMD CPUs that support ERMS (Zen3+), REP MOVSB is in a lot of
+     cases slower than the vectorized path (and for some alignments,
+     it is really slow, check BZ #30994).  */
+  if (cpu_features->basic.kind == arch_kind_amd)
+    rep_movsb_threshold = non_temporal_threshold;
+
   /* The default threshold to use Enhanced REP STOSB.  */
   unsigned long int rep_stosb_threshold = 2048;

@@ -1028,16 +1033,9 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
                            SIZE_MAX);

   unsigned long int rep_movsb_stop_threshold;
-  /* ERMS feature is implemented from AMD Zen3 architecture and it is
-     performing poorly for data above L2 cache size. Henceforth, adding
-     an upper bound threshold parameter to limit the usage of Enhanced
-     REP MOVSB operations and setting its value to L2 cache size.  */
-  if (cpu_features->basic.kind == arch_kind_amd)
-    rep_movsb_stop_threshold = core;
   /* Setting the upper bound of ERMS to the computed value of
-     non-temporal threshold for architectures other than AMD.  */
-  else
-    rep_movsb_stop_threshold = non_temporal_threshold;
+     non-temporal threshold for all architectures.  */
+  rep_movsb_stop_threshold = non_temporal_threshold;

   cpu_features->data_cache_size = data;
   cpu_features->shared_cache_size = shared;

From patchwork Thu Feb 8 13:08:39 2024
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 1896613
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J .
Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v3 2/3] x86: Do not prefer ERMS for memset on Zen3+
Date: Thu, 8 Feb 2024 10:08:39 -0300
Message-Id: <20240208130840.533348-3-adhemerval.zanella@linaro.org>
In-Reply-To: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
References: <20240208130840.533348-1-adhemerval.zanella@linaro.org>

For AMD Zen3+ architecture, the performance of the vectorized loop is
slightly better than ERMS.

Checked on x86_64-linux-gnu on Zen3.

Reviewed-by: H.J. Lu
---
 sysdeps/x86/dl-cacheinfo.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
index f34d12846c..5a98f70364 100644
--- a/sysdeps/x86/dl-cacheinfo.h
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -1021,6 +1021,11 @@ dl_init_cacheinfo (struct cpu_features *cpu_features)
      minimum value is fixed.  */
   rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
                                      long int, NULL);
+  if (cpu_features->basic.kind == arch_kind_amd
+      && !TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold))
+    /* For AMD Zen3+ architecture, the performance of the vectorized loop is
+       slightly better than ERMS.  */
+    rep_stosb_threshold = SIZE_MAX;

   TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX);
   TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX);

From patchwork Thu Feb 8 13:08:40 2024
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 1896612
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "H . J .
Lu", Noah Goldstein, Sajan Karumanchi, bmerry@sarao.ac.za, pmallapp@amd.com
Subject: [PATCH v3 3/3] x86: Expand the comment on when REP STOSB is used on memset
Date: Thu, 8 Feb 2024 10:08:40 -0300
Message-Id: <20240208130840.533348-4-adhemerval.zanella@linaro.org>
In-Reply-To: <20240208130840.533348-1-adhemerval.zanella@linaro.org>
References: <20240208130840.533348-1-adhemerval.zanella@linaro.org>

---
 sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: H.J. Lu

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 9984c3ca0f..97839a2248 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -21,7 +21,9 @@
    2. If size is less than VEC, use integer register stores.
    3. If size is from VEC_SIZE to 2 * VEC_SIZE, use 2 VEC stores.
    4. If size is from 2 * VEC_SIZE to 4 * VEC_SIZE, use 4 VEC stores.
-   5. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
+   5. On machines with the ERMS feature, if size is greater than or equal
+      to __x86_rep_stosb_threshold then REP STOSB will be used.
+   6. If size is more than 4 * VEC_SIZE, align to 4 * VEC_SIZE with
       4 VEC stores and store 4 * VEC at a time until done.  */

 #include
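
The size classes listed in the memset comment above can be sketched in C. This is an illustration only, not glibc code: the enum, the function choose_memset_path, and its parameters are hypothetical names; the boundary handling (< versus <=) is an assumption where the comment is ambiguous; and the checks are ordered as the comment lists them, which may differ from the order used in the assembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for the strategies named in the comment; none of
   these identifiers exist in glibc.  */
enum memset_path
{
  PATH_INTEGER_STORES,   /* size < VEC_SIZE */
  PATH_2_VEC_STORES,     /* VEC_SIZE <= size <= 2 * VEC_SIZE */
  PATH_4_VEC_STORES,     /* 2 * VEC_SIZE < size <= 4 * VEC_SIZE */
  PATH_REP_STOSB,        /* ERMS present and size >= threshold */
  PATH_ALIGNED_VEC_LOOP  /* remaining sizes above 4 * VEC_SIZE */
};

/* Classify a memset call by size.  Assumption: the small-size classes
   are tested first, so the threshold only matters above 4 * VEC_SIZE.  */
static enum memset_path
choose_memset_path (size_t size, size_t vec_size, bool has_erms,
                    size_t rep_stosb_threshold)
{
  assert (vec_size > 0);
  if (size < vec_size)
    return PATH_INTEGER_STORES;
  if (size <= 2 * vec_size)
    return PATH_2_VEC_STORES;
  if (size <= 4 * vec_size)
    return PATH_4_VEC_STORES;
  if (has_erms && size >= rep_stosb_threshold)
    return PATH_REP_STOSB;
  return PATH_ALIGNED_VEC_LOOP;
}
```

Under these assumptions, a 4096-byte request with the 2048-byte default threshold from dl-cacheinfo.h is classified as PATH_REP_STOSB, while the same request with the SIZE_MAX default that patch 2/3 sets for AMD falls through to PATH_ALIGNED_VEC_LOOP.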