From patchwork Tue Oct 31 20:09:22 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 1857749 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=gOIDZNNB; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=2620:52:3:1:0:246e:9693:128c; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SKh7Z0RMfz1yQ6 for ; Wed, 1 Nov 2023 07:09:46 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 16979385701C for ; Tue, 31 Oct 2023 20:09:44 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-yw1-x1129.google.com (mail-yw1-x1129.google.com [IPv6:2607:f8b0:4864:20::1129]) by sourceware.org (Postfix) with ESMTPS id BA5C73858C52 for ; Tue, 31 Oct 2023 20:09:32 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org BA5C73858C52 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org ARC-Filter: OpenARC Filter v1.0.0 sourceware.org BA5C73858C52 Authentication-Results: server2.sourceware.org; arc=none 
smtp.remote-ip=2607:f8b0:4864:20::1129 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698782974; cv=none; b=aSi+MElv0QWBXeWH+/PCprQmU1IkKMVwTMm7Hu8gf5v75Zt62twgbmuolo8s7hntppywOZMoOlix7IAHuPRn2P81s/WUW5v6Mh8a1HZhzputSCL+5GT2V9kjHNmL554x/e1s1k8iB5h7JOgE2J1tEcfJnxUUubNuDm8GKgetTXs= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698782974; c=relaxed/simple; bh=m3dkwBibuOaRZ+MwwBzHLQsRix84pP21dJ2j7HtC2ss=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=KgQa+JnmwuLmxWuwMi9n4aHlIXMm4+IZ56ykpIwG4x7TaalphK941umndu/8I/d6dtNMVtaLWZavcBeY578JZVBw5PjtFjt7WniMXuYK1TfN8jO+9Lm/Dt3HeEvB+CHHG/Ri503XN2B6Ftz2GediTd3/Y/7HJBKYUPYevet4wUU= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-yw1-x1129.google.com with SMTP id 00721157ae682-59e88a28b98so2145857b3.1 for ; Tue, 31 Oct 2023 13:09:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1698782971; x=1699387771; darn=sourceware.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:from:to:cc:subject:date:message-id :reply-to; bh=zIauoWevDKiUNM+VEPSV/YJFtLv+N8G4DiKdLq6Kgyw=; b=gOIDZNNB41etjGsFq6lkrEvETKE7oUvf/zsM4OtF3RSF+ymEPfAAj8sAr7nsuWIFXy bDDzmvc4oSxgetcgLwPtfD8TF133au0oBACrSJc0BqtXPEI00jrEpGJwNF0uhD8lW0Es vzr1dnHEWY9UTYIHZCXXdAaV2gyiHSAw7E1c8C7+NEn4ls1QC7mE4jwTYLzagcsuLM/p smklY3beecjyLBh5a5LD5MiQiNLepMxBxhLHDLCTuP6bjOfrWzgqHRsv+Qh/oEGapSt4 Sy6PoZ2p7Hz8yB7sEkxwdexRMwNrXKv7xZX82QEwsHj9dSA7jwWApE3YLLSz0cxywG4Y pOgw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1698782971; x=1699387771; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=zIauoWevDKiUNM+VEPSV/YJFtLv+N8G4DiKdLq6Kgyw=; b=YfzLwd9QLpAtmXOR+dXFGPAqGpKAr9TL6EMrOewgAq+2aq/McG+D46QCBMb1x/dhpc 
dciH8sVUrg65lzKb4vFN/tUauw4jeS+U/bZkZuQJOBp+ZuBFp1jjPaNoKEHV1IdZnj6C /EkBuqWi0/IBiTLigehboCH/5YB9eJN6yrUP390jG3MRVfo23ZlPfXN9EOLUaSH/I86a FvQOwQLvKNT3kGZhmNz52qQdNhlPa8GAdhtehWlUSGFXn5sdx+oyMr6f7IVF+7qbc5fg Pk7yPFk/ncysorAKuLgnwl99Apa6f935txAF14VuSbxh6BufqhQ3t7cbZKoH36ztazgh Zf7w== X-Gm-Message-State: AOJu0YyKy6y7Fc9/L/A2S10rT8I/DO7q3aDwQT8Slcb4EHW6KI+yyMRc gOr4hNfu+Nbk4rauUU9dHImfezhsU/is3B3FGavKGw== X-Google-Smtp-Source: AGHT+IFudzflezTias3iCS0bceOYYHVO74epV7SxlSRF9N9tZZkY3p9Gght60fllAOBdMlyK3T31ng== X-Received: by 2002:a81:4c58:0:b0:5a7:b4d1:c4dd with SMTP id z85-20020a814c58000000b005a7b4d1c4ddmr2761716ywa.5.1698782971412; Tue, 31 Oct 2023 13:09:31 -0700 (PDT) Received: from mandiga.. ([2804:1b3:a7c0:3d3c:6c87:9be3:8cfc:976d]) by smtp.gmail.com with ESMTPSA id q69-20020a819948000000b005a7fa3ccb32sm1264111ywg.35.2023.10.31.13.09.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 31 Oct 2023 13:09:30 -0700 (PDT) From: Adhemerval Zanella To: libc-alpha@sourceware.org, Noah Goldstein , "H . J . 
Lu" , Bruce Merry Subject: [PATCH 1/4] elf: Add a way to check if tunable is set (BZ 27069) Date: Tue, 31 Oct 2023 17:09:22 -0300 Message-Id: <20231031200925.3297456-2-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231031200925.3297456-1-adhemerval.zanella@linaro.org> References: <20231031200925.3297456-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org The tunable already keeps a field recording whether it is initialized. To query the default value, it is easier to add a new constant field. The patch adds two new macros, TUNABLE_GET_DEFAULT and TUNABLE_IS_INITIALIZED: the former gets the default value with a signature similar to TUNABLE_GET, while the latter returns whether the tunable was set by the environment. Checked on x86_64-linux-gnu. --- elf/dl-tunable-types.h | 1 + elf/dl-tunables.c | 40 ++++++++++++++++++++++++++++++++++++++++ elf/dl-tunables.h | 28 ++++++++++++++++++++++++++++ elf/dl-tunables.list | 1 + scripts/gen-tunables.awk | 4 ++-- 5 files changed, 72 insertions(+), 2 deletions(-) diff --git a/elf/dl-tunable-types.h b/elf/dl-tunable-types.h index c88332657e..c41a3b3bdb 100644 --- a/elf/dl-tunable-types.h +++ b/elf/dl-tunable-types.h @@ -61,6 +61,7 @@ struct _tunable { const char name[TUNABLE_NAME_MAX]; /* Internal name of the tunable. */ tunable_type_t type; /* Data type of the tunable. */ + const tunable_val_t def; /* The default value. 
*/ tunable_val_t val; /* The value. */ bool initialized; /* Flag to indicate that the tunable is initialized. */ diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c index cae67efa0a..79b4d542a3 100644 --- a/elf/dl-tunables.c +++ b/elf/dl-tunables.c @@ -145,6 +145,13 @@ tunable_initialize (tunable_t *cur, const char *strval) do_tunable_update_val (cur, &val, NULL, NULL); } +bool +__tunable_is_initialized (tunable_id_t id) +{ + return tunable_list[id].initialized; +} +rtld_hidden_def (__tunable_is_initialized) + void __tunable_set_val (tunable_id_t id, tunable_val_t *valp, tunable_num_t *minp, tunable_num_t *maxp) @@ -388,6 +395,39 @@ __tunables_print (void) } } +void +__tunable_get_default (tunable_id_t id, void *valp) +{ + tunable_t *cur = &tunable_list[id]; + + switch (cur->type.type_code) + { + case TUNABLE_TYPE_UINT_64: + { + *((uint64_t *) valp) = (uint64_t) cur->def.numval; + break; + } + case TUNABLE_TYPE_INT_32: + { + *((int32_t *) valp) = (int32_t) cur->def.numval; + break; + } + case TUNABLE_TYPE_SIZE_T: + { + *((size_t *) valp) = (size_t) cur->def.numval; + break; + } + case TUNABLE_TYPE_STRING: + { + *((const char **)valp) = cur->def.strval; + break; + } + default: + __builtin_unreachable (); + } +} +rtld_hidden_def (__tunable_get_default) + /* Set the tunable value. This is called by the module that the tunable exists in. 
*/ void diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h index 45c191e021..0df4dde24e 100644 --- a/elf/dl-tunables.h +++ b/elf/dl-tunables.h @@ -45,18 +45,26 @@ typedef void (*tunable_callback_t) (tunable_val_t *); extern void __tunables_init (char **); extern void __tunables_print (void); +extern bool __tunable_is_initialized (tunable_id_t); extern void __tunable_get_val (tunable_id_t, void *, tunable_callback_t); extern void __tunable_set_val (tunable_id_t, tunable_val_t *, tunable_num_t *, tunable_num_t *); +extern void __tunable_get_default (tunable_id_t id, void *valp); rtld_hidden_proto (__tunables_init) rtld_hidden_proto (__tunables_print) +rtld_hidden_proto (__tunable_is_initialized) rtld_hidden_proto (__tunable_get_val) rtld_hidden_proto (__tunable_set_val) +rtld_hidden_proto (__tunable_get_default) /* Define TUNABLE_GET and TUNABLE_SET in short form if TOP_NAMESPACE and TUNABLE_NAMESPACE are defined. This is useful shorthand to get and set tunables within a module. */ #if defined TOP_NAMESPACE && defined TUNABLE_NAMESPACE +# define TUNABLE_IS_INITIALIZED(__id) \ + TUNABLE_IS_INITIALIZED_FULL(TOP_NAMESPACE, TUNABLE_NAMESPACE, __id) +# define TUNABLE_GET_DEFAULT(__id, __type) \ + TUNABLE_GET_DEFAULT_FULL(TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type) # define TUNABLE_GET(__id, __type, __cb) \ TUNABLE_GET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __cb) # define TUNABLE_SET(__id, __val) \ @@ -65,6 +73,10 @@ rtld_hidden_proto (__tunable_set_val) TUNABLE_SET_WITH_BOUNDS_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, \ __val, __min, __max) #else +# define TUNABLE_IS_INITIALIZED(__top, __ns, __id) \ + TUNABLE_IS_INITIALIZED_FULL(__top, __ns, __id) +# define TUNABLE_GET_DEFAULT(__top, __ns, __id, __type) \ + TUNABLE_GET_DEFAULT_FULL(__top, __ns, __id, __type) # define TUNABLE_GET(__top, __ns, __id, __type, __cb) \ TUNABLE_GET_FULL (__top, __ns, __id, __type, __cb) # define TUNABLE_SET(__top, __ns, __id, __val) \ @@ -73,6 +85,22 @@ rtld_hidden_proto 
(__tunable_set_val) TUNABLE_SET_WITH_BOUNDS_FULL (__top, __ns, __id, __val, __min, __max) #endif +/* Return whether the tunable was initialized by the environment variable. */ +#define TUNABLE_IS_INITIALIZED_FULL(__top, __ns, __id) \ +({ \ + tunable_id_t id = TUNABLE_ENUM_NAME (__top, __ns, __id); \ + __tunable_is_initialized (id); \ +}) + +/* Return the default value of the tunable. */ +#define TUNABLE_GET_DEFAULT_FULL(__top, __ns, __id, __type) \ +({ \ + tunable_id_t id = TUNABLE_ENUM_NAME (__top, __ns, __id); \ + __type __ret; \ + __tunable_get_default (id, &__ret); \ + __ret; \ +}) + /* Get and return a tunable value. If the tunable was set externally and __CB is defined then call __CB before returning the value. */ #define TUNABLE_GET_FULL(__top, __ns, __id, __type, __cb) \ diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list index 695ba7192e..5bb858b1d8 100644 --- a/elf/dl-tunables.list +++ b/elf/dl-tunables.list @@ -20,6 +20,7 @@ # type: Defaults to STRING # minval: Optional minimum acceptable value # maxval: Optional maximum acceptable value +# default: Optional default value (if not specified it will be 0 or "") # env_alias: An alias environment variable # security_level: Specify security level of the tunable for AT_SECURE binaries. 
# Valid values are: diff --git a/scripts/gen-tunables.awk b/scripts/gen-tunables.awk index d6de100df0..9726b05217 100644 --- a/scripts/gen-tunables.awk +++ b/scripts/gen-tunables.awk @@ -177,8 +177,8 @@ END { n = indices[2]; m = indices[3]; printf (" {TUNABLE_NAME_S(%s, %s, %s)", t, n, m) - printf (", {TUNABLE_TYPE_%s, %s, %s}, {%s}, false, TUNABLE_SECLEVEL_%s, %s},\n", - types[t,n,m], minvals[t,n,m], maxvals[t,n,m], + printf (", {TUNABLE_TYPE_%s, %s, %s}, {%s}, {%s}, false, TUNABLE_SECLEVEL_%s, %s},\n", + types[t,n,m], minvals[t,n,m], maxvals[t,n,m], default_val[t,n,m], default_val[t,n,m], security_level[t,n,m], env_alias[t,n,m]); } print "};" From patchwork Tue Oct 31 20:09:23 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 1857750 From: Adhemerval Zanella To: libc-alpha@sourceware.org, Noah Goldstein , "H . J . 
Lu" , Bruce Merry Subject: [PATCH 2/4] x86: Fix Zen3/Zen4 ERMS selection (BZ 30994) Date: Tue, 31 Oct 2023 17:09:23 -0300 Message-Id: <20231031200925.3297456-3-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231031200925.3297456-1-adhemerval.zanella@linaro.org> References: <20231031200925.3297456-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 The REP MOVSB usage on memcpy/memmove does not show any performance gain on Zen3/Zen4 cores compared to the vectorized loops. Also, as reported in BZ 30994, if the source is aligned and the destination is not, the performance can be as much as 20x slower. The performance difference is most noticeable with small buffer sizes, close to the lower bound at which memcpy/memmove starts to use ERMS. The performance of REP MOVSB is similar to that of the vectorized instructions at the size limit (the L2 cache). Also, there is no drawback to multiple cores sharing the cache. A new tunable, glibc.cpu.x86_rep_movsb_stop_threshold, allows setting the upper bound size for using 'rep movsb'. Checked on x86_64-linux-gnu on Zen3. 
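For reviewers, the selection logic this patch introduces can be sketched as a standalone function. This is only an illustration under stated assumptions: `select_rep_movsb_stop_threshold`, `struct tunable_state`, and `enum cpu_vendor` are hypothetical names that do not exist in glibc, and the tunable state is passed in explicitly instead of being read through TUNABLE_GET and TUNABLE_IS_INITIALIZED.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for glibc internals; these names are
   illustrative only and are not part of glibc.  */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct tunable_state
{
  bool initialized;  /* True only if the user set the tunable.  */
  size_t value;      /* User-supplied value; meaningful if initialized.  */
};

/* Pick the upper bound for using REP MOVSB.  A user value is honored
   only if it was set and is larger than the start threshold; otherwise
   the vendor default applies: 0 on AMD (the stop bound is not above any
   start threshold, so REP MOVSB is never selected) and the non-temporal
   threshold elsewhere.  */
static size_t
select_rep_movsb_stop_threshold (enum cpu_vendor vendor,
                                 struct tunable_state stop,
                                 size_t rep_movsb_threshold,
                                 size_t non_temporal_threshold)
{
  if (stop.initialized && stop.value > rep_movsb_threshold)
    return stop.value;

  if (vendor == VENDOR_AMD)
    return 0;

  return non_temporal_threshold;
}
```

With a start threshold of 2048 and a non-temporal threshold of 786432 (both made-up values), an unset tunable would give 0 on AMD and 786432 on Intel, while a user value of 1048576 would be used as-is on either vendor.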
--- manual/tunables.texi | 9 ++++++ sysdeps/x86/dl-cacheinfo.h | 58 +++++++++++++++++++++++------------- sysdeps/x86/dl-tunables.list | 10 +++++++ 3 files changed, 56 insertions(+), 21 deletions(-) diff --git a/manual/tunables.texi b/manual/tunables.texi index 776fd93fd9..5d3263bc2e 100644 --- a/manual/tunables.texi +++ b/manual/tunables.texi @@ -570,6 +570,15 @@ greater than zero, and currently defaults to 2048 bytes. This tunable is specific to i386 and x86-64. @end deftp +@deftp Tunable glibc.cpu.x86_rep_movsb_stop_threshold +The @code{glibc.cpu.x86_rep_movsb_stop_threshold} tunable allows the user to +set the threshold in bytes to stop using "rep movsb". The value must be +greater than zero, and its default depends on the CPU and the +cache size. + +This tunable is specific to i386 and x86-64. +@end deftp + @deftp Tunable glibc.cpu.x86_rep_stosb_threshold The @code{glibc.cpu.x86_rep_stosb_threshold} tunable allows the user to set threshold in bytes to start using "rep stosb". The value must be diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h index 87486054f9..51e5ba200f 100644 --- a/sysdeps/x86/dl-cacheinfo.h +++ b/sysdeps/x86/dl-cacheinfo.h @@ -784,6 +784,14 @@ get_common_cache_info (long int *shared_ptr, long int * shared_per_thread_ptr, u *threads_ptr = threads; } +static inline bool +is_rep_movsb_stop_threshold_valid (unsigned long int v) +{ + unsigned long int rep_movsb_threshold + = TUNABLE_GET (x86_rep_movsb_threshold, long int, NULL); + return v > rep_movsb_threshold; +} + static void dl_init_cacheinfo (struct cpu_features *cpu_features) { @@ -791,7 +799,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) long int data = -1; long int shared = -1; long int shared_per_thread = -1; - long int core = -1; unsigned int threads = 0; unsigned long int level1_icache_size = -1; unsigned long int level1_icache_linesize = -1; @@ -809,7 +816,6 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) if (cpu_features->basic.kind == 
arch_kind_intel) { data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features); - core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features); shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features); shared_per_thread = shared; @@ -822,7 +828,8 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) = handle_intel (_SC_LEVEL1_DCACHE_ASSOC, cpu_features); level1_dcache_linesize = handle_intel (_SC_LEVEL1_DCACHE_LINESIZE, cpu_features); - level2_cache_size = core; + level2_cache_size + = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features); level2_cache_assoc = handle_intel (_SC_LEVEL2_CACHE_ASSOC, cpu_features); level2_cache_linesize @@ -835,12 +842,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) level4_cache_size = handle_intel (_SC_LEVEL4_CACHE_SIZE, cpu_features); - get_common_cache_info (&shared, &shared_per_thread, &threads, core); + get_common_cache_info (&shared, &shared_per_thread, &threads, + level2_cache_size); } else if (cpu_features->basic.kind == arch_kind_zhaoxin) { data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE); - core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE); shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE); shared_per_thread = shared; @@ -849,19 +856,19 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) level1_dcache_size = data; level1_dcache_assoc = handle_zhaoxin (_SC_LEVEL1_DCACHE_ASSOC); level1_dcache_linesize = handle_zhaoxin (_SC_LEVEL1_DCACHE_LINESIZE); - level2_cache_size = core; + level2_cache_size = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE); level2_cache_assoc = handle_zhaoxin (_SC_LEVEL2_CACHE_ASSOC); level2_cache_linesize = handle_zhaoxin (_SC_LEVEL2_CACHE_LINESIZE); level3_cache_size = shared; level3_cache_assoc = handle_zhaoxin (_SC_LEVEL3_CACHE_ASSOC); level3_cache_linesize = handle_zhaoxin (_SC_LEVEL3_CACHE_LINESIZE); - get_common_cache_info (&shared, &shared_per_thread, &threads, core); + get_common_cache_info (&shared, &shared_per_thread, &threads, + level2_cache_size); } else if (cpu_features->basic.kind == 
arch_kind_amd) { data = handle_amd (_SC_LEVEL1_DCACHE_SIZE); - core = handle_amd (_SC_LEVEL2_CACHE_SIZE); shared = handle_amd (_SC_LEVEL3_CACHE_SIZE); level1_icache_size = handle_amd (_SC_LEVEL1_ICACHE_SIZE); @@ -869,7 +876,7 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) level1_dcache_size = data; level1_dcache_assoc = handle_amd (_SC_LEVEL1_DCACHE_ASSOC); level1_dcache_linesize = handle_amd (_SC_LEVEL1_DCACHE_LINESIZE); - level2_cache_size = core; + level2_cache_size = handle_amd (_SC_LEVEL2_CACHE_SIZE); level2_cache_assoc = handle_amd (_SC_LEVEL2_CACHE_ASSOC); level2_cache_linesize = handle_amd (_SC_LEVEL2_CACHE_LINESIZE); level3_cache_size = shared; @@ -880,12 +887,12 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) if (shared <= 0) { /* No shared L3 cache. All we have is the L2 cache. */ - shared = core; + shared = level2_cache_size; } else if (cpu_features->basic.family < 0x17) { /* Account for exclusive L2 and L3 caches. */ - shared += core; + shared += level2_cache_size; } shared_per_thread = shared; @@ -1028,16 +1035,25 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) SIZE_MAX); unsigned long int rep_movsb_stop_threshold; - /* ERMS feature is implemented from AMD Zen3 architecture and it is - performing poorly for data above L2 cache size. Henceforth, adding - an upper bound threshold parameter to limit the usage of Enhanced - REP MOVSB operations and setting its value to L2 cache size. */ - if (cpu_features->basic.kind == arch_kind_amd) - rep_movsb_stop_threshold = core; - /* Setting the upper bound of ERMS to the computed value of - non-temporal threshold for architectures other than AMD. */ - else - rep_movsb_stop_threshold = non_temporal_threshold; + /* If the tunable is not set or if the value is not larger than + x86_rep_movsb_threshold, use the default values. 
*/ + rep_movsb_stop_threshold = TUNABLE_GET (x86_rep_movsb_stop_threshold, + long int, NULL); + if (!TUNABLE_IS_INITIALIZED (x86_rep_movsb_stop_threshold) + || !is_rep_movsb_stop_threshold_valid (rep_movsb_stop_threshold)) + { + /* For AMD cpus that support ERMS (Zen3+), REP MOVSB is in many cases + slower than the vectorized path (and for some alignments it is very + slow, see BZ #30994). */ + if (cpu_features->basic.kind == arch_kind_amd) + rep_movsb_stop_threshold = 0; + else + /* Setting the upper bound of ERMS to the computed value of + non-temporal threshold for architectures other than AMD. */ + rep_movsb_stop_threshold = non_temporal_threshold; + } + TUNABLE_SET_WITH_BOUNDS (x86_rep_stosb_threshold, rep_stosb_threshold, 1, + SIZE_MAX); cpu_features->data_cache_size = data; cpu_features->shared_cache_size = shared; diff --git a/sysdeps/x86/dl-tunables.list b/sysdeps/x86/dl-tunables.list index feb7004036..5e9831b610 100644 --- a/sysdeps/x86/dl-tunables.list +++ b/sysdeps/x86/dl-tunables.list @@ -49,6 +49,16 @@ glibc { # if the tunable value is set by user or not [BZ #27069]. minval: 1 } + x86_rep_movsb_stop_threshold { + # For AMD cpus that support ERMS (Zen3+), REP MOVSB is not faster + # than the vectorized path (and for some destination alignments it + # is very slow, see BZ #30994). On Intel cpus, the size limit + # to use ERMS is [1/8, 1/2] of the size of the chip's cache (see + # dl-cacheinfo.h). + # This tunable allows the caller to set the upper limit for using + # REP MOVSB on memcpy/memmove. 
+ type: SIZE_T + } x86_rep_stosb_threshold { type: SIZE_T # Since there is overhead to set up REP STOSB operation, REP STOSB From patchwork Tue Oct 31 20:09:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 1857752 From: Adhemerval Zanella To: libc-alpha@sourceware.org, Noah Goldstein , "H . J . 
Lu" , Bruce Merry Subject: [PATCH 3/4] x86: Do not prefer ERMS for memset on Zen3+ Date: Tue, 31 Oct 2023 17:09:24 -0300 Message-Id: <20231031200925.3297456-4-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231031200925.3297456-1-adhemerval.zanella@linaro.org> References: <20231031200925.3297456-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 The REP STOSB usage on memset does not show any performance gain on Zen3/Zen4 cores compared to the vectorized loops. Checked on x86_64-linux-gnu. --- sysdeps/x86/dl-cacheinfo.h | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h index 51e5ba200f..99ba0f776a 100644 --- a/sysdeps/x86/dl-cacheinfo.h +++ b/sysdeps/x86/dl-cacheinfo.h @@ -1018,11 +1018,17 @@ dl_init_cacheinfo (struct cpu_features *cpu_features) if (tunable_size > minimum_rep_movsb_threshold) rep_movsb_threshold = tunable_size; - /* NB: The default value of the x86_rep_stosb_threshold tunable is the - same as the default value of __x86_rep_stosb_threshold and the - minimum value is fixed. */ - rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold, - long int, NULL); + /* On the AMD Zen3+ architecture, the vectorized loop performs + slightly better than ERMS. 
*/ + if (cpu_features->basic.kind == arch_kind_amd) + rep_stosb_threshold = SIZE_MAX; + + if (TUNABLE_IS_INITIALIZED (x86_rep_stosb_threshold)) + /* NB: The default value of the x86_rep_stosb_threshold tunable is the + same as the default value of __x86_rep_stosb_threshold and the + minimum value is fixed. */ + rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold, + long int, NULL); TUNABLE_SET_WITH_BOUNDS (x86_data_cache_size, data, 0, SIZE_MAX); TUNABLE_SET_WITH_BOUNDS (x86_shared_cache_size, shared, 0, SIZE_MAX); From patchwork Tue Oct 31 20:09:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 1857751 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=linaro.org header.i=@linaro.org header.a=rsa-sha256 header.s=google header.b=hGe9aw1A; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=sourceware.org (client-ip=8.43.85.97; helo=server2.sourceware.org; envelope-from=libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org; receiver=patchwork.ozlabs.org) Received: from server2.sourceware.org (server2.sourceware.org [8.43.85.97]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4SKh7x5Qmtz1yQ6 for ; Wed, 1 Nov 2023 07:10:05 +1100 (AEDT) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9A3AA385DC17 for ; Tue, 31 Oct 2023 20:10:03 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-yw1-x1129.google.com (mail-yw1-x1129.google.com 
[IPv6:2607:f8b0:4864:20::1129])
 by sourceware.org (Postfix) with ESMTPS id 56F613857732
 for ; Tue, 31 Oct 2023 20:09:38 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 56F613857732
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=linaro.org
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org
ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 56F613857732
Authentication-Results: server2.sourceware.org;
 arc=none smtp.remote-ip=2607:f8b0:4864:20::1129
ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1698782979; cv=none;
 b=Uz8CMHeRfQijPXD2WLsKsPx7V2PfdoJWNzub+388buOA5kXe97OUO2xWu25SyD7/LcUgeD/hasPWg/a/H9xKb/ox6wVhtg59B2rPT+exKOEL1/3sZsJ80JVEUAU0nlqK6/9+oDQ4+ZwQV0v1vHUffZZvkVvRCMCLJOItmziQRXg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key;
 t=1698782979; c=relaxed/simple;
 bh=kLCbBIidU3UsjLdccD2m4jrtJ6R7Ek0qnK9dQSWWjAM=;
 h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version;
 b=X55R0X4ak4BLSnQna2CImOzIY+RlQchWyDbJf4DyDmYS4B4cqglKXP0JbbkrjWAd3RxBml31gagvOOdXAlXTjz25Gt4IWldY1WWtimF3f0fvaQTHiMoWt3geA8rVCY5oHzOiLbedeNZYgz4xG6rjIkIaa6+1I3UqGbnx9nAz3gQ=
ARC-Authentication-Results: i=1; server2.sourceware.org
Received: by mail-yw1-x1129.google.com with SMTP id
 00721157ae682-5a7e5dc8573so59910097b3.0
 for ; Tue, 31 Oct 2023 13:09:38 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=linaro.org; s=google; t=1698782977; x=1699387777; darn=sourceware.org;
 h=content-transfer-encoding:mime-version:references:in-reply-to
 :message-id:date:subject:to:from:from:to:cc:subject:date:message-id
 :reply-to;
 bh=/ycWYI7V3yRtlW1yrmQJ6N5wMgUs4/SIymF2BafYsjw=;
 b=hGe9aw1A50CFvi3zpvCjc8PEVyiVzsGq0kTVHX3MOXCvEyAejZXShXSBelqDHwLvLu
 WR6pwR0GKXG6RLsYtu5FaZvlPqaTuo7qDph5/yGZqNLkIao+fl9QaRjt22X2u43TgQvm
 q+3vIZphafjq7aqXPgms+M4OdbGVGPIEt2TjnnlsS7vdgybwiTNcUNC7sb6exHRwD8sx
 Qpumzynz9R3b2R5sYZjCLvY3GuFmqKZ3+BxPv5TYjoxc3RO2vcBCADhI5bXeN60xs8SR
XIwebPq6TPfuNaqN6GjkEiR5ABUlEnUs1XxtCsYWOFiexHyiwYpUd4uDo2tmREMYNV9w
 H5rQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20230601; t=1698782977; x=1699387777;
 h=content-transfer-encoding:mime-version:references:in-reply-to
 :message-id:date:subject:to:from:x-gm-message-state:from:to:cc
 :subject:date:message-id:reply-to;
 bh=/ycWYI7V3yRtlW1yrmQJ6N5wMgUs4/SIymF2BafYsjw=;
 b=ILUtMW39x3SWtZZc2BZ1ZZCTC9KAIiqUCyZhMCDhqyg4zL7hoJ185pvDqs04bas+l5
 w9fjl1MgGxbBmsfZzfk09m1Pmd2KVN/L6KhHMhXSJSxUqiI7Ywd3bEyloXcPmeylCL+Y
 dAsZNHEIXu024VdSFPkUb1fz+2SC02+RKZhKdrQQLqGJVmOZRhFWwg7vU+lm+fwPIEOK
 +uxv7Ty8VbuUejZRQu3LpH9HeVJevVBVm+i6yfVLxJsQ/HqmB1H2vqnR/TZsJdZ56GNg
 qCXYWVQbU8Uoyv3ad45W19pmFxHNlytO9A+Dfio/OCSG72Bkj8W1tpuXiLK3b/+ZrTR5
 Rvig==
X-Gm-Message-State: AOJu0YwtqVzCXj/NYjIWRaqckXF2Qv7/RKAPEsTazSnmmuHx8MG4ff0F
 rBhG7XIoiHcJdwhhDrYfvbdCB7ccl4CXQdQE5PMk/w==
X-Google-Smtp-Source: AGHT+IEB4onQzXInrbqpvv6G/jT0cAIoSe40m2ug1nbXgVP4L14i+Xg9WTrbBx2mjutz1cKtcknS3w==
X-Received: by 2002:a81:d00d:0:b0:592:ffc:c787 with SMTP id
 v13-20020a81d00d000000b005920ffcc787mr13368074ywi.30.1698782977120;
 Tue, 31 Oct 2023 13:09:37 -0700 (PDT)
Received: from mandiga.. ([2804:1b3:a7c0:3d3c:6c87:9be3:8cfc:976d])
 by smtp.gmail.com with ESMTPSA id
 q69-20020a819948000000b005a7fa3ccb32sm1264111ywg.35.2023.10.31.13.09.35
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Tue, 31 Oct 2023 13:09:36 -0700 (PDT)
From: Adhemerval Zanella
To: libc-alpha@sourceware.org, Noah Goldstein , "H . J .
Lu" , Bruce Merry
Subject: [PATCH 4/4] x86: Expand the comment on when REP STOSB is used on
 memset
Date: Tue, 31 Oct 2023 17:09:25 -0300
Message-Id: <20231031200925.3297456-5-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231031200925.3297456-1-adhemerval.zanella@linaro.org>
References: <20231031200925.3297456-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0
X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE,
 SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham
 autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Libc-alpha mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: libc-alpha-bounces+incoming=patchwork.ozlabs.org@sourceware.org

---
 sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
index 3d9ad49cb9..0821b32997 100644
--- a/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
+++ b/sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
@@ -21,7 +21,9 @@
    2. If size is less than VEC, use integer register stores.
    3. If size is from VEC_SIZE to 2 * VEC_SIZE, use 2 VEC stores.
    4. If size is from 2 * VEC_SIZE to 4 * VEC_SIZE, use 4 VEC stores.
-   5. If size is more to 4 * VEC_SIZE, align to 4 * VEC_SIZE with
+   5. On machines with the ERMS feature, if size is greater than or equal to
+      __x86_rep_stosb_threshold, then REP STOSB will be used.
+   6. If size is more than 4 * VEC_SIZE, align to 4 * VEC_SIZE with
       4 VEC stores and store 4 * VEC at a time until done.  */
 
 #include