[{"id":3681361,"web_url":"http://patchwork.ozlabs.org/comment/3681361/","msgid":"<ebfb924e-49c0-787f-b5f2-c201199e1345@ssi.bg>","list_archive_url":null,"date":"2026-04-23T10:05:00","subject":"Re: [PATCHv2 net] ipvs: fix races around est_mutex and est_cpulist","submitter":{"id":2825,"url":"http://patchwork.ozlabs.org/api/people/2825/","name":"Julian Anastasov","email":"ja@ssi.bg"},"content":"Hello,\n\nOn Wed, 22 Apr 2026, Julian Anastasov wrote:\n\n> Sashiko reports for races and possible crash around\n> the usage of est_cpulist_valid and sysctl_est_cpulist.\n> The problem is that we do not lock est_mutex in some\n> places which can lead to wrong write ordering and\n> as result problems when calling cpumask_weight()\n> and cpumask_empty().\n> \n> Fix them by moving the est_max_threads read/write under\n> locked est_mutex. Do the same for one ip_vs_est_reload_start()\n> call to protect the cpumask_empty() usage of sysctl_est_cpulist.\n> \n> To remove the chance of deadlock while stopping the\n> estimation kthreads, use the service_mutex to walk the\n> array with kthreads and hold it during kthread_stop().\n> OTOH, est_mutex is needed only while starting the\n> kthreads, not for stopping. Reorganize the code in\n> kthread 0 to use mutex_trylock() for the service_mutex\n> to ensure kthread_should_stop() is not true while we\n> attempt to lock the mutex.\n> \n> Link: https://sashiko.dev/#/patchset/20260331165015.2777765-1-longman%40redhat.com\n> Link: https://sashiko.dev/#/patchset/20260420171308.87192-1-ja%40ssi.bg\n> Fixes: f0be83d54217 (\"ipvs: add est_cpulist and est_nice sysctl vars\")\n> Signed-off-by: Julian Anastasov <ja@ssi.bg>\n\n\tAccording to Sashiko, this patch needs more\nwork. 
There is a way to change how kthread 0 is stopped.\nI'll send new patch version soon...\n\npw-bot: changes-requested\n\n> ---\n>  net/netfilter/ipvs/ip_vs_ctl.c | 11 +++++--\n>  net/netfilter/ipvs/ip_vs_est.c | 59 ++++++++++++++++++++++------------\n>  2 files changed, 47 insertions(+), 23 deletions(-)\n> \n> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c\n> index caec516856e9..fda166aca4fb 100644\n> --- a/net/netfilter/ipvs/ip_vs_ctl.c\n> +++ b/net/netfilter/ipvs/ip_vs_ctl.c\n> @@ -250,7 +250,7 @@ static void est_reload_work_handler(struct work_struct *work)\n>  \tint genid;\n>  \tint id;\n>  \n> -\tmutex_lock(&ipvs->est_mutex);\n> +\tmutex_lock(&ipvs->service_mutex);\n>  \tgenid = atomic_read(&ipvs->est_genid);\n>  \tfor (id = 0; id < ipvs->est_kt_count; id++) {\n>  \t\tstruct ip_vs_est_kt_data *kd = ipvs->est_kt_arr[id];\n> @@ -263,12 +263,14 @@ static void est_reload_work_handler(struct work_struct *work)\n>  \t\t/* New config ? Stop kthread tasks */\n>  \t\tif (genid != genid_done)\n>  \t\t\tip_vs_est_kthread_stop(kd);\n> +\t\tmutex_lock(&ipvs->est_mutex);\n>  \t\tif (!kd->task && !ip_vs_est_stopped(ipvs)) {\n>  \t\t\t/* Do not start kthreads above 0 in calc phase */\n>  \t\t\tif ((!id || !ipvs->est_calc_phase) &&\n>  \t\t\t    ip_vs_est_kthread_start(ipvs, kd) < 0)\n>  \t\t\t\trepeat = true;\n>  \t\t}\n> +\t\tmutex_unlock(&ipvs->est_mutex);\n>  \t}\n>  \n>  \tatomic_set(&ipvs->est_genid_done, genid);\n> @@ -278,7 +280,7 @@ static void est_reload_work_handler(struct work_struct *work)\n>  \t\t\t\t   delay);\n>  \n>  unlock:\n> -\tmutex_unlock(&ipvs->est_mutex);\n> +\tmutex_unlock(&ipvs->service_mutex);\n>  }\n>  \n>  static int get_conn_tab_size(struct netns_ipvs *ipvs)\n> @@ -1812,11 +1814,16 @@ ip_vs_add_service(struct netns_ipvs *ipvs, struct ip_vs_service_user_kern *u,\n>  \t*svc_p = svc;\n>  \n>  \tif (!READ_ONCE(ipvs->enable)) {\n> +\t\tmutex_lock(&ipvs->est_mutex);\n> +\n>  \t\t/* Now there is a service - full throttle 
*/\n>  \t\tWRITE_ONCE(ipvs->enable, 1);\n>  \n> +\t\tipvs->est_max_threads = ip_vs_est_max_threads(ipvs);\n> +\n>  \t\t/* Start estimation for first time */\n>  \t\tip_vs_est_reload_start(ipvs);\n> +\t\tmutex_unlock(&ipvs->est_mutex);\n>  \t}\n>  \n>  \treturn 0;\n> diff --git a/net/netfilter/ipvs/ip_vs_est.c b/net/netfilter/ipvs/ip_vs_est.c\n> index 433ba3cab58c..216e3c354125 100644\n> --- a/net/netfilter/ipvs/ip_vs_est.c\n> +++ b/net/netfilter/ipvs/ip_vs_est.c\n> @@ -68,6 +68,10 @@\n>      and the limit of estimators per kthread\n>    - est_add_ktid: ktid where to add new ests, can point to empty slot where\n>      we should add kt data\n> +  - data protected by service_mutex: est_temp_list, est_add_ktid,\n> +    est_kt_count, est_kt_arr, est_genid_done, kd->task\n> +  - data protected by est_mutex: est_genid, est_max_threads, sysctl_est_cpulist,\n> +    est_cpulist_valid, sysctl_est_nice, est_stopped, sysctl_run_estimation\n>   */\n>  \n>  static struct lock_class_key __ipvs_est_key;\n> @@ -229,6 +233,8 @@ static int ip_vs_estimation_kthread(void *data)\n>  /* Schedule stop/start for kthread tasks */\n>  void ip_vs_est_reload_start(struct netns_ipvs *ipvs)\n>  {\n> +\tlockdep_assert_held(&ipvs->est_mutex);\n> +\n>  \t/* Ignore reloads before first service is added */\n>  \tif (!READ_ONCE(ipvs->enable))\n>  \t\treturn;\n> @@ -304,12 +310,17 @@ static int ip_vs_est_add_kthread(struct netns_ipvs *ipvs)\n>  \tvoid *arr = NULL;\n>  \tint i;\n>  \n> -\tif ((unsigned long)ipvs->est_kt_count >= ipvs->est_max_threads &&\n> -\t    READ_ONCE(ipvs->enable) && ipvs->est_max_threads)\n> -\t\treturn -EINVAL;\n> -\n>  \tmutex_lock(&ipvs->est_mutex);\n>  \n> +\t/* Allow kt 0 data to be created before the services are added\n> +\t * and limit the kthreads when services are present.\n> +\t */\n> +\tif ((unsigned long)ipvs->est_kt_count >= ipvs->est_max_threads &&\n> +\t    READ_ONCE(ipvs->enable) && ipvs->est_max_threads) {\n> +\t\tret = -EINVAL;\n> +\t\tgoto out;\n> +\t}\n> +\n>  
\tfor (i = 0; i < id; i++) {\n>  \t\tif (!ipvs->est_kt_arr[i])\n>  \t\t\tbreak;\n> @@ -485,9 +496,6 @@ int ip_vs_start_estimator(struct netns_ipvs *ipvs, struct ip_vs_stats *stats)\n>  \tstruct ip_vs_estimator *est = &stats->est;\n>  \tint ret;\n>  \n> -\tif (!ipvs->est_max_threads && READ_ONCE(ipvs->enable))\n> -\t\tipvs->est_max_threads = ip_vs_est_max_threads(ipvs);\n> -\n>  \test->ktid = -1;\n>  \test->ktrow = IPVS_EST_NTICKS - 1;\t/* Initial delay */\n>  \n> @@ -561,7 +569,6 @@ void ip_vs_stop_estimator(struct netns_ipvs *ipvs, struct ip_vs_stats *stats)\n>  \t}\n>  \n>  \tif (ktid > 0) {\n> -\t\tmutex_lock(&ipvs->est_mutex);\n>  \t\tip_vs_est_kthread_destroy(kd);\n>  \t\tipvs->est_kt_arr[ktid] = NULL;\n>  \t\tif (ktid == ipvs->est_kt_count - 1) {\n> @@ -570,7 +577,6 @@ void ip_vs_stop_estimator(struct netns_ipvs *ipvs, struct ip_vs_stats *stats)\n>  \t\t\t       !ipvs->est_kt_arr[ipvs->est_kt_count - 1])\n>  \t\t\t\tipvs->est_kt_count--;\n>  \t\t}\n> -\t\tmutex_unlock(&ipvs->est_mutex);\n>  \n>  \t\t/* This slot is now empty, prefer another available kt slot */\n>  \t\tif (ktid == ipvs->est_add_ktid)\n> @@ -582,13 +588,11 @@ void ip_vs_stop_estimator(struct netns_ipvs *ipvs, struct ip_vs_stats *stats)\n>  \tif (ipvs->est_kt_count == 1 && hlist_empty(&ipvs->est_temp_list)) {\n>  \t\tkd = ipvs->est_kt_arr[0];\n>  \t\tif (!kd || !kd->est_count) {\n> -\t\t\tmutex_lock(&ipvs->est_mutex);\n>  \t\t\tif (kd) {\n>  \t\t\t\tip_vs_est_kthread_destroy(kd);\n>  \t\t\t\tipvs->est_kt_arr[0] = NULL;\n>  \t\t\t}\n>  \t\t\tipvs->est_kt_count--;\n> -\t\t\tmutex_unlock(&ipvs->est_mutex);\n>  \t\t\tipvs->est_add_ktid = 0;\n>  \t\t}\n>  \t}\n> @@ -602,13 +606,17 @@ static void ip_vs_est_drain_temp_list(struct netns_ipvs *ipvs)\n>  \twhile (1) {\n>  \t\tint max = 16;\n>  \n> -\t\tmutex_lock(&ipvs->service_mutex);\n> +\t\twhile (!mutex_trylock(&ipvs->service_mutex)) {\n> +\t\t\tif (kthread_should_stop() || !READ_ONCE(ipvs->enable))\n> +\t\t\t\treturn;\n> +\t\t\tcond_resched();\n> 
+\t\t}\n>  \n>  \t\twhile (max-- > 0) {\n>  \t\t\test = hlist_entry_safe(ipvs->est_temp_list.first,\n>  \t\t\t\t\t       struct ip_vs_estimator, list);\n>  \t\t\tif (est) {\n> -\t\t\t\tif (kthread_should_stop())\n> +\t\t\t\tif (!READ_ONCE(ipvs->enable))\n>  \t\t\t\t\tgoto unlock;\n>  \t\t\t\thlist_del_init(&est->list);\n>  \t\t\t\tif (ip_vs_enqueue_estimator(ipvs, est) >= 0)\n> @@ -647,7 +655,11 @@ static int ip_vs_est_calc_limits(struct netns_ipvs *ipvs, int *chain_max)\n>  \tu64 val;\n>  \n>  \tINIT_HLIST_HEAD(&chain);\n> -\tmutex_lock(&ipvs->service_mutex);\n> +\twhile (!mutex_trylock(&ipvs->service_mutex)) {\n> +\t\tif (kthread_should_stop() || !READ_ONCE(ipvs->enable))\n> +\t\t\treturn 0;\n> +\t\tcond_resched();\n> +\t}\n>  \tkd = ipvs->est_kt_arr[0];\n>  \tmutex_unlock(&ipvs->service_mutex);\n>  \ts = kd ? kd->calc_stats : NULL;\n> @@ -748,22 +760,24 @@ static void ip_vs_est_calc_phase(struct netns_ipvs *ipvs)\n>  \tif (!ip_vs_est_calc_limits(ipvs, &chain_max))\n>  \t\treturn;\n>  \n> -\tmutex_lock(&ipvs->service_mutex);\n> +\twhile (!mutex_trylock(&ipvs->service_mutex)) {\n> +\t\tif (kthread_should_stop() || !READ_ONCE(ipvs->enable))\n> +\t\t\treturn;\n> +\t\tcond_resched();\n> +\t}\n>  \n>  \t/* Stop all other tasks, so that we can immediately move the\n>  \t * estimators to est_temp_list without RCU grace period\n>  \t */\n> -\tmutex_lock(&ipvs->est_mutex);\n>  \tfor (id = 1; id < ipvs->est_kt_count; id++) {\n>  \t\t/* netns clean up started, abort */\n> -\t\tif (!READ_ONCE(ipvs->enable))\n> -\t\t\tgoto unlock2;\n> +\t\tif (kthread_should_stop() || !READ_ONCE(ipvs->enable))\n> +\t\t\tgoto unlock;\n>  \t\tkd = ipvs->est_kt_arr[id];\n>  \t\tif (!kd)\n>  \t\t\tcontinue;\n>  \t\tip_vs_est_kthread_stop(kd);\n>  \t}\n> -\tmutex_unlock(&ipvs->est_mutex);\n>  \n>  \t/* Move all estimators to est_temp_list but carefully,\n>  \t * all estimators and kthread data can be released while\n> @@ -817,7 +831,11 @@ static void ip_vs_est_calc_phase(struct netns_ipvs 
*ipvs)\n>  \t\t */\n>  \t\tmutex_unlock(&ipvs->service_mutex);\n>  \t\tcond_resched();\n> -\t\tmutex_lock(&ipvs->service_mutex);\n> +\t\twhile (!mutex_trylock(&ipvs->service_mutex)) {\n> +\t\t\tif (kthread_should_stop() || !READ_ONCE(ipvs->enable))\n> +\t\t\t\treturn;\n> +\t\t\tcond_resched();\n> +\t\t}\n>  \n>  \t\t/* Current kt released ? */\n>  \t\tif (id >= ipvs->est_kt_count)\n> @@ -889,7 +907,6 @@ static void ip_vs_est_calc_phase(struct netns_ipvs *ipvs)\n>  \tif (genid == atomic_read(&ipvs->est_genid))\n>  \t\tipvs->est_calc_phase = 0;\n>  \n> -unlock2:\n>  \tmutex_unlock(&ipvs->est_mutex);\n>  \n>  unlock:\n> -- \n> 2.53.0\n\nRegards\n\n--\nJulian Anastasov <ja@ssi.bg>","headers":{"Return-Path":"\n <netfilter-devel+bounces-12151-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","netfilter-devel@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (4096-bit key;\n unprotected) header.d=ssi.bg header.i=@ssi.bg header.a=rsa-sha256\n header.s=ssi header.b=j57NngAO;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c0a:e001:db::12fc:5321; helo=sea.lore.kernel.org;\n envelope-from=netfilter-devel+bounces-12151-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (4096-bit key) header.d=ssi.bg header.i=@ssi.bg header.b=\"j57NngAO\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=193.238.174.39","smtp.subspace.kernel.org;\n dmarc=pass (p=reject dis=none) header.from=ssi.bg","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=ssi.bg"],"Received":["from sea.lore.kernel.org (sea.lore.kernel.org\n [IPv6:2600:3c0a:e001:db::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client 
certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g1X3t3Z5cz1yCv\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 23 Apr 2026 20:13:30 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sea.lore.kernel.org (Postfix) with ESMTP id C31F830B302A\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 23 Apr 2026 10:05:15 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id BFACC378830;\n\tThu, 23 Apr 2026 10:05:14 +0000 (UTC)","from mx.ssi.bg (mx.ssi.bg [193.238.174.39])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 5F3E9303A04;\n\tThu, 23 Apr 2026 10:05:08 +0000 (UTC)","from mx.ssi.bg (localhost [127.0.0.1])\n\tby mx.ssi.bg (Postfix) with ESMTP id 2A6C22126C;\n\tThu, 23 Apr 2026 13:05:06 +0300 (EEST)","from box.ssi.bg (box.ssi.bg [193.238.174.46])\n\tby mx.ssi.bg (Postfix) with ESMTPS;\n\tThu, 23 Apr 2026 13:05:02 +0300 (EEST)","from ja.ssi.bg (unknown [213.16.62.126])\n\tby box.ssi.bg (Postfix) with ESMTPSA id BE2E2602F6;\n\tThu, 23 Apr 2026 13:05:01 +0300 (EEST)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby ja.ssi.bg (8.18.1/8.18.1) with ESMTP id 63NA50vS027349;\n\tThu, 23 Apr 2026 13:05:00 +0300"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1776938713; cv=none;\n b=ZpYC5ewJmDHkoHplgGHQW31kYoOmYlaZ7YqcKiUrWIgduO16YnJVgzx9NyKyI9aK3yktxvB/fdIW7HT+bmSlc99+P86oYBg+pncxSnbLtQEK0i+KRdSlhTcR967rWLE84wmy7qc9njoPksTL+5nRUZYbO46vpt7pkp0gIVi1B5o=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1776938713; c=relaxed/simple;\n\tbh=bzfP9xLRFktv7DtrhDWe3rtsAjdW8yP4UPUER+NxYD0=;\n\th=Date:From:To:cc:Subject:In-Reply-To:Message-ID:References:\n\t MIME-Version:Content-Type;\n 
b=glTbqinM9rV5paR61H5tFwHOmabA0Uf58/XWWHz40jJiCGpWeJpscD9kI9DFVfdZxWlGofxtKzhWT7kWPo0bBPSCrULhofPr8AEYDHEHh7hM1RwQfhAVAxGpARTA3CS9vu2CYRM0YSh+7gu4ZrCcJfYJ1QKBMlZ200BWs5bkshQ=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=reject dis=none) header.from=ssi.bg;\n spf=pass smtp.mailfrom=ssi.bg;\n dkim=pass (4096-bit key) header.d=ssi.bg header.i=@ssi.bg header.b=j57NngAO;\n arc=none smtp.client-ip=193.238.174.39","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ssi.bg; h=cc:cc\n\t:content-type:content-type:date:from:from:in-reply-to:message-id\n\t:mime-version:references:reply-to:subject:subject:to:to; s=ssi;\n\t bh=V8DmRTw1CVPVyyIUBx9UuRriYstQGtNs16CM3+fVvlo=; b=j57NngAOqG8H\n\t25JlT1qkxcnp93y1orSIn0UcT8eZmJrFrrHmcpoX6kUAD/g5fBZzPs9ZJtRpZqm4\n\tcNCkbYUDwRrRgw5ZtaH0DgZ7uKSChw9hN3/DVhJQmU56RDxwVGjAUmSQVoRLQBZO\n\tFclX1b9JZneNh2Zd2/sRv9hFe3jRF0mRGDfxu0rx0bc3wDLt/GuDfkfLFIuS7c8a\n\tuDlnWkvUiEuCI4DhDBxUAWrzgYzIVweEnR/60n3aaRVMl3p0NaH092tUxVvJr5RX\n\tNVsybY9j08hBW1DKccRPPD4ZEP0S3XcBdYH2JQ5KDpc2MycnD2G6ozVo5X36dYCy\n\t5LjVE1Jwon5k/Xneumrkcv5ht6O+05+yM7n0+dxUITIYi4coGIASJzRdYrBBWvGt\n\tFAy4MzAkWSIjOWewlFH7zw2+qPyK+jp94yXnuO7h1LiE0q64ivb09ozUaSTbcBdM\n\trFkM/wDD5AuimssqA2ga/NfOde4vUQ6voyf4c/JB7LbNKnRQJk6GEvasrF2Vhp8l\n\tF9NY1ntHHWFryPgrctspHzzE5qaz8p3nOx6VIjii3M8pc+NIuDxQ5fy5Pc/biinD\n\tZn8UYnR7loSRNdrFp6PxNb5ZDHoK+TB9On+2kOK95PYr8CgANjZX+CLMqzZDr37V\n\tkSAy/NKu2PdLdVipBeAKPAzI4rTQF4g=","Date":"Thu, 23 Apr 2026 13:05:00 +0300 (EEST)","From":"Julian Anastasov <ja@ssi.bg>","To":"Simon Horman <horms@verge.net.au>","cc":"Pablo Neira Ayuso <pablo@netfilter.org>, Florian Westphal <fw@strlen.de>,\n        lvs-devel@vger.kernel.org, netfilter-devel@vger.kernel.org","Subject":"Re: [PATCHv2 net] ipvs: fix races around est_mutex and 
est_cpulist","In-Reply-To":"<20260422125123.40658-1-ja@ssi.bg>","Message-ID":"<ebfb924e-49c0-787f-b5f2-c201199e1345@ssi.bg>","References":"<20260422125123.40658-1-ja@ssi.bg>","Precedence":"bulk","X-Mailing-List":"netfilter-devel@vger.kernel.org","List-Id":"<netfilter-devel.vger.kernel.org>","List-Subscribe":"<mailto:netfilter-devel+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:netfilter-devel+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII"}}]