[{"id":3681362,"web_url":"http://patchwork.ozlabs.org/comment/3681362/","msgid":"<de8fd075-4aa7-5155-83db-624e2e2a36f4@ssi.bg>","list_archive_url":null,"date":"2026-04-23T10:02:57","subject":"Re: [PATCHv3 net 3/3] ipvs: fix the spin_lock usage for RT build","submitter":{"id":2825,"url":"http://patchwork.ozlabs.org/api/people/2825/","name":"Julian Anastasov","email":"ja@ssi.bg"},"content":"Hello,\n\nOn Wed, 22 Apr 2026, Julian Anastasov wrote:\n\n> syzbot reports for sleeping function called from invalid context [1].\n> The recently added code for resizable hash tables uses\n> hlist_bl bit locks in combination with spin_lock for\n> the connection fields (cp->lock).\n> \n> Fix the following problems:\n> \n> * avoid using spin_lock(&cp->lock) under locked bit lock\n> because it sleeps on PREEMPT_RT\n> \n> * as the recent changes call ip_vs_conn_hash() only for newly\n> allocated connection, the spin_lock can be removed there because\n> the connection is still not linked to table and does not need\n> cp->lock protection.\n> \n> * the lock can be removed also from ip_vs_conn_unlink() where we\n> are the last connection user.\n> \n> * the last place that is fixed is ip_vs_conn_fill_cport()\n> where now the cp->lock is locked before the other locks to\n> ensure other packets do not modify the cp->flags in non-atomic\n> way. Here we make sure cport and flags are changed only once\n> if two or more packets race to fill the cport. Also, we fill\n> cport early, so that if we race with resizing there will be\n> valid cport key for the hashing. Problems reported by Sashiko.\n> \n> [1]:\n> BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48\n> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 16, name: ktimers/0\n> preempt_count: 2, expected: 0\n> RCU nest depth: 3, expected: 3\n> 8 locks held by ktimers/0/16:\n>  #0: ffffffff8de5f260 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163\n>  #1: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163\n>  #2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:45 [inline]\n>  #2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: timer_base_lock_expiry kernel/time/timer.c:1502 [inline]\n>  #2: ffff8880b8826360 (&base->expiry_lock){+...}-{3:3}, at: __run_timer_base+0x120/0x9f0 kernel/time/timer.c:2384\n>  #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]\n>  #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]\n>  #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: __rt_spin_lock kernel/locking/spinlock_rt.c:50 [inline]\n>  #3: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rt_spin_lock+0x1e0/0x400 kernel/locking/spinlock_rt.c:57\n>  #4: ffffc90000157a80 ((&cp->timer)){+...}-{0:0}, at: call_timer_fn+0xd4/0x5e0 kernel/time/timer.c:1745\n>  #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire include/linux/rcupdate.h:300 [inline]\n>  #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: rcu_read_lock include/linux/rcupdate.h:838 [inline]\n>  #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:315 [inline]\n>  #5: ffffffff8dfc80c0 (rcu_read_lock){....}-{1:3}, at: ip_vs_conn_expire+0x257/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260\n>  #6: ffffffff8de5f260 (local_bh){.+.+}-{1:3}, at: __local_bh_disable_ip+0x3c/0x420 kernel/softirq.c:163\n>  #7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: spin_lock include/linux/spinlock_rt.h:45 [inline]\n>  #7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:324 [inline]\n>  #7: ffff888068d4c3f0 (&cp->lock#2){+...}-{3:3}, at: ip_vs_conn_expire+0xd4a/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260\n> Preemption disabled at:\n> [<ffffffff898a6358>] bit_spin_lock include/linux/bit_spinlock.h:38 [inline]\n> [<ffffffff898a6358>] hlist_bl_lock+0x18/0x110 include/linux/list_bl.h:149\n> CPU: 0 UID: 0 PID: 16 Comm: ktimers/0 Tainted: G        W    L      syzkaller #0 PREEMPT_{RT,(full)}\n> Tainted: [W]=WARN, [L]=SOFTLOCKUP\n> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/18/2026\n> Call Trace:\n>  <TASK>\n>  dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120\n>  __might_resched+0x329/0x480 kernel/sched/core.c:9162\n>  __rt_spin_lock kernel/locking/spinlock_rt.c:48 [inline]\n>  rt_spin_lock+0xc2/0x400 kernel/locking/spinlock_rt.c:57\n>  spin_lock include/linux/spinlock_rt.h:45 [inline]\n>  ip_vs_conn_unlink net/netfilter/ipvs/ip_vs_conn.c:324 [inline]\n>  ip_vs_conn_expire+0xd4a/0x2390 net/netfilter/ipvs/ip_vs_conn.c:1260\n>  call_timer_fn+0x192/0x5e0 kernel/time/timer.c:1748\n>  expire_timers kernel/time/timer.c:1799 [inline]\n>  __run_timers kernel/time/timer.c:2374 [inline]\n>  __run_timer_base+0x6a3/0x9f0 kernel/time/timer.c:2386\n>  run_timer_base kernel/time/timer.c:2395 [inline]\n>  run_timer_softirq+0xb7/0x170 kernel/time/timer.c:2405\n>  handle_softirqs+0x1de/0x6d0 kernel/softirq.c:622\n>  __do_softirq kernel/softirq.c:656 [inline]\n>  run_ktimerd+0x69/0x100 kernel/softirq.c:1151\n>  smpboot_thread_fn+0x541/0xa50 kernel/smpboot.c:160\n>  kthread+0x388/0x470 kernel/kthread.c:436\n>  ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158\n>  ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245\n>  </TASK>\n> \n> Reported-by: syzbot+504e778ddaecd36fdd17@syzkaller.appspotmail.com\n> Link: https://sashiko.dev/#/patchset/20260415200216.79699-1-ja%40ssi.bg\n> Link: https://sashiko.dev/#/patchset/20260420165539.85174-4-ja%40ssi.bg\n> Fixes: 2fa7cc9c7025 (\"ipvs: switch to per-net connection table\")\n> Signed-off-by: Julian Anastasov <ja@ssi.bg>\n\n\tAccording to Sashiko, this patch needs more\nwork, I'll send new patchset version soon...\n\npw-bot: changes-requested\n\n> ---\n>  net/netfilter/ipvs/ip_vs_conn.c | 64 ++++++++++++++++-----------------\n>  1 file changed, 32 insertions(+), 32 deletions(-)\n> \n> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c\n> index 84a4921a7865..5c7951a60462 100644\n> --- a/net/netfilter/ipvs/ip_vs_conn.c\n> +++ b/net/netfilter/ipvs/ip_vs_conn.c\n> @@ -267,27 +267,20 @@ static inline int ip_vs_conn_hash(struct ip_vs_conn *cp)\n>  \t\thash_key2 = hash_key;\n>  \t\tuse2 = false;\n>  \t}\n> +\n>  \tconn_tab_lock(t, cp, hash_key, hash_key2, use2, true /* new_hash */,\n>  \t\t      &head, &head2);\n> -\tspin_lock(&cp->lock);\n> -\n> -\tif (!(cp->flags & IP_VS_CONN_F_HASHED)) {\n> -\t\tcp->flags |= IP_VS_CONN_F_HASHED;\n> -\t\tWRITE_ONCE(cp->hn0.hash_key, hash_key);\n> -\t\tWRITE_ONCE(cp->hn1.hash_key, hash_key2);\n> -\t\trefcount_inc(&cp->refcnt);\n> -\t\thlist_bl_add_head_rcu(&cp->hn0.node, head);\n> -\t\tif (use2)\n> -\t\t\thlist_bl_add_head_rcu(&cp->hn1.node, head2);\n> -\t\tret = 1;\n> -\t} else {\n> -\t\tpr_err(\"%s(): request for already hashed, called from %pS\\n\",\n> -\t\t       __func__, __builtin_return_address(0));\n> -\t\tret = 0;\n> -\t}\n>  \n> -\tspin_unlock(&cp->lock);\n> +\tcp->flags |= IP_VS_CONN_F_HASHED;\n> +\tWRITE_ONCE(cp->hn0.hash_key, hash_key);\n> +\tWRITE_ONCE(cp->hn1.hash_key, hash_key2);\n> +\trefcount_inc(&cp->refcnt);\n> +\thlist_bl_add_head_rcu(&cp->hn0.node, head);\n> +\tif (use2)\n> +\t\thlist_bl_add_head_rcu(&cp->hn1.node, head2);\n> +\n>  \tconn_tab_unlock(head, head2);\n> +\tret = 1;\n>  \n>  \t/* Schedule resizing if load increases */\n>  \tif (atomic_read(&ipvs->conn_count) > t->u_thresh &&\n> @@ -321,7 +314,6 @@ static inline bool ip_vs_conn_unlink(struct ip_vs_conn *cp)\n>  \n>  \tconn_tab_lock(t, cp, hash_key, hash_key2, use2, false /* new_hash */,\n>  \t\t      &head, &head2);\n> -\tspin_lock(&cp->lock);\n>  \n>  \tif (cp->flags & IP_VS_CONN_F_HASHED) {\n>  \t\t/* Decrease refcnt and unlink conn only if we are last user */\n> @@ -334,7 +326,6 @@ static inline bool ip_vs_conn_unlink(struct ip_vs_conn *cp)\n>  \t\t}\n>  \t}\n>  \n> -\tspin_unlock(&cp->lock);\n>  \tconn_tab_unlock(head, head2);\n>  \n>  \trcu_read_unlock();\n> @@ -637,6 +628,7 @@ void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport)\n>  \tstruct ip_vs_conn_hnode *hn;\n>  \tu32 hash_key, hash_key_new;\n>  \tstruct ip_vs_conn_param p;\n> +\tbool by_me = false;\n>  \tint ntbl;\n>  \tint dir;\n>  \n> @@ -683,10 +675,13 @@ void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport)\n>  \tif (head > head2 && t == t2)\n>  \t\tswap(head, head2);\n>  \n> +\t/* Protect the cp->flags modification */\n> +\tspin_lock_bh(&cp->lock);\n> +\n>  \t/* Lock seqcount only for the old bucket, even if we are on new table\n>  \t * because it affects the del operation, not the adding.\n>  \t */\n> -\tspin_lock_bh(&t->lock[hash_key & t->lock_mask].l);\n> +\tspin_lock(&t->lock[hash_key & t->lock_mask].l);\n>  \tpreempt_disable_nested();\n>  \twrite_seqcount_begin(&t->seqc[hash_key & t->seqc_mask]);\n>  \n> @@ -704,14 +699,23 @@ void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport)\n>  \t\thlist_bl_unlock(head);\n>  \t\twrite_seqcount_end(&t->seqc[hash_key & t->seqc_mask]);\n>  \t\tpreempt_enable_nested();\n> -\t\tspin_unlock_bh(&t->lock[hash_key & t->lock_mask].l);\n> +\t\tspin_unlock(&t->lock[hash_key & t->lock_mask].l);\n> +\t\tspin_unlock_bh(&cp->lock);\n>  \t\thash_key = hash_key_new;\n>  \t\tgoto retry;\n>  \t}\n>  \n> -\tspin_lock(&cp->lock);\n> -\tif ((cp->flags & IP_VS_CONN_F_NO_CPORT) &&\n> -\t    (cp->flags & IP_VS_CONN_F_HASHED)) {\n> +\t/* Fill cport once, even if multiple packets try to do it */\n> +\tif (cp->flags & IP_VS_CONN_F_NO_CPORT && (!cp->cport || by_me)) {\n> +\t\t/* If we race with resizing make sure cport is set for dir 1 */\n> +\t\tif (!cp->cport) {\n> +\t\t\tcp->cport = cport;\n> +\t\t\tby_me = true;\n> +\t\t}\n> +\t\tif (!dir) {\n> +\t\t\tatomic_dec(&ipvs->no_cport_conns[af_id]);\n> +\t\t\tcp->flags &= ~IP_VS_CONN_F_NO_CPORT;\n> +\t\t}\n>  \t\t/* We do not recalc hash_key_r under lock, we assume the\n>  \t\t * parameters in cp do not change, i.e. cport is\n>  \t\t * the only possible change.\n> @@ -726,21 +730,17 @@ void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport)\n>  \t\t\thlist_bl_del_rcu(&hn->node);\n>  \t\t\thlist_bl_add_head_rcu(&hn->node, head_new);\n>  \t\t}\n> -\t\tif (!dir) {\n> -\t\t\tatomic_dec(&ipvs->no_cport_conns[af_id]);\n> -\t\t\tcp->flags &= ~IP_VS_CONN_F_NO_CPORT;\n> -\t\t\tcp->cport = cport;\n> -\t\t}\n>  \t}\n> -\tspin_unlock(&cp->lock);\n>  \n>  \tif (head != head2)\n>  \t\thlist_bl_unlock(head2);\n>  \thlist_bl_unlock(head);\n>  \twrite_seqcount_end(&t->seqc[hash_key & t->seqc_mask]);\n>  \tpreempt_enable_nested();\n> -\tspin_unlock_bh(&t->lock[hash_key & t->lock_mask].l);\n> -\tif (dir--)\n> +\tspin_unlock(&t->lock[hash_key & t->lock_mask].l);\n> +\n> +\tspin_unlock_bh(&cp->lock);\n> +\tif (dir-- && by_me)\n>  \t\tgoto next_dir;\n>  }\n>  \n> -- \n> 2.53.0\n> \n> \n> \n> \n\nRegards\n\n--\nJulian Anastasov <ja@ssi.bg>","headers":{"Return-Path":"\n <netfilter-devel+bounces-12152-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","netfilter-devel@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (4096-bit key;\n unprotected) header.d=ssi.bg header.i=@ssi.bg header.a=rsa-sha256\n header.s=ssi header.b=mhqFGvpw;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=2600:3c04:e001:36c::12fc:5321; helo=tor.lore.kernel.org;\n envelope-from=netfilter-devel+bounces-12152-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (4096-bit key) header.d=ssi.bg header.i=@ssi.bg header.b=\"mhqFGvpw\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=193.238.174.39","smtp.subspace.kernel.org;\n dmarc=pass (p=reject dis=none) header.from=ssi.bg","smtp.subspace.kernel.org;\n spf=pass smtp.mailfrom=ssi.bg"],"Received":["from tor.lore.kernel.org (tor.lore.kernel.org\n [IPv6:2600:3c04:e001:36c::12fc:5321])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4g1X7B16xdz1y2d\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 23 Apr 2026 20:16:22 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby tor.lore.kernel.org (Postfix) with ESMTP id DEA8330364B4\n\tfor <incoming@patchwork.ozlabs.org>; Thu, 23 Apr 2026 10:10:47 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 5E042379EE8;\n\tThu, 23 Apr 2026 10:10:47 +0000 (UTC)","from mx.ssi.bg (mx.ssi.bg [193.238.174.39])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id 86725328B7A;\n\tThu, 23 Apr 2026 10:10:43 +0000 (UTC)","from mx.ssi.bg (localhost [127.0.0.1])\n\tby mx.ssi.bg (Potsfix) with ESMTP id AE24821267;\n\tThu, 23 Apr 2026 13:03:08 +0300 (EEST)","from box.ssi.bg (box.ssi.bg [193.238.174.46])\n\tby mx.ssi.bg (Potsfix) with ESMTPS;\n\tThu, 23 Apr 2026 13:03:03 +0300 (EEST)","from ja.ssi.bg (unknown [213.16.62.126])\n\tby box.ssi.bg (Potsfix) with ESMTPSA id E04586298D;\n\tThu, 23 Apr 2026 13:03:02 +0300 (EEST)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby ja.ssi.bg (8.18.1/8.18.1) with ESMTP id 63NA2vXN027309;\n\tThu, 23 Apr 2026 13:03:01 +0300"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1776939046; cv=none;\n b=nzodwCeEmZeil4NZ6U0G2X0LS03+6EySt4quwjBupxTWuGRrYQrOiK5KEATsTaJN/XHl4cYsMy4FXnU7Uy1wSj2CshwW4dngYgRH/SoolPHHDE71CMzDf1vZBfVk3V48+9NRRA+1nB300Usa3HEBg2F9bxxf1pZnuvLlC/qlWw8=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1776939046; c=relaxed/simple;\n\tbh=seH4ZENbPfoXrT+CaxAwPCEEnfaPdoXDyWIiHqy+SS8=;\n\th=Date:From:To:cc:Subject:In-Reply-To:Message-ID:References:\n\t MIME-Version:Content-Type;\n b=TwHotZhfYcEYWUVAObISPs/2ozNq3L41MIar1K55xtn/TqqM7wzdoxKOcOa56VMvJUHuuPgPdryXs1n65Cc7XrmVvoJqlSD4owMDfh/enwmC0pMAhbCE2xOvAmBL+tBd//AEMA2zo4ISZq+fL40znmVl+ud7mkcqswRMP8Wy7m0=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dmarc=pass (p=reject dis=none) header.from=ssi.bg;\n spf=pass smtp.mailfrom=ssi.bg;\n dkim=pass (4096-bit key) header.d=ssi.bg header.i=@ssi.bg header.b=mhqFGvpw;\n arc=none smtp.client-ip=193.238.174.39","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed; d=ssi.bg; h=cc:cc\n\t:content-type:content-type:date:from:from:in-reply-to:message-id\n\t:mime-version:references:reply-to:subject:subject:to:to; s=ssi;\n\t bh=16QDNobm/qlBA4jR1h+YOsqJr8tmJMGVmzgltUmmsEo=; b=mhqFGvpwEStI\n\t0S4FLIcyhBjXBhliaOtOayJ4cmpxY80CWIsGmHmLHZDOFrO551hymJ8iOgQ8fwWq\n\t7kMeOzX7npPI8CzXJrXcHV2PtGpDf9v8KrxyLOVhca5WoOIwjHIplZwulIynNIxJ\n\tRGr3PiMGUjoW1Xj4PBx6xWRj6kZTHJ0IW65556KfYbUVs+B61otrf5Y620tHq5wV\n\t1vunicQQaQuDjuPvxKj38VlXQhjSCbkSYoBlit+pesQaEgiGNyGPEUUBLt+0Vk/d\n\t/2BrTfhv3loO9BoE1aI6cAECbrOb3il4gHWvfBJpH3xO+qG/w0onV69WosubMP4c\n\tzTQi6EuaokfpPtUTptWKUgqF7gMcH6MkPcQoZl8YB+MicBbjtbu/2mmnxpw2LBTk\n\tuwzIrSv/pulYodBYrRS9Xg7IJ49e7Tn5Z8YgFb0r0NHNymBXfLYBJU5pFJ+UvDpO\n\tCkrJpv2gYDQQ7bUo9yguO0AjrpQY3G7t53ISNkBX53/WdvehdkqQ7D5NRbfXraGU\n\tMgKo1HWqQYWEfhc/K2/Mu2Q4VyxNAF2HsAozwPhUDfeo7ekdsfikfj564dXx8DRA\n\tUxYKSUsHk0PhqPL06e62853NOvtyRVMph30WvsegQ0D2GUi/ijWLOxd9ulUlMX3S\n\trmXfmbGSbBiW8LbQV0FNfWkg8PQ7gRU=","Date":"Thu, 23 Apr 2026 13:02:57 +0300 (EEST)","From":"Julian Anastasov <ja@ssi.bg>","To":"Simon Horman <horms@verge.net.au>","cc":"Pablo Neira Ayuso <pablo@netfilter.org>, Florian Westphal <fw@strlen.de>,\n        lvs-devel@vger.kernel.org, netfilter-devel@vger.kernel.org","Subject":"Re: [PATCHv3 net 3/3] ipvs: fix the spin_lock usage for RT build","In-Reply-To":"<20260422135823.50489-4-ja@ssi.bg>","Message-ID":"<de8fd075-4aa7-5155-83db-624e2e2a36f4@ssi.bg>","References":"<20260422135823.50489-1-ja@ssi.bg>\n <20260422135823.50489-4-ja@ssi.bg>","Precedence":"bulk","X-Mailing-List":"netfilter-devel@vger.kernel.org","List-Id":"<netfilter-devel.vger.kernel.org>","List-Subscribe":"<mailto:netfilter-devel+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:netfilter-devel+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=US-ASCII"}}]