{"id":2220896,"url":"http://patchwork.ozlabs.org/api/1.1/patches/2220896/?format=json","web_url":"http://patchwork.ozlabs.org/project/netfilter-devel/patch/20260408114951.995031895@kernel.org/","project":{"id":26,"url":"http://patchwork.ozlabs.org/api/1.1/projects/26/?format=json","name":"Netfilter Development","link_name":"netfilter-devel","list_id":"netfilter-devel.vger.kernel.org","list_email":"netfilter-devel@vger.kernel.org","web_url":null,"scm_url":null,"webscm_url":null},"msgid":"<20260408114951.995031895@kernel.org>","date":"2026-04-08T11:53:46","name":"[V2,01/11] hrtimer: Provide hrtimer_start_range_ns_user()","commit_ref":null,"pull_url":null,"state":"handled-elsewhere","archived":true,"hash":"2e6165800bf691a4f3eb9b7551c777a6507ab012","submitter":{"id":92397,"url":"http://patchwork.ozlabs.org/api/1.1/people/92397/?format=json","name":"Thomas Gleixner","email":"tglx@kernel.org"},"delegate":null,"mbox":"http://patchwork.ozlabs.org/project/netfilter-devel/patch/20260408114951.995031895@kernel.org/mbox/","series":[{"id":499126,"url":"http://patchwork.ozlabs.org/api/1.1/series/499126/?format=json","web_url":"http://patchwork.ozlabs.org/project/netfilter-devel/list/?series=499126","date":"2026-04-08T11:53:41","name":"hrtimers: Prevent hrtimer interrupt starvation","version":2,"mbox":"http://patchwork.ozlabs.org/series/499126/mbox/"}],"comments":"http://patchwork.ozlabs.org/api/patches/2220896/comments/","check":"pending","checks":"http://patchwork.ozlabs.org/api/patches/2220896/checks/","tags":{},"headers":{"Return-Path":"\n <netfilter-devel+bounces-11714-incoming=patchwork.ozlabs.org@vger.kernel.org>","X-Original-To":["incoming@patchwork.ozlabs.org","netfilter-devel@vger.kernel.org"],"Delivered-To":"patchwork-incoming@legolas.ozlabs.org","Authentication-Results":["legolas.ozlabs.org;\n\tdkim=pass (2048-bit key;\n unprotected) header.d=kernel.org header.i=@kernel.org header.a=rsa-sha256\n header.s=k20201202 
header.b=cWw1Uu1n;\n\tdkim-atps=neutral","legolas.ozlabs.org;\n spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org\n (client-ip=104.64.211.4; helo=sin.lore.kernel.org;\n envelope-from=netfilter-devel+bounces-11714-incoming=patchwork.ozlabs.org@vger.kernel.org;\n receiver=patchwork.ozlabs.org)","smtp.subspace.kernel.org;\n\tdkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=\"cWw1Uu1n\"","smtp.subspace.kernel.org;\n arc=none smtp.client-ip=10.30.226.201"],"Received":["from sin.lore.kernel.org (sin.lore.kernel.org [104.64.211.4])\n\t(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)\n\t key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384)\n\t(No client certificate requested)\n\tby legolas.ozlabs.org (Postfix) with ESMTPS id 4frM164slpz1yD6\n\tfor <incoming@patchwork.ozlabs.org>; Wed, 08 Apr 2026 21:54:18 +1000 (AEST)","from smtp.subspace.kernel.org (conduit.subspace.kernel.org\n [100.90.174.1])\n\tby sin.lore.kernel.org (Postfix) with ESMTP id 37C963029672\n\tfor <incoming@patchwork.ozlabs.org>; Wed,  8 Apr 2026 11:53:58 +0000 (UTC)","from localhost.localdomain (localhost.localdomain [127.0.0.1])\n\tby smtp.subspace.kernel.org (Postfix) with ESMTP id 681DE3BBA14;\n\tWed,  8 Apr 2026 11:53:51 +0000 (UTC)","from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org\n [10.30.226.201])\n\t(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))\n\t(No client certificate requested)\n\tby smtp.subspace.kernel.org (Postfix) with ESMTPS id D0EF82DEA75;\n\tWed,  8 Apr 2026 11:53:50 +0000 (UTC)","by smtp.kernel.org (Postfix) with ESMTPSA id 8D20FC19421;\n\tWed,  8 Apr 2026 11:53:49 +0000 (UTC)"],"ARC-Seal":"i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;\n\tt=1775649230; cv=none;\n 
b=STSuqKs0XVh8ZJbQC49gjrBee5dPzLYjTIZPCg+FVMCKBgTNDyPcuc8ohIfmcc1TW1j2XN9rfP4fRvFZPxIuFQLiHE9rDd1/6vhkweJgZ4kxL/XFLLcX/rg0k+nS4lqK3EWLJdsZyNI5pilmoV9oiM0clvGIooxGAIpAatmnLrQ=","ARC-Message-Signature":"i=1; a=rsa-sha256; d=subspace.kernel.org;\n\ts=arc-20240116; t=1775649230; c=relaxed/simple;\n\tbh=nTGiCk6OwHsxQG7elt4n6Y4G5bqO6SNPXj/FFhihf5o=;\n\th=Date:Message-ID:From:To:Cc:Subject:References:MIME-Version:\n\t Content-Type;\n b=W8zpSiKd7t3gJ35cO6Y8VFWkLlhlh2B2XUU2m1UK3XPxkj9MwyOytMDMymBE04DWKz4kE5mPnm3GiSRGR4blRTwmqKhzJsTg8KgqumgJ1bdzojN64X05vdT7Bpl7dZTK2at6iPShYMxTqCEnqfQX56veEm/FIoyTDHZtok5vi0k=","ARC-Authentication-Results":"i=1; smtp.subspace.kernel.org;\n dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org\n header.b=cWw1Uu1n; arc=none smtp.client-ip=10.30.226.201","DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;\n\ts=k20201202; t=1775649230;\n\tbh=nTGiCk6OwHsxQG7elt4n6Y4G5bqO6SNPXj/FFhihf5o=;\n\th=Date:From:To:Cc:Subject:References:From;\n\tb=cWw1Uu1nn+G+uRwDaYTdrftv+JwOEh/qjfHRZln5GBZu4GBXrKGzW3Xqhq7/WAWKi\n\t QnNod7yGjHAREtgSXLzxH6v27qzmKjhQ520cklT+o3M0lWpaglwD1wVe784EM258tA\n\t qdotW01P0e6b+bR6ovK62S+wT4ZaMFma92J2JE8CjA1sgFa2vx8bgy2NESaBZWpwt4\n\t f2KYjF2UJIoThBWciFev3XskcLYO1HbuvSb0u1sdaNJDRgGZyfWoq8suyAss4otNZQ\n\t aP8gbsg1zH6H01PB7wZY6HfFG+3HR67ycA/vXFUiai2bCZuVczLqTdNLF0UxzLWwjb\n\t riUVRt0wFy8Ew==","Date":"Wed, 08 Apr 2026 13:53:46 +0200","Message-ID":"<20260408114951.995031895@kernel.org>","User-Agent":"quilt/0.68","From":"Thomas Gleixner <tglx@kernel.org>","To":"LKML <linux-kernel@vger.kernel.org>","Cc":"Calvin Owens <calvin@wbinvd.org>,\n Anna-Maria Behnsen <anna-maria@linutronix.de>,\n Frederic Weisbecker <frederic@kernel.org>,\n \"Peter Zijlstra (Intel)\" <peterz@infradead.org>,\n John Stultz <jstultz@google.com>,\n Stephen Boyd <sboyd@kernel.org>,\n Alexander Viro <viro@zeniv.linux.org.uk>,\n Christian Brauner <brauner@kernel.org>,\n Jan Kara <jack@suse.cz>,\n linux-fsdevel@vger.kernel.org,\n Sebastian Reichel 
<sre@kernel.org>,\n linux-pm@vger.kernel.org,\n Pablo Neira Ayuso <pablo@netfilter.org>,\n Florian Westphal <fw@strlen.de>,\n Phil Sutter <phil@nwl.cc>,\n netfilter-devel@vger.kernel.org,\n coreteam@netfilter.org","Subject":"[patch V2 01/11] hrtimer: Provide hrtimer_start_range_ns_user()","References":"<20260408102356.783133335@kernel.org>","Precedence":"bulk","X-Mailing-List":"netfilter-devel@vger.kernel.org","List-Id":"<netfilter-devel.vger.kernel.org>","List-Subscribe":"<mailto:netfilter-devel+subscribe@vger.kernel.org>","List-Unsubscribe":"<mailto:netfilter-devel+unsubscribe@vger.kernel.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=UTF-8"},"content":"Calvin reported an odd NMI watchdog lockup which claims that the CPU locked\nup in user space. He provided a reproducer, which sets up a timerfd based\ntimer and then rearms it in a loop with an absolute expiry time of 1ns.\n\nAs the expiry time is in the past, the timer ends up as the first expiring\ntimer in the per CPU hrtimer base and the clockevent device is programmed\nwith the minimum delta value. If the machine is fast enough, this ends up\nin an endless loop of programming the delta value to the minimum value\ndefined by the clock event device, before the timer interrupt can fire,\nwhich starves the interrupt and consequently triggers the lockup detector\nbecause the hrtimer callback of the lockup mechanism is never invoked.\n\nThe clockevents code already has a last resort mechanism to prevent that,\nbut it's sensible to catch such issues before trying to reprogram the clock\nevent device.\n\nProvide a variant of hrtimer_start_range_ns(), which sanity checks the\ntimer after queueing it. It does not do so before because the timer might be\narmed and therefore needs to be dequeued. 
Also, we optimize for the latest\npossible point to check, so that the clock event prevention is avoided as\nmuch as possible.\n\nIf the timer is already expired _before_ the clock event is reprogrammed,\nremove the timer from the queue and signal to the caller that the operation\nfailed by returning false.\n\nThat allows the caller to take immediate action without going through the\nloops and hoops of the hrtimer interrupt.\n\nThe queueing code can't invoke the timer callback as the caller might hold\na lock which is taken in the callback.\n\nAdd a tracepoint which allows analyzing the expired at start situation.\n\nReported-by: Calvin Owens <calvin@wbinvd.org>\nSigned-off-by: Thomas Gleixner <tglx@kernel.org>\nCc: Anna-Maria Behnsen <anna-maria@linutronix.de>\nCc: Frederic Weisbecker <frederic@kernel.org>\n---\nV2: Moved the user check into hrtimer_start_range_ns_user() and handled\n    the NONE case explicitly. - PeterZ\n    Rebased on tip timers/core\n---\n include/linux/hrtimer.h      |   20 +++++-\n include/trace/events/timer.h |   13 ++++\n kernel/time/hrtimer.c        |  134 +++++++++++++++++++++++++++++++++++++------\n 3 files changed, 148 insertions(+), 19 deletions(-)","diff":"--- a/include/linux/hrtimer.h\n+++ b/include/linux/hrtimer.h\n@@ -206,6 +206,9 @@ static inline void destroy_hrtimer_on_st\n extern void hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,\n \t\t\t\t   u64 range_ns, const enum hrtimer_mode mode);\n \n+extern bool hrtimer_start_range_ns_user(struct hrtimer *timer, ktime_t tim,\n+\t\t\t\t\tu64 range_ns, const enum hrtimer_mode mode);\n+\n /**\n  * hrtimer_start - (re)start an hrtimer\n  * @timer:\tthe timer to be added\n@@ -223,17 +226,28 @@ static inline void hrtimer_start(struct\n extern int hrtimer_cancel(struct hrtimer *timer);\n extern int hrtimer_try_to_cancel(struct hrtimer *timer);\n \n-static inline void hrtimer_start_expires(struct hrtimer *timer,\n-\t\t\t\t\t enum hrtimer_mode mode)\n+static inline void 
hrtimer_start_expires(struct hrtimer *timer, enum hrtimer_mode mode)\n {\n-\tu64 delta;\n \tktime_t soft, hard;\n+\tu64 delta;\n+\n \tsoft = hrtimer_get_softexpires(timer);\n \thard = hrtimer_get_expires(timer);\n \tdelta = ktime_to_ns(ktime_sub(hard, soft));\n \thrtimer_start_range_ns(timer, soft, delta, mode);\n }\n \n+static inline bool hrtimer_start_expires_user(struct hrtimer *timer, enum hrtimer_mode mode)\n+{\n+\tktime_t soft, hard;\n+\tu64 delta;\n+\n+\tsoft = hrtimer_get_softexpires(timer);\n+\thard = hrtimer_get_expires(timer);\n+\tdelta = ktime_to_ns(ktime_sub(hard, soft));\n+\treturn hrtimer_start_range_ns_user(timer, soft, delta, mode);\n+}\n+\n void hrtimer_sleeper_start_expires(struct hrtimer_sleeper *sl,\n \t\t\t\t   enum hrtimer_mode mode);\n \n--- a/include/trace/events/timer.h\n+++ b/include/trace/events/timer.h\n@@ -299,6 +299,19 @@ DECLARE_EVENT_CLASS(hrtimer_class,\n );\n \n /**\n+ * hrtimer_start_expired - Invoked when a expired timer was started\n+ * @hrtimer:\tpointer to struct hrtimer\n+ *\n+ * Preceeded by a hrtimer_start tracepoint.\n+ */\n+DEFINE_EVENT(hrtimer_class, hrtimer_start_expired,\n+\n+\tTP_PROTO(struct hrtimer *hrtimer),\n+\n+\tTP_ARGS(hrtimer)\n+);\n+\n+/**\n  * hrtimer_expire_exit - called immediately after the hrtimer callback returns\n  * @hrtimer:\tpointer to struct hrtimer\n  *\n--- a/kernel/time/hrtimer.c\n+++ b/kernel/time/hrtimer.c\n@@ -1352,6 +1352,12 @@ static inline bool hrtimer_keep_base(str\n \treturn hrtimer_prefer_local(is_local, is_first, is_pinned);\n }\n \n+enum {\n+\tHRTIMER_REPROGRAM_NONE,\n+\tHRTIMER_REPROGRAM,\n+\tHRTIMER_REPROGRAM_FORCE,\n+};\n+\n static bool __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim, u64 delta_ns,\n \t\t\t\t     const enum hrtimer_mode mode, struct hrtimer_clock_base *base)\n {\n@@ -1410,7 +1416,7 @@ static bool __hrtimer_start_range_ns(str\n \t/* If a deferred rearm is pending skip reprogramming the device */\n \tif (cpu_base->deferred_rearm) {\n 
\t\tcpu_base->deferred_needs_update = true;\n-\t\treturn false;\n+\t\treturn HRTIMER_REPROGRAM_NONE;\n \t}\n \n \tif (!was_first || cpu_base != this_cpu_base) {\n@@ -1423,7 +1429,7 @@ static bool __hrtimer_start_range_ns(str\n \t\t * callbacks.\n \t\t */\n \t\tif (likely(hrtimer_base_is_online(this_cpu_base)))\n-\t\t\treturn first;\n+\t\t\treturn first ? HRTIMER_REPROGRAM : HRTIMER_REPROGRAM_NONE;\n \n \t\t/*\n \t\t * Timer was enqueued remote because the current base is\n@@ -1432,7 +1438,7 @@ static bool __hrtimer_start_range_ns(str\n \t\t */\n \t\tif (first)\n \t\t\tsmp_call_function_single_async(cpu_base->cpu, &cpu_base->csd);\n-\t\treturn false;\n+\t\treturn HRTIMER_REPROGRAM_NONE;\n \t}\n \n \t/*\n@@ -1446,7 +1452,7 @@ static bool __hrtimer_start_range_ns(str\n \t */\n \tif (timer->is_lazy) {\n \t\tif (cpu_base->expires_next <= hrtimer_get_expires(timer))\n-\t\t\treturn false;\n+\t\t\treturn HRTIMER_REPROGRAM_NONE;\n \t}\n \n \t/*\n@@ -1455,8 +1461,24 @@ static bool __hrtimer_start_range_ns(str\n \t * reprogram the hardware by evaluating the new first expiring\n \t * timer.\n \t */\n-\thrtimer_force_reprogram(cpu_base, /* skip_equal */ true);\n-\treturn false;\n+\treturn HRTIMER_REPROGRAM_FORCE;\n+}\n+\n+static int hrtimer_start_range_ns_common(struct hrtimer *timer, ktime_t tim,\n+\t\t\t\t\t u64 delta_ns, const enum hrtimer_mode mode,\n+\t\t\t\t\t struct hrtimer_clock_base *base)\n+{\n+\t/*\n+\t * Check whether the HRTIMER_MODE_SOFT bit and hrtimer.is_soft\n+\t * match on CONFIG_PREEMPT_RT = n. 
With PREEMPT_RT check the hard\n+\t * expiry mode because unmarked timers are moved to softirq expiry.\n+\t */\n+\tif (!IS_ENABLED(CONFIG_PREEMPT_RT))\n+\t\tWARN_ON_ONCE(!(mode & HRTIMER_MODE_SOFT) ^ !timer->is_soft);\n+\telse\n+\t\tWARN_ON_ONCE(!(mode & HRTIMER_MODE_HARD) ^ !timer->is_hard);\n+\n+\treturn __hrtimer_start_range_ns(timer, tim, delta_ns, mode, base);\n }\n \n /**\n@@ -1476,24 +1498,104 @@ void hrtimer_start_range_ns(struct hrtim\n \n \tdebug_hrtimer_assert_init(timer);\n \n+\tbase = lock_hrtimer_base(timer, &flags);\n+\n+\tswitch (hrtimer_start_range_ns_common(timer, tim, delta_ns, mode, base)) {\n+\tcase HRTIMER_REPROGRAM:\n+\t\thrtimer_reprogram(timer, true);\n+\t\tbreak;\n+\tcase HRTIMER_REPROGRAM_FORCE:\n+\t\thrtimer_force_reprogram(timer->base->cpu_base, 1);\n+\t\tbreak;\n+\tcase HRTIMER_REPROGRAM_NONE:\n+\t\tbreak;\n+\t}\n+\n+\tunlock_hrtimer_base(timer, &flags);\n+}\n+EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);\n+\n+static inline bool hrtimer_check_user_timer(struct hrtimer *timer)\n+{\n+\tstruct hrtimer_cpu_base *cpu_base = timer->base->cpu_base;\n+\tktime_t expires;\n+\n \t/*\n-\t * Check whether the HRTIMER_MODE_SOFT bit and hrtimer.is_soft\n-\t * match on CONFIG_PREEMPT_RT = n. With PREEMPT_RT check the hard\n-\t * expiry mode because unmarked timers are moved to softirq expiry.\n+\t * This uses soft expires because that's the user provided\n+\t * expiry time, while expires can be further in the past\n+\t * due to a slack value added to the user expiry time.\n \t */\n-\tif (!IS_ENABLED(CONFIG_PREEMPT_RT))\n-\t\tWARN_ON_ONCE(!(mode & HRTIMER_MODE_SOFT) ^ !timer->is_soft);\n-\telse\n-\t\tWARN_ON_ONCE(!(mode & HRTIMER_MODE_HARD) ^ !timer->is_hard);\n+\texpires = hrtimer_get_softexpires(timer);\n+\n+\t/* Convert to monotonic */\n+\texpires = ktime_sub(expires, timer->base->offset);\n+\n+\t/*\n+\t * Check whether this timer will end up as the first expiring timer in\n+\t * the CPU base. 
If not, no further checks required as it's then\n+\t * guaranteed to expire in the future.\n+\t */\n+\tif (expires >= cpu_base->expires_next)\n+\t\treturn true;\n+\n+\t/* Validate that the expiry time is in the future. */\n+\tif (expires > ktime_get())\n+\t\treturn true;\n+\n+\tdebug_hrtimer_deactivate(timer);\n+\t__remove_hrtimer(timer, timer->base, HRTIMER_STATE_INACTIVE, false);\n+\ttrace_hrtimer_start_expired(timer);\n+\treturn false;\n+}\n+\n+/**\n+ * hrtimer_start_range_ns_user - (re)start an user controlled hrtimer\n+ * @timer:\tthe timer to be added\n+ * @tim:\texpiry time\n+ * @delta_ns:\t\"slack\" range for the timer\n+ * @mode:\ttimer mode: absolute (HRTIMER_MODE_ABS) or\n+ *\t\trelative (HRTIMER_MODE_REL), and pinned (HRTIMER_MODE_PINNED);\n+ *\t\tsoftirq based mode is considered for debug purpose only!\n+ *\n+ * Returns: True when the timer was queued, false if it was already expired\n+ *\n+ * This function cannot invoke the timer callback for expired timers as it might\n+ * be called under a lock which the timer callback needs to acquire. 
So the\n+ * caller has to handle that case.\n+ */\n+bool hrtimer_start_range_ns_user(struct hrtimer *timer, ktime_t tim,\n+\t\t\t\t u64 delta_ns, const enum hrtimer_mode mode)\n+{\n+\tstruct hrtimer_clock_base *base;\n+\tunsigned long flags;\n+\tbool ret = true;\n+\n+\tdebug_hrtimer_assert_init(timer);\n \n \tbase = lock_hrtimer_base(timer, &flags);\n \n-\tif (__hrtimer_start_range_ns(timer, tim, delta_ns, mode, base))\n-\t\thrtimer_reprogram(timer, true);\n+\tswitch (hrtimer_start_range_ns_common(timer, tim, delta_ns, mode, base)) {\n+\tcase HRTIMER_REPROGRAM:\n+\t\tret = hrtimer_check_user_timer(timer);\n+\t\tif (ret)\n+\t\t\thrtimer_reprogram(timer, true);\n+\t\tbreak;\n+\tcase HRTIMER_REPROGRAM_FORCE:\n+\t\tret = hrtimer_check_user_timer(timer);\n+\t\t/*\n+\t\t * The base must always be reevaluated, independent of the\n+\t\t * result above because the timer was the first pending timer.\n+\t\t */\n+\t\thrtimer_force_reprogram(timer->base->cpu_base, 1);\n+\t\tbreak;\n+\tcase HRTIMER_REPROGRAM_NONE:\n+\t\tbreak;\n+\t}\n \n \tunlock_hrtimer_base(timer, &flags);\n+\treturn ret;\n }\n-EXPORT_SYMBOL_GPL(hrtimer_start_range_ns);\n+EXPORT_SYMBOL_GPL(hrtimer_start_range_ns_user);\n \n /**\n  * hrtimer_try_to_cancel - try to deactivate a timer\n","prefixes":["V2","01/11"]}