[v5,10/18] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check()

Message ID 20230519101840.v5.10.I3a7d4dd8c23ac30ee0b607d77feb6646b64825c0@changeid
State New
Series watchdog/hardlockup: Add the buddy hardlockup detector

Commit Message

Doug Anderson May 19, 2023, 5:18 p.m. UTC
In preparation for the buddy hardlockup detector where the CPU
checking for lockup might not be the currently running CPU, add a
"cpu" parameter to watchdog_hardlockup_check().

As part of this change, make hrtimer_interrupts an atomic_t since now
the CPU incrementing the value and the CPU reading the value might be
different. Technically this could also be done with just READ_ONCE and
WRITE_ONCE, but atomic_t feels a little cleaner in this case.
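
For reference, a rough sketch of the READ_ONCE/WRITE_ONCE alternative
(not what this patch does; only the owning CPU ever writes, so it is
enough to mark the cross-CPU store and load):

	/* Writer side; only ever runs on the CPU owning the counter. */
	static void watchdog_hardlockup_kick(void)
	{
		unsigned long *hrint = this_cpu_ptr(&hrtimer_interrupts);

		WRITE_ONCE(*hrint, *hrint + 1);
	}

	/* Reader side; may run on a different CPU. */
	unsigned long hrint = READ_ONCE(per_cpu(hrtimer_interrupts, cpu));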

While hrtimer_interrupts is made atomic_t, we change
hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
needed to match the data type backing atomic_t for hrtimer_interrupts.
Even if this changes us from 64 bits to 32 bits (which I don't think
is true for most compilers), it doesn't really matter. All we ever do
is increment it every few seconds and compare it to an old value, so
32 bits is fine (even 16 bits would be). "Signed" vs. "unsigned" also
doesn't matter for simple equality comparisons.

hrtimer_interrupts_saved is _not_ switched to atomic_t nor even
accessed with READ_ONCE / WRITE_ONCE. hrtimer_interrupts_saved is
always consistently accessed by the same CPU. NOTE: with the
upcoming "buddy" detector there is one special case. When a CPU goes
offline/online, the CPU that consistently accesses a given instance
of hrtimer_interrupts_saved can change. Even then we can't end up
with a partially updated hrtimer_interrupts_saved, because we pet
all affected CPUs to make sure the new and old CPU can't somehow
read/write hrtimer_interrupts_saved at the same time.
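
Here, "petting" means marking the CPU's watchdog as touched so the
next check bails out early. As a conceptual sketch (the helper is
added later in this series, so treat the names as my assumption):

	static DEFINE_PER_CPU(bool, watchdog_hardlockup_touched);

	void watchdog_hardlockup_touch_cpu(unsigned int cpu)
	{
		/* Next watchdog_hardlockup_check(cpu, ...) returns early. */
		per_cpu(watchdog_hardlockup_touched, cpu) = true;
	}

That early return means neither the old nor the new CPU can be in the
middle of comparing or updating hrtimer_interrupts_saved during the
transition.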

Signed-off-by: Douglas Anderson <dianders@chromium.org>
---

Changes in v5:
- Don't dump stack on the buddy CPU if we fail to backtrace the hung CPU.
- Use atomic_t for hrtimer_interrupts.

Changes in v4:
- ("Add a "cpu" param to watchdog_hardlockup_check()") new for v4.

 include/linux/nmi.h    |  2 +-
 kernel/watchdog.c      | 52 ++++++++++++++++++++++++++----------------
 kernel/watchdog_perf.c |  2 +-
 3 files changed, 34 insertions(+), 22 deletions(-)

Comments

Petr Mladek May 23, 2023, 4:02 p.m. UTC | #1
On Fri 2023-05-19 10:18:34, Douglas Anderson wrote:
> In preparation for the buddy hardlockup detector where the CPU
> checking for lockup might not be the currently running CPU, add a
> "cpu" parameter to watchdog_hardlockup_check().
> 
> As part of this change, make hrtimer_interrupts an atomic_t since now
> the CPU incrementing the value and the CPU reading the value might be
> different. Technically this could also be done with just READ_ONCE and
> WRITE_ONCE, but atomic_t feels a little cleaner in this case.
> 
> While hrtimer_interrupts is made atomic_t, we change
> hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
> needed to match the data type backing atomic_t for hrtimer_interrupts.
> Even if this changes us from 64 bits to 32 bits (which I don't think
> is true for most compilers), it doesn't really matter. All we ever do
> is increment it every few seconds and compare it to an old value, so
> 32 bits is fine (even 16 bits would be). "Signed" vs. "unsigned" also
> doesn't matter for simple equality comparisons.
> 
> hrtimer_interrupts_saved is _not_ switched to atomic_t nor even
> accessed with READ_ONCE / WRITE_ONCE. hrtimer_interrupts_saved is
> always consistently accessed by the same CPU. NOTE: with the
> upcoming "buddy" detector there is one special case. When a CPU goes
> offline/online, the CPU that consistently accesses a given instance
> of hrtimer_interrupts_saved can change. Even then we can't end up
> with a partially updated hrtimer_interrupts_saved, because we pet
> all affected CPUs to make sure the new and old CPU can't somehow
> read/write hrtimer_interrupts_saved at the same time.
> 
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -87,29 +87,34 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
>  
>  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
>  
> -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> +static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
> +static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
>  static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
>  static unsigned long watchdog_hardlockup_all_cpu_dumped;
>  
> -static bool is_hardlockup(void)
> +static bool is_hardlockup(unsigned int cpu)
>  {
> -	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> +	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
>  
> -	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> +	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
>  		return true;
>  
> -	__this_cpu_write(hrtimer_interrupts_saved, hrint);
> +	/*
> +	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> +	 * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> +	 * written/read by a single CPU.
> +	 */
> +	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
>  
>  	return false;
>  }
>  
>  static void watchdog_hardlockup_kick(void)
>  {
> -	__this_cpu_inc(hrtimer_interrupts);
> +	atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));

Is there any particular reason why raw_*() is needed, please?

My expectation is that the raw_ API should be used only when
there is a good reason for it. Where a good reason might be
when the checks might fail but the consistency is guaranteed
another way.

IMHO, we should use:

	atomic_inc(this_cpu_ptr(&hrtimer_interrupts));

To be honest, I am a bit lost in the per_cpu API definitions.

But this_cpu_ptr() seems to be implemented the same way as
per_cpu_ptr() when CONFIG_DEBUG_PREEMPT is enabled.
And we use per_cpu_ptr() in is_hardlockup().

Also this_cpu_ptr() is used more commonly:

$> git grep this_cpu_ptr | wc -l
1385
$> git grep raw_cpu_ptr | wc -l
114

>  }
>  
> -void watchdog_hardlockup_check(struct pt_regs *regs)
> +void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  {
>  	/*
>  	 * Check for a hardlockup by making sure the CPU's timer
> @@ -117,35 +122,42 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
>  	 * fired multiple times before we overflow'd. If it hasn't
>  	 * then this is a good indication the cpu is stuck
>  	 */
> -	if (is_hardlockup()) {
> +	if (is_hardlockup(cpu)) {
>  		unsigned int this_cpu = smp_processor_id();
> +		struct cpumask backtrace_mask = *cpu_online_mask;

Does this work, please?

IMHO, we should use cpumask_copy().
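
That is, something like this (just a sketch of the suggested fix;
cpumask_copy() copies exactly nr_cpumask_bits, while the struct
assignment above copies the whole NR_CPUS-sized struct):

	struct cpumask backtrace_mask;

	cpumask_copy(&backtrace_mask, cpu_online_mask);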

>  
>  		/* Only print hardlockups once. */
> -		if (__this_cpu_read(watchdog_hardlockup_warned))
> +		if (per_cpu(watchdog_hardlockup_warned, cpu))
>  			return;
>  

Otherwise, it looks good to me.

Best Regards,
Petr
Doug Anderson May 23, 2023, 4:34 p.m. UTC | #2
Hi,

On Tue, May 23, 2023 at 9:02 AM Petr Mladek <pmladek@suse.com> wrote:
>
> On Fri 2023-05-19 10:18:34, Douglas Anderson wrote:
> > In preparation for the buddy hardlockup detector where the CPU
> > checking for lockup might not be the currently running CPU, add a
> > "cpu" parameter to watchdog_hardlockup_check().
> >
> > As part of this change, make hrtimer_interrupts an atomic_t since now
> > the CPU incrementing the value and the CPU reading the value might be
> > different. Technically this could also be done with just READ_ONCE and
> > WRITE_ONCE, but atomic_t feels a little cleaner in this case.
> >
> > While hrtimer_interrupts is made atomic_t, we change
> > hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
> > needed to match the data type backing atomic_t for hrtimer_interrupts.
> > Even if this changes us from 64 bits to 32 bits (which I don't think
> > is true for most compilers), it doesn't really matter. All we ever do
> > is increment it every few seconds and compare it to an old value, so
> > 32 bits is fine (even 16 bits would be). "Signed" vs. "unsigned" also
> > doesn't matter for simple equality comparisons.
> >
> > hrtimer_interrupts_saved is _not_ switched to atomic_t nor even
> > accessed with READ_ONCE / WRITE_ONCE. hrtimer_interrupts_saved is
> > always consistently accessed by the same CPU. NOTE: with the
> > upcoming "buddy" detector there is one special case. When a CPU goes
> > offline/online, the CPU that consistently accesses a given instance
> > of hrtimer_interrupts_saved can change. Even then we can't end up
> > with a partially updated hrtimer_interrupts_saved, because we pet
> > all affected CPUs to make sure the new and old CPU can't somehow
> > read/write hrtimer_interrupts_saved at the same time.
> >
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -87,29 +87,34 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
> >
> >  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> >
> > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> > +static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
> > +static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
> >  static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
> >  static unsigned long watchdog_hardlockup_all_cpu_dumped;
> >
> > -static bool is_hardlockup(void)
> > +static bool is_hardlockup(unsigned int cpu)
> >  {
> > -     unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> > +     int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> >
> > -     if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> > +     if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> >               return true;
> >
> > -     __this_cpu_write(hrtimer_interrupts_saved, hrint);
> > +     /*
> > +      * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > +      * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > +      * written/read by a single CPU.
> > +      */
> > +     per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> >
> >       return false;
> >  }
> >
> >  static void watchdog_hardlockup_kick(void)
> >  {
> > -     __this_cpu_inc(hrtimer_interrupts);
> > +     atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));
>
> Is there any particular reason why raw_*() is needed, please?
>
> My expectation is that the raw_ API should be used only when
> there is a good reason for it. Where a good reason might be
> when the checks might fail but the consistency is guaranteed
> another way.
>
> IMHO, we should use:
>
>         atomic_inc(this_cpu_ptr(&hrtimer_interrupts));
>
> To be honest, I am a bit lost in the per_cpu API definitions.
>
> But this_cpu_ptr() seems to be implemented the same way as
> per_cpu_ptr() when CONFIG_DEBUG_PREEMPT is enabled.
> And we use per_cpu_ptr() in is_hardlockup().
>
> Also this_cpu_ptr() is used more commonly:
>
> $> git grep this_cpu_ptr | wc -l
> 1385
> $> git grep raw_cpu_ptr | wc -l
> 114

Hmmm, I think maybe I confused myself. The old code purposely used the
double-underscore prefixed version of this_cpu_inc(). I couldn't find
a double-underscore version of this_cpu_ptr() and I somehow convinced
myself that the raw() version was the right equivalent version.

You're right that this_cpu_ptr() is a better choice here and I don't
see any reason why we'd need the raw version.
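
i.e. the fixup will presumably just be:

	static void watchdog_hardlockup_kick(void)
	{
		atomic_inc(this_cpu_ptr(&hrtimer_interrupts));
	}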

> >  }
> >
> > -void watchdog_hardlockup_check(struct pt_regs *regs)
> > +void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
> >  {
> >       /*
> >        * Check for a hardlockup by making sure the CPU's timer
> > @@ -117,35 +122,42 @@ void watchdog_hardlockup_check(struct pt_regs *regs)
> >        * fired multiple times before we overflow'd. If it hasn't
> >        * then this is a good indication the cpu is stuck
> >        */
> > -     if (is_hardlockup()) {
> > +     if (is_hardlockup(cpu)) {
> >               unsigned int this_cpu = smp_processor_id();
> > +             struct cpumask backtrace_mask = *cpu_online_mask;
>
> Does this work, please?
>
> IMHO, we should use cpumask_copy().

Ah, good call, thanks!


> >               /* Only print hardlockups once. */
> > -             if (__this_cpu_read(watchdog_hardlockup_warned))
> > +             if (per_cpu(watchdog_hardlockup_warned, cpu))
> >                       return;
> >
>
> Otherwise, it looks good to me.

Neither change seems urgent, though both are important to fix. I'll
wait a day or two to see if you have feedback on any of the other
patches and then send a fixup series.

-Doug
Petr Mladek May 24, 2023, 11:36 a.m. UTC | #3
On Tue 2023-05-23 09:34:37, Doug Anderson wrote:
> Hi,
> 
> On Tue, May 23, 2023 at 9:02 AM Petr Mladek <pmladek@suse.com> wrote:
> >
> > On Fri 2023-05-19 10:18:34, Douglas Anderson wrote:
> > > In preparation for the buddy hardlockup detector where the CPU
> > > checking for lockup might not be the currently running CPU, add a
> > > "cpu" parameter to watchdog_hardlockup_check().
> > >
> > > As part of this change, make hrtimer_interrupts an atomic_t since now
> > > the CPU incrementing the value and the CPU reading the value might be
> > > different. Technically this could also be done with just READ_ONCE and
> > > WRITE_ONCE, but atomic_t feels a little cleaner in this case.
> > >
> > > While hrtimer_interrupts is made atomic_t, we change
> > > hrtimer_interrupts_saved from "unsigned long" to "int". The "int" is
> > > needed to match the data type backing atomic_t for hrtimer_interrupts.
> > > Even if this changes us from 64 bits to 32 bits (which I don't think
> > > is true for most compilers), it doesn't really matter. All we ever do
> > > is increment it every few seconds and compare it to an old value, so
> > > 32 bits is fine (even 16 bits would be). "Signed" vs. "unsigned" also
> > > doesn't matter for simple equality comparisons.
> > >
> > > hrtimer_interrupts_saved is _not_ switched to atomic_t nor even
> > > accessed with READ_ONCE / WRITE_ONCE. hrtimer_interrupts_saved is
> > > always consistently accessed by the same CPU. NOTE: with the
> > > upcoming "buddy" detector there is one special case. When a CPU goes
> > > offline/online, the CPU that consistently accesses a given instance
> > > of hrtimer_interrupts_saved can change. Even then we can't end up
> > > with a partially updated hrtimer_interrupts_saved, because we pet
> > > all affected CPUs to make sure the new and old CPU can't somehow
> > > read/write hrtimer_interrupts_saved at the same time.
> > >
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -87,29 +87,34 @@ __setup("nmi_watchdog=", hardlockup_panic_setup);
> > >
> > >  #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
> > >
> > > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
> > > -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> > > +static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
> > > +static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
> > >  static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
> > >  static unsigned long watchdog_hardlockup_all_cpu_dumped;
> > >
> > > -static bool is_hardlockup(void)
> > > +static bool is_hardlockup(unsigned int cpu)
> > >  {
> > > -     unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> > > +     int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
> > >
> > > -     if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> > > +     if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> > >               return true;
> > >
> > > -     __this_cpu_write(hrtimer_interrupts_saved, hrint);
> > > +     /*
> > > +      * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
> > > +      * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
> > > +      * written/read by a single CPU.
> > > +      */
> > > +     per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
> > >
> > >       return false;
> > >  }
> > >
> > >  static void watchdog_hardlockup_kick(void)
> > >  {
> > > -     __this_cpu_inc(hrtimer_interrupts);
> > > +     atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));
> >
> > Is there any particular reason why raw_*() is needed, please?
> >
> > My expectation is that the raw_ API should be used only when
> > there is a good reason for it. Where a good reason might be
> > when the checks might fail but the consistency is guaranteed
> > another way.
> >
> > IMHO, we should use:
> >
> >         atomic_inc(this_cpu_ptr(&hrtimer_interrupts));
> >
> > To be honest, I am a bit lost in the per_cpu API definitions.
> >
> > But this_cpu_ptr() seems to be implemented the same way as
> > per_cpu_ptr() when CONFIG_DEBUG_PREEMPT is enabled.
> > And we use per_cpu_ptr() in is_hardlockup().
> >
> > Also this_cpu_ptr() is used more commonly:
> >
> > $> git grep this_cpu_ptr | wc -l
> > 1385
> > $> git grep raw_cpu_ptr | wc -l
> > 114
> 
> Hmmm, I think maybe I confused myself. The old code purposely used the
> double-underscore prefixed version of this_cpu_inc(). I couldn't find
> a double-underscore version of this_cpu_ptr() and I somehow convinced
> myself that the raw() version was the right equivalent version.
> 
> You're right that this_cpu_ptr() is a better choice here and I don't
> see any reason why we'd need the raw version.

I was confused too. Honestly, it looks a bit messy to me.

My understanding is that this_cpu*() API has the following semantic:

    + this_cpu_*() actively disables interrupts/preemption

    + __this_cpu_*() just warns when the task could migrate
		between CPUs.

    + raw_cpu_*() can be used in preemptible context when
		the validity is guaranteed another way.

this_cpu_ptr() does not fit the above. I guess that is because
it just provides the address and does not access the data. So
it is enough to read the current CPU id in an atomic way.

IMHO, it would make sense to distinguish how the pointer is
going to be used. From this POV, __this_cpu_ptr() and
raw_cpu_ptr() would make more sense to me.

But it looks to me that this_cpu_ptr() has the same semantic
as per_cpu_ptr().
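
To illustrate my understanding (only a sketch, not authoritative;
"cnt" and "val" are made-up per-CPU variables):

	static DEFINE_PER_CPU(int, cnt);
	static DEFINE_PER_CPU(atomic_t, val);

	this_cpu_inc(cnt);	/* safe anywhere; the RMW cannot be split
				 * by preemption or interrupts */
	__this_cpu_inc(cnt);	/* caller must prevent migration; warns
				 * under CONFIG_DEBUG_PREEMPT */
	raw_cpu_inc(cnt);	/* no checks at all; consistency must be
				 * guaranteed another way */

	/*
	 * this_cpu_ptr() only resolves the address using a stable CPU id.
	 * Any atomicity comes from what is done with the pointer:
	 */
	atomic_inc(this_cpu_ptr(&val));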

> Neither change seems urgent, though both are important to fix. I'll
> wait a day or two to see if you have feedback on any of the other
> patches and then send a fixup series.

Yup, I am going to review the rest.

Best Regards,
Petr
Patch

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 0c62c1bf0a71..92aa568c0c42 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -88,7 +88,7 @@  static inline void hardlockup_detector_disable(void) {}
 #endif
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
-void watchdog_hardlockup_check(struct pt_regs *regs);
+void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs);
 #endif
 
 #if defined(CONFIG_HAVE_NMI_WATCHDOG) || defined(CONFIG_HARDLOCKUP_DETECTOR)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 169e5dffbc00..2552e224f76a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -87,29 +87,34 @@  __setup("nmi_watchdog=", hardlockup_panic_setup);
 
 #if defined(CONFIG_HARDLOCKUP_DETECTOR_PERF)
 
-static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
-static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
+static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
 static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
 static unsigned long watchdog_hardlockup_all_cpu_dumped;
 
-static bool is_hardlockup(void)
+static bool is_hardlockup(unsigned int cpu)
 {
-	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
+	int hrint = atomic_read(&per_cpu(hrtimer_interrupts, cpu));
 
-	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
+	if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
 		return true;
 
-	__this_cpu_write(hrtimer_interrupts_saved, hrint);
+	/*
+	 * NOTE: we don't need any fancy atomic_t or READ_ONCE/WRITE_ONCE
+	 * for hrtimer_interrupts_saved. hrtimer_interrupts_saved is
+	 * written/read by a single CPU.
+	 */
+	per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
 
 	return false;
 }
 
 static void watchdog_hardlockup_kick(void)
 {
-	__this_cpu_inc(hrtimer_interrupts);
+	atomic_inc(raw_cpu_ptr(&hrtimer_interrupts));
 }
 
-void watchdog_hardlockup_check(struct pt_regs *regs)
+void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
 {
 	/*
 	 * Check for a hardlockup by making sure the CPU's timer
@@ -117,35 +122,42 @@  void watchdog_hardlockup_check(struct pt_regs *regs)
 	 * fired multiple times before we overflow'd. If it hasn't
 	 * then this is a good indication the cpu is stuck
 	 */
-	if (is_hardlockup()) {
+	if (is_hardlockup(cpu)) {
 		unsigned int this_cpu = smp_processor_id();
+		struct cpumask backtrace_mask = *cpu_online_mask;
 
 		/* Only print hardlockups once. */
-		if (__this_cpu_read(watchdog_hardlockup_warned))
+		if (per_cpu(watchdog_hardlockup_warned, cpu))
 			return;
 
-		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", this_cpu);
+		pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
 		print_modules();
 		print_irqtrace_events(current);
-		if (regs)
-			show_regs(regs);
-		else
-			dump_stack();
+		if (cpu == this_cpu) {
+			if (regs)
+				show_regs(regs);
+			else
+				dump_stack();
+			cpumask_clear_cpu(cpu, &backtrace_mask);
+		} else {
+			if (trigger_single_cpu_backtrace(cpu))
+				cpumask_clear_cpu(cpu, &backtrace_mask);
+		}
 
 		/*
-		 * Perform all-CPU dump only once to avoid multiple hardlockups
-		 * generating interleaving traces
+		 * Perform multi-CPU dump only once to avoid multiple
+		 * hardlockups generating interleaving traces
 		 */
 		if (sysctl_hardlockup_all_cpu_backtrace &&
 		    !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
-			trigger_allbutself_cpu_backtrace();
+			trigger_cpumask_backtrace(&backtrace_mask);
 
 		if (hardlockup_panic)
 			nmi_panic(regs, "Hard LOCKUP");
 
-		__this_cpu_write(watchdog_hardlockup_warned, true);
+		per_cpu(watchdog_hardlockup_warned, cpu) = true;
 	} else {
-		__this_cpu_write(watchdog_hardlockup_warned, false);
+		per_cpu(watchdog_hardlockup_warned, cpu) = false;
 	}
 }
 
diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
index 04415812d079..4e60e8023515 100644
--- a/kernel/watchdog_perf.c
+++ b/kernel/watchdog_perf.c
@@ -120,7 +120,7 @@  static void watchdog_overflow_callback(struct perf_event *event,
 		return;
 	}
 
-	watchdog_hardlockup_check(regs);
+	watchdog_hardlockup_check(smp_processor_id(), regs);
 }
 
 static int hardlockup_detector_event_create(void)