powerpc: xive: ensure active irqd when setting affinity

Message ID 20170803013822.GD28905@us.ibm.com (mailing list archive)
State Accepted
Commit cffb717ceb8e2ca0316e89d908db54af454f1fbb

Commit Message

Sukadev Bhattiprolu Aug. 3, 2017, 1:38 a.m. UTC
From fd0abf5c61b6041fdb75296e8580b86dc91d08d6 Mon Sep 17 00:00:00 2001
From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Tue, 1 Aug 2017 20:54:41 -0500
Subject: [PATCH] powerpc: xive: ensure active irqd when setting affinity

Ensure irqd is active before attempting to set affinity. This should
make the set affinity code more robust. For instance, this prevents
these messages seen on a 4.12 based kernel when taking cpus offline:

   [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !
   [   77.885859] xive: Error -6 reconfiguring irq 17
   [   77.885862] IRQ17: set affinity failed(-6).

The underlying problem with taking cpus offline was fixed in 4.13-rc1 by:

   commit 91f26cb4cd3c ("genirq/cpuhotplug: Do not migrated shutdown irqs")

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/sysdev/xive/common.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Michael Ellerman Aug. 8, 2017, 10:40 a.m. UTC | #1
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes:

> From fd0abf5c61b6041fdb75296e8580b86dc91d08d6 Mon Sep 17 00:00:00 2001
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Tue, 1 Aug 2017 20:54:41 -0500
> Subject: [PATCH] powerpc: xive: ensure active irqd when setting affinity
>
> Ensure irqd is active before attempting to set affinity. This should
> make the set affinity code more robust. For instance, this prevents
> these messages seen on a 4.12 based kernel when taking cpus offline:
>
>    [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !
>    [   77.885859] xive: Error -6 reconfiguring irq 17
>    [   77.885862] IRQ17: set affinity failed(-6).
>
> The underlying problem with taking cpus offline was fixed in 4.13-rc1 by:
>
>    commit 91f26cb4cd3c ("genirq/cpuhotplug: Do not migrated shutdown irqs")

So do we still need this? Or is the above only a partial fix?

I'm a bit confused.

cheers
Sukadev Bhattiprolu Aug. 9, 2017, midnight UTC | #2
Michael Ellerman [mpe@ellerman.id.au] wrote:
> Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes:
> 
> > From fd0abf5c61b6041fdb75296e8580b86dc91d08d6 Mon Sep 17 00:00:00 2001
> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Date: Tue, 1 Aug 2017 20:54:41 -0500
> > Subject: [PATCH] powerpc: xive: ensure active irqd when setting affinity
> >
> > Ensure irqd is active before attempting to set affinity. This should
> > make the set affinity code more robust. For instance, this prevents
> > these messages seen on a 4.12 based kernel when taking cpus offline:
> >
> >    [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !
> >    [   77.885859] xive: Error -6 reconfiguring irq 17
> >    [   77.885862] IRQ17: set affinity failed(-6).
> >
> > The underlying problem with taking cpus offline was fixed in 4.13-rc1 by:
> >
> >    commit 91f26cb4cd3c ("genirq/cpuhotplug: Do not migrated shutdown irqs")
> 
> So do we still need this? Or is the above only a partial fix?

It would be good to have this fix.

Commit 91f26cb4cd3c fixes the problem, so we won't see the errors with
that commit applied. But if such a problem were to show up again, xive
will catch it earlier, before hitting those errors.

Sukadev

> 
> I'm a bit confused.
> 
> cheers
Michael Ellerman Aug. 9, 2017, 6:15 a.m. UTC | #3
Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes:
> Michael Ellerman [mpe@ellerman.id.au] wrote:
>> Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> writes:
>> > From fd0abf5c61b6041fdb75296e8580b86dc91d08d6 Mon Sep 17 00:00:00 2001
>> > From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> > Date: Tue, 1 Aug 2017 20:54:41 -0500
>> > Subject: [PATCH] powerpc: xive: ensure active irqd when setting affinity
>> >
>> > Ensure irqd is active before attempting to set affinity. This should
>> > make the set affinity code more robust. For instance, this prevents
>> > these messages seen on a 4.12 based kernel when taking cpus offline:
>> >
>> >    [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !
>> >    [   77.885859] xive: Error -6 reconfiguring irq 17
>> >    [   77.885862] IRQ17: set affinity failed(-6).
>> >
>> > The underlying problem with taking cpus offline was fixed in 4.13-rc1 by:
>> >
>> >    commit 91f26cb4cd3c ("genirq/cpuhotplug: Do not migrated shutdown irqs")
>> 
>> So do we still need this? Or is the above only a partial fix?
>
> It would be good to have this fix.
>
> Commit 91f26cb4cd3c fixes the problem, so we won't see the errors with
> that commit applied. But if such a problem were to show up again, xive
> will catch it earlier, before hitting those errors.

I'm not sure I'm convinced. We can't handle every possible case of the
higher level code calling us in situations we don't expect.

For example irq_data could be NULL, but we trust the higher level code
not to do that to us.

Also I don't see any other driver doing this check.

  $ git grep irqd_is_started
  include/linux/irq.h:static inline bool irqd_is_started(struct irq_data *d)
  kernel/irq/chip.c:      if (irqd_is_started(d)) {
  kernel/irq/chip.c:      if (irqd_is_started(&desc->irq_data)) {
  kernel/irq/cpuhotplug.c:        if (irqd_is_per_cpu(d) || !irqd_is_started(d) || !irq_needs_fixup(d)) {


cheers
Benjamin Herrenschmidt Aug. 9, 2017, 7:33 a.m. UTC | #4
On Wed, 2017-08-09 at 16:15 +1000, Michael Ellerman wrote:
> I'm not sure I'm convinced. We can't handle every possible case of the
> higher level code calling us in situations we don't expect.
> 
> For example irq_data could be NULL, but we trust the higher level code
> not to do that to us.
> 
> Also I don't see any other driver doing this check.
> 
>   $ git grep irqd_is_started
>   include/linux/irq.h:static inline bool irqd_is_started(struct irq_data *d)
>   kernel/irq/chip.c:      if (irqd_is_started(d)) {
>   kernel/irq/chip.c:      if (irqd_is_started(&desc->irq_data)) {
>   kernel/irq/cpuhotplug.c:        if (irqd_is_per_cpu(d) || !irqd_is_started(d) || !irq_needs_fixup(d)) {

irqd_is_started is brand new so you won't find any :-)

For most cases the problem is a non-issue. Due to how xive works, it's
more of a problem for us because a non-started interrupt has no
targeting information at all.

So this is *somewhat* related to xive internals and I'd rather have
that sanity check in there.

Cheers,
Ben.
Michael Ellerman Aug. 11, 2017, 12:19 p.m. UTC | #5
On Thu, 2017-08-03 at 01:38:22 UTC, Sukadev Bhattiprolu wrote:
> From fd0abf5c61b6041fdb75296e8580b86dc91d08d6 Mon Sep 17 00:00:00 2001
> From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Date: Tue, 1 Aug 2017 20:54:41 -0500
> Subject: [PATCH] powerpc: xive: ensure active irqd when setting affinity
> 
> Ensure irqd is active before attempting to set affinity. This should
> make the set affinity code more robust. For instance, this prevents
> these messages seen on a 4.12 based kernel when taking cpus offline:
> 
>    [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !
>    [   77.885859] xive: Error -6 reconfiguring irq 17
>    [   77.885862] IRQ17: set affinity failed(-6).
> 
> The underlying problem with taking cpus offline was fixed in 4.13-rc1 by:
> 
>    commit 91f26cb4cd3c ("genirq/cpuhotplug: Do not migrated shutdown irqs")
> 
> Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/cffb717ceb8e2ca0316e89d908db54

cheers
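
For readers following the thread, here is a minimal user-space sketch of the
guard being debated. It is not kernel code. Every model_* name, the
has_target field, and MODEL_ENXIO are hypothetical stand-ins invented for
illustration; only irqd_is_started(), IRQ_SET_MASK_OK (value 0) and the -6
error code come from the thread. The sketch assumes, per Ben's description,
that an interrupt which was never started has no targeting information, so
reconfiguring it fails.

```c
/* User-space model of the early-return guard; NOT kernel code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel definitions. */
enum { IRQ_SET_MASK_OK = 0 };
#define MODEL_ENXIO 6	/* the "-6" in the log is -ENXIO on Linux */

struct model_irq_data {
	bool started;		/* what irqd_is_started(d) would report */
	bool has_target;	/* assumption: set only once the irq is started */
	int target_cpu;
	int reconfig_calls;	/* counts trips into the reconfigure path */
};

static bool model_irqd_is_started(const struct model_irq_data *d)
{
	return d->started;
}

/* Models the reconfigure step: fails when there is no target to update. */
static int model_reconfigure(struct model_irq_data *d, int cpu)
{
	d->reconfig_calls++;
	if (!d->has_target)
		return -MODEL_ENXIO;
	d->target_cpu = cpu;
	return 0;
}

/* 'guard' selects between the behavior with and without the patch. */
static int model_set_affinity(struct model_irq_data *d, int cpu, bool guard)
{
	assert(d != NULL);

	/* Don't do anything if the interrupt isn't started */
	if (guard && !model_irqd_is_started(d))
		return IRQ_SET_MASK_OK;

	return model_reconfigure(d, cpu);
}
```

Calling model_set_affinity() on a non-started interrupt with guard set to
false returns -6 from the reconfigure path. With guard set to true it returns
IRQ_SET_MASK_OK and never reaches that path, which is the earlier handling
Sukadev describes.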
Patch

diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 6595462..2708d42 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -672,6 +672,10 @@  static int xive_irq_set_affinity(struct irq_data *d,
 	if (cpumask_any_and(cpumask, cpu_online_mask) >= nr_cpu_ids)
 		return -EINVAL;
 
+	/* Don't do anything if the interrupt isn't started */
+	if (!irqd_is_started(d))
+		return IRQ_SET_MASK_OK;
+
 	/*
 	 * If existing target is already in the new mask, and is
 	 * online then do nothing.