Message ID | 1502182969-3180-1-git-send-email-clg@kaod.org (mailing list archive) |
---|---|
State | Superseded |
Cédric Le Goater <clg@kaod.org> writes:
> When called from xive_irq_startup(), the size of the cpumask can be
> larger than nr_cpu_ids. Most of the time, its value is NR_CPUS (2048).

Ugh, you're right.

  #define nr_cpumask_bits ((unsigned int)NR_CPUS)
  ...
  /**
   * cpumask_weight - Count of bits in *srcp
   * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
   */
  static inline unsigned int cpumask_weight(const struct cpumask *srcp)
  {
      return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits);
  }

I don't know what the comment on srcp is trying to say. It's not true
that it only counts nr_cpu_ids worth of bits.

So it does seem if we're passed a mask with > nr_cpu_ids bits set then
cpumask_weight() will return > nr_cpu_ids, which is .. unhelpful.

BUT, I don't see other code handling cpumask_weight() returning >
nr_cpu_ids - at least I can't find any with some grepping.

So what is going wrong here that we're being passed a mask with more
than nr_cpu_ids bits set?

I think the affinity mask is copied to the desc in desc_smp_init(), and
the call chain will be:

  irq_create_mapping()
  -> irq_domain_alloc_descs()
     -> __irq_alloc_descs()
        -> alloc_descs()
           -> alloc_desc()
              -> desc_set_defaults()
                 -> desc_smp_init()

irq_create_mapping() is doing:

  virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);

Where the affinity mask is the NULL at the end.

So presumably we're hitting the irq_default_affinity case here:

  static void desc_smp_init(struct irq_desc *desc, int node,
                            const struct cpumask *affinity)
  {
      if (!affinity)
          affinity = irq_default_affinity;
      cpumask_copy(desc->irq_common_data.affinity, affinity);

Which comes from:

  static void __init init_irq_default_affinity(void)
  {
  #ifdef CONFIG_CPUMASK_OFFSTACK
      if (!irq_default_affinity)
          zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
  #endif
      if (cpumask_empty(irq_default_affinity))
          cpumask_setall(irq_default_affinity);
  }

And cpumask_setall() will indeed set NR_CPUS bits.

So that all seems sane, except that it does mean cpumask_weight() can
return > nr_cpu_ids which is awkward.

I guess this patch is a good fix, I'll expand the change log a bit.

cheers

> This can result in such WARNINGs in xive_find_target_in_mask():
>
> [ 0.094480] WARNING: CPU: 10 PID: 1 at ../arch/powerpc/sysdev/xive/common.c:476 xive_find_target_in_mask+0x110/0x2f0
> [ 0.094486] Modules linked in:
> [ 0.094491] CPU: 10 PID: 1 Comm: swapper/0 Not tainted 4.12.0+ #3
> [ 0.094496] task: c0000003fae4f200 task.stack: c0000003fe108000
> [ 0.094501] NIP: c00000000008a310 LR: c00000000008a2e4 CTR: 000000000072ca34
> [ 0.094506] REGS: c0000003fe10b360 TRAP: 0700 Not tainted (4.12.0+)
> [ 0.094510] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
> [ 0.094515] CR: 88000222 XER: 20040008
> [ 0.094521] CFAR: c00000000008a2cc SOFTE: 0
> [ 0.094521] GPR00: c00000000008a274 c0000003fe10b5e0 c000000001428f00 0000000000000010
> [ 0.094521] GPR04: 0000000000000010 0000000000000010 0000000000000010 0000000000000099
> [ 0.094521] GPR08: 0000000000000010 0000000000000001 ffffffffffff0000 0000000000000000
> [ 0.094521] GPR12: 0000000000000000 c00000000fff2d00 c00000000000d4d8 0000000000000000
> [ 0.094521] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 0.094521] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000b451e8
> [ 0.094521] GPR24: 00000000ffffffff c000000001462354 0000000000000800 00000000000007ff
> [ 0.094521] GPR28: c000000001462354 0000000000000010 c0000003f857e418 0000000000000010
> [ 0.094580] NIP [c00000000008a310] xive_find_target_in_mask+0x110/0x2f0
> [ 0.094585] LR [c00000000008a2e4] xive_find_target_in_mask+0xe4/0x2f0
> [ 0.094589] Call Trace:
> [ 0.094593] [c0000003fe10b5e0] [c00000000008a274] xive_find_target_in_mask+0x74/0x2f0 (unreliable)
> [ 0.094601] [c0000003fe10b690] [c00000000008abf0] xive_pick_irq_target.isra.1+0x200/0x230
> [ 0.094608] [c0000003fe10b830] [c00000000008b250] xive_irq_startup+0x60/0x180
> [ 0.094614] [c0000003fe10b8b0] [c0000000001608f0] irq_startup+0x70/0xd0
> [ 0.094620] [c0000003fe10b8f0] [c00000000015df7c] __setup_irq+0x7bc/0x880
> [ 0.094626] [c0000003fe10ba90] [c00000000015e30c] request_threaded_irq+0x14c/0x2c0
> [ 0.094632] [c0000003fe10baf0] [c0000000000aeb00] request_event_sources_irqs+0x100/0x180
> [ 0.094639] [c0000003fe10bc10] [c000000000e7d2f8] __machine_initcall_pseries_init_ras_IRQ+0x104/0x134
> [ 0.094646] [c0000003fe10bc40] [c00000000000cc88] do_one_initcall+0x68/0x1d0
> [ 0.094652] [c0000003fe10bd00] [c000000000e643c8] kernel_init_freeable+0x290/0x374
> [ 0.094658] [c0000003fe10bdc0] [c00000000000d4f4] kernel_init+0x24/0x170
> [ 0.094664] [c0000003fe10be30] [c00000000000b268] ret_from_kernel_thread+0x5c/0x74
> [ 0.094669] Instruction dump:
> [ 0.094673] 48586529 60000000 e8dc0002 393f0001 7f9b4800 7c7d07b4 7d3f07b4 409effcc
> [ 0.094682] 7f9d3000 7d26e850 79290fe0 69290001 <0b090000> 409c0194 3f620004 3b7b8ec8
>
> Fix this problem by using a minimum value.
>
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  arch/powerpc/sysdev/xive/common.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
> index 536ee15f61fb..4dac7d560a42 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -463,7 +463,7 @@ static int xive_find_target_in_mask(const struct cpumask *mask,
>  	int cpu, first, num, i;
>
>  	/* Pick up a starting point CPU in the mask based on fuzz */
> -	num = cpumask_weight(mask);
> +	num = min_t(int, cpumask_weight(mask), nr_cpu_ids);
>  	first = fuzz % num;
>
>  	/* Locate it */
> --
> 2.7.5
On Wed, 2017-08-09 at 17:06 +1000, Michael Ellerman wrote:
>   /**
>    * cpumask_weight - Count of bits in *srcp
>    * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
>    */
>   static inline unsigned int cpumask_weight(const struct cpumask *srcp)
>   {
>       return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits);
>   }
>
> I don't know what the comment on srcp is trying to say. It's not true
> that it only counts nr_cpu_ids worth of bits.

Right, and that's what bit me. We should report that on lkml and maybe
propose a patch that crops the result...

Cheers,
Ben.
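To make the "crop the result" idea concrete, here is a minimal userspace sketch, not kernel code. Every name in it (`toy_cpumask`, `toy_setall`, `toy_weight`, `toy_weight_cropped`, `TOY_NR_CPUS`) is hypothetical and chosen for illustration only; the 2048 figure is taken from the thread, and the runtime CPU count is passed in as a plain parameter rather than read from the kernel's `nr_cpu_ids`.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define TOY_NR_CPUS 2048 /* compile-time maximum, matching the 2048 in the thread */
#define TOY_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)
#define TOY_WORDS ((TOY_NR_CPUS + TOY_WORD_BITS - 1) / TOY_WORD_BITS)

/* Hypothetical stand-in for struct cpumask. */
struct toy_cpumask {
	unsigned long bits[TOY_WORDS];
};

/* Set every word of the mask, so all TOY_NR_CPUS bits end up set. */
static void toy_setall(struct toy_cpumask *m)
{
	memset(m->bits, 0xff, sizeof(m->bits));
}

/* Count the set bits among the first nbits bits of the mask. */
static unsigned int toy_weight(const struct toy_cpumask *m, unsigned int nbits)
{
	unsigned int i, count = 0;

	assert(nbits <= TOY_NR_CPUS);
	for (i = 0; i < nbits; i++)
		if (m->bits[i / TOY_WORD_BITS] & (1UL << (i % TOY_WORD_BITS)))
			count++;
	return count;
}

/* Cropped variant: ignore bits at or above the runtime CPU count. */
static unsigned int toy_weight_cropped(const struct toy_cpumask *m,
				       unsigned int nr_cpu_ids)
{
	return toy_weight(m, nr_cpu_ids < TOY_NR_CPUS ? nr_cpu_ids : TOY_NR_CPUS);
}
```

With every bit set and an assumed runtime count of 16 CPUs, `toy_weight(&m, TOY_NR_CPUS)` yields 2048 while `toy_weight_cropped(&m, 16)` yields 16, which is the kind of result a caller bounded by the runtime CPU count would expect.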
On 08/09/2017 09:06 AM, Michael Ellerman wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>> When called from xive_irq_startup(), the size of the cpumask can be
>> larger than nr_cpu_ids. Most of the time, its value is NR_CPUS (2048).
>
> Ugh, you're right.
>
>   #define nr_cpumask_bits ((unsigned int)NR_CPUS)
>   ...
>   /**
>    * cpumask_weight - Count of bits in *srcp
>    * @srcp: the cpumask to count bits (< nr_cpu_ids) in.
>    */
>   static inline unsigned int cpumask_weight(const struct cpumask *srcp)
>   {
>       return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits);
>   }
>
> I don't know what the comment on srcp is trying to say. It's not true
> that it only counts nr_cpu_ids worth of bits.
>
> So it does seem if we're passed a mask with > nr_cpu_ids bits set then
> cpumask_weight() will return > nr_cpu_ids, which is .. unhelpful.
>
> BUT, I don't see other code handling cpumask_weight() returning >
> nr_cpu_ids - at least I can't find any with some grepping.
>
> So what is going wrong here that we're being passed a mask with more
> than nr_cpu_ids bits set?
>
> I think the affinity mask is copied to the desc in desc_smp_init(), and
> the call chain will be:
>
>   irq_create_mapping()
>   -> irq_domain_alloc_descs()
>      -> __irq_alloc_descs()
>         -> alloc_descs()
>            -> alloc_desc()
>               -> desc_set_defaults()
>                  -> desc_smp_init()
>
> irq_create_mapping() is doing:
>
>   virq = irq_domain_alloc_descs(-1, 1, hwirq, of_node_to_nid(of_node), NULL);
>
> Where the affinity mask is the NULL at the end.
>
> So presumably we're hitting the irq_default_affinity case here:
>
>   static void desc_smp_init(struct irq_desc *desc, int node,
>                             const struct cpumask *affinity)
>   {
>       if (!affinity)
>           affinity = irq_default_affinity;
>       cpumask_copy(desc->irq_common_data.affinity, affinity);
>
> Which comes from:
>
>   static void __init init_irq_default_affinity(void)
>   {
>   #ifdef CONFIG_CPUMASK_OFFSTACK
>       if (!irq_default_affinity)
>           zalloc_cpumask_var(&irq_default_affinity, GFP_NOWAIT);
>   #endif
>       if (cpumask_empty(irq_default_affinity))
>           cpumask_setall(irq_default_affinity);
>   }
>
> And cpumask_setall() will indeed set NR_CPUS bits.
>
> So that all seems sane, except that it does mean cpumask_weight() can
> return > nr_cpu_ids which is awkward.
>
> I guess this patch is a good fix, I'll expand the change log a bit.

Yes. Thanks for the digging. I didn't do as much.

Cheers,

C.
Michael Ellerman <mpe@ellerman.id.au> writes:

> Cédric Le Goater <clg@kaod.org> writes:
>> When called from xive_irq_startup(), the size of the cpumask can be
>> larger than nr_cpu_ids. Most of the time, its value is NR_CPUS (2048).
...
>
> I guess this patch is a good fix, I'll expand the change log a bit.

Actually this got lost, because it was part of the larger series, and
then you sent a v2 of the series and so v1 was marked superseded :/

Anyway I've pulled this out of the series and will merge it.

cheers
On 08/24/2017 07:49 AM, Michael Ellerman wrote:
> Michael Ellerman <mpe@ellerman.id.au> writes:
>
>> Cédric Le Goater <clg@kaod.org> writes:
>>> When called from xive_irq_startup(), the size of the cpumask can be
>>> larger than nr_cpu_ids. Most of the time, its value is NR_CPUS (2048).
> ...
>>
>> I guess this patch is a good fix, I'll expand the change log a bit.
>
> Actually this got lost, because it was part of the larger series, and
> then you sent a v2 of the series and so v1 was marked superseded :/
>
> Anyway I've pulled this out of the series and will merge it.

I just saw your resend. Thanks for doing so,

C.
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 536ee15f61fb..4dac7d560a42 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -463,7 +463,7 @@ static int xive_find_target_in_mask(const struct cpumask *mask,
 	int cpu, first, num, i;

 	/* Pick up a starting point CPU in the mask based on fuzz */
-	num = cpumask_weight(mask);
+	num = min_t(int, cpumask_weight(mask), nr_cpu_ids);
 	first = fuzz % num;

 	/* Locate it */
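The effect of the one-line change above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `sim_mask`, `sim_fill`, `sim_weight`, `sim_next` and `sim_pick` are hypothetical names, the walk over the mask is a loose guess at what the "Locate it" step does rather than a copy of the kernel code, and returning -1 merely stands in for the path that produced the WARNING.

```c
#include <assert.h>
#include <stdbool.h>

#define SIM_NR_CPUS 2048 /* compile-time maximum, as quoted in the thread */

/* Hypothetical stand-in for a cpumask: one flag per possible CPU. */
struct sim_mask {
	bool cpu[SIM_NR_CPUS];
};

/* Set every flag in the mask. */
static void sim_fill(struct sim_mask *m)
{
	for (int i = 0; i < SIM_NR_CPUS; i++)
		m->cpu[i] = true;
}

/* Count set flags over the full SIM_NR_CPUS range. */
static int sim_weight(const struct sim_mask *m)
{
	int count = 0;

	for (int i = 0; i < SIM_NR_CPUS; i++)
		if (m->cpu[i])
			count++;
	return count;
}

/* Next set CPU strictly after 'after' and below nr_cpu_ids, else nr_cpu_ids. */
static int sim_next(const struct sim_mask *m, int after, int nr_cpu_ids)
{
	for (int i = after + 1; i < nr_cpu_ids; i++)
		if (m->cpu[i])
			return i;
	return nr_cpu_ids;
}

/*
 * Pick a starting CPU from fuzz. With clamp == false, num can exceed
 * nr_cpu_ids and the walk may run past the last usable CPU.
 */
static int sim_pick(const struct sim_mask *m, unsigned int fuzz,
		    int nr_cpu_ids, bool clamp)
{
	int num = sim_weight(m);
	int first, cpu;

	if (clamp && num > nr_cpu_ids)
		num = nr_cpu_ids; /* plays the role of min_t() in the patch */
	if (num == 0)
		return -1;
	first = fuzz % num;

	cpu = sim_next(m, -1, nr_cpu_ids);
	for (int i = 0; i < first && cpu < nr_cpu_ids; i++)
		cpu = sim_next(m, cpu, nr_cpu_ids);

	return cpu < nr_cpu_ids ? cpu : -1; /* -1 models the warning path */
}
```

For a fully set mask, an assumed runtime count of 16 CPUs and a fuzz value of 100, the clamped variant starts at CPU 4 (100 % 16), while the unclamped variant computes a start index of 100 (100 % 2048), walks off the end of the usable range and returns -1.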
When called from xive_irq_startup(), the size of the cpumask can be
larger than nr_cpu_ids. Most of the time, its value is NR_CPUS (2048).
This can result in such WARNINGs in xive_find_target_in_mask():

[ 0.094480] WARNING: CPU: 10 PID: 1 at ../arch/powerpc/sysdev/xive/common.c:476 xive_find_target_in_mask+0x110/0x2f0
[ 0.094486] Modules linked in:
[ 0.094491] CPU: 10 PID: 1 Comm: swapper/0 Not tainted 4.12.0+ #3
[ 0.094496] task: c0000003fae4f200 task.stack: c0000003fe108000
[ 0.094501] NIP: c00000000008a310 LR: c00000000008a2e4 CTR: 000000000072ca34
[ 0.094506] REGS: c0000003fe10b360 TRAP: 0700 Not tainted (4.12.0+)
[ 0.094510] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
[ 0.094515] CR: 88000222 XER: 20040008
[ 0.094521] CFAR: c00000000008a2cc SOFTE: 0
[ 0.094521] GPR00: c00000000008a274 c0000003fe10b5e0 c000000001428f00 0000000000000010
[ 0.094521] GPR04: 0000000000000010 0000000000000010 0000000000000010 0000000000000099
[ 0.094521] GPR08: 0000000000000010 0000000000000001 ffffffffffff0000 0000000000000000
[ 0.094521] GPR12: 0000000000000000 c00000000fff2d00 c00000000000d4d8 0000000000000000
[ 0.094521] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 0.094521] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000b451e8
[ 0.094521] GPR24: 00000000ffffffff c000000001462354 0000000000000800 00000000000007ff
[ 0.094521] GPR28: c000000001462354 0000000000000010 c0000003f857e418 0000000000000010
[ 0.094580] NIP [c00000000008a310] xive_find_target_in_mask+0x110/0x2f0
[ 0.094585] LR [c00000000008a2e4] xive_find_target_in_mask+0xe4/0x2f0
[ 0.094589] Call Trace:
[ 0.094593] [c0000003fe10b5e0] [c00000000008a274] xive_find_target_in_mask+0x74/0x2f0 (unreliable)
[ 0.094601] [c0000003fe10b690] [c00000000008abf0] xive_pick_irq_target.isra.1+0x200/0x230
[ 0.094608] [c0000003fe10b830] [c00000000008b250] xive_irq_startup+0x60/0x180
[ 0.094614] [c0000003fe10b8b0] [c0000000001608f0] irq_startup+0x70/0xd0
[ 0.094620] [c0000003fe10b8f0] [c00000000015df7c] __setup_irq+0x7bc/0x880
[ 0.094626] [c0000003fe10ba90] [c00000000015e30c] request_threaded_irq+0x14c/0x2c0
[ 0.094632] [c0000003fe10baf0] [c0000000000aeb00] request_event_sources_irqs+0x100/0x180
[ 0.094639] [c0000003fe10bc10] [c000000000e7d2f8] __machine_initcall_pseries_init_ras_IRQ+0x104/0x134
[ 0.094646] [c0000003fe10bc40] [c00000000000cc88] do_one_initcall+0x68/0x1d0
[ 0.094652] [c0000003fe10bd00] [c000000000e643c8] kernel_init_freeable+0x290/0x374
[ 0.094658] [c0000003fe10bdc0] [c00000000000d4f4] kernel_init+0x24/0x170
[ 0.094664] [c0000003fe10be30] [c00000000000b268] ret_from_kernel_thread+0x5c/0x74
[ 0.094669] Instruction dump:
[ 0.094673] 48586529 60000000 e8dc0002 393f0001 7f9b4800 7c7d07b4 7d3f07b4 409effcc
[ 0.094682] 7f9d3000 7d26e850 79290fe0 69290001 <0b090000> 409c0194 3f620004 3b7b8ec8

Fix this problem by using a minimum value.

Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/sysdev/xive/common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)