Another (ESP?) scsi blk-mq problem on sparc64

Message ID 54CA61DF.30807@kernel.dk
State Not Applicable
Delegated to: David Miller

Commit Message

Jens Axboe Jan. 29, 2015, 4:37 p.m. UTC
On 01/28/2015 11:53 PM, Meelis Roos wrote:
> On Mon, 24 Nov 2014, David Miller wrote:
>
>> From: mroos@linux.ee
>> Date: Tue, 25 Nov 2014 00:23:20 +0200 (EET)
>>
>>>>>>> Yes, that does look like the case.  Do you have a good trick on how
>>>>>>> to allocate a map for the highest possible cpu number without first
>>>>>>> iterating the cpu map?  I couldn't find something that looks like a
>>>>>>> highest_possible_cpu() helper.
>>>>>>
>>>>>> Honestly I think that num_possible_cpus() should return the max of
>>>>>> number of CPUs (weight), and the highest numbered CPU. It's a pain in
>>>>>> the butt to handle this otherwise.
>>>>>
>>>>> Hear, hear!!!  That would make my life easier, and would make this sort
>>>>> of problem much less likely to occur!
>>>>
>>>> How about this one?
>>>
>>> It makes the machine work.
>>
>> Thanks for testing!
>>
>
> What's the status of this fix? It is still not applied on yesterday's
> 3.19.0-rc6-00105-gc59c961 git...

Hmm, I thought commit a33c1ba29138 took care of it... Does the attached 
work?

Comments

Meelis Roos Sept. 4, 2015, 8:33 a.m. UTC | #1
> > > > > > > > Yes, that does look like the case.  Do you have a good trick on how
> > > > > > > > to allocate a map for the highest possible cpu number without first
> > > > > > > > iterating the cpu map?  I couldn't find something that looks like a
> > > > > > > > highest_possible_cpu() helper.
> > > > > > >
> > > > > > > Honestly I think that num_possible_cpus() should return the max of
> > > > > > > number of CPUs (weight), and the highest numbered CPU. It's a pain in
> > > > > > > the butt to handle this otherwise.
> > > > > >
> > > > > > Hear, hear!!!  That would make my life easier, and would make this sort
> > > > > > of problem much less likely to occur!
> > > > >
> > > > > How about this one?
> > > >
> > > > It makes the machine work.
> > >
> > > Thanks for testing!
> > >
> >
> > What's the status of this fix? It is still not applied on yesterday's
> > 3.19.0-rc6-00105-gc59c961 git...
> 
> Hmm, I thought commit a33c1ba29138 took care of it... Does the attached work?

Sorry for taking so long - I am now back at the machine and turned it on 
yesterday. The machine is a Sun Fire 3500 with 4 sparsely numbered CPUs 
(6, 7, 18, 19).

First I fetched 4.2.0 and tried it without any patches. That produced the 
error below - it seems to be blk-mq related, but I do not remember whether 
it is exactly the same as before.

Next I tried 4.2 + your previously sent map-sz.patch, but it made no 
difference.

spitfire_data_access_exception: SFSR[0000000000801009] SFAR[fffffdff01b02310], going.
              \|/ ____ \|/
              "@'/ .. \`@"
              /_| \__/ |_\
                 \__U_/
swapper/6(1): Dax [#1]
CPU: 19 PID: 1 Comm: swapper/6 Not tainted 4.2.0 #67
task: fffff800fe092fe0 ti: fffff800fe0b8000 task.ti: fffff800fe0b8000
TSTATE: 0000000080001602 TPC: 000000000065bad4 TNPC: 000000000065bad8 Y: 00000003    Not tainted
TPC: <kobject_init+0x14/0xa0>
g0: 0000000000000022 g1: 0000000080000000 g2: 0000000000000000 g3: 0000000000000001
g4: fffff800fe092fe0 g5: fffff800fe404000 g6: fffff800fe0b8000 g7: 0000000000000017
o0: 0000000000000000 o1: fffff800fe359bd0 o2: 00000000009c0c00 o3: 0000000000a2ba00
o4: fffff800fd50a368 o5: 0000000000000000 sp: fffff800fe0bafe1 ret_pc: 000000000065cd4c
RPC: <kobject_uevent_env+0x4c/0x500>
l0: fffff800fe092fe0 l1: fffff800fe042420 l2: 00000000008b15f0 l3: fffff800fe0bb9e0
l4: 00000000009bc415 l5: fffff8017e0bb9df l6: 0000000000000000 l7: fffff800fe193860
i0: 00000dff01b022d8 i1: 0000000000a4afe0 i2: 0000000000000000 i3: 0000000000a4aa28
i4: 00000000009c1448 i5: fffff800fe359bd0 i6: fffff800fe0bb091 i7: 000000000064a2c0
I7: <blk_mq_register_disk+0xe0/0x1a0>
Call Trace:
 [000000000064a2c0] blk_mq_register_disk+0xe0/0x1a0
 [000000000063f880] blk_register_queue+0xa0/0x120
 [000000000064dbfc] add_disk+0x33c/0x480
 [00000000006f3bd0] loop_add+0x190/0x280
 [0000000000a8c5b0] loop_init+0x160/0x1b0
 [0000000000426ea4] do_one_initcall+0xe4/0x1e0
 [0000000000a70b8c] kernel_init_freeable+0x130/0x1e0
 [000000000087cfa4] kernel_init+0x4/0x100
 [0000000000406124] ret_from_fork+0x1c/0x2c
 [0000000000000000]           (null)
Caller[000000000064a2c0]: blk_mq_register_disk+0xe0/0x1a0
Caller[000000000063f880]: blk_register_queue+0xa0/0x120
Caller[000000000064dbfc]: add_disk+0x33c/0x480
Caller[00000000006f3bd0]: loop_add+0x190/0x280
Caller[0000000000a8c5b0]: loop_init+0x160/0x1b0
Caller[0000000000426ea4]: do_one_initcall+0xe4/0x1e0
Caller[0000000000a70b8c]: kernel_init_freeable+0x130/0x1e0
Caller[000000000087cfa4]: kernel_init+0x4/0x100
Caller[0000000000406124]: ret_from_fork+0x1c/0x2c
Caller[0000000000000000]:           (null)
Instruction DUMP: 15002703  02c6401a  03200000 <c45e2038> 82088001  22c84009  c20e203c  11002704  92100
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

Press Stop-A (L1-A) to return to the boot prom
Patch

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 5f13f4d0bcce..527d315dc1a5 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -88,10 +88,11 @@  int blk_mq_update_queue_map(unsigned int *map, unsigned int nr_queues)
 unsigned int *blk_mq_make_queue_map(struct blk_mq_tag_set *set)
 {
 	unsigned int *map;
+	size_t sz;
 
 	/* If cpus are offline, map them to first hctx */
-	map = kzalloc_node(sizeof(*map) * nr_cpu_ids, GFP_KERNEL,
-				set->numa_node);
+	sz = max_t(unsigned int, nr_cpu_ids, num_possible_cpus());
+	map = kzalloc_node(sizeof(*map) * sz, GFP_KERNEL, set->numa_node);
 	if (!map)
 		return NULL;
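
For readers following the sizing argument in the thread, here is a minimal userspace sketch of why a map indexed by CPU id has to be sized by the highest possible id plus one rather than by the CPU count. It is not kernel code: mask_weight(), mask_highest_id_plus_one() and make_queue_map() are hypothetical stand-ins invented for illustration, a single unsigned long stands in for the possible-CPU mask, and calloc() stands in for kzalloc_node().

```c
#include <stdlib.h>

/* Hypothetical helper: number of set bits, i.e. how many CPUs are possible. */
static unsigned int mask_weight(unsigned long mask)
{
	unsigned int n = 0;

	/* Clear the lowest set bit on each pass. */
	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Hypothetical helper: highest set bit index plus one (0 for an empty mask). */
static unsigned int mask_highest_id_plus_one(unsigned long mask)
{
	unsigned int n = 0;

	for (; mask; mask >>= 1)
		n++;
	return n;
}

/*
 * Hypothetical model of the allocation in the patch: take the larger of
 * the two counts so that map[cpu] is in bounds for every possible CPU id.
 * The returned map is zeroed, so unmapped ids point at queue 0.
 */
static unsigned int *make_queue_map(unsigned long possible, unsigned int *sz_out)
{
	unsigned int weight = mask_weight(possible);
	unsigned int ids = mask_highest_id_plus_one(possible);
	unsigned int sz = weight > ids ? weight : ids;

	*sz_out = sz;
	return calloc(sz, sizeof(unsigned int));
}
```

With the CPU numbering reported above (6, 7, 18, 19), mask_weight() gives 4 while mask_highest_id_plus_one() gives 20, so a map sized by weight alone would be overrun as soon as it is indexed by CPU 18 or 19. Whether that is the cause of the 4.2.0 oops in this report is not established by the thread.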