Message ID | 87a8eb5dwa.fsf@concordia.ellerman.id.au (mailing list archive)
---|---
State | Not Applicable
On 11/10/16 22:22, Michael Ellerman wrote:
> Tejun Heo <tj@kernel.org> writes:
>
>> Hello, Michael.
>>
>> On Mon, Oct 10, 2016 at 09:22:55PM +1100, Michael Ellerman wrote:
>>> This patch seems to be causing one of my Power8 boxes not to boot.
>>>
>>> Specifically commit 3347fa092821 ("workqueue: make workqueue available
>>> early during boot") in linux-next.
>>>
>>> If I revert this on top of next-20161005 then the machine boots again.
>>>
>>> I've attached the oops below. It looks like the cfs_rq of p->se is NULL?
>>
>> Hah, weird that it's arch dependent, or maybe it's just different
>> config options. Most likely, it's caused by the workqueue_init() call
>> being moved too early. Can you please try the following patch and see
>> whether the problem goes away?
>
> No, that doesn't help.
>
> What does is this:
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 94732d1ab00a..4e79549d242f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1614,7 +1614,8 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
>  	 * [ this allows ->select_task() to simply return task_cpu(p) and
>  	 *   not worry about this generic constraint ]
>  	 */
> -	if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
> +	if (unlikely(cpu >= nr_cpu_ids ||
> +		     !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
>  		     !cpu_online(cpu)))
>  		cpu = select_fallback_rq(task_cpu(p), p);
>
> The oops happens because we're in enqueue_task_fair() and p->se->cfs_rq
> is NULL.
>
> The cfs_rq is NULL because we did set_task_rq(p, 2048), where 2048 is
> NR_CPUS. That causes us to index past the end of the tg->cfs_rq array in
> set_task_rq() and happen to get NULL.
>
> We never should have done set_task_rq(p, 2048), because 2048 is >=
> nr_cpu_ids, which means it's not a valid CPU number, and set_task_rq()
> doesn't cope with that.
> The reason we're calling set_task_rq() with CPU 2048 is because in
> select_task_rq() we had tsk_nr_cpus_allowed() = 0, because
> tsk_cpus_allowed(p) is an empty cpu mask.
>
> That means we do in select_task_rq():
>
> 	cpu = cpumask_any(tsk_cpus_allowed(p));
>
> And when tsk_cpus_allowed(p) is empty, cpumask_any() returns nr_cpu_ids,
> causing cpu to be set to 2048 in my case.
>
> select_task_rq() then does the check to see if it should use a fallback
> rq:
>
> 	if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
> 		     !cpu_online(cpu)))
> 		cpu = select_fallback_rq(task_cpu(p), p);
>
> But in both those checks we end up indexing off the end of the cpu mask,
> because cpu is >= nr_cpu_ids. At least on my system they both return
> true and so we return cpu == 2048.
>
> The patch above is pretty clearly not the right fix, though maybe it's a
> good safety measure.
>
> Presumably we shouldn't be ending up with tsk_cpus_allowed() being
> empty, but I haven't had time to track down why that's happening.
>
> cheers

+peterz

FYI: I see the same thing on my machine as well; it's just that I get
lucky and cpu_online(cpu) returns false.

From a functional perspective, I think we may want to add some extra
debug checks for places where the cpumask is empty early during boot.
There looks to be a dependency between cpumasks and CPUs coming online,
and I wonder if we can hit similar issues during hotplug.

FWIW, your patch looks correct to me, though one might argue that
cpumask_test_cpu() is a better place to fix it.

Balbir Singh
Hello, Michael.

On Tue, Oct 11, 2016 at 10:22:13PM +1100, Michael Ellerman wrote:
> The oops happens because we're in enqueue_task_fair() and p->se->cfs_rq is NULL.
>
> The cfs_rq is NULL because we did set_task_rq(p, 2048), where 2048 is NR_CPUS. That causes us to index past the end of the tg->cfs_rq array in set_task_rq() and happen to get NULL.
>
> We never should have done set_task_rq(p, 2048), because 2048 is >= nr_cpu_ids, which means it's not a valid CPU number, and set_task_rq() doesn't cope with that.

Hmm... it doesn't reproduce here, and I can't see how the commit would
affect this, given that it doesn't really change when the kworker
kthreads are being created.

> Presumably we shouldn't be ending up with tsk_cpus_allowed() being empty, but I haven't had time to track down why that's happening.

Can you please add WARN_ON_ONCE(!tsk_nr_cpus_allowed(p)) to
select_task_rq() and post what that says?

Thanks.
Hello, Balbir.

On Tue, Oct 11, 2016 at 11:21:09PM +1100, Balbir Singh wrote:
> FYI: I see the same thing on my machine as well; it's just that I get lucky and cpu_online(cpu) returns false.

Are you seeing this on x86 or is your test setup also a power machine?

Thanks.
On 15/10/16 02:07, Tejun Heo wrote:
> Hello, Michael.
>
> On Tue, Oct 11, 2016 at 10:22:13PM +1100, Michael Ellerman wrote:
>> The oops happens because we're in enqueue_task_fair() and p->se->cfs_rq is NULL.
>>
>> The cfs_rq is NULL because we did set_task_rq(p, 2048), where 2048 is NR_CPUS. That causes us to index past the end of the tg->cfs_rq array in set_task_rq() and happen to get NULL.
>>
>> We never should have done set_task_rq(p, 2048), because 2048 is >= nr_cpu_ids, which means it's not a valid CPU number, and set_task_rq() doesn't cope with that.
>
> Hmm... it doesn't reproduce it here and can't see how the commit would affect this given that it doesn't really change when the kworker kthreads are being created.

I think the basic analysis points to the change in how unbound
workqueues are created early from the unbound hash; those have an empty
pool cpumask.

> Can you please add WARN_ON_ONCE(!tsk_nr_cpus_allowed(p)) to select_task_rq() and post what that says?
>
> Thanks.

Balbir Singh.
On 15/10/16 02:08, Tejun Heo wrote:
> Hello, Balbir.
>
> On Tue, Oct 11, 2016 at 11:21:09PM +1100, Balbir Singh wrote:
>> FYI: I see the same thing on my machine as well; it's just that I get lucky and cpu_online(cpu) returns false.
>
> Are you seeing this on x86 or is your test setup also a power machine?
>
> Thanks.

I saw this on a powerpc box.

Balbir Singh.
Tejun Heo <tj@kernel.org> writes:
> Hello, Michael.
>
> On Tue, Oct 11, 2016 at 10:22:13PM +1100, Michael Ellerman wrote:
>> The oops happens because we're in enqueue_task_fair() and p->se->cfs_rq is NULL.
>>
>> The cfs_rq is NULL because we did set_task_rq(p, 2048), where 2048 is NR_CPUS. That causes us to index past the end of the tg->cfs_rq array in set_task_rq() and happen to get NULL.
>>
>> We never should have done set_task_rq(p, 2048), because 2048 is >= nr_cpu_ids, which means it's not a valid CPU number, and set_task_rq() doesn't cope with that.
>
> Hmm... it doesn't reproduce it here and can't see how the commit would affect this given that it doesn't really change when the kworker kthreads are being created.

Try turning on CONFIG_DEBUG_PER_CPU_MAPS=y ?

That will warn if you're indexing off the end of a cpu mask and just
getting lucky with the result.

>> Presumably we shouldn't be ending up with tsk_cpus_allowed() being empty, but I haven't had time to track down why that's happening.
>
> Can you please add WARN_ON_ONCE(!tsk_nr_cpus_allowed(p)) to select_task_rq() and post what that says?
It says:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at ../kernel/sched/core.c:1602 try_to_wake_up+0x3f4/0x5c0
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161014-dirty #116
task: c000000ff9200000 task.stack: c000001ffc084000
NIP: c0000000000f1ba4 LR: c0000000000f180c CTR: 0000000000000000
REGS: c000001ffc0878f0 TRAP: 0700   Not tainted (4.8.0-compiler_gcc-6.2.0-next-20161014-dirty)
MSR: 9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000422  XER: 00000000
CFAR: c0000000000f18bc SOFTE: 0
GPR00: c0000000000f180c c000001ffc087b70 c000000000e83400 0000000000000000
GPR04: 0000000000000002 0000000000000000 0000000000000000 0000000000000000
GPR08: c000000000dc3400 0000000000000001 0000000000000002 0000000000000000
GPR12: 0000000000000000 c00000000fb80000 c00000000000e0c8 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000eb8960
GPR24: 0000000000000000 c000000000d8ce00 0000000000000000 0000000000000000
GPR28: c0000007f54050f4 0000000000000000 0000000000000000 c0000007f5404900
NIP [c0000000000f1ba4] try_to_wake_up+0x3f4/0x5c0
LR [c0000000000f180c] try_to_wake_up+0x5c/0x5c0
Call Trace:
[c000001ffc087b70] [c0000000000f180c] try_to_wake_up+0x5c/0x5c0 (unreliable)
[c000001ffc087bf0] [c0000000000d53e4] create_worker+0x144/0x250
[c000001ffc087c90] [c000000000cf7930] workqueue_init+0x170/0x19c
[c000001ffc087d00] [c000000000ce0e74] kernel_init_freeable+0x158/0x360
[c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160
[c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc
Instruction dump:
e8790890 4bff6ed9 2fa30000 419e00dc 60000000 4bfffe54 3d02fff4 8928d7f9
2f890000 409e0018 39200001 9928d7f9 <0fe00000> 60000000 60420000 3b5f0368
---[ end trace 0000000000000000 ]---

But I'm not sure that tells us anything new?

cheers
Tejun Heo <tj@kernel.org> writes:
> Hello, Michael.
>
> On Tue, Oct 11, 2016 at 10:22:13PM +1100, Michael Ellerman wrote:
>> The oops happens because we're in enqueue_task_fair() and p->se->cfs_rq is NULL.
>>
>> The cfs_rq is NULL because we did set_task_rq(p, 2048), where 2048 is NR_CPUS. That causes us to index past the end of the tg->cfs_rq array in set_task_rq() and happen to get NULL.
>>
>> We never should have done set_task_rq(p, 2048), because 2048 is >= nr_cpu_ids, which means it's not a valid CPU number, and set_task_rq() doesn't cope with that.
>
> Hmm... it doesn't reproduce it here and can't see how the commit would affect this given that it doesn't really change when the kworker kthreads are being created.

It changes when the pool attributes are created, which is the source of
the bug.

The original crash happens because we have a task with an empty
cpus_allowed mask. That mask originally comes from pool->attrs->cpumask.

The attrs for the pool are created early via workqueue_init_early() in
apply_wqattrs_prepare():

  start_here_common
  -> start_kernel
  -> workqueue_init_early
  -> __alloc_workqueue_key
  -> apply_workqueue_attrs
  -> apply_workqueue_attrs_locked
  -> apply_wqattrs_prepare

In there we do:

	copy_workqueue_attrs(new_attrs, attrs);
	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
	if (unlikely(cpumask_empty(new_attrs->cpumask)))
		cpumask_copy(new_attrs->cpumask, wq_unbound_cpumask);
	...
	copy_workqueue_attrs(tmp_attrs, new_attrs);
	...
	for_each_node(node) {
		if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
+			BUG_ON(cpumask_empty(tmp_attrs->cpumask));
			ctx->pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);

The bad case (where we hit the BUG_ON I added above) is where we are
creating a wq for node 1.
In wq_calc_node_cpumask() we do:

	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
	return !cpumask_equal(cpumask, attrs->cpumask);

Which with the arguments inserted is:

	cpumask_and(tmp_attrs->cpumask, new_attrs->cpumask, wq_numa_possible_cpumask[1]);
	return !cpumask_equal(tmp_attrs->cpumask, new_attrs->cpumask);

And that results in tmp_attrs->cpumask being empty, because
wq_numa_possible_cpumask[1] is an empty cpumask.

The reason wq_numa_possible_cpumask[1] is an empty mask is because in
wq_numa_init() we did:

	for_each_possible_cpu(cpu) {
		node = cpu_to_node(cpu);
		if (WARN_ON(node == NUMA_NO_NODE)) {
			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
			/* happens iff arch is bonkers, let's just proceed */
			return;
		}
		cpumask_set_cpu(cpu, tbl[node]);
	}

And cpu_to_node() returned node 0 for every CPU in the system, despite
there being multiple nodes.

That happened because we haven't yet called set_cpu_numa_node() for the
non-boot CPUs, because that happens in smp_prepare_cpus(), and
workqueue_init_early() is called much earlier than that.

This doesn't trigger on x86 because it does set_cpu_numa_node() in
setup_per_cpu_areas(), which is called prior to workqueue_init_early().

We can (should) probably do the same on powerpc, I'll look at that
tomorrow. But other arches may have a similar problem, and at the very
least we need to document that workqueue_init_early() relies on
cpu_to_node() working.

cheers
On 17/10/16 23:24, Michael Ellerman wrote:
> Tejun Heo <tj@kernel.org> writes:
>> Hmm... it doesn't reproduce it here and can't see how the commit would affect this given that it doesn't really change when the kworker kthreads are being created.
>
> It changes when the pool attributes are created, which is the source of the bug.
>
> The original crash happens because we have a task with an empty cpus_allowed mask. That mask originally comes from pool->attrs->cpumask.
>
> The attrs for the pool are created early via workqueue_init_early() in apply_wqattrs_prepare():
>
>   start_here_common
>   -> start_kernel
>   -> workqueue_init_early
>   -> __alloc_workqueue_key
>   -> apply_workqueue_attrs
>   -> apply_workqueue_attrs_locked
>   -> apply_wqattrs_prepare
>
> In there we do:
>
> 	copy_workqueue_attrs(new_attrs, attrs);
> 	cpumask_and(new_attrs->cpumask, new_attrs->cpumask, wq_unbound_cpumask);
> 	if (unlikely(cpumask_empty(new_attrs->cpumask)))
> 		cpumask_copy(new_attrs->cpumask, wq_unbound_cpumask);
> 	...
> 	copy_workqueue_attrs(tmp_attrs, new_attrs);
> 	...
> 	for_each_node(node) {
> 		if (wq_calc_node_cpumask(new_attrs, node, -1, tmp_attrs->cpumask)) {
> +			BUG_ON(cpumask_empty(tmp_attrs->cpumask));
> 			ctx->pwq_tbl[node] = alloc_unbound_pwq(wq, tmp_attrs);
>
> The bad case (where we hit the BUG_ON I added above) is where we are creating a wq for node 1.
> In wq_calc_node_cpumask() we do:
>
> 	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
> 	return !cpumask_equal(cpumask, attrs->cpumask);
>
> Which with the arguments inserted is:
>
> 	cpumask_and(tmp_attrs->cpumask, new_attrs->cpumask, wq_numa_possible_cpumask[1]);
> 	return !cpumask_equal(tmp_attrs->cpumask, new_attrs->cpumask);
>
> And that results in tmp_attrs->cpumask being empty, because wq_numa_possible_cpumask[1] is an empty cpumask.
>
> The reason wq_numa_possible_cpumask[1] is an empty mask is because in wq_numa_init() we did:
>
> 	for_each_possible_cpu(cpu) {
> 		node = cpu_to_node(cpu);
> 		if (WARN_ON(node == NUMA_NO_NODE)) {
> 			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
> 			/* happens iff arch is bonkers, let's just proceed */
> 			return;
> 		}
> 		cpumask_set_cpu(cpu, tbl[node]);
> 	}
>
> And cpu_to_node() returned node 0 for every CPU in the system, despite there being multiple nodes.
>
> That happened because we haven't yet called set_cpu_numa_node() for the non-boot CPUs, because that happens in smp_prepare_cpus(), and workqueue_init_early() is called much earlier than that.
>
> This doesn't trigger on x86 because it does set_cpu_numa_node() in setup_per_cpu_areas(), which is called prior to workqueue_init_early().
>
> We can (should) probably do the same on powerpc, I'll look at that tomorrow. But other arches may have a similar problem, and at the very least we need to document that workqueue_init_early() relies on cpu_to_node() working.

Don't we set up the cpu->node mappings in initmem_init()? Ideally we
have setup_arch -> initmem_init -> numa_setup_cpu.

Will look at it tomorrow.

Balbir Singh
Hello,

On Sat, Oct 15, 2016 at 08:48:01PM +1100, Michael Ellerman wrote:
>> Hmm... it doesn't reproduce it here and can't see how the commit would affect this given that it doesn't really change when the kworker kthreads are being created.
>
> Try turning on CONFIG_DEBUG_PER_CPU_MAPS=y ?
>
> That will warn if you're indexing off the end of a cpu mask and just getting lucky with the result.

That's not happening on x86. That could mean that powerpc is
initializing cpu_possible_mask after workqueue_init_early(). Looking
into it.

> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at ../kernel/sched/core.c:1602 try_to_wake_up+0x3f4/0x5c0
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-compiler_gcc-6.2.0-next-20161014-dirty #116
> task: c000000ff9200000 task.stack: c000001ffc084000
> NIP: c0000000000f1ba4 LR: c0000000000f180c CTR: 0000000000000000
> REGS: c000001ffc0878f0 TRAP: 0700   Not tainted (4.8.0-compiler_gcc-6.2.0-next-20161014-dirty)
> MSR: 9000000002029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000422  XER: 00000000
> CFAR: c0000000000f18bc SOFTE: 0
> GPR00: c0000000000f180c c000001ffc087b70 c000000000e83400 0000000000000000
> GPR04: 0000000000000002 0000000000000000 0000000000000000 0000000000000000
> GPR08: c000000000dc3400 0000000000000001 0000000000000002 0000000000000000
> GPR12: 0000000000000000 c00000000fb80000 c00000000000e0c8 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000000eb8960
> GPR24: 0000000000000000 c000000000d8ce00 0000000000000000 0000000000000000
> GPR28: c0000007f54050f4 0000000000000000 0000000000000000 c0000007f5404900
> NIP [c0000000000f1ba4] try_to_wake_up+0x3f4/0x5c0
> LR [c0000000000f180c] try_to_wake_up+0x5c/0x5c0
> Call Trace:
> [c000001ffc087b70] [c0000000000f180c] try_to_wake_up+0x5c/0x5c0 (unreliable)
> [c000001ffc087bf0] [c0000000000d53e4] create_worker+0x144/0x250
> [c000001ffc087c90] [c000000000cf7930] workqueue_init+0x170/0x19c
> [c000001ffc087d00] [c000000000ce0e74] kernel_init_freeable+0x158/0x360
> [c000001ffc087dc0] [c00000000000e0e4] kernel_init+0x24/0x160
> [c000001ffc087e30] [c00000000000bfa0] ret_from_kernel_thread+0x5c/0xbc
> Instruction dump:
> e8790890 4bff6ed9 2fa30000 419e00dc 60000000 4bfffe54 3d02fff4 8928d7f9
> 2f890000 409e0018 39200001 9928d7f9 <0fe00000> 60000000 60420000 3b5f0368
> ---[ end trace 0000000000000000 ]---
>
> But I'm not sure that tells us anything new?

Yeah, I should have asked to print out information about the target
task, but it looks like we have enough information now.

Thanks.
Hello, Michael.

On Mon, Oct 17, 2016 at 11:24:34PM +1100, Michael Ellerman wrote:
> The bad case (where we hit the BUG_ON I added above) is where we are creating a wq for node 1.
>
> In wq_calc_node_cpumask() we do:
>
> 	cpumask_and(cpumask, attrs->cpumask, wq_numa_possible_cpumask[node]);
> 	return !cpumask_equal(cpumask, attrs->cpumask);
>
> Which with the arguments inserted is:
>
> 	cpumask_and(tmp_attrs->cpumask, new_attrs->cpumask, wq_numa_possible_cpumask[1]);
> 	return !cpumask_equal(tmp_attrs->cpumask, new_attrs->cpumask);
>
> And that results in tmp_attrs->cpumask being empty, because wq_numa_possible_cpumask[1] is an empty cpumask.

Ah, should have read this before replying to the previous mail. So it's
the numa mask, not cpu_possible_mask.

> The reason wq_numa_possible_cpumask[1] is an empty mask is because in wq_numa_init() we did:
>
> 	for_each_possible_cpu(cpu) {
> 		node = cpu_to_node(cpu);
> 		if (WARN_ON(node == NUMA_NO_NODE)) {
> 			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
> 			/* happens iff arch is bonkers, let's just proceed */
> 			return;
> 		}
> 		cpumask_set_cpu(cpu, tbl[node]);
> 	}
>
> And cpu_to_node() returned node 0 for every CPU in the system, despite there being multiple nodes.
>
> That happened because we haven't yet called set_cpu_numa_node() for the non-boot CPUs, because that happens in smp_prepare_cpus(), and workqueue_init_early() is called much earlier than that.
>
> This doesn't trigger on x86 because it does set_cpu_numa_node() in setup_per_cpu_areas(), which is called prior to workqueue_init_early().
>
> We can (should) probably do the same on powerpc, I'll look at that tomorrow. But other arches may have a similar problem, and at the very least we need to document that workqueue_init_early() relies on cpu_to_node() working.

I should be able to move the numa part of initialization to the later
init function. Working on it.

Thanks.
Balbir Singh <bsingharora@gmail.com> writes:
> On 17/10/16 23:24, Michael Ellerman wrote:
>> That happened because we haven't yet called set_cpu_numa_node() for the non-boot CPUs, because that happens in smp_prepare_cpus(), and workqueue_init_early() is called much earlier than that.
>>
>> This doesn't trigger on x86 because it does set_cpu_numa_node() in setup_per_cpu_areas(), which is called prior to workqueue_init_early().
>>
>> We can (should) probably do the same on powerpc, I'll look at that tomorrow. But other arches may have a similar problem, and at the very least we need to document that workqueue_init_early() relies on cpu_to_node() working.
>
> Don't we set up the cpu->node mappings in initmem_init()? Ideally we have setup_arch -> initmem_init -> numa_setup_cpu.

That sets up numa_cpu_lookup_table, which is a powerpc-only data
structure. But it doesn't set up the percpu numa_node variables, used by
cpu_to_node(), because the percpu areas are not set up yet.

cheers
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 94732d1ab00a..4e79549d242f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1614,7 +1614,8 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	 * [ this allows ->select_task() to simply return task_cpu(p) and
 	 *   not worry about this generic constraint ]
 	 */
-	if (unlikely(!cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
+	if (unlikely(cpu >= nr_cpu_ids ||
+		     !cpumask_test_cpu(cpu, tsk_cpus_allowed(p)) ||
 		     !cpu_online(cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);