diff mbox series

[2/2] ARC: show_regs: fix lockdep splat for good

Message ID 1545159239-30628-3-git-send-email-vgupta@synopsys.com
State New
Headers show
Series ARC show_regs fixes | expand

Commit Message

Vineet Gupta Dec. 18, 2018, 6:53 p.m. UTC
signal handling core calls ARCH show_regs() with preemption disabled
which causes __might_sleep functions such as mmput leading to lockdep
splat.  Workaround by re-enabling preemption temporarily.

This may not be as bad as it sounds since the preemption disabling
itself was introduced for a supressing smp_processor_id() warning in x86
code by commit 3a9f84d354ce ("signals, debug: fix BUG: using
smp_processor_id() in preemptible code in print_fatal_signal()")

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 arch/arc/kernel/troubleshoot.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Michal Hocko Dec. 20, 2018, 1:04 p.m. UTC | #1
On Tue 18-12-18 10:53:59, Vineet Gupta wrote:
> signal handling core calls ARCH show_regs() with preemption disabled
> which causes __might_sleep functions such as mmput leading to lockdep
> splat.  Workaround by re-enabling preemption temporarily.
> 
> This may not be as bad as it sounds since the preemption disabling
> itself was introduced for a supressing smp_processor_id() warning in x86
> code by commit 3a9f84d354ce ("signals, debug: fix BUG: using
> smp_processor_id() in preemptible code in print_fatal_signal()")

The commit you are referring to here sounds dubious in itself. We do not
want to stick a preempt_disable just to silence a warning. show_regs is
called from preemptible context at several places (e.g. __warn). Maybe
this was not the case in 2009 when the change was introduced but this
seems like a relict from the past. So can we fix the actual problem
rather than build on top of it instead?

Or maybe I am just missing something here.
Vineet Gupta Dec. 20, 2018, 6:45 p.m. UTC | #2
On 12/20/18 5:04 AM, Michal Hocko wrote:
> On Tue 18-12-18 10:53:59, Vineet Gupta wrote:
>> signal handling core calls ARCH show_regs() with preemption disabled
>> which causes __might_sleep functions such as mmput leading to lockdep
>> splat.  Workaround by re-enabling preemption temporarily.
>>
>> This may not be as bad as it sounds since the preemption disabling
>> itself was introduced for a supressing smp_processor_id() warning in x86
>> code by commit 3a9f84d354ce ("signals, debug: fix BUG: using
>> smp_processor_id() in preemptible code in print_fatal_signal()")
> The commit you are referring to here sounds dubious in itself.

Indeed that was my thought as well, but it did introduce the preemption disabling
logic aroung core calling show_regs() !

> We do not
> want to stick a preempt_disable just to silence a warning.

I presume you are referring to original commit, not my anti-change in ARC code,
which is actually re-enabling it.

> show_regs is
> called from preemptible context at several places (e.g. __warn).

Right, but do we have other reports which show this, perhaps not too many distros
have CONFIG__PREEMPT enabled ?

> Maybe
> this was not the case in 2009 when the change was introduced but this
> seems like a relict from the past. So can we fix the actual problem
> rather than build on top of it instead?

The best/correct fix is to remove the preempt diabling in core code, but that
affects every arch out there and will likely trip dormant land mines, needed
localized fixes like I'm dealing with now.

-Vineet
Michal Hocko Dec. 21, 2018, 1:04 p.m. UTC | #3
On Thu 20-12-18 18:45:48, Vineet Gupta wrote:
> On 12/20/18 5:04 AM, Michal Hocko wrote:
> > On Tue 18-12-18 10:53:59, Vineet Gupta wrote:
> >> signal handling core calls ARCH show_regs() with preemption disabled
> >> which causes __might_sleep functions such as mmput leading to lockdep
> >> splat.  Workaround by re-enabling preemption temporarily.
> >>
> >> This may not be as bad as it sounds since the preemption disabling
> >> itself was introduced for a supressing smp_processor_id() warning in x86
> >> code by commit 3a9f84d354ce ("signals, debug: fix BUG: using
> >> smp_processor_id() in preemptible code in print_fatal_signal()")
> > The commit you are referring to here sounds dubious in itself.
> 
> Indeed that was my thought as well, but it did introduce the preemption disabling
> logic aroung core calling show_regs() !
> 
> > We do not
> > want to stick a preempt_disable just to silence a warning.
> 
> I presume you are referring to original commit, not my anti-change in ARC code,
> which is actually re-enabling it.

Yes, but you are building on a broken concept I believe. What
implications does re-enabling really have? Now you could reschedule and
you can move to another CPU. Is this really safe? I believe that yes
because the preemption disabling is simply bogus. Which doesn't sound
like a proper justification, does it?
 
> > show_regs is
> > called from preemptible context at several places (e.g. __warn).
> 
> Right, but do we have other reports which show this, perhaps not too many distros
> have CONFIG__PREEMPT enabled ?

I do not follow. If there is some path to require show_regs to run with
preemption disabled while others don't then something is clearly wrong.

> > Maybe
> > this was not the case in 2009 when the change was introduced but this
> > seems like a relict from the past. So can we fix the actual problem
> > rather than build on top of it instead?
> 
> The best/correct fix is to remove the preempt diabling in core code, but that
> affects every arch out there and will likely trip dormant land mines, needed
> localized fixes like I'm dealing with now.

Yes, the fix might be more involved but I would much rather prefer a
correct code which builds on solid assumptions.
Michal Hocko Dec. 21, 2018, 1:27 p.m. UTC | #4
On Fri 21-12-18 14:04:04, Michal Hocko wrote:
[...]
> Yes, but you are building on a broken concept I believe. What
> implications does re-enabling really have? Now you could reschedule and
> you can move to another CPU. Is this really safe? I believe that yes
> because the preemption disabling is simply bogus. Which doesn't sound
> like a proper justification, does it?

Well, thinking about it a bit more. What is the result of calling
preempt_enable outside of preempt_disabled section? E.g. __warn which
doesn't disable preemption AFAICS.
Vineet Gupta Dec. 21, 2018, 5:55 p.m. UTC | #5
On 12/21/18 5:04 AM, Michal Hocko wrote:
>> I presume you are referring to original commit, not my anti-change in ARC code,
>> which is actually re-enabling it.
> 
> Yes, but you are building on a broken concept I believe.

Not sure where this is heading. Broken concept was introduced by disabling
preemption around show_regs() to silence x86 smp_processor_id() splat in 2009.

> What
> implications does re-enabling really have ? Now you could reschedule and> you can move to another CPU. Is this really safe?

From initial testing, none so far. show_regs() is simply pretty printing the
passed pt_regs and decoding the current task, which agreed could move to a
different CPU (likely will due to console/printk calls), but I don't see how that
could mess up its mm or othe rinternal plumbing which it prints.


> I believe that yes
> because the preemption disabling is simply bogus. Which doesn't sound
> like a proper justification, does it?

[snip]

> I do not follow. If there is some path to require show_regs to run with
> preemption disabled while others don't then something is clearly wrong.

[snip]

> Yes, the fix might be more involved but I would much rather prefer a
> correct code which builds on solid assumptions.

Right so the first step is reverting the disabled semantics for ARC and do some
heavy testing to make sure any fallouts are addressed etc. And if that works, then
propagate this change to core itself. Low risk strategy IMO - agree ?

Thx,
-Vineet
Michal Hocko Dec. 24, 2018, 7:10 p.m. UTC | #6
On Fri 21-12-18 09:55:34, Vineet Gupta wrote:
> On 12/21/18 5:04 AM, Michal Hocko wrote:
[...]
> > Yes, the fix might be more involved but I would much rather prefer a
> > correct code which builds on solid assumptions.
> 
> Right so the first step is reverting the disabled semantics for ARC and do some
> heavy testing to make sure any fallouts are addressed etc. And if that works, then
> propagate this change to core itself. Low risk strategy IMO - agree ?

Yeah, I would simply remove the preempt_disable and see what falls out.
smp_processor_id could be converted to the raw version etc...
diff mbox series

Patch

diff --git a/arch/arc/kernel/troubleshoot.c b/arch/arc/kernel/troubleshoot.c
index 2885bec71fb8..c650d3de13e1 100644
--- a/arch/arc/kernel/troubleshoot.c
+++ b/arch/arc/kernel/troubleshoot.c
@@ -177,6 +177,12 @@  void show_regs(struct pt_regs *regs)
 	struct task_struct *tsk = current;
 	struct callee_regs *cregs;
 
+	/*
+	 * generic code calls us with preemption disabled, but some calls
+	 * here could sleep, so re-enable to avoid lockdep splat
+	 */
+	preempt_enable();
+
 	print_task_path_n_nm(tsk);
 	show_regs_print_info(KERN_INFO);
 
@@ -219,6 +225,8 @@  void show_regs(struct pt_regs *regs)
 	cregs = (struct callee_regs *)current->thread.callee_reg;
 	if (cregs)
 		show_callee_regs(cregs);
+
+	preempt_disable();
 }
 
 void show_kernel_fault_diag(const char *str, struct pt_regs *regs,