Message ID | 4C7E9FC1.60004@us.ibm.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Benjamin Herrenschmidt |
Headers | show |
On Wed, 2010-09-01 at 11:47 -0700, Darren Hart wrote: > from tip/rt/2.6.33 causes the preempt_count() to change across the cede > call. This patch appears to prevents the proxy preempt_count assignment > from happening. This non-local-cpu assignment to 0 would cause an > underrun of preempt_count() if the local-cpu had disabled preemption > prior to the assignment and then later tried to enable it. This appears > to be the case with the stack of __trace_hcall* calls preceeding the > return from extended_cede_processor() in the latency format trace-cmd > report: > > <idle>-0 1d.... 201.252737: function: .cpu_die Note, the above 1d.... is a series of values. The first being the CPU, the next if interrupts are disabled, the next if the NEED_RESCHED flag is set, the next is softirqs enabled or disabled, next the preempt_count, and finally the lockdepth count. Here we only care about the preempt_count, which is zero when '.' and a number if it is something else. It is the second to last field in that list. > <idle>-0 1d.... 201.252738: function: .pseries_mach_cpu_die > <idle>-0 1d.... 201.252740: function: .idle_task_exit > <idle>-0 1d.... 201.252741: function: .switch_slb > <idle>-0 1d.... 201.252742: function: .xics_teardown_cpu > <idle>-0 1d.... 201.252743: function: .xics_set_cpu_priority > <idle>-0 1d.... 201.252744: function: .__trace_hcall_entry > <idle>-0 1d..1. 201.252745: function: .probe_hcall_entry ^ preempt_count set to 1 > <idle>-0 1d..1. 201.252746: function: .__trace_hcall_exit > <idle>-0 1d..2. 201.252747: function: .probe_hcall_exit > <idle>-0 1d.... 201.252748: function: .__trace_hcall_entry > <idle>-0 1d..1. 201.252748: function: .probe_hcall_entry > <idle>-0 1d..1. 201.252750: function: .__trace_hcall_exit > <idle>-0 1d..2. 201.252751: function: .probe_hcall_exit > <idle>-0 1d.... 201.252752: function: .__trace_hcall_entry > <idle>-0 1d..1. 201.252753: function: .probe_hcall_entry ^ ^ CPU preempt_count Entering the function probe_hcall_entry() the preempt_count is 1 (see below). But probe_hcall_entry does: h = &get_cpu_var(hcall_stats)[opcode / 4]; Without doing the put (which it does in probe_hcall_exit()) So exiting the probe_hcall_entry() the prempt_count is 2. The trace_hcall_entry() will do a preempt_enable() making it leave as 1. > offon.sh-3684 6..... 201.466488: bprint: .smp_pSeries_kick_cpu : resetting pcnt to 0 for cpu 1 This is CPU 6, changing the preempt count from 1 to 0. > > preempt_count() is reset from 1 to 0 by smp_startup_cpu() without the > QCSS_NOT_STOPPED check from the patch above. > > <idle>-0 1d.... 201.466503: function: .__trace_hcall_exit Note: __trace_hcall_exit() and __trace_hcall_entry() basically do: preempt_disable(); call probe(); preempt_enable(); > <idle>-0 1d..1. 201.466505: function: .probe_hcall_exit The preempt_count of 1 entering the probe_hcall_exit() is because of the preempt_disable() shown above. It should have been entered as a 2. But then it does: put_cpu_var(hcall_stats); making preempt_count 0. But the preempt_enable() in the trace_hcall_exit() causes this to be -1. > <idle>-0 1d.Hff. 201.466507: bprint: .pseries_mach_cpu_die : after cede: ffffffff > > With the preempt_count() being one less than it should be, the final > preempt_enable() in the trace_hcall path drops preempt_count to > 0xffffffff, which of course is an illegal value and leads to a crash. I'm confused to how this works in mainline? -- Steve > > I don't know if the patch above is the "right fix" for this or not as > don't yet understand why this occurs only with PREEMPT_RT and not in > mainline. Once we sort that out, we may need a more specific fix.
> + /* Check to see if the CPU out of FW already for kexec */ Wow, that comment is shit. The checkin comment in aef40e87d866355ffd279ab21021de733242d0d5 is much better. > This comment is really confusing to me. I _think_ it is saying that this test > determines if the CPU is done executing firmware code and has begun executing > OS code.... Is that right? Yeah. It means for a normal boot, the CPU will not have started yet (still in firmware (FW)) so we have to call FW to bring it out. In the kexec case though, the CPU will have started already (it's spinning in the kernel) so we don't have to call FW to bring it back out again. To distinguish between these two cases, we ask FW if the CPU has started or not (via smp_query_cpu_stopped()) and if it's already start, don't restart it. Originally, we could call FW to start a cpu that was already started, but FW changed recently and stopped allowing us to do this. Hence this patch. Mikey
--- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -82,6 +82,12 @@ static inline int __devinit smp_startup_cpu(unsigned int lcpu) pcpu = get_hard_smp_processor_id(lcpu); + /* Check to see if the CPU out of FW already for kexec */ This comment is really confusing to me. I _think_ it is saying that this test determines if the CPU is done executing firmware code and has begun executing OS code.... Is that right? + if (smp_query_cpu_stopped(pcpu) == QCSS_NOT_STOPPED){ + cpu_set(lcpu, of_spin_map); + return 1;