Message ID | 20091210131311.78cab78c@lappy.seanm.ca (mailing list archive) |
---|---|
State | Rejected |
Headers | show |
On Thu, 2009-12-10 at 13:13 -0500, Sean MacLennan wrote: > One of our drivers has code that was originally running on a DSP. The > code makes heavy use of floating point. We have isolated all the > floating point to one kthread in the driver. Using enable_kernel_fp() > this has worked well. > > But under a specific heavy RTP load, we started getting kernel panics. > To make a long story short, the scheduler disables FP when you are > context switched out. When you come back and access a FP instruction, > you trap and call load_up_fpu() and everything is fine..... unless you > are in the kernel. If you are in the kernel, like our kthread is, you > get a "kernel FP unavailable exception". Right, you must not use floating point in the kernel -and- expect it to survive schedule. You should use preempt_disable() and ensure you don't schedule() around a block using the FP. Note that you may also lose the FP register content if you schedule. > Basically we got away with it for two years because the thread is at > high priority (-20) and tries very hard to finish within 1ms. But the > RTP high load causes us to context switch out and crash. The following > patch fixes this: > > diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h > index 50504ae..3476de9 100644 > --- a/arch/powerpc/kernel/head_booke.h > +++ b/arch/powerpc/kernel/head_booke.h > @@ -383,7 +383,7 @@ label: > #define FP_UNAVAILABLE_EXCEPTION \ > START_EXCEPTION(FloatingPointUnavailable) \ > NORMAL_EXCEPTION_PROLOG; \ > - beq 1f; \ > + /* SAM beq 1f; */ \ > bl load_up_fpu; /* if from user, just load it up */ \ > b fast_exception_return; \ > 1: addi r3,r1,STACK_FRAME_OVERHEAD; \ > > With the patch we run fine, at the expense that we lose the ability to > catch real FP unavailable exceptions in the kernel. It is because of > this loss that I have not submitted this patch. I'm not sure that will work in all cases, you are playing a bit with fire :-) I suppose I could think it through after breakfast but my first thought is "don't do that !". Among other things you may not have a pt_regs to save the registers to. > We also hit another problem under high RTP load... and this is the > patch that fixes it: > > diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S > index fc8f5b1..051a02c 100644 > --- a/arch/powerpc/kernel/fpu.S > +++ b/arch/powerpc/kernel/fpu.S > @@ -83,6 +83,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) > stfd fr0,THREAD_FPSCR(r4) > PPC_LL r5,PT_REGS(r4) > toreal(r5) > + > + /* Under heavy RTP load the hsp thread can have a NULL pt_regs. */ > + PPC_LCMPI 0,r5,0 > + beq 1f > + Right and that means you just lost the content of your FP registers. > PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5) > li r10,MSR_FP|MSR_FE0|MSR_FE1 > andc r4,r4,r10 /* disable FP for previous task */ > > So, if you are still reading this far, I am just looking for any > suggestions. Are there better ways of handling this? Have I > missed something? Anybody know why pt_regs might be NULL? Just don't schedule when you enable_kernel_fp() or move your workload to userspace :-) Cheers, Ben. > Cheers, > Sean > _______________________________________________ > Linuxppc-dev mailing list > Linuxppc-dev@lists.ozlabs.org > https://lists.ozlabs.org/listinfo/linuxppc-dev
On Thursday 10 December 2009 18:13:11 Sean MacLennan wrote: > One of our drivers has code that was originally running on a DSP. The > code makes heavy use of floating point. We have isolated all the > floating point to one kthread in the driver. Using enable_kernel_fp() > this has worked well. > > But under a specific heavy RTP load, we started getting kernel panics. > To make a long story short, the scheduler disables FP when you are > context switched out. When you come back and access a FP instruction, > you trap and call load_up_fpu() and everything is fine..... unless you > are in the kernel. If you are in the kernel, like our kthread is, you > get a "kernel FP unavailable exception". I think the rule here is that you have to disable preemption and must not call any potentially blocking functions like kmalloc when enable_kernel_fp is set. The kernel has good control over whether a thread get context switched or not, so it should be able to prevent these problems. Arnd <><
On Fri, 11 Dec 2009 07:19:39 +1100 Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote: > I'm not sure that will work in all cases, you are playing a bit with > fire :-) I suppose I could think it through after breakfast but my > first thought is "don't do that !". Among other things you may not > have a pt_regs to save the registers to. Actually, we usually do have pt_regs, or we are stealing some ;) > > We also hit another problem under high RTP load... and this is the > > patch that fixes it: > > > > diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S > > index fc8f5b1..051a02c 100644 > > --- a/arch/powerpc/kernel/fpu.S > > +++ b/arch/powerpc/kernel/fpu.S > > @@ -83,6 +83,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) > > stfd fr0,THREAD_FPSCR(r4) > > PPC_LL r5,PT_REGS(r4) > > toreal(r5) > > + > > + /* Under heavy RTP load the hsp thread can have a NULL > > pt_regs. */ > > + PPC_LCMPI 0,r5,0 > > + beq 1f > > + > > Right and that means you just lost the content of your FP registers. This only happens once in a blue moon, even under heavy RTP load. But I agree, this could be a real problem. > > PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5) > > li r10,MSR_FP|MSR_FE0|MSR_FE1 > > andc r4,r4,r10 /* disable FP for previous > > task */ > > > > So, if you are still reading this far, I am just looking for any > > suggestions. Are there better ways of handling this? Have I > > missed something? Anybody know why pt_regs might be NULL? > > Just don't schedule when you enable_kernel_fp() or move your workload > to userspace :-) To be honest, I can't find *why* we are scheduling :( They only way we give up the CPU is with locking... and none of the locks where hit during the problem. We also never get near our timeslice... the longest run I saw when the problem happened was 670us. Is there a way to disable scheduling? We currently do not have preempt enabled... but may in the future. Cheers, Sean
On Thu, 2009-12-10 at 15:33 -0500, Sean MacLennan wrote: > To be honest, I can't find *why* we are scheduling :( They only way we > give up the CPU is with locking... and none of the locks where hit > during the problem. We also never get near our timeslice... the > longest > run I saw when the problem happened was 670us. > > Is there a way to disable scheduling? We currently do not have preempt > enabled... but may in the future. Well, that would be interesting to see where you schedule indeed. Outside of preempt and an explicit lock or memory allocation I don't see ... Or get/put_user ? Cheers, Ben.
On Thursday 10 December 2009, Sean MacLennan wrote: > To be honest, I can't find why we are scheduling :( They only way we > give up the CPU is with locking... and none of the locks where hit > during the problem. We also never get near our timeslice... the longest > run I saw when the problem happened was 670us. > > Is there a way to disable scheduling? We currently do not have preempt > enabled... but may in the future. If you do preempt_disable(), it should give you a nice oops with a backtrace at the point where an actual schedule happens. The sequence should be something like preempt_disable(); enable_kernel_fp(); /* use FP here */ preempt_enable(); /* may schedule again */ Arnd <><
Found it. We are calling sock_sendmsg, which is definitely a call that can block! The receive side is done in a thread (which does no floating point ;), but the send was called directly from the "evil FP thread". It looks like under light load, you tend to get away with it, so our trivial testing did not catch it. And most of our warp users do RTP via asterisk, so this RTP path was not really tested. I really appreciate the input, the comments convinced me I was going in the wrong direction and forced me to look harder. I am going to back out the two patches I sent and fix this properly instead. Cheers, Sean
On Friday 11 December 2009, Sean MacLennan wrote: > Found it. We are calling sock_sendmsg, which is definitely a call that > can block! The receive side is done in a thread (which does no floating > point ;), but the send was called directly from the "evil FP thread". > > It looks like under light load, you tend to get away with it, so our > trivial testing did not catch it. And most of our warp users do RTP via > asterisk, so this RTP path was not really tested. > > I really appreciate the input, the comments convinced me I was going > in the wrong direction and forced me to look harder. I am going to back > out the two patches I sent and fix this properly instead. Glad to hear you found it. As a general remark though, both the usage of sockets and of floating point are a really strong hint that the stuff you are doing should be in user space instead, which would have saved you a lot of time in debugging. If you do something like this again, make sure to get the partitioning of the code between code in kernel and user space right. If in doubt, just ask on the mailing list. Arnd <><
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h index 50504ae..3476de9 100644 --- a/arch/powerpc/kernel/head_booke.h +++ b/arch/powerpc/kernel/head_booke.h @@ -383,7 +383,7 @@ label: #define FP_UNAVAILABLE_EXCEPTION \ START_EXCEPTION(FloatingPointUnavailable) \ NORMAL_EXCEPTION_PROLOG; \ - beq 1f; \ + /* SAM beq 1f; */ \ bl load_up_fpu; /* if from user, just load it up */ \ b fast_exception_return; \ 1: addi r3,r1,STACK_FRAME_OVERHEAD; \ With the patch we run fine, at the expense that we lose the ability to catch real FP unavailable exceptions in the kernel. It is because of this loss that I have not submitted this patch. We also hit another problem under high RTP load... and this is the patch that fixes it: diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S index fc8f5b1..051a02c 100644 --- a/arch/powerpc/kernel/fpu.S +++ b/arch/powerpc/kernel/fpu.S @@ -83,6 +83,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX) stfd fr0,THREAD_FPSCR(r4) PPC_LL r5,PT_REGS(r4) toreal(r5) + + /* Under heavy RTP load the hsp thread can have a NULL pt_regs. */ + PPC_LCMPI 0,r5,0 + beq 1f + PPC_LL r4,_MSR-STACK_FRAME_OVERHEAD(r5) li r10,MSR_FP|MSR_FE0|MSR_FE1 andc r4,r4,r10 /* disable FP for previous task */