Patchwork Floating point in the kernel

login
register
mail settings
Submitter Sean MacLennan
Date Dec. 10, 2009, 6:13 p.m.
Message ID <20091210131311.78cab78c@lappy.seanm.ca>
Download mbox | patch
Permalink /patch/40849/
State Rejected
Headers show

Comments

Sean MacLennan - Dec. 10, 2009, 6:13 p.m.
One of our drivers has code that was originally running on a DSP. The
code makes heavy use of floating point. We have isolated all the
floating point to one kthread in the driver. Using enable_kernel_fp()
this has worked well.

But under a specific heavy RTP load, we started getting kernel panics.
To make a long story short, the scheduler disables FP when you are
context switched out. When you come back and access a FP instruction,
you trap and call load_up_fpu() and everything is fine..... unless you
are in the kernel. If you are in the kernel, like our kthread is, you
get a "kernel FP unavailable exception".

Basically we got away with it for two years because the thread is at
high priority (-20) and tries very hard to finish within 1ms. But the
RTP high load causes us to context switch out and crash. The following
patch fixes this:


So, if you are still reading this far, I am just looking for any
suggestions. Are there better ways of handling this? Have I
missed something? Anybody know why pt_regs might be NULL?

Cheers,
   Sean
Benjamin Herrenschmidt - Dec. 10, 2009, 8:19 p.m.
On Thu, 2009-12-10 at 13:13 -0500, Sean MacLennan wrote:
> One of our drivers has code that was originally running on a DSP. The
> code makes heavy use of floating point. We have isolated all the
> floating point to one kthread in the driver. Using enable_kernel_fp()
> this has worked well.
> 
> But under a specific heavy RTP load, we started getting kernel panics.
> To make a long story short, the scheduler disables FP when you are
> context switched out. When you come back and access a FP instruction,
> you trap and call load_up_fpu() and everything is fine..... unless you
> are in the kernel. If you are in the kernel, like our kthread is, you
> get a "kernel FP unavailable exception".

Right, you must not use floating point in the kernel -and- expect it to
survive schedule. You should use preempt_disable() and ensure you don't
schedule() around a block using the FP.

Note that you may also lose the FP register content if you schedule.

> Basically we got away with it for two years because the thread is at
> high priority (-20) and tries very hard to finish within 1ms. But the
> RTP high load causes us to context switch out and crash. The following
> patch fixes this:
> 
> diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
> index 50504ae..3476de9 100644
> --- a/arch/powerpc/kernel/head_booke.h
> +++ b/arch/powerpc/kernel/head_booke.h
> @@ -383,7 +383,7 @@ label:
>  #define FP_UNAVAILABLE_EXCEPTION                                             \
>         START_EXCEPTION(FloatingPointUnavailable)                             \
>         NORMAL_EXCEPTION_PROLOG;                                              \
> -       beq     1f;                                                           \
> +       /* SAM beq      1f; */                                          \
>         bl      load_up_fpu;            /* if from user, just load it up */   \
>         b       fast_exception_return;                                        \
>  1:     addi    r3,r1,STACK_FRAME_OVERHEAD;                                   \
> 
> With the patch we run fine, at the expense that we lose the ability to
> catch real FP unavailable exceptions in the kernel. It is because of
> this loss that I have not submitted this patch.

I'm not sure that will work in all cases, you are playing a bit with
fire :-) I suppose I could think it through after breakfast but my first
thought is "don't do that !". Among other things you may not have a
pt_regs to save the registers to.

> We also hit another problem under high RTP load... and this is the
> patch that fixes it:
> 
> diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> index fc8f5b1..051a02c 100644
> --- a/arch/powerpc/kernel/fpu.S
> +++ b/arch/powerpc/kernel/fpu.S
> @@ -83,6 +83,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
>         stfd    fr0,THREAD_FPSCR(r4)
>         PPC_LL  r5,PT_REGS(r4)
>         toreal(r5)
> +
> +       /* Under heavy RTP load the hsp thread can have a NULL pt_regs. */
> +       PPC_LCMPI       0,r5,0
> +       beq     1f
> +

Right and that means you just lost the content of your FP registers.

>         PPC_LL  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
>         li      r10,MSR_FP|MSR_FE0|MSR_FE1
>         andc    r4,r4,r10               /* disable FP for previous task */
> 
> So, if you are still reading this far, I am just looking for any
> suggestions. Are there better ways of handling this? Have I
> missed something? Anybody know why pt_regs might be NULL?

Just don't schedule when you enable_kernel_fp() or move your workload to
userspace :-)

Cheers,
Ben.

> Cheers,
>    Sean
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
Arnd Bergmann - Dec. 10, 2009, 8:32 p.m.
On Thursday 10 December 2009 18:13:11 Sean MacLennan wrote:
> One of our drivers has code that was originally running on a DSP. The
> code makes heavy use of floating point. We have isolated all the
> floating point to one kthread in the driver. Using enable_kernel_fp()
> this has worked well.
> 
> But under a specific heavy RTP load, we started getting kernel panics.
> To make a long story short, the scheduler disables FP when you are
> context switched out. When you come back and access a FP instruction,
> you trap and call load_up_fpu() and everything is fine..... unless you
> are in the kernel. If you are in the kernel, like our kthread is, you
> get a "kernel FP unavailable exception".

I think the rule here is that you have to disable preemption and must not
call any potentially blocking functions like kmalloc when enable_kernel_fp
is set. The kernel has good control over whether a thread get context switched
or not, so it should be able to prevent these problems.

	Arnd <><
Sean MacLennan - Dec. 10, 2009, 8:33 p.m.
On Fri, 11 Dec 2009 07:19:39 +1100
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:

> I'm not sure that will work in all cases, you are playing a bit with
> fire :-) I suppose I could think it through after breakfast but my
> first thought is "don't do that !". Among other things you may not
> have a pt_regs to save the registers to.

Actually, we usually do have pt_regs, or we are stealing some ;)

> > We also hit another problem under high RTP load... and this is the
> > patch that fixes it:
> > 
> > diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> > index fc8f5b1..051a02c 100644
> > --- a/arch/powerpc/kernel/fpu.S
> > +++ b/arch/powerpc/kernel/fpu.S
> > @@ -83,6 +83,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
> >         stfd    fr0,THREAD_FPSCR(r4)
> >         PPC_LL  r5,PT_REGS(r4)
> >         toreal(r5)
> > +
> > +       /* Under heavy RTP load the hsp thread can have a NULL
> > pt_regs. */
> > +       PPC_LCMPI       0,r5,0
> > +       beq     1f
> > +  
> 
> Right and that means you just lost the content of your FP registers.

This only happens once in a blue moon, even under heavy RTP load. But I
agree, this could be a real problem.

> >         PPC_LL  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
> >         li      r10,MSR_FP|MSR_FE0|MSR_FE1
> >         andc    r4,r4,r10               /* disable FP for previous
> > task */
> > 
> > So, if you are still reading this far, I am just looking for any
> > suggestions. Are there better ways of handling this? Have I
> > missed something? Anybody know why pt_regs might be NULL?  
> 
> Just don't schedule when you enable_kernel_fp() or move your workload
> to userspace :-)

To be honest, I can't find *why* we are scheduling :( They only way we
give up the CPU is with locking... and none of the locks where hit
during the problem. We also never get near our timeslice... the longest
run I saw when the problem happened was 670us. 

Is there a way to disable scheduling? We currently do not have preempt
enabled... but may in the future.

Cheers,
   Sean
Benjamin Herrenschmidt - Dec. 10, 2009, 8:56 p.m.
On Thu, 2009-12-10 at 15:33 -0500, Sean MacLennan wrote:
> To be honest, I can't find *why* we are scheduling :( They only way we
> give up the CPU is with locking... and none of the locks where hit
> during the problem. We also never get near our timeslice... the
> longest
> run I saw when the problem happened was 670us. 
> 
> Is there a way to disable scheduling? We currently do not have preempt
> enabled... but may in the future.

Well, that would be interesting to see where you schedule indeed.

Outside of preempt and an explicit lock or memory allocation I don't
see ... Or get/put_user ?

Cheers,
Ben.
Arnd Bergmann - Dec. 10, 2009, 9:35 p.m.
On Thursday 10 December 2009, Sean MacLennan wrote:
> To be honest, I can't find why we are scheduling :( They only way we
> give up the CPU is with locking... and none of the locks where hit
> during the problem. We also never get near our timeslice... the longest
> run I saw when the problem happened was 670us. 
> 
> Is there a way to disable scheduling? We currently do not have preempt
> enabled... but may in the future.

If you do preempt_disable(), it should give you a nice oops with a backtrace
at the point where an actual schedule happens.

The sequence should be something like

preempt_disable();
enable_kernel_fp();
/* use FP here */
preempt_enable();
/* may schedule again */

	Arnd <><
Sean MacLennan - Dec. 11, 2009, 12:17 a.m.
Found it. We are calling sock_sendmsg, which is definitely a call that
can block! The receive side is done in a thread (which does no floating
point ;), but the send was called directly from the "evil FP thread".

It looks like under light load, you tend to get away with it, so our
trivial testing did not catch it. And most of our warp users do RTP via
asterisk, so this RTP path was not really tested.

I really appreciate the input, the comments convinced me I was going
in the wrong direction and forced me to look harder. I am going to back
out the two patches I sent and fix this properly instead.

Cheers,
   Sean
Arnd Bergmann - Dec. 11, 2009, 11:28 a.m.
On Friday 11 December 2009, Sean MacLennan wrote:
> Found it. We are calling sock_sendmsg, which is definitely a call that
> can block! The receive side is done in a thread (which does no floating
> point ;), but the send was called directly from the "evil FP thread".
> 
> It looks like under light load, you tend to get away with it, so our
> trivial testing did not catch it. And most of our warp users do RTP via
> asterisk, so this RTP path was not really tested.
> 
> I really appreciate the input, the comments convinced me I was going
> in the wrong direction and forced me to look harder. I am going to back
> out the two patches I sent and fix this properly instead.

Glad to hear you found it.

As a general remark though, both the usage of sockets and of floating
point are a really strong hint that the stuff you are doing should be
in user space instead, which would have saved you a lot of time in
debugging. If you do something like this again, make sure to get the
partitioning of the code between code in kernel and user space right.
If in doubt, just ask on the mailing list.

	Arnd <><

Patch

diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 50504ae..3476de9 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -383,7 +383,7 @@  label:
 #define FP_UNAVAILABLE_EXCEPTION                                             \
        START_EXCEPTION(FloatingPointUnavailable)                             \
        NORMAL_EXCEPTION_PROLOG;                                              \
-       beq     1f;                                                           \
+       /* SAM beq      1f; */                                          \
        bl      load_up_fpu;            /* if from user, just load it up */   \
        b       fast_exception_return;                                        \
 1:     addi    r3,r1,STACK_FRAME_OVERHEAD;                                   \

With the patch we run fine, at the expense that we lose the ability to
catch real FP unavailable exceptions in the kernel. It is because of
this loss that I have not submitted this patch.

We also hit another problem under high RTP load... and this is the
patch that fixes it:

diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
index fc8f5b1..051a02c 100644
--- a/arch/powerpc/kernel/fpu.S
+++ b/arch/powerpc/kernel/fpu.S
@@ -83,6 +83,11 @@  END_FTR_SECTION_IFSET(CPU_FTR_VSX)
        stfd    fr0,THREAD_FPSCR(r4)
        PPC_LL  r5,PT_REGS(r4)
        toreal(r5)
+
+       /* Under heavy RTP load the hsp thread can have a NULL pt_regs. */
+       PPC_LCMPI       0,r5,0
+       beq     1f
+
        PPC_LL  r4,_MSR-STACK_FRAME_OVERHEAD(r5)
        li      r10,MSR_FP|MSR_FE0|MSR_FE1
        andc    r4,r4,r10               /* disable FP for previous task */