G5 Xserve rackmeter broken?

Message ID 20150514100656.GC609@fuloong-minipc.musicnaut.iki.fi (mailing list archive)
State Rejected

Commit Message

Aaro Koskinen May 14, 2015, 10:06 a.m. UTC
Hi,

On Wed, May 13, 2015 at 06:39:40AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2015-05-12 at 20:55 +0300, Aaro Koskinen wrote:
> > I'm running with HZ=100 so the values are still probably within
> > jiffy resolution, so perhaps the calculation should first do
> > idle = min(idle, total)?
> 
> Does it give you reasonable output if you do that?

The below change fixes the idle system blinking behaviour.

I'm also able to reproduce the LEDs going off in the full CPU load case.
It seems there is some DMA error. Normally, reading rm->dma_regs->status
in the IRQ handler gives 0x8400. In the failure cases I've seen the values
0x8880 and 0x8980. The IRQ stops after this, and it needs a
paused -> started cycle before it gets going again (but it sometimes fails
again soon after).

A.
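
For reference, the status values above can be decoded bit by bit. The sketch below assumes the usual DBDMA channel status layout (RUN 0x8000, PAUSE 0x4000, FLUSH 0x2000, WAKE 0x1000, DEAD 0x0800, ACTIVE 0x0400, BT 0x0100, device status in the low byte). The DBDMA_* names and both helper functions are hypothetical and are not part of rack-meter.c.

```c
#include <stdio.h>

/* Assumed DBDMA channel status bits; names are illustrative only. */
#define DBDMA_RUN     0x8000u
#define DBDMA_PAUSE   0x4000u
#define DBDMA_FLUSH   0x2000u
#define DBDMA_WAKE    0x1000u
#define DBDMA_DEAD    0x0800u
#define DBDMA_ACTIVE  0x0400u
#define DBDMA_BT      0x0100u
#define DBDMA_DEVSTAT 0x00ffu

/* Returns nonzero if the (assumed) DEAD bit is set in the status word. */
static int dbdma_channel_dead(unsigned int status)
{
	return (status & DBDMA_DEAD) != 0;
}

/* Writes a human-readable list of the set flags into buf. */
static void dbdma_describe(unsigned int status, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s%s%s%s%s%sdevstat=0x%02x",
		 (status & DBDMA_RUN)    ? "RUN "    : "",
		 (status & DBDMA_PAUSE)  ? "PAUSE "  : "",
		 (status & DBDMA_FLUSH)  ? "FLUSH "  : "",
		 (status & DBDMA_WAKE)   ? "WAKE "   : "",
		 (status & DBDMA_DEAD)   ? "DEAD "   : "",
		 (status & DBDMA_ACTIVE) ? "ACTIVE " : "",
		 (status & DBDMA_BT)     ? "BT "     : "",
		 status & DBDMA_DEVSTAT);
}
```

Under that assumed layout, 0x8400 reads as RUN plus ACTIVE, while 0x8880 and 0x8980 both have DEAD set, which would be consistent with the IRQ stopping until the channel is restarted.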

Comments

Benjamin Herrenschmidt May 14, 2015, 10:14 a.m. UTC | #1
On Thu, 2015-05-14 at 13:06 +0300, Aaro Koskinen wrote:
> Hi,
> 
> On Wed, May 13, 2015 at 06:39:40AM +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2015-05-12 at 20:55 +0300, Aaro Koskinen wrote:
> > > I'm running with HZ=100 so the values are still probably within
> > > jiffy resolution, so perhaps the calculation should first do
> > > idle = min(idle, total)?
> > 
> > Does it give you reasonable output if you do that?
> 
> The below change fixes the idle system blinking behaviour.
> 
> I'm also able to reproduce the LEDs going off in the full CPU load case.
> It seems there is some DMA error. Normally, reading rm->dma_regs->status
> in the IRQ handler gives 0x8400. In the failure cases I've seen the values
> 0x8880 and 0x8980. The IRQ stops after this, and it needs a
> paused -> started cycle before it gets going again (but it sometimes fails
> again soon after).

That's a bit worrisome. Is that new? Smells like faulty HW...

Ben.

> A.
> 
> diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
> index 048901a..3381fa59 100644
> --- a/drivers/macintosh/rack-meter.c
> +++ b/drivers/macintosh/rack-meter.c
> @@ -227,6 +227,7 @@ static void rackmeter_do_timer(struct work_struct *work)
>  
>  	total_idle_ticks = get_cpu_idle_time(cpu);
>  	idle_ticks = (unsigned int) (total_idle_ticks - rcpu->prev_idle);
> +	idle_ticks = min(idle_ticks, total_ticks);
>  	rcpu->prev_idle = total_idle_ticks;
>  
>  	/* We do a very dumb calculation to update the LEDs for now,

Aaro Koskinen May 14, 2015, 11:48 a.m. UTC | #2
Hi,

On Thu, May 14, 2015 at 08:14:57PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-05-14 at 13:06 +0300, Aaro Koskinen wrote:
> > On Wed, May 13, 2015 at 06:39:40AM +1000, Benjamin Herrenschmidt wrote:
> > > On Tue, 2015-05-12 at 20:55 +0300, Aaro Koskinen wrote:
> > > > I'm running with HZ=100 so the values are still probably within
> > > > jiffy resolution, so perhaps the calculation should first do
> > > > idle = min(idle, total)?
> > > 
> > > Does it give you reasonable output if you do that?
> > 
> > The below change fixes the idle system blinking behaviour.
> > 
> > I'm also able to reproduce the LEDs going off in the full CPU load case.
> > It seems there is some DMA error. Normally, reading rm->dma_regs->status
> > in the IRQ handler gives 0x8400. In the failure cases I've seen the values
> > 0x8880 and 0x8980. The IRQ stops after this, and it needs a
> > paused -> started cycle before it gets going again (but it sometimes fails
> > again soon after).
> 
> That's a bit worrisome. Is that new? Smells like faulty HW...

Ok, right... I swapped the PSU and HD into a different box, and now it
seems to work as expected! (At least the first hour into GCC bootstrap
is still going fine...)

A.

Patch

diff --git a/drivers/macintosh/rack-meter.c b/drivers/macintosh/rack-meter.c
index 048901a..3381fa59 100644
--- a/drivers/macintosh/rack-meter.c
+++ b/drivers/macintosh/rack-meter.c
@@ -227,6 +227,7 @@ static void rackmeter_do_timer(struct work_struct *work)
 
 	total_idle_ticks = get_cpu_idle_time(cpu);
 	idle_ticks = (unsigned int) (total_idle_ticks - rcpu->prev_idle);
+	idle_ticks = min(idle_ticks, total_ticks);
 	rcpu->prev_idle = total_idle_ticks;
 
 	/* We do a very dumb calculation to update the LEDs for now,
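
To illustrate why the clamp matters: if the load is derived from total_ticks - idle_ticks in unsigned arithmetic (an assumption here, since the load formula lies outside the hunk shown), then an idle delta that exceeds the wall-clock delta by even one tick wraps around to a huge value. The load_level() helper below is a hypothetical stand-alone sketch, not the driver's code, and the 0..9 scale and the clamp flag are assumptions made for the demonstration.

```c
#include <stdio.h>

/*
 * Hypothetical helper: scale busy time over one sampling period to 0..9.
 * With clamp == 0 the subtraction can wrap when idle_ticks > total_ticks;
 * with clamp != 0 idle_ticks is first limited to total_ticks, which is
 * the effect of the min() added by the patch.
 */
static unsigned int load_level(unsigned int total_ticks,
			       unsigned int idle_ticks, int clamp)
{
	if (total_ticks == 0)
		return 0;	/* avoid dividing by zero on an empty period */

	if (clamp && idle_ticks > total_ticks)
		idle_ticks = total_ticks;

	return (9 * (total_ticks - idle_ticks)) / total_ticks;
}
```

With total_ticks = 10 and idle_ticks = 11, the unclamped call returns a value far above 9, whereas the clamped call returns 0, i.e. a fully idle CPU.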