Patchwork [v2,06/15] watchdog: orion: Remove unneeded BRIDGE_CAUSE clear

login
register
mail settings
Submitter Ezequiel Garcia
Date Jan. 22, 2014, 4:49 p.m.
Message ID <20140122164904.GB27273@localhost>
Download mbox | patch
Permalink /patch/313325/
State New
Headers show

Comments

Ezequiel Garcia - Jan. 22, 2014, 4:49 p.m.
On Tue, Jan 21, 2014 at 04:35:37PM -0700, Jason Gunthorpe wrote:
> On Tue, Jan 21, 2014 at 06:12:32AM -0300, Ezequiel Garcia wrote:
> > After adding the IRQ request, the BRIDGE_CAUSE bit should be cleared by the
> > bridge interrupt controller. There's no longer a need to do it in the watchdog
> > driver, so we can simply remove it.
> 
> When we talked about this before I pointed out that sequence here is
> important:
> 
> - Disable WDT
> - Clear bridge
> - Enable WDT
> 

Sure, I wrote this series with that in mind and made some tests to
ensure that no spurious trigger was possible.

> Looking at this patch in isolation it looks to me like the clear
> bridge lines should be replaced with a request_irq (as that does the
> clear) - is the request_irq in the wrong spot?
> 

First of all, it seems to me that the first item "Disable WDT" is not
currently true on this driver. orion_wdt_start() seem to reset the
counter, but doesn't clear the WDT_EN bit. Do you think we should
enforce a "true" disable?

As an example, s3c2410wdt calls stop() from start(). Maybe we should
do something like it?

Regarding the sequence. Let me see if I got this problem right. The
concern here is about the bootloader leaving the registers in a
weird-state, right?

In that case, I thought that requesting the IRQ at probe time was enough
to ensure the BRIDGE_CAUSE would be cleared by the time the watchdog is
started. However, after reading through the irqchip code again, I'm no longer
sure this is the case.

It looks like the BRIDGE_CAUSE register is cleared when the interruption
is acked (which happens in the handler if I understood the code right).
So requesting the IRQ is useless...

I'll trace the code to confirm this.

Sebastian: If the above is correct, do you think we can add a cause clear to
the orion irqchip? (supposing it's harmful) Something like this:
Jason Gunthorpe - Jan. 22, 2014, 5:34 p.m.
On Wed, Jan 22, 2014 at 01:49:05PM -0300, Ezequiel Garcia wrote:
> > Looking at this patch in isolation it looks to me like the clear
> > bridge lines should be replaced with a request_irq (as that does the
> > clear) - is the request_irq in the wrong spot?
> 
> First of all, it seems to me that the first item "Disable WDT" is not
> currently true on this driver. orion_wdt_start() seem to reset the
> counter, but doesn't clear the WDT_EN bit. Do you think we should
> enforce a "true" disable?

I think so.

> Regarding the sequence. Let me see if I got this problem right. The
> concern here is about the bootloader leaving the registers in a
> weird-state, right?

It isn't just bootloaders to worry about, but also things like kexec..

> In that case, I thought that requesting the IRQ at probe time was enough
> to ensure the BRIDGE_CAUSE would be cleared by the time the watchdog is
> started. However, after reading through the irqchip code again, I'm no longer
> sure this is the case.

The watchdog should ideally be fully stopped before request_irq so
there is no possible race.

> It looks like the BRIDGE_CAUSE register is cleared when the interruption
> is acked (which happens in the handler if I understood the code right).
> So requesting the IRQ is useless...

IMHO, the IRQ stuff should clear out pending edge triggered interrupts
at request_irq time. It makes no sense to take an interrupt for a
stale edge event.

I had always assumed the core code did this via irq_gc_ack_clr_bit -
but I don't see an obvious path..

> Sebastian: If the above is correct, do you think we can add a cause clear to
> the orion irqchip? (supposing it's harmful) Something like this:

Hrm, irq_startup looks like the right hook to put something like this
in.

Jason
Ezequiel Garcia - Jan. 22, 2014, 5:45 p.m.
On Wed, Jan 22, 2014 at 10:34:17AM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 22, 2014 at 01:49:05PM -0300, Ezequiel Garcia wrote:
> > > Looking at this patch in isolation it looks to me like the clear
> > > bridge lines should be replaced with a request_irq (as that does the
> > > clear) - is the request_irq in the wrong spot?
> > 
> > First of all, it seems to me that the first item "Disable WDT" is not
> > currently true on this driver. orion_wdt_start() seem to reset the
> > counter, but doesn't clear the WDT_EN bit. Do you think we should
> > enforce a "true" disable?
> 
> I think so.
> 
> > Regarding the sequence. Let me see if I got this problem right. The
> > concern here is about the bootloader leaving the registers in a
> > weird-state, right?
> 
> It isn't just bootloaders to worry about, but also things like kexec..
> 
> > In that case, I thought that requesting the IRQ at probe time was enough
> > to ensure the BRIDGE_CAUSE would be cleared by the time the watchdog is
> > started. However, after reading through the irqchip code again, I'm no longer
> > sure this is the case.
> 
> The watchdog should ideally be fully stopped before request_irq so
> there is no possible race.
> 

Agreed. So, let's assume we can guarantee that request_irq() does the
job of clearing the cause register (clearing pending irqs).

So, your suggestion is to put request_irq() in the watchdog start()?

Otherwise, we can ensure a watchdog full stop at probe(), before
requesting the IRQ. Both solutions should work, uh?

I don't have a strong opinion. It's not like the watchdog is going to be
frequently started/stopped (right?) so we can easily do the
request_irq() in the start() without worrying.
Jason Gunthorpe - Jan. 22, 2014, 5:50 p.m.
On Wed, Jan 22, 2014 at 02:45:40PM -0300, Ezequiel Garcia wrote:
 
> Agreed. So, let's assume we can guarantee that request_irq() does the
> job of clearing the cause register (clearing pending irqs).
> 
> So, your suggestion is to put request_irq() in the watchdog start()?
> 
> Otherwise, we can ensure a watchdog full stop at probe(), before
> requesting the IRQ. Both solutions should work, uh?

Right, I would keep the request_irq in probe, just because that feels
more idiomatic..

Jason
Ezequiel Garcia - Jan. 22, 2014, 6:03 p.m.
On Wed, Jan 22, 2014 at 10:50:30AM -0700, Jason Gunthorpe wrote:
> On Wed, Jan 22, 2014 at 02:45:40PM -0300, Ezequiel Garcia wrote:
>  
> > Agreed. So, let's assume we can guarantee that request_irq() does the
> > job of clearing the cause register (clearing pending irqs).
> > 
> > So, your suggestion is to put request_irq() in the watchdog start()?
> > 
> > Otherwise, we can ensure a watchdog full stop at probe(), before
> > requesting the IRQ. Both solutions should work, uh?
> 
> Right, I would keep the request_irq in probe, just because that feels
> more idiomatic..
> 

Done. Baking v4...
Sebastian Hesselbarth - Jan. 22, 2014, 8:31 p.m.
On 01/22/2014 06:34 PM, Jason Gunthorpe wrote:
> On Wed, Jan 22, 2014 at 01:49:05PM -0300, Ezequiel Garcia wrote:
>>> Looking at this patch in isolation it looks to me like the clear
>>> bridge lines should be replaced with a request_irq (as that does the
>>> clear) - is the request_irq in the wrong spot?
>>
>> In that case, I thought that requesting the IRQ at probe time was enough
>> to ensure the BRIDGE_CAUSE would be cleared by the time the watchdog is
>> started. However, after reading through the irqchip code again, I'm no longer
>> sure this is the case.
>
> The watchdog should ideally be fully stopped before request_irq so
> there is no possible race.
>
>> It looks like the BRIDGE_CAUSE register is cleared when the interruption
>> is acked (which happens in the handler if I understood the code right).
>> So requesting the IRQ is useless...
>
> IMHO, the IRQ stuff should clear out pending edge triggered interrupts
> at request_irq time. It makes no sense to take an interrupt for a
> stale edge event.
>
> I had always assumed the core code did this via irq_gc_ack_clr_bit -
> but I don't see an obvious path..
>
>> Sebastian: If the above is correct, do you think we can add a cause clear to
>> the orion irqchip? (supposing it's harmful) Something like this:

Ezequiel,

irqchip/irq-orion.c does mask all interrupts but you are right, it 
should also clear pending interrupts right after that.

So
	/* mask all interrupts */
	writel(0, gc->reg_base + ORION_BRIDGE_IRQ_MASK);

should become

	/* mask and clear all interrupts */
	writel(0, gc->reg_base + ORION_BRIDGE_IRQ_MASK);
	writel(~0, gc->reg_base + ORION_BRIDGE_IRQ_CAUSE);

Could also be clear on write 0, I'll check that. I already had some
beer, so I'll postpone any patches till tomorrow.

Clearing BRIDGE_CAUSE will only clear all currently pending upstream
IRQs, of course. If WDT IRQ will be re-raised right after that in
BRIDGE_CAUSE depends on the actual HW implementation, i.e. we do no
clear the causing IRQ itself but just what it raised in BRIDGE_CAUSE.

So, you should also clear WDT's irq in the driver yourself to clear a
possible pending upstream BRIDGE_CAUSE.

Sebastian
Jason Gunthorpe - Jan. 22, 2014, 8:52 p.m.
> Clearing BRIDGE_CAUSE will only clear all currently pending upstream
> IRQs, of course. If WDT IRQ will be re-raised right after that in
> BRIDGE_CAUSE depends on the actual HW implementation, i.e. we do no
> clear the causing IRQ itself but just what it raised in BRIDGE_CAUSE.

Which is why it makes no sense to clear it one time at kernel start.

Either you only get new edge triggered interrupts after request_irq
(sane behavior) or you might sometimes get an old pending edge
triggered interrupt after request_irq (crazy behavior).

Clearing BRIDGE_CAUSE at kernel start only shortens the racy window it
doesn't eliminate it.

In the more familiar level triggered world the driver would go to the
device and ensure it wasn't asserting an IRQ level and then do the
request_irq. This guarentees it won't get an interrupt callback.

In a edge triggered world the driver should go to the device an ensure
that it won't create a new IRQ, then do request_irq - confident that
there will NEVER be a call to the IRQ handler, under any
circumstances.

So I think edge triggered interrupts need to ack any possible old edge
trigger in the cause register before the first unmask - eg in the
setup callback.

> So, you should also clear WDT's irq in the driver yourself to clear a
> possible pending upstream BRIDGE_CAUSE.

Which isn't possible - the BRIDGE_CAUSE is owned by the irq driver and
it must be cleared there, and it must only be cleared after the wdt
has been stopped so it doesn't set it again.

Notice that Ezequiel has added an IRQ handler that just calls panic,
so a spurious interrupt call is VERY VERY BAD.

Jason
Sebastian Hesselbarth - Jan. 22, 2014, 11:49 p.m.
On 01/22/2014 09:52 PM, Jason Gunthorpe wrote:
>> Clearing BRIDGE_CAUSE will only clear all currently pending upstream
>> IRQs, of course. If WDT IRQ will be re-raised right after that in
>> BRIDGE_CAUSE depends on the actual HW implementation, i.e. we do no
>> clear the causing IRQ itself but just what it raised in BRIDGE_CAUSE.
>
> Which is why it makes no sense to clear it one time at kernel start.
>
> Either you only get new edge triggered interrupts after request_irq
> (sane behavior) or you might sometimes get an old pending edge
> triggered interrupt after request_irq (crazy behavior).
>
> Clearing BRIDGE_CAUSE at kernel start only shortens the racy window it
> doesn't eliminate it.

Actually, I missed that we mask all BRIDGE irqs in
orion_bridge_irq_init. If we now also clear already pending irqs, that
will not raise any old interrupts as long as watchdog clears all
reasons for upstream irqs before requesting a BRIDGE irq.

> In the more familiar level triggered world the driver would go to the
> device and ensure it wasn't asserting an IRQ level and then do the
> request_irq. This guarentees it won't get an interrupt callback.
>
> In a edge triggered world the driver should go to the device an ensure
> that it won't create a new IRQ, then do request_irq - confident that
> there will NEVER be a call to the IRQ handler, under any
> circumstances.
>
> So I think edge triggered interrupts need to ack any possible old edge
> trigger in the cause register before the first unmask - eg in the
> setup callback.
 >
>> So, you should also clear WDT's irq in the driver yourself to clear a
>> possible pending upstream BRIDGE_CAUSE.
>
> Which isn't possible - the BRIDGE_CAUSE is owned by the irq driver and
> it must be cleared there, and it must only be cleared after the wdt
> has been stopped so it doesn't set it again.

I should have been more precise here: I meant watchdog driver should
clear all sources of possible upstream interrupts in its _own_
registers.

> Notice that Ezequiel has added an IRQ handler that just calls panic,
> so a spurious interrupt call is VERY VERY BAD.

And I understand that he now clears watchdog's register before
requesting an irq. All that is missing is bridge_irq driver clearing
CAUSE register after masking all irqs, right?

I'll stich a patch for that hopefully tomorrow.

Sebastian
Ezequiel Garcia - Jan. 23, 2014, 11:10 a.m.
On Thu, Jan 23, 2014 at 12:49:50AM +0100, Sebastian Hesselbarth wrote:
[..]
> > Notice that Ezequiel has added an IRQ handler that just calls panic,
> > so a spurious interrupt call is VERY VERY BAD.
> 
> And I understand that he now clears watchdog's register before
> requesting an irq. All that is missing is bridge_irq driver clearing
> CAUSE register after masking all irqs, right?
> 

Are you sure clearing the CAUSE register after masking the IRQs will be enough?

AFAICS, until now nobody unmasks the watchdog IRQ (at least the orion_wdt
driver didn't request the interruption) but *still* the CAUSE register is set
upon watchdog expiration. So I would guessed a masked interrupt still raises a
bit in the CAUSE register.

Although I'm no sure I'm making sense here or just noise...
Ezequiel Garcia - Jan. 23, 2014, 11:54 a.m.
Sebastian,

On Thu, Jan 23, 2014 at 08:10:49AM -0300, Ezequiel Garcia wrote:
> On Thu, Jan 23, 2014 at 12:49:50AM +0100, Sebastian Hesselbarth wrote:
> [..]
> > > Notice that Ezequiel has added an IRQ handler that just calls panic,
> > > so a spurious interrupt call is VERY VERY BAD.
> > 
> > And I understand that he now clears watchdog's register before
> > requesting an irq. All that is missing is bridge_irq driver clearing
> > CAUSE register after masking all irqs, right?
> > 
> 
> Are you sure clearing the CAUSE register after masking the IRQs will be enough?
> 
> AFAICS, until now nobody unmasks the watchdog IRQ (at least the orion_wdt
> driver didn't request the interruption) but *still* the CAUSE register is set
> upon watchdog expiration. So I would guessed a masked interrupt still raises a
> bit in the CAUSE register.
> 

Let me add some real information instead of my speculations. Taken from
the Kirkwood specification:

Table 136: Mbus-L to Mbus Bridge Interrupt Mask Register
Offset:		0x00020114
Field: 		Mask
Type/InitVal: 	RW 0x0
Description:	There is a mask bit per each cause bit. Mask only affects the
		assertion of interrupt pins. It does not affect the setting of
		bits in the Cause register.

So I guess this is why Jason has been insisting with the introduction of
the irq_startup.

(Just for reference, the little patch I attached yesterday proved to work here.)
Sebastian Hesselbarth - Jan. 23, 2014, 9:15 p.m.
Jason, Ezequiel,

On 01/23/2014 12:54 PM, Ezequiel Garcia wrote:
> Sebastian,
>
> On Thu, Jan 23, 2014 at 08:10:49AM -0300, Ezequiel Garcia wrote:
>> On Thu, Jan 23, 2014 at 12:49:50AM +0100, Sebastian Hesselbarth wrote:
>> [..]
>>>> Notice that Ezequiel has added an IRQ handler that just calls panic,
>>>> so a spurious interrupt call is VERY VERY BAD.
>>>
>>> And I understand that he now clears watchdog's register before
>>> requesting an irq. All that is missing is bridge_irq driver clearing
>>> CAUSE register after masking all irqs, right?
>>>
>>
>> Are you sure clearing the CAUSE register after masking the IRQs will be enough?
>>
>> AFAICS, until now nobody unmasks the watchdog IRQ (at least the orion_wdt
>> driver didn't request the interruption) but *still* the CAUSE register is set
>> upon watchdog expiration. So I would guessed a masked interrupt still raises a
>> bit in the CAUSE register.
>>
>
> Let me add some real information instead of my speculations. Taken from
> the Kirkwood specification:
>
> Table 136: Mbus-L to Mbus Bridge Interrupt Mask Register
> Offset:		0x00020114
> Field: 		Mask
> Type/InitVal: 	RW 0x0
> Description:	There is a mask bit per each cause bit. Mask only affects the
> 		assertion of interrupt pins. It does not affect the setting of
> 		bits in the Cause register.
>
> So I guess this is why Jason has been insisting with the introduction of
> the irq_startup.
>
> (Just for reference, the little patch I attached yesterday proved to work here.)

You guys were so right ;)

I just tested your v4 of the watchdog patches and forced a stale
watchdog irq. It will cause a watchdog reset as you predicted.

I have some fixes for irq-orion.c's bridge irq that take care of:
- handle_edge_irq
- mask _and_ clear irqs on init
- use irq_enable to ensure stale irqs are acked before unmask

With the last one, the stale watchdog irq will not raise watchdog's
irq handler anymore.

BTW, during my test, it looks like RSTOUT wasn't set, so I'll add
my Tested-by on v4 when I have investigated that.

Sebastian

Patch

diff --git a/drivers/irqchip/irq-orion.c b/drivers/irqchip/irq-orion.c
index e51d400..fef9171 100644
--- a/drivers/irqchip/irq-orion.c
+++ b/drivers/irqchip/irq-orion.c
@@ -180,6 +180,9 @@  static int __init orion_bridge_irq_init(struct device_node *np,
 	gc->chip_types[0].chip.irq_mask = irq_gc_mask_clr_bit;
 	gc->chip_types[0].chip.irq_unmask = irq_gc_mask_set_bit;
 
+	/* clear pending interrupts */
+	writel(0, gc->reg_base + ORION_BRIDGE_IRQ_CAUSE);
+
 	/* mask all interrupts */
 	writel(0, gc->reg_base + ORION_BRIDGE_IRQ_MASK);