Patchwork [2/2] at86rf230: change irq handling to prevent lockups with edge type irq

login
register
mail settings
Submitter Sascha Herrmann
Date April 4, 2013, 9:02 p.m.
Message ID <57c67195742f5e7482dc57f5a05b6d69156b00d2.1365107512.git.sascha@ps.nvbi.de>
Download mbox | patch
Permalink /patch/233945/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Sascha Herrmann - April 4, 2013, 9:02 p.m.
Hard code the interrupt type of the at86rf230 to rising edge type and
remove the calls to disable_irq_nosync() and enable_irq() from the
isr and the irqwork function. The at86rf230 resets the irq line only
after the irq status register is read. Disabling the irq can lock the
driver in situations where a irq is set by the radio while the driver
is still reading the frame buffer. Additional the irq filter register
is set to filter out all unused interrupts and the irq status register
is read in the probe function to clear the irq line.

Signed-off-by: Sascha Herrmann <sascha@ps.nvbi.de>
---
 drivers/net/ieee802154/at86rf230.c |   29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)
Werner Almesberger - April 5, 2013, 3:59 a.m.
Sascha Herrmann wrote:
> -	rc = request_irq(spi->irq, at86rf230_isr, IRQF_SHARED,
> +	rc = request_irq(spi->irq, at86rf230_isr,
> +			 IRQF_SHARED | IRQF_TRIGGER_RISING,

To clarify the purpose of your patch, I think it's only needed
for platforms that don't support level-triggered interrupts.
I.e., if you used IRQF_TRIGGER_HIGH (and happened to have a
platform that supports it, which you said on IRC you don't),
the existing code should work fine. Correct ?

I wonder if it wouldn't be better to make the code work with
both edge and level interrupts instead of having to choose.

E.g., this sort of loop in at86rf230_irqwork should work with
either:

	while (1) {
		irq = read_and_clear_interrupt();
		...
		if (!test_and_clear_bit(0, &lp->irq_disabled))
			break;
		enable_irq();
	}
	lp->irq_busy = 0; /* allow _xmit */

Disadvantage: you get extra IRQ_STATUS reads, which has a
slight penalty, given that all this is on SPI.

To achieve perfection, at86rf230_probe could try all four
possible trigger modes, pick one the platform supports, and
set TRX_CTRL_1.IRQ_POLARITY accordingly.

In any case, I gave your patch very light testing on ATBEN and
it performed as good as the original code.

- Werner
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Werner Almesberger - April 5, 2013, 10:51 a.m.
I wrote:
> To achieve perfection, at86rf230_probe could try all four
> possible trigger modes, pick one the platform supports, and
> set TRX_CTRL_1.IRQ_POLARITY accordingly.

Thinking of it, probing by trying request_irq has an unpleasant ring
to it. Perhaps a better way would be to leave this decision to the
platform code and do one of these:

1) pass irqflags and the polarity in the platform data, or

2) pass irqflags and extract the polarity from the irqflags, or

3) set up the trigger mode outside the driver and pass only the
   polarity,

where 1) with (irqflags & IRQF_TRIGGER_MASK) == 0 includes
case 3).

- Werner
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sascha Herrmann - April 5, 2013, 3:59 p.m.
>I wrote:
>> To achieve perfection, at86rf230_probe could try all four
>> possible trigger modes, pick one the platform supports, and
>> set TRX_CTRL_1.IRQ_POLARITY accordingly.
>
>Thinking of it, probing by trying request_irq has an unpleasant ring
>to it. Perhaps a better way would be to leave this decision to the
>platform code and do one of these:
>
>1) pass irqflags and the polarity in the platform data, or
>
>2) pass irqflags and extract the polarity from the irqflags, or
>
>3) set up the trigger mode outside the driver and pass only the
>   polarity,
>
>where 1) with (irqflags & IRQF_TRIGGER_MASK) == 0 includes
>case 3).

The change with this patch is mostly about the trigger type of the interrupt. The trigger level isn't configurable in the rf230, as far as I understand the datasheet. I posted a version of this patch with the option to configure the interrupt type (edge or level) for discussion on the linux-zigbee list some time ago. The result tended toward hardcoding the interrupt type in the driver [1]. My assumption is, that every platform would support edge type interrupts.

If it is preferred to have the interrupt type configurable I now would prefer to implement seperate isr and irqwork functions for each irq type. I fear the sollution to read the interrupt status register in a loop (as suggested in your earlier message) would leave chances for race conditions or spurious interrupts, depending on wheter interrupts are enabled before or after reading the status register. 

Hard coding the interrupt type would have the pro of keeping the driver complexity low. Surely for this option, the assumption that (at least nearly) every platform supports edge type interrupts should hold.

[1] http://www.mail-archive.com/linux-zigbee-devel@lists.sourceforge.net/msg01385.html

Thanks,
Sascha
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Werner Almesberger - April 6, 2013, 2:20 p.m.
Sascha Herrmann wrote:
> The trigger level isn't configurable in the rf230,

Right, I should have mentioned that the polarity can only be set on
the AT86RF231. The 230 has to make do with rising edge or high level.

> [...] discussion on the linux-zigbee list [...]

I read it now. I think this chip is a bit unusual in that it has a
very generic interface and for many things there is more than one way
to do them (e.g., interrupts, reset, TX_START), which results in
different hardware configurations, and differences in driver
behaviour.

> I fear the sollution to read the interrupt status register in a loop
> (as suggested in your earlier message) would leave chances for race
> conditions or spurious interrupts, depending on wheter interrupts are
> enabled before or after reading the status register. 

I don't think the analysis is worse than for any other solution.
There are also tools and methods that can help if it becomes too
much of a headache.

If you don't like the loop, a double read without loop would work in
this case as well:

	irq = read_and_clear_interrupt();
	enable_irq();
	irq |= read_and_clear_interrupt();
	...

By the way, once we switch to early RX_ON, I think we'll have the
problem that two TRX_END interrupts may be generated without any
host action between them, in which case the first will be
interpreted as the end of sending and the second will be ignored,
leaving a received frame in the buffer, which in turn leaves
dynamic buffer protection on and thus prevents any further
reception.

Not sure yet how to solve this. I also don't know how bad our
latencies are, but I fear that they can at times be substantial.
Already a single register access with spi-gpio takes some 80 us
(measured on an otherwise idle Ben, 336 MHz MIPS).

> Surely for this option, the assumption that (at least nearly) every
> platform supports edge type interrupts should hold.

I think you're on relatively safe ground with the assumption that
most relevant platforms per se can do it. But if the interrupt line
happens to be shared with some other device, then level trigger is
quite popular.

- Werner
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sascha Herrmann - April 7, 2013, 8:52 p.m.
On 06.04.2013 16:20, Werner Almesberger wrote:

> Sascha Herrmann wrote:
>> The trigger level isn't configurable in the rf230,
> 
> Right, I should have mentioned that the polarity can only be set on
> the AT86RF231. The 230 has to make do with rising edge or high level.


Oh, I didn't realized, that this is possible at all with the rf231...

 
>> I fear the sollution to read the interrupt status register in a loop
>> (as suggested in your earlier message) would leave chances for race
>> conditions or spurious interrupts, depending on wheter interrupts are
>> enabled before or after reading the status register. 
> 
> I don't think the analysis is worse than for any other solution.
> There are also tools and methods that can help if it becomes too
> much of a headache.
> 
> If you don't like the loop, a double read without loop would work in
> this case as well:
> 
> 	irq = read_and_clear_interrupt();
> 	enable_irq();
> 	irq |= read_and_clear_interrupt();
> 	...


Maybe one way to eliminate the extra latency of the second register read
would be to split the interrupt handling function into a generic part
and two different functions to handle the different types of interrupts:

	static void at86rf230_irqwork_level(void) {
		__at86rf230_irqwork();

		spin_lock_irqsave(&lp->lock, flags);
		lp->irq_busy = 0;
		enable_irq()
		spin_unlock_irqrestore(&lp->lock, flags);
	}

For edge type configuration the call to enable_irq() must be removed.
The same would be required for the isr function. The probe function
could than decide, which isr function should be registered for the
interrupt.

> By the way, once we switch to early RX_ON, I think we'll have the
> problem that two TRX_END interrupts may be generated without any
> host action between them, in which case the first will be
> interpreted as the end of sending and the second will be ignored,
> leaving a received frame in the buffer, which in turn leaves
> dynamic buffer protection on and thus prevents any further
> reception.


I think this is right, we should have an eye on this when working on the
early RX_ON...

> Not sure yet how to solve this. I also don't know how bad our
> latencies are, but I fear that they can at times be substantial.
> Already a single register access with spi-gpio takes some 80 us
> (measured on an otherwise idle Ben, 336 MHz MIPS).
> 
>> Surely for this option, the assumption that (at least nearly) every
>> platform supports edge type interrupts should hold.
> 
> I think you're on relatively safe ground with the assumption that
> most relevant platforms per se can do it. But if the interrupt line
> happens to be shared with some other device, then level trigger is
> quite popular.


If you think the solution above would be ok, I could try to send a
version which allows the configuration of trigger type and level.

Thanks,
Sascha
Werner Almesberger - April 8, 2013, 3:35 p.m.
Sascha Herrmann wrote:
> Maybe one way to eliminate the extra latency of the second register read
> would be to split the interrupt handling function into a generic part
> and two different functions to handle the different types of interrupts:

Yes, if you want to optimize the number of register accesses and
work queue invocations, splitting the paths that touch interrupts
seems to be the most straightforward approach.

> If you think the solution above would be ok, I could try to send a
> version which allows the configuration of trigger type and level.

Sounds good to me. Pity the irq_get_irq_type() you mentioned
doesn't exist. That would have made things a bit nicer.

- Werner
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/ieee802154/at86rf230.c b/drivers/net/ieee802154/at86rf230.c
index fc315dd..eb5359f 100644
--- a/drivers/net/ieee802154/at86rf230.c
+++ b/drivers/net/ieee802154/at86rf230.c
@@ -51,7 +51,7 @@  struct at86rf230_local {
 	struct ieee802154_dev *dev;
 
 	spinlock_t lock;
-	bool irq_disabled;
+	bool irq_busy;
 	bool is_tx;
 };
 
@@ -544,7 +544,7 @@  at86rf230_xmit(struct ieee802154_dev *dev, struct sk_buff *skb)
 	unsigned long flags;
 
 	spin_lock(&lp->lock);
-	if  (lp->irq_disabled) {
+	if  (lp->irq_busy) {
 		spin_unlock(&lp->lock);
 		return -EBUSY;
 	}
@@ -705,20 +705,16 @@  static void at86rf230_irqwork(struct work_struct *work)
 	}
 
 	spin_lock_irqsave(&lp->lock, flags);
-	lp->irq_disabled = 0;
+	lp->irq_busy = 0;
 	spin_unlock_irqrestore(&lp->lock, flags);
-
-	enable_irq(lp->spi->irq);
 }
 
 static irqreturn_t at86rf230_isr(int irq, void *data)
 {
 	struct at86rf230_local *lp = data;
 
-	disable_irq_nosync(irq);
-
 	spin_lock(&lp->lock);
-	lp->irq_disabled = 1;
+	lp->irq_busy = 1;
 	spin_unlock(&lp->lock);
 
 	schedule_work(&lp->irqwork);
@@ -748,12 +744,7 @@  static int at86rf230_hw_init(struct at86rf230_local *lp)
 		dev_info(&lp->spi->dev, "Status: %02x\n", status);
 	}
 
-	rc = at86rf230_write_subreg(lp, SR_IRQ_MASK, 0xff); /* IRQ_TRX_UR |
-							     * IRQ_CCA_ED |
-							     * IRQ_TRX_END |
-							     * IRQ_PLL_UNL |
-							     * IRQ_PLL_LOCK
-							     */
+	rc = at86rf230_write_subreg(lp, SR_IRQ_MASK, IRQ_TRX_END);
 	if (rc)
 		return rc;
 
@@ -819,7 +810,7 @@  static int at86rf230_probe(struct spi_device *spi)
 {
 	struct ieee802154_dev *dev;
 	struct at86rf230_local *lp;
-	u8 man_id_0, man_id_1;
+	u8 man_id_0, man_id_1, status;
 	int rc;
 	const char *chip;
 	int supported = 0;
@@ -928,11 +919,17 @@  static int at86rf230_probe(struct spi_device *spi)
 	if (rc)
 		goto err_gpio_dir;
 
-	rc = request_irq(spi->irq, at86rf230_isr, IRQF_SHARED,
+	rc = request_irq(spi->irq, at86rf230_isr,
+			 IRQF_SHARED | IRQF_TRIGGER_RISING,
 			 dev_name(&spi->dev), lp);
 	if (rc)
 		goto err_gpio_dir;
 
+	/* Read irq status register to reset irq line */
+	rc = at86rf230_read_subreg(lp, RG_IRQ_STATUS, 0xff, 0, &status);
+	if (rc)
+		goto err_irq;
+
 	rc = ieee802154_register_device(lp->dev);
 	if (rc)
 		goto err_irq;