Patchwork sky2: Fix a race condition in sky2_poll

login
register
mail settings
Submitter Mike McCormack
Date June 20, 2009, 8:01 a.m.
Message ID <392fb48f0906200101y5c83e853w170829f2ce40b274@mail.gmail.com>
Download mbox | patch
Permalink /patch/28925/
State Rejected
Delegated to: David Miller
Headers show

Comments

Mike McCormack - June 20, 2009, 8:01 a.m.
Clear interrupt only when the status buffer is fully drained,
Make sure to clear interrupt when work_done == work_limit
 and the buffer is drained.
---
 drivers/net/sky2.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

 	sky2_rx_done(hw, 1, total_packets[1], total_bytes[1]);
@@ -2779,9 +2776,16 @@ static int sky2_poll(struct napi_struct *napi,
int work_limit)
 	if (status & Y2_IS_IRQ_PHY2)
 		sky2_phy_intr(hw, 1);

-	while ((idx = sky2_read16(hw, STAT_PUT_IDX)) != hw->st_idx) {
+	idx = sky2_read16(hw, STAT_PUT_IDX);
+	while (idx != hw->st_idx) {
 		work_done += sky2_status_intr(hw, work_limit - work_done, idx);

+		/* If we fully processed the status ring, clear the irq */
+		idx = sky2_read16(hw, STAT_PUT_IDX);
+		if (idx == hw->st_idx) {
+			sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
+			break;
+		}
 		if (work_done >= work_limit)
 			goto done;
 	}
David Miller - June 23, 2009, 11:45 p.m.
From: Mike McCormack <mikem@ring3k.org>
Date: Sat, 20 Jun 2009 17:01:25 +0900

> Clear interrupt only when the status buffer is fully drained,
> Make sure to clear interrupt when work_done == work_limit
>  and the buffer is drained.

Stephen's away on vacation for a bit so I wonder what to do
about these two sky2 patches of your's.

I guess it's not extremely urgent and we can thus wait for him
to get back in a week or so and review your patches.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mike McCormack - June 24, 2009, 9 a.m.
2009/6/24 David Miller <davem@davemloft.net>:
> From: Mike McCormack <mikem@ring3k.org>
> Date: Sat, 20 Jun 2009 17:01:25 +0900
>
>> Clear interrupt only when the status buffer is fully drained,
>> Make sure to clear interrupt when work_done == work_limit
>>  and the buffer is drained.
>
> Stephen's away on vacation for a bit so I wonder what to do
> about these two sky2 patches of your's.
>
> I guess it's not extremely urgent and we can thus wait for him
> to get back in a week or so and review your patches.

Hi Dave,

They're not urgent, so I'll test them some more, and work on a fix for
the sky2_up receiver hang issue until Stephen returns.

thanks,

Mike
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Hemminger - July 6, 2009, 6:51 p.m.
On Sat, 20 Jun 2009 17:01:25 +0900
Mike McCormack <mikem@ring3k.org> wrote:

> Clear interrupt only when the status buffer is fully drained,
> Make sure to clear interrupt when work_done == work_limit
>  and the buffer is drained.
> ---
>  drivers/net/sky2.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index 7681d28..ca1e9e5 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -2524,9 +2524,6 @@ static int sky2_status_intr(struct sky2_hw *hw,
> int to_do, u16 idx)
>  		}
>  	} while (hw->st_idx != idx);
> 
> -	/* Fully processed status ring so clear irq */
> -	sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
> -
>  exit_loop:
>  	sky2_rx_done(hw, 0, total_packets[0], total_bytes[0]);
>  	sky2_rx_done(hw, 1, total_packets[1], total_bytes[1]);
> @@ -2779,9 +2776,16 @@ static int sky2_poll(struct napi_struct *napi,
> int work_limit)
>  	if (status & Y2_IS_IRQ_PHY2)
>  		sky2_phy_intr(hw, 1);
> 
> -	while ((idx = sky2_read16(hw, STAT_PUT_IDX)) != hw->st_idx) {
> +	idx = sky2_read16(hw, STAT_PUT_IDX);
> +	while (idx != hw->st_idx) {
>  		work_done += sky2_status_intr(hw, work_limit - work_done, idx);
> 
> +		/* If we fully processed the status ring, clear the irq */
> +		idx = sky2_read16(hw, STAT_PUT_IDX);
> +		if (idx == hw->st_idx) {
> +			sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
> +			break;
> +		}
>  		if (work_done >= work_limit)
>  			goto done;
>  	}

Have you actually seen this race, or are you hypothesizing based
on code review?


I think the original works fine. There is a race where interrupt is cleared early,
and the poll processing runs an extra time but that is harmless.

But the patched code races the other way:

> +	while (idx != hw->st_idx) {
>  		work_done += sky2_status_intr(hw, work_limit - work_done, idx);
> 
> +		/* If we fully processed the status ring, clear the irq */
> +		idx = sky2_read16(hw, STAT_PUT_IDX);

Packet arrives right here. The variable "idx" has old value,
but chip register "put_idx" shows new packet.

> +		if (idx == hw->st_idx) {
> +			sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);

This clears level triggered status interrupt.

> +			break;
> +		}
>  		if (work_done >= work_limit)
>  			goto done;
>  	}

Now the driver misses the status interrupt.
Mike McCormack - July 7, 2009, 4:09 a.m.
Hi Stephen,

2009/7/7 Stephen Hemminger <shemminger@linux-foundation.org>:

> Have you actually seen this race, or are you hypothesizing based
> on code review?

Based on code review prompted by seeing "receiver hang detected"
messages in the log.

> I think the original works fine. There is a race where interrupt is cleared early,
> and the poll processing runs an extra time but that is harmless.

There also appears to be a logic problem where the interrupt will not
be cleared until another packet is received, which is the case I was
originally trying to fix.  It looks like if just enough packets are
received (so work_done >= to_do and we go through exit_loop) in
sky2_status_intr(), the interrupt may not be cleared even though no
packets remain.

> Now the driver misses the status interrupt.

My interpretation was that the interrupt condition must be cleared at
least once before acknowledging it.

In any case, the receiver hang detection logic will restart the
driver.  I see this happening quite a bit, and it seems to happen most
commonly after sky2_up is called.

thanks,

Mike
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index 7681d28..ca1e9e5 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -2524,9 +2524,6 @@  static int sky2_status_intr(struct sky2_hw *hw,
int to_do, u16 idx)
 		}
 	} while (hw->st_idx != idx);

-	/* Fully processed status ring so clear irq */
-	sky2_write32(hw, STAT_CTRL, SC_STAT_CLR_IRQ);
-
 exit_loop:
 	sky2_rx_done(hw, 0, total_packets[0], total_bytes[0]);