Patchwork sky2: Fix a race between sky2_down and sky2_poll

login
register
mail settings
Submitter Mike McCormack
Date June 9, 2009, 2:03 p.m.
Message ID <392fb48f0906090703q66fbaf56wbf1157f90b97df0f@mail.gmail.com>
Download mbox | patch
Permalink /patch/28327/
State Changes Requested
Delegated to: David Miller
Headers show

Comments

Mike McCormack - June 9, 2009, 2:03 p.m.
Hi Stephen,

This patch fixes a crash in the sky2 driver when doing "ifconfig eth1
down" and receiving a constant stream of ethernet packets on my
mac-mini with Linux 2.6.29.4.  This is what I can see of the crash on
the console:

do_IRQ+0x64/0x77
common_interrupt+0x27/0x2c
kfree+0x78/0x95
__kfree_skb+0xf/0x6e
sky2_rx_clean+0x35/0x48
sky2_down+0x367/0x486
dev_close+0x63/0x85
dev_change_flags+0xa2/0x153
devinet_ioctl+0x22a/0x532
sock_ioctl+0x1ad/0x1d1
sock_ioctl+0x0/0x1d1
vfs_ioctl+0x1c/0x5f
do_vfs_ioctl+0x456/0x491
net_tx_action+0xc6/0x123
__do_soft_irq+0x8c/0x115
sys_ioctl+0x4/0x58
sysenter_do_call+0x12/0x2f
Code: 68 cd f0 93 de e9 0 02 00 00 8b 4c 24 24 0f b7 ...
EIP: sky2_poll+0x856/0xb49 [sky2]
kernel panic - no syncing: fatal exception in interrupt

This was tested by adding a USB ethernet dongle to the same machine,
and creating a program that spews raw packets back to the sky2 port.

I have update my analysis and tweaked the patch slightly.

I had a look at the callers of sky2_down, and can't see any immediate
trouble that could be caused by this patch.  If you have any
suggestions for further improvements, please let me know what they are
and I will do my best to address them.

thanks,

Mike


---

If sky2_down was called between an interrupt and the corresponding sky2_poll,
rx_ring will have been free'd and we'll crash.

This may happen because not all sources of interrupts are masked in sky2_down,
but a single interrupt from any source will cause all sources to be
checked, regardless of whether they are masked or not.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
---
 drivers/net/sky2.c |   14 ++++++++++++--
 1 files changed, 12 insertions(+), 2 deletions(-)

 		case OP_RXSTAT:
Stephen Hemminger - June 10, 2009, 2:50 p.m.
On Tue, 9 Jun 2009 23:03:52 +0900
Mike McCormack <mikem@ring3k.org> wrote:

> Hi Stephen,
> 
> This patch fixes a crash in the sky2 driver when doing "ifconfig eth1
> down" and receiving a constant stream of ethernet packets on my
> mac-mini with Linux 2.6.29.4.  This is what I can see of the crash on
> the console:
> 
> do_IRQ+0x64/0x77
> common_interrupt+0x27/0x2c
> kfree+0x78/0x95
> __kfree_skb+0xf/0x6e
> sky2_rx_clean+0x35/0x48
> sky2_down+0x367/0x486
> dev_close+0x63/0x85
> dev_change_flags+0xa2/0x153
> devinet_ioctl+0x22a/0x532
> sock_ioctl+0x1ad/0x1d1
> sock_ioctl+0x0/0x1d1
> vfs_ioctl+0x1c/0x5f
> do_vfs_ioctl+0x456/0x491
> net_tx_action+0xc6/0x123
> __do_soft_irq+0x8c/0x115
> sys_ioctl+0x4/0x58
> sysenter_do_call+0x12/0x2f
> Code: 68 cd f0 93 de e9 0 02 00 00 8b 4c 24 24 0f b7 ...
> EIP: sky2_poll+0x856/0xb49 [sky2]
> kernel panic - no syncing: fatal exception in interrupt
> 
> This was tested by adding a USB ethernet dongle to the same machine,
> and creating a program that spews raw packets back to the sky2 port.
> 
> I have update my analysis and tweaked the patch slightly.
> 
> I had a look at the callers of sky2_down, and can't see any immediate
> trouble that could be caused by this patch.  If you have any
> suggestions for further improvements, please let me know what they are
> and I will do my best to address them.
> 
> thanks,
> 
> Mike
> 
> 
> ---
> 
> If sky2_down was called between an interrupt and the corresponding sky2_poll,
> rx_ring will have been free'd and we'll crash.
> 
> This may happen because not all sources of interrupts are masked in sky2_down,
> but a single interrupt from any source will cause all sources to be
> checked, regardless of whether they are masked or not.
> 
> Signed-off-by: Mike McCormack <mikem@ring3k.org>
> ---
>  drivers/net/sky2.c |   14 ++++++++++++--
>  1 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
> index a2ff9cb..d0d4840 100644
> --- a/drivers/net/sky2.c
> +++ b/drivers/net/sky2.c
> @@ -1822,8 +1822,8 @@ static int sky2_down(struct net_device *dev)
>  	ctrl &= ~(GM_GPCR_TX_ENA | GM_GPCR_RX_ENA);
>  	gma_write16(hw, port, GM_GP_CTRL, ctrl);
> 
> -	/* Make sure no packets are pending */
> -	napi_synchronize(&hw->napi);
> +	/* disable soft interrupts */
> +	napi_disable(&hw->napi);
> 
>  	sky2_write8(hw, SK_REG(port, GPHY_CTRL), GPC_RST_SET);
> 
> @@ -1878,6 +1878,9 @@ static int sky2_down(struct net_device *dev)
>  	sky2->rx_ring = NULL;
>  	sky2->tx_ring = NULL;
> 
> +	/* re-enable soft interrupts */
> +	napi_enable(&hw->napi);
> +
>  	return 0;
>  }
> 
> @@ -2372,6 +2375,13 @@ static int sky2_status_intr(struct sky2_hw *hw,
> int to_do, u16 idx)
>  		length = le16_to_cpu(le->length);
>  		status = le32_to_cpu(le->status);
> 
> +		/* rx_ring may have been free'd in sky2_down */
> +		if (unlikely(!sky2->rx_ring)) {
> +			printk(KERN_INFO "sky2_status_intr: rx_ring NULL opcode %02x\n", opcode);
> +			work_done++;
> +			break;
> +		}
> +

The first part is okay, but the second part is not. It wallpapers
over a real race. It is important that it should not be possible
to get to the sky2_status_intr (called from sky2_poll)
after down has both disabled IRQ in hardware and disabled NAPI.
If it can make to there then there is some bug in NAPI logic,
or hardware that needs attention. Remember sky2_down might be
running on a different CPU than the receiver.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
index a2ff9cb..d0d4840 100644
--- a/drivers/net/sky2.c
+++ b/drivers/net/sky2.c
@@ -1822,8 +1822,8 @@  static int sky2_down(struct net_device *dev)
 	ctrl &= ~(GM_GPCR_TX_ENA | GM_GPCR_RX_ENA);
 	gma_write16(hw, port, GM_GP_CTRL, ctrl);

-	/* Make sure no packets are pending */
-	napi_synchronize(&hw->napi);
+	/* disable soft interrupts */
+	napi_disable(&hw->napi);

 	sky2_write8(hw, SK_REG(port, GPHY_CTRL), GPC_RST_SET);

@@ -1878,6 +1878,9 @@  static int sky2_down(struct net_device *dev)
 	sky2->rx_ring = NULL;
 	sky2->tx_ring = NULL;

+	/* re-enable soft interrupts */
+	napi_enable(&hw->napi);
+
 	return 0;
 }

@@ -2372,6 +2375,13 @@  static int sky2_status_intr(struct sky2_hw *hw,
int to_do, u16 idx)
 		length = le16_to_cpu(le->length);
 		status = le32_to_cpu(le->status);

+		/* rx_ring may have been free'd in sky2_down */
+		if (unlikely(!sky2->rx_ring)) {
+			printk(KERN_INFO "sky2_status_intr: rx_ring NULL opcode %02x\n", opcode);
+			work_done++;
+			break;
+		}
+
 		le->opcode = 0;
 		switch (opcode & ~HW_OWNER) {