diff mbox

tg3: not checking dma errors?

Message ID 20091202171359.GB32112@xw6200.broadcom.net
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

Matt Carlson Dec. 2, 2009, 5:13 p.m. UTC
On Wed, Dec 02, 2009 at 06:01:06AM -0800, Johannes Berg wrote:
> I'm having some trouble with tg3 under memory pressure when it's
> receiving lots of data, and so far this is the only thing I could find
> -- unfortunately I can't capture the debug information on the machine --
> netconsole obviously doesn't work, there's no serial and the output
> doesn't fit on the screen...
> 
> So I figured I'd look at the code, and now I think there could be a bug
> with mappings:
> 
> tg3_alloc_rx_skb:
> ...
>         mapping = pci_map_single(tp->pdev, skb->data, skb_size,
>                                  PCI_DMA_FROMDEVICE);
> 
>         map->skb = skb;
>         pci_unmap_addr_set(map, mapping, mapping);
> 
>         if (src_map != NULL)
>                 src_map->skb = NULL;
> 
>         desc->addr_hi = ((u64)mapping >> 32);
>         desc->addr_lo = ((u64)mapping & 0xffffffff);
> 
>         return skb_size;
> 
> 
> 
> Shouldn't that check that 'mapping' is != DMA_ERROR_CODE?
> 
> Similarly in tg3_run_loopback()?
> 
> johannes

Yes.  I submitted a patch to the net-next-2.6 tree addressing this exact
problem.  It will appear in the 2.6.33 kernel.  The patch is appended at
the bottom of this email.

What kind of trouble are you seeing?  What device are you working with?

Comments

Johannes Berg Dec. 2, 2009, 6:03 p.m. UTC | #1
On Wed, 2009-12-02 at 09:13 -0800, Matt Carlson wrote:

> Yes.  I submitted a patch to the net-next-2.6 tree addressing this exact
> problem.  It will appear in the 2.6.33 kernel.  The patch is appended at
> the bottom of this email.

Cool, thanks, I'll stick it into my tree and see what happens.

> What kind of trouble are you seeing?  What device are you working with?

I have a dual-port device, see lspci info below. The device is in a
machine with 6.5GiB memory, and large disks. Now I rsync a lot of data
over gigabit (but I think I never set up jumbo frames) to the disk, say
50GiB data. During that, eventually the memory will fill with cached
filesystem data.

If I allow that to happen, the machine will eventually crash, and tg3
shows up in the stack trace -- but I can't see all of it on the display.
I can work around it by periodically telling the kernel to drop
filesystem caches so it never fills the entire memory with cached data.

johannes

0001:05:04.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5780 Gigabit Ethernet (rev 03)
	Subsystem: Apple Computer Inc. Device 0085
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 16 (16000ns min), Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 34
	Region 0: Memory at fa530000 (64-bit, non-prefetchable) [size=64K]
	Region 2: Memory at fa520000 (64-bit, non-prefetchable) [size=64K]
	Expansion ROM at <ignored> [disabled]
	Capabilities: [40] PCI-X non-bridge device
		Command: DPERE- ERO- RBC=512 OST=1
		Status: Dev=05:04.0 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable+ DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
		Product Name: Broadcom NetXtreme Gigabit Ethernet Controller
		Read-only fields:
			[PN] Part number: BCM95780
			[EC] Engineering changes: 106679-15
			[SN] Serial number: 0123456789
			[MN] Manufacture ID: 31 34 65 34
			[RV] Reserved: checksum bad, 28 byte(s) reserved
		Read/write fields:
			[YA] Asset tag: XYZ01234567
			[RW] Read-write area: 107 byte(s) free
		End
	Capabilities: [58] MSI: Enable+ Count=1/8 Maskable- 64bit+
		Address: 00000000fee00000  Data: 000d
	Kernel driver in use: tg3

0001:05:04.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5780 Gigabit Ethernet (rev 03)
	Subsystem: Apple Computer Inc. Device 0085
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 16 (16000ns min), Cache Line Size: 64 bytes
	Interrupt: pin B routed to IRQ 67
	Region 0: Memory at fa510000 (64-bit, non-prefetchable) [size=64K]
	Region 2: Memory at fa500000 (64-bit, non-prefetchable) [size=64K]
	Capabilities: [40] PCI-X non-bridge device
		Command: DPERE- ERO+ RBC=512 OST=1
		Status: Dev=05:04.1 64bit+ 133MHz+ SCD- USC- DC=simple DMMRBC=2048 DMOST=1 DMCRS=16 RSCEM- 266MHz- 533MHz-
	Capabilities: [48] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [50] Vital Product Data
		Product Name: Broadcom NetXtreme Gigabit Ethernet Controller
		Read-only fields:
			[PN] Part number: BCM95780
			[EC] Engineering changes: 106679-15
			[SN] Serial number: 0123456789
			[MN] Manufacture ID: 31 34 65 34
			[RV] Reserved: checksum bad, 28 byte(s) reserved
		Read/write fields:
			[YA] Asset tag: XYZ01234567
			[RW] Read-write area: 107 byte(s) free
		End
	Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+
		Address: 820170100a07bcb4  Data: 6548
	Kernel driver in use: tg3
diff mbox

Patch

=====

Subject: [PATCH 06/20] tg3: Add more PCI DMA map error checking

This patch adds code to check the status of pci_map_single() before
allowing rx buffers to be used.  It also converts the pci_map_single()
call in tg3_run_loopback() to use skb_dma_map() instead.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
---
 drivers/net/tg3.c |   14 +++++++++++---
 1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 53a193e..54dbe98 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4426,6 +4426,10 @@  static int tg3_alloc_rx_skb(struct tg3_napi *tnapi, u32 opaque_key,
 
 	mapping = pci_map_single(tp->pdev, skb->data, skb_size,
 				 PCI_DMA_FROMDEVICE);
+	if (pci_dma_mapping_error(tp->pdev, mapping)) {
+		dev_kfree_skb(skb);
+		return -EIO;
+	}
 
 	map->skb = skb;
 	pci_unmap_addr_set(map, mapping, mapping);
@@ -10369,7 +10373,10 @@  static int tg3_run_loopback(struct tg3 *tp, int loopback_mode)
 	for (i = 14; i < tx_len; i++)
 		tx_data[i] = (u8) (i & 0xff);
 
-	map = pci_map_single(tp->pdev, skb->data, tx_len, PCI_DMA_TODEVICE);
+	if (skb_dma_map(&tp->pdev->dev, skb, DMA_TO_DEVICE)) {
+		dev_kfree_skb(skb);
+		return -EIO;
+	}
 
 	tw32_f(HOSTCC_MODE, tp->coalesce_mode | HOSTCC_MODE_ENABLE |
 	       rnapi->coal_now);
@@ -10380,7 +10387,8 @@  static int tg3_run_loopback(struct tg3 *tp, int loopback_mode)
 
 	num_pkts = 0;
 
-	tg3_set_txd(tnapi, tnapi->tx_prod, map, tx_len, 0, 1);
+	tg3_set_txd(tnapi, tnapi->tx_prod,
+		    skb_shinfo(skb)->dma_head, tx_len, 0, 1);
 
 	tnapi->tx_prod++;
 	num_pkts++;
@@ -10404,7 +10412,7 @@  static int tg3_run_loopback(struct tg3 *tp, int loopback_mode)
 			break;
 	}
 
-	pci_unmap_single(tp->pdev, map, tx_len, PCI_DMA_TODEVICE);
+	skb_dma_unmap(&tp->pdev->dev, skb, DMA_TO_DEVICE);
 	dev_kfree_skb(skb);
 
 	if (tx_idx != tnapi->tx_prod)