Patchwork tg3 appears to be sick in 2.6.33

login
register
mail settings
Submitter Matt Carlson
Date Jan. 8, 2010, 7:42 p.m.
Message ID <20100108194222.GA8386@xw6200.broadcom.net>
Download mbox | patch
Permalink /patch/42533/
State Awaiting Upstream
Delegated to: David Miller
Headers show

Comments

Matt Carlson - Jan. 8, 2010, 7:42 p.m.
On Fri, Jan 08, 2010 at 10:25:50AM -0800, Dmitry Torokhov wrote:
> On Friday 08 January 2010 10:09:15 am Matt Carlson wrote:
> > On Thu, Jan 07, 2010 at 05:36:00PM -0800, Dmitry Torokhov wrote:
> > > On Thu, Jan 07, 2010 at 04:14:54PM -0800, Dmitry Torokhov wrote:
> > > > On Thu, Jan 07, 2010 at 03:53:09PM -0800, Matt Carlson wrote:
> > > > > On Thu, Jan 07, 2010 at 02:45:43PM -0800, Dmitry Torokhov wrote:
> > > > > > On Thursday 07 January 2010 02:24:44 pm Eric Dumazet wrote:
> > > > > > > Le 07/01/2010 21:48, Matt Carlson a ?crit :
> > > > > > > > Nothing jumps to mind.  Let me see if I can repro this here.
> > > > > > > >
> > > > > > > > On Thu, Jan 07, 2010 at 11:56:31AM -0800, Dmitry Torokhov wrote:
> > > > > > > >> Hi,
> > > > > > > >>
> > > > > > > >> Ever since I moved post 2.6.32 tg3 seems to be very sick in my
> > > > > > > >> Latitude D630, it discards most of the packages for some
> > > > > > > >> reason:
> > > > > > > >>
> > > > > > > >> [root@dtor-d630 ~]# ethtool -S eth0 | grep rx_
> > > > > > > >>      rx_octets: 35886
> > > > > > > >>      rx_fragments: 0
> > > > > > > >>      rx_ucast_packets: 9
> > > > > > > >>      rx_mcast_packets: 93
> > > > > > > >>      rx_bcast_packets: 237
> > > > > > > >>      rx_fcs_errors: 0
> > > > > > > >>      rx_align_errors: 0
> > > > > > > >>      rx_xon_pause_rcvd: 0
> > > > > > > >>      rx_xoff_pause_rcvd: 0
> > > > > > > >>      rx_mac_ctrl_rcvd: 0
> > > > > > > >>      rx_xoff_entered: 0
> > > > > > > >>      rx_frame_too_long_errors: 0
> > > > > > > >>      rx_jabbers: 0
> > > > > > > >>      rx_undersize_packets: 0
> > > > > > > >>      rx_in_length_errors: 0
> > > > > > > >>      rx_out_length_errors: 0
> > > > > > > >>      rx_64_or_less_octet_packets: 0
> > > > > > > >>      rx_65_to_127_octet_packets: 0
> > > > > > > >>      rx_128_to_255_octet_packets: 0
> > > > > > > >>      rx_256_to_511_octet_packets: 0
> > > > > > > >>      rx_512_to_1023_octet_packets: 0
> > > > > > > >>      rx_1024_to_1522_octet_packets: 0
> > > > > > > >>      rx_1523_to_2047_octet_packets: 0
> > > > > > > >>      rx_2048_to_4095_octet_packets: 0
> > > > > > > >>      rx_4096_to_8191_octet_packets: 0
> > > > > > > >>      rx_8192_to_9022_octet_packets: 0
> > > > > > > >>      rx_discards: 304
> > > > > > > >>      rx_errors: 0
> > > > > > > >>      rx_threshold_hit: 0
> > > > > > > >>
> > > > > > > >> The above on last night pull from Linux (so 2.6.33+).
> > > > > > > >>
> > > > > > > >> Everything works fine if I use wireless or another wired card
> > > > > > > >> (Realtek in cardbus slot):
> > > > > > > >>
> > > > > > > >> [root@dtor-d630 ~]# lspci | grep -ri net
> > > > > > > >> 04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
> > > > > > > >> RTL-8139/8139C/8139C+ (rev 10)
> > > > > > > >> 09:00.0 Ethernet controller: Broadcom Corporation NetXtreme
> > > > > > > >> BCM5755M Gigabit Ethernet PCI Express (rev 02)
> > > > > > > >> 0c:00.0 Network controller: Intel Corporation PRO/Wireless
> > > > > > > >> 3945ABG [Golan] Network Connection (rev 02)
> > > > > > > >>
> > > > > > > >> [root@dtor-d630 ~]# ethtool eth0
> > > > > > > >> Settings for eth0:
> > > > > > > >>         Supported ports: [ TP ]
> > > > > > > >>         Supported link modes:   10baseT/Half 10baseT/Full
> > > > > > > >>                                 100baseT/Half 100baseT/Full
> > > > > > > >>                                 1000baseT/Half 1000baseT/Full
> > > > > > > >>         Supports auto-negotiation: Yes
> > > > > > > >>         Advertised link modes:  10baseT/Half 10baseT/Full
> > > > > > > >>                                 100baseT/Half 100baseT/Full
> > > > > > > >>                                 1000baseT/Half 1000baseT/Full
> > > > > > > >>         Advertised auto-negotiation: Yes
> > > > > > > >>         Speed: 1000Mb/s
> > > > > > > >>         Duplex: Full
> > > > > > > >>         Port: Twisted Pair
> > > > > > > >>         PHYAD: 1
> > > > > > > >>         Transceiver: internal
> > > > > > > >>         Auto-negotiation: on
> > > > > > > >>         Supports Wake-on: g
> > > > > > > >>         Wake-on: g
> > > > > > > >>         Current message level: 0x000000ff (255)
> > > > > > > >>         Link detected: yes
> > > > > > > >>
> > > > > > > >> Any ideas? Thanks!
> > > > > > >
> > > > > > > My laptop is a D630, and running 2.6.33-rc3 with no special
> > > > > > > problems on tg3
> > > > > >
> > > > > > This is wierd... It all works well with Fedora 12 kernel and .32 so
> > > > > > hardware is fine here. I think it also worked when I plugged it
> > > > > > into my home router (will try it again tonight). But in the office
> > > > > > it just does not want to work. It's like it just does not like the
> > > > > > addresses that we use here...
> > > > >
> > > > > Is the problem related to link speed?  I see that Eric was running at
> > > > > 100Mbps.  Your ethtool output above shows that you are running at
> > > > > 1Gbps.
> > > >
> > > > Indeed I have 100Mbps at home and 1Gb in the office.. I amy try playing
> > > > with it some more if you want but that patch you sent seems to have
> > > > fixed the issue for me...
> > > >
> > > > Except that sice I started using tg3 in couple of hours the box locked
> > > > up completely once and panicked twice; of course it happened in X so no
> > > > traces whatsoever. Maybe a coincidence though, I am running with nvidia
> > > > driver...
> > >
> > > No, something is definitely going on. I can work for a long time using
> > > the Realtek card but if I switch to tg3 then pretty quickly network gets
> > > stuck again and in a minute or so the box panics. Last time that
> > > happened I switched to the text console and managed to captute an oops
> > > (attached) but there is nothing really conclusive there...
> > >
> > > Thanks.
> > 
> > I looked over all the patches I submitted between tg3 versions 3.102
> > and 3.105.  Long story short, I didn't see anything particularly
> > suspicious.
> > 
> > Could you try dropping in the tg3.c and tg3.h from the 2.6.32 kernel
> > into your 2.6.33 source tree and see if you encounter the same problem?
> > 
> 
> skb_dma_map/unmap are gone from 2.6.33 so direct drop does not work :(

Ah.  I may have found the problem.  Can you apply the below patch on top
of the patch you already applied and see if this fixes the problem?

---------

[PATCH] tg3: Fix std rx prod ring handling

There are some tg3 devices that require the driver to post new rx
buffers in smaller increments.  Commit
4361935afe3abc3e5a93006b99197fac1fabbd50, "tg3: Consider
rx_std_prod_idx a hw mailbox" changed how the driver tracks the rx
producer ring updates, but it does not make any special considerations
for the above-mentioned devices.  For those devices, it is possible for
the driver to hit the special case path, which updates the hardware
mailbox register but skips updating the shadow software mailbox member.
If the special case path represents the final mailbox update for this
ISR iteration, the hardware and software mailbox values will be out of
sync.  Ultimately, this will cause the driver to use a stale mailbox
value on the next iteration, which will appear to the hardware as a
large rx buffer update.  Bad things ensue.

The fix is to update the software shadow mailbox member when the special
case path is taken.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
---
 drivers/net/tg3.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)
Dmitry Torokhov - Jan. 8, 2010, 10:50 p.m.
On Fri, Jan 08, 2010 at 11:42:22AM -0800, Matt Carlson wrote:
> 
> Ah.  I may have found the problem.  Can you apply the below patch on top
> of the patch you already applied and see if this fixes the problem?
> 

Looks very promising...2+ hours and I haven't seen a single lockup.
Dmitry Torokhov - Jan. 9, 2010, 6:30 a.m.
Hi Matt,

On Fri, Jan 08, 2010 at 02:50:25PM -0800, Dmitry Torokhov wrote:
> On Fri, Jan 08, 2010 at 11:42:22AM -0800, Matt Carlson wrote:
> > 
> > Ah.  I may have found the problem.  Can you apply the below patch on top
> > of the patch you already applied and see if this fixes the problem?
> > 
> 
> Looks very promising...2+ hours and I haven't seen a single lockup.
> 

So I haven't had a single lockup since applying this patch so that is
it. Thank you very much for the quick fix and please make sure it gets
upstream.
David Miller - Jan. 10, 2010, 9:52 p.m.
From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Fri, 8 Jan 2010 11:42:22 -0800

> [PATCH] tg3: Fix std rx prod ring handling
> 
> There are some tg3 devices that require the driver to post new rx
> buffers in smaller increments.  Commit
> 4361935afe3abc3e5a93006b99197fac1fabbd50, "tg3: Consider
> rx_std_prod_idx a hw mailbox" changed how the driver tracks the rx
> producer ring updates, but it does not make any special considerations
> for the above-mentioned devices.  For those devices, it is possible for
> the driver to hit the special case path, which updates the hardware
> mailbox register but skips updating the shadow software mailbox member.
> If the special case path represents the final mailbox update for this
> ISR iteration, the hardware and software mailbox values will be out of
> sync.  Ultimately, this will cause the driver to use a stale mailbox
> value on the next iteration, which will appear to the hardware as a
> large rx buffer update.  Bad things ensue.
> 
> The fix is to update the software shadow mailbox member when the special
> case path is taken.
> 
> Signed-off-by: Matt Carlson <mcarlson@broadcom.com>

Matt, since we have positive testing, want me to apply this?

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matt Carlson - Jan. 11, 2010, 5:59 p.m.
On Sun, Jan 10, 2010 at 01:52:42PM -0800, David Miller wrote:
> From: "Matt Carlson" <mcarlson@broadcom.com>
> Date: Fri, 8 Jan 2010 11:42:22 -0800
> 
> > [PATCH] tg3: Fix std rx prod ring handling
> > 
> > There are some tg3 devices that require the driver to post new rx
> > buffers in smaller increments.  Commit
> > 4361935afe3abc3e5a93006b99197fac1fabbd50, "tg3: Consider
> > rx_std_prod_idx a hw mailbox" changed how the driver tracks the rx
> > producer ring updates, but it does not make any special considerations
> > for the above-mentioned devices.  For those devices, it is possible for
> > the driver to hit the special case path, which updates the hardware
> > mailbox register but skips updating the shadow software mailbox member.
> > If the special case path represents the final mailbox update for this
> > ISR iteration, the hardware and software mailbox values will be out of
> > sync.  Ultimately, this will cause the driver to use a stale mailbox
> > value on the next iteration, which will appear to the hardware as a
> > large rx buffer update.  Bad things ensue.
> > 
> > The fix is to update the software shadow mailbox member when the special
> > case path is taken.
> > 
> > Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
> 
> Matt, since we have positive testing, want me to apply this?

I have a small patchset queued up for net-2.6.  Let me just include this
one into the mix and submit it all at once.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Jan. 11, 2010, 9:39 p.m.
From: "Matt Carlson" <mcarlson@broadcom.com>
Date: Mon, 11 Jan 2010 09:59:11 -0800

> On Sun, Jan 10, 2010 at 01:52:42PM -0800, David Miller wrote:
>> From: "Matt Carlson" <mcarlson@broadcom.com>
>> Date: Fri, 8 Jan 2010 11:42:22 -0800
>> 
>> Matt, since we have positive testing, want me to apply this?
> 
> I have a small patchset queued up for net-2.6.  Let me just include this
> one into the mix and submit it all at once.

Ok.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index 5c77e6a..4e8b87d 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -4693,8 +4693,9 @@  next_pkt:
 		(*post_ptr)++;
 
 		if (unlikely(rx_std_posted >= tp->rx_std_max_post)) {
-			u32 idx = *post_ptr % TG3_RX_RING_SIZE;
-			tw32_rx_mbox(TG3_RX_STD_PROD_IDX_REG, idx);
+			tpr->rx_std_prod_idx = std_prod_idx % TG3_RX_RING_SIZE;
+			tw32_rx_mbox(TG3_RX_STD_PROD_IDX_REG,
+				     tpr->rx_std_prod_idx);
 			work_mask &= ~RXD_OPAQUE_RING_STD;
 			rx_std_posted = 0;
 		}