diff mbox

[net] e1000e: Change wthresh to 1 to avoid possible Tx stalls.

Message ID 20120606174355.823e9aa7.shimoda.hiroaki@gmail.com
State Awaiting Upstream, archived
Delegated to: David Miller
Headers show

Commit Message

Hiroaki SHIMODA June 6, 2012, 8:43 a.m. UTC
Denys Fedoryshchenko reported Tx stalls on e1000e with BQL enabled.

e1000e has WTHRESH which determines when Tx descripters are written
back and successive Tx interrupts are generated, and setting WTHRESH
to 5 gives efficient bus utilization but this cause possible Tx stalls,
especially on BQL enabled system.

To avoid possible Tx stalls, change WTHRESH to 1.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
---
 drivers/net/ethernet/intel/e1000e/e1000.h  |    6 +++---
 drivers/net/ethernet/intel/e1000e/netdev.c |    2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

Comments

Kirsher, Jeffrey T June 6, 2012, 8:46 a.m. UTC | #1
On Wed, 2012-06-06 at 17:43 +0900, Hiroaki SHIMODA wrote:
> Denys Fedoryshchenko reported Tx stalls on e1000e with BQL enabled.
> 
> e1000e has WTHRESH which determines when Tx descripters are written
> back and successive Tx interrupts are generated, and setting WTHRESH
> to 5 gives efficient bus utilization but this cause possible Tx
> stalls,
> especially on BQL enabled system.
> 
> To avoid possible Tx stalls, change WTHRESH to 1.
> 
> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h  |    6 +++---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-) 

Thanks! I will add this to my queue of e1000e patches.
Eric Dumazet June 6, 2012, 8:50 a.m. UTC | #2
On Wed, 2012-06-06 at 17:43 +0900, Hiroaki SHIMODA wrote:
> Denys Fedoryshchenko reported Tx stalls on e1000e with BQL enabled.
> 
> e1000e has WTHRESH which determines when Tx descripters are written
> back and successive Tx interrupts are generated, and setting WTHRESH
> to 5 gives efficient bus utilization but this cause possible Tx stalls,
> especially on BQL enabled system.
> 
> To avoid possible Tx stalls, change WTHRESH to 1.
> 
> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h  |    6 +++---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-)

Thanks a lot !

Acked-by: Eric Dumazet <edumazet@google.com>


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirsher, Jeffrey T June 7, 2012, 12:59 a.m. UTC | #3
On Wed, 2012-06-06 at 01:43 -0700, Hiroaki SHIMODA wrote:
> Denys Fedoryshchenko reported Tx stalls on e1000e with BQL enabled.
> 
> e1000e has WTHRESH which determines when Tx descripters are written
> back and successive Tx interrupts are generated, and setting WTHRESH
> to 5 gives efficient bus utilization but this cause possible Tx
> stalls,
> especially on BQL enabled system.
> 
> To avoid possible Tx stalls, change WTHRESH to 1.
> 
> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Tested-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
> ---
>  drivers/net/ethernet/intel/e1000e/e1000.h  |    6 +++---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    2 +-
>  2 files changed, 4 insertions(+), 4 deletions(-) 

After further internal review, NACK.

This patch will cause unacceptable performance issues with non-ESB2
parts.

I am dropping this patch from my queue.
David Miller June 7, 2012, 1:34 a.m. UTC | #4
From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 06 Jun 2012 17:59:12 -0700

> This patch will cause unacceptable performance issues with non-ESB2
> parts.

You have to fix the regression a non-1 setting causes, performance
is secondary.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet June 7, 2012, 4:24 a.m. UTC | #5
On Wed, 2012-06-06 at 17:59 -0700, Jeff Kirsher wrote:

> After further internal review, NACK.
> 
> This patch will cause unacceptable performance issues with non-ESB2
> parts.
> 
> I am dropping this patch from my queue.
> 

I'd like you share your performance numbers before NACKing this patch.

What is the alternative patch you guys have ?



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kirsher, Jeffrey T June 7, 2012, 4:52 a.m. UTC | #6
On Thu, 2012-06-07 at 06:24 +0200, Eric Dumazet wrote:
> On Wed, 2012-06-06 at 17:59 -0700, Jeff Kirsher wrote:
> 
> > After further internal review, NACK.
> > 
> > This patch will cause unacceptable performance issues with non-ESB2
> > parts.
> > 
> > I am dropping this patch from my queue.
> > 
> 
> I'd like you share your performance numbers before NACKing this patch.
> 
> What is the alternative patch you guys have ?
> 

Jesse did not share any performance numbers with me, I am sure he can
give some background tomorrow when he is back online.

I am working on an alternative patch now and should have something to
share tomorrow.
Eric Dumazet June 7, 2012, 5:03 a.m. UTC | #7
On Wed, 2012-06-06 at 21:52 -0700, Jeff Kirsher wrote:

> Jesse did not share any performance numbers with me, I am sure he can
> give some background tomorrow when he is back online.
> 
> I am working on an alternative patch now and should have something to
> share tomorrow.

Thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frank Reppin Oct. 9, 2012, 12:25 a.m. UTC | #8
Jeff Kirsher <jeffrey.t.kirsher <at> intel.com> writes:
> 
> On Thu, 2012-06-07 at 06:24 +0200, Eric Dumazet wrote:
> > On Wed, 2012-06-06 at 17:59 -0700, Jeff Kirsher wrote:
> > 
> > > After further internal review, NACK.
> > > 
> > > This patch will cause unacceptable performance issues with non-ESB2
> > > parts.
> > > 
> > > I am dropping this patch from my queue.
> > > 
> > 
> > I'd like you share your performance numbers before NACKing this patch.
> > 
> > What is the alternative patch you guys have ?
> > 
> 
> Jesse did not share any performance numbers with me, I am sure he can
> give some background tomorrow when he is back online.
> 
> I am working on an alternative patch now and should have something to
> share tomorrow.
Please allow me to ask if there's any progess here?

I've tried 3.5.4 a couple of days ago on a SuperMicro X8SIE-LN4 (82574L)
and could still observe severe latency (up to 3000ms) spikes.

Applying Hiroakis suggested patch did fix this for me as well.
[please note as well that I didn't had this issue in any 3.4.x kernel
before - so +1 for fixing the regression]

Thankyou!
Frank Reppin
Eric Dumazet Oct. 9, 2012, 6:07 a.m. UTC | #9
On Tue, 2012-10-09 at 00:25 +0000, Frank Reppin wrote:
> Jeff Kirsher <jeffrey.t.kirsher <at> intel.com> writes:
> > 
> > On Thu, 2012-06-07 at 06:24 +0200, Eric Dumazet wrote:
> > > On Wed, 2012-06-06 at 17:59 -0700, Jeff Kirsher wrote:
> > > 
> > > > After further internal review, NACK.
> > > > 
> > > > This patch will cause unacceptable performance issues with non-ESB2
> > > > parts.
> > > > 
> > > > I am dropping this patch from my queue.
> > > > 
> > > 
> > > I'd like you share your performance numbers before NACKing this patch.
> > > 
> > > What is the alternative patch you guys have ?
> > > 
> > 
> > Jesse did not share any performance numbers with me, I am sure he can
> > give some background tomorrow when he is back online.
> > 
> > I am working on an alternative patch now and should have something to
> > share tomorrow.
> Please allow me to ask if there's any progess here?
> 
> I've tried 3.5.4 a couple of days ago on a SuperMicro X8SIE-LN4 (82574L)
> and could still observe severe latency (up to 3000ms) spikes.
> 
> Applying Hiroakis suggested patch did fix this for me as well.
> [please note as well that I didn't had this issue in any 3.4.x kernel
> before - so +1 for fixing the regression]

Its a shame this bug is not yet fixed.

I dont know what to say.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jesse Brandeburg Oct. 9, 2012, 5:36 p.m. UTC | #10
> > > Jesse did not share any performance numbers with me, I am sure he can
> > > give some background tomorrow when he is back online.
> > > 
> > > I am working on an alternative patch now and should have something to
> > > share tomorrow.
> > Please allow me to ask if there's any progess here?
> > 
> > I've tried 3.5.4 a couple of days ago on a SuperMicro X8SIE-LN4 (82574L)
> > and could still observe severe latency (up to 3000ms) spikes.
> > 
> > Applying Hiroakis suggested patch did fix this for me as well.
> > [please note as well that I didn't had this issue in any 3.4.x kernel
> > before - so +1 for fixing the regression]

I'm not sure what went wrong internally here that this hasn't been
fixed, and I'm personally embarrassed.  I am working on it until I have
a patch/solution.

currently am trying to reproduce the issue, am in some weird how to
use BQL limbo, the lack of documentation on user usage of BQL is slowing
me down.

Hints or clues (I'm trying to follow the repro steps mentioned in
some related threads) are appreciated.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet Oct. 9, 2012, 9:02 p.m. UTC | #11
On Tue, 2012-10-09 at 10:36 -0700, Jesse Brandeburg wrote:
> > > > Jesse did not share any performance numbers with me, I am sure he can
> > > > give some background tomorrow when he is back online.
> > > > 
> > > > I am working on an alternative patch now and should have something to
> > > > share tomorrow.
> > > Please allow me to ask if there's any progess here?
> > > 
> > > I've tried 3.5.4 a couple of days ago on a SuperMicro X8SIE-LN4 (82574L)
> > > and could still observe severe latency (up to 3000ms) spikes.
> > > 
> > > Applying Hiroakis suggested patch did fix this for me as well.
> > > [please note as well that I didn't had this issue in any 3.4.x kernel
> > > before - so +1 for fixing the regression]
> 
> I'm not sure what went wrong internally here that this hasn't been
> fixed, and I'm personally embarrassed.  I am working on it until I have
> a patch/solution.
> 
> currently am trying to reproduce the issue, am in some weird how to
> use BQL limbo, the lack of documentation on user usage of BQL is slowing
> me down.
> 

BQL is blocking qdisc from delivering additional packet to TX as long as
previous packets were not completed. Not sure what you want to know
about BQL.

> Hints or clues (I'm trying to follow the repro steps mentioned in
> some related threads) are appreciated.


Problem is BQL depends on TX completion being done in a reasonable time.

It seems e1000e can hold an skb in TX ring for up to 3000ms (not
reasonable it seems)

Aggregation / coalescing is fine, as long as we dont hold a packet too
long, in case no other packet follows.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andre Tomt Oct. 9, 2012, 10:48 p.m. UTC | #12
On 09. okt. 2012 19:36, Jesse Brandeburg wrote:
> I'm not sure what went wrong internally here that this hasn't been
> fixed, and I'm personally embarrassed.  I am working on it until I have
> a patch/solution.
>
> currently am trying to reproduce the issue, am in some weird how to
> use BQL limbo, the lack of documentation on user usage of BQL is slowing
> me down.
>
> Hints or clues (I'm trying to follow the repro steps mentioned in
> some related threads) are appreciated.

I found it simplest to reproduce when doing forwarding, and *not* 
saturating the interface doing the TX. 100Mbps forwarding on gigabit 
link triggered it in seconds. Doing gigabit forwarding speeds (~980Mbps) 
did not trigger anything.

Setup looked somewhat like this, GE beeing gigabit link, FE 100Mbps;

reciever PC (iperf -s)
     |
     GE
     |
    eth0   <- TX lockups
router with 2*e1000e
    eth1
     |
     GE
     |
switch
     |
     FE
     |
source PC (iperf -c recieverPC)

I don't recall all the details anymore, but I'm fairly certain I didnt 
use any non-default qdiscs to reproduce - eg just pfifo_fast (usually 
doing fq_codel though).

For the bug to manifest itself was somewhat dependent on GRO and TSO in 
kernel 3.5, but with 3.6 it didnt matter anymore (at least the rc's).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frank Reppin Oct. 10, 2012, 1:25 a.m. UTC | #13
On 10.10.2012 00:48, Andre Tomt wrote:
> On 09. okt. 2012 19:36, Jesse Brandeburg wrote:
[...]
>> Hints or clues (I'm trying to follow the repro steps mentioned in
>> some related threads) are appreciated.
In our scenario - the SuperMicro X8SIE-LN4 acts
as a router.

eth0 -> uplink
eth1 -> core servers network 192.168.a.b/24
eth2 -> developer employees network 192.168.c.d/24

eth1 and eth2 are hooked into (two different) HP2510-24G
switches (they both show no errors at all).

The switch connected to eth2 connects a bunch of
developers doing day-2-day stuff (commits/checkouts/documentation/
listening to stream music/surfing/email) and the such.

Continously running a ping from ie. 192.168.c.d/24 to
192.168.a.b/24 visualizes those spikes (mostly triplets, where
the first echo reply is quite high and the two following replies
are each around 50 percent faster - but mostly still >500ms) - whenever 
these spikes happen, it feels like the whole net 'stalls'.
Just like Denys Fedoryshchenko stated in

http://www.mail-archive.com/e1000-devel@lists.sourceforge.net/msg05517.html

HTH,
Frank Reppin
diff mbox

Patch

diff --git a/drivers/net/ethernet/intel/e1000e/e1000.h b/drivers/net/ethernet/intel/e1000e/e1000.h
index 6e6fffb..dc28078 100644
--- a/drivers/net/ethernet/intel/e1000e/e1000.h
+++ b/drivers/net/ethernet/intel/e1000e/e1000.h
@@ -175,13 +175,13 @@  struct e1000_info;
 /*
  * in the case of WTHRESH, it appears at least the 82571/2 hardware
  * writes back 4 descriptors when WTHRESH=5, and 3 descriptors when
- * WTHRESH=4, and since we want 64 bytes at a time written back, set
- * it to 5
+ * WTHRESH=4, so setting 5 gives most efficient bus utilization but
+ * to avoid possible Tx hangs, set it to 1
  */
 #define E1000_TXDCTL_DMA_BURST_ENABLE                          \
 	(E1000_TXDCTL_GRAN | /* set descriptor granularity */  \
 	 E1000_TXDCTL_COUNT_DESC |                             \
-	 (5 << 16) | /* wthresh must be +1 more than desired */\
+	 (1 << 16) | /* wthresh must be +1 more than desired */\
 	 (1 << 8)  | /* hthresh */                             \
 	 0x1f)       /* pthresh */
 
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index a4b0435..b031312 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -2806,7 +2806,7 @@  static void e1000_configure_tx(struct e1000_adapter *adapter)
 		 * set up some performance related parameters to encourage the
 		 * hardware to use the bus more efficiently in bursts, depends
 		 * on the tx_int_delay to be enabled,
-		 * wthresh = 5 ==> burst write a cacheline (64 bytes) at a time
+		 * wthresh = 1 ==> burst write is disabled to consider Tx hangs
 		 * hthresh = 1 ==> prefetch when one or more available
 		 * pthresh = 0x1f ==> prefetch if internal cache 31 or less
 		 * BEWARE: this seems to work but should be considered first if