Patchwork r8169: revert "add byte queue limit support".

login
register
mail settings
Submitter françois romieu
Date July 23, 2012, 8:55 p.m.
Message ID <20120723205555.GA4392@electric-eye.fr.zoreil.com>
Download mbox | patch
Permalink /patch/172747/
State Accepted
Delegated to: David Miller
Headers show

Comments

françois romieu - July 23, 2012, 8:55 p.m.
This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.

First it appears in bisection, then reverting it solves the usual
netdev watchdog problem for different people. I don't have a proper
fix yet so get rid of it.

Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com>
Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Hayes Wang <hayeswang@realtek.com>
---

 The original 036da... commit has been modified due to the newly introduced
 skb_tx_timestamp in rtl8169_start_xmit. The herein included patch qualifies
 for 3.4-stable as well.

 drivers/net/ethernet/realtek/r8169.c |   27 +++++----------------------
 1 file changed, 5 insertions(+), 22 deletions(-)
David Miller - July 24, 2012, 1:27 a.m.
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Mon, 23 Jul 2012 22:55:55 +0200

> This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.
> 
> First it appears in bisection, then reverting it solves the usual
> netdev watchdog problem for different people. I don't have a proper
> fix yet so get rid of it.
> 
> Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com>
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> Cc: Josh Boyer <jwboyer@redhat.com>
> Cc: Hayes Wang <hayeswang@realtek.com>
> ---
> 
>  The original 036da... commit has been modified due to the newly introduced
>  skb_tx_timestamp in rtl8169_start_xmit. The herein included patch qualifies
>  for 3.4-stable as well.

Applied to net-next and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - July 24, 2012, 5:06 a.m.
On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote:
> This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.
> 
> First it appears in bisection, then reverting it solves the usual
> netdev watchdog problem for different people. I don't have a proper
> fix yet so get rid of it.
> 
> Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com>
> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> Cc: Josh Boyer <jwboyer@redhat.com>
> Cc: Hayes Wang <hayeswang@realtek.com>
> ---

bisection is not always the right way to qualify a problem.

BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565

Is there an easy way to reproduce the problem ?

Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - July 24, 2012, 5:09 a.m.
On Tue, 2012-07-24 at 07:06 +0200, Eric Dumazet wrote:
> On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote:
> > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.
> > 
> > First it appears in bisection, then reverting it solves the usual
> > netdev watchdog problem for different people. I don't have a proper
> > fix yet so get rid of it.
> > 
> > Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com>
> > Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
> > Cc: Josh Boyer <jwboyer@redhat.com>
> > Cc: Hayes Wang <hayeswang@realtek.com>
> > ---
> 
> bisection is not always the right way to qualify a problem.
> 
> BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565
> 
> Is there an easy way to reproduce the problem ?
> 
> Thanks
> 

BQL fixes are :

commit 914bec1011a25f65cdc94988a6f974bfb9a3c10d
Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Date:   Wed May 30 12:25:37 2012 +0000

    bql: Avoid possible inconsistent calculation.
    
    dql->num_queued could change while processing dql_completed().
    To provide consistent calculation, added an on stack variable.
    
    Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
    Cc: Tom Herbert <therbert@google.com>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: Denys Fedoryshchenko <denys@visp.net.lb>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 25426b794efdc70dde7fd3134dc56fac3e7d562d
Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Date:   Wed May 30 12:25:19 2012 +0000

    bql: Avoid unneeded limit decrement.
    
    When below pattern is observed,
    
                                                   TIME
           dql_queued()         dql_completed()     |
          a) initial state                          |
                                                    |
          b) X bytes queued                         V
    
          c) Y bytes queued
                               d) X bytes completed
          e) Z bytes queued
                               f) Y bytes completed
    
    a) dql->limit has already some value and there is no in-flight
packet.
    b) X bytes queued.
    c) Y bytes queued and excess limit.
    d) X bytes completed and dql->prev_ovlimit is set and also
       dql->prev_num_queued is set Y.
    e) Z bytes queued.
    f) Y bytes completed. inprogress and prev_inprogress are true.
    
    At f), according to the comment, all_prev_completed becomes
    true and limit should be increased. But POSDIFF() ignores
    (completed == dql->prev_num_queued) case, so limit is decreased.
    
    Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
    Cc: Tom Herbert <therbert@google.com>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: Denys Fedoryshchenko <denys@visp.net.lb>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

commit 0cfd32b736ae0c36b42697584811042726c07cba
Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Date:   Wed May 30 12:24:39 2012 +0000

    bql: Fix POSDIFF() to integer overflow aware.
    
    POSDIFF() fails to take into account integer overflow case.
    
    Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
    Cc: Tom Herbert <therbert@google.com>
    Cc: Eric Dumazet <eric.dumazet@gmail.com>
    Cc: Denys Fedoryshchenko <denys@visp.net.lb>
    Acked-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
françois romieu - July 24, 2012, 5:38 a.m.
Eric Dumazet <eric.dumazet@gmail.com> :
> On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote:
> > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.
[...]
> bisection is not always the right way to qualify a problem.

I know. At some point I switch from "I could search more" to "users situation
will improve in a definite timeframe".

> BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565

Thanks.

They are in stable as of 3.4.5:

commit 4f4bdaeb40df95499c1ee7ea3fbca9d76174a59e
Author:     Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
AuthorDate: Wed May 30 12:25:37 2012 +0000
Commit:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CommitDate: Mon Jul 16 09:03:43 2012 -0700

    bql: Avoid possible inconsistent calculation.
    
    [ Upstream commit 914bec1011a25f65cdc94988a6f974bfb9a3c10d ]
[...]
commit 1414a53d956340ca8b1b27e05ab94ba63e82ed97
Author:     Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
AuthorDate: Wed May 30 12:25:19 2012 +0000
Commit:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CommitDate: Mon Jul 16 09:03:43 2012 -0700

    bql: Avoid unneeded limit decrement.

I have obviously not directed users at them and I do not see any
of the victims using a non -stable / -vendor or recent enough
kernel to test this patch since the issue has been reported.

They are both worth testing.

> Is there an easy way to reproduce the problem ?

None here :o(
Josh Boyer - July 24, 2012, 11:11 a.m.
On Tue, Jul 24, 2012 at 07:38:11AM +0200, Francois Romieu wrote:
> Eric Dumazet <eric.dumazet@gmail.com> :
> > On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote:
> > > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.
> [...]
> > bisection is not always the right way to qualify a problem.
> 
> I know. At some point I switch from "I could search more" to "users situation
> will improve in a definite timeframe".
> 
> > BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565
> 
> Thanks.
> 
> They are in stable as of 3.4.5:
> 
> commit 4f4bdaeb40df95499c1ee7ea3fbca9d76174a59e
> Author:     Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
> AuthorDate: Wed May 30 12:25:37 2012 +0000
> Commit:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> CommitDate: Mon Jul 16 09:03:43 2012 -0700
> 
>     bql: Avoid possible inconsistent calculation.
>     
>     [ Upstream commit 914bec1011a25f65cdc94988a6f974bfb9a3c10d ]
> [...]
> commit 1414a53d956340ca8b1b27e05ab94ba63e82ed97
> Author:     Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
> AuthorDate: Wed May 30 12:25:19 2012 +0000
> Commit:     Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> CommitDate: Mon Jul 16 09:03:43 2012 -0700
> 
>     bql: Avoid unneeded limit decrement.
> 
> I have obviously not directed users at them and I do not see any
> of the victims using a non -stable / -vendor or recent enough
> kernel to test this patch since the issue has been reported.
> 
> They are both worth testing.

Fedora has 3.4.5 in the F16 updates-testing repo.  F17 is already on
3.4.6 in stable updates.  Users should be able to use those kernels for
testing.  F16 will be getting 3.4.6 submitted this morning for
updates-testing.

josh
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stefan Lippers-Hollmann - July 27, 2012, 3:36 a.m.
Hi

On Tuesday 24 July 2012, Francois Romieu wrote:
> Eric Dumazet <eric.dumazet@gmail.com> :
> > On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote:
> > > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.
> [...]
> > bisection is not always the right way to qualify a problem.
> 
> I know. At some point I switch from "I could search more" to "users situation
> will improve in a definite timeframe".
> 
> > BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565
> 
> Thanks.
> 
> They are in stable as of 3.4.5:
[…]
> I have obviously not directed users at them and I do not see any
> of the victims using a non -stable / -vendor or recent enough
> kernel to test this patch since the issue has been reported.
> 
> They are both worth testing.
[…]

3.4.x up to and including 3.4.4 exposed the problem on these cards[1]:

r8169 0000:04:00.0: eth0: RTL8168d/8111d at 0xffffc90000c72000, 00:24:1d:72:7c:75, XID 081000c0 IRQ 44
r8169 0000:05:00.0: eth1: RTL8168d/8111d at 0xffffc90000c70000, 00:24:1d:72:7c:77, XID 081000c0 IRQ 45

while it is stable with "add byte queue limit support" reverted; 3.4.5+
was only tested with 036dafa28da1e2565a8529de2ae663c37b7a0060 reverted.

Now testing plain 3.5.0, which still includes it, has been reliable for
almost 3 days - while the issue usually triggered within one hour (3 
hours at most) in 3.4.[0-4]. It might be a little too early to give a
definitive answer, but so far r8169/ 3.5.0 looks positive.

Regards
	Stefan Lippers-Hollmann

[1]	Message-Id: <201206290131.49150.s.L-H@gmx.de>
	http://lkml.kernel.org/r/<201206290131.49150.s.L-H@gmx.de>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet - July 27, 2012, 5:27 a.m.
On Fri, 2012-07-27 at 05:36 +0200, Stefan Lippers-Hollmann wrote:
> Hi
> 
> On Tuesday 24 July 2012, Francois Romieu wrote:
> > Eric Dumazet <eric.dumazet@gmail.com> :
> > > On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote:
> > > > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060.
> > [...]
> > > bisection is not always the right way to qualify a problem.
> > 
> > I know. At some point I switch from "I could search more" to "users situation
> > will improve in a definite timeframe".
> > 
> > > BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565
> > 
> > Thanks.
> > 
> > They are in stable as of 3.4.5:
> […]
> > I have obviously not directed users at them and I do not see any
> > of the victims using a non -stable / -vendor or recent enough
> > kernel to test this patch since the issue has been reported.
> > 
> > They are both worth testing.
> […]
> 
> 3.4.x up to and including 3.4.4 exposed the problem on these cards[1]:
> 
> r8169 0000:04:00.0: eth0: RTL8168d/8111d at 0xffffc90000c72000, 00:24:1d:72:7c:75, XID 081000c0 IRQ 44
> r8169 0000:05:00.0: eth1: RTL8168d/8111d at 0xffffc90000c70000, 00:24:1d:72:7c:77, XID 081000c0 IRQ 45
> 
> while it is stable with "add byte queue limit support" reverted; 3.4.5+
> was only tested with 036dafa28da1e2565a8529de2ae663c37b7a0060 reverted.
> 
> Now testing plain 3.5.0, which still includes it, has been reliable for
> almost 3 days - while the issue usually triggered within one hour (3 
> hours at most) in 3.4.[0-4]. It might be a little too early to give a
> definitive answer, but so far r8169/ 3.5.0 looks positive.
> 
> Regards
> 	Stefan Lippers-Hollmann
> 
> [1]	Message-Id: <201206290131.49150.s.L-H@gmx.de>
> 	http://lkml.kernel.org/r/<201206290131.49150.s.L-H@gmx.de>

Thats real good news, thanks !


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index d7a04e0..eb81da4 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5380,7 +5380,6 @@  static void rtl8169_tx_clear(struct rtl8169_private *tp)
 {
 	rtl8169_tx_clear_range(tp, tp->dirty_tx, NUM_TX_DESC);
 	tp->cur_tx = tp->dirty_tx = 0;
-	netdev_reset_queue(tp->dev);
 }
 
 static void rtl_reset_work(struct rtl8169_private *tp)
@@ -5535,8 +5534,6 @@  static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	txd->opts2 = cpu_to_le32(opts[1]);
 
-	netdev_sent_queue(dev, skb->len);
-
 	skb_tx_timestamp(skb);
 
 	wmb();
@@ -5633,16 +5630,9 @@  static void rtl8169_pcierr_interrupt(struct net_device *dev)
 	rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
 }
 
-struct rtl_txc {
-	int packets;
-	int bytes;
-};
-
 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 {
-	struct rtl8169_stats *tx_stats = &tp->tx_stats;
 	unsigned int dirty_tx, tx_left;
-	struct rtl_txc txc = { 0, 0 };
 
 	dirty_tx = tp->dirty_tx;
 	smp_rmb();
@@ -5661,24 +5651,17 @@  static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 		rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
-			struct sk_buff *skb = tx_skb->skb;
-
-			txc.packets++;
-			txc.bytes += skb->len;
-			dev_kfree_skb(skb);
+			u64_stats_update_begin(&tp->tx_stats.syncp);
+			tp->tx_stats.packets++;
+			tp->tx_stats.bytes += tx_skb->skb->len;
+			u64_stats_update_end(&tp->tx_stats.syncp);
+			dev_kfree_skb(tx_skb->skb);
 			tx_skb->skb = NULL;
 		}
 		dirty_tx++;
 		tx_left--;
 	}
 
-	u64_stats_update_begin(&tx_stats->syncp);
-	tx_stats->packets += txc.packets;
-	tx_stats->bytes += txc.bytes;
-	u64_stats_update_end(&tx_stats->syncp);
-
-	netdev_completed_queue(dev, txc.packets, txc.bytes);
-
 	if (tp->dirty_tx != dirty_tx) {
 		tp->dirty_tx = dirty_tx;
 		/* Sync with rtl8169_start_xmit: