Message ID | 20120723205555.GA4392@electric-eye.fr.zoreil.com |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
From: Francois Romieu <romieu@fr.zoreil.com> Date: Mon, 23 Jul 2012 22:55:55 +0200 > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060. > > First it appears in bisection, then reverting it solves the usual > netdev watchdog problem for different people. I don't have a proper > fix yet so get rid of it. > > Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com> > Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> > Cc: Josh Boyer <jwboyer@redhat.com> > Cc: Hayes Wang <hayeswang@realtek.com> > --- > > The original 036da... commit has been modified due to the newly introduced > skb_tx_timestamp in rtl8169_start_xmit. The herein included patch qualifies > for 3.4-stable as well. Applied to net-next and queued up for -stable, thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote: > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060. > > First it appears in bisection, then reverting it solves the usual > netdev watchdog problem for different people. I don't have a proper > fix yet so get rid of it. > > Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com> > Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> > Cc: Josh Boyer <jwboyer@redhat.com> > Cc: Hayes Wang <hayeswang@realtek.com> > --- bisection is not always the right way to qualify a problem. BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565 Is there an easy way to reproduce the problem ? Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2012-07-24 at 07:06 +0200, Eric Dumazet wrote: > On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote: > > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060. > > > > First it appears in bisection, then reverting it solves the usual > > netdev watchdog problem for different people. I don't have a proper > > fix yet so get rid of it. > > > > Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com> > > Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> > > Cc: Josh Boyer <jwboyer@redhat.com> > > Cc: Hayes Wang <hayeswang@realtek.com> > > --- > > bisection is not always the right way to qualify a problem. > > BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565 > > Is there an easy way to reproduce the problem ? > > Thanks > BQL fixes are : commit 914bec1011a25f65cdc94988a6f974bfb9a3c10d Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Date: Wed May 30 12:25:37 2012 +0000 bql: Avoid possible inconsistent calculation. dql->num_queued could change while processing dql_completed(). To provide consistent calculation, added an on stack variable. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> commit 25426b794efdc70dde7fd3134dc56fac3e7d562d Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Date: Wed May 30 12:25:19 2012 +0000 bql: Avoid unneeded limit decrement. When below pattern is observed, TIME dql_queued() dql_completed() | a) initial state | | b) X bytes queued V c) Y bytes queued d) X bytes completed e) Z bytes queued f) Y bytes completed a) dql->limit has already some value and there is no in-flight packet. b) X bytes queued. c) Y bytes queued and excess limit. d) X bytes completed and dql->prev_ovlimit is set and also dql->prev_num_queued is set Y. e) Z bytes queued. f) Y bytes completed. inprogress and prev_inprogress are true. At f), according to the comment, all_prev_completed becomes true and limit should be increased. But POSDIFF() ignores (completed == dql->prev_num_queued) case, so limit is decreased. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Denys Fedoryshchenko <denys@visp.net.lb> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> commit 0cfd32b736ae0c36b42697584811042726c07cba Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Date: Wed May 30 12:24:39 2012 +0000 bql: Fix POSDIFF() to integer overflow aware. POSDIFF() fails to take into account integer overflow case. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Cc: Tom Herbert <therbert@google.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Denys Fedoryshchenko <denys@visp.net.lb> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <eric.dumazet@gmail.com> : > On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote: > > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060. [...] > bisection is not always the right way to qualify a problem. I know. At some point I switch from "I could search more" to "users situation will improve in a definite timeframe". > BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565 Thanks. They are in stable as of 3.4.5: commit 4f4bdaeb40df95499c1ee7ea3fbca9d76174a59e Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> AuthorDate: Wed May 30 12:25:37 2012 +0000 Commit: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CommitDate: Mon Jul 16 09:03:43 2012 -0700 bql: Avoid possible inconsistent calculation. [ Upstream commit 914bec1011a25f65cdc94988a6f974bfb9a3c10d ] [...] commit 1414a53d956340ca8b1b27e05ab94ba63e82ed97 Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> AuthorDate: Wed May 30 12:25:19 2012 +0000 Commit: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CommitDate: Mon Jul 16 09:03:43 2012 -0700 bql: Avoid unneeded limit decrement. I have obviously not directed users at them and I do not see any of the victims using a non -stable / -vendor or recent enough kernel to test this patch since the issue has been reported. They are both worth testing. > Is there an easy way to reproduce the problem ? None here :o(
On Tue, Jul 24, 2012 at 07:38:11AM +0200, Francois Romieu wrote: > Eric Dumazet <eric.dumazet@gmail.com> : > > On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote: > > > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060. > [...] > > bisection is not always the right way to qualify a problem. > > I know. At some point I switch from "I could search more" to "users situation > will improve in a definite timeframe". > > > BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565 > > Thanks. > > They are in stable as of 3.4.5: > > commit 4f4bdaeb40df95499c1ee7ea3fbca9d76174a59e > Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> > AuthorDate: Wed May 30 12:25:37 2012 +0000 > Commit: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > CommitDate: Mon Jul 16 09:03:43 2012 -0700 > > bql: Avoid possible inconsistent calculation. > > [ Upstream commit 914bec1011a25f65cdc94988a6f974bfb9a3c10d ] > [...] > commit 1414a53d956340ca8b1b27e05ab94ba63e82ed97 > Author: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> > AuthorDate: Wed May 30 12:25:19 2012 +0000 > Commit: Greg Kroah-Hartman <gregkh@linuxfoundation.org> > CommitDate: Mon Jul 16 09:03:43 2012 -0700 > > bql: Avoid unneeded limit decrement. > > I have obviously not directed users at them and I do not see any > of the victims using a non -stable / -vendor or recent enough > kernel to test this patch since the issue has been reported. > > They are both worth testing. Fedora has 3.4.5 in the F16 updates-testing repo. F17 is already on 3.4.6 in stable updates. Users should be able to use those kernels for testing. F16 will be getting 3.4.6 submitted this morning for updates-testing. josh -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi On Tuesday 24 July 2012, Francois Romieu wrote: > Eric Dumazet <eric.dumazet@gmail.com> : > > On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote: > > > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060. > [...] > > bisection is not always the right way to qualify a problem. > > I know. At some point I switch from "I could search more" to "users situation > will improve in a definite timeframe". > > > BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565 > > Thanks. > > They are in stable as of 3.4.5: […] > I have obviously not directed users at them and I do not see any > of the victims using a non -stable / -vendor or recent enough > kernel to test this patch since the issue has been reported. > > They are both worth testing. […] 3.4.x up to and including 3.4.4 exposed the problem on these cards[1]: r8169 0000:04:00.0: eth0: RTL8168d/8111d at 0xffffc90000c72000, 00:24:1d:72:7c:75, XID 081000c0 IRQ 44 r8169 0000:05:00.0: eth1: RTL8168d/8111d at 0xffffc90000c70000, 00:24:1d:72:7c:77, XID 081000c0 IRQ 45 while it is stable with "add byte queue limit support" reverted; 3.4.5+ was only tested with 036dafa28da1e2565a8529de2ae663c37b7a0060 reverted. Now testing plain 3.5.0, which still includes it, has been reliable for almost 3 days - while the issue usually triggered within one hour (3 hours at most) in 3.4.[0-4]. It might be a little too early to give a definitive answer, but so far r8169/ 3.5.0 looks positive. Regards Stefan Lippers-Hollmann [1] Message-Id: <201206290131.49150.s.L-H@gmx.de> http://lkml.kernel.org/r/<201206290131.49150.s.L-H@gmx.de> -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2012-07-27 at 05:36 +0200, Stefan Lippers-Hollmann wrote: > Hi > > On Tuesday 24 July 2012, Francois Romieu wrote: > > Eric Dumazet <eric.dumazet@gmail.com> : > > > On Mon, 2012-07-23 at 22:55 +0200, Francois Romieu wrote: > > > > This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060. > > [...] > > > bisection is not always the right way to qualify a problem. > > > > I know. At some point I switch from "I could search more" to "users situation > > will improve in a definite timeframe". > > > > > BQL in itself had some fixes coming _after_ commit 036dafa28da1e2565 > > > > Thanks. > > > > They are in stable as of 3.4.5: > […] > > I have obviously not directed users at them and I do not see any > > of the victims using a non -stable / -vendor or recent enough > > kernel to test this patch since the issue has been reported. > > > > They are both worth testing. > […] > > 3.4.x up to and including 3.4.4 exposed the problem on these cards[1]: > > r8169 0000:04:00.0: eth0: RTL8168d/8111d at 0xffffc90000c72000, 00:24:1d:72:7c:75, XID 081000c0 IRQ 44 > r8169 0000:05:00.0: eth1: RTL8168d/8111d at 0xffffc90000c70000, 00:24:1d:72:7c:77, XID 081000c0 IRQ 45 > > while it is stable with "add byte queue limit support" reverted; 3.4.5+ > was only tested with 036dafa28da1e2565a8529de2ae663c37b7a0060 reverted. > > Now testing plain 3.5.0, which still includes it, has been reliable for > almost 3 days - while the issue usually triggered within one hour (3 > hours at most) in 3.4.[0-4]. It might be a little too early to give a > definitive answer, but so far r8169/ 3.5.0 looks positive. > > Regards > Stefan Lippers-Hollmann > > [1] Message-Id: <201206290131.49150.s.L-H@gmx.de> > http://lkml.kernel.org/r/<201206290131.49150.s.L-H@gmx.de> Thats real good news, thanks ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index d7a04e0..eb81da4 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -5380,7 +5380,6 @@ static void rtl8169_tx_clear(struct rtl8169_private *tp) { rtl8169_tx_clear_range(tp, tp->dirty_tx, NUM_TX_DESC); tp->cur_tx = tp->dirty_tx = 0; - netdev_reset_queue(tp->dev); } static void rtl_reset_work(struct rtl8169_private *tp) @@ -5535,8 +5534,6 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb, txd->opts2 = cpu_to_le32(opts[1]); - netdev_sent_queue(dev, skb->len); - skb_tx_timestamp(skb); wmb(); @@ -5633,16 +5630,9 @@ static void rtl8169_pcierr_interrupt(struct net_device *dev) rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING); } -struct rtl_txc { - int packets; - int bytes; -}; - static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp) { - struct rtl8169_stats *tx_stats = &tp->tx_stats; unsigned int dirty_tx, tx_left; - struct rtl_txc txc = { 0, 0 }; dirty_tx = tp->dirty_tx; smp_rmb(); @@ -5661,24 +5651,17 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp) rtl8169_unmap_tx_skb(&tp->pci_dev->dev, tx_skb, tp->TxDescArray + entry); if (status & LastFrag) { - struct sk_buff *skb = tx_skb->skb; - - txc.packets++; - txc.bytes += skb->len; - dev_kfree_skb(skb); + u64_stats_update_begin(&tp->tx_stats.syncp); + tp->tx_stats.packets++; + tp->tx_stats.bytes += tx_skb->skb->len; + u64_stats_update_end(&tp->tx_stats.syncp); + dev_kfree_skb(tx_skb->skb); tx_skb->skb = NULL; } dirty_tx++; tx_left--; } - u64_stats_update_begin(&tx_stats->syncp); - tx_stats->packets += txc.packets; - tx_stats->bytes += txc.bytes; - u64_stats_update_end(&tx_stats->syncp); - - netdev_completed_queue(dev, txc.packets, txc.bytes); - if (tp->dirty_tx != dirty_tx) { tp->dirty_tx = dirty_tx; /* Sync with rtl8169_start_xmit:
This reverts commit 036dafa28da1e2565a8529de2ae663c37b7a0060. First it appears in bisection, then reverting it solves the usual netdev watchdog problem for different people. I don't have a proper fix yet so get rid of it. Bisected-and-reported-by: Alex Villacís Lasso <a_villacis@palosanto.com> Signed-off-by: Francois Romieu <romieu@fr.zoreil.com> Cc: Josh Boyer <jwboyer@redhat.com> Cc: Hayes Wang <hayeswang@realtek.com> --- The original 036da... commit has been modified due to the newly introduced skb_tx_timestamp in rtl8169_start_xmit. The herein included patch qualifies for 3.4-stable as well. drivers/net/ethernet/realtek/r8169.c | 27 +++++---------------------- 1 file changed, 5 insertions(+), 22 deletions(-)