Message ID | 4B4A729E.9060805@gmail.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
Headers | show |
On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: > Rafael J. Wysocki wrote: > > This message has been generated automatically as a part of a report > > of recent regressions. > > > > The following bug entry is on the current list of known regressions > > from 2.6.32. Please verify if it still should be listed and let me know > > (either way). BTW, I don't know why Berck didn't experience such a panic before 2.6.32, but seems not a regression to me. There might be new/more sky2 TX timeouts which trigger this panic and would make a real regression. Cheers, Jarek P. > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14925 > > Subject : sky2 panic under load > > Submitter : Berck E. Nash <flyboy@gmail.com> > > Date : 2009-12-21 23:52 (21 days old) > > References : http://marc.info/?l=linux-kernel&m=126143955730347&w=4 > > http://marc.info/?l=linux-kernel&m=126160893126548&w=4 > > Handled-By : Stephen Hemminger <shemminger@vyatta.com> > > The patch attached to the bug report did not fix the problem, but I'm > fairly certain that this one from Jarek P. did: > > During TX timeout procedure dev could be awoken too early, e.g. by > sky2_complete_tx() called from sky2_down(). Then sky2_xmit_frame() > can run while buffers are freed causing an oops. This patch fixes it > by adding netif_device_present() test in sky2_tx_complete(). > > Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=14925 > > With debugging by: Mike McCormack <mikem@ring3k.org> > > Reported-by: Berck E. Nash <flyboy@gmail.com> > Tested-by: Berck E. Nash <flyboy@gmail.com> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > --- > > drivers/net/sky2.c | 3 ++- > 1 files changed, 2 insertions(+), 1 deletions(-) > > diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c > index 1c01b96..7650f73 100644 > --- a/drivers/net/sky2.c > +++ b/drivers/net/sky2.c > @@ -1844,7 +1844,8 @@ static void sky2_tx_complete(struct sky2_port > *sky2, u16 done) > sky2->tx_cons = idx; > smp_mb(); > > - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) > + /* Wake unless it's detached, and called e.g. from sky2_down() */ > + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev)) > netif_wake_queue(dev); > } > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Monday 11 January 2010, Jarek Poplawski wrote: > On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: > > Rafael J. Wysocki wrote: > > > This message has been generated automatically as a part of a report > > > of recent regressions. > > > > > > The following bug entry is on the current list of known regressions > > > from 2.6.32. Please verify if it still should be listed and let me know > > > (either way). > > BTW, I don't know why Berck didn't experience such a panic before > 2.6.32, but seems not a regression to me. There might be new/more sky2 > TX timeouts which trigger this panic and would make a real regression. Even if the code has always been broken, but it's only become visible after 2.6.32, that still counts as a regression IMO, because now the users are affected who weren't before. Rafael -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 11, 2010 at 08:32:24PM +0100, Rafael J. Wysocki wrote: > On Monday 11 January 2010, Jarek Poplawski wrote: > > On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: > > > Rafael J. Wysocki wrote: > > > > This message has been generated automatically as a part of a report > > > > of recent regressions. > > > > > > > > The following bug entry is on the current list of known regressions > > > > from 2.6.32. Please verify if it still should be listed and let me know > > > > (either way). > > > > BTW, I don't know why Berck didn't experience such a panic before > > 2.6.32, but seems not a regression to me. There might be new/more sky2 > > TX timeouts which trigger this panic and would make a real regression. > > Even if the code has always been broken, but it's only become visible after > 2.6.32, that still counts as a regression IMO, because now the users are > affected who weren't before. Right, but: 1) someone with a similar but older problem might be mislead a fix is not for them; 2) someone with exactly this one problem (i.e. Berck ;-) might be mislead "no oops" is enough, while their linux might be still worse than before. (So I intended Berck to re-consider or even re-check this problem wrt. 2.6.31, and maybe even reporting another regression.) Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Monday 11 January 2010, Jarek Poplawski wrote: > On Mon, Jan 11, 2010 at 08:32:24PM +0100, Rafael J. Wysocki wrote: > > On Monday 11 January 2010, Jarek Poplawski wrote: > > > On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: > > > > Rafael J. Wysocki wrote: > > > > > This message has been generated automatically as a part of a report > > > > > of recent regressions. > > > > > > > > > > The following bug entry is on the current list of known regressions > > > > > from 2.6.32. Please verify if it still should be listed and let me know > > > > > (either way). > > > > > > BTW, I don't know why Berck didn't experience such a panic before > > > 2.6.32, but seems not a regression to me. There might be new/more sky2 > > > TX timeouts which trigger this panic and would make a real regression. > > > > Even if the code has always been broken, but it's only become visible after > > 2.6.32, that still counts as a regression IMO, because now the users are > > affected who weren't before. > > Right, but: > 1) someone with a similar but older problem might be mislead a fix is > not for them; Not if the fix changelog mentions the older kernels explicitly. > 2) someone with exactly this one problem (i.e. Berck ;-) might be > mislead "no oops" is enough, while their linux might be still worse > than before. (So I intended Berck to re-consider or even re-check > this problem wrt. 2.6.31, and maybe even reporting another > regression.) Yes, rechecking the problem with 2.6.31 would be useful. Rafael -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Jarek Poplawski wrote: > On Mon, Jan 11, 2010 at 08:32:24PM +0100, Rafael J. Wysocki wrote: >> On Monday 11 January 2010, Jarek Poplawski wrote: >>> On Sun, Jan 10, 2010 at 05:36:46PM -0700, Berck E. Nash wrote: >>>> Rafael J. Wysocki wrote: >>>>> This message has been generated automatically as a part of a report >>>>> of recent regressions. >>>>> >>>>> The following bug entry is on the current list of known regressions >>>>> from 2.6.32. Please verify if it still should be listed and let me know >>>>> (either way). >>> BTW, I don't know why Berck didn't experience such a panic before >>> 2.6.32, but seems not a regression to me. There might be new/more sky2 >>> TX timeouts which trigger this panic and would make a real regression. >> Even if the code has always been broken, but it's only become visible after >> 2.6.32, that still counts as a regression IMO, because now the users are >> affected who weren't before. > > Right, but: > 1) someone with a similar but older problem might be mislead a fix is > not for them; > 2) someone with exactly this one problem (i.e. Berck ;-) might be > mislead "no oops" is enough, while their linux might be still worse > than before. (So I intended Berck to re-consider or even re-check > this problem wrt. 2.6.31, and maybe even reporting another > regression.) Well, the problem with this bug is how hard it is for me to reproduce. I'm willing to admit that just because I never got it before 2.6.32 isn't proof that it wasn't there in 2.6.31. But it's a regression somewhere along the line, since I've been using this hardware for over 3 years now. There were lots of bugs in the sky2 driver years ago, but for the last 2+ years or so, I haven't had any trouble at all until now. The bug only shows up for me with bittorrent traffic. I also use the same adapter to transfer backups over the network from several computers, and that doesn't trigger it... I used 2.6.31 for however long it was the current stable, and I never got a crash with it. After I got several crashes in 2.6.32, I reverted to 2.6.31 until Jarek sent this patch that seems to have fixed it. I've never gotten it to crash in 2.6.31, so I'm pretty sure it's a 2.6.32 regression, but I can't prove it. I would love to do more testing, but since I can't reproduce the bug at will, I'm not really sure what to offer? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 11, 2010 at 02:02:40PM -0700, Berck E. Nash wrote: > Jarek Poplawski wrote: > > 2) someone with exactly this one problem (i.e. Berck ;-) might be > > mislead "no oops" is enough, while their linux might be still worse > > than before. (So I intended Berck to re-consider or even re-check > > this problem wrt. 2.6.31, and maybe even reporting another > > regression.) > > Well, the problem with this bug is how hard it is for me to reproduce. > I'm willing to admit that just because I never got it before 2.6.32 > isn't proof that it wasn't there in 2.6.31. But it's a regression > somewhere along the line, since I've been using this hardware for over 3 > years now. There were lots of bugs in the sky2 driver years ago, but > for the last 2+ years or so, I haven't had any trouble at all until now. > > The bug only shows up for me with bittorrent traffic. I also use the > same adapter to transfer backups over the network from several > computers, and that doesn't trigger it... > > I used 2.6.31 for however long it was the current stable, and I never > got a crash with it. After I got several crashes in 2.6.32, I reverted > to 2.6.31 until Jarek sent this patch that seems to have fixed it. I've > never gotten it to crash in 2.6.31, so I'm pretty sure it's a 2.6.32 > regression, but I can't prove it. > > I would love to do more testing, but since I can't reproduce the bug at > will, I'm not really sure what to offer? I mainly thought about looking at the logs during such bittorrent to compare if there is any visible change in "sky2 eth0: disabling interface"/"sky2 eth0: enabling interface" etc. frequency between 2.6.31 and 2.6.32. Jarek P. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c index 1c01b96..7650f73 100644 --- a/drivers/net/sky2.c +++ b/drivers/net/sky2.c @@ -1844,7 +1844,8 @@ static void sky2_tx_complete(struct sky2_port *sky2, u16 done) sky2->tx_cons = idx; smp_mb(); - if (tx_avail(sky2) > MAX_SKB_TX_LE + 4) + /* Wake unless it's detached, and called e.g. from sky2_down() */ + if (tx_avail(sky2) > MAX_SKB_TX_LE + 4 && netif_device_present(dev)) netif_wake_queue(dev); }