diff mbox

Mark IPW2100 as BROKEN: Fatal interrupt. Scheduling firmware restart.

Message ID 20080921193809.GA8735@2ka.mipt.ru
State Not Applicable, archived
Headers show

Commit Message

Evgeniy Polyakov Sept. 21, 2008, 7:38 p.m. UTC
Hi.

On Sun, Sep 21, 2008 at 09:14:04PM +0200, Johannes Berg (johannes@sipsolutions.net) wrote:
> > Do you want me to implement ipw2100 driver as a big work structure
> > which will run ipw2100_init()/wait/ipw2100_exit() in a loop?
> > And that will be the fix suggested by Intel? That would explain a lot.
> 
> I think what Arjan is saying is that it would be better to put pressure
> on the responsible folks (I don't think Arjan is anywhere near them at

Both maintainers were added to the copy list.

> all) if you'd put in a WARN_ON() for this error and that would make the
> top entry on kerneloops.org all the time... And additionally put in a
> workaround for yourself for now.

As I pointed, I can rewrite the whole driver's initialization process,
so that it looked like init/wait/exit loop, which can be processed at
the module load and when fatal interrupt fires. Do this a fix? This is
not even a remotely workaround. We can just add
rmmod/modprobe/ifdown/ifup to the crontab job. Another users reported in
bugzilla that they needed to reboot a machine to make card working
again. I'm not sure that user tried to do a rmmod/modprobe though.

> And can we keep the flames off this list please? That comment from Wei
> Weng was absolutely uncalled for, and inciting a flamewar (as you have
> already blogged) was not really productive either.

If we will keep silence, no one will notice that problem exists.

I do hope this will result in a progress. Arjan, do you aggree to add
this patch to the current tree?

Comments

Cyrill Gorcunov Sept. 21, 2008, 8:05 p.m. UTC | #1
[Evgeniy Polyakov - Sun, Sep 21, 2008 at 11:38:09PM +0400]
| Hi.
| 
| On Sun, Sep 21, 2008 at 09:14:04PM +0200, Johannes Berg (johannes@sipsolutions.net) wrote:
| > > Do you want me to implement ipw2100 driver as a big work structure
| > > which will run ipw2100_init()/wait/ipw2100_exit() in a loop?
| > > And that will be the fix suggested by Intel? That would explain a lot.
| > 
| > I think what Arjan is saying is that it would be better to put pressure
| > on the responsible folks (I don't think Arjan is anywhere near them at
| 
| Both maintainers were added to the copy list.
| 
| > all) if you'd put in a WARN_ON() for this error and that would make the
| > top entry on kerneloops.org all the time... And additionally put in a
| > workaround for yourself for now.
| 
| As I pointed, I can rewrite the whole driver's initialization process,
| so that it looked like init/wait/exit loop, which can be processed at
| the module load and when fatal interrupt fires. Do this a fix? This is
| not even a remotely workaround. We can just add
| rmmod/modprobe/ifdown/ifup to the crontab job. Another users reported in
| bugzilla that they needed to reboot a machine to make card working
| again. I'm not sure that user tried to do a rmmod/modprobe though.
| 
| > And can we keep the flames off this list please? That comment from Wei
| > Weng was absolutely uncalled for, and inciting a flamewar (as you have
| > already blogged) was not really productive either.
| 
| If we will keep silence, no one will notice that problem exists.
| 
| I do hope this will result in a progress. Arjan, do you aggree to add
| this patch to the current tree?
| 
| diff --git a/drivers/net/wireless/ipw2100.c b/drivers/net/wireless/ipw2100.c
| index 19a401c..9a7b64c 100644
| --- a/drivers/net/wireless/ipw2100.c
| +++ b/drivers/net/wireless/ipw2100.c
| @@ -206,6 +206,8 @@ MODULE_PARM_DESC(disable, "manually disable the radio (default 0 [radio on])");
|  
|  static u32 ipw2100_debug_level = IPW_DL_NONE;
|  
| +static int ipw2100_max_fatal_ints = 10;
| +
|  #ifdef CONFIG_IPW2100_DEBUG
|  #define IPW_DEBUG(level, message...) \
|  do { \
| @@ -3174,6 +3176,10 @@ static void ipw2100_irq_tasklet(struct ipw2100_priv *priv)
|  	if (inta & IPW2100_INTA_FATAL_ERROR) {
|  		printk(KERN_WARNING DRV_NAME
|  		       ": Fatal interrupt. Scheduling firmware restart.\n");
| +		WARN_ON(1);
| +
| +		BUG_ON(ipw2100_max_fatal_ints-- <= 0);
| +
|  		priv->inta_other++;
|  		write_register(dev, IPW_REG_INTA, IPW2100_INTA_FATAL_ERROR);
|  
| 
| 
| -- 
| 	Evgeniy Polyakov
| 

Since it's that serious maybe we should change

		IPW_DEBUG_INFO("%s: Fatal error value: 0x%08X\n",
			       priv->net_dev->name, priv->fatal_error);

to printk(KERN_WARN)? And here is why - as I see now we can't say what
exactly is wrong - Evgeniy said he has a suspicious about firmware so
this WARNS will be collected by Arjan thru kerneloops and we could not
ask users to change debug level and repost problem - oops will have it
by default - and if it really firmware problem - firmware engineers could
find this _additional_ info usefull and resolve it (probably).

		- Cyrill -
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov Sept. 21, 2008, 8:26 p.m. UTC | #2
On Mon, Sep 22, 2008 at 12:05:18AM +0400, Cyrill Gorcunov (gorcunov@gmail.com) wrote:
> Since it's that serious maybe we should change
> 
> 		IPW_DEBUG_INFO("%s: Fatal error value: 0x%08X\n",
> 			       priv->net_dev->name, priv->fatal_error);
> 
> to printk(KERN_WARN)? And here is why - as I see now we can't say what
> exactly is wrong - Evgeniy said he has a suspicious about firmware so
> this WARNS will be collected by Arjan thru kerneloops and we could not
> ask users to change debug level and repost problem - oops will have it
> by default - and if it really firmware problem - firmware engineers could
> find this _additional_ info usefull and resolve it (probably).

The only reason for this change is to make a mark at the kerneloops.
I.e. users know, there is a bug. Developers know, there is a bug.
Everyone knows that there is a bug, but until it is at the special place
we look to each other just like there is no bug.

Here are dumps for example:
http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=245

Bug existed even with 1.2 firmware and .11 kernel.
Intel, that's a great marketing slogan: stability everywhere!
Cyrill Gorcunov Sept. 21, 2008, 8:35 p.m. UTC | #3
[Evgeniy Polyakov - Mon, Sep 22, 2008 at 12:26:56AM +0400]
| On Mon, Sep 22, 2008 at 12:05:18AM +0400, Cyrill Gorcunov (gorcunov@gmail.com) wrote:
| > Since it's that serious maybe we should change
| > 
| > 		IPW_DEBUG_INFO("%s: Fatal error value: 0x%08X\n",
| > 			       priv->net_dev->name, priv->fatal_error);
| > 
| > to printk(KERN_WARN)? And here is why - as I see now we can't say what
| > exactly is wrong - Evgeniy said he has a suspicious about firmware so
| > this WARNS will be collected by Arjan thru kerneloops and we could not
| > ask users to change debug level and repost problem - oops will have it
| > by default - and if it really firmware problem - firmware engineers could
| > find this _additional_ info usefull and resolve it (probably).
| 
| The only reason for this change is to make a mark at the kerneloops.
| I.e. users know, there is a bug. Developers know, there is a bug.
| Everyone knows that there is a bug, but until it is at the special place
| we look to each other just like there is no bug.
| 
| Here are dumps for example:
| http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=245
| 
| Bug existed even with 1.2 firmware and .11 kernel.
| Intel, that's a great marketing slogan: stability everywhere!
| 
| -- 
| 	Evgeniy Polyakov
| 

From dump:
> Sep 25 11:31:39 suino wlan0: TX timed out.  Scheduling firmware restart.

yes Evgeniy - all could know that but this register info could help
firmware engineers to distinguish problems (without additional efforts
like ask users to pass debug argument - kerneloops will have it
by default) if there not only one exist. I mean I don't think anyone
would reject additional info about problem ever :)

		- Cyrill -
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/net/wireless/ipw2100.c b/drivers/net/wireless/ipw2100.c
index 19a401c..9a7b64c 100644
--- a/drivers/net/wireless/ipw2100.c
+++ b/drivers/net/wireless/ipw2100.c
@@ -206,6 +206,8 @@  MODULE_PARM_DESC(disable, "manually disable the radio (default 0 [radio on])");
 
 static u32 ipw2100_debug_level = IPW_DL_NONE;
 
+static int ipw2100_max_fatal_ints = 10;
+
 #ifdef CONFIG_IPW2100_DEBUG
 #define IPW_DEBUG(level, message...) \
 do { \
@@ -3174,6 +3176,10 @@  static void ipw2100_irq_tasklet(struct ipw2100_priv *priv)
 	if (inta & IPW2100_INTA_FATAL_ERROR) {
 		printk(KERN_WARNING DRV_NAME
 		       ": Fatal interrupt. Scheduling firmware restart.\n");
+		WARN_ON(1);
+
+		BUG_ON(ipw2100_max_fatal_ints-- <= 0);
+
 		priv->inta_other++;
 		write_register(dev, IPW_REG_INTA, IPW2100_INTA_FATAL_ERROR);