diff mbox

Mark IPW2100 as BROKEN: Fatal interrupt. Scheduling firmware restart.

Message ID 20080921202057.GB25052@2ka.mipt.ru
State Not Applicable, archived
Headers show

Commit Message

Evgeniy Polyakov Sept. 21, 2008, 8:20 p.m. UTC
On Sun, Sep 21, 2008 at 12:43:32PM -0700, Arjan van de Ven (arjan@infradead.org) wrote:
> > @@ -3174,6 +3176,10 @@ static void ipw2100_irq_tasklet(struct
> > ipw2100_priv *priv) if (inta & IPW2100_INTA_FATAL_ERROR) {
> >  		printk(KERN_WARNING DRV_NAME
> >  		       ": Fatal interrupt. Scheduling firmware
> > restart.\n");
> > +		WARN_ON(1);
> > +
> > +		BUG_ON(ipw2100_max_fatal_ints-- <= 0);
> 
> BUG_ON in interrupt context is just extremely hostile, since it means
> the box is dead.
>
> also I would suggest using WARN_ON_ONCE() 

Well, I actually wanted to have a bug there because of it, but now I
think that annoying repeated warning is enough to bring attention to the
problem by putting bug information into some magic special place called
kerneloops collection.

Consider for inclusing for the upcoming kernel to get wider
notifications. Yes, it is not a bugfix, I know.

Comments

Arjan van de Ven Sept. 21, 2008, 8:27 p.m. UTC | #1
On Mon, 22 Sep 2008 00:20:57 +0400
Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> On Sun, Sep 21, 2008 at 12:43:32PM -0700, Arjan van de Ven
> (arjan@infradead.org) wrote:
> > > @@ -3174,6 +3176,10 @@ static void ipw2100_irq_tasklet(struct
> > > ipw2100_priv *priv) if (inta & IPW2100_INTA_FATAL_ERROR) {
> > >  		printk(KERN_WARNING DRV_NAME
> > >  		       ": Fatal interrupt. Scheduling firmware
> > > restart.\n");
> > > +		WARN_ON(1);
> > > +
> > > +		BUG_ON(ipw2100_max_fatal_ints-- <= 0);
> > 
> > BUG_ON in interrupt context is just extremely hostile, since it
> > means the box is dead.
> >
> > also I would suggest using WARN_ON_ONCE() 
> 
> Well, I actually wanted to have a bug there because of it, but now I
> think that annoying repeated warning is enough to bring attention to
> the problem by putting bug information into some magic special place
> called kerneloops collection.

are you more interested in bringing attention than finding something
that makes the driver work ? I sort of am getting that impression and
I'd be disappointed if that is the case.


> 
> Consider for inclusing for the upcoming kernel to get wider
> notifications. Yes, it is not a bugfix, I know.

still more complex than needed; a WARN_ON_ONCE() will be enough.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov Sept. 21, 2008, 8:57 p.m. UTC | #2
On Sun, Sep 21, 2008 at 01:27:53PM -0700, Arjan van de Ven (arjan@infradead.org) wrote:
> > Well, I actually wanted to have a bug there because of it, but now I
> > think that annoying repeated warning is enough to bring attention to
> > the problem by putting bug information into some magic special place
> > called kerneloops collection.
> 
> are you more interested in bringing attention than finding something
> that makes the driver work ? I sort of am getting that impression and
> I'd be disappointed if that is the case.
 
I do think that it can not be fixed without serious intervention of the
Intel (hardware) folks, since bug exists more than 4 years in two
firmwares and lots of very different driver versions and was reproduced
even on 2.4 kernel.

I will experiment with reloading issues as Alan suggested and to
add/remove more surgery into initialization process to be allowed to
'workaround' the issue, since it looks noone else will.

But that's definitely not a fix and in my personal workaround's 10
degrees shit'o'meter this lies around 12.
 
> > Consider for inclusing for the upcoming kernel to get wider
> > notifications. Yes, it is not a bugfix, I know.
> 
> still more complex than needed; a WARN_ON_ONCE() will be enough.

That allows to dump whatever number of warnings you want. The more we
have, the louder will be customers scream.
Arjan van de Ven Sept. 21, 2008, 9:02 p.m. UTC | #3
On Mon, 22 Sep 2008 00:57:06 +0400
Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> > > Consider for inclusing for the upcoming kernel to get wider
> > > notifications. Yes, it is not a bugfix, I know.
> > 
> > still more complex than needed; a WARN_ON_ONCE() will be enough.
> 
> That allows to dump whatever number of warnings you want. The more we
> have, the louder will be customers scream.

artificially increasing numbers isn't going to do that; it just shows
you're more interested in making a stink than in getting something
improved ;(

>
Evgeniy Polyakov Sept. 21, 2008, 9:05 p.m. UTC | #4
On Sun, Sep 21, 2008 at 02:02:03PM -0700, Arjan van de Ven (arjan@infradead.org) wrote:
> > That allows to dump whatever number of warnings you want. The more we
> > have, the louder will be customers scream.
> 
> artificially increasing numbers isn't going to do that; it just shows
> you're more interested in making a stink than in getting something
> improved ;(

As practice shows, I'm the only one who is interested in getting
something improved, and Intel, as we see right now, is not interested in
it at all, since you ask me not only decrease error verbosity, but also
do not work towards fixing the bug by trying to understand where it
lives.
Arjan van de Ven Sept. 21, 2008, 9:14 p.m. UTC | #5
On Mon, 22 Sep 2008 01:05:55 +0400
Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote:

> On Sun, Sep 21, 2008 at 02:02:03PM -0700, Arjan van de Ven
> (arjan@infradead.org) wrote:
> > > That allows to dump whatever number of warnings you want. The
> > > more we have, the louder will be customers scream.
> > 
> > artificially increasing numbers isn't going to do that; it just
> > shows you're more interested in making a stink than in getting
> > something improved ;(
> 
> As practice shows, I'm the only one who is interested in getting
> something improved, and Intel, as we see right now, is not interested
> in it at all, since you ask me not only decrease error verbosity, but
> also do not work towards fixing the bug by trying to understand where
> it lives.

I did no such thing and you know it.

I'm sorry, I'm not going to waste time on this if you keep acting
this dishonest; welcome to my mail filter...
Denys Fedoryshchenko Sept. 21, 2008, 9:43 p.m. UTC | #6
On Monday 22 September 2008, Evgeniy Polyakov wrote:
> On Sun, Sep 21, 2008 at 02:02:03PM -0700, Arjan van de Ven 
(arjan@infradead.org) wrote:
> > > That allows to dump whatever number of warnings you want. The more we
> > > have, the louder will be customers scream.
> >
> > artificially increasing numbers isn't going to do that; it just shows
> > you're more interested in making a stink than in getting something
> > improved ;(
>
> As practice shows, I'm the only one who is interested in getting
> something improved, and Intel, as we see right now, is not interested in
> it at all, since you ask me not only decrease error verbosity, but also
> do not work towards fixing the bug by trying to understand where it
> lives.

You are not right. I had totaly disfunctional Intel driver on two laptops and 
reported about issue to Intel. Yes it took time, they took all debugs and 
went coma mode (i was thinking that), but suddently i got mail from them, and 
next kernel/firmware release worked for me flawlessly. So they did perfect 
job.

Don't be negative and prepare yourself for giving long debug outputs. Patience 
and only patience.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov Sept. 21, 2008, 10:07 p.m. UTC | #7
On Mon, Sep 22, 2008 at 12:43:07AM +0300, Denys Fedoryshchenko (denys@visp.net.lb) wrote:
> You are not right. I had totaly disfunctional Intel driver on two laptops and 
> reported about issue to Intel. Yes it took time, they took all debugs and 
> went coma mode (i was thinking that), but suddently i got mail from them, and 
> next kernel/firmware release worked for me flawlessly. So they did perfect 
> job.

I'm talking about current situation.

> Don't be negative and prepare yourself for giving long debug outputs. Patience 
> and only patience.

Just to be clear, did it take 4 years? :)

Anyway, I already made conclusions, as probably others: I will
experiment with different 'workarounds' for this bug, maybe I will
succeed, maybe Intel will decided to fix it, maybe LHC will crash the
world. Verbose warning about the bug was frowned upon, so its up to uses
to make a progress here...
Denys Fedoryshchenko Sept. 21, 2008, 10:15 p.m. UTC | #8
> Just to be clear, did it take 4 years? :)
Any bugzilla entry? I cannot find on 
http://www.intellinuxwireless.org/bugzilla/ anything about this bug.

I submit two reports in my case, one in kernel bugzilla, one in intel linux 
wireless project..
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Cox Sept. 21, 2008, 10:38 p.m. UTC | #9
> > still more complex than needed; a WARN_ON_ONCE() will be enough.
> 
> That allows to dump whatever number of warnings you want. The more we
> have, the louder will be customers scream.

But if Intel don't care then you can scream all you like 8)

A WARN_ON_ONCE is sufficient to capture an idea of how many people it is
effecting and maybe to figure out what the trigger is from their reports,
at that point there is some chance to get it fixed (especially if its
remotely triggerable ;))

Alan
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marcel Holtmann Sept. 21, 2008, 11:27 p.m. UTC | #10
Hi Evgeniy,

> > > That allows to dump whatever number of warnings you want. The more we
> > > have, the louder will be customers scream.
> > 
> > artificially increasing numbers isn't going to do that; it just shows
> > you're more interested in making a stink than in getting something
> > improved ;(
> 
> As practice shows, I'm the only one who is interested in getting
> something improved, and Intel, as we see right now, is not interested in
> it at all, since you ask me not only decrease error verbosity, but also
> do not work towards fixing the bug by trying to understand where it
> lives.

as Arjan and Alan pointed out already, WARN_ON_ONCE is enough and I
agree with them. Just to make this perfectly clear, this is with my
community hat on.

Please send a proper patch with a simple WARN_ON_ONCE and I am happy to
sign off on it.

Regards

Marcel


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov Sept. 21, 2008, 11:44 p.m. UTC | #11
On Sun, Sep 21, 2008 at 11:38:19PM +0100, Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
> > > still more complex than needed; a WARN_ON_ONCE() will be enough.
> > 
> > That allows to dump whatever number of warnings you want. The more we
> > have, the louder will be customers scream.
> 
> But if Intel don't care then you can scream all you like 8)

That's what happens :)

> A WARN_ON_ONCE is sufficient to capture an idea of how many people it is
> effecting and maybe to figure out what the trigger is from their reports,
> at that point there is some chance to get it fixed (especially if its
> remotely triggerable ;))

Well, redhat, suse and ubuntu bugzillas happend to be not enough. Why do
you believe a single warning at a new place will be? or couple of tens
or whatever else? If it cares, it cares. If it does not...

I attracted vendor's attention, vendor told me to fix it myself and to
create a patch to fill an entry in another 'bugzilla', so that vendor
could get results and probably decide to walk down from the cloud and
fix it.

So, if they do not care, I do not care about their care. That's the
deal. I will try to find a workaround, even if it is a real crap,
fortunately other users will not strike this bug too frequently.
Evgeniy Polyakov Sept. 21, 2008, 11:46 p.m. UTC | #12
On Mon, Sep 22, 2008 at 01:15:59AM +0300, Denys Fedoryshchenko (denys@visp.net.lb) wrote:
> > Just to be clear, did it take 4 years? :)
> Any bugzilla entry? I cannot find on 
> http://www.intellinuxwireless.org/bugzilla/ anything about this bug.
> 
> I submit two reports in my case, one in kernel bugzilla, one in intel linux 
> wireless project..

Lucky you :)
David Miller Sept. 21, 2008, 11:48 p.m. UTC | #13
Evgeniy, you're bordering on being an asshole, if not actually
being one.

If you behaved this way for a bug I was responsible for, I would
absolutely ignore you until you settled down and started to behave
more reasonably.

You're acting like a bomb which is about to explode, which is probably
why the actual Intel maintainers for this driver don't want to touch
you with a ten foot pole.  You're being volatile and extremely
unpleasant to interact with about this issue.

The Intel folks replying to you right now are general Intel linux
folks who are trying to help you, not the driver maintainers who can
look into the firmware and attack that angle.  So give them A FUCKING
BREAK!

Getting the OOPS to kerneloops.org is the way forward and will help
your cause, whether you believe it or not.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Evgeniy Polyakov Sept. 22, 2008, 12:18 a.m. UTC | #14
On Sun, Sep 21, 2008 at 04:48:29PM -0700, David Miller (davem@davemloft.net) wrote:
> Evgeniy, you're bordering on being an asshole, if not actually
> being one.

Out of curiosity, what's worse: being an asshole and pretend to be good
or vice versa? Whatever... :)

> If you behaved this way for a bug I was responsible for, I would
> absolutely ignore you until you settled down and started to behave
> more reasonably.

That's the main point: 'until you started to behave more reasonably'.
For example filling another bug in rh/suse/ubuntu bugzilla?
Put yourself to the user's place, and suddenly picture changes
dramatically.

We got some progress on this bug, at least there is direct suggestion
from Matthew about power state, if it will fix the issue, I think it is
a good deal: one bug fix for lot of users for the mail in the killfile
and a worsened 'reputation'.

I provded a patch like Arjan wanted, and it can only change something
because of all this talks I started being an asshole. In my opinion.
Maybe there were some other ways around, but it looks like being a
provocative is the only way to get to the cloud. Who knows :)
Bill Davidsen Sept. 22, 2008, 2:22 p.m. UTC | #15
Evgeniy Polyakov wrote:
> On Sun, Sep 21, 2008 at 11:38:19PM +0100, Alan Cox (alan@lxorguk.ukuu.org.uk) wrote:
>>>> still more complex than needed; a WARN_ON_ONCE() will be enough.
>>> That allows to dump whatever number of warnings you want. The more we
>>> have, the louder will be customers scream.
>> But if Intel don't care then you can scream all you like 8)
> 
> That's what happens :)
> 
>> A WARN_ON_ONCE is sufficient to capture an idea of how many people it is
>> effecting and maybe to figure out what the trigger is from their reports,
>> at that point there is some chance to get it fixed (especially if its
>> remotely triggerable ;))
> 
> Well, redhat, suse and ubuntu bugzillas happend to be not enough. Why do
> you believe a single warning at a new place will be? or couple of tens
> or whatever else? If it cares, it cares. If it does not...
> 
Has it occurred to you that YOU have a problem on YOUR maschine, and that your 
patch would kill wireless for all the people who have the hardware on working 
systems? My experience was somewhat like Denys' except I got no notice, I just 
found that after an update the wireless worked solidly, and continued to do so 
until that laptop because obsolete and slow, and went to live with one of the 
teens in my family.

> I attracted vendor's attention, vendor told me to fix it myself and to
> create a patch to fill an entry in another 'bugzilla', so that vendor
> could get results and probably decide to walk down from the cloud and
> fix it.
> 
It would be good to gather data rather than claim it doesn't work, because for 
some set of machines it certainly does.

> So, if they do not care, I do not care about their care. That's the
> deal. I will try to find a workaround, even if it is a real crap,
> fortunately other users will not strike this bug too frequently.
> 
My experience with laptops has been that you fiddle with power saving, and more, 
and more, until you find the tricks which make the laptop save power by 
disabling something you need, like network or display. At least that's been both 
my practice and observation, that not every machine responds well to every power 
saving trick.

Have you checked for a BIOS update for the machine? Tried disabling all power 
saving settings and seeing if that changes the problem? I would normally assume 
you have, but you seem convinced that the bug is in the firmware and you're 
going to get it fixed. It may be a firmware bug, but if something in your system 
is triggering it, and most people don't have the problem, you might investigate 
a solution other than beating on Intel.
diff mbox

Patch

diff --git a/drivers/net/wireless/ipw2100.c b/drivers/net/wireless/ipw2100.c
index 19a401c..6599211 100644
--- a/drivers/net/wireless/ipw2100.c
+++ b/drivers/net/wireless/ipw2100.c
@@ -206,6 +206,9 @@  MODULE_PARM_DESC(disable, "manually disable the radio (default 0 [radio on])");
 
 static u32 ipw2100_debug_level = IPW_DL_NONE;
 
+static int ipw2100_max_fatal_ints = 10;
+module_param(ipw2100_max_fatal_ints, int, 0644);
+
 #ifdef CONFIG_IPW2100_DEBUG
 #define IPW_DEBUG(level, message...) \
 do { \
@@ -3174,6 +3177,9 @@  static void ipw2100_irq_tasklet(struct ipw2100_priv *priv)
 	if (inta & IPW2100_INTA_FATAL_ERROR) {
 		printk(KERN_WARNING DRV_NAME
 		       ": Fatal interrupt. Scheduling firmware restart.\n");
+
+		WARN_ON(ipw2100_max_fatal_ints-- >= 0);
+
 		priv->inta_other++;
 		write_register(dev, IPW_REG_INTA, IPW2100_INTA_FATAL_ERROR);