Message ID | 20080921172316.GA6306@2ka.mipt.ru |
---|---|
State | Not Applicable, archived |
Headers | show |
On Sunday 21 September 2008 19:23:17 Evgeniy Polyakov wrote: > Hi. > > Following bug exists in the ipw2100 driver/firmware for years and Intel > folks never responded to zillions bugzilla entries and forum notices in > the internet with some patch or firmware update (although did request > dmesg and debug info, and received them). > > ipw2100: Fatal interrupt. Scheduling firmware restart. > > I believe it is a firmware bug because after driver is unloaded and > loaded back again wireless adapter usually starts working (for small > amount of time though). My conspiracy feeling can suggest, that it may > be kind of a force to buy a new one, or trivial error in the firmware, > when it writes to the same place in the flash and essentially given cell > became dead or whatever else. > > Intel folks, please fix this problem, I see no other way to force you to do > this than to mark ipw2100 driver as broken, since that is what it is. You are pretty funny, actually. :) I think the bug should be fixed, but what makes _you_ think you can _force_ anybody to do so? > Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> > > diff --git a/drivers/net/wireless/Kconfig b/drivers/net/wireless/Kconfig > index 9931b5a..c24fc6a 100644 > --- a/drivers/net/wireless/Kconfig > +++ b/drivers/net/wireless/Kconfig > @@ -125,7 +125,7 @@ config PCMCIA_RAYCS > > config IPW2100 > tristate "Intel PRO/Wireless 2100 Network Connection" > - depends on PCI && WLAN_80211 > + depends on PCI && WLAN_80211 && BROKEN > select WIRELESS_EXT > select FW_LOADER > select IEEE80211 >
On Sun, Sep 21, 2008 at 07:36:00PM +0200, Michael Buesch (mb@bu3sch.de) wrote: > > Intel folks, please fix this problem, I see no other way to force you to do > > this than to mark ipw2100 driver as broken, since that is what it is. > > You are pretty funny, actually. :) > > I think the bug should be fixed, but what makes _you_ think you can _force_ > anybody to do so? Maybe because I bought that adapter and it stopped working and Intel knows about this bug and does not fix it for years?
On Sun, 21 Sep 2008 21:23:17 +0400 Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote: > Hi. > > Following bug exists in the ipw2100 driver/firmware for years and > Intel folks never responded to zillions bugzilla entries and forum > notices in the internet with some patch or firmware update (although > did request dmesg and debug info, and received them). > > ipw2100: Fatal interrupt. Scheduling firmware restart. > > I believe it is a firmware bug because after driver is unloaded and > loaded back again wireless adapter usually starts working (for small > amount of time though). \ > diff --git a/drivers/net/wireless/Kconfig > b/drivers/net/wireless/Kconfig index 9931b5a..c24fc6a 100644 > --- a/drivers/net/wireless/Kconfig > +++ b/drivers/net/wireless/Kconfig > @@ -125,7 +125,7 @@ config PCMCIA_RAYCS > > config IPW2100 > tristate "Intel PRO/Wireless 2100 Network Connection" > - depends on PCI && WLAN_80211 > + depends on PCI && WLAN_80211 && BROKEN > select WIRELESS_EXT > select FW_LOADER > select IEEE80211 > so now you go from an occasional burp to having nothing at all. How about you run with this patch on your own machine only? or.. since you say a reload of the driver fixes it.. why don't you make a patch for the driver that does basically the actions of a reload automatically when the driver detects the issue? (and stick a WARN_ON in for good measure so that kerneloops.org can start tracking these burps)
On Sun, Sep 21, 2008 at 11:04:22AM -0700, Arjan van de Ven (arjan@infradead.org) wrote: > so now you go from an occasional burp to having nothing at all. > How about you run with this patch on your own machine only? And how else user can get attention to the problem which is not fixed by the vendor? We close our eyes and there is no problem, since we do not see it. I just brought a lamp: no user can see that essentially driver is broken so he can run it on his own risk. > or.. since you say a reload of the driver fixes it.. why don't you make > a patch for the driver that does basically the actions of a reload > automatically when the driver detects the issue? > (and stick a WARN_ON in for good measure so that kerneloops.org can > start tracking these burps) It stops after several seconds (or packets?). Sometimes (but rarely) it works several minutes, sometimes it fires above dmesg line and continues to work, sometimes it fires it for a while and then stops writing it, although driver does not send or receive anything (at least ifconfig counters do not change). Actually, I do not think it is a driver problem, since what it does is pretty much straightforward, but if you will tell me how else can we fix this issue, I will print it and glue near the window so this gotcha could be used with other problems. Or you can (as everyone else who do) just said that this is damn wrong and forget about problem for the next several years.
On Sun, 21 Sep 2008 22:28:38 +0400 Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote: > On Sun, Sep 21, 2008 at 11:04:22AM -0700, Arjan van de Ven > (arjan@infradead.org) wrote: > > so now you go from an occasional burp to having nothing at all. > > How about you run with this patch on your own machine only? > > And how else user can get attention to the problem which is not fixed > by the vendor? ... or the community > We close our eyes and there is no problem, since we do > not see it. I just brought a lamp: no user can see that essentially > driver is broken so he can run it on his own risk. > > > or.. since you say a reload of the driver fixes it.. why don't you > > make a patch for the driver that does basically the actions of a > > reload automatically when the driver detects the issue? > > (and stick a WARN_ON in for good measure so that kerneloops.org can > > start tracking these burps) > > It stops after several seconds (or packets?). Sometimes (but rarely) > it works several minutes, sometimes it fires above dmesg line and > continues to work, sometimes it fires it for a while and then stops > writing it, although driver does not send or receive anything (at > least ifconfig counters do not change). again.. so how about you detect this condition and do, in the driver code, the equivalent of rmmod/insmod to the hardware. I'm sure people who have the hardware would appreciate that type of patch a lot more than the one you sent out.
Arjan van de Ven wrote: > On Sun, 21 Sep 2008 22:28:38 +0400 > Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote: > >> On Sun, Sep 21, 2008 at 11:04:22AM -0700, Arjan van de Ven >> (arjan@infradead.org) wrote: >>> so now you go from an occasional burp to having nothing at all. >>> How about you run with this patch on your own machine only? >> And how else user can get attention to the problem which is not fixed >> by the vendor? > > ... or the community > >> We close our eyes and there is no problem, since we do >> not see it. I just brought a lamp: no user can see that essentially >> driver is broken so he can run it on his own risk. >> >>> or.. since you say a reload of the driver fixes it.. why don't you >>> make a patch for the driver that does basically the actions of a >>> reload automatically when the driver detects the issue? >>> (and stick a WARN_ON in for good measure so that kerneloops.org can >>> start tracking these burps) >> It stops after several seconds (or packets?). Sometimes (but rarely) >> it works several minutes, sometimes it fires above dmesg line and >> continues to work, sometimes it fires it for a while and then stops >> writing it, although driver does not send or receive anything (at >> least ifconfig counters do not change). > > again.. so how about you detect this condition and do, in the driver > code, the equivalent of rmmod/insmod to the hardware. I'm sure people > who have the hardware would appreciate that type of patch a lot more > than the one you sent out. > I guess it is your way of "middle finger" to all the IPW2100 customers who try to use it on a Linux machine. Thanks Wei -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Sep 21, 2008 at 11:35:13AM -0700, Arjan van de Ven (arjan@infradead.org) wrote: > > And how else user can get attention to the problem which is not fixed > > by the vendor? > > ... or the community Which does not have access to the firmware... Which IMO is failing and not the driver itself. > > It stops after several seconds (or packets?). Sometimes (but rarely) > > it works several minutes, sometimes it fires above dmesg line and > > continues to work, sometimes it fires it for a while and then stops > > writing it, although driver does not send or receive anything (at > > least ifconfig counters do not change). > > again.. so how about you detect this condition and do, in the driver > code, the equivalent of rmmod/insmod to the hardware. I'm sure people > who have the hardware would appreciate that type of patch a lot more > than the one you sent out. Reset task does efectively ipw2100_up(), so the difference is power cycles over the PCI bus and enable/disable/request commands. Like this stuff: /* We disable the RETRY_TIMEOUT register (0x41) to keep * PCI Tx retries from interfering with C3 CPU state */ pci_read_config_dword(pci_dev, 0x40, &val); if ((val & 0x0000ff00) != 0) pci_write_config_dword(pci_dev, 0x40, val & 0xffff00ff); I do remember I had a tibet monk course of decoding ipw2100 PCI config address space, just need to find my kimono. Do you want me to implement ipw2100 driver as a big work structure which will run ipw2100_init()/wait/ipw2100_exit() in a loop? And that will be the fix suggested by Intel? That would explain a lot. P.S. And some people tell that asking for bug bisection is a hard pressure on user. Vendor has to ask him to fix bug himself instead, and that will be a solution! Getting the fact, that rmmod/insmod does not always fix the problem (but most of the time for a short period of time), I again want to point, that it looks like a firmware problem related to some inner timings. You ask me to fix the driver and do not even listen to what I said previously and do not get that into account and analyze.
On Sun, 2008-09-21 at 23:00 +0400, Evgeniy Polyakov wrote: > Do you want me to implement ipw2100 driver as a big work structure > which will run ipw2100_init()/wait/ipw2100_exit() in a loop? > And that will be the fix suggested by Intel? That would explain a lot. I think what Arjan is saying is that it would be better to put pressure on the responsible folks (I don't think Arjan is anywhere near them at all) if you'd put in a WARN_ON() for this error and that would make the top entry on kerneloops.org all the time... And additionally put in a workaround for yourself for now. And can we keep the flames off this list please? That comment from Wei Weng was absolutely uncalled for, and inciting a flamewar (as you have already blogged) was not really productive either. johannes
On Sun, 21 Sep 2008 14:52:37 -0400 Wei Weng <wweng@acedsl.com> wrote: > > again.. so how about you detect this condition and do, in the driver > > code, the equivalent of rmmod/insmod to the hardware. I'm sure > > people who have the hardware would appreciate that type of patch a > > lot more than the one you sent out. > > > > I guess it is your way of "middle finger" to all the IPW2100 > customers who try to use it on a Linux machine. if suggesting a workaround is giving the middle finger in your mind, then I don't think it's worth my time to discuss this further with you.
Hi Evgeniy, > Following bug exists in the ipw2100 driver/firmware for years and Intel > folks never responded to zillions bugzilla entries and forum notices in > the internet with some patch or firmware update (although did request > dmesg and debug info, and received them). > > ipw2100: Fatal interrupt. Scheduling firmware restart. > > I believe it is a firmware bug because after driver is unloaded and > loaded back again wireless adapter usually starts working (for small > amount of time though). My conspiracy feeling can suggest, that it may > be kind of a force to buy a new one, or trivial error in the firmware, > when it writes to the same place in the flash and essentially given cell > became dead or whatever else. I don't know if it is for this bug or a different one, but Matthew Garrett seem to have some pending patches. At least that is what he told me at PlumbersConf. Lets see if these patches do help. And please follow up with Arjan's suggestion and put a WARN_ON in the upstream code instead of waving CONFIG_BROKEN around. Regards Marcel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, 21 Sep 2008 23:38:09 +0400 Evgeniy Polyakov <johnpol@2ka.mipt.ru> wrote: > > As I pointed, I can rewrite the whole driver's initialization process, > so that it looked like init/wait/exit loop, which can be processed at > the module load and when fatal interrupt fires. yes please do this. > > I do hope this will result in a progress. Arjan, do you aggree to add > this patch to the current tree? > > diff --git a/drivers/net/wireless/ipw2100.c > b/drivers/net/wireless/ipw2100.c index 19a401c..9a7b64c 100644 > --- a/drivers/net/wireless/ipw2100.c > +++ b/drivers/net/wireless/ipw2100.c > @@ -206,6 +206,8 @@ MODULE_PARM_DESC(disable, "manually disable the > radio (default 0 [radio on])"); > static u32 ipw2100_debug_level = IPW_DL_NONE; > > +static int ipw2100_max_fatal_ints = 10; > + > #ifdef CONFIG_IPW2100_DEBUG > #define IPW_DEBUG(level, message...) \ > do { \ > @@ -3174,6 +3176,10 @@ static void ipw2100_irq_tasklet(struct > ipw2100_priv *priv) if (inta & IPW2100_INTA_FATAL_ERROR) { > printk(KERN_WARNING DRV_NAME > ": Fatal interrupt. Scheduling firmware > restart.\n"); > + WARN_ON(1); > + > + BUG_ON(ipw2100_max_fatal_ints-- <= 0); BUG_ON in interrupt context is just extremely hostile, since it means the box is dead. also I would suggest using WARN_ON_ONCE()
> Getting the fact, that rmmod/insmod does not always fix the problem (but > most of the time for a short period of time), I again want to point, > that it looks like a firmware problem related to some inner timings. You > ask me to fix the driver and do not even listen to what I said > previously and do not get that into account and analyze. Try putting it into D3 counting to 10 and powering it back up. Thats about as close as you can get to pulling the plug when it hangs. Alan -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Sep 21, 2008 at 08:57:44PM +0100, Alan Cox (alan@lxorguk.ukuu.org.uk) wrote: > Try putting it into D3 counting to 10 and powering it back up. Thats > about as close as you can get to pulling the plug when it hangs. I will experiment with this, thanks Alan. Unfortunately my machine builds this only updated driver for about 10 minutes, so results will appear not too quickly. I will start tests tomorrow.
Hi Marcel. On Sun, Sep 21, 2008 at 09:35:31PM +0200, Marcel Holtmann (holtmann@linux.intel.com) wrote: > I don't know if it is for this bug or a different one, but Matthew > Garrett seem to have some pending patches. At least that is what he told > me at PlumbersConf. Lets see if these patches do help. And please follow > up with Arjan's suggestion and put a WARN_ON in the upstream code > instead of waving CONFIG_BROKEN around. I expect it is something new, since this bug exists at least from the 1.2 firmware version and .11 kernel. It was also reproduced (long ago though) on 2.4.
Try D3ing the chip in the firmware restart code. Yes, it's retarded.
On Sun, Sep 21, 2008 at 09:35:31PM +0200, Marcel Holtmann wrote: > I don't know if it is for this bug or a different one, but Matthew > Garrett seem to have some pending patches. At least that is what he told > me at PlumbersConf. Lets see if these patches do help. And please follow > up with Arjan's suggestion and put a WARN_ON in the upstream code > instead of waving CONFIG_BROKEN around. The fix I had for this was actually for ipw2200, but it ought to be applicable for 2100 as well. The ideal fix is probably to ensure that ipw*_down D3s the card and *_up D0s it, which brings enhanced runtime power saving and also has the nice side effect of actually resetting the damned POS in error cases.
On Sun, Sep 21, 2008 at 11:42:10PM +0100, Matthew Garrett (mjg59@srcf.ucam.org) wrote:
> Try D3ing the chip in the firmware restart code. Yes, it's retarded.
Thank you, I will start tests tomorrow.
I gotta admit, those "firmware restarts" were pretty annoying, and I'd always wondered why Intel themselves couldn't be bothered to fix 'em. -Kenny
On Sun, Sep 21, 2008 at 08:57:44PM +0100, Alan Cox (alan@lxorguk.ukuu.org.uk) wrote: > > Getting the fact, that rmmod/insmod does not always fix the problem (but > > most of the time for a short period of time), I again want to point, > > that it looks like a firmware problem related to some inner timings. You > > ask me to fix the driver and do not even listen to what I said > > previously and do not get that into account and analyze. > > Try putting it into D3 counting to 10 and powering it back up. Thats > about as close as you can get to pulling the plug when it hangs. I made several experimetns with power states in reset handler, like put to d3 (hot), disable device, save/resetore states. Fatal interrupts continue to fire with essentially the same rate. The same error address does not always contain the same error value, but frequently it is finit small set. Here are some data: [41773.200686] ipw2100: Fatal interrupt. Scheduling firmware restart. [41773.200707] eth1: Fatal error value: 0x500185B8, address: 0x08004501, inta: 0x40000000 [41773.200810] ipw2100 0000:02:04.0: PCI INT A disabled [41773.203110] ipw2100: IRQ INTA == 0xFFFFFFFF [41773.224446] ipw2100: IRQ INTA == 0xFFFFFFFF [41773.245781] ipw2100: IRQ INTA == 0xFFFFFFFF [41773.249360] ipw2100 0000:02:04.0: enabling device (0000 -> 0002) [41773.249384] ipw2100 0000:02:04.0: PCI INT A -> Link[C0C8] -> GSI 11 (level, low) -> IRQ 11 [41773.249426] ipw2100 0000:02:04.0: restoring config space at offset 0x1 (was 0x2900002, writing 0x2900006) That is quite harmless, since interrupt handler just sees that device is dissapearing. This brought me to think more about interrupt processing (irq handler and related tasklet), and I found races between interrupt tasklet, ipw2100_wx_event_work() handler, reset task and probably others. Register access in some cases are proteceted by lock (interrupt handler), and in some cases is not (all others). Although every user first disables interrupts, but it can be handled right now and scheduled tasklet already. Also priv->status field is frequently accessed and modified with and without locks. This may be harmless, but still a red flag. Another data about the same failed address: eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000 eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000 eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000 eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000 eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000 eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000 eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000 eth1: Fatal error value: 0x5000CEE4, address: 0x61C00000, inta: 0x40000000 eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000 eth1: Fatal error value: 0x50018584, address: 0x61C00000, inta: 0x40000000 values are quite limited, but I saw at different address wider set of different data, but still limited. Addresses and values of the fatal interrupt do not follow some immediately obvious law, so this looks more like firmware losts its mind. The reason for this actually may be a register access races described above. I will continue experiments with this.
diff --git a/drivers/net/wireless/Kconfig b/drivers/net/wireless/Kconfig index 9931b5a..c24fc6a 100644 --- a/drivers/net/wireless/Kconfig +++ b/drivers/net/wireless/Kconfig @@ -125,7 +125,7 @@ config PCMCIA_RAYCS config IPW2100 tristate "Intel PRO/Wireless 2100 Network Connection" - depends on PCI && WLAN_80211 + depends on PCI && WLAN_80211 && BROKEN select WIRELESS_EXT select FW_LOADER select IEEE80211
Hi. Following bug exists in the ipw2100 driver/firmware for years and Intel folks never responded to zillions bugzilla entries and forum notices in the internet with some patch or firmware update (although did request dmesg and debug info, and received them). ipw2100: Fatal interrupt. Scheduling firmware restart. I believe it is a firmware bug because after driver is unloaded and loaded back again wireless adapter usually starts working (for small amount of time though). My conspiracy feeling can suggest, that it may be kind of a force to buy a new one, or trivial error in the firmware, when it writes to the same place in the flash and essentially given cell became dead or whatever else. Intel folks, please fix this problem, I see no other way to force you to do this than to mark ipw2100 driver as broken, since that is what it is. Bug exists at least in .15 upto .24 kernels, just search above dmesg line. I cought it with 2.6.24-19-386 ubuntu kernel, 1.3 firmware version. lspci: 02:04.0 Network controller: Intel Corporation PRO/Wireless LAN 2100 3B Mini PCI Adapter (rev 04) Subsystem: Intel Corporation Samsung X10/P30 integrated WLAN Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- Latency: 64 (500ns min, 8500ns max), Cache Line Size: 64 bytes Interrupt: pin A routed to IRQ 11 Region 0: Memory at 90080000 (32-bit, non-prefetchable) [size=4K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=1 PME- dmesg is pretty usual. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>