
NIU driver: Sun x8 Express Quad Gigabit Ethernet Adapter

Message ID 20081113.140824.156293734.davem@davemloft.net
State Not Applicable, archived

Commit Message

David Miller Nov. 13, 2008, 10:08 p.m. UTC
From: Jesper Dangaard Brouer <jdb@comx.dk>
Date: Thu, 13 Nov 2008 09:50:22 +0100

> Another bug... while unloading the niu module.
> 
> During my testing I'm unloading/loading the niu module.  I usually take
> down the interfaces _before_ unloading the module, but one time I
> forgot, and got the following BUG in the kern log.
> 
> niu: niu_put_parent: port[3]
> niu 0000:0b:00.3: PCI INT D disabled
> niu: niu_put_parent: port[2]
> niu 0000:0b:00.2: PCI INT C disabled
> niu: niu_put_parent: port[1]
> niu 0000:0b:00.1: PCI INT B disabled
> ------------[ cut here ]------------
> kernel BUG at drivers/pci/msi.c:630!

Weird.  When the module is unloaded, unregister_netdev() will
do a dev_close(), which will invoke dev->stop(), which is
niu_close().

And niu_close() will call free_irq() on every MSI interrupt
registered in niu_open().

So I can't see how this can happen, but obviously it is happening.

I suspect that something might be changing np->num_ldg, but
anyways the following debugging patch should provide some
clues.  Please reproduce this and send the logs it generates.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jesper Dangaard Brouer Nov. 14, 2008, 12:38 p.m. UTC | #1
On Thu, 2008-11-13 at 14:08 -0800, David Miller wrote:
> I suspect that something might be changing np->num_ldg, but
> anyways the following debugging patch should provide some
> clues.  Please reproduce this and send the logs it generates.

Debugging the rmmod problem...

I found a strange behavior: rmmod'ing the niu driver only causes a
kernel BUG if the driver was loaded at boot time.  If I remove the
niu.ko driver from /lib/modules/2.6.28-rc4-davem/kernel/drivers/net/
and reboot the system, I can afterwards load and unload the niu.ko
driver without problems... hmmm

Here is your dmesg output with extra debug statements:
-----------------------------------------------------
niu 0000:0b:00.3: PCI INT D -> GSI 19 (level, low) -> IRQ 19
niu: niu_get_parent: platform_type[1] port[3]
niu 0000:0b:00.3: setting latency timer to 64
niu: niu_get_invariants: VPD offset [00016a00]
niu: VPD_SCAN: start[16a14] end[16b98]
niu: VPD_SCAN: Reading in property [local-mac-address] len[6]
niu: VPD_SCAN: Reading in property [version] len[38]
niu: VPD_SCAN: Reading in property [model] len[14]
niu: VPD_SCAN: Reading in property [board-model] len[12]
niu: VPD_SCAN: Reading in property [num-mac-addresses] len[1]
niu: VPD_SCAN: Reading in property [phy-type] len[4]
niu: VPD_SCAN: FCODE major(3) minor(9)
niu: niu_get_and_validate_port: port[3] num_ports[4]
niu: niu_probe_ports(): port_phy[000000aa]
niu: niu_classifier_swstate_init: num_tcam(256)
eth4: Broadcom NetXtreme II BCM5708 1000Base-T (B2) PCI-X 64-bit 133MHz found at mem f4000000, IRQ 17, node addr 00:1e:0b:71:60:84
usb 5-2: configuration #1 chosen from 1 choice
udev: renamed network interface eth4 to eth1
hub 5-2:1.0: USB hub found
hub 5-2:1.0: 7 ports detected
usb 5-2: New USB device found, idVendor=03f0, idProduct=1327
usb 5-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 5-2: Product: Virtual Hub
usb 5-2: Manufacturer: HP
eth4: NIU Ethernet 00:14:4f:da:17:09
eth4: Port type[BMAC] mode[1G:COPPER] XCVR[MII] phy[mif]
udev: renamed network interface eth1_rename to eth0
udev: renamed network interface eth2 to eth3
udev: renamed network interface eth4 to eth5
udev: renamed network interface eth0_rename to eth2
udev: renamed network interface eth3_rename to eth4
Adding 3903784k swap on /dev/cciss/c0d0p2.  Priority:-1 extents:1 across:3903784k
EXT3 FS on cciss/c0d0p1, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on cciss/c0d0p3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
IPv4 FIB: Using LC-trie version 0.408
niu 0000:0b:00.0: niu: eth2: niu_request_irq() num_ldg[13]
niu 0000:0b:00.0: niu: eth2: Request IRQ 32, lp(f62b46d4), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 31, lp(f62b470c), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 30, lp(f62b4744), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 29, lp(f62b477c), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 28, lp(f62b47b4), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 27, lp(f62b47ec), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 26, lp(f62b4824), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 25, lp(f62b485c), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 24, lp(f62b4894), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 23, lp(f62b48cc), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 22, lp(f62b4904), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 21, lp(f62b493c), err=0
niu 0000:0b:00.0: niu: eth2: Request IRQ 20, lp(f62b4974), err=0
niu 0000:0b:00.1: niu: eth3: niu_request_irq() num_ldg[1]
niu 0000:0b:00.1: niu: eth3: Request IRQ 17, lp(f63e46d4), err=0
niu: eth2: Link is up at 1Gb/sec, full duplex
niu: eth3: Link is up at 1Gb/sec, full duplex
bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex
niu: niu_put_parent: port[3]
niu 0000:0b:00.3: PCI INT D disabled
niu: niu_put_parent: port[2]
niu 0000:0b:00.2: PCI INT C disabled
niu 0000:0b:00.1: niu: eth3: niu_free_irq() num_ldg[1]
niu 0000:0b:00.1: niu: eth3: free IRQ 17, lp(f63e46d4)
niu: niu_put_parent: port[1]
niu 0000:0b:00.1: PCI INT B disabled
niu 0000:0b:00.0: niu: eth2: niu_free_irq() num_ldg[13]
niu 0000:0b:00.0: niu: eth2: free IRQ 32, lp(f62b46d4)
niu 0000:0b:00.0: niu: eth2: free IRQ 31, lp(f62b470c)
niu 0000:0b:00.0: niu: eth2: free IRQ 30, lp(f62b4744)
niu 0000:0b:00.0: niu: eth2: free IRQ 29, lp(f62b477c)
niu 0000:0b:00.0: niu: eth2: free IRQ 28, lp(f62b47b4)
niu 0000:0b:00.0: niu: eth2: free IRQ 27, lp(f62b47ec)
niu 0000:0b:00.0: niu: eth2: free IRQ 26, lp(f62b4824)
niu 0000:0b:00.0: niu: eth2: free IRQ 25, lp(f62b485c)
niu 0000:0b:00.0: niu: eth2: free IRQ 24, lp(f62b4894)
niu 0000:0b:00.0: niu: eth2: free IRQ 23, lp(f62b48cc)
niu 0000:0b:00.0: niu: eth2: free IRQ 22, lp(f62b4904)
niu 0000:0b:00.0: niu: eth2: free IRQ 21, lp(f62b493c)
niu 0000:0b:00.0: niu: eth2: free IRQ 20, lp(f62b4974)
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:630!
invalid opcode: 0000 [#1] PREEMPT SMP 
last sysfs file: /sys/class/net/lo/operstate
Modules linked in: thermal rng_core hpwdt hpilo serio_raw ehci_hcd uhci_hcd bnx2 zlib_inflate niu(-) processor sr_mod cdrom

Pid: 3153, comm: rmmod Not tainted (2.6.28-rc4-davem #19) ProLiant DL380 G5
EIP: 0060:[<c0230cdc>] EFLAGS: 00010286 CPU: 2
EIP is at msi_free_irqs+0xdc/0xe0
EAX: f61887c0 EBX: 00000030 ECX: f6472694 EDX: c049c680
ESI: f7222000 EDI: f722246c EBP: f590feb4 ESP: f590fea8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rmmod (pid: 3153, ti=f590e000 task=f718b2c0 task.ti=f590e000)
Stack:
 f7222000 f62b4540 f7222000 f590febc c0230ce8 f590fec8 c0230f71 f62b4000
 f590fedc f81fe678 f7222000 f82033d4 f82033d4 f590fee8 c022bbc9 f7222058
 f590fef8 c027b1a9 f7222058 f7222184 f590ff0c c027b27d f82033a0 f82033d4
Call Trace:
 [<c0230ce8>] ? msix_free_all_irqs+0x8/0x10
 [<c0230f71>] ? pci_disable_msix+0x31/0x40
 [<f81fe678>] ? niu_pci_remove_one+0x88/0x8a [niu]
 [<c022bbc9>] ? pci_device_remove+0x19/0x40
 [<c027b1a9>] ? __device_release_driver+0x59/0x90
 [<c027b27d>] ? driver_detach+0x9d/0xb0
 [<c027a515>] ? bus_remove_driver+0x75/0xa0
 [<c027b729>] ? driver_unregister+0x39/0x40
 [<c022be21>] ? pci_unregister_driver+0x21/0x80
 [<f81fb29d>] ? niu_exit+0xd/0x10 [niu]
 [<c014ce46>] ? sys_delete_module+0x116/0x1f0
 [<c0144309>] ? lock_release_holdtime+0x79/0xc0
 [<c0174df6>] ? sys_munmap+0x46/0x60
 [<c0103231>] ? sysenter_do_call+0x12/0x2c
Code: b7 43 08 8b 53 1c c1 e0 04 01 d0 ba 01 00 00 00 83 c0 0c 89 10 3b 7b 14 75 aa 8b 43 1c e8 dd 77 ee ff eb a0 5b 31 c0 5e 5f 5d c3 <0f> 0b eb fe 55 89 e5 e8 18 ff ff ff 5d c3 8d b6 00 00 00 00 55 
EIP: [<c0230cdc>] msi_free_irqs+0xdc/0xe0 SS:ESP 0068:f590fea8
---[ end trace 8eed6b3e1ad2a790 ]---
Jesper Dangaard Brouer Nov. 14, 2008, 6:49 p.m. UTC | #2
On Fri, 2008-11-14 at 13:38 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 2008-11-13 at 14:08 -0800, David Miller wrote:
> > I suspect that something might be changing np->num_ldg, but
> > anyways the following debugging patch should provide some
> > clues.  Please reproduce this and send the logs it generates.
> 
> Debugging the rmmod problem...
> 
> I found a strange behavior: rmmod'ing the niu driver only causes a
> kernel BUG if the driver was loaded at boot time.  If I remove the
> niu.ko driver from /lib/modules/2.6.28-rc4-davem/kernel/drivers/net/
> and reboot the system, I can afterwards load and unload the niu.ko
> driver without problems... hmmm

Perhaps this is a regression, as the problem is not in v2.6.27.

I'll start bisecting Monday...

I'm not sure it's a NIU driver bug, as there have been very few changes
to niu.c since v2.6.27 (git log v2.6.27.. drivers/net/niu.c).
David Miller Nov. 15, 2008, 12:21 a.m. UTC | #3
From: Jesper Dangaard Brouer <jdb@comx.dk>
Date: Fri, 14 Nov 2008 19:49:22 +0100

> On Fri, 2008-11-14 at 13:38 +0100, Jesper Dangaard Brouer wrote:
> > On Thu, 2008-11-13 at 14:08 -0800, David Miller wrote:
> > > I suspect that something might be changing np->num_ldg, but
> > > anyways the following debugging patch should provide some
> > > clues.  Please reproduce this and send the logs it generates.
> > 
> > Debugging the rmmod problem...
> > 
> > I found a strange behavior: rmmod'ing the niu driver only causes a
> > kernel BUG if the driver was loaded at boot time.  If I remove the
> > niu.ko driver from /lib/modules/2.6.28-rc4-davem/kernel/drivers/net/
> > and reboot the system, I can afterwards load and unload the niu.ko
> > driver without problems... hmmm
> 
> Perhaps this is a regression, as the problem is not in v2.6.27.

This is what I started to suspect as well.

> I'll start bisecting Monday...
> 
> I'm not sure it's a NIU driver bug, as there have been very few changes
> to niu.c since v2.6.27 (git log v2.6.27.. drivers/net/niu.c).

Ok, let me know what your bisect finds.
Jesper Dangaard Brouer Nov. 19, 2008, 12:10 p.m. UTC | #4
On Fri, 2008-11-14 at 16:21 -0800, David Miller wrote:
> From: Jesper Dangaard Brouer <jdb@comx.dk>
> Date: Fri, 14 Nov 2008 19:49:22 +0100
> 
> > On Fri, 2008-11-14 at 13:38 +0100, Jesper Dangaard Brouer wrote:
> > > On Thu, 2008-11-13 at 14:08 -0800, David Miller wrote:
> > > > I suspect that something might be changing np->num_ldg, but
> > > > anyways the following debugging patch should provide some
> > > > clues.  Please reproduce this and send the logs it generates.
> > > 
> > > Debugging the rmmod problem...
> > > 
> > > I found a strange behavior: rmmod'ing the niu driver only causes a
> > > kernel BUG if the driver was loaded at boot time.  If I remove the
> > > niu.ko driver from /lib/modules/2.6.28-rc4-davem/kernel/drivers/net/
> > > and reboot the system, I can afterwards load and unload the niu.ko
> > > driver without problems... hmmm
> > 
> > Perhaps this is a regression, as the problem is not in v2.6.27.
> 
> This is what I started to suspect as well.
> 
> > I'll start bisecting Monday...
> > 
> > I'm not sure it's a NIU driver bug, as there have been very few changes
> > to niu.c since v2.6.27 (git log v2.6.27.. drivers/net/niu.c).
> 
> Ok, let me know what your bisect finds.

I have given up bisecting because, during the bisect, I hit a kernel
that will not boot on my system (it hangs...).

I have attached the full bisect history document...

Patch

diff --git a/drivers/net/niu.c b/drivers/net/niu.c
index d8463b1..c0eedd3 100644
--- a/drivers/net/niu.c
+++ b/drivers/net/niu.c
@@ -5600,12 +5600,20 @@  static int niu_request_irq(struct niu *np)
 	int i, j, err;
 
 	err = 0;
+#if 1
+	dev_err(np->device, PFX "%s: niu_request_irq() num_ldg[%d]\n",
+		np->dev->name, np->num_ldg);
+#endif
 	for (i = 0; i < np->num_ldg; i++) {
 		struct niu_ldg *lp = &np->ldg[i];
 
 		err = request_irq(lp->irq, niu_interrupt,
 				  IRQF_SHARED | IRQF_SAMPLE_RANDOM,
 				  np->dev->name, lp);
+#if 1
+		dev_err(np->device, PFX "%s: Request IRQ %u, lp(%p), err=%d\n",
+			np->dev->name, lp->irq, lp, err);
+#endif
 		if (err)
 			goto out_free_irqs;
 
@@ -5617,6 +5625,11 @@  out_free_irqs:
 	for (j = 0; j < i; j++) {
 		struct niu_ldg *lp = &np->ldg[j];
 
+#if 1
+		dev_err(np->device, PFX "%s: out_free_irqs, "
+			"free IRQ %u, lp(%p)\n",
+			np->dev->name, lp->irq, lp);
+#endif
 		free_irq(lp->irq, lp);
 	}
 	return err;
@@ -5626,9 +5639,17 @@  static void niu_free_irq(struct niu *np)
 {
 	int i;
 
+#if 1
+	dev_err(np->device, PFX "%s: niu_free_irq() num_ldg[%d]\n",
+		np->dev->name, np->num_ldg);
+#endif
 	for (i = 0; i < np->num_ldg; i++) {
 		struct niu_ldg *lp = &np->ldg[i];
 
+#if 1
+		dev_err(np->device, PFX "%s: free IRQ %u, lp(%p)\n",
+			np->dev->name, lp->irq, lp);
+#endif
 		free_irq(lp->irq, lp);
 	}
 }