Message ID | 20081112.041143.11487260.davem@davemloft.net |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
On Wed, 2008-11-12 at 04:11 -0800, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 12 Nov 2008 03:52:40 -0800 (PST)
>
> These tests are still useful for me, so please perform them,

As thanks for your work, and for being allowed to operate your espresso machine, I'll be happy to perform the tests even though the bug has been found.

> but I think I've found the bug.

Yes! You have found the bug! :-)

(This is on the non-SMP, non-MSI kernel. The first pktgen test says I can route 319 kpps using a single CPU, which is promising, as I got 160 kpps using the Sun nxge driver.)

Tested-by: Jesper Dangaard Brouer <jdb@comx.dk>
On Wed, 2008-11-12 at 04:11 -0800, David Miller wrote:
[...]
> So the following patch should fix this bug. writeq() should
> be OK as-is, so doesn't need a similar change.
>
> diff --git a/drivers/net/niu.c b/drivers/net/niu.c
> index 9acb5d7..d8463b1 100644
> --- a/drivers/net/niu.c
> +++ b/drivers/net/niu.c
> @@ -51,8 +51,7 @@ MODULE_VERSION(DRV_MODULE_VERSION);
>  #ifndef readq
>  static u64 readq(void __iomem *reg)
>  {
> -	return (((u64)readl(reg + 0x4UL) << 32) |
> -		(u64)readl(reg));
> +	return ((u64) readl(reg)) | (((u64) readl(reg + 4UL)) << 32);
>  }

Since there's no sequence point between the reads, there's no guarantee
that the reads happen in the order written (regardless of barriers
inside readl()). This needs to be split into two statements.

Ben.
On Wed, 2008-11-12 at 12:54 +0000, Ben Hutchings wrote:
> On Wed, 2008-11-12 at 04:11 -0800, David Miller wrote:
> [...]
> > So the following patch should fix this bug. writeq() should
> > be OK as-is, so doesn't need a similar change.
> >
> > diff --git a/drivers/net/niu.c b/drivers/net/niu.c
> > index 9acb5d7..d8463b1 100644
> > --- a/drivers/net/niu.c
> > +++ b/drivers/net/niu.c
> > @@ -51,8 +51,7 @@ MODULE_VERSION(DRV_MODULE_VERSION);
> >  #ifndef readq
> >  static u64 readq(void __iomem *reg)
> >  {
> > -	return (((u64)readl(reg + 0x4UL) << 32) |
> > -		(u64)readl(reg));
> > +	return ((u64) readl(reg)) | (((u64) readl(reg + 4UL)) << 32);
> >  }
>
> Since there's no sequence point between the reads, there's no guarantee
> that the reads happen in the order written (regardless of barriers
> inside readl()). This needs to be split into two statements.

The nxge driver does this:

#ifndef readq
static inline uint64_t readq(void *addr)
{
	uint32_t val32 = readl(addr);
	uint64_t val64 = (uint64_t) readl(addr + 4);

	return (val32 | (val64 << 32));
}
#endif

#ifndef writeq
static inline void writeq(uint64_t val64, void *addr)
{
	writel((uint32_t)(val64), addr);
	writel((uint32_t)(val64 >> 32), (addr + 4));
}
#endif
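Ben's objection and the nxge code above both come down to issuing the two 32-bit reads as separate statements, so the low word is fetched before the high word. A minimal userspace sketch of that pattern follows; `mock_readl()` and `split_readq()` are hypothetical stand-ins for the kernel's `readl()` and a driver-local `readq()` fallback, and the "register" is assumed to be a plain two-word array rather than real MMIO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for readl(): one volatile 32-bit load. */
static uint32_t mock_readl(const volatile uint32_t *addr)
{
	return *addr;
}

/*
 * Fallback 64-bit read built from two 32-bit reads.  Each read is its
 * own full statement, so the semicolon sequences them: the low word
 * (byte offset 0) is always read before the high word (byte offset 4).
 */
static uint64_t split_readq(const volatile uint32_t *reg)
{
	uint32_t lo = mock_readl(reg);
	uint32_t hi = mock_readl(reg + 1);	/* reg + 1 is 4 bytes further on */

	return ((uint64_t)hi << 32) | lo;
}
```

Because the low word is always read first, a clear-on-read side effect triggered by the high-word access can no longer hide bits that live in the low word.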
David Miller wrote:
> I am guessing you're running a 32-bit x86 kernel.
>
> In such a case the driver has to define a local readq()
> and writeq() implementation.
>
> What I provide for NIU right now reads the upper 32-bits
> then the lower 32-bits of the register.
>
> Guess what that does? The packet counters live in the upper
> 32-bits and the MARK bits live in the lower 32-bits of the
> TX_CS register.
>
> So it first reads the packet counters, and as a side effect that
> clears the MARK bits in the TX_CS register. So when we read the lower
> 32-bits the MARK bits are always seen as zero.
>
> BzzaaarT!
>
> So the following patch should fix this bug. writeq() should
> be OK as-is, so doesn't need a similar change.
>
> diff --git a/drivers/net/niu.c b/drivers/net/niu.c
> index 9acb5d7..d8463b1 100644
> --- a/drivers/net/niu.c
> +++ b/drivers/net/niu.c
> @@ -51,8 +51,7 @@ MODULE_VERSION(DRV_MODULE_VERSION);
>  #ifndef readq
>  static u64 readq(void __iomem *reg)
>  {
> -	return (((u64)readl(reg + 0x4UL) << 32) |
> -		(u64)readl(reg));
> +	return ((u64) readl(reg)) | (((u64) readl(reg + 4UL)) << 32);
>  }
>
>  static void writeq(u64 val, void __iomem *reg)

On my system, I'm not in a position where I can just pull down the server and test, but if the above seems plausibly to be the same bug I hit using the 10GbE card, then I'll definitely try to test it out.

I sort-of reliably hit the problem after a few days of production on a 16 core, amd64 system running NFS-server.

Does it seem likely to be the same problem?

Thanks
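The failure David describes can be reproduced with a toy model, assuming a 64-bit status register accessed as two 32-bit halves where reading the upper half clears the MARK bits in the lower half. The struct, the `mock_read_half()` helper and the `MOCK_TX_CS_*` bit values below are hypothetical; they illustrate the ordering problem only and are not the real NIU register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical MARK bits in the low word (illustrative values only). */
#define MOCK_TX_CS_MK  0x00008000u
#define MOCK_TX_CS_MMK 0x00004000u

/* Toy model of a 64-bit status register seen as two 32-bit halves. */
struct mock_tx_cs {
	uint32_t lo;	/* control/status bits, including the MARK bits */
	uint32_t hi;	/* packet counters */
};

/*
 * Read one half.  Reading the upper half (byte offset 4) has the side
 * effect of clearing the MARK bits in the lower half.
 */
static uint32_t mock_read_half(struct mock_tx_cs *r, unsigned int off)
{
	if (off == 4) {
		uint32_t v = r->hi;

		r->lo &= ~(MOCK_TX_CS_MK | MOCK_TX_CS_MMK);
		return v;
	}
	return r->lo;
}

/* Old ordering: upper half first, so the MARK bits are already gone. */
static uint64_t readq_high_first(struct mock_tx_cs *r)
{
	uint64_t hi = mock_read_half(r, 4);
	uint64_t lo = mock_read_half(r, 0);

	return (hi << 32) | lo;
}

/* Fixed ordering: lower half first, so the MARK bits are observed. */
static uint64_t readq_low_first(struct mock_tx_cs *r)
{
	uint64_t lo = mock_read_half(r, 0);
	uint64_t hi = mock_read_half(r, 4);

	return (hi << 32) | lo;
}
```

With the old ordering the MARK bit always reads back as zero, which is the "MARK bits are always seen as zero" symptom; with the low-first ordering it is seen once, and then cleared by the upper-half read as before.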
Hi Google,

On Wed, 12 Nov 2008, David Miller wrote:
> Guess what that does? The packet counters live in the upper
> 32-bits and the MARK bits live in the lower 32-bits of the
> TX_CS register.
>
> So it first reads the packet counters, and as a side effect that
> clears the MARK bits in the TX_CS register. So when we read the lower
> 32-bits the MARK bits are always seen as zero.

For the thorough reader, the TX_CS Transmit Control and Status register is described in table 26-15, pages 761-762, of the PDF document titled "UltraSPARC T2 supplement to UltraSPARC architecture 2007", downloadable from:
 http://opensparc-t2.sunsource.net/specs/UST2-UASuppl-current-draft-HP-EXT.pdf

Cheers,
 Jesper Brouer

--
-------------------------------------------------------------------
MSc. Master of Computer Science
Dept. of Computer Science, University of Copenhagen
Author of http://www.adsl-optimizer.dk
-------------------------------------------------------------------
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Jesper Krogh <jesper@krogh.cc>
Date: Wed, 12 Nov 2008 18:56:48 +0100

> I sort-of reliably hit the problem after a few days of production on
> a 16 core, amd64 system running NFS-server.
>
> Does it seem likely to be the same problem?

Not really. It sounds like you're using a 64-bit kernel (this only
affects 32-bit ones), and the problem triggers after the first 256
packets are sent to the send destination, so it should happen quickly.
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 12 Nov 2008 12:54:53 +0000

> On Wed, 2008-11-12 at 04:11 -0800, David Miller wrote:
> [...]
> > So the following patch should fix this bug. writeq() should
> > be OK as-is, so doesn't need a similar change.
> >
> > diff --git a/drivers/net/niu.c b/drivers/net/niu.c
> > index 9acb5d7..d8463b1 100644
> > --- a/drivers/net/niu.c
> > +++ b/drivers/net/niu.c
> > @@ -51,8 +51,7 @@ MODULE_VERSION(DRV_MODULE_VERSION);
> >  #ifndef readq
> >  static u64 readq(void __iomem *reg)
> >  {
> > -	return (((u64)readl(reg + 0x4UL) << 32) |
> > -		(u64)readl(reg));
> > +	return ((u64) readl(reg)) | (((u64) readl(reg + 4UL)) << 32);
> >  }
>
> Since there's no sequence point between the reads, there's no guarantee
> that the reads happen in the order written (regardless of barriers
> inside readl()). This needs to be split into two statements.

What version of the C language are you using?

I personally think it's safe. If the compiler sees "A | B" it's going
to emit the code to compute A, then the code to emit B, and finally
the "|" operation.

Everything I've always seen says that for "|" the expressions are
evaluated left to right.
On Wed, 2008-11-12 at 13:46 -0800, David Miller wrote:
> From: Ben Hutchings <bhutchings@solarflare.com>
> Date: Wed, 12 Nov 2008 12:54:53 +0000
>
> > On Wed, 2008-11-12 at 04:11 -0800, David Miller wrote:
> > [...]
> > > So the following patch should fix this bug. writeq() should
> > > be OK as-is, so doesn't need a similar change.
> > >
> > > diff --git a/drivers/net/niu.c b/drivers/net/niu.c
> > > index 9acb5d7..d8463b1 100644
> > > --- a/drivers/net/niu.c
> > > +++ b/drivers/net/niu.c
> > > @@ -51,8 +51,7 @@ MODULE_VERSION(DRV_MODULE_VERSION);
> > >  #ifndef readq
> > >  static u64 readq(void __iomem *reg)
> > >  {
> > > -	return (((u64)readl(reg + 0x4UL) << 32) |
> > > -		(u64)readl(reg));
> > > +	return ((u64) readl(reg)) | (((u64) readl(reg + 4UL)) << 32);
> > >  }
> >
> > Since there's no sequence point between the reads, there's no guarantee
> > that the reads happen in the order written (regardless of barriers
> > inside readl()). This needs to be split into two statements.
>
> What version of the C language are you using?

Any version will do.

> I personally think it's safe. If the compiler sees "A | B" it's going
> to emit the code to compute A, then the code to emit B, and finally
> the "|" operation.
>
> Everything I've always seen says that for "|" the expressions are
> evaluated left to right.

I think you're confusing it with "||", which does have this sequencing
rule. See <http://c-faq.com/expr/seqpoints.html> if you're not convinced.

Ben.
From: Ben Hutchings <bhutchings@solarflare.com>
Date: Wed, 12 Nov 2008 21:50:57 +0000

> See <http://c-faq.com/expr/seqpoints.html> if you're not convinced.

I don't think that has any implications for the piece of code we are
talking about.

Just google "C order of evaluation" and you will get hundreds of
tables, and all of them will have an entry for "|" (not just "||")
which says that operands are evaluated left to right.

And since these MMIO reads are volatile operations, there is no way
the compiler can execute them out of order.

And the plain truth is that no compiler does, and that is what
matters in the end.
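For reference, in ISO C the order in which the two operands of `|` are evaluated is unspecified; only `&&`, `||`, `?:` and the comma operator impose an order on their operands (C99 6.5p3 and the sequence-point list in Annex C). The "left to right" entries in precedence tables describe associativity, not evaluation order. The sketch below uses two hypothetical "read" functions that log when they run: only the version written as separate statements has a guaranteed order, so only that order is asserted.

```c
#include <assert.h>
#include <stdint.h>

static int order_log[2];
static int order_len;

/* Hypothetical reads that record the order in which they were called. */
static uint32_t read_low(void)  { order_log[order_len++] = 0; return 0x1u; }
static uint32_t read_high(void) { order_log[order_len++] = 1; return 0x2u; }

/* Operand order of '|' is unspecified: either call may happen first. */
static uint64_t combined_unspecified(void)
{
	return (uint64_t)read_low() | ((uint64_t)read_high() << 32);
}

/* Two statements: read_low() is guaranteed to run before read_high(). */
static uint64_t combined_sequenced(void)
{
	uint64_t lo = read_low();
	uint64_t hi = read_high();

	return lo | (hi << 32);
}
```

Both functions compute the same value; the difference is only which call the standard guarantees to run first. Whether a given compiler happens to pick left-to-right for `combined_unspecified()` is an implementation choice, not something the language promises.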
Jesper Dangaard Brouer wrote:
>
> Hi Google,
>
> On Wed, 12 Nov 2008, David Miller wrote:
>
>> Guess what that does? The packet counters live in the upper
>> 32-bits and the MARK bits live in the lower 32-bits of the
>> TX_CS register.
>>
>> So it first reads the packet counters, and as a side effect that
>> clears the MARK bits in the TX_CS register. So when we read the lower
>> 32-bits the MARK bits are always seen as zero.
>
>
> For the thorough reader, the TX_CS Transmit Control and Status
> register is described in table 26-15 page 761-762 in the PDF document
> titled: "UltraSPARC T2 supplement to UltraSPARC architecture 2007",
> downloadable from:
> http://opensparc-t2.sunsource.net/specs/UST2-UASuppl-current-draft-HP-EXT.pdf
>
>
> Cheers,
> Jesper Brouer

The niu/neptune HW puts a requirement on 32-bit reads of 64-bit registers: you need to read the lower 32 bits first and then the upper 32 bits. The same ordering applies to writes as well. On some 64-bit platforms, the 64-bit reads are split into two 32-bit reads as well, regardless of the OS.

Regards
Matheos
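Matheos notes that the same lower-then-upper rule applies to writes. A minimal sketch of a split 64-bit write that honours it, in the same shape as the nxge `writeq()` quoted earlier; `mock_writel()` is a hypothetical stand-in for `writel()` that only records each access (offset, value) so the order can be checked, and the offset used in the check is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical access log standing in for what the device observes. */
static unsigned int write_offsets[2];
static uint32_t write_values[2];
static int n_writes;

/* Hypothetical stand-in for writel(): records offset and value. */
static void mock_writel(uint32_t val, unsigned int off)
{
	write_offsets[n_writes] = off;
	write_values[n_writes] = val;
	n_writes++;
}

/* Fallback 64-bit write: lower 32 bits first, then upper 32 bits. */
static void split_writeq(uint64_t val, unsigned int off)
{
	mock_writel((uint32_t)val, off);
	mock_writel((uint32_t)(val >> 32), off + 4);
}
```

As with the read side, each half is its own statement, so the order is fixed by the language rather than left to the compiler.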
Another bug... while unloading the niu module.

During my testing I'm unloading/loading the niu module. I usually take down the interfaces _before_ unloading the module, but I forgot one time, and got the following BUG in the kern log.

niu: niu_put_parent: port[3]
niu 0000:0b:00.3: PCI INT D disabled
niu: niu_put_parent: port[2]
niu 0000:0b:00.2: PCI INT C disabled
niu: niu_put_parent: port[1]
niu 0000:0b:00.1: PCI INT B disabled
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:630!
invalid opcode: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/class/net/lo/operstate
Modules linked in: hpilo serio_raw bnx2 zlib_inflate ipmi_si ipmi_msghandler hpwdt rng_core ehci_hcd uhci_hcd niu(-) sr_mod cdrom
Pid: 3307, comm: rmmod Tainted: G W (2.6.28-rc4-davem #17) ProLiant DL380 G5
EIP: 0060:[<c02314fc>] EFLAGS: 00010282 CPU: 0
EIP is at msi_free_irqs+0xdc/0xe0
EAX: f60ad420 EBX: 00000030 ECX: f664ff14 EDX: c04a5680
ESI: f71d1000 EDI: f71d146c EBP: f6305eb4 ESP: f6305ea8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rmmod (pid: 3307, ti=f6304000 task=f6aaa570 task.ti=f6304000)
Stack: f71d1000 f62b4540 f71d1000 f6305ebc c0231508 f6305ec8 c0231791 f62b4000
       f6305edc f81777f8 f71d1000 f817c5d4 f817c5d4 f6305ee8 c022c3e9 f71d1058
       f6305ef8 c0281609 f71d1058 f71d1184 f6305f0c c02816dd f817c5a0 f817c5d4
Call Trace:
 [<c0231508>] ? msix_free_all_irqs+0x8/0x10
 [<c0231791>] ? pci_disable_msix+0x31/0x40
 [<f81777f8>] ? niu_pci_remove_one+0x88/0x8a [niu]
 [<c022c3e9>] ? pci_device_remove+0x19/0x40
 [<c0281609>] ? __device_release_driver+0x59/0x90
 [<c02816dd>] ? driver_detach+0x9d/0xb0
 [<c0280975>] ? bus_remove_driver+0x75/0xa0
 [<c0281b89>] ? driver_unregister+0x39/0x40
 [<c022c641>] ? pci_unregister_driver+0x21/0x80
 [<f817443d>] ? niu_exit+0xd/0x10 [niu]
 [<c014d646>] ? sys_delete_module+0x116/0x1f0
 [<c01744e0>] ? do_munmap+0x1f0/0x250
 [<c01755f6>] ? sys_munmap+0x46/0x60
 [<c0103231>] ? sysenter_do_call+0x12/0x2c
Code: b7 43 08 8b 53 1c c1 e0 04 01 d0 ba 01 00 00 00 83 c0 0c 89 10 3b 7b 14 75 aa 8b 43 1c e8 bd 6f ee ff eb a0 5b 31 c0 5e 5f 5d c3 <0f> 0b eb fe 55 89 e5 e8 18 ff ff ff 5d c3 8d b6 00 00 00 00 55
EIP: [<c02314fc>] msi_free_irqs+0xdc/0xe0 SS:ESP 0068:f6305ea8
---[ end trace 6594bbb8d1cf29ee ]---
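The trace dies in `msi_free_irqs()` called from `pci_disable_msix()`. The kernel's MSI HOWTO of this period says a driver must call `free_irq()` on every MSI-X vector it requested before calling `pci_disable_msix()`, and that failing to do so triggers a `BUG_ON()`; the thread does not establish whether that is the exact cause here. The toy model below, with hypothetical `mock_*` helpers and a fixed vector count, only illustrates that teardown order.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_NVEC 4

/* Toy model of a device with MSI-X vectors and per-vector handlers. */
struct mock_dev {
	bool msix_enabled;
	bool handler_attached[MOCK_NVEC];
};

static void mock_request_irq(struct mock_dev *d, int vec)
{
	d->handler_attached[vec] = true;
}

static void mock_free_irq(struct mock_dev *d, int vec)
{
	d->handler_attached[vec] = false;
}

/*
 * Mirrors the documented rule: no handler may remain attached when
 * MSI-X is disabled.  Where this returns false, the real kernel would
 * hit its BUG_ON() instead.
 */
static bool mock_disable_msix(struct mock_dev *d)
{
	for (int i = 0; i < MOCK_NVEC; i++)
		if (d->handler_attached[i])
			return false;
	d->msix_enabled = false;
	return true;
}

/* Safe teardown: release every handler first, then disable MSI-X. */
static bool mock_teardown(struct mock_dev *d)
{
	for (int i = 0; i < MOCK_NVEC; i++)
		mock_free_irq(d, i);
	return mock_disable_msix(d);
}
```

Disabling MSI-X while a handler is still attached (the "interface still up" case Jesper hit) is refused in the model; releasing the handlers first succeeds.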
On Wed, 2008-11-12 at 04:11 -0800, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 12 Nov 2008 03:52:40 -0800 (PST)
>
> > Ok, Jesper, please try two things for me, leave the debugging patch
> > in there for all the tests:
> >
> > 1) Retrigger the problem (with or without MSI, doesn't matter) but
> >    add back in that test I asked you to try last week. The one
> >    where the "if (++rp->mark_counter == rp->mark_freq)" condition
> >    test in niu_start_xmit() is commented out, so that the
> >    "mrk |= TX_DESC_MARK;" statement always runs.
> >
> >    Get me the log dump produced by that scenario.

------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x21e/0x230()
NETDEV WATCHDOG: eth2 (niu): transmit timed out
Modules linked in: niu ipmi_si hpwdt serio_raw bnx2 zlib_inflate rng_core ipmi_msghandler hpilo ehci_hcd uhci_hcd sr_mod cdrom
Pid: 0, comm: swapper Not tainted 2.6.28-rc4-davem #17
Call Trace:
 [<c0125823>] warn_slowpath+0x63/0x80
 [<c011f03e>] ? __enqueue_entity+0x8e/0xb0
 [<c010888c>] ? native_sched_clock+0x1c/0x80
 [<c01453c4>] ? __lock_acquire+0x104/0x8e0
 [<c01453c4>] ? __lock_acquire+0x104/0x8e0
 [<c010888c>] ? native_sched_clock+0x1c/0x80
 [<c013f19b>] ? getnstimeofday+0x3b/0xe0
 [<c0144b09>] ? lock_release_holdtime+0x79/0xc0
 [<c021fd2e>] ? strlcpy+0x1e/0x60
 [<c031f4be>] dev_watchdog+0x21e/0x230
 [<c0144b09>] ? lock_release_holdtime+0x79/0xc0
 [<c012e55d>] ? run_timer_softirq+0x10d/0x190
 [<c012e56f>] run_timer_softirq+0x11f/0x190
 [<c014362c>] ? tick_dev_program_event+0x3c/0xc0
 [<c031f2a0>] ? dev_watchdog+0x0/0x230
 [<c012a204>] __do_softirq+0x94/0x160
 [<c013c7c0>] ? hrtimer_interrupt+0x150/0x180
 [<c013c651>] ? ktime_get+0x11/0x30
 [<c012a30b>] do_softirq+0x3b/0x50
 [<c012a515>] irq_exit+0x75/0x90
 [<c011364a>] smp_apic_timer_interrupt+0x5a/0x90
 [<c013c5ca>] ? hrtimer_start+0x1a/0x20
 [<c0103f0c>] apic_timer_interrupt+0x28/0x30
 [<c01090d5>] ? mwait_idle+0x35/0x40
 [<c0101c1e>] cpu_idle+0x4e/0xa0
---[ end trace 3045c940a424568f ]---
niu 0000:0b:00.0: niu: eth2: Transmit timed out, resetting
niu 0000:0b:00.0: niu: eth2: LDG[idx(0):num(0)] V0[sw(0x0)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: LDG[idx(1):num(1)] V0[sw(0x0)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: LDG[idx(2):num(2)] V0[sw(0x2000000000)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: LDG[idx(3):num(3)] V0[sw(0x1)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: LDG[idx(4):num(4)] V0[sw(0x0)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: LDG[idx(5):num(5)] V0[sw(0x0)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: LDG[idx(6):num(6)] V0[sw(0x0)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: LDG[idx(7):num(7)] V0[sw(0x100000000)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: LDG[idx(8):num(8)] V0[sw(0x0)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: LDG[idx(9):num(9)] V0[sw(0x0)hw(0x0)] V1[sw(0x0)hw(0x0)] V2[sw(0x0)hw(0x0)]
niu 0000:0b:00.0: niu: eth2: Dumping transmitter state.
niu 0000:0b:00.0: niu: eth2: TX_RING[ 0] CHANNEL 0 LDN 32
niu 0000:0b:00.0: niu: eth2: TX_RING[ 0] parent->lgd_map[ldn] 7
niu 0000:0b:00.0: niu: eth2: TX_RING[ 0] Num pending TX SKBs: 2
niu 0000:0b:00.0: niu: eth2: TX_RING[ 0] TX_CS sw[0002000100000000] hw[0002000100000000]
niu 0000:0b:00.0: niu: eth2: TX_RING[ 1] CHANNEL 1 LDN 33
niu 0000:0b:00.0: niu: eth2: TX_RING[ 1] parent->lgd_map[ldn] 8
niu 0000:0b:00.0: niu: eth2: TX_RING[ 1] Num pending TX SKBs: 0
niu 0000:0b:00.0: niu: eth2: TX_RING[ 1] TX_CS sw[0000000000000000] hw[0000000000000000]
niu 0000:0b:00.0: niu: eth2: TX_RING[ 2] CHANNEL 2 LDN 34
niu 0000:0b:00.0: niu: eth2: TX_RING[ 2] parent->lgd_map[ldn] 9
niu 0000:0b:00.0: niu: eth2: TX_RING[ 2] Num pending TX SKBs: 0
niu 0000:0b:00.0: niu: eth2: TX_RING[ 2] TX_CS sw[0000000000000000] hw[0000000000000000]
niu 0000:0b:00.0: niu: eth2: TX_RING[ 3] CHANNEL 3 LDN 35
niu 0000:0b:00.0: niu: eth2: TX_RING[ 3] parent->lgd_map[ldn] 0
niu 0000:0b:00.0: niu: eth2: TX_RING[ 3] Num pending TX SKBs: 0
niu 0000:0b:00.0: niu: eth2: TX_RING[ 3] TX_CS sw[0000000000000000] hw[0000000000000000]
niu 0000:0b:00.0: niu: eth2: TX_RING[ 4] CHANNEL 4 LDN 36
niu 0000:0b:00.0: niu: eth2: TX_RING[ 4] parent->lgd_map[ldn] 1
niu 0000:0b:00.0: niu: eth2: TX_RING[ 4] Num pending TX SKBs: 0
niu 0000:0b:00.0: niu: eth2: TX_RING[ 4] TX_CS sw[0000000000000000] hw[0000000000000000]
niu 0000:0b:00.0: niu: eth2: TX_RING[ 5] CHANNEL 5 LDN 37
niu 0000:0b:00.0: niu: eth2: TX_RING[ 5] parent->lgd_map[ldn] 2
niu 0000:0b:00.0: niu: eth2: TX_RING[ 5] Num pending TX SKBs: 237
niu 0000:0b:00.0: niu: eth2: TX_RING[ 5] TX_CS sw[00ed00ec00000000] hw[00ed00ec00000000]
On Wed, 2008-11-12 at 04:11 -0800, David Miller wrote:
> From: David Miller <davem@davemloft.net>
> Date: Wed, 12 Nov 2008 03:52:40 -0800 (PST)
>
> > Ok, Jesper, please try two things for me, leave the debugging patch
> > in there for all the tests:
> >
> > 1) Retrigger the problem (with or without MSI, doesn't matter) but
> >    add back in that test I asked you to try last week. The one
> >    where the "if (++rp->mark_counter == rp->mark_freq)" condition
> >    test in niu_start_xmit() is commented out, so that the
> >    "mrk |= TX_DESC_MARK;" statement always runs.
> >
> >    Get me the log dump produced by that scenario.
> >
> > 2) Next, simply comment out the:
> >
> > 	if (unlikely(!(cs & (TX_CS_MK | TX_CS_MMK))))
> > 		goto out;
> >
> >    lines in niu_tx_work().
> >
> > Let's see what new info we can get out of this.

I applied both test #1 and test #2. After applying test #2, I cannot get it to do a TX transmit timeout, and everything seems to work... which, after the known bug fix, was kind of the expected behaviour.

Although I'm not happy about the new perf numbers: on an SMP system I can now only route approx 290 kpps, whereas I could route 319 kpps using a single-CPU nosmp kernel. (Even more annoying is that oprofile is broken.)
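Test #1 bypasses the `if (++rp->mark_counter == rp->mark_freq)` check so that `mrk |= TX_DESC_MARK;` runs for every packet. A hypothetical reduction of that bookkeeping is sketched below; the struct, the `always_mark` switch and the reset of the counter when it fires are assumptions made for illustration, not a copy of niu_start_xmit().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduction of the per-ring mark bookkeeping. */
struct mock_ring {
	unsigned int mark_counter;
	unsigned int mark_freq;
};

/*
 * Returns true when this packet's descriptor should carry the MARK bit.
 * Assumes the counter restarts each time it reaches mark_freq.  With
 * always_mark set (the effect of test #1) every packet is marked.
 */
static bool mock_should_mark(struct mock_ring *rp, bool always_mark)
{
	if (always_mark)
		return true;
	if (++rp->mark_counter == rp->mark_freq) {
		rp->mark_counter = 0;
		return true;
	}
	return false;
}
```

With a mark frequency of 4, twelve packets produce three marked descriptors normally and twelve under test #1.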
From: Jesper Dangaard Brouer <jdb@comx.dk>
Date: Thu, 13 Nov 2008 11:29:31 +0100

> Although I'm not happy about the new perf numbers, as I now on a SMP
> system only can route approx 290 kpps, remember I could route 319 kpps
> using a single CPU nosmp kernel.

That, unfortunately, can be the cost of SMP :-/

With multi-flow tests, Robert Olsson is getting 4.2 mpps rates with
NIU and pktgen. That's what this card is designed for: good
multi-flow workload performance, rather than striving for maximum
single-flow performance.

> (even more annoying is that oprofile is broken)

Yes, people on lkml are trying to figure out what is causing that
regression on x86.
From: Jesper Dangaard Brouer <jdb@comx.dk>
Date: Thu, 13 Nov 2008 10:10:12 +0100

> On Wed, 2008-11-12 at 04:11 -0800, David Miller wrote:
> > From: David Miller <davem@davemloft.net>
> > Date: Wed, 12 Nov 2008 03:52:40 -0800 (PST)
> >
> > > Ok, Jesper, please try two things for me, leave the debugging patch
> > > in there for all the tests:
> > >
> > > 1) Retrigger the problem (with or without MSI, doesn't matter) but
> > >    add back in that test I asked you to try last week. The one
> > >    where the "if (++rp->mark_counter == rp->mark_freq)" condition
> > >    test in niu_start_xmit() is commented out, so that the
> > >    "mrk |= TX_DESC_MARK;" statement always runs.
> > >
> > >    Get me the log dump produced by that scenario.
>
> ------------[ cut here ]------------
> WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x21e/0x230()
> NETDEV WATCHDOG: eth2 (niu): transmit timed out
> Modules linked in: niu ipmi_si hpwdt serio_raw bnx2 zlib_inflate rng_core ipmi_msghandler hpilo ehci_hcd uhci_hcd sr_mod cdrom
> Pid: 0, comm: swapper Not tainted 2.6.28-rc4-davem #17
> Call Trace:

Thanks a lot for running this test, Jesper, even though the bug is fixed.

> niu 0000:0b:00.0: niu: eth2: TX_RING[ 5] CHANNEL 5 LDN 37
> niu 0000:0b:00.0: niu: eth2: TX_RING[ 5] parent->lgd_map[ldn] 2
> niu 0000:0b:00.0: niu: eth2: TX_RING[ 5] Num pending TX SKBs: 237
> niu 0000:0b:00.0: niu: eth2: TX_RING[ 5] TX_CS sw[00ed00ec00000000] hw[00ed00ec00000000]

Same signature: the counters are advancing, yet no mark bits are set.

Now if we can fix that MSI-X BUG() and start analyzing your pps
performance with oprofile, we'll be in good shape :)
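The TX_CS values in the dump can be decoded by hand. The sketch below assumes a layout matching the `TX_CS_*` definitions in niu.h as recalled here (packet counter in bits 59:48, last-mark index in bits 43:32, MK and MMK at bits 15 and 14); check these values against niu.h or the T2 supplement cited earlier before relying on them. Under that assumption `hw[00ed00ec00000000]` decodes to a packet count of 0xed (237), matching "Num pending TX SKBs: 237", with both MARK bits clear, so the early exit quoted in test #2 would skip reclaim.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed TX_CS layout; verify against niu.h / the T2 supplement. */
#define TX_CS_PKT_CNT        0x0fff000000000000ULL
#define TX_CS_PKT_CNT_SHIFT  48
#define TX_CS_LASTMARK       0x00000fff00000000ULL
#define TX_CS_LASTMARK_SHIFT 32
#define TX_CS_MK             0x0000000000008000ULL
#define TX_CS_MMK            0x0000000000004000ULL

/* Packet counter field (upper 32-bit half of the register). */
static unsigned int tx_cs_pkt_cnt(uint64_t cs)
{
	return (unsigned int)((cs & TX_CS_PKT_CNT) >> TX_CS_PKT_CNT_SHIFT);
}

/* Index of the last marked descriptor (also in the upper half). */
static unsigned int tx_cs_lastmark(uint64_t cs)
{
	return (unsigned int)((cs & TX_CS_LASTMARK) >> TX_CS_LASTMARK_SHIFT);
}

/* Same test as the early exit in niu_tx_work(): no MARK bit, no reclaim. */
static bool tx_cs_has_mark(uint64_t cs)
{
	return (cs & (TX_CS_MK | TX_CS_MMK)) != 0;
}
```

Ring 0 decodes the same way: `0002000100000000` gives a packet count of 2, matching its 2 pending SKBs.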
On Thu, 13 Nov 2008, David Miller wrote:

> From: Jesper Dangaard Brouer <jdb@comx.dk>
> Date: Thu, 13 Nov 2008 11:29:31 +0100
>
>> Although I'm not happy about the new perf numbers, as I now on a SMP
>> system only can route approx 290 kpps, remember I could route 319 kpps
>> using a single CPU nosmp kernel.
>
> That unfortunately (can be) the cost of SMP :-/

[Regression]

Well, that was not the real cause of the performance loss, because on kernel 2.6.27 I get really good performance (900-1200 kpps) compared to 2.6.28 (git net-2.6).

The cause of this problem (tracked down together with Robert Olsson) is that on 2.6.28 I have a lot fewer IRQs available; it seems to be at most 34 IRQs. Due to the reduced number of IRQs, the NIU driver cannot get enough IRQs for the interfaces, and starts to use "IO-APIC" based IRQs.

On kernel 2.6.28, my eth2 is using 10 IRQs, all "PCI-MSI-edge", BUT my eth3 is using a single "IO-APIC-fasteoi" IRQ shared with the USB driver... I think that must be my performance problem on 2.6.28.

> With multi-flow tests, Robert Olsson is getting 4.2 mpps rates with
> NIU and pktgen. That's what this card is designed for, good
> multi-flow workload performance, rather than striving for maximum
> single-flow performance.

[Packet performance]

Yes, I know; I do use pktgen and multi-flows (random dest IP+port).

For the two drivers, NIU and Sun's NXGE, my packets-per-second performance on 2.6.27 (with backported NIU fixes) is now:

 With the NIU driver I can route 900 kpps.
 With the NXGE driver (and the enqueue=NULL hack) I can route 1200 kpps.

Actually, I think I can go higher, because I'm limited by my packet generator: I use pktgen (with random dst IP+port) and can only generate 1200 kpps.

(I have actually ordered some new hardware, so I can get a faster pktgen machine and perhaps test it as a router too. I also ordered the hardware because I want to test PCI-express v2.0: I have a prototype 12-port gigabit NIC (from hotlava systems) that supports PCIe v2.0 and has 6x 82575 chips (4 RX/4 TX queues).)

Regards,
 Jesper Brouer
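Jesper's observation that eth3 ends up on a single shared IO-APIC line fits a common fallback pattern: ask for N MSI-X vectors, retry with fewer if the PCI layer reports that only fewer are available, and drop to legacy INTx if MSI-X cannot be enabled at all. As the MSI HOWTO of this period describes it, `pci_enable_msix()` returns 0 on success, a negative value on failure, and a positive count when fewer vectors could be allocated. The model below is hypothetical (`mock_enable_msix()` and `plan_irqs()` are illustrative names) and is not a copy of the NIU driver's code.

```c
#include <assert.h>

enum irq_mode { IRQ_MODE_MSIX, IRQ_MODE_INTX };

struct irq_plan {
	enum irq_mode mode;
	int nvec;
};

/*
 * Hypothetical model of pci_enable_msix() return values: 0 on success,
 * a positive count when only that many vectors are free, or a negative
 * value when none can be allocated.
 */
static int mock_enable_msix(int requested, int available)
{
	if (available <= 0)
		return -1;
	if (requested > available)
		return available;
	return 0;
}

/* Request 'wanted' vectors, shrinking the request or falling back to INTx. */
static struct irq_plan plan_irqs(int wanted, int available)
{
	struct irq_plan p = { IRQ_MODE_INTX, 1 };
	int n = wanted;

	for (;;) {
		int err = mock_enable_msix(n, available);

		if (err == 0) {
			p.mode = IRQ_MODE_MSIX;
			p.nvec = n;
			return p;
		}
		if (err < 0)
			return p;	/* one legacy, possibly shared, line */
		n = err;		/* retry with what is available */
	}
}
```

When the vector pool runs dry, the later port gets `IRQ_MODE_INTX` with a single line, which is the shape of the eth3 symptom.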
From: Jesper Dangaard Brouer <hawk@diku.dk>
Date: Wed, 19 Nov 2008 23:58:12 +0100 (CET)

> Well that was not the real cause of the performance loss. Because
> on kernel 2.6.27 I get really good performance (900-1200kpps)
> compared to 2.6.28 (git net-2.6).
>
> The cause of this problem (tracked down together with Robert Olsson)
> is that on 2.6.28 I have a lot less IRQs available. It seems max 34
> IRQs.
>
> Due the reduced number of IRQs the NIU driver cannot get enough IRQs
> to the interfaces, and starts to use "IO-APIC" based IRQs.

This is almost certainly related to the driver unload bug.

I know you ran into unbuildable/unbootable kernels during a bisect,
but you really need to track down this regression.

There were a lot of IRQ changes, especially on x86. The sequence is
something like:

1) dyn irqs
2) APIC/IO_APIC handling integration
3) by-hand REVERT of dyn irqs; it was done by hand in order to not
   lose the #2 changes
4) interrupt remapping support
Hi Thomas Gleixner,

I have bisected a regression to your commit 3235e936c0cc3589309280b6f59e5096779adae3, "x86: remove sparse irq from Kconfig".

It's actually not necessarily your fault, as your commit simply removes the config option HAVE_SPARSE_IRQ. This reveals the bug/regression I'm exposed to. I guess I should bisect again to find the exact faulty commit, but I'm rather sick of bisecting at the moment, and thought you might have a better idea of what's going wrong. I would rather spend my time performance-tuning the multiqueue routing code...

[The regression]:

During my testing of the Sun Neptune based NICs, on kernel 2.6.27 I get really good performance (900-1200 kpps) compared to 2.6.28 (davem git net-2.6).

The cause of this problem (tracked down together with Robert Olsson) is that on 2.6.28 I have a lot fewer IRQs available; it seems to be at most 34 IRQs. Due to the reduced number of IRQs, the NIU driver cannot get enough IRQs for the interfaces, and starts to use "IO-APIC" based IRQs.

On kernel 2.6.28, my eth2 is using 10 IRQs, all "PCI-MSI-edge", BUT my eth3 is using a single "IO-APIC-fasteoi" IRQ shared with the USB driver. That is my performance problem on 2.6.28.

[Other related bugs]:

Unloading the "niu" driver gives a kernel BUG during deallocation of MSI interrupts. (See the dmesg output below if interested.)

(I have attached the full bisect history.)

Cheers,
 Jesper Brouer

On Wed, 19 Nov 2008, David Miller wrote:

> From: Jesper Dangaard Brouer <hawk@diku.dk>
> Date: Wed, 19 Nov 2008 23:58:12 +0100 (CET)
>
>> Well that was not the real cause of the performance loss. Because
>> on kernel 2.6.27 I get really good performance (900-1200kpps)
>> compared to 2.6.28 (git net-2.6).
>>
>> The cause of this problem (tracked down together with Robert Olsson)
>> is that on 2.6.28 I have a lot less IRQs available. It seems max 34
>> IRQs.
>>
>> Due the reduced number of IRQs the NIU driver cannot get enough IRQs
>> to the interfaces, and starts to use "IO-APIC" based IRQs.
>
> This is almost certainly related to the driver unload bug.
>
> I know you ran into unbuildable/unbootable kernels during a bisect,
> but you really need to track down this regression.

------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:632!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: ehci_hcd bnx2 uhci_hcd zlib_inflate serio_raw hpilo niu(-)
Pid: 3036, comm: rmmod Not tainted (2.6.27-bisect #5) ProLiant DL380 G5
EIP: 0060:[<c021ecac>] EFLAGS: 00010286 CPU: 2
EIP is at msi_free_irqs+0xdc/0xe0
EAX: f6b8f860 EBX: 00000030 ECX: f7156ba8 EDX: c0420500
ESI: f7156800 EDI: f7156ba8 EBP: f6431eb4 ESP: f6431ea8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rmmod (pid: 3036, ti=f6430000 task=f70f9b20 task.ti=f6430000)
Stack: f7156800 f670c400 f7156800 f6431ebc c021ecb8 f6431ec8 c021ef41 f670c000
       f6431edc f809d3f8 f7156800 f80a1ed4 f80a1ed4 f6431ee8 c0219c29 f7156858
       f6431ef8 c026b0d4 f7156858 f7156914 f6431f0c c026b197 f80a1ea0 f80a1ed4
Call Trace:
 [<c021ecb8>] ? msix_free_all_irqs+0x8/0x10
 [<c021ef41>] ? pci_disable_msix+0x31/0x40
 [<f809d3f8>] ? niu_pci_remove_one+0x88/0x8a [niu]
 [<c0219c29>] ? pci_device_remove+0x19/0x40
 [<c026b0d4>] ? __device_release_driver+0x54/0x80
 [<c026b197>] ? driver_detach+0x97/0xa0
 [<c026a475>] ? bus_remove_driver+0x75/0xa0
 [<c026b609>] ? driver_unregister+0x39/0x40
 [<c0219e51>] ? pci_unregister_driver+0x21/0x80
 [<f809a0ad>] ? niu_exit+0xd/0x10 [niu]
 [<c0145d74>] ? sys_delete_module+0x114/0x1d0
 [<c016810a>] ? remove_vma+0x3a/0x50
 [<c0168c29>] ? do_munmap+0x189/0x1e0
 [<c0103229>] ? sysenter_do_call+0x12/0x21
 [<c0330000>] ? quirk_disable_msi+0x30/0x50
Code: b7 43 08 8b 53 1c c1 e0 04 01 d0 ba 01 00 00 00 83 c0 0c 89 10 3b 7b 14 75 aa 8b 43 1c e8 3d 92 ef ff eb a0 5b 31 c0 5e 5f 5d c3 <0f> 0b eb fe 55 89 e5 e8 18 ff ff ff 5d c3 8d b6 00 00 00 00 55
EIP: [<c021ecac>] msi_free_irqs+0xdc/0xe0 SS:ESP 0068:f6431ea8
---[ end trace f72de2e283920207 ]---

~~ -*-text-*-
-------------------------------------------------------
 Bisecting IRQ change: What change reduced the IRQs
-------------------------------------------------------
 Jesper Dangaard Brouer (jdb@comx.dk)
-------------------------------------------------------
 $LastChangedRevision: 786 $
 $Date: 2008-11-20 20:44:51 +0100 (Thu, 20 Nov 2008) $
-------------------------------------------------------

git clone
~~~~~~~~~

+---------
cd /var/kernels/git/davem
git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.git net-2.6-bisect-irqs
+---------

Description / Reason to find
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

During my testing of the Sun Neptune based NICs, on kernel 2.6.27 I get really good performance (900-1200 kpps) compared to 2.6.28 (git net-2.6).

The cause of this problem (tracked down together with Robert Olsson) is that on 2.6.28 I have a lot fewer IRQs available; it seems to be at most 34 IRQs. Due to the reduced number of IRQs, the NIU driver cannot get enough IRQs for the interfaces, and starts to use "IO-APIC" based IRQs.

On kernel 2.6.28, my eth2 is using 10 IRQs, all "PCI-MSI-edge", BUT my eth3 is using a single "IO-APIC-fasteoi" IRQ shared with the USB driver... I think that must be my performance problem on 2.6.28.
Known: Good and bad
~~~~~~~~~~~~~~~~~~~

GOOD: git bisect good v2.6.27
BAD:  git bisect bad 92b29b86fe2e183d44eb467e5e74a5f718ef2e43
 [92b29b86fe2e183d44eb467e5e74a5f718ef2e43]
 #Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

HISTORY:
~~~~~~~~

+--------
cd /var/kernels/git/davem/net-2.6-bisect-irqs/
git bisect start
git bisect good v2.6.27
+--------

+--------------
git bisect bad 92b29b86fe2e183d44eb467e5e74a5f718ef2e43
Bisecting: 3220 revisions left to test after this
[af5c2bd16ac2e5688c3bf46ea1f95112d696d294] x86: fix virt_addr_valid() with CONFIG_DEBUG_VIRTUAL=y, v2
+--------------

CONFIG_LOCALVERSION="-bisect"

+-------------
cp ../net-2.6-bisect/.config .
script make_oldconfig_01
make oldconfig
exit
#Script done, file is make_oldconfig_01
+-------------

+----------------
time make -j6 bzImage modules
#
#real    9m22.739s
#user    16m56.776s
#sys     1m4.672s
+----------------

Booted kernel:
 GOOD: irqs and (niu rmmod good)

+----------------
git bisect good
Bisecting: 1614 revisions left to test after this
[36ac1d2f323f8bf8bc10c25b88f617657720e241] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
+----------------

Compiling:
+----------------
time make -j6 bzImage modules
+----------------

Booted kernel:
 GOOD: irqs and (niu rmmod good)

+-----------
git bisect good
Bisecting: 807 revisions left to test after this
[1aece34833721d64eb33fc15cd923c727296d3d3] container freezer: rename check_if_frozen()
+-----------

Compiling...
+----------------
time make -j6 bzImage modules
#real    10m1.561s
#user    17m23.293s
#sys     1m5.744s
+----------------

Installing...
Booted kernel:
+----
dcu-router-ng:~# uname -a
Linux dcu-router-ng 2.6.27-bisect #3 SMP PREEMPT Thu Nov 20 12:33:02 CET 2008 i686 GNU/Linux
+----

Results: GOOD: irqs and (niu rmmod good)

+------
git bisect good
Bisecting: 403 revisions left to test after this
[1d9a8a47d659f053abeca9ece45651b4d94780c8] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
+------

Compiling...
+----------------
time make -j6 bzImage modules
#real 10m9.371s
#user 17m21.781s
#sys 1m6.052s
+----------------

Installing...
Booting ...

+-------
dcu-router-ng:~# uname -a
Linux dcu-router-ng 2.6.27-bisect #4 SMP PREEMPT Thu Nov 20 12:50:39 CET 2008 i686 GNU/Linux
+-------

Results: GOOD: irqs and (niu rmmod good)

+-------
git-bisect good
Bisecting: 223 revisions left to test after this
[dd3a1db900f2a215a7d7dd71b836e149a6cf5fed] genirq: improve include files
+-------

+----------------
time make -j6 bzImage modules
+----------------

Booting ...

+--------
Linux dcu-router-ng 2.6.27-bisect #5 SMP PREEMPT Thu Nov 20 13:58:34 CET 2008 i686 GNU/Linux
+--------

Results: BAD: irqs and (niu rmmod also BAD)

+-------
cat /proc/interrupts
     CPU0 CPU1 CPU2 CPU3
0: 125 0 0 0 IO-APIC-edge timer
1: 0 0 1 1 IO-APIC-edge i8042
3: 2 1 2 2 IO-APIC-edge serial
8: 0 2 0 0 IO-APIC-edge rtc
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 1 2 1 0 IO-APIC-edge i8042
16: 103 108 108 112 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb6, eth0
17: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2
18: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3
19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4, eth3
20: 0 0 0 0 PCI-MSI-edge eth2
21: 0 0 0 0 PCI-MSI-edge eth2
22: 24 23 23 23 IO-APIC-fasteoi uhci_hcd:usb5, eth2
23: 0 0 0 0 PCI-MSI-edge eth2
24: 0 0 0 0 PCI-MSI-edge eth2
25: 0 0 0 0 PCI-MSI-edge eth2
26: 0 0 0 0 PCI-MSI-edge eth2
27: 0 0 0 0 PCI-MSI-edge eth2
28: 0 0 0 0 PCI-MSI-edge eth2
29: 0 0 0 0 PCI-MSI-edge eth2
30: 0 0 0 0 PCI-MSI-edge eth2
31: 0 0 0 0 PCI-MSI-edge eth2
32: 0 0 0 0 PCI-MSI-edge eth2
34: 271 268 268 264 PCI-MSI-edge cciss0
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 3301 2970 2594 2389 Local timer interrupts
RES: 28 560 6 13 Rescheduling interrupts
CAL: 50 104 99 62 Function call interrupts
TLB: 241 224 287 279 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
SPU: 0 0 0 0 Spurious interrupts
ERR: 0
MIS: 0
+-------

OUTPUT "rmmod niu" (gives segfault) and "dmesg"
+-------
------------[ cut here ]------------
kernel BUG at drivers/pci/msi.c:632!
invalid opcode: 0000 [#1] PREEMPT SMP
Modules linked in: ehci_hcd bnx2 uhci_hcd zlib_inflate serio_raw hpilo niu(-)

Pid: 3036, comm: rmmod Not tainted (2.6.27-bisect #5) ProLiant DL380 G5
EIP: 0060:[<c021ecac>] EFLAGS: 00010286 CPU: 2
EIP is at msi_free_irqs+0xdc/0xe0
EAX: f6b8f860 EBX: 00000030 ECX: f7156ba8 EDX: c0420500
ESI: f7156800 EDI: f7156ba8 EBP: f6431eb4 ESP: f6431ea8
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process rmmod (pid: 3036, ti=f6430000 task=f70f9b20 task.ti=f6430000)
Stack: f7156800 f670c400 f7156800 f6431ebc c021ecb8 f6431ec8 c021ef41 f670c000
       f6431edc f809d3f8 f7156800 f80a1ed4 f80a1ed4 f6431ee8 c0219c29 f7156858
       f6431ef8 c026b0d4 f7156858 f7156914 f6431f0c c026b197 f80a1ea0 f80a1ed4
Call Trace:
 [<c021ecb8>] ? msix_free_all_irqs+0x8/0x10
 [<c021ef41>] ? pci_disable_msix+0x31/0x40
 [<f809d3f8>] ? niu_pci_remove_one+0x88/0x8a [niu]
 [<c0219c29>] ? pci_device_remove+0x19/0x40
 [<c026b0d4>] ? __device_release_driver+0x54/0x80
 [<c026b197>] ? driver_detach+0x97/0xa0
 [<c026a475>] ? bus_remove_driver+0x75/0xa0
 [<c026b609>] ? driver_unregister+0x39/0x40
 [<c0219e51>] ? pci_unregister_driver+0x21/0x80
 [<f809a0ad>] ? niu_exit+0xd/0x10 [niu]
 [<c0145d74>] ? sys_delete_module+0x114/0x1d0
 [<c016810a>] ? remove_vma+0x3a/0x50
 [<c0168c29>] ? do_munmap+0x189/0x1e0
 [<c0103229>] ? sysenter_do_call+0x12/0x21
 [<c0330000>] ? quirk_disable_msi+0x30/0x50
Code: b7 43 08 8b 53 1c c1 e0 04 01 d0 ba 01 00 00 00 83 c0 0c 89 10 3b 7b 14 75 aa 8b 43 1c e8 3d 92 ef ff eb a0 5b 31 c0 5e 5f 5d c3 <0f> 0b eb fe 55 89 e5 e8 18 ff ff ff 5d c3 8d b6 00 00 00 00 55
EIP: [<c021ecac>] msi_free_irqs+0xdc/0xe0 SS:ESP 0068:f6431ea8
---[ end trace f72de2e283920207 ]---
+-------

+------
git-bisect bad
Bisecting: 89 revisions left to test after this
[db4b5525caafd846ec20f95afbc6403c792e22cf] x86: apic_64.c - setup_APIC_timer has to be __cpuinit function
+------

Related config change? (make oldconfig)
+------
script make_oldconfig_02
make oldconfig
Script done, file is make_oldconfig_02
+------

+------
Support sparse irq numbering (HAVE_SPARSE_IRQ) [Y/n/?] (NEW) ? ?Y

  This enables support for sparse irq, esp for msi/msi-x. the irq
  number will be bus/dev/fn + 12bit. You may need if you have lots of
  cards supports msi-x installed.

  If you don't know what to do here, say Y.
+------

Compiling...
+----------------
time make -j6 bzImage modules
#
#real 9m29.556s
#user 17m10.396s
#sys 1m5.056s
+----------------

Booting ...

+-------
Linux dcu-router-ng 2.6.27-bisect #6 SMP PREEMPT Thu Nov 20 14:25:40 CET 2008 i686 GNU/Linux
+-------

The output from /proc/interrupts changed, very weird! BUT eth3 does
use "PCI-MSI-edge" interrupts. I guess this is a GOOD state even
though it looks weird. Unloading the NIU driver is also GOOD.
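The Kconfig help text quoted above says that with sparse irq numbering "the irq number will be bus/dev/fn + 12bit". The large MSI numbers in the listings (0xb01100, or 11538688 in the later decimal output) fit a layout with the PCI bus in bits 20 and up, devfn in bits 12-19, and a 12-bit per-device index in bits 0-11. The sketch below decodes them under that assumption. The layout is inferred from the help text and from the numbers in this log, and `decode_sparse_irq` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Assumed split of a sparse IRQ number: bus << 20 | devfn << 12 | index. */
struct sparse_irq {
    unsigned bus;   /* PCI bus number                 */
    unsigned slot;  /* PCI device number (devfn >> 3) */
    unsigned func;  /* PCI function (devfn & 7)       */
    unsigned low;   /* low 12 bits                    */
};

static struct sparse_irq decode_sparse_irq(uint32_t irq)
{
    struct sparse_irq s;
    unsigned devfn = (irq >> 12) & 0xff;

    s.bus  = (irq >> 20) & 0xff;
    s.slot = devfn >> 3;
    s.func = devfn & 0x7;
    s.low  = irq & 0xfff;
    return s;
}
```

Under this assumption, 0xb01100 (eth3) decodes to bus 0x0b, slot 0, function 1, low bits 0x100. The eth2 numbers decode to function 0 of the same bus and slot, which suggests the two ports are two functions of one PCI device. Within each port, the low bits count down from 0x100.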
+---------
cat /proc/interrupts
     CPU0 CPU1 CPU2 CPU3
0x0: 124 1 0 0 IO-APIC-edge timer
0x1: 1 0 0 1 IO-APIC-edge i8042
0x3: 2 1 2 2 IO-APIC-edge serial
0x8: 1 0 0 1 IO-APIC-edge rtc
0x9: 0 0 0 0 IO-APIC-fasteoi acpi
0xc: 0 1 2 1 IO-APIC-edge i8042
0x10: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb6
0x11: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2
0x12: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3
0x6000fe: 288 289 290 290 PCI-MSI-edge cciss0
0x16: 23 24 23 23 IO-APIC-fasteoi uhci_hcd:usb5
0xb00100: 0 0 0 0 PCI-MSI-edge eth2
0xb000ff: 0 0 0 0 PCI-MSI-edge eth2
0xb000fe: 0 0 0 0 PCI-MSI-edge eth2
0xb000fd: 0 0 0 0 PCI-MSI-edge eth2
0xb000fc: 0 0 0 0 PCI-MSI-edge eth2
0xb000fb: 0 0 0 0 PCI-MSI-edge eth2
0xb000fa: 0 0 0 0 PCI-MSI-edge eth2
0xb000f9: 0 0 0 0 PCI-MSI-edge eth2
0xb000f8: 0 0 0 0 PCI-MSI-edge eth2
0xb000f7: 0 0 0 0 PCI-MSI-edge eth2
0xb000f6: 0 0 0 0 PCI-MSI-edge eth2
0xb000f5: 0 0 0 0 PCI-MSI-edge eth2
0xb000f4: 0 0 0 0 PCI-MSI-edge eth2
0xb01100: 0 0 0 0 PCI-MSI-edge eth3
0xb010ff: 0 0 0 0 PCI-MSI-edge eth3
0xb010fe: 0 0 0 0 PCI-MSI-edge eth3
0xb010fd: 0 0 0 0 PCI-MSI-edge eth3
0xb010fc: 0 0 0 0 PCI-MSI-edge eth3
0xb010fb: 0 0 0 0 PCI-MSI-edge eth3
0xb010fa: 0 0 0 0 PCI-MSI-edge eth3
0xb010f9: 0 0 0 0 PCI-MSI-edge eth3
0xb010f8: 0 0 0 0 PCI-MSI-edge eth3
0xb010f7: 0 0 0 0 PCI-MSI-edge eth3
0xb010f6: 0 0 0 0 PCI-MSI-edge eth3
0x13: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4
0x300100: 210 210 210 208 PCI-MSI-edge eth0
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 3630 3265 3103 2711 Local timer interrupts
RES: 34 226 12 417 Rescheduling interrupts
CAL: 89 55 90 78 Function call interrupts
TLB: 253 205 311 267 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
SPU: 0 0 0 0 Spurious interrupts
ERR: 0
MIS: 0
+---------

I guess it's a GOOD situation...

+------
git-bisect good
Bisecting: 44 revisions left to test after this
[ba374c9baef910fbc5373541d98c50f15e82c3f8] x86: fix HPET compiler error when not using CONFIG_PCI_MSI
+------

Compiling ...
+--------
time make -j6 bzImage modules
#real 9m28.062s
#user 17m7.492s
#sys 1m4.248s
+--------

Installing ...
Booting ...

+------
Linux dcu-router-ng 2.6.27-bisect #7 SMP PREEMPT Thu Nov 20 14:52:45 CET 2008 i686 GNU/Linux
+------

Still looks GOOD (/proc/interrupts still looks weird). And rmmod of
the NIU driver is GOOD.

+------
git-bisect good
Bisecting: 22 revisions left to test after this
[922402f15a85f7a064926eb1db68cc52bc4d4a91] x86: Add UV partition call v4
+------

Compiling ...
+--------
time make -j6 bzImage modules
#real 0m34.622s
#user 0m41.139s
#sys 0m5.812s
+--------

Install ...
Booting ...

+-----
Linux dcu-router-ng 2.6.27-bisect #8 SMP PREEMPT Thu Nov 20 15:04:11 CET 2008 i686 GNU/Linux
+-----

Looks GOOD, and /proc/interrupts changed again! Now the interrupt
numbers are no longer in HEX but in decimal, though they are still
strange/large numbers for MSI. Unloading the NIU driver is GOOD.

+------
cat /proc/interrupts
     CPU0 CPU1 CPU2 CPU3
0: 124 0 0 0 IO-APIC-edge timer
1: 0 0 1 1 IO-APIC-edge i8042
3: 2 2 1 2 IO-APIC-edge serial
8: 0 0 1 1 IO-APIC-edge rtc
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 1 2 1 0 IO-APIC-edge i8042
16: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb6
17: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2
18: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3
6291710: 828 821 828 823 PCI-MSI-edge cciss0
22: 23 24 24 22 IO-APIC-fasteoi uhci_hcd:usb5
11534592: 0 0 0 0 PCI-MSI-edge eth2
11534591: 0 0 0 0 PCI-MSI-edge eth2
11534590: 0 0 0 0 PCI-MSI-edge eth2
11534589: 0 0 0 0 PCI-MSI-edge eth2
11534588: 0 0 0 0 PCI-MSI-edge eth2
11534587: 0 0 0 0 PCI-MSI-edge eth2
11534586: 0 0 0 0 PCI-MSI-edge eth2
11534585: 0 0 0 0 PCI-MSI-edge eth2
11534584: 0 0 0 0 PCI-MSI-edge eth2
11534583: 0 0 0 0 PCI-MSI-edge eth2
11534582: 0 0 0 0 PCI-MSI-edge eth2
11534581: 0 0 0 0 PCI-MSI-edge eth2
11534580: 0 0 0 0 PCI-MSI-edge eth2
11538688: 0 0 0 0 PCI-MSI-edge eth3
11538687: 0 0 0 0 PCI-MSI-edge eth3
11538686: 0 0 0 0 PCI-MSI-edge eth3
11538685: 0 0 0 0 PCI-MSI-edge eth3
11538684: 0 0 0 0 PCI-MSI-edge eth3
11538683: 0 0 0 0 PCI-MSI-edge eth3
11538682: 0 0 0 0 PCI-MSI-edge eth3
11538681: 0 0 0 0 PCI-MSI-edge eth3
11538680: 0 0 0 0 PCI-MSI-edge eth3
11538679: 0 0 0 0 PCI-MSI-edge eth3
11538678: 0 0 0 0 PCI-MSI-edge eth3
19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4
3145984: 9993 9994 9987 9993 PCI-MSI-edge eth0
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 10075 11448 8001 8787 Local timer interrupts
RES: 297 17 349 26 Rescheduling interrupts
CAL: 173 189 95 173 Function call interrupts
TLB: 299 259 330 345 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
SPU: 0 0 0 0 Spurious interrupts
ERR: 0
MIS: 0
+------

+-------
git-bisect good
Bisecting: 11 revisions left to test after this
[a1aca5de08a0cb840a90fb3f729a5940f8d21185] genirq: remove artifacts from sparseirq removal
+-------

Compiling
+--------
time make -j6 bzImage modules
#real 9m30.767s
#user 17m10.808s
#sys 1m6.388s
+--------

Installing ...
Booting ...

+-----
Linux dcu-router-ng 2.6.27-bisect #9 SMP PREEMPT Thu Nov 20 15:28:17 CET 2008 i686 GNU/Linux
+-----

BAD kernel version, max IRQ is 34. And eth3 got assigned an
IO-APIC-fasteoi IRQ shared with uhci_hcd:usb2. Also BAD unloading of
the NIU driver.

The BUG is somewhere in between:

 git log 922402f15a85f7a064926eb1db68cc52bc4d4a91..a1aca5de08a0cb840a90fb3f729a5940f8d21185 | grep ^commit | wc -l
 11 commits

+-------
git-bisect bad
Bisecting: 5 revisions left to test after this
[3235e936c0cc3589309280b6f59e5096779adae3] x86: remove sparse irq from Kconfig
+-------

Compiling...
+--------
time make -j6 bzImage modules
+--------

Install ...
Booting

+------
Linux dcu-router-ng 2.6.27-bisect #10 SMP PREEMPT Thu Nov 20 15:56:10 CET 2008 i686 GNU/Linux
+------

BAD kernel. BAD rmmod NIU driver.

+---------
git bisect bad
Bisecting: 2 revisions left to test after this
[4c66a73f0796dacc2ff0d4af75794ec843ceb3d1] x86: sparse_irq: fix typo in debug print out
+---------

Compiling...
+------
time make -j6 bzImage modules
#real 7m23.814s
#user 12m15.718s
#sys 0m42.183s
+------

Config change prompting:
+-----
Support sparse irq numbering (HAVE_SPARSE_IRQ) [Y/n/?] (NEW) Y
+-----

Installing ...
Booting ...

+-------
Linux dcu-router-ng 2.6.27-bisect #11 SMP PREEMPT Thu Nov 20 16:19:29 CET 2008 i686 GNU/Linux
+-------

GOOD!!!

+--------
cat /proc/interrupts
     CPU0 CPU1 CPU2 CPU3
0: 124 0 0 0 IO-APIC-edge timer
1: 0 0 1 1 IO-APIC-edge i8042
3: 2 2 2 2 IO-APIC-edge serial
8: 1 0 0 1 IO-APIC-edge rtc
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 1 2 1 0 IO-APIC-edge i8042
16: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb6
17: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2
18: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3
6291710: 285 288 293 287 PCI-MSI-edge cciss0
11534592: 0 0 0 0 PCI-MSI-edge eth2
11534591: 0 0 0 0 PCI-MSI-edge eth2
11534590: 0 0 0 0 PCI-MSI-edge eth2
11534589: 0 0 0 0 PCI-MSI-edge eth2
11534588: 0 0 0 0 PCI-MSI-edge eth2
11534587: 0 0 0 0 PCI-MSI-edge eth2
11534586: 0 0 0 0 PCI-MSI-edge eth2
11534585: 0 0 0 0 PCI-MSI-edge eth2
11534584: 0 0 0 0 PCI-MSI-edge eth2
11534583: 0 0 0 0 PCI-MSI-edge eth2
11534582: 0 0 0 0 PCI-MSI-edge eth2
11534581: 0 0 0 0 PCI-MSI-edge eth2
11534580: 0 0 0 0 PCI-MSI-edge eth2
22: 23 24 23 23 IO-APIC-fasteoi uhci_hcd:usb5
11538688: 0 0 0 0 PCI-MSI-edge eth3
11538687: 0 0 0 0 PCI-MSI-edge eth3
11538686: 0 0 0 0 PCI-MSI-edge eth3
11538685: 0 0 0 0 PCI-MSI-edge eth3
11538684: 0 0 0 0 PCI-MSI-edge eth3
11538683: 0 0 0 0 PCI-MSI-edge eth3
11538682: 0 0 0 0 PCI-MSI-edge eth3
11538681: 0 0 0 0 PCI-MSI-edge eth3
11538680: 0 0 0 0 PCI-MSI-edge eth3
11538679: 0 0 0 0 PCI-MSI-edge eth3
11538678: 0 0 0 0 PCI-MSI-edge eth3
19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4
3145984: 244 242 238 241 PCI-MSI-edge eth0
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 3715 3104 2853 2542 Local timer interrupts
RES: 88 52 280 258 Rescheduling interrupts
CAL: 76 75 93 59 Function call interrupts
TLB: 245 241 312 283 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
SPU: 0 0 0 0 Spurious interrupts
ERR: 0
MIS: 0
+---------

+---------
git-bisect good
Bisecting: 1 revisions left to test after this
[7ef0c30dbf96a8d9a234e90c248eb19df3c031be] genirq: define nr_irqs for architectures with GENERIC_HARDIRQS=n
+----------

Compiling ...
+------
time make -j6 bzImage modules
+------

Install...
Boot ...

+-------
Linux dcu-router-ng 2.6.27-bisect #12 SMP PREEMPT Thu Nov 20 16:33:11 CET 2008 i686 GNU/Linux
+-------

+--------
cat /proc/interrupts
     CPU0 CPU1 CPU2 CPU3
0: 124 0 0 0 IO-APIC-edge timer
1: 0 0 1 1 IO-APIC-edge i8042
3: 1 2 2 2 IO-APIC-edge serial
8: 2 0 0 0 IO-APIC-edge rtc
9: 0 0 0 0 IO-APIC-fasteoi acpi
12: 1 2 1 0 IO-APIC-edge i8042
16: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb1, ehci_hcd:usb6
17: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb2
18: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb3
6291710: 268 269 268 267 PCI-MSI-edge cciss0
11534592: 0 0 0 0 PCI-MSI-edge eth2
11534591: 0 0 0 0 PCI-MSI-edge eth2
11534590: 0 0 0 0 PCI-MSI-edge eth2
11534589: 0 0 0 0 PCI-MSI-edge eth2
11534588: 0 0 0 0 PCI-MSI-edge eth2
11534587: 0 0 0 0 PCI-MSI-edge eth2
11534586: 0 0 0 0 PCI-MSI-edge eth2
11534585: 0 0 0 0 PCI-MSI-edge eth2
11534584: 0 0 0 0 PCI-MSI-edge eth2
11534583: 0 0 0 0 PCI-MSI-edge eth2
11534582: 0 0 0 0 PCI-MSI-edge eth2
11534581: 0 0 0 0 PCI-MSI-edge eth2
11534580: 0 0 0 0 PCI-MSI-edge eth2
11538688: 0 0 0 0 PCI-MSI-edge eth3
11538687: 0 0 0 0 PCI-MSI-edge eth3
11538686: 0 0 0 0 PCI-MSI-edge eth3
11538685: 0 0 0 0 PCI-MSI-edge eth3
11538684: 0 0 0 0 PCI-MSI-edge eth3
11538683: 0 0 0 0 PCI-MSI-edge eth3
11538682: 0 0 0 0 PCI-MSI-edge eth3
11538681: 0 0 0 0 PCI-MSI-edge eth3
11538680: 0 0 0 0 PCI-MSI-edge eth3
11538679: 0 0 0 0 PCI-MSI-edge eth3
11538678: 0 0 0 0 PCI-MSI-edge eth3
19: 0 0 0 0 IO-APIC-fasteoi uhci_hcd:usb4
22: 25 23 24 25 IO-APIC-fasteoi uhci_hcd:usb5
3145984: 175 174 176 178 PCI-MSI-edge eth0
NMI: 0 0 0 0 Non-maskable interrupts
LOC: 3508 2902 2765 2489 Local timer interrupts
RES: 238 35 461 6 Rescheduling interrupts
CAL: 61 90 59 81 Function call interrupts
TLB: 257 220 299 300 TLB shootdowns
TRM: 0 0 0 0 Thermal event interrupts
SPU: 0 0 0 0 Spurious interrupts
ERR: 0
MIS: 0
+--------

GOOD.

+----------
git-bisect good
3235e936c0cc3589309280b6f59e5096779adae3 is first bad commit
commit 3235e936c0cc3589309280b6f59e5096779adae3
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Oct 15 13:16:00 2008 +0200

    x86: remove sparse irq from Kconfig

    This code is not ready yet.

    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

:040000 040000 6043e32465556e828de0fbb6aa497b277239af01 2dd75ba207990d83a3a4c7b7b16abccfe2d5e10d M	arch
+--------

Found bad commit: 3235e936c0cc3589309280b6f59e5096779adae3

Git bisect LOG
~~~~~~~~~~~~~~

+-------
git-bisect log
git-bisect start
# good: [3fa8749e584b55f1180411ab1b51117190bac1e5] Linux 2.6.27
git-bisect good 3fa8749e584b55f1180411ab1b51117190bac1e5
# bad: [92b29b86fe2e183d44eb467e5e74a5f718ef2e43] Merge branch 'tracing-v28-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git-bisect bad 92b29b86fe2e183d44eb467e5e74a5f718ef2e43
# good: [af5c2bd16ac2e5688c3bf46ea1f95112d696d294] x86: fix virt_addr_valid() with CONFIG_DEBUG_VIRTUAL=y, v2
git-bisect good af5c2bd16ac2e5688c3bf46ea1f95112d696d294
# good: [36ac1d2f323f8bf8bc10c25b88f617657720e241] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
git-bisect good 36ac1d2f323f8bf8bc10c25b88f617657720e241
# good: [1aece34833721d64eb33fc15cd923c727296d3d3] container freezer: rename check_if_frozen()
git-bisect good 1aece34833721d64eb33fc15cd923c727296d3d3
# good: [1d9a8a47d659f053abeca9ece45651b4d94780c8] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
git-bisect good 1d9a8a47d659f053abeca9ece45651b4d94780c8
# bad: [dd3a1db900f2a215a7d7dd71b836e149a6cf5fed] genirq: improve include files
git-bisect bad dd3a1db900f2a215a7d7dd71b836e149a6cf5fed
# good: [db4b5525caafd846ec20f95afbc6403c792e22cf] x86: apic_64.c - setup_APIC_timer has to be __cpuinit function
git-bisect good db4b5525caafd846ec20f95afbc6403c792e22cf
# good: [ba374c9baef910fbc5373541d98c50f15e82c3f8] x86: fix HPET compiler error when not using CONFIG_PCI_MSI
git-bisect good ba374c9baef910fbc5373541d98c50f15e82c3f8
# good: [922402f15a85f7a064926eb1db68cc52bc4d4a91] x86: Add UV partition call v4
git-bisect good 922402f15a85f7a064926eb1db68cc52bc4d4a91
# bad: [a1aca5de08a0cb840a90fb3f729a5940f8d21185] genirq: remove artifacts from sparseirq removal
git-bisect bad a1aca5de08a0cb840a90fb3f729a5940f8d21185
# bad: [3235e936c0cc3589309280b6f59e5096779adae3] x86: remove sparse irq from Kconfig
git-bisect bad 3235e936c0cc3589309280b6f59e5096779adae3
# good: [4c66a73f0796dacc2ff0d4af75794ec843ceb3d1] x86: sparse_irq: fix typo in debug print out
git-bisect good 4c66a73f0796dacc2ff0d4af75794ec843ceb3d1
# good: [7ef0c30dbf96a8d9a234e90c248eb19df3c031be] genirq: define nr_irqs for architectures with GENERIC_HARDIRQS=n
git-bisect good 7ef0c30dbf96a8d9a234e90c248eb19df3c031be
+-------

Email
~~~~~

To: Thomas Gleixner <tglx@linutronix.de>
    David Miller <davem@davemloft.net>,
    Jesper Dangaard Brouer <jdb@comx.dk>,
    netdev <netdev@vger.kernel.org>,
    linux-kernel@vger.kernel.org,
    Robert Olsson <Robert.Olsson@data.slu.se>
Subj.: Regression: Bisected, IRQ and MSI allocations screwed without sparse irq

Hi Thomas Gleixner,

I have bisected a regression to your commit
3235e936c0cc3589309280b6f59e5096779adae3, "x86: remove sparse irq
from Kconfig".

It's actually not necessarily your fault, as your commit simply
removes the config option HAVE_SPARSE_IRQ. This reveals the
bug/regression I'm exposed to.

I guess I should bisect again to find the exact faulty commit, but I'm
rather sick of bisecting at the moment, and thought you might have a
better idea of what's going wrong. I would rather spend my time
performance tuning the multiqueue routing code...

[The regression]:

During my testing of the Sun Neptune based NICs, on kernel 2.6.27 I
get really good performance (900-1200kpps) compared to 2.6.28 (davem
git net-2.6).

The cause of this problem (tracked down together with Robert Olsson)
is that on 2.6.28 I have a lot fewer IRQs available. It seems to be
max 34 IRQs. Due to the reduced number of IRQs, the NIU driver cannot
get enough IRQs for the interfaces, and starts to use "IO-APIC" based
IRQs.

On kernel 2.6.28: My eth2 is using 10 IRQs, all "PCI-MSI-edge". BUT
my eth3 is using a single IRQ, "IO-APIC-fasteoi", shared with the usb
driver. That is my performance problem on 2.6.28.

[Other related bugs]:

Unloading the "niu" driver will give a kernel BUG.
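The bisect log above needed 12 good/bad verdicts after the initial v2.6.27 (good) and 92b29b86 (bad) pair to narrow about 3220 revisions down to one commit. That count is in line with log2(3220) ≈ 11.7. The core idea is a binary search for the first bad commit. The sketch below models it on a simplified *linear* history. Real `git bisect` works on the commit DAG and chooses midpoints by reachability, so this is an illustration of the principle, not of git's implementation. `first_bad` and `bad_from` are hypothetical names.

```c
#include <stddef.h>

/* Verdict for the commit at position idx: nonzero means "bad". */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/*
 * Binary search for the first bad commit in a linear history.
 * Preconditions: commit 'good' is known good, commit 'bad' is known bad,
 * good < bad, and every commit after the first bad one is also bad.
 * *steps receives the number of commits that had to be tested.
 */
static size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad,
                        void *ctx, unsigned *steps)
{
    *steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;   /* next commit to build and boot */

        (*steps)++;
        if (is_bad(mid, ctx))
            bad = mid;                          /* like "git bisect bad"  */
        else
            good = mid;                         /* like "git bisect good" */
    }
    return bad;
}

/* Simulated test run: commits at or after *(size_t *)ctx are bad. */
static int bad_from(size_t idx, void *ctx)
{
    return idx >= *(const size_t *)ctx;
}
```

For example, with `good = 0`, `bad = 3221` and the regression introduced at position 2000, `first_bad(0, 3221, bad_from, &culprit, &steps)` returns 2000 after at most 12 tests.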
diff --git a/drivers/net/niu.c b/drivers/net/niu.c
index 9acb5d7..d8463b1 100644
--- a/drivers/net/niu.c
+++ b/drivers/net/niu.c
@@ -51,8 +51,7 @@ MODULE_VERSION(DRV_MODULE_VERSION);
 #ifndef readq
 static u64 readq(void __iomem *reg)
 {
-	return (((u64)readl(reg + 0x4UL) << 32) |
-		(u64)readl(reg));
+	return ((u64) readl(reg)) | (((u64) readl(reg + 4UL)) << 32);
 }
 
 static void writeq(u64 val, void __iomem *reg)
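In C the two operands of `|` are unsequenced relative to each other. In the new `readq()` line, the compiler may therefore issue the two `readl()` calls in either order, and barriers inside `readl()` do not change that. Putting each 32-bit access in its own statement makes "low word first, then high word" a language guarantee. The sketch below shows that shape in user space. The fake register file and the `sim_*` functions are hypothetical stand-ins for MMIO and `readl()`/`writel()`, not kernel code, and they record the order in which offsets are touched.

```c
#include <stdint.h>

/* Simulated 32-bit register file standing in for MMIO space. */
static uint32_t fake_regs[4];
static unsigned access_log[8];   /* offsets touched, in order (ring of 8) */
static unsigned n_access;

static uint32_t sim_readl(unsigned off)
{
    access_log[n_access++ % 8] = off;
    return fake_regs[off / 4];
}

static void sim_writel(uint32_t val, unsigned off)
{
    access_log[n_access++ % 8] = off;
    fake_regs[off / 4] = val;
}

/* Each read ends a full expression, so the low word is read first. */
static uint64_t sim_readq(unsigned off)
{
    uint64_t lo = sim_readl(off);
    uint64_t hi = sim_readl(off + 4);

    return lo | (hi << 32);
}

/* Same ordering for writes: low word, then high word. */
static void sim_writeq(uint64_t val, unsigned off)
{
    sim_writel((uint32_t)val, off);
    sim_writel((uint32_t)(val >> 32), off + 4);
}
```

After `sim_writeq(0x1122334455667788ULL, 0)`, a subsequent `sim_readq(0)` returns the same value, and `access_log` shows offset 0 accessed before offset 4.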