Message ID | 1282055659.2448.58.camel@edumazet-laptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
On Tue, 17 Aug 2010, Eric Dumazet wrote:
> Try following patch to check tg3 receives correct multicast list (its OK
> for me, seen on dmesg output)
>
> [17162.120238] add mc_addr(ha->addr=33:33:00:00:00:01)
> [17162.120270] add mc_addr(ha->addr=01:00:5e:00:00:01)
> [17162.120298] add mc_addr(ha->addr=33:33:ff:87:96:ce)
> [17162.120326] add mc_addr(ha->addr=33:33:ff:5c:00:02)
> [17162.120355] filters=80000001 00000000 00400000 40000000

Right after boot:

$ dmesg | egrep 'eth0|^add mc|^filters='
tg3 0000:03:04.0: eth0: Tigon3 [partno(N/A) rev 9003] (PCIX:133MHz:64-bit)
MAC address 00:24:81:a3:44:24
tg3 0000:03:04.0: eth0: attached PHY is 5714 (10/100/1000Base-T Ethernet)
(WireSpeed[1])
tg3 0000:03:04.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
tg3 0000:03:04.0: eth0: dma_rwctrl[76148000] dma_mask[40-bit]
add mc_addr(ha->addr=33:33:00:00:00:01)
filters=80000000 00000000 00000000 00000000
add mc_addr(ha->addr=33:33:00:00:00:01)
filters=80000000 00000000 00000000 00000000
add mc_addr(ha->addr=33:33:00:00:00:01)
filters=80000000 00000000 00000000 00000000
add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
filters=80000000 00000000 00000000 40000000
ADDRCONF(NETDEV_UP): eth0: link is not ready
add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
filters=80000000 00000000 00000000 40000000
add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
filters=80000000 00000000 00000000 40000000
add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
add mc_addr(ha->addr=33:33:ff:5c:00:02)
filters=80000001 00000000 00000000 40000000
tg3 0000:03:04.0: eth0: Link is up at 1000 Mbps, full duplex
tg3 0000:03:04.0: eth0: Flow control is off for TX and off for RX
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
add mc_addr(ha->addr=33:33:ff:5c:00:02)
add mc_addr(ha->addr=33:33:ff:a3:44:24)
filters=80020001 00000000 00000000 40000000
eth0: no IPv6 routers present

[ ifconfig eth0 allmulti
  (ip l and ifconfig say ALLMULTI is on)
]

add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
add mc_addr(ha->addr=33:33:ff:5c:00:02)
add mc_addr(ha->addr=33:33:ff:a3:44:24)
filters=80020001 00000000 00000000 40000000

[
$ sudo ifconfig eth0 -allmulti
Warning: Interface eth0 still in ALLMULTI mode.
  (ip l and ifconfig say ALLMULTI is now off)
]

add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
add mc_addr(ha->addr=33:33:ff:5c:00:02)
add mc_addr(ha->addr=33:33:ff:a3:44:24)
filters=80020001 00000000 00000000 40000000

[ ifconfig eth0 allmulti
  (same effect)
]

add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
add mc_addr(ha->addr=33:33:ff:5c:00:02)
add mc_addr(ha->addr=33:33:ff:a3:44:24)
filters=80020001 00000000 00000000 40000000

[
$ sudo ifconfig eth0 -allmulti
Warning: Interface eth0 still in ALLMULTI mode.
  (same effect)
]

add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
add mc_addr(ha->addr=33:33:ff:5c:00:02)
add mc_addr(ha->addr=33:33:ff:a3:44:24)
filters=80020001 00000000 00000000 40000000

> But if problem remains even with "ifconfig eth0 allmulti" I suspect a
> NIC firmware problem. (allmulti set to 1 all the 128 bits of filters)

If you expected more bits set in "filters" with allmulti than without it,
that doesn't seem to be the case.

Applied your patch to v2.6.35.

---------
typedef struct me_s {
  char name[] = { "Thomas Habets" };
  char email[] = { "thomas@habets.pp.se" };
  char kernel[] = { "Linux" };
  char *pgpKey[] = { "http://www.habets.pp.se/pubkey.txt" };
  char pgp[] = { "A8A3 D1DD 4AE0 8467 7FDE 0945 286A E90A AD48 E854" };
  char coolcmd[] = { "echo '. ./_&. ./_'>_;. ./_" };
} me_t;
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On Tue, Aug 17, 2010 at 08:58:26AM -0700, Thomas Habets wrote:
> On Tue, 17 Aug 2010, Eric Dumazet wrote:
> > Try following patch to check tg3 receives correct multicast list (its OK
> > for me, seen on dmesg output)
[...]
> > But if problem remains even with "ifconfig eth0 allmulti" I suspect a
> > NIC firmware problem. (allmulti set to 1 all the 128 bits of filters)

I suspect Eric is right.  Thomas, can you give me the output of
'ethtool -i eth0'?

> If you expected more bits set in "filters" with allmulti than without it,
> that doesn't seem to be the case.

"allmulti" has the effect of enabling all 128 bits of the multicast hash
filters.  It doesn't explicitly enable them all though.
On Tuesday, 17 August 2010 at 17:58 +0200, Thomas Habets wrote:
> If you expected more bits set in "filters" with allmulti than without it,
> that doesn't seem to be the case.

Nope, the patch displays the mc list and filter bits only when the device
is neither promiscuous nor allmulti (normal Ethernet mode):

If promiscuous -> a special PROMISC bit is set on the NIC (nothing displayed)
If allmulti    -> all 128 bits are set (not displayed by my patch)

I wanted to make sure the correct list of mc addresses is handled on your
machine. It seems to be the case, so there might be a hardware problem with
multicast rx on this particular NIC.
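Eric's three cases can be modelled with a small sketch. This is a simplified illustration, not tg3 code: `rx_mode`, `RX_*` and `fill_hash_regs()` are hypothetical names, and the per-address CRC step is left out by taking each address's 7-bit hash index as input. Indices 0, 17 and 31 (register 0) and 126 (register 3, bit 30) are exactly the bits set in the `filters=80020001 00000000 00000000 40000000` line above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three receive modes Eric lists. */
enum rx_mode { RX_NORMAL, RX_ALLMULTI, RX_PROMISC };

/*
 * Compute the four 32-bit multicast hash register values for a mode.
 * hash_idx[] holds one precomputed 7-bit hash index (0..127) per
 * subscribed multicast address. Returns 0 when the hash registers are
 * not programmed at all (promiscuous mode relies on a separate PROMISC
 * bit), 1 when regs[] has been filled.
 */
static int fill_hash_regs(enum rx_mode mode, const uint8_t *hash_idx,
			  size_t n, uint32_t regs[4])
{
	if (mode == RX_PROMISC)
		return 0;

	for (int r = 0; r < 4; r++)
		regs[r] = (mode == RX_ALLMULTI) ? 0xffffffffu : 0u;

	if (mode == RX_ALLMULTI)
		return 1;	/* all 128 bits set: every multicast hash matches */

	for (size_t i = 0; i < n; i++) {
		assert(hash_idx[i] < 128);
		/* bits 5-6 pick the register, bits 0-4 the bit within it */
		regs[hash_idx[i] >> 5] |= 1u << (hash_idx[i] & 0x1f);
	}
	return 1;
}
```

In this model allmulti ignores the address list entirely, which fits Eric's point that the printed `filters=` lines only ever reflect normal mode.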
On Tue, 17 Aug 2010, Matt Carlson wrote:
>>> But if problem remains even with "ifconfig eth0 allmulti" I suspect a
>>> NIC firmware problem. (allmulti set to 1 all the 128 bits of filters)
> Thomas, can you give me the output of 'ethtool -i eth0'?

$ sudo ethtool -i eth0
driver: tg3
version: 3.110
firmware-version: 5715-v3.28, UMP 1.15
bus-info: 0000:03:04.0
On Tue, Aug 17, 2010 at 10:29:54AM -0700, Thomas Habets wrote:
> On Tue, 17 Aug 2010, Matt Carlson wrote:
> > Thomas, can you give me the output of 'ethtool -i eth0'?
>
> $ sudo ethtool -i eth0
> driver: tg3
> version: 3.110
> firmware-version: 5715-v3.28, UMP 1.15
> bus-info: 0000:03:04.0

Thanks.  I put the question out to the firmware developer.  While we
wait, can you keep Eric's patch in place and give me the results along
with the output of 'ethtool -d eth0 | grep 0x047' after the problem
happens?

Eric's patch shows the hash registers at the time they are programmed.
I'm interested to see if the values change (by firmware) after the
failure.
On Tue, 17 Aug 2010, Matt Carlson wrote:
> Thanks. I put the question out to the firmware developer. While we
> wait, can you keep Eric's patch in place and give me the results along
> with the output of 'ethtool -d eth0 | grep 0x047' after the problem
> happens?

Sure.

I think the problem occurs shortly after booting, or is triggered by Linux
getting a neighbor table entry for the router. The reason it took a while
for everything to actually stop working is that the router was caching,
and presumably refreshing, its neighbor cache whenever it saw traffic.

That is, maybe it only works if the router sets up its neighbor table
first, and not otherwise.

The problem is there now. The last output in the kernel log about this is:

$ dmesg | egrep 'eth0|^add mc|^filters='
[...]
add mc_addr(ha->addr=33:33:00:00:00:01)
add mc_addr(ha->addr=01:00:5e:00:00:01)
add mc_addr(ha->addr=33:33:ff:5c:00:02)
add mc_addr(ha->addr=33:33:ff:a3:44:24)
filters=80020001 00000000 00000000 40000000

$ sudo ethtool -d eth0 | grep 0x047
0x0470        0x80020001
0x0474        0x00000000
0x0478        0x00000000
0x047c        0x40000000

> Eric's patch shows the hash registers at the time they are programmed.
> I'm interested to see if the values change (by firmware) after the
> failure.

They look the same.

But a strange thing is that if I delete the IPv6 neighbor on the Linux
box (ip ne del 2a00:800:752:1::5c:1 dev eth0), it suddenly answers an ND
solicitation. I tried it just now and it "wakes it up".

Nothing was written to the kernel log when I ran this command, and the
ethtool -d output is the same afterwards as it was before. So unless
there's another code path that changes the registers when I do "ip ne
del", it may still be something else.
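The four `ethtool -d` words at 0x0470-0x047c carry exactly the values of the `filters=` line, so they appear to be the four hash registers the patch writes. The sketch below decodes such a dump into the 7-bit hash indices it accepts, assuming the layout implied by Eric's patch (index bits 5-6 select the register, bits 0-4 the bit). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if 7-bit hash index idx is enabled in the four hash registers. */
static int hash_index_enabled(const uint32_t regs[4], unsigned idx)
{
	assert(idx < 128);
	return (regs[idx >> 5] >> (idx & 0x1f)) & 1u;
}

/* Store every enabled hash index (ascending) in out[]; return the count. */
static size_t enabled_hash_indices(const uint32_t regs[4], uint8_t out[128])
{
	size_t n = 0;

	for (unsigned idx = 0; idx < 128; idx++)
		if (hash_index_enabled(regs, idx))
			out[n++] = (uint8_t)idx;
	return n;
}
```

For the dump above this yields four open indices (0, 17, 31 and 126), one per subscribed address. That is consistent with the thread's finding that the registers still held the expected values after the failure.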
On Tue, Aug 17, 2010 at 11:52:27AM -0700, Thomas Habets wrote:
> On Tue, 17 Aug 2010, Matt Carlson wrote:
[...]
> But a strange thing is that if I delete the ipv6 neighbor on the Linux
> box (ip ne del 2a00:800:752:1::5c:1 dev eth0) it suddenly answers a ND
> solicitation. I tried it just now and it "wakes it up".
[...]

Do you have access to any diagnostic software that might have come with
your machine?
On Tue, 17 Aug 2010, Matt Carlson wrote:
> Do you have access to any diagnostic software that might have come with
> your machine?

I don't know what diagnostic software that would be, and neither do the
other people here. So "no", I guess.
I've continued this a bit off-list but thought I would summarize for the
archives.


Summary
-------
It looks like a firmware issue on the network card. When ILO is enabled, it
shares the first network card with the OS, and while it does so, multicast
is broken. When multicast is broken at L2, IPv6 neighbor discovery breaks.
Only eth0 breaks; eth1 is unaffected.


System
------
HP Proliant DL320 G5p
Xeon 3GHz
1GB RAM
Arch: amd64
NIC: Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet (rev a3)
Debian Lenny (5.0.5)
Kernels: 2.6.35 mainline, 2.6.33.6
Config: http://pastebin.com/raw.php?i=Y6S8iKW7


Problem
-------
The buggy box will not answer IPv6 ND or ping to ff02::1. It may work at
some point in the boot process, but once the box is fully booted it does
not.

If I run "clear ipv6 neighbors" on the neighboring Cisco router (or the
entry times out), the router cannot re-acquire its neighbor entry for the
buggy box. IPv6 breaks instantly until I do one of:
 * Turn on promisc mode long enough for IPv6 ND to do its thing
 * ip ne del <address of neighbor> on the buggy host.


Workarounds
-----------
Any one of these will hide the problem:
 * Set promisc mode on the interface (ip link set promisc on eth0) forever
 * Disable ILO
 * Use eth1 instead of eth0.


Troubleshooting
---------------
Got a kernel patch from Eric Dumazet (eric.dumazet@gmail.com) that outputs
which MAC addresses are being subscribed to, and some registers from the
card. The output is earlier in this thread, along with "ethtool -i eth0"
and some other data.

Managed to get the diagnostic tool[1] booting from a stick (there is no CD
drive in the server), but did not set up memory (himem.sys etc.). Running
b57udiag therefore failed due to insufficient memory at test "Group D.
Driver Associated tests". The card is assumed to be OK anyway.

Matt Carlson (mcarlson@broadcom.com) suspected a firmware bug and asked me
to try disabling ASF and/or IPMI using the diagnostic tool, but running
"setasf -d" and "setipmi -d" inside "b57udiag -cmd" did not seem to stick
across reboot. It stuck properly before reboot (confirmed with setasf -q).
I also tried "b57udiag -u 0", and tried both a C-A-D reboot and
powercycling (by power cord).

At boot Linux still said ASF[1] for eth0 and ASF[0] for eth1:
tg3 0000:03:04.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
tg3 0000:03:04.1: eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[0] TSOcap[1]
(this output never changed throughout the process)
ethtool -d eth1 | grep 0x047 did not change either.

Then I disabled ILO and PXE in the ILO BIOS and the system BIOS,
respectively. That fixed it: eth0 now works with multicast.

I don't use ILO on this server, so in this case that fixes it for me, but
the bug is still there.

At this point Matt thinks I should file a bug report with HP. I will
attempt to do that.

I have more detailed logs of what I did and when, and what the effect was.


Related
-------
May be the same issue as this:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/263260
If so, the problem also shows up with Ubuntu kernels 2.6.26.3,
2.6.26-5-generic and 2.6.27-2-generic, and mainline kernels 2.6.25,
2.6.26 and 2.6.27.


[1] http://www.broadcom.com/support/ethernet_nic/netxtreme_server.php
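Why broken L2 multicast takes IPv6 neighbor discovery down with it: a Neighbor Solicitation used for address resolution is sent to the target's solicited-node multicast group (ff02::1:ffXX:XXXX, built from the low 24 bits of the address, RFC 4291), and on Ethernet that group maps to the MAC 33:33 followed by the group's low 32 bits (RFC 2464). If the NIC drops frames to that MAC, the host never sees the router's solicitations. A minimal POSIX C sketch of the mapping; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Map an IPv6 multicast group to its Ethernet MAC: 33:33 + low 32 bits. */
static void ipv6_mcast_to_mac(const struct in6_addr *group, uint8_t mac[6])
{
	assert(group->s6_addr[0] == 0xff);	/* must be a multicast address */
	mac[0] = 0x33;
	mac[1] = 0x33;
	memcpy(&mac[2], &group->s6_addr[12], 4);
}

/*
 * Compute the MAC of the solicited-node group for a unicast IPv6 address:
 * ff02::1:ff00:0/104 plus the low 24 bits of the address.
 * Returns 0 on success, -1 if addr is not a valid IPv6 literal.
 */
static int solicited_node_mac(const char *addr, uint8_t mac[6])
{
	struct in6_addr ua, group;

	if (inet_pton(AF_INET6, addr, &ua) != 1)
		return -1;

	memset(&group, 0, sizeof(group));
	group.s6_addr[0] = 0xff;		/* ff02::1:ff00:0 prefix */
	group.s6_addr[1] = 0x02;
	group.s6_addr[11] = 0x01;
	group.s6_addr[12] = 0xff;
	memcpy(&group.s6_addr[13], &ua.s6_addr[13], 3);	/* low 24 bits */

	ipv6_mcast_to_mac(&group, mac);
	return 0;
}
```

Assuming the host's link-local address is the usual EUI-64 one derived from its MAC 00:24:81:a3:44:24 (fe80::224:81ff:fea3:4424), this gives 33:33:ff:a3:44:24, one of the addresses in Thomas's `add mc_addr` lines earlier in the thread.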
On Wednesday, 1 September 2010 at 11:21 +0200, Thomas Habets wrote:
> I've continued this a bit off-list but thought I would summarize for the
> archives.
[...]

Thanks a lot Thomas for this very detailed report!
Hi Thomas,

On 09/01/2010 05:21 AM, Thomas Habets wrote:
> I've continued this a bit off-list but thought I would summarize for the
> archives.
>
>
> Summary
> -------
> It looks like a firmware issue on the network card. When ILO is enabled
> it shares the first network card with the OS. When it does this
> multicast is broken. When multicast (on a L2 level) is broken IPv6
> neighbor discovery breaks. Only eth0 breaks, eth1 is unaffected.

So are you running with this set to "Shared Network Port" mode?  I'm
guessing you are.

> System
> ------
> HP Proliant DL320 G5p
> Xeon 3GHz
> 1GB RAM
> Arch: amd64
> NIC: Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet (rev a3)

There was another report on netdev back in 11/2008 on this exact hardware,
with the same problem.

> Problem
> -------
> Buggy box will not answer IPv6 ND or ping to ff02::1. May work at some
> point in the boot process, but once box is fully booted it does not.

I dug up my notes on the problem, and from what I can tell, the receive
multicast filters on the NIC were getting removed, causing both incoming
IPv6 and IPv4 multicast packets to get dropped.  I'm not sure if a fix was
ever developed, or if we ever came to a conclusion on where the bug was:
iLO, tg3, or some other area.

-Brian
Sorry for the late reply. I've been swamped.

On Wed, 1 Sep 2010, Brian Haley wrote:
> So are you running with this set to "Shared Network Port" mode?  I'm
> guessing you are.

Yes, there's no dedicated ILO port.

> There was another report on netdev back in 11/2008 on this exact hardware,
> with the same problem.

I can't seem to find it. Do you happen to have the subject line or
something?

> I dug up my notes on the problem, and from what I can tell, the receive
> multicast filters on the NIC were getting removed, causing both incoming
> IPv6 and IPv4 multicast packets to get dropped.

Sounds about right. From what I understand, though, the relevant registers
were still the same for me when it wasn't working (if that indeed is how
the filter is implemented).
On 09/14/2010 03:56 PM, Thomas Habets wrote:
>
> Sorry for the late reply. I've been swamped.
>
> On Wed, 1 Sep 2010, Brian Haley wrote:
>> So are you running with this set to "Shared Network Port" mode?  I'm
>> guessing you are.
>
> Yes, there's no dedicated ILO port.
>
>> There was another report on netdev back in 11/2008 on this exact
>> hardware, with the same problem.
>
> I can't seem to find it. Do you happen to have the subject line or
> something?

It was actually a month earlier in 2008, I mistyped; here's the link:

http://marc.info/?l=linux-netdev&m=122280545121251&w=2

>> I dug up my notes on the problem, and from what I can tell, the receive
>> multicast filters on the NIC were getting removed, causing both incoming
>> IPv6 and IPv4 multicast packets to get dropped.
>
> Sounds about right. From what I understand the relevant registers were
> still the same for me when it wasn't working though (if that indeed is
> how the filter is implemented).

One of the outcomes of that investigation was to update the firmware
and/or iLO; I'm not sure whether either fixed the problem.

-Brian
diff --git a/drivers/net/tg3.c b/drivers/net/tg3.c
index bc3af78..34510f5 100644
--- a/drivers/net/tg3.c
+++ b/drivers/net/tg3.c
@@ -9317,12 +9317,14 @@ static void __tg3_set_rx_mode(struct net_device *dev)
 		u32 crc;
 
 		netdev_for_each_mc_addr(ha, dev) {
+			pr_err("add mc_addr(ha->addr=%pM)\n", ha->addr);
 			crc = calc_crc(ha->addr, ETH_ALEN);
 			bit = ~crc & 0x7f;
 			regidx = (bit & 0x60) >> 5;
 			bit &= 0x1f;
 			mc_filter[regidx] |= (1 << bit);
 		}
+		pr_err("filters=%08X %08x %08x %08x\n", mc_filter[0], mc_filter[1], mc_filter[2], mc_filter[3]);
 
 		tw32(MAC_HASH_REG_0, mc_filter[0]);
 		tw32(MAC_HASH_REG_1, mc_filter[1]);
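To check the `filters=` lines independently of the hardware, the patched loop can be replayed in userspace. This is a sketch that assumes `calc_crc()` in tg3.c is the usual bitwise reflected CRC-32 (polynomial 0xedb88320, initial value 0xffffffff, result inverted). It is re-implemented here rather than copied from the driver, and `mc_hash_filters()` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ALEN 6

/* Bitwise reflected CRC-32: poly 0xedb88320, init 0xffffffff, inverted. */
static uint32_t calc_crc(const uint8_t *buf, size_t len)
{
	uint32_t reg = 0xffffffffu;

	for (size_t j = 0; j < len; j++) {
		reg ^= buf[j];
		for (int k = 0; k < 8; k++) {
			uint32_t lsb = reg & 1u;

			reg >>= 1;
			if (lsb)
				reg ^= 0xedb88320u;
		}
	}
	return ~reg;
}

/* Replays the patched netdev_for_each_mc_addr() loop over n addresses. */
static void mc_hash_filters(const uint8_t (*addrs)[ETH_ALEN], size_t n,
			    uint32_t mc_filter[4])
{
	assert(n == 0 || addrs != NULL);
	for (int r = 0; r < 4; r++)
		mc_filter[r] = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t crc = calc_crc(addrs[i], ETH_ALEN);
		uint32_t bit = ~crc & 0x7f;		/* 7-bit hash index */
		uint32_t regidx = (bit & 0x60) >> 5;	/* which of 4 registers */

		bit &= 0x1f;				/* bit within register */
		mc_filter[regidx] |= 1u << bit;
	}
}
```

Feeding it the four post-boot addresses (33:33:00:00:00:01, 01:00:5e:00:00:01, 33:33:ff:5c:00:02, 33:33:ff:a3:44:24) gives 80020001 00000000 00000000 40000000. Those are the same values as the dmesg line and the `ethtool -d` dump, so the driver-side hash matches what the hardware held.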