Message ID | 20090727153548.7c0d9f85@nehalam |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Stephen Hemminger wrote:
> Does this help?
Trying right now, will report results as soon as I have them.
Rene
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iEYEARECAAYFAkpup84ACgkQq7SPDcPCS96z8wCfZmWQMt1f5DHdOtsI1oCouqGU
dXwAoMXHAXKJNmZaWLiM6WjoIxEQWNlg
=+qLh
-----END PGP SIGNATURE-----
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Stephen Hemminger wrote:
> Does this help?
>
> --- a/drivers/net/sky2.c 2009-07-27 15:28:27.653757064 -0700
> +++ b/drivers/net/sky2.c 2009-07-27 15:34:24.358730966 -0700
> @@ -2763,6 +2763,11 @@ static int sky2_poll(struct napi_struct
>          int work_done = 0;
>          u16 idx;
>
> +        if (unlikely(status == ~0)) {
> +                dev_info(&hw->pdev->dev, "device status error\n");
> +                goto clear_napi;
> +        }
> +
>          if (unlikely(status & Y2_IS_ERROR))
>                  sky2_err_intr(hw, status);
>
> @@ -2779,6 +2784,7 @@ static int sky2_poll(struct napi_struct
>                  goto done;
>          }
>
> +clear_napi:
>          napi_complete(napi);
>          sky2_read32(hw, B0_Y2_SP_LISR);
>  done:

With this applied, the behaviour is certainly different: On first
networking restart, the kernel still continues to run although there are
some phy errors. On the second restart, there is no Oops, but a BUG.

[~]# /etc/init.d/networking restart
Reconfiguring network interfaces...Removed VLAN -:quara.6:-
[ 269.681295] sky2 0000:03:00.0: dmz: phy I/O error
[ 269.686079] sky2 0000:03:00.0: dmz: phy I/O error
[ 269.691000] sky2 0000:03:00.0: dmz: phy I/O error
[ 269.695880] sky2 0000:03:00.0: dmz: phy I/O error
[ 269.700751] sky2 0000:03:00.0: dmz: phy I/O error
[ 269.705613] sky2 0000:03:00.0: dmz: phy I/O error
[ 269.710519] sky2 0000:03:00.0: dmz: phy I/O error
[ 269.715420] sky2 0000:03:00.0: dmz: phy I/O error
[ 269.720290] sky2 0000:03:00.0: dmz: phy I/O error
[ 269.725203] sky2 0000:03:00.0: dmz: phy I/O error
Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
Added VLAN with VID == 6 to IF -:testnet:-
[~]# /etc/init.d/networking restart
Reconfiguring network interfaces...[ 298.296616] ICMPv6 NA: someone advertises our address on lan!
[ 299.360719] lan: hw csum failure.
[ 299.364169] Pid: 11563, comm: sh Not tainted 2.6.28.10 #3
[ 299.369763] Call Trace:
[ 299.372317] [<c09eb603>] __skb_checksum_complete_head+0x3e/0x4f
[ 299.378536] [<f827f1f7>] udp_error+0x124/0x198 [nf_conntrack]
[ 299.384571] [<f827f0d3>] udp_error+0x0/0x198 [nf_conntrack]
[ 299.390429] [<f827b7bd>] nf_conntrack_in+0x117/0x72a [nf_conntrack]
[ 299.397138] [<c08718e1>] handle_mm_fault+0x54a/0xbfa
[ 299.402464] [<c0a04da4>] nf_iterate+0x30/0x61
[ 299.407148] [<c0a0a124>] ip_rcv_finish+0x0/0x2af
[ 299.412108] [<c0a04f17>] nf_hook_slow+0x49/0xbd
[ 299.416893] [<c0a0a124>] ip_rcv_finish+0x0/0x2af
[ 299.421760] [<c0a0a5a9>] ip_rcv+0x1d6/0x20e
[ 299.426185] [<c0a0a124>] ip_rcv_finish+0x0/0x2af
[ 299.431067] [<c09ef01d>] netif_receive_skb+0x3f7/0x435
[ 299.436479] [<f8085143>] sky2_poll+0x844/0xc21 [sky2]
[ 299.441780] [<f817f52e>] au_reval_and_lock_fdi+0x7e/0x540 [aufs]
[ 299.448088] [<f817f3aa>] au_reopen_nondir+0x0/0x106 [aufs]
[ 299.453825] [<c09eda45>] net_rx_action+0xb8/0x1f6
[ 299.458886] [<c082f954>] __do_softirq+0x95/0x142
[ 299.463850] [<c082fa49>] do_softirq+0x48/0x57
[ 299.468547] [<c082fbc9>] irq_exit+0x3b/0x78
[ 299.473053] [<c0806642>] do_IRQ+0x7a/0x8c
[ 299.477306] [<c0804e23>] common_interrupt+0x23/0x30
[ 299.482543] [<c089f0f1>] fsstack_copy_inode_size+0x1d/0x3f
[ 299.488314] [<f817b3a4>] au_cpup_attr_timesizes+0x4e/0x58 [aufs]
[ 299.494615] [<f8180b06>] aufs_flush+0x93/0xc9 [aufs]
[ 299.499837] [<c0884a24>] filp_close+0x2e/0x53
[ 299.504559] [<c0884ab4>] sys_close+0x6b/0xa4
[ 299.509178] [<c0803cb2>] syscall_call+0x7/0xb
[ 299.513893] [<c0a60000>] rwsem_down_failed_common+0xa4/0x175
[ 299.621662] ------------[ cut here ]------------
[ 299.625591] kernel BUG at drivers/net/sky2.c:1781!
[ 299.625591] invalid opcode: 0000 [#1] PREEMPT SMP [ 299.625591] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed [ 299.625591] Modules linked in: xt_multiport cpufreq_userspace xt_DSCP xt_length xt_mark xt_dscp xt_MARK xt_CONNMARK xt_comment xt_policy ipt_REDIRECT ip6t_LOG xt_tcpudp ip6table_mangle iptable_mangle ip6table_filter ip6_tables sit tunnel4 8021q garp stp llc ipt_LOG xt_limit xt_state iptable_nat iptable_filter ip_tables x_tables dm_mod p4_clockmod speedstep_lib freq_table tun imq nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipv6 evdev parport_pc parport pcspkr serio_raw i2c_i801 i2c_core iTCO_wdt rng_core intel_agp agpgart squashfs sqlzma unlzma loop aufs exportfs nls_utf8 nls_cp437 ide_generic sd_mod ide_gd_mod ata_generic pata_acpi ata_piix piix ide_pci_generic ide_core skge sky2 thermal_sys [ 299.625591] [ 299.625591] Pid: 11626, comm: ip Not tainted (2.6.28.10 #3) [ 299.625591] EIP: 0060:[<f80836a5>] EFLAGS: 00010256 CPU: 0 [ 299.625591] EIP is at sky2_down+0x84/0x5c3 [sky2] [ 299.625591] EAX: f8010000 EBX: 00000280 ECX: 000006b4 EDX: f7060000 [ 299.625591] ESI: 00000000 EDI: f684e980 EBP: 00000001 ESP: f5e35b78 [ 299.625591] DS: 0068 ES: 0068 FS: 00d8 GS: 0033 SS: 0068 [ 299.625591] Process ip (pid: 11626, ti=f5e34000 task=f4a7e140 task.ti=f5e34000) [ 299.625591] Stack: [ 299.625591] f7060000 00000303 00000004 f7060500 00000004 c09fc8c5 00000000 f7060290 [ 299.625591] f7060000 00001002 00001003 00000001 c09f00bd f7060000 c09efe15 00000000 [ 299.625591] ffffffef 00000000 00000000 f5e35ce4 c09f66b8 f5e35c28 f5e8f410 f7060000 [ 299.625591] Call Trace: [ 299.625591] [<c09fc8c5>] dev_deactivate+0x116/0x13b [ 299.625591] [<c09f00bd>] dev_close+0x5f/0x7b [ 299.625591] [<c09efe15>] dev_change_flags+0x9e/0x14f [ 299.625591] [<c09f66b8>] do_setlink+0x28b/0x349 [ 299.625591] [<f80438b2>] skge_get_stats+0x2f/0x7b [skge] [ 299.625591] [<c09f77a2>] rtnl_newlink+0x292/0x3f7 [ 
299.625591] [<c09f756a>] rtnl_newlink+0x5a/0x3f7 [ 299.625591] [<c09f75aa>] rtnl_newlink+0x9a/0x3f7 [ 299.625591] [<c0862c3d>] find_get_page+0x87/0xaa [ 299.625591] [<c09f7510>] rtnl_newlink+0x0/0x3f7 [ 299.625591] [<c09f74f6>] rtnetlink_rcv_msg+0x188/0x1a2 [ 299.625591] [<c09f736e>] rtnetlink_rcv_msg+0x0/0x1a2 [ 299.625591] [<c0a0390e>] netlink_rcv_skb+0x2d/0x73 [ 299.625591] [<c09f7368>] rtnetlink_rcv+0x19/0x1f [ 299.625591] [<c0a03477>] netlink_unicast+0x1c7/0x229 [ 299.625591] [<c0a03729>] netlink_sendmsg+0x250/0x25d [ 299.625591] [<c09e497a>] sock_sendmsg+0xc7/0xe1 [ 299.625591] [<c083b5fc>] autoremove_wake_function+0x0/0x2d [ 299.625591] [<c083b5fc>] autoremove_wake_function+0x0/0x2d [ 299.625591] [<c083b5fc>] autoremove_wake_function+0x0/0x2d [ 299.625591] [<c09eb21b>] verify_iovec+0x3e/0x6b [ 299.625591] [<c09e4b18>] sys_sendmsg+0x184/0x1e3 [ 299.625591] [<c09e574f>] sys_recvmsg+0x147/0x1e5 [ 299.625591] [<c09e5413>] sys_sendto+0xf9/0x124 [ 299.625591] [<c0a0298c>] netlink_insert+0xd2/0xef [ 299.625591] [<c08757bb>] vma_merge+0x1d7/0x3ac [ 299.625591] [<c09e5dd8>] sys_socketcall+0x177/0x1a9 [ 299.625591] [<c0875f3b>] sys_brk+0xd1/0xd9 [ 299.625591] [<c0803cb2>] syscall_call+0x7/0xb [ 299.625591] [<c0a60000>] rwsem_down_failed_common+0xa4/0x175 [ 299.625591] Code: b5 00 66 08 f8 b8 00 02 00 00 8d 8b 34 04 00 00 89 ca 03 17 89 02 8b 14 24 8b 82 00 05 00 00 8b 00 83 c0 04 66 8b 00 66 40 75 04 <0f> 0b eb fe 89 c8 03 07 8b 00 ba 05 00 00 00 8d 83 28 08 00 00 [ 299.625591] EIP: [<f80836a5>] sky2_down+0x84/0x5c3 [sky2] SS:ESP 0068:f5e35b78 [ 299.969972] sky2 0000:03:00.0: device status error Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel[ 299.982399] ---[ end trace c27f023d76d6060f ]--- :[ 299.621662] ------------[ cut here ]------------ Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] invalid opcode: 0000 [#1] PREEMPT SMP Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... 
kernel:[ 299.625591] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] Process ip (pid: 11626, ti=f5e34000 task=f4a7e140 task.ti=f5e34000) Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] Stack: Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] f7060000 00001002 00001003 00000001 c09f00bd f7060000 c09efe15 00000000 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] ffffffef 00000000 00000000 f5e35ce4 c09f66b8 f5e35c28 f5e8f410 f7060000 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] Call Trace: Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09fc8c5>] dev_deactivate+0x116/0x13b Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09f00bd>] dev_close+0x5f/0x7b Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09efe15>] dev_change_flags+0x9e/0x14f Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.6[ 300.125238] lan: hw csum failure. 25591] [<c09f66[ 300.128944] Pid: 0, comm: swapper Tainted: G D 2.6.28.10 #3 b8>] do_setlink+[ 300.136882] Call Trace: 0x28b/0x349 [ 300.140827] [<c09eb603>] __skb_checksum_complete_head+0x3e/0x4f Message from s[ 300.148422] [<f827f1f7>] udp_error+0x124/0x198 [nf_conntrack] yslogd@gibraltar[ 300.155814] [<f827f0d3>] udp_error+0x0/0x198 [nf_conntrack] 3-esys-master at[ 300.163067] [<f827b7bd>] nf_conntrack_in+0x117/0x72a [nf_conntrack] Jul 28 11:46:45[ 300.170988] [<c08620a9>] cpupri_set+0xcf/0xeb ... 
kernel:[[ 300.177061] [<c08232cf>] enqueue_task_rt+0xfd/0x1ba 299.625591] [[ 300.183686] [<c081f9e3>] enqueue_task+0x52/0x5d <f80438b2>] skge[ 300.189847] [<c0a604e8>] _spin_unlock_irqrestore+0x22/0x39 _get_stats+0x2f/[ 300.196991] [<c0827013>] try_to_wake_up+0x158/0x162 0x7b [skge] [ 300.203501] [<c0a04da4>] nf_iterate+0x30/0x61 Message from s[ 300.209498] [<c0a0a124>] ip_rcv_finish+0x0/0x2af yslogd@gibraltar[ 300.215661] [<c0a04f17>] nf_hook_slow+0x49/0xbd 3-esys-master at[ 300.221842] [<c0a0a124>] ip_rcv_finish+0x0/0x2af Jul 28 11:46:45[ 300.228078] [<c0a0a5a9>] ip_rcv+0x1d6/0x20e ... kernel:[[ 300.233875] [<c0a0a124>] ip_rcv_finish+0x0/0x2af 299.625591] [[ 300.240132] [<c09ef01d>] netif_receive_skb+0x3f7/0x435 <c09f77a2>] rtnl[ 300.246911] [<f8085143>] sky2_poll+0x844/0xc21 [sky2] _newlink+0x292/0[ 300.253594] [<c09eda45>] net_rx_action+0xb8/0x1f6 x3f7 Messa[ 300.259939] [<c082f954>] __do_softirq+0x95/0x142 ge from syslogd@[ 300.266166] [<c082fa49>] do_softirq+0x48/0x57 gibraltar3-esys-[ 300.272137] [<c082fbc9>] irq_exit+0x3b/0x78 master at Jul 28[ 300.277945] [<c0806642>] do_IRQ+0x7a/0x8c 11:46:45 ... [ 300.283559] [<c0804e23>] common_interrupt+0x23/0x30 kernel:[ 299.6[ 300.290055] [<c080a2c6>] mwait_idle+0x2f/0x3b 25591] [<c09f75[ 300.296043] [<c0802ac9>] cpu_idle+0x7a/0xad 6a>] rtnl_newlink+0x5a/0x3f7 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09f75aa>] rtnl_newlink+0x9a/0x3f7 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c0862c3d>] find_get_page+0x87/0xaa Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09f7510>] rtnl_newlink+0x0/0x3f7 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09f74f6>] rtnetlink_rcv_msg+0x188/0x1a2 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... 
kernel:[ 299.625591] [<c09f736e>] rtnetlink_rcv_msg+0x0/0x1a2 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c0a0390e>] netlink_rcv_skb+0x2d/0x73 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09f7368>] rtnetlink_rcv+0x19/0x1f Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c0a03477>] netlink_unicast+0x1c7/0x229 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c0a03729>] netlink_sendmsg+0x250/0x25d Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09e497a>] sock_sendmsg+0xc7/0xe1 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c083b5fc>] autoremove_wake_function+0x0/0x2d Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:last message repeated 2 times Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09eb21b>] verify_iovec+0x3e/0x6b Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09e4b18>] sys_sendmsg+0x184/0x1e3 Message from syslogd@gibraltar3-esys-master at Jul 28 11:46:45 ... kernel:[ 299.625591] [<c09e574f>] sys_recvmsg+0x Message from/etc/network/if-down.d/60address: line 25: 11626 Segmentation fault ip link set dev $DEV down run-parts: /etc/network/if-down.d/60address exited with return code 139 [ 300.968019] sky2 0000:03:00.0: device status error ^C [~]# [ 302.000024] sky2 0000:03:00.0: device status error [ 302.746708] lan: hw csum failure. 
[ 302.750282] Pid: 0, comm: swapper Tainted: G D 2.6.28.10 #3 [ 302.757023] Call Trace: [ 302.759602] [<c09eb603>] __skb_checksum_complete_head+0x3e/0x4f [ 302.765863] [<f827f1f7>] udp_error+0x124/0x198 [nf_conntrack] [ 302.771934] [<f827f0d3>] udp_error+0x0/0x198 [nf_conntrack] [ 302.777799] [<f827b7bd>] nf_conntrack_in+0x117/0x72a [nf_conntrack] [ 302.784380] [<c08210dd>] __wake_up_sync+0x2a/0x3e [ 302.789356] [<c0a604e8>] _spin_unlock_irqrestore+0x22/0x39 [ 302.795129] [<c09e6ca5>] sock_def_readable+0x32/0x5b [ 302.800379] [<c0a6050d>] _read_unlock+0xe/0x21 [ 302.805080] [<c09e8064>] sock_queue_rcv_skb+0xb5/0xbd [ 302.810421] [<c0a25e9f>] __udp_queue_rcv_skb+0x12/0x86 [ 302.815849] [<c0a04da4>] nf_iterate+0x30/0x61 [ 302.820442] [<c0a0a124>] ip_rcv_finish+0x0/0x2af [ 302.825356] [<c0a04f17>] nf_hook_slow+0x49/0xbd [ 302.830416] [<c0a0a124>] ip_rcv_finish+0x0/0x2af [ 302.835510] [<c0a0a5a9>] ip_rcv+0x1d6/0x20e [ 302.840117] [<c0a0a124>] ip_rcv_finish+0x0/0x2af [ 302.845167] [<c09ef01d>] netif_receive_skb+0x3f7/0x435 [ 302.850791] [<f8085143>] sky2_poll+0x844/0xc21 [sky2] [ 302.856337] [<c0811a18>] lapic_next_event+0x10/0x13 [ 302.861672] [<c09eda45>] net_rx_action+0xb8/0x1f6 [ 302.866654] [<c082f954>] __do_softirq+0x95/0x142 [ 302.871590] [<c082fa49>] do_softirq+0x48/0x57 [ 302.876335] [<c082fbc9>] irq_exit+0x3b/0x78 [ 302.880861] [<c0806642>] do_IRQ+0x7a/0x8c [ 302.885184] [<c0804e23>] common_interrupt+0x23/0x30 [ 302.890422] [<c080a2c6>] mwait_idle+0x2f/0x3b [ 302.896051] [<c0802ac9>] cpu_idle+0x7a/0xad [ 303.000027] sky2 0000:03:00.0: device status error [ 303.058758] lan: hw csum failure. 
[ 303.062367] Pid: 0, comm: swapper Tainted: G D 2.6.28.10 #3 [ 303.068861] Call Trace: [ 303.071406] [<c09eb603>] __skb_checksum_complete_head+0x3e/0x4f [ 303.077789] [<f827f1f7>] udp_error+0x124/0x198 [nf_conntrack] [ 303.083871] [<c0811a18>] lapic_next_event+0x10/0x13 [ 303.089068] [<f827f0d3>] udp_error+0x0/0x198 [nf_conntrack] [ 303.094929] [<f827b7bd>] nf_conntrack_in+0x117/0x72a [nf_conntrack] [ 303.101514] [<c0844533>] tick_program_event+0x2c/0x32 [ 303.106851] [<c083e621>] hrtimer_interrupt+0x146/0x16e [ 303.112313] [<c082fb99>] irq_exit+0xb/0x78 [ 303.116669] [<c081218f>] smp_apic_timer_interrupt+0x75/0x7f [ 303.122585] [<c0a04da4>] nf_iterate+0x30/0x61 [ 303.127184] [<c0a0a124>] ip_rcv_finish+0x0/0x2af [ 303.132067] [<c0a04f17>] nf_hook_slow+0x49/0xbd [ 303.136911] [<c0a0a124>] ip_rcv_finish+0x0/0x2af [ 303.141785] [<c0a0a5a9>] ip_rcv+0x1d6/0x20e [ 303.146239] [<c0a0a124>] ip_rcv_finish+0x0/0x2af [ 303.151112] [<c09ef01d>] netif_receive_skb+0x3f7/0x435 [ 303.156552] [<f8085143>] sky2_poll+0x844/0xc21 [sky2] [ 303.161923] [<c08417f0>] getnstimeofday+0x4f/0xd5 [ 303.166879] [<c09eda45>] net_rx_action+0xb8/0x1f6 [ 303.171889] [<c082f954>] __do_softirq+0x95/0x142 [ 303.176758] [<c082fa49>] do_softirq+0x48/0x57 [ 303.181403] [<c082fbc9>] irq_exit+0x3b/0x78 [ 303.185850] [<c0806642>] do_IRQ+0x7a/0x8c [ 303.190128] [<c0804e23>] common_interrupt+0x23/0x30 [ 303.195241] [<c080a2c6>] mwait_idle+0x2f/0x3b [ 303.199879] [<c0802ac9>] cpu_idle+0x7a/0xad [ 304.000039] sky2 0000:03:00.0: device status error [ 304.150934] lan: hw csum failure. 
[ 304.154546] Pid: 0, comm: swapper Tainted: G D 2.6.28.10 #3 [ 304.161288] Call Trace: [ 304.163876] [<c09eb603>] __skb_checksum_complete_head+0x3e/0x4f [ 304.170102] [<f827f1f7>] udp_error+0x124/0x198 [nf_conntrack] [ 304.176113] [<c0811a18>] lapic_next_event+0x10/0x13 [ 304.181260] [<f827f0d3>] udp_error+0x0/0x198 [nf_conntrack] [ 304.187188] [<f827b7bd>] nf_conntrack_in+0x117/0x72a [nf_conntrack] [ 304.193794] [<c0844533>] tick_program_event+0x2c/0x32 [ 304.199092] [<c083e621>] hrtimer_interrupt+0x146/0x16e [ 304.204545] [<c082fb99>] irq_exit+0xb/0x78 [ 304.208887] [<c081218f>] smp_apic_timer_interrupt+0x75/0x7f [ 304.214774] [<c0a04da4>] nf_iterate+0x30/0x61 [ 304.219390] [<c0a0a124>] ip_rcv_finish+0x0/0x2af [ 304.224251] [<c0a04f17>] nf_hook_slow+0x49/0xbd [ 304.229045] [<c0a0a124>] ip_rcv_finish+0x0/0x2af [ 304.233922] [<c0a0a5a9>] ip_rcv+0x1d6/0x20e [ 304.238348] [<c0a0a124>] ip_rcv_finish+0x0/0x2af [ 304.243199] [<c09ef01d>] netif_receive_skb+0x3f7/0x435 [ 304.248634] [<f8085143>] sky2_poll+0x844/0xc21 [sky2] [ 304.253961] [<c0811a18>] lapic_next_event+0x10/0x13 [ 304.259076] [<c09eda45>] net_rx_action+0xb8/0x1f6 [ 304.264063] [<c082f954>] __do_softirq+0x95/0x142 [ 304.268988] [<c082fa49>] do_softirq+0x48/0x57 [ 304.273595] [<c082fbc9>] irq_exit+0x3b/0x78 [ 304.278053] [<c0806642>] do_IRQ+0x7a/0x8c [ 304.282326] [<c0804e23>] common_interrupt+0x23/0x30 [ 304.287467] [<c080a2c6>] mwait_idle+0x2f/0x3b [ 304.292095] [<c0802ac9>] cpu_idle+0x7a/0xad [ 305.000039] sky2 0000:03:00.0: device status error [ 306.000049] sky2 0000:03:00.0: device status error [ 307.000058] sky2 0000:03:00.0: device status error [ 308.000090] sky2 0000:03:00.0: device status error [ 309.000145] sky2 0000:03:00.0: device status error -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAkpuyU0ACgkQq7SPDcPCS95M+gCg/Sc8kO3FmHrQTdIqeKIzq1XI gEwAoJez4jZCId+81exvRH6jF4Lzj922 =ryPP -----END PGP 
SIGNATURE-----
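A note on the guard in the patch quoted above: a register read from a PCI device that no longer responds commonly comes back as all-ones, and an all-ones word also has every error bit set, so the check has to run before any bit test. The following is a minimal userspace sketch of that ordering, not the driver code. `DEMO_IS_ERROR` and the `demo_*` names are hypothetical and chosen only for illustration; the real value of `Y2_IS_ERROR` is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical error bit, chosen only for this sketch; it is not the
 * value of Y2_IS_ERROR in the real driver. */
#define DEMO_IS_ERROR 0x80000000u

/* An all-ones status word necessarily has the error bit set as well. */
static_assert((UINT32_MAX & DEMO_IS_ERROR) != 0,
              "all-ones status also looks like an error");

enum demo_poll_action {
    DEMO_DEVICE_GONE,   /* status is all-ones: skip every bit test */
    DEMO_HANDLE_ERROR,  /* a genuine error bit is set */
    DEMO_NORMAL_WORK    /* nothing unusual, continue with rx/tx work */
};

/* Classify a raw status word with the guard in place. */
enum demo_poll_action demo_classify_status(uint32_t status)
{
    /* Must come first: otherwise all-ones falls into the error path. */
    if (status == UINT32_MAX)
        return DEMO_DEVICE_GONE;
    if (status & DEMO_IS_ERROR)
        return DEMO_HANDLE_ERROR;
    return DEMO_NORMAL_WORK;
}

/* The same classification without the guard, for comparison. */
enum demo_poll_action demo_classify_unguarded(uint32_t status)
{
    if (status & DEMO_IS_ERROR)
        return DEMO_HANDLE_ERROR;
    return DEMO_NORMAL_WORK;
}
```

With the guard, `demo_classify_status(0xffffffffu)` reports the device as gone, while the unguarded variant sends the same value into the error-handling path.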
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have now tried again with the newest stable kernel (2.6.30.4), without
PaX and squashfs-lzma support. Still the same problem:

[~]# uname -a
Linux gibraltar3-esys-master 2.6.30.4 #9 SMP PREEMPT Fri Jul 31 15:32:55 UTC 2009 i686 GNU/Linux
[~]# /etc/init.d/networking restart
Reconfiguring network interfaces...[ 277.816049] sky2 0000:01:00.0: error interrupt status=0xffffffff
[ 277.822124] sky2 0000:01:00.0: PCI hardware error (0xffff)
[ 277.827656] sky2 0000:01:00.0: PCI Express error (0xffffffff)
[ 277.833449] sky2 wan: ram data read parity error
[ 277.838107] sky2 wan: ram data write parity error
[ 277.842852] sky2 wan: MAC parity error
[ 277.846643] sky2 wan: RX parity error
[ 277.850345] sky2 wan: TCP segmentation error
[ 277.854688] BUG: unable to handle kernel NULL pointer dereference at 0000038d
[ 277.858653] IP: [<f8050ca5>] sky2_mac_intr+0x30/0xc1 [sky2]
[ 277.858653] *pde = 00000000
[ 277.858653] Oops: 0000 [#1] PREEMPT SMP
[ 277.858653] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
[ 277.858653] Modules linked in: xt_multiport cpufreq_userspace xt_DSCP xt_length xt_mark xt_dscp xt_MARK xt_CONNMARK xt_comment xt_policy ipt_REDIRECT ip6t_LOG xt_tcpudp ip6table_mangle iptable_mangle ip6table_filter ip6_tables sit tunnel4 8021q garp stp llc ipt_LOG xt_limit xt_state iptable_nat iptable_filter ip_tables x_tables dm_mod p4_clockmod speedstep_lib freq_table tun imq nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipv6 evdev parport_pc parport serio_raw i2c_i801 pcspkr i2c_core iTCO_wdt rng_core intel_agp loop aufs exportfs nls_utf8 nls_cp437 ide_generic sd_mod ide_gd_mod ata_generic pata_acpi skge ata_piix piix ide_pci_generic ide_core sky2 thermal_sys
[ 277.858653]
[ 277.858653] Pid: 9423, comm: tlsmgr Not tainted (2.6.30.4 #9)
[ 277.858653] EIP: 0060:[<f8050ca5>] EFLAGS: 00010286 CPU: 0
[ 277.858653] EIP is at sky2_mac_intr+0x30/0xc1 [sky2]
[ 277.858653] EAX: f8068f88 EBX: 00000001 ECX: 00000008 EDX: 000000ff
[ 277.858653] ESI: 00000000 EDI: f6901b80 EBP: f6acfce4 ESP: f6acfccc
[ 277.858653] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 277.858653] Process tlsmgr (pid: 9423, ti=f6ace000 task=f7176e70 task.ti=f6ace000)
[ 277.858653] Stack:
[ 277.858653] 00000080 ff901b80 968c5f08 f71ed840 ffffffff ffffffff f6acfd6c f80542d8
[ 277.858653] 00000000 c181d260 00000040 f6901b88 f6acfd08 c04ee2b5 f6901b80 ffffffff
[ 277.858653] c022ded2 f71ef000 00000000 00000000 0000000f c181d260 00000000 00000246
[ 277.858653] Call Trace:
[ 277.858653] [<f80542d8>] ? sky2_poll+0x1d2/0xb1e [sky2]
[ 277.858653] [<c04ee2b5>] ? _spin_unlock_irqrestore+0x31/0x44
[ 277.858653] [<c022ded2>] ? try_to_wake_up+0x291/0x2ac
[ 277.858653] [<c022df62>] ? wake_up_process+0x1b/0x2e
[ 277.858653] [<c04772f4>] ? __qdisc_run+0x73/0x1ca
[ 277.858653] [<c0463cc2>] ? net_rx_action+0x9e/0x1a2
[ 277.858653] [<c0237b5e>] ? __do_softirq+0xb2/0x188
[ 277.858653] [<c0237c73>] ? do_softirq+0x3f/0x5c
[ 277.858653] [<c0237dfd>] ? irq_exit+0x37/0x80
[ 277.858653] [<c0213cfd>] ? smp_apic_timer_interrupt+0x7c/0x9b
[ 277.858653] [<c02037dd>] ? apic_timer_interrupt+0x31/0x38
[ 277.858653] [<c0371524>] ? radix_tree_lookup_slot+0x34/0x79
[ 277.858653] [<c0284852>] ? find_get_page+0x34/0xc6
[ 277.858653] [<c0284c9e>] ? find_lock_page+0x21/0x67
[ 277.858653] [<c0285214>] ? filemap_fault+0x97/0x366
[ 277.858653] [<c0297054>] ? __do_fault+0x56/0x3b0
[ 277.858653] [<c02503a2>] ? getnstimeofday+0x5f/0xf3
[ 277.858653] [<c0252d85>] ? clockevents_program_event+0xe8/0x108
[ 277.858653] [<c0298f33>] ? handle_mm_fault+0x2b9/0x668
[ 277.858653] [<c024b121>] ? hrtimer_interrupt+0x13e/0x15f
[ 277.858653] [<c021d3f6>] ? do_page_fault+0x1fb/0x21b
[ 277.858653] [<c021d1fb>] ? do_page_fault+0x0/0x21b
[ 277.858653] [<c04ee72a>] ? error_code+0x7a/0x80
[ 277.858653] Code: c7 56 53 89 d3 83 ec 0c 65 a1 14 00 00 00 89 45 f0 31 c0 8b 74 97 3c c1 e2 07 89 d0 05 08 0f 00 00 89 55 e8 03 07 8a 10 88 55 ef <f6> 86 8d 03 00 00 02 74 12 0f b6 c2 50 56 68 30 64 05 f8 e8 74
[ 277.858653] EIP: [<f8050ca5>] sky2_mac_intr+0x30/0xc1 [sky2] SS:ESP 0068:f6acfccc
[ 277.858653] CR2: 000000000000038d
[ 278.173200] ---[ end trace bec12ce036036cbf ]---
[ 278.177861] Kernel panic - not syncing: Fatal exception in interrupt
[ 278.184259] Pid: 9423, comm: tlsmgr Tainted: G D 2.6.30.4 #9
[ 278.190654] Call Trace:
[ 278.193140] [<c04eb04e>] ? printk+0x1d/0x30
[ 278.197452] [<c04eaf8c>] panic+0x53/0xf8
[ 278.201506] [<c0206368>] oops_end+0x9f/0xbf
[ 278.205817] [<c021ceb4>] no_context+0x11a/0x135
[ 278.210480] [<c021d005>] __bad_area_nosemaphore+0x136/0x14f
[ 278.216177] [<c0374e68>] ? vsnprintf+0x91/0x332
[ 278.220840] [<c04ee2b5>] ? _spin_unlock_irqrestore+0x31/0x44
[ 278.226622] [<c04ee2b5>] ? _spin_unlock_irqrestore+0x31/0x44
[ 278.232404] [<c0232f3f>] ? release_console_sem+0x18b/0x1c9
[ 278.238015] [<c021d03b>] bad_area_nosemaphore+0x1d/0x34
[ 278.243370] [<c021d30b>] do_page_fault+0x110/0x21b
[ 278.248287] [<c021d1fb>] ? do_page_fault+0x0/0x21b
[ 278.253209] [<c04ee72a>] error_code+0x7a/0x80
[ 278.257693] [<c037007b>] ? kobject_uevent_env+0x42/0x387
[ 278.263141] [<f8050ca5>] ? sky2_mac_intr+0x30/0xc1 [sky2]
[ 278.268673] [<f80542d8>] sky2_poll+0x1d2/0xb1e [sky2]
[ 278.273850] [<c04ee2b5>] ? _spin_unlock_irqrestore+0x31/0x44
[ 278.279632] [<c022ded2>] ? try_to_wake_up+0x291/0x2ac
[ 278.284818] [<c022df62>] ? wake_up_process+0x1b/0x2e
[ 278.289914] [<c04772f4>] ? __qdisc_run+0x73/0x1ca
[ 278.294750] [<c0463cc2>] net_rx_action+0x9e/0x1a2
[ 278.299578] [<c0237b5e>] __do_softirq+0xb2/0x188
[ 278.304321] [<c0237c73>] do_softirq+0x3f/0x5c
[ 278.308801] [<c0237dfd>] irq_exit+0x37/0x80
[ 278.313111] [<c0213cfd>] smp_apic_timer_interrupt+0x7c/0x9b
[ 278.318807] [<c02037dd>] apic_timer_interrupt+0x31/0x38
[ 278.324165] [<c0371524>] ? radix_tree_lookup_slot+0x34/0x79
[ 278.329869] [<c0284852>] find_get_page+0x34/0xc6
[ 278.334619] [<c0284c9e>] find_lock_page+0x21/0x67
[ 278.339447] [<c0285214>] filemap_fault+0x97/0x366
[ 278.344276] [<c0297054>] __do_fault+0x56/0x3b0
[ 278.348842] [<c02503a2>] ? getnstimeofday+0x5f/0xf3
[ 278.353847] [<c0252d85>] ? clockevents_program_event+0xe8/0x108
[ 278.359899] [<c0298f33>] handle_mm_fault+0x2b9/0x668
[ 278.364997] [<c024b121>] ? hrtimer_interrupt+0x13e/0x15f
[ 278.370445] [<c021d3f6>] do_page_fault+0x1fb/0x21b
[ 278.375364] [<c021d1fb>] ? do_page_fault+0x0/0x21b
[ 278.380287] [<c04ee72a>] error_code+0x7a/0x80
[ 278.384779] Rebooting in 30 seconds..

To allow easier debugging, I have now put our whole kernel tree up in a
public (read-only) git repository at
https://www.gibraltar.at/git/linux-2.6-gibraltar.git. The branch for
this kernel is origin/gibraltar-3.0, although the above dump was
produced by a version slightly "older" than HEAD, which did not yet have
the latest PaX patch applied (no PaX and no lzma-squashfs in this kernel).

Any hints/pointers/patches/etc. would be highly appreciated.

best regards,
Rene
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkp20DYACgkQq7SPDcPCS96R3QCdGTJsPiJGLfiWUZk67f6wms9Y
rVgAoPMO2hnT3jwRtY0Qz40NRp0DpKxT
=8NsP
-----END PGP SIGNATURE-----
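One reading of the log above: the burst of unrelated errors (RAM parity, MAC parity, RX parity, TCP segmentation) right after "error interrupt status=0xffffffff" is what you would expect if the handler tests each bit of an all-ones word in turn, rather than five independent hardware faults. The sketch below illustrates that under stated assumptions. The bit positions and the `demo_*` names are hypothetical; only the message strings are taken from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration only; the real sky2 register
 * definitions are not reproduced here. */
struct demo_hw_error {
    uint32_t bit;
    const char *message;
};

static const struct demo_hw_error demo_hw_errors[] = {
    { 1u << 0, "ram data read parity error" },
    { 1u << 1, "ram data write parity error" },
    { 1u << 2, "MAC parity error" },
    { 1u << 3, "RX parity error" },
    { 1u << 4, "TCP segmentation error" },
};

#define DEMO_HW_ERROR_COUNT \
    (sizeof(demo_hw_errors) / sizeof(demo_hw_errors[0]))

/* Collect the message for every error bit set in `status`.
 * Writes at most `cap` pointers into `out` and returns how many it wrote. */
size_t demo_decode_hw_errors(uint32_t status, const char **out, size_t cap)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t i = 0; i < DEMO_HW_ERROR_COUNT && n < cap; i++) {
        if (status & demo_hw_errors[i].bit)
            out[n++] = demo_hw_errors[i].message;
    }
    return n;
}
```

Decoding `0xffffffff` with this table reports every entry at once, which matches the shape of the log; a status word with a single bit set reports just one.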
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rene Mayrhofer wrote:
> I have now tried again with the newest stable kernel (2.6.30.4), without
> PaX and squashfs-lzma support. Still the same problem:

Sorry for replying to myself, but I tried a few more things to do with
MSI. Neither

  find /sys -name "msi_bus" | while read f; do echo 0 > $f; done

nor booting with pci=nomsi changed anything. The oops still happens when
setting the last sky2 interface down.

> To allow easier debugging, I have now put our whole kernel tree up in a
> public (read-only) git repository at
> https://www.gibraltar.at/git/linux-2.6-gibraltar.git. The branch for
> this kernel is origin/gibraltar-3.0, although the above dump was
> produced by a version slightly "older" than HEAD, which did not yet have
> the latest PaX patch applied (no PaX and no lzma-squashfs in this kernel).

I have now updated the branch with both patches (the one from Stephen
and the other one from Mike). I am still testing whether they change
anything with 2.6.30.4 (they didn't help with 2.6.28.10, though...).

best regards,
Rene
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkp3Kh0ACgkQq7SPDcPCS97QHgCgwdpi7RBPZNV1Of85/8qg5DsE
DWoAnjlT8U5wqN9ywxUyUpLyivH/Ex1h
=DCdB
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rene Mayrhofer wrote:
>> To allow easier debugging, I have now put our whole kernel tree up in a
>> public (read-only) git repository at
>> https://www.gibraltar.at/git/linux-2.6-gibraltar.git. The branch for
>> this kernel is origin/gibraltar-3.0, although the above dump was
>> produced by a version slightly "older" than HEAD, which did not yet have
>> the latest PaX patch applied (no PaX and no lzma-squashfs in this kernel).
> I have now updated the branch with both patches (the one from Stephen
> and the other one from Mike). I am still testing whether they change
> anything with 2.6.30.4 (they didn't help with 2.6.28.10, though...).

Result with both patches: there is no immediate crash when setting all
sky2 interfaces down, but I get the following message repeated roughly
every second:

2009-08-04T09:35:31.030812+02:00 gibraltar3-esys-master kernel: [ 592.000071] sky2 0000:01:00.0: device status error
2009-08-04T09:35:32.030908+02:00 gibraltar3-esys-master kernel: [ 593.000058] sky2 0000:01:00.0: device status error
2009-08-04T09:35:33.030839+02:00 gibraltar3-esys-master kernel: [ 594.000082] sky2 0000:01:00.0: device status error
2009-08-04T09:35:34.030864+02:00 gibraltar3-esys-master kernel: [ 595.000118] sky2 0000:01:00.0: device status error
2009-08-04T09:35:35.030975+02:00 gibraltar3-esys-master kernel: [ 596.000259] sky2 0000:01:00.0: device status error
2009-08-04T09:35:36.030974+02:00 gibraltar3-esys-master kernel: [ 597.000198] sky2 0000:01:00.0: device status error
2009-08-04T09:35:37.030980+02:00 gibraltar3-esys-master kernel: [ 598.000203] sky2 0000:01:00.0: device status error

and the network interface fails to work (no ping, nothing with tcpdump,
etc.).

Does anybody have an idea on what might be wrong in sky2_down?

best regards,
Rene
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkp35WsACgkQq7SPDcPCS97wHQCcCYWO2qgg+LdW+BFUmeOXjGVT
B68AniD3Ur2NugPGhuvz3Fxy68Zl+3f4
=5MhE
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mike McCormack wrote:
> 2009/8/4 Rene Mayrhofer <rene.mayrhofer@gibraltar.at>:
>
>> Does anybody have an idea on what might be wrong in sky2_down?
>
> btw. for 2.6.30, I found I could copy sky2.c from the netdev git into
> my 2.6.30 tree if I added the following line at the end of
> sky2_xmit_frame() :
>
> dev->trans_start = jiffies; /* prevent tx timeout */

This seems to be already included in the current netdev git.

Nonetheless, the current unmodified version from netdev git solves the
oops in sky2. I have not diffed my old version against this one, but for
anyone interested in which change fixed the oops, it should be somewhere
in commit 0a1449c in our Gibraltar kernel git repository.

Thanks a lot for that hint!

Rene
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkp4vEQACgkQq7SPDcPCS97BtgCfZy1QTeQOL340hD0HIgTC1c3O
Gy0An1u8zdh4wyU4DchLfxNWzqlJExV+
=0+E4
-----END PGP SIGNATURE-----
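On the `dev->trans_start = jiffies` hint quoted above: the comment "prevent tx timeout" suggests a watchdog that compares the time of the last transmit against a timeout, so refreshing the timestamp on every transmit keeps the watchdog quiet. The following is a minimal userspace sketch of such a check, assuming a wrapping 32-bit tick counter in place of jiffies. The `demo_*` names are hypothetical, and this is not the kernel's watchdog code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a tick counter that wraps around,
 * standing in for the kernel's jiffies. */
typedef uint32_t demo_ticks_t;

/* True if `now` is later than `deadline`, tolerating counter wraparound.
 * Only meaningful when the two are less than 2^31 ticks apart. */
bool demo_time_after(demo_ticks_t now, demo_ticks_t deadline)
{
    return (int32_t)(deadline - now) < 0;
}

/* True if no transmit has been recorded within `timeout` ticks. */
bool demo_tx_timed_out(demo_ticks_t now, demo_ticks_t trans_start,
                       demo_ticks_t timeout)
{
    /* The signed comparison breaks down for very large timeouts. */
    assert(timeout <= (demo_ticks_t)INT32_MAX);
    return demo_time_after(now, trans_start + timeout);
}
```

With a timeout of 500 ticks, a transmit recorded at tick 100 counts as timed out at tick 1000, while one recorded at tick 900 does not. If the transmit path never updates the timestamp, the first case is the one the watchdog would keep hitting.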
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rene Mayrhofer wrote:
> Nonetheless, the current unmodified version from netdev git solves the
> oops in sky2.

Actually, it doesn't. I managed to run networking restart twice without
an oops (with the netdev git version of sky2.c), but after generating
some minor traffic and trying to restart again, I still get this oops:

[~]# /etc/init.d/networking restart
Reconfiguring network interfaces...[ 844.000236] sky2 0000:01:00.0: error interrupt status=0xffffffff
[ 844.007309] sky2 0000:01:00.0: PCI hardware error (0xffff)
[ 844.013657] sky2 0000:01:00.0: PCI Express error (0xffffffff)
[ 844.020290] sky2 wan: ram data read parity error
[ 844.025697] sky2 wan: ram data write parity error
[ 844.031148] sky2 wan: MAC parity error
[ 844.035522] sky2 wan: RX parity error
[ 844.039812] sky2 wan: TCP segmentation error
[ 844.044966] BUG: unable to handle kernel NULL pointer dereference at 0000038d
[ 844.048782] IP: [<f8050d2d>] sky2_mac_intr+0x30/0xc1 [sky2]
[ 844.048782] *pde = 00000000
[ 844.048782] Oops: 0000 [#1] PREEMPT SMP
[ 844.048782] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
[ 844.048782] Modules linked in: xt_multiport cpufreq_userspace xt_DSCP xt_length xt_mark xt_dscp xt_MARK xt_CONNMARK xt_comment xt_policy ipt_REDIRECT ip6t_LOG xt_tcpudp ip6table_mangle iptable_mangle ip6table_filter ip6_tables sit tunnel4 8021q garp stp llc ipt_LOG xt_limit xt_state iptable_nat iptable_filter ip_tables x_tables dm_mod p4_clockmod speedstep_lib freq_table tun imq nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipv6 evdev parport_pc parport serio_raw i2c_i801 i2c_core iTCO_wdt rng_core pcspkr intel_agp loop aufs exportfs nls_utf8 nls_cp437 ide_generic sd_mod ide_gd_mod ata_generic pata_acpi ata_piix skge piix ide_pci_generic ide_core sky2 thermal_sys
[ 844.048782]
[ 844.048782] Pid: 13285, comm: postfix Not tainted (2.6.30.4 #2)
[ 844.048782] EIP: 0060:[<f8050d2d>] EFLAGS: 00010286 CPU: 0
[ 844.048782] EIP is at sky2_mac_intr+0x30/0xc1 [sky2]
[ 844.048782] EAX: f8068f88 EBX: 00000001 ECX: 00000008 EDX: 000000ff
[ 844.048782] ESI: 00000000 EDI: f6901b80 EBP: e1c83e9c ESP: e1c83e84
[ 844.048782] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 844.048782] Process postfix (pid: 13285, ti=e1c82000 task=e1d105b0 task.ti=e1c82000)
[ 844.048782] Stack:
[ 844.048782] 00000080 ff901b80 eda21a93 f71ed840 ffffffff ffffffff e1c83f28 f8054181
[ 844.048782] c022594e 00000000 00000040 f6901b88 00000003 eda21a93 f6901b80 ffffffff
[ 844.048782] c181d7a4 f71ef000 c0243594 00000000 c181d7a0 f702e130 eda21a93 e1c83eec
[ 844.048782] Call Trace:
[ 844.048782] [<f8054181>] ? sky2_poll+0x1d2/0xb65 [sky2]
[ 844.048782] [<c022594e>] ? __wake_up+0x41/0x5c
[ 844.048782] [<c0243594>] ? insert_work+0xa5/0xbf
[ 844.048782] [<c04ee2a5>] ? _spin_unlock_irqrestore+0x31/0x44
[ 844.048782] [<c0243e4b>] ? __queue_work+0x36/0x4d
[ 844.048782] [<c047731c>] ? __qdisc_run+0x73/0x1ca
[ 844.048782] [<c0463ce6>] ? net_rx_action+0x9e/0x1a2
[ 844.048782] [<c0237b6e>] ? __do_softirq+0xb2/0x188
[ 844.048782] [<c0237c83>] ? do_softirq+0x3f/0x5c
[ 844.048782] [<c0237e0d>] ? irq_exit+0x37/0x80
[ 844.048782] [<c0213cfd>] ? smp_apic_timer_interrupt+0x7c/0x9b
[ 844.048782] [<c02037dd>] ? apic_timer_interrupt+0x31/0x38
[ 844.048782] Code: c7 56 53 89 d3 83 ec 0c 65 a1 14 00 00 00 89 45 f0 31 c0 8b 74 97 3c c1 e2 07 89 d0 05 08 0f 00 00 89 55 e8 03 07 8a 10 88 55 ef <f6> 86 8d 03 00 00 02 74 12 0f b6 c2 50 56 68 d0 64 05 f8 e8 df
[ 844.048782] EIP: [<f8050d2d>] sky2_mac_intr+0x30/0xc1 [sky2] SS:ESP 0068:e1c83e84
[ 844.048782] CR2: 000000000000038d
[ 844.345647] ---[ end trace d7398807329498ac ]---
[ 844.351055] Kernel panic - not syncing: Fatal exception in interrupt
[ 844.358606] Pid: 13285, comm: postfix Tainted: G D 2.6.30.4 #2
[ 844.366298] Call Trace:
[ 844.369278] [<c04eb041>] ? printk+0x1d/0x30
[ 844.374388] [<c04eaf7f>] panic+0x53/0xf8
[ 844.379197] [<c0206368>] oops_end+0x9f/0xbf
[ 844.384303] [<c021ceb4>] no_context+0x11a/0x135
[ 844.389791] [<c021d005>] __bad_area_nosemaphore+0x136/0x14f
[ 844.396489] [<c0374f60>] ? vsnprintf+0x91/0x332
[ 844.401994] [<c04ee2a5>] ? _spin_unlock_irqrestore+0x31/0x44
[ 844.408787] [<c04ee2a5>] ? _spin_unlock_irqrestore+0x31/0x44
[ 844.415546] [<c0232f4f>] ? release_console_sem+0x18b/0x1c9
[ 844.422152] [<c021d03b>] bad_area_nosemaphore+0x1d/0x34
[ 844.428464] [<c021d30b>] do_page_fault+0x110/0x21b
[ 844.434271] [<c021d1fb>] ? do_page_fault+0x0/0x21b
[ 844.440026] [<c04ee71a>] error_code+0x7a/0x80
[ 844.445442] [<c037007b>] ? add_uevent_var+0x17/0xb9
[ 844.451413] [<f8050d2d>] ? sky2_mac_intr+0x30/0xc1 [sky2]
[ 844.457981] [<f8054181>] sky2_poll+0x1d2/0xb65 [sky2]
[ 844.464050] [<c022594e>] ? __wake_up+0x41/0x5c
[ 844.469437] [<c0243594>] ? insert_work+0xa5/0xbf
[ 844.475055] [<c04ee2a5>] ? _spin_unlock_irqrestore+0x31/0x44
[ 844.481817] [<c0243e4b>] ? __queue_work+0x36/0x4d
[ 844.487516] [<c047731c>] ? __qdisc_run+0x73/0x1ca
[ 844.493201] [<c0463ce6>] net_rx_action+0x9e/0x1a2
[ 844.498883] [<c0237b6e>] __do_softirq+0xb2/0x188
[ 844.504446] [<c0237c83>] do_softirq+0x3f/0x5c
[ 844.509720] [<c0237e0d>] irq_exit+0x37/0x80
[ 844.514791] [<c0213cfd>] smp_apic_timer_interrupt+0x7c/0x9b
[ 844.521488] [<c02037dd>] apic_timer_interrupt+0x31/0x38
[ 844.527811] Rebooting in 30 seconds..

This is with the newest version of sky2 as of today. Is this any
indication that traffic is needed to reproduce it? E.g. that a certain
number of interrupts must have already been handled to trigger the bug?

Again, any hints would be greatly appreciated (and sorry for being
persistent about this annoying little bug...).

best regards,
Rene
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkp4vV8ACgkQq7SPDcPCS95UvgCfTNzwXKGxXi1SUfrMyLglF5Hf
mCkAnRZqfuA5KYkKCz53leWgxHBOLWMo
=Shq7
-----END PGP SIGNATURE-----
On Wed, 05 Aug 2009 00:59:43 +0200
Rene Mayrhofer <rene@mayrhofer.eu.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Rene Mayrhofer wrote:
> > Nonetheless, the current unmodified version from netdev git solves the
> > oops in sky2.
> Actually, it doesn't. I managed to run networking restart twice without
> an oops (with the netdev git version of sky2.c), but after generating
> some minor traffic and trying to restart again, I still get this oops:
>
> [~]# /etc/init.d/networking restart
> Reconfiguring network interfaces...[ 844.000236] sky2 0000:01:00.0:
> error interrupt status=0xffffffff
>
> [ 844.007309] sky2 0000:01:00.0: PCI hardware error (0xffff)
>
> [ 844.013657] sky2 0000:01:00.0: PCI Express error (0xffffffff)

There is something about the hardware on your system that causes the
Marvell chip to not be present on the bus after the steps taken in
sky2_down. Is there something unique about how it is wired to the
PCI Express bus?

The sky2 driver has to handle the rare case of a dual-port board, so
sky2_down only shuts off part of the chip: the driver turns off the PHY
and stops the receiver/transmitter. It could be that the power control
bits on your hardware turn off more than just the PHY. Or perhaps most
systems have a low-power input that keeps the chip alive for Wake-on-LAN,
and that isn't present on your system.

Maybe an option to not power down the PHY would be the simplest fix.
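The all-ones values in the log (0xffff, 0xffffffff) are what a register read typically returns when the device no longer answers on the bus, which fits the "not present on the bus" diagnosis above. A minimal, self-contained sketch of that test follows; `reg_read_looks_dead` is a hypothetical helper written for illustration only and is not a function in sky2.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of sky2.c): a 32-bit register read
 * that comes back as all ones usually means the device did not
 * respond, so the value must not be decoded as real status bits.
 */
static bool reg_read_looks_dead(uint32_t value)
{
	return value == UINT32_MAX;
}
```

A caller would run this check before interpreting any individual bit of the status word, since an all-ones value makes every error and interrupt bit appear set at once.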
Rene Mayrhofer wrote:
> Again, any hints would be greatly appreciated (and sorry for being
> persistent about this annoying little bug...).

Hi Rene,

Thanks for being persistent in testing :-) Looks like you've got a
fairly unusual piece of hardware, as Stephen indicated.

Would you mind adding the phy_lock fix on top of the latest net-2.6
git version of sky2 and testing that?

thanks,

Mike
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mike McCormack wrote:
>> Again, any hints would be greatly appreciated (and sorry for being
>> persistent about this annoying little bug...).
>
> Thanks for being persistent in testing :-) Looks like you've got a
> fairly unusual piece of hardware, as Stephen indicated.

Indeed, although I didn't think it _that_ unusual. It's just a 19" rack
appliance with 2 expansion slots for 4x LAN ports each. And those are
based around sky2. But we have had problems before with kernel
2.4.34/.36 as well with that hardware. They just weren't as easily
reproducible but manifested themselves in occasional malfunctions of the
network devices that could be solved by an ifdown/ifup cycle.

We still have one spare box and will try that one in case the hardware
is really flaky (which would be strange, given how reproducible it is
right now).

> Would you mind adding the phy_lock fix on top of the latest net-2.6
> git version of sky2 and testing that?

Tried it, doesn't fix the issue. What would be the simplest change to
stop disabling phy when the last device goes down?

best regards,
Rene
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkp5d6AACgkQq7SPDcPCS95OuACggTuTHsZd7m6IqHt0mrqUZbju
G4wAoPfPGr5G05E6HdO9kcKflGaSx7f5
=78yk
-----END PGP SIGNATURE-----
Rene Mayrhofer wrote:
> What would be the simplest change to stop disabling phy when the last
> device goes down?

Commenting out the following line should stop all the phys from
powering off:

sky2_phy_power_down(hw, port);

If you have a chance, please test "sky2: Add a mutex around ethtools
operations" also. It probably won't fix the problem you're seeing, but
you never know...

thanks,

Mike
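Instead of commenting the call out, the same experiment could be expressed as a switch, along the lines of the "option to not power down phy" floated earlier in the thread. The sketch below is a stand-alone mock, assuming nothing about the real driver beyond the call order described in the thread (stop the receiver/transmitter, then power down the PHY); `mock_hw`, `mock_phy_power_down`, `mock_down` and the `keep_phy_powered` flag are all hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-chip state; not the real struct sky2_hw. */
struct mock_hw {
	bool phy_powered[2];	/* one entry per possible port */
};

/* Hypothetical stand-in for sky2_phy_power_down(hw, port). */
static void mock_phy_power_down(struct mock_hw *hw, unsigned int port)
{
	hw->phy_powered[port] = false;
}

/*
 * Mock of the "interface down" path. keep_phy_powered is an assumed
 * knob for illustration; no such option is shown in this thread.
 */
static void mock_down(struct mock_hw *hw, unsigned int port,
		      bool keep_phy_powered)
{
	/* ... receiver and transmitter would be stopped here ... */

	if (!keep_phy_powered)
		mock_phy_power_down(hw, port);
}
```

Making the behaviour switchable would let the same build be tested on both the affected and the unaffected appliance without recompiling the driver.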
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mike McCormack wrote:
> Rene Mayrhofer wrote:
>
>> What would be the simplest change to stop disabling phy when the last
>> device goes down?
>
> Commenting out the following line should stop all the phys from powering off:
>
> sky2_phy_power_down(hw, port);
>
> If you have a chance, please test "sky2: Add a mutex around ethtools operations" also.
> it probably won't fix the problem you're seeing, but you never know...

It seems that hardware is faulty, although in a very "interesting" way.
We tried changing the "slot" modules with 4 NICs each, which did not
change matters. However, another similar hardware appliance works.

I am thus not sure which component is at fault here, as (parts of) the
NICs were changed. Maybe the interrupt controller is weird on the
"faulty" box? ACPI issues?

If anybody wants to track this any further, I am still willing to test
patches.

best regards,
Rene
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkp/9moACgkQq7SPDcPCS979XACfRD6e5ixtX3oPiQCpC78nowO4
TH4Anivuo53VZsRO9LAIDIg7zYurW8UI
=MwmU
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Rene Mayrhofer wrote:
> Mike McCormack wrote:
>> Rene Mayrhofer wrote:
>
>>> What would be the simplest change to stop disabling phy when the last
>>> device goes down?
>> Commenting out the following line should stop all the phys from powering off:
>
>> sky2_phy_power_down(hw, port);
>
>> If you have a chance, please test "sky2: Add a mutex around ethtools operations" also.
>> it probably won't fix the problem you're seeing, but you never know...
>
> It seems that hardware is faulty, although in a very "interesting" way.
> We tried changing the "slot" modules with 4 NICs each, which did not
> change matters. However, another similar hardware appliance works.

Actually, it's not. After producing a bit of traffic, we still see the
same issue with the other hardware. It is therefore not likely to be a
real hardware fault in the sense that a specific appliance is broken.

Even after disabling the sky2_phy_power_down call in sky2_down, I get
the oops on restarting the interfaces:

[~]# /etc/init.d/networking restart
Reconfiguring network interfaces...Removed VLAN -:quara.6:-
RTNETLINK answers: Cannot assign requested address
run-parts: /etc/network/if-up.d/40address exited with return code 2
SIOCSIFFLAGS: Cannot assign requested address
Failed to bring up dmz.
Set name-type for VLAN subsystem. Should be visible in /proc/net/vlan/config
Added VLAN with VID == 6 to IF -:testnet:-
Starting radvd: radvd.
done.
[~]#
[~]#
[~]#
[~]# /etc/init.d/networking restart
Reconfiguring network interfaces...[ 707.000123] sky2 0000:01:00.0: error interrupt status=0xffffffff
[ 707.006858] sky2 0000:01:00.0: PCI hardware error (0xffff)
[ 707.012977] sky2 0000:01:00.0: PCI Express error (0xffffffff)
[ 707.019381] sky2 wan: ram data read parity error
[ 707.024531] sky2 wan: ram data write parity error
[ 707.029775] sky2 wan: MAC parity error
[ 707.033969] sky2 wan: RX parity error
[ 707.038060] sky2 wan: TCP segmentation error
[ 707.042904] BUG: unable to handle kernel NULL pointer dereference at 0000038d
[ 707.046812] IP: [<f8068d2d>] sky2_mac_intr+0x30/0xc1 [sky2]
[ 707.046812] *pde = 00000000
[ 707.046812] Oops: 0000 [#1] PREEMPT SMP
[ 707.046812] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed
[ 707.046812] Modules linked in: xt_multiport cpufreq_userspace ip6t_REJECT xt_DSCP xt_length xt_mark xt_dscp xt_MARK xt_IMQ xt_CONNMARK xt_comment xt_policy ip6t_LOG xt_tcpudp ip6table_mangle iptable_mangle ip6table_filter ip6_tables sit tunnel4 8021q garp stp llc ipt_LOG xt_limit xt_state iptable_nat iptable_filter ip_tables x_tables dm_mod p4_clockmod speedstep_lib freq_table tun imq nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack_ipv6 nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ipv6 evdev parport_pc parport i2c_i801 button i2c_core iTCO_wdt processor serio_raw rng_core intel_agp pcspkr loop aufs exportfs nls_utf8 nls_cp437 ide_generic sd_mod ata_generic pata_acpi ata_piix ide_pci_generic skge ide_core sky2 thermal fan thermal_sys
[ 707.145223]
[ 707.145223] Pid: 11650, comm: 60address Not tainted (2.6.30.4 #3)
[ 707.145223] EIP: 0060:[<f8068d2d>] EFLAGS: 00010286 CPU: 0
[ 707.145223] EIP is at sky2_mac_intr+0x30/0xc1 [sky2]
[ 707.145223] EAX: f8080f88 EBX: 00000001 ECX: 00000008 EDX: 000000ff
[ 707.169707] ESI: 00000000 EDI: f68c8e80 EBP: e1983c08 ESP: e1983bf0
[ 707.169707] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 707.169707] Process 60address (pid: 11650, ti=e1982000 task=dc0ce030 task.ti=e1982000)
[ 707.195323] Stack:
[ 707.195323] 00000080 ff8c8e80 6f11c339 f71cef60 ffffffff ffffffff e1983c94 f806c064
[ 707.195323] c04ee377 6f11c339 00000040 f68c8e88 f70c4bcc 00000000 f68c8e80 ffffffff
[ 707.212226] e1983ca4 f71d5800 c0243594 00000000 c06b7134 f707c230 00000001 00000000
[ 707.212226] Call Trace:
[ 707.212226] [<f806c064>] ? sky2_poll+0x1d2/0xb66 [sky2]
[ 707.232409] [<c04ee377>] ? _spin_unlock+0x29/0x3c
[ 707.232409] [<c0243594>] ? insert_work+0xa5/0xbf
[ 707.232409] [<c047732c>] ? __qdisc_run+0x73/0x1ca
[ 707.245403] [<c0463cf6>] ? net_rx_action+0x9e/0x1a2
[ 707.245403] [<c0237b6e>] ? __do_softirq+0xb2/0x188
[ 707.245403] [<c0237c83>] ? do_softirq+0x3f/0x5c
[ 707.245403] [<c0237e0d>] ? irq_exit+0x37/0x80
[ 707.245403] [<c0213cfd>] ? smp_apic_timer_interrupt+0x7c/0x9b
[ 707.245403] [<c02037dd>] ? apic_timer_interrupt+0x31/0x38
[ 707.245403] [<c029804c>] ? unmap_vmas+0x1df/0x655
[ 707.245403] [<c028d170>] ? ____pagevec_lru_add+0x10b/0x12a
[ 707.245403] [<c029c293>] ? exit_mmap+0xb8/0x158
[ 707.295480] [<c02305e1>] ? mmput+0x2f/0xa5
[ 707.295480] [<c02b43b1>] ? flush_old_exec+0x3a0/0x630
[ 707.295480] [<c02b46da>] ? kernel_read+0x40/0x63
[ 707.295480] [<c02e25e9>] ? load_elf_binary+0x355/0x11e4
[ 707.295480] [<c0299591>] ? __get_user_pages+0x28f/0x310
[ 707.295480] [<c029964a>] ? get_user_pages+0x38/0x50
[ 707.295480] [<c02b3825>] ? get_arg_page+0x38/0x9c
[ 707.295480] [<c02b3b80>] ? search_binary_handler+0xed/0x273
[ 707.295480] [<c02e2294>] ? load_elf_binary+0x0/0x11e4
[ 707.345549] [<c02b4ed8>] ? do_execve+0x24d/0x35c
[ 707.345549] [<c02016f0>] ? sys_execve+0x34/0x6d
[ 707.345549] [<c0202df3>] ? sysenter_do_call+0x12/0x28
[ 707.345549] Code: c7 56 53 89 d3 83 ec 0c 65 a1 14 00 00 00 89 45 f0 31 c0 8b 74 97 3c c1 e2 07 89 d0 05 08 0f 00 00 89 55 e8 03 07 8a 10 88 55 ef <f6> 86 8d 03 00 00 02 74 12 0f b6 c2 50 56 68 b4 e3 06 f8 e8 f3
[ 707.345549] EIP: [<f8068d2d>] sky2_mac_intr+0x30/0xc1 [sky2] SS:ESP 0068:e1983bf0
[ 707.395629] CR2: 000000000000038d
[ 707.401711] ---[ end trace 78f2d616187daf45 ]---
[ 707.406932] Kernel panic - not syncing: Fatal exception in interrupt
Message from[ 707.414147] Pid: 11650, comm: 60address Tainted: G D 2.6.30.4 #3
syslogd@gibralt[ 707.423018] Call Trace:
ar3-esys-master [ 707.427230] [<c04eb055>] ? printk+0x1d/0x30
at Aug 11 10:47:[ 707.433435] [<c04eaf93>] panic+0x53/0xf8
03 ... kernel[ 707.439358] [<c0206368>] oops_end+0x9f/0xbf
:[ 707.046812] [ 707.445562] [<c021ceb4>] no_context+0x11a/0x135
Oops: 0000 [#1] [ 707.452146] [<c021d005>] __bad_area_nosemaphore+0x136/0x14f
PREEMPT SMP [ 707.459910] [<c0374f70>] ? vsnprintf+0x91/0x332
Message from [ 707.466510] [<c04ee2bd>] ? _spin_unlock_irqrestore+0x31/0x44
syslogd@gibralta[ 707.474345] [<c04ee2bd>] ? _spin_unlock_irqrestore+0x31/0x44
r3-esys-master a[ 707.482190] [<c0232f4f>] ? release_console_sem+0x18b/0x1c9
t Aug 11 10:47:0[ 707.489813] [<c021d03b>] bad_area_nosemaphore+0x1d/0x34
3 ... kernel:[ 707.497163] [<c021d30b>] do_page_fault+0x110/0x21b
[ 707.046812] l[ 707.504052] [<c021d1fb>] ? do_page_fault+0x0/0x21b
ast sysfs file: [ 707.510906] [<c04ee732>] error_code+0x7a/0x80
/sys/devices/sys[ 707.517321] [<c037007b>] ? add_uevent_var+0x7/0xb9
tem/cpu/cpu0/cpu[ 707.524189] [<f8068d2d>] ? sky2_mac_intr+0x30/0xc1 [sky2]
freq/scaling_set[ 707.531735] [<f806c064>] sky2_poll+0x1d2/0xb66 [sky2]
speed Mess[ 707.538873] [<c04ee377>] ? _spin_unlock+0x29/0x3c
age from syslogd[ 707.545648] [<c0243594>] ? insert_work+0xa5/0xbf
@gibraltar3-esys[ 707.552333] [<c047732c>] ? __qdisc_run+0x73/0x1ca
- -master at Aug 1[ 707.559115] [<c0463cf6>] net_rx_action+0x9e/0x1a2
[ 707.565893] [<c0237b6e>] __do_softirq+0xb2/0x188
kernel:[ 707.[ 707.572571] [<c0237c83>] do_softirq+0x3f/0x5c
169707] Process [ 707.578968] [<c0237e0d>] irq_exit+0x37/0x80
60address (pid: [ 707.585194] [<c0213cfd>] smp_apic_timer_interrupt+0x7c/0x9b
11650, ti=e19820[ 707.592938] [<c02037dd>] apic_timer_interrupt+0x31/0x38
00 task=dc0ce030[ 707.600296] [<c029804c>] ? unmap_vmas+0x1df/0x655
task.ti=e198200[ 707.607074] [<c028d170>] ? ____pagevec_lru_add+0x10b/0x12a
0) Message[ 707.614707] [<c029c293>] exit_mmap+0xb8/0x158
from syslogd@gi[ 707.621097] [<c02305e1>] mmput+0x2f/0xa5
braltar3-esys-ma[ 707.627024] [<c02b43b1>] flush_old_exec+0x3a0/0x630
ster at Aug 11 1[ 707.633988] [<c02b46da>] ? kernel_read+0x40/0x63
0:47:03 ... k[ 707.640669] [<c02e25e9>] load_elf_binary+0x355/0x11e4
ernel:[ 707.195[ 707.647821] [<c0299591>] ? __get_user_pages+0x28f/0x310
323] Stack: [ 707.655179] [<c029964a>] ? get_user_pages+0x38/0x50
Message from s[ 707.662148] [<c02b3825>] ? get_arg_page+0x38/0x9c
yslogd@gibraltar[ 707.668929] [<c02b3b80>] search_binary_handler+0xed/0x273
3-esys-master at[ 707.676471] [<c02e2294>] ? load_elf_binary+0x0/0x11e4
Aug 11 10:47:03[ 707.683677] [<c02b4ed8>] do_execve+0x24d/0x35c
... kernel:[[ 707.690143] [<c02016f0>] sys_execve+0x34/0x6d
707.195323] c[ 707.696519] [<c0202df3>] sysenter_do_call+0x12/0x28
04ee377 6f11c339[ 707.703480] Rebooting in 30 seconds..

Thus, there really seems to be an uncaught case in sky2.c. When
sky2_phy_power_down is not called, chip should not go down, right? But
still sky2_poll seems to be called (maybe by an interrupt belonging to
another network interface but the same chip)?

Any other hints?

Rene
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkqBMdoACgkQq7SPDcPCS94SugCguCfe45JB+nNi+jE28JynRWtX
2M4Ani/SHmCaslHWy9gf0UT2Egp6Ql1+
=K4Qh
-----END PGP SIGNATURE-----
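One possible reading of the oops, offered as an assumption rather than a confirmed diagnosis: with status=0xffffffff every interrupt bit appears set, including the bits for a second port, and the register dump (EBX: 00000001, ESI: 00000000, fault address 0000038d) would be consistent with sky2_mac_intr dereferencing a per-port pointer that was never set up. The stand-alone mock below sketches the kind of guard that would avoid such a dereference; the bit values, `mock_hw`, `mock_port`, `mock_mac_intr` and `mock_dispatch` are all hypothetical and do not reproduce the real sky2 structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for the mock only. */
#define MOCK_IRQ_MAC1	(1u << 0)
#define MOCK_IRQ_MAC2	(1u << 1)

struct mock_port {
	int intr_count;			/* MAC interrupts handled so far */
};

struct mock_hw {
	unsigned int ports;		/* number of populated ports */
	struct mock_port *dev[2];	/* NULL for a port that does not exist */
};

/* Would crash on a NULL dev[port], like the oops shown above. */
static void mock_mac_intr(struct mock_hw *hw, unsigned int port)
{
	hw->dev[port]->intr_count++;
}

/* Dispatch MAC interrupts, skipping ports that were never set up. */
static int mock_dispatch(struct mock_hw *hw, uint32_t status)
{
	static const uint32_t mac_bit[2] = { MOCK_IRQ_MAC1, MOCK_IRQ_MAC2 };
	int handled = 0;
	unsigned int port;

	for (port = 0; port < 2; port++) {
		if (!(status & mac_bit[port]))
			continue;
		if (port >= hw->ports || hw->dev[port] == NULL)
			continue;	/* bogus bit for an absent port */
		mock_mac_intr(hw, port);
		handled++;
	}
	return handled;
}
```

Such a guard would only hide the symptom; the all-ones status itself still points at the chip dropping off the bus, which is the part Stephen's patch tries to detect.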
Hi everybody,

On Tuesday 11 August 2009 10:54:53 am Rene Mayrhofer wrote:
> Thus, there really seems to be an uncaught case in sky2.c. When
> sky2_phy_power_down is not called, chip should not go down, right? But
> still sky2_poll seems to be called (maybe by an interrupt belonging to
> another network interface but the same chip)?

Is there anything else I could try? We still have this issue, making one
range of hardware appliances unusable with 2.6 kernels...

best regards,
Rene
--- a/drivers/net/sky2.c	2009-07-27 15:28:27.653757064 -0700
+++ b/drivers/net/sky2.c	2009-07-27 15:34:24.358730966 -0700
@@ -2763,6 +2763,11 @@ static int sky2_poll(struct napi_struct
 	int work_done = 0;
 	u16 idx;
 
+	if (unlikely(status == ~0)) {
+		dev_info(&hw->pdev->dev, "device status error\n");
+		goto clear_napi;
+	}
+
 	if (unlikely(status & Y2_IS_ERROR))
 		sky2_err_intr(hw, status);
 
@@ -2779,6 +2784,7 @@ static int sky2_poll(struct napi_struct
 		goto done;
 	}
 
+clear_napi:
 	napi_complete(napi);
 	sky2_read32(hw, B0_Y2_SP_LISR);
 done: