Message ID | 20080930213052.GA6449@ami.dom.local |
---|---|
State | Not Applicable, archived |
Delegated to: | Jeff Garzik |
Jarek,

With the patch #2, there have been only one reboot after the machine was
frozen. I did not notice that was happening for there was no kernel
panic message but a frozen display only.
Then I could only write down partially the values displayed after
AX25_DBG: c446c220, c..... 3, 5, 240
Second number was not 000000 as it is most of the time.
The machine rebooted after the usual 60 seconds delay and since then
there was no kernel panic.
This is the first time kernel 2.6.27-rc7 does not panic within a minute.
The system is up since about 20 minutes now with frequent AX25_DBG messages.

I am sorry that I must go to sleep now to be at work early in the morning.
I will be away from my Linux box for a few days.
Then we should probably interrupt this interesting bug hunting.
I will send more results when I am back.

Thank you and best regards,

Bernard

Jarek Poplawski wrote:
> On Tue, Sep 30, 2008 at 10:59:35PM +0200, Bernard Pidoux F6BVP wrote:
>> Jarek,
>>
>> Yes I am using n2kpci/8390 driver.
>> The second patch seems to have removed the inconsistent lock state.
>
> It's fine: this patch is only for this one (simpler) problem.
>
>> But the kernel panic still occured systematically.
>> However I did not catch netconsole messages since the patch prevented
>> transmission to remote console via ethernet.
>>
>> Before the machine rebooted I only noted the following information at
>> the bottom of the local console page :
>>
>> EIP: [<.....>] datagram_poll + 0xe9/0xf0
>>
>> Does it help ?
>
> Probably the beginning of the same oops as before.
>
> Here is a debugging patch #2, which should give us more details.
> Apply after reverting debugging patch #1 (lib8390 patch should stay).
> Alas the oops in datagram_poll is still possible.
>
> Thanks,
> Jarek P.
>
> ---
>
>  net/core/sock.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 2d358dd..3ad8eaa 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -960,6 +960,15 @@ void sk_free(struct sock *sk)
>  {
>  	struct sk_filter *filter;
>  
> +	if (sk->sk_socket) {
> +		printk("AX25_DBG: %p, %p, %u, %u, %u, %p\n", sk, sk->sk_socket,
> +			sk->sk_family, sk->sk_type, sk->sk_protocol, sk->sk_socket->sk);
> +		if (sk->sk_family == 3 && sk->sk_type == 5 && sk->sk_protocol == 240) {
> +			WARN_ON_ONCE(1);
> +			sock_orphan(sk);
> +		}
> +	}
> +
>  	if (sk->sk_destruct)
>  		sk->sk_destruct(sk);
>  
>
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 01, 2008 at 12:49:02AM +0200, Bernard Pidoux F6BVP wrote:
> Jarek,
>
> With the patch #2, there have been only one reboot after the machine was
> frozen. I did not notice that was happening for there was no kernel
> panic message but a frozen display only.
> Then I could only write down partially the values displayed after
> AX25_DBG: c446c220, c..... 3, 5, 240
> Second number was not 000000 as it is most of the time.
> The machine rebooted after the usual 60 seconds delay and since then
> there was no kernel panic.
> This is the first time kernel 2.6.27-rc7 does not panic within a minute.
> The system is up since about 20 minutes now with frequent AX25_DBG messages.

Hmm... Since this was intended mainly for debugging something went
wrong. (I expected to get at least one warning.)

>
> I am sorry that I must go to sleep now to be at work early in the morning.
> I will be away from my Linux box for a few days.
> Then we should probably interrupt this interesting bug hunting.
> I will send more results when I am back.

OK, I'll try to look at this in the meantime.

Cheers,
Jarek P.
Hi Jarek,

Finally I am able to access the faulty 2.6.27-rc7 f6bvp-9 system via ssh.
I can read the /var/log/kernel/message file immediately after a kernel failure
and a reboot.
When this is done, the system is stable until I start the FPAC suite
applications (fpad, fpacwpd ...) as shown below.

Oct 2 16:50:00 f6bvp-9 kernel: AX25_DBG: c36fc338, 00000000, 1, 0, 0
Oct 2 16:50:00 f6bvp-9 kernel: AX25_DBG: c36fc008, 00000000, 1, 1, 0
Oct 2 16:50:00 f6bvp-9 kernel: AX25_DBG: c36fc338, 00000000, 1, 0, 0
Oct 2 16:50:00 f6bvp-9 kernel: AX25_DBG: c36fc008, 00000000, 1, 1, 0
Oct 2 16:50:09 f6bvp-9 fpad: starting FPAD
Oct 2 16:50:09 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct 2 16:50:09 f6bvp-9 kernel: AX25_DBG: c15245f0, 00000000, 11, 5, 0
Oct 2 16:50:09 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct 2 16:50:09 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct 2 16:50:09 f6bvp-9 fpad: FPAD becomes a daemon
Oct 2 16:50:09 f6bvp-9 fpacwpd[5169]: Starting
Oct 2 16:50:09 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct 2 16:50:10 f6bvp-9 last message repeated 6 times
Oct 2 16:50:10 f6bvp-9 ax25ipd: assemble_kiss: dumped - control byte non-zero
Oct 2 16:50:10 f6bvp-9 kernel: mkiss: ax4: Trying crc-smack
Oct 2 16:50:10 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct 2 16:50:10 f6bvp-9 last message repeated 7 times
Oct 2 16:50:10 f6bvp-9 fpad: FPAD opened WP service
Oct 2 16:50:11 f6bvp-9 ax25ipd: assemble_kiss: dumped - control byte non-zero
Oct 2 16:50:11 f6bvp-9 kernel: mkiss: ax4: Trying crc-flexnet
Oct 2 16:50:11 f6bvp-9 kernel: mkiss: ax0: Trying crc-smack
Oct 2 16:50:13 f6bvp-9 kernel: AX25_DBG: c1520630, 00000000, 11, 5, 0
Oct 2 16:50:16 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct 2 16:50:16 f6bvp-9 kernel: mkiss: ax0: Trying crc-flexnet
Oct 2 16:50:18 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct 2 16:50:21 f6bvp-9 kernel: AX25_DBG: c36090e0, 00000000, 3, 2, 207
Oct 2 16:50:21 f6bvp-9 kernel: mkiss: ax1: Trying crc-smack
Oct 2 16:50:22 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct 2 16:50:25 f6bvp-9 kernel: AX25_DBG: c14662d0, 00000000, 11, 5, 0
Oct 2 16:50:26 f6bvp-9 kernel: mkiss: ax1: Trying crc-flexnet
Oct 2 16:50:28 f6bvp-9 kernel: AX25_DBG: c4cd0a68, 00000000, 11, 5, 0
Oct 2 16:50:28 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct 2 16:50:30 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct 2 16:50:30 f6bvp-9 last message repeated 2 times
Oct 2 16:50:31 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct 2 16:50:31 f6bvp-9 kernel: AX25_DBG: c146a5d0, 00000000, 11, 5, 0
Oct 2 16:50:31 f6bvp-9 kernel: AX25_DBG: c36090e0, 00000000, 3, 2, 207
Oct 2 16:50:31 f6bvp-9 kernel: mkiss: ax3: Trying crc-smack
Oct 2 16:50:33 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct 2 16:53:21 f6bvp-9 syslogd 1.4.2: restart.
Oct 2 16:53:22 f6bvp-9 kernel: klogd 1.4.2, log source = /proc/kmsg started.
Oct 2 16:53:23 f6bvp-9 kernel: Linux version 2.6.27-rc7 (root@f6bvp-9) (gcc version 4.2.2 20071128 (prerelease) (4.2.2-3.1mdv2008.0)) #3 Tue Sep 30 10:55:01 CEST 2008
Oct 2 16:53:23 f6bvp-9 kernel: BIOS-provided physical RAM map:
Oct 2 16:53:23 f6bvp-9 kernel: BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
Oct 2 16:53:23 f6bvp-9 kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Oct 2 16:53:23 f6bvp-9 kernel: BIOS-e820: 0000000000100000 - 0000000008000000 (usable)
Oct 2 16:53:23 f6bvp-9 kernel: BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
Oct 2 16:53:23 f6bvp-9 kernel: last_pfn = 0x8000 max_arch_pfn = 0x100000
Oct 2 16:53:23 f6bvp-9 kernel: RAMDISK: 07f2a000 - 07fef581

Although I did not change anything, and contrary to my previous
observation, the system instability shown above occurs systematically.
There was no problem with kernel 2.6.25-10, which I was using before (with
patches for AX25 and ROSE that are now included in 2.6.27-rc7).
I did not try 2.6.26 on this machine, thus I cannot tell if the bug was
already present. Would it be worth testing 2.6.26?

With the limited SSH access I have from my remote site, can I continue
this debugging effort constructively, or do we wait until I am back in
front of the local console?

Bernard

On Wednesday, 1 October 2008 at 05:58 +0000, Jarek Poplawski wrote:
> On Wed, Oct 01, 2008 at 12:49:02AM +0200, Bernard Pidoux F6BVP wrote:
> > Jarek,
> >
> > With the patch #2, there have been only one reboot after the machine was
> > frozen. I did not notice that was happening for there was no kernel
> > panic message but a frozen display only.
> > Then I could only write down partially the values displayed after
> > AX25_DBG: c446c220, c..... 3, 5, 240
> > Second number was not 000000 as it is most of the time.
> > The machine rebooted after the usual 60 seconds delay and since then
> > there was no kernel panic.
> > This is the first time kernel 2.6.27-rc7 does not panic within a minute.
> > The system is up since about 20 minutes now with frequent AX25_DBG messages.
>
> Hmm... Since this was intended mainly for debugging something went
> wrong. (I expected to get at least one warning.)
>
> >
> > I am sorry that I must go to sleep now to be at work early in the morning.
> > I will be away from my Linux box for a few days.
> > Then we should probably interrupt this interesting bug hunting.
> > I will send more results when I am back.
>
> OK, I'll try to look at this in the meantime.
>
> Cheers,
> Jarek P.
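For readers following the AX25_DBG lines in the log, here is a minimal user-space sketch in C that parses one such line and names the numeric family and type fields. It is not part of the thread or of the kernel patch: the struct and the function names (`ax25_dbg_entry`, `parse_ax25_dbg`, `family_name`, `type_name`, `is_flagged`) are hypothetical, and the name tables hard-code the usual Linux values (for example 3 for AF_AX25, 11 for AF_ROSE, 5 for SOCK_SEQPACKET) rather than reading them from system headers. The logged lines carry five values, so only those five are parsed.

```c
#include <stdio.h>
#include <string.h>

/* One decoded "AX25_DBG:" record (hypothetical helper type). */
struct ax25_dbg_entry {
    unsigned long sk;         /* first pointer printed by the debug patch  */
    unsigned long sk_socket;  /* second pointer, 00000000 in most lines    */
    unsigned int family;      /* sk->sk_family                             */
    unsigned int type;        /* sk->sk_type                               */
    unsigned int protocol;    /* sk->sk_protocol                           */
};

/* Parse a syslog line; returns 1 if it holds a complete five-field
 * AX25_DBG record, 0 otherwise. */
int parse_ax25_dbg(const char *line, struct ax25_dbg_entry *e)
{
    const char *p = strstr(line, "AX25_DBG:");

    if (p == NULL)
        return 0;
    return sscanf(p, "AX25_DBG: %lx, %lx, %u, %u, %u",
                  &e->sk, &e->sk_socket,
                  &e->family, &e->type, &e->protocol) == 5;
}

/* Map the address-family numbers seen in the log to their Linux names
 * (values assumed from the Linux socket headers). */
const char *family_name(unsigned int family)
{
    switch (family) {
    case 1:  return "AF_UNIX";
    case 2:  return "AF_INET";
    case 3:  return "AF_AX25";
    case 11: return "AF_ROSE";
    default: return "unknown";
    }
}

/* Map socket type numbers to their Linux names. */
const char *type_name(unsigned int type)
{
    switch (type) {
    case 1:  return "SOCK_STREAM";
    case 2:  return "SOCK_DGRAM";
    case 3:  return "SOCK_RAW";
    case 5:  return "SOCK_SEQPACKET";
    default: return "unknown";
    }
}

/* Same test as the debugging patch: family 3, type 5, protocol 240. */
int is_flagged(const struct ax25_dbg_entry *e)
{
    return e->family == 3 && e->type == 5 && e->protocol == 240;
}
```

Fed the line `AX25_DBG: c15245f0, 00000000, 11, 5, 0`, `parse_ax25_dbg` fills in family 11 and type 5, which `family_name` and `type_name` report as AF_ROSE and SOCK_SEQPACKET. None of the complete lines in the log excerpt matches the 3, 5, 240 combination that `is_flagged` checks for; only the partially noted line from the frozen console does.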
diff --git a/net/core/sock.c b/net/core/sock.c
index 2d358dd..3ad8eaa 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -960,6 +960,15 @@ void sk_free(struct sock *sk)
 {
 	struct sk_filter *filter;
 
+	if (sk->sk_socket) {
+		printk("AX25_DBG: %p, %p, %u, %u, %u, %p\n", sk, sk->sk_socket,
+			sk->sk_family, sk->sk_type, sk->sk_protocol, sk->sk_socket->sk);
+		if (sk->sk_family == 3 && sk->sk_type == 5 && sk->sk_protocol == 240) {
+			WARN_ON_ONCE(1);
+			sock_orphan(sk);
+		}
+	}
+
 	if (sk->sk_destruct)
 		sk->sk_destruct(sk);
 