Patchwork ax25 rose Re: kernel panic linux-2.6.27-rc7

Submitter Jarek Poplawski
Date Sept. 30, 2008, 9:30 p.m.
Message ID <20080930213052.GA6449@ami.dom.local>
Permalink /patch/2138/
State Not Applicable
Delegated to: Jeff Garzik

Comments

Jarek Poplawski - Sept. 30, 2008, 9:30 p.m.
On Tue, Sep 30, 2008 at 10:59:35PM +0200, Bernard Pidoux F6BVP wrote:
> Jarek,
>
> Yes, I am using the ne2k-pci/8390 driver.
> The second patch seems to have removed the inconsistent lock state.

That's fine: this patch only addresses that one (simpler) problem.

> But the kernel panic still occurred every time.
> However, I did not catch any netconsole messages, since the patch prevented
> transmission to the remote console via Ethernet.
>
> Before the machine rebooted, I could only note the following information at
> the bottom of the local console screen:
>
> EIP: [<.....>] datagram_poll + 0xe9/0xf0
>
> Does it help?

It's probably the beginning of the same oops as before.

Here is debugging patch #2, which should give us more details.
Apply it after reverting debugging patch #1 (the lib8390 patch should stay).
Unfortunately, the oops in datagram_poll is still possible.

Thanks,
Jarek P.

---

 net/core/sock.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bernard Pidoux F6BVP - Sept. 30, 2008, 10:49 p.m.
Jarek,

With patch #2, there has been only one reboot, after the machine froze. I did
not notice it happening, because there was no kernel panic message, only a
frozen display.
I could only partially write down the values displayed after
AX25_DBG: c446c220, c..... 3, 5, 240
The second number was not 000000, as it is most of the time.
The machine rebooted after the usual 60-second delay, and there has been no
kernel panic since.
This is the first time kernel 2.6.27-rc7 has not panicked within a minute.
The system has now been up for about 20 minutes, with frequent AX25_DBG messages.

I am sorry, but I must go to sleep now to be at work early in the morning.
I will be away from my Linux box for a few days, so we will probably have to
pause this interesting bug hunt.
I will send more results when I am back.

Thank you and best regards,

Bernard



Jarek Poplawski - Oct. 1, 2008, 5:58 a.m.
On Wed, Oct 01, 2008 at 12:49:02AM +0200, Bernard Pidoux F6BVP wrote:
> Jarek,
>
> With patch #2, there has been only one reboot, after the machine froze. I did
> not notice it happening, because there was no kernel panic message, only a
> frozen display.
> I could only partially write down the values displayed after
> AX25_DBG: c446c220, c..... 3, 5, 240
> The second number was not 000000, as it is most of the time.
> The machine rebooted after the usual 60-second delay, and there has been no
> kernel panic since.
> This is the first time kernel 2.6.27-rc7 has not panicked within a minute.
> The system has now been up for about 20 minutes, with frequent AX25_DBG messages.

Hmm... Since this was intended mainly for debugging, something went
wrong. (I expected to get at least one warning.)

>
> I am sorry, but I must go to sleep now to be at work early in the morning.
> I will be away from my Linux box for a few days, so we will probably have to
> pause this interesting bug hunt.
> I will send more results when I am back.

OK, I'll try to look at this in the meantime.

Cheers,
Jarek P.
Bernard Pidoux F6BVP - Oct. 2, 2008, 6:20 p.m.
Hi Jarek,

I am finally able to access the faulty 2.6.27-rc7 f6bvp-9 system via
ssh, so I can read the /var/log/kernel/message file immediately after a
kernel failure and reboot.
Once the system has rebooted, it stays stable until I start the FPAC suite
applications (fpad, fpacwpd, ...), as shown below.

Oct  2 16:50:00 f6bvp-9 kernel: AX25_DBG: c36fc338, 00000000, 1, 0, 0
Oct  2 16:50:00 f6bvp-9 kernel: AX25_DBG: c36fc008, 00000000, 1, 1, 0
Oct  2 16:50:00 f6bvp-9 kernel: AX25_DBG: c36fc338, 00000000, 1, 0, 0
Oct  2 16:50:00 f6bvp-9 kernel: AX25_DBG: c36fc008, 00000000, 1, 1, 0
Oct  2 16:50:09 f6bvp-9 fpad: starting FPAD
Oct  2 16:50:09 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct  2 16:50:09 f6bvp-9 kernel: AX25_DBG: c15245f0, 00000000, 11, 5, 0
Oct  2 16:50:09 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct  2 16:50:09 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct  2 16:50:09 f6bvp-9 fpad: FPAD becomes a daemon
Oct  2 16:50:09 f6bvp-9 fpacwpd[5169]: Starting
Oct  2 16:50:09 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct  2 16:50:10 f6bvp-9 last message repeated 6 times
Oct  2 16:50:10 f6bvp-9 ax25ipd: assemble_kiss: dumped - control byte non-zero
Oct  2 16:50:10 f6bvp-9 kernel: mkiss: ax4: Trying crc-smack
Oct  2 16:50:10 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct  2 16:50:10 f6bvp-9 last message repeated 7 times
Oct  2 16:50:10 f6bvp-9 fpad: FPAD opened WP service
Oct  2 16:50:11 f6bvp-9 ax25ipd: assemble_kiss: dumped - control byte non-zero
Oct  2 16:50:11 f6bvp-9 kernel: mkiss: ax4: Trying crc-flexnet
Oct  2 16:50:11 f6bvp-9 kernel: mkiss: ax0: Trying crc-smack
Oct  2 16:50:13 f6bvp-9 kernel: AX25_DBG: c1520630, 00000000, 11, 5, 0
Oct  2 16:50:16 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct  2 16:50:16 f6bvp-9 kernel: mkiss: ax0: Trying crc-flexnet
Oct  2 16:50:18 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct  2 16:50:21 f6bvp-9 kernel: AX25_DBG: c36090e0, 00000000, 3, 2, 207
Oct  2 16:50:21 f6bvp-9 kernel: mkiss: ax1: Trying crc-smack
Oct  2 16:50:22 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct  2 16:50:25 f6bvp-9 kernel: AX25_DBG: c14662d0, 00000000, 11, 5, 0
Oct  2 16:50:26 f6bvp-9 kernel: mkiss: ax1: Trying crc-flexnet
Oct  2 16:50:28 f6bvp-9 kernel: AX25_DBG: c4cd0a68, 00000000, 11, 5, 0
Oct  2 16:50:28 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct  2 16:50:30 f6bvp-9 kernel: AX25_DBG: c691a688, 00000000, 2, 2, 17
Oct  2 16:50:30 f6bvp-9 last message repeated 2 times
Oct  2 16:50:31 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct  2 16:50:31 f6bvp-9 kernel: AX25_DBG: c146a5d0, 00000000, 11, 5, 0
Oct  2 16:50:31 f6bvp-9 kernel: AX25_DBG: c36090e0, 00000000, 3, 2, 207
Oct  2 16:50:31 f6bvp-9 kernel: mkiss: ax3: Trying crc-smack
Oct  2 16:50:33 f6bvp-9 ax25ipd: from_kiss: dumped - cannot figure out where to send this!
Oct  2 16:53:21 f6bvp-9 syslogd 1.4.2: restart.
Oct  2 16:53:22 f6bvp-9 kernel: klogd 1.4.2, log source = /proc/kmsg started.
Oct  2 16:53:23 f6bvp-9 kernel: Linux version 2.6.27-rc7 (root@f6bvp-9) (gcc version 4.2.2 20071128 (prerelease) (4.2.2-3.1mdv2008.0)) #3 Tue Sep 30 10:55:01 CEST 2008
Oct  2 16:53:23 f6bvp-9 kernel: BIOS-provided physical RAM map:
Oct  2 16:53:23 f6bvp-9 kernel:  BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
Oct  2 16:53:23 f6bvp-9 kernel:  BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Oct  2 16:53:23 f6bvp-9 kernel:  BIOS-e820: 0000000000100000 - 0000000008000000 (usable)
Oct  2 16:53:23 f6bvp-9 kernel:  BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
Oct  2 16:53:23 f6bvp-9 kernel: last_pfn = 0x8000 max_arch_pfn = 0x100000
Oct  2 16:53:23 f6bvp-9 kernel: RAMDISK: 07f2a000 - 07fef581
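Reading these AX25_DBG lines means mapping the three small numbers after the two pointers back to socket constants. The sketch below is not part of the original exchange: it assumes the fields are sk, sk_socket, family, type and protocol, in the order the debugging patch prints them, and it assumes the numeric values Linux uses on x86 (AF_UNIX = 1, AF_INET = 2, AF_AX25 = 3, AF_ROSE = 11; SOCK_STREAM = 1, SOCK_DGRAM = 2, SOCK_SEQPACKET = 5). The helpers parse_ax25_dbg(), family_name(), type_name() and protocol_name() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed Linux x86 values; SOCK_* differ on a few architectures (e.g. MIPS).
 * Only values that appear in this log are listed. */
struct name_map {
    unsigned value;
    const char *name;
};

static const struct name_map families[] = {
    { 1, "AF_UNIX" }, { 2, "AF_INET" }, { 3, "AF_AX25" }, { 11, "AF_ROSE" },
};

static const struct name_map types[] = {
    { 1, "SOCK_STREAM" }, { 2, "SOCK_DGRAM" }, { 5, "SOCK_SEQPACKET" },
};

#define N_ELEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Return the symbolic name for v, or "?" if it is not in the table. */
static const char *lookup(const struct name_map *map, size_t n, unsigned v)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].value == v)
            return map[i].name;
    return "?";
}

const char *family_name(unsigned family)
{
    return lookup(families, N_ELEMS(families), family);
}

const char *type_name(unsigned type)
{
    return lookup(types, N_ELEMS(types), type);
}

/* A protocol number is only meaningful together with its family. */
const char *protocol_name(unsigned family, unsigned protocol)
{
    if (protocol == 0)
        return "default";
    if (family == 2 && protocol == 17)
        return "IPPROTO_UDP";
    if (family == 3 && protocol == 0xCF)
        return "AX25_P_NETROM";
    if (family == 3 && protocol == 0xF0)
        return "AX25_P_TEXT";
    return "?";
}

/* Extract "AX25_DBG: sk, sk_socket, family, type, protocol" from a syslog
 * line; a trailing sixth field, if present, is ignored.
 * Returns 1 on success, 0 otherwise. */
int parse_ax25_dbg(const char *line, unsigned long *sk, unsigned long *sock,
                   unsigned *family, unsigned *type, unsigned *protocol)
{
    const char *p;

    assert(line != NULL);
    p = strstr(line, "AX25_DBG:");
    if (p == NULL)
        return 0;
    return sscanf(p, "AX25_DBG: %lx, %lx, %u, %u, %u",
                  sk, sock, family, type, protocol) == 5;
}
```

Fed the 16:50:21 line above, parse_ax25_dbg() yields 3, 2, 207, which these tables name AF_AX25, SOCK_DGRAM and AX25_P_NETROM (PID 0xCF); the recurring 11, 5, 0 lines decode to AF_ROSE sockets of type SOCK_SEQPACKET with the default protocol.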
 
Although I did not change anything, and contrary to my previous
observation, the system instability shown above occurs every time.
There was no problem with kernel 2.6.25-10, which I was using before (with
the AX25 and ROSE patches that are now included in 2.6.27-rc7).
I did not try 2.6.26 on this machine, so I cannot tell whether the bug was
already present.
Would it be worth testing 2.6.26?
With the limited SSH access I have from my remote site, can I continue
this debugging effort constructively, or should we wait until I am back in
front of the local console?

Bernard  



Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index 2d358dd..3ad8eaa 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -960,6 +960,15 @@  void sk_free(struct sock *sk)
 {
 	struct sk_filter *filter;
 
+	if (sk->sk_socket) {
+		printk("AX25_DBG: %p, %p, %u, %u, %u, %p\n", sk, sk->sk_socket,
+			 sk->sk_family, sk->sk_type, sk->sk_protocol, sk->sk_socket->sk);
+		if (sk->sk_family == 3 && sk->sk_type == 5 && sk->sk_protocol == 240) {
+			WARN_ON_ONCE(1);
+			sock_orphan(sk);
+		}
+	}
+
 	if (sk->sk_destruct)
 		sk->sk_destruct(sk);
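The condition in this debugging hunk compares bare numbers. A user-space restatement with named constants, assuming the Linux x86 values AF_AX25 = 3, SOCK_SEQPACKET = 5 and AX25_P_TEXT = 0xF0 (240, the AX.25 PID for "no layer 3 protocol"), shows which socket tuples the hunk singles out for WARN_ON_ONCE() and sock_orphan(). The function and constant names are hypothetical, and this is not kernel code; the real hunk additionally requires sk->sk_socket to be non-NULL, which the sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed Linux x86 values; the debug hunk uses the bare numbers. */
enum {
    DBG_AF_AX25 = 3,          /* AF_AX25 */
    DBG_SOCK_SEQPACKET = 5,   /* SOCK_SEQPACKET (differs on e.g. MIPS) */
    DBG_AX25_P_TEXT = 0xF0    /* AX25_P_TEXT: PID 0xF0, no layer 3 */
};

/* Compile-time check that 0xF0 is the 240 compared in the hunk (C11). */
static_assert(DBG_AX25_P_TEXT == 240, "AX25_P_TEXT should equal 240");

/* Hypothetical mirror of the test inside sk_free(): true only for an
 * AX.25 SEQPACKET socket whose protocol (PID) is 0xF0. */
bool is_ax25_seqpacket_text(unsigned family, unsigned type, unsigned protocol)
{
    return family == DBG_AF_AX25 &&
           type == DBG_SOCK_SEQPACKET &&
           protocol == DBG_AX25_P_TEXT;
}
```

Under these assumptions the tuple 3, 5, 240 noted on Sept. 30 matches, while none of the tuples in the Oct. 2 log (1, 0, 0; 1, 1, 0; 2, 2, 17; 11, 5, 0; 3, 2, 207) does.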