kernel crash - CIFS client unstable on faulty network conditions

Message ID 20091202131835.3f2584bd@tlielax.poochiereds.net
State New

Commit Message

Jeff Layton Dec. 2, 2009, 6:18 p.m. UTC
On Wed, 2 Dec 2009 17:15:30 +0000
Gustavo Carvalho Homem <gustavo@angulosolido.pt> wrote:

> Hi,
> 
> We are using:
> 
> kernel 2.6.31-5
> samba 3.4.2
> 
> to mount CIFS shares over DFS.
> 
> Everything works fine under normal conditions. However, if one or more servers are unreachable, we end up with a kernel crash that locks up the machine.
> 
> Kernel logs can be seen below.
> 
> Any comment?
> 
> Cheers
> Gustavo
> 
> 
> -------------------------------------
> 
> Dec  2 15:13:39 CGDWX08027093 klogd:  CIFS VFS: No response for cmd 114 mid 1
> Dec  2 15:13:39 CGDWX08027093 klogd: BUG: unable to handle kernel NULL pointer dereference at 00000020
> Dec  2 15:13:39 CGDWX08027093 klogd: IP: [<f847e804>] cifs_put_smb_ses+0x14/0xd0 [cifs]
> Dec  2 15:13:39 CGDWX08027093 klogd: *pde = 00000000 
> Dec  2 15:13:39 CGDWX08027093 klogd: Oops: 0000 [#1] SMP 
> Dec  2 15:13:39 CGDWX08027093 klogd: last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/net/eth0/ifindex
> Dec  2 15:13:39 CGDWX08027093 klogd: Modules linked in: nls_utf8 nls_iso8859_1 cifs i915 drm i2c_algo_bit i2c_core af_packet ipv6 binfmt_misc loop dm_mirror dm_region_hash dm_log dm_mod cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer snd_mixer_oss ehci_hcd snd e1000e heci(C) soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support pcspkr uhci_hcd processor floppy button evdev video wmi output tpm_infineon tpm tpm_bios thermal sg usbcore ide_generic ata_generic ide_pci_generic ide_gd_mod ide_core pata_acpi ahci ata_piix libata sd_mod scsi_mod crc_t10dif ext3 jbd
> Dec  2 15:13:39 CGDWX08027093 klogd: 
> Dec  2 15:13:39 CGDWX08027093 klogd: Pid: 3511, comm: mount.cifs Tainted: G        WC (2.6.31.5-desktop-1xcm #1) HP Compaq dc7900 Small Form Factor
> Dec  2 15:13:39 CGDWX08027093 klogd: EIP: 0060:[<f847e804>] EFLAGS: 00010282 CPU: 1
> Dec  2 15:13:39 CGDWX08027093 klogd: EIP is at cifs_put_smb_ses+0x14/0xd0 [cifs]
> Dec  2 15:13:39 CGDWX08027093 klogd: EAX: 00000000 EBX: f5e64400 ECX: f5e64400 EDX: f5e65600
> Dec  2 15:13:39 CGDWX08027093 klogd: ESI: 00000079 EDI: 00000000 EBP: f5dc1dfc ESP: f5dc1de0
> Dec  2 15:13:39 CGDWX08027093 klogd:  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Dec  2 15:13:39 CGDWX08027093 klogd: Process mount.cifs (pid: 3511, ti=f5dc0000 task=f61ca550 task.ti=f5dc0000)
> Dec  2 15:13:39 CGDWX08027093 klogd: Stack:
> Dec  2 15:13:39 CGDWX08027093 klogd:  00000000 f5dc1dfc f848e101 f5dc1dfc f5e64400 00000079 00000000 f5dc1e20
> Dec  2 15:13:39 CGDWX08027093 klogd: <0> f847e957 f8481193 00000000 c16bb260 f8481193 ffffff90 f614b000 f5e64400
> Dec  2 15:13:39 CGDWX08027093 klogd: <0> f5dc1eb8 f84811a6 f5f9e950 f849d2f8 f5e64430 00000000 c0701028 c0701038
> Dec  2 15:13:39 CGDWX08027093 klogd: Call Trace:
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<f848e101>] ? tconInfoFree+0x61/0x90 [cifs]
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<f847e957>] ? cifs_put_tcon+0x97/0xd0 [cifs]
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<f8481193>] ? cifs_mount+0x4e3/0x2570 [cifs]
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<f8481193>] ? cifs_mount+0x4e3/0x2570 [cifs]
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<f84811a6>] ? cifs_mount+0x4f6/0x2570 [cifs]
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<f8473df4>] ? cifs_get_sb+0x124/0x2c0 [cifs]
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<c01e4aae>] ? vfs_kern_mount+0x5e/0x120
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<c01e4bce>] ? do_kern_mount+0x3e/0xe0
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<c01fb376>] ? do_mount+0x446/0x7d0
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<c01f969d>] ? copy_mount_options+0xad/0x130
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<c01fb78c>] ? sys_mount+0x8c/0xb0
> Dec  2 15:13:39 CGDWX08027093 klogd:  [<c0103cfb>] ? sysenter_do_call+0x12/0x28
> Dec  2 15:13:39 CGDWX08027093 klogd: Code: a4 00 00 00 85 d2 74 bc b8 09 00 00 00 e8 85 30 cd c7 5b 5d c3 66 90 55 89 e5 83 ec 1c 89 5d f4 89 75 f8 89 7d fc 0f 1f 44 00 00 <8b> 70 20 89 c3 b8 20 fd 4a f8 e8 4d 35 f9 c7 8b 43 24 83 e8 01 
> Dec  2 15:13:39 CGDWX08027093 klogd: EIP: [<f847e804>] cifs_put_smb_ses+0x14/0xd0 [cifs] SS:ESP 0068:f5dc1de0
> Dec  2 15:13:39 CGDWX08027093 klogd: CR2: 0000000000000020
> Dec  2 15:13:39 CGDWX08027093 klogd: ---[ end trace 93d72a36b9146f24 ]---
> Dec  2 15:14:01 CGDWX08027093 CROND[6710]: (root) CMD (   /usr/share/msec/promisc_check.sh)
> Dec  2 15:14:10 CGDWX08027093 klogd: BUG: unable to handle kernel NULL pointer dereference at (null)
> Dec  2 15:14:10 CGDWX08027093 klogd: IP: [<f847f91c>] cifs_demultiplex_thread+0x37c/0xc50 [cifs]
> Dec  2 15:14:10 CGDWX08027093 klogd: *pde = 00000000 
> Dec  2 15:14:10 CGDWX08027093 klogd: Oops: 0000 [#2] SMP 
> Dec  2 15:14:10 CGDWX08027093 klogd: last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/net/eth0/ifindex
> Dec  2 15:14:10 CGDWX08027093 klogd: Modules linked in: nls_utf8 nls_iso8859_1 cifs i915 drm i2c_algo_bit i2c_core af_packet ipv6 binfmt_misc loop dm_mirror dm_region_hash dm_log dm_mod cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq freq_table snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_timer snd_mixer_oss ehci_hcd snd e1000e heci(C) soundcore snd_page_alloc iTCO_wdt iTCO_vendor_support pcspkr uhci_hcd processor floppy button evdev video wmi output tpm_infineon tpm tpm_bios thermal sg usbcore ide_generic ata_generic ide_pci_generic ide_gd_mod ide_core pata_acpi ahci ata_piix libata sd_mod scsi_mod crc_t10dif ext3 jbd
> Dec  2 15:14:10 CGDWX08027093 klogd: 
> Dec  2 15:14:10 CGDWX08027093 klogd: Pid: 5120, comm: cifsd Tainted: G      D WC (2.6.31.5-desktop-1xcm #1) HP Compaq dc7900 Small Form Factor
> Dec  2 15:14:10 CGDWX08027093 klogd: EIP: 0060:[<f847f91c>] EFLAGS: 00010216 CPU: 1
> Dec  2 15:14:10 CGDWX08027093 klogd: EIP is at cifs_demultiplex_thread+0x37c/0xc50 [cifs]
> Dec  2 15:14:10 CGDWX08027093 klogd: EAX: f84afd20 EBX: f5e64400 ECX: f6c38000 EDX: 00000000
> Dec  2 15:14:10 CGDWX08027093 klogd: ESI: f5e64460 EDI: f5e64408 EBP: f5f73fb8 ESP: f5f73f40
> Dec  2 15:14:10 CGDWX08027093 klogd:  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> Dec  2 15:14:10 CGDWX08027093 klogd: Process cifsd (pid: 5120, ti=f5f72000 task=f5f5eff0 task.ti=f5f72000)
> Dec  2 15:14:10 CGDWX08027093 klogd: Stack:
> Dec  2 15:14:10 CGDWX08027093 klogd:  00000000 00000004 00000000 743594af f5e64448 c1f9a420 f5e64460 f61ca550
> Dec  2 15:14:10 CGDWX08027093 klogd: <0> f5fd0000 f5dbb340 f6b46e00 f5f5eff0 00f5f270 c1f9a420 00000001 c012e588
> Dec  2 15:14:10 CGDWX08027093 klogd: <0> 74359c99 83000001 00000003 00000000 f5f73fa4 00000001 00000000 00000000
> Dec  2 15:14:10 CGDWX08027093 klogd: Call Trace:
> Dec  2 15:14:10 CGDWX08027093 klogd:  [<c012e588>] ? __wake_up_common+0x48/0x70
> Dec  2 15:14:10 CGDWX08027093 klogd:  [<c013218e>] ? complete+0x4e/0x60
> Dec  2 15:14:10 CGDWX08027093 klogd:  [<f847f5a0>] ? cifs_demultiplex_thread+0x0/0xc50 [cifs]
> Dec  2 15:14:10 CGDWX08027093 klogd:  [<c0159e14>] ? kthread+0x84/0x90
> Dec  2 15:14:10 CGDWX08027093 klogd:  [<c0159d90>] ? kthread+0x0/0x90
> Dec  2 15:14:10 CGDWX08027093 klogd:  [<c0104947>] ? kernel_thread_helper+0x7/0x10
> Dec  2 15:14:10 CGDWX08027093 klogd: Code: 4d a0 3b 4b 60 74 17 f6 05 04 fd 4a f8 01 0f 85 4e 04 00 00 b8 b0 b3 00 00 e8 71 e9 cc c7 b8 20 fd 4a f8 e8 57 22 f9 c7 8b 53 08 <8b> 02 0f 18 00 90 39 fa 74 15 66 90 c7 42 20 00 00 00 00 89 c2 
> Dec  2 15:14:10 CGDWX08027093 klogd: EIP: [<f847f91c>] cifs_demultiplex_thread+0x37c/0xc50 [cifs] SS:ESP 0068:f5f73f40
> Dec  2 15:14:10 CGDWX08027093 klogd: CR2: 0000000000000000
> Dec  2 15:14:10 CGDWX08027093 klogd: ---[ end trace 93d72a36b9146f25 ]---
> 
> -------------------------------------
> 

Hard to be sure from the info here but I think I might see the problem...

Can you reproduce this at will? If you can, could you let me know
whether the attached patch fixes it? Note that I haven't even compile
tested it, but it's pretty straightforward.

Thanks,

Comments

Gustavo Carvalho Homem Dec. 2, 2009, 9:12 p.m. UTC | #1
Hi,


> Dec  2 15:14:10 CGDWX08027093 klogd: ---[ end trace 93d72a36b9146f25 ]---
> >
> > -------------------------------------
>
> Hard to be sure from the info here but I think I might see the problem...
>
> Can you reproduce this at will? If you can, could you let me know
> whether the attached patch fixes it? Note that I haven't even compile
> tested it, but it's pretty straightforward.

We can't reproduce this because we don't have rights over the Windows
infrastructure that became unstable. And I believe that it won't stay unstable
for long :|

We'll see if the symptom persists.

Is this patch committed to the current kernel git tree so that it will appear in
the next kernel release?

Cheers
Gustavo

Jeff Layton Dec. 3, 2009, 1:28 a.m. UTC | #2
On Wed, 2 Dec 2009 21:12:54 +0000
Gustavo Carvalho Homem <gustavo@angulosolido.pt> wrote:

> Hi,
> 
> 
> > Dec  2 15:14:10 CGDWX08027093 klogd: ---[ end trace 93d72a36b9146f25 ]---
> > >
> > > -------------------------------------
> >
> > Hard to be sure from the info here but I think I might see the problem...
> >
> > Can you reproduce this at will? If you can, could you let me know
> > whether the attached patch fixes it? Note that I haven't even compile
> > tested it, but it's pretty straightforward.
> 
> We can't reproduce this because we don't have rights over the Windows
> infrastructure that became unstable. And I believe that it won't stay unstable
> for long :|
> 
> We'll see if the symptom persists.
> 
> Is this patch committed to the current kernel git tree so that it will appear in
> the next kernel release?
> 
> Cheers
> Gustavo

No, I just wrote it today based on the stack traces you sent. I'm
pretty sure that the patch is correct, but unfortunately it's tough to
be certain it'll fix the problem you reported.

If you don't have a way to test it, I'll probably just go ahead and
send it to Steve later this week once I've given it some smoke testing.
We'll just have to hope for the best in that case.

Cheers,
Patch

From a2d6f76bb2bbc45ab9a534fdfe5c7f1617c0e87a Mon Sep 17 00:00:00 2001
From: Jeff Layton <jlayton@redhat.com>
Date: Wed, 2 Dec 2009 13:16:20 -0500
Subject: [PATCH] cifs: NULL out tcon, pSesInfo, and srvTcp pointers when chasing DFS referrals

The scenario is this:

We've got a valid tcon pointer and we're chasing a DFS referral. We put
the tcon reference, which puts the session reference too. Then we try
the mount again with the new mount info. That mount fails, and we goto
mount_fail_check. The tcon and pSesInfo pointers are non-NULL, but no
longer valid, and things blow up when we try to put references to them.

Fix this by zeroing out the tcon, tcp and smb session pointers before
retrying the mount.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/cifs/connect.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index 63ea83f..54f38f1 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -2595,6 +2595,9 @@  remote_path_check:
 			else if (pSesInfo)
 				cifs_put_smb_ses(pSesInfo);
 
+			tcon = NULL;
+			pSesInfo = NULL;
+			srvTcp = NULL;
 			cleanup_volume_info(&volume_info);
 			referral_walks_count++;
 			goto try_mount_again;
-- 
1.6.5.2
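
For readers who want to see the failure mode from the patch description in
isolation, here is a rough userspace sketch. It is not the cifs code: the
structs, the put helpers, the "live" flag that stands in for freed memory, and
mount_with_failed_referral() are all hypothetical, and the srvTcp pointer is
left out for brevity. It only models the sequence the description names: put
the tcon while chasing the referral, fail the retry, then put again at
mount_fail_check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the cifs session and tcon objects. */
struct ses  { int refs; bool live; };
struct tcon { int refs; bool live; struct ses *ses; };

static int bad_puts;	/* counts puts that land on already-released objects */

static void put_ses(struct ses *s)
{
	assert(s != NULL);
	if (!s->live) {		/* in the kernel this would be a use-after-free */
		bad_puts++;
		return;
	}
	if (--s->refs == 0)
		s->live = false;	/* stands in for freeing the session */
}

static void put_tcon(struct tcon *t)
{
	assert(t != NULL);
	if (!t->live) {
		bad_puts++;
		return;
	}
	if (--t->refs == 0) {
		t->live = false;	/* stands in for freeing the tcon */
		put_ses(t->ses);	/* dropping the tcon drops its session ref */
	}
}

/*
 * Model one mount that hits a DFS referral, followed by a retry that
 * fails before any new tcon or session is set up. Returns the number
 * of puts that hit an already-released object.
 */
static int mount_with_failed_referral(bool null_out_pointers)
{
	struct ses s = { .refs = 1, .live = true };
	struct tcon t = { .refs = 1, .live = true, .ses = &s };
	struct tcon *tcon = &t;
	struct ses *pSesInfo = &s;

	bad_puts = 0;

	/* remote_path_check: referral found, drop what we hold */
	if (tcon)
		put_tcon(tcon);
	else if (pSesInfo)
		put_ses(pSesInfo);

	if (null_out_pointers) {	/* the change the patch makes */
		tcon = NULL;
		pSesInfo = NULL;
	}

	/* try_mount_again: retry fails early, neither pointer is reassigned */

	/* mount_fail_check: release whatever the pointers claim we hold */
	if (tcon)
		put_tcon(tcon);
	else if (pSesInfo)
		put_ses(pSesInfo);

	return bad_puts;
}
```

Under these assumptions, mount_with_failed_referral(false) reports one put on
a released tcon, and mount_with_failed_referral(true) reports none, because
the error path sees NULL pointers and skips the puts.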