Patchwork Drives missing at boot

Submitter Tejun Heo
Date July 5, 2010, 6:30 a.m.
Message ID <4C317C04.20500@kernel.org>
Permalink /patch/57866/
State Not Applicable
Delegated to: David Miller

Comments

Tejun Heo - July 5, 2010, 6:30 a.m.
On 07/03/2010 06:42 PM, Mark Knecht wrote:
> On Sat, Jul 3, 2010 at 9:13 AM, Tejun Heo <tj@kernel.org> wrote:
>> Hello,
>>
>> On 07/03/2010 06:06 PM, Mark Knecht wrote:
>>>> Can you please *attach* full logs of a successful boot and several
>>>> failing boots?
>>>
>>> Certainly? Which logs? dmesg or something else?
>>
>> dmesg output preferably with printk timestamp enabled.

Can you please apply the attached patch, reproduce the problem and
post the kernel log?

Thanks.
Mark Knecht - July 5, 2010, 4:56 p.m.
On Sun, Jul 4, 2010 at 11:30 PM, Tejun Heo <tj@kernel.org> wrote:
> On 07/03/2010 06:42 PM, Mark Knecht wrote:
>> On Sat, Jul 3, 2010 at 9:13 AM, Tejun Heo <tj@kernel.org> wrote:
>>> Hello,
>>>
>>> On 07/03/2010 06:06 PM, Mark Knecht wrote:
>>>>> Can you please *attach* full logs of a successful boot and several
>>>>> failing boots?
>>>>
>>>> Certainly? Which logs? dmesg or something else?
>>>
>>> dmesg output preferably with printk timestamp enabled.
>
> Can you please apply the attached patch, reproduce the problem and
> post the kernel log?
>
> Thanks.
>
> --
> tejun
>

I'm sorry. What am I patching? I'm not a kernel developer - not even a
programmer - so I'll need some help with this. What's the command I
should use?

c2stable src # ls -la /usr/src/
total 32
drwxr-xr-x  8 root root 4096 Jul  2 09:56 .
drwxr-xr-x 14 root root 4096 Apr 15 07:46 ..
-rw-r--r--  1 root root    0 Mar 24 18:37 .keep
lrwxrwxrwx  1 root root   22 Jul  2 09:56 linux -> linux-2.6.34-gentoo-r1
drwxr-xr-x 23 root root 4096 Jun 16 07:23 linux-2.6.32-gentoo-r7
drwxr-xr-x 24 root root 4096 Jun 16 08:42 linux-2.6.34-gentoo
drwxr-xr-x 24 root root 4096 Jul  3 15:30 linux-2.6.34-gentoo-r1
drwxr-xr-x 21 root root 4096 Jun 27 13:12 linux-2.6.34-rc3
drwxr-xr-x 20 root root 4096 Jun 15 08:05 linux-2.6.34-rc5
drwxr-xr-x 20 root root 4096 Jun 27 13:13 linux-2.6.35-rc3
c2stable src #

Thanks,
Mark
Tejun Heo - July 6, 2010, 6:33 a.m.
Hello,

On 07/05/2010 06:56 PM, Mark Knecht wrote:
>>>> dmesg output preferably with printk timestamp enabled.
>>
>> Can you please apply the attached patch, reproduce the problem and
>> post the kernel log?
> 
> I'm sorry. What am I patching? I'm not a kernel developer - not even a
> programmer - so I'll need some help with this. What's the command I
> should use?
> 
> c2stable src # ls -la /usr/src/
> total 32
> drwxr-xr-x  8 root root 4096 Jul  2 09:56 .
> drwxr-xr-x 14 root root 4096 Apr 15 07:46 ..
> -rw-r--r--  1 root root    0 Mar 24 18:37 .keep
> lrwxrwxrwx  1 root root   22 Jul  2 09:56 linux -> linux-2.6.34-gentoo-r1
> drwxr-xr-x 23 root root 4096 Jun 16 07:23 linux-2.6.32-gentoo-r7
> drwxr-xr-x 24 root root 4096 Jun 16 08:42 linux-2.6.34-gentoo
> drwxr-xr-x 24 root root 4096 Jul  3 15:30 linux-2.6.34-gentoo-r1
> drwxr-xr-x 21 root root 4096 Jun 27 13:12 linux-2.6.34-rc3
> drwxr-xr-x 20 root root 4096 Jun 15 08:05 linux-2.6.34-rc5
> drwxr-xr-x 20 root root 4096 Jun 27 13:13 linux-2.6.35-rc3

Hmm...

$ cd /usr/src/linux && patch -p1 < PATCH_FILE

should do it.  You know how to build and install the compiled kernel,
right?

Thanks.
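
A minimal sketch of the step above, assuming the patch was saved somewhere outside the tree. Running patch with --dry-run first shows whether it would apply cleanly; the same command must then be repeated without --dry-run, because a dry run changes nothing. The function name apply_patch and its two arguments are illustrative and not part of the thread, and the patch path has to be absolute since the function changes directory.

```shell
# Hypothetical helper: check a patch against a source tree, then apply it.
# Usage: apply_patch SRC_DIR /absolute/path/to/PATCH_FILE
apply_patch() {
    src_dir=$1
    patch_file=$2
    # First pass: --dry-run reports problems without touching any file.
    ( cd "$src_dir" && patch -p1 --dry-run < "$patch_file" ) || return 1
    # Second pass: the same command without --dry-run modifies the tree.
    ( cd "$src_dir" && patch -p1 < "$patch_file" )
}
```

For the tree in this thread the call would look like apply_patch /usr/src/linux /absolute/path/to/PATCH_FILE, followed by the usual kernel rebuild.
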
Mark Knecht - July 6, 2010, 6:13 p.m.
On Mon, Jul 5, 2010 at 11:33 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On 07/05/2010 06:56 PM, Mark Knecht wrote:
>>>>> dmesg output preferably with printk timestamp enabled.
>>>
>>> Can you please apply the attached patch, reproduce the problem and
>>> post the kernel log?
>>
>> I'm sorry. What am I patching? I'm not a kernel developer - not even a
>> programmer - so I'll need some help with this. What's the command I
<SNIP>
>
> Hmm...
>
> $ cd /usr/src/linux && patch -p1 < PATCH_FILE
>
> should do it.  You know how to build and install the compiled kernel,
> right?
>
> Thanks.
>
> --
> tejun
>

OK - thanks. The patch seemed to install correctly. I then did

make clean
make && make modules_install

and then a Gentoo command:

modules-rebuild -X rebuild

to pick up any package modules that need to be rebuilt when I use a
new kernel. (X drivers, vmware, etc.)

The kernel boots fine:

mark@c2stable ~ $ uname -a
Linux c2stable 2.6.34-gentoo-r1 #4 SMP PREEMPT Tue Jul 6 10:35:15 PDT 2010 x86_64 Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz GenuineIntel GNU/Linux
mark@c2stable ~ $

I don't know what I'm looking for in the dmesg files but I do see one
message about a SATA Link being down.

Files attached.

Cheers,
Mark
Tejun Heo - July 7, 2010, 5:50 a.m.
Hello,

On 07/06/2010 08:13 PM, Mark Knecht wrote:
> OK - thanks. The patch seemed to install correctly. I then did
> 
> make clean
> make && make modules_install
> 
> and then a Gentoo command:
> 
> modules-rebuild -X rebuild
> 
> to pick up any package modules that need to be rebuilt when I use a
> new kernel. (X drivers, vmware, etc.)
> 
> The kernel boots fine:
> 
> mark@c2stable ~ $ uname -a
> Linux c2stable 2.6.34-gentoo-r1 #4 SMP PREEMPT Tue Jul 6 10:35:15 PDT 2010 x86_64 Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz GenuineIntel GNU/Linux
> mark@c2stable ~ $
> 
> I don't know what I'm looking for in the dmesg files but I do see one
> message about a SATA Link being down.

Umm... the patched module isn't being loaded.  If patched, it should
be reporting whether it's hard or soft resetting and some other debug
messages.  One common mistake is not updating initrd with patched
modules.  Is "modules-rebuild -X rebuild" the command for initrd
update?

Thanks.
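
One way to tell whether the patched code is actually running is to look for the debug line the patch adds, which starts with "XXX SControl after resume". A small sketch, assuming the kernel log was first saved to a file (for example with dmesg > dmesg.txt); the function name has_debug_marker is hypothetical.

```shell
# Hypothetical helper: succeed if a saved kernel log contains the
# debug line added by the patch.
# Usage: has_debug_marker LOGFILE
has_debug_marker() {
    grep -q 'XXX SControl after resume' "$1"
}

# Example use on a saved log:
# has_debug_marker dmesg.txt && echo "patched libata is running"
```

If the marker is missing, the booted kernel or module was built without the patch.
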
Mark Knecht - July 7, 2010, 3:34 p.m.
On Tue, Jul 6, 2010 at 10:50 PM, Tejun Heo <tj@kernel.org> wrote:
> Hello,
>
> On 07/06/2010 08:13 PM, Mark Knecht wrote:
>> OK - thanks. The patch seemed to install correctly. I then did
>>
>> make clean
>> make && make modules_install
>>
>> and then a Gentoo command:
>>
>> modules-rebuild -X rebuild
>>
>> to pick up any package modules that need to be rebuilt when I use a
>> new kernel. (X drivers, vmware, etc.)
>>
>> The kernel boots fine:
>>
>> mark@c2stable ~ $ uname -a
>> Linux c2stable 2.6.34-gentoo-r1 #4 SMP PREEMPT Tue Jul 6 10:35:15 PDT 2010 x86_64 Intel(R) Core(TM) i7 CPU X 980 @ 3.33GHz GenuineIntel GNU/Linux
>> mark@c2stable ~ $
>>
>> I don't know what I'm looking for in the dmesg files but I do see one
>> message about a SATA Link being down.
>
> Umm... the patched module isn't being loaded.  If patched, it should
> be reporting whether it's hard or soft resetting and some other debug
> messages.  One common mistake is not updating initrd with patched
> modules.  Is "modules-rebuild -X rebuild" the command for initrd
> update?
>
> Thanks.
>
> --
> tejun
>
I don't use an initrd.

I don't know what happened with the patch but clearly it wasn't in
there. I wasn't confident so I used the --dry-run option. Maybe I
forgot to remove it or something. Sorry. It's in now.
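
A leftover --dry-run is easy to catch before rebuilding by checking that the source file really changed. A rough sketch, assuming the patch's debug string is a good enough marker; the function name patch_in_tree is hypothetical.

```shell
# Hypothetical helper: succeed if the kernel tree already contains the
# debug printk that the patch adds to drivers/ata/libata-core.c.
# Usage: patch_in_tree SRC_DIR
patch_in_tree() {
    grep -q 'XXX SControl after resume' "$1/drivers/ata/libata-core.c"
}

# Example use before building:
# patch_in_tree /usr/src/linux || echo "patch not applied yet"
```
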

OK - I don't know if this was your intention but since adding this
patch I've not had a single drive missing failure. I've cold booted
about 8 times and warm booted at least 20 times. Every one has come up
fine. I've even gone so far as to turn off the UPS and sit for 5
minutes before cold booting. Still nothing fails right now.

I've had this sort of statistical thing happen before where it hasn't
failed for days, maybe even weeks, but then it starts failing and
fails every time for a while. Over the past few days working with you
I've never had to reboot more than twice to get you a file. Now I've
tried 30 times this morning and I've come up with nothing.

I will continue to watch the machine and send you the failing dmesg
file whenever I finally get it. For now I can only attach the passing
file showing the patch is now included.

modules-rebuild is just a little Gentoo script that rebuilds a list of
modules that I've previously set up. Each time I build a new kernel I
rebuild some modules as well as mesa:

xorg-drivers
xorg-input-evdev
xorg-input-keyboard
xorg-input-mouse
xf86-video-ati
xf86-video-fbdev
xf86-video-vmware
vmware-modules
mesa

Clearly I shouldn't need both evdev and keyboard/mouse but
there have been problems with hald so I've been flipping back and forth a
bit. mesa is just superstition but I _think_ it's helped a couple of
times.

Thanks!

- Mark

Patch

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 2984e45..987ca80 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -3739,6 +3739,14 @@  int sata_link_resume(struct ata_link *link, const unsigned long *params,
 			return rc;
 	} while ((scontrol & 0xf0f) != 0x300 && --tries);
 
+	/* check once more */
+	msleep(100);
+	if ((rc = sata_scr_read(link, SCR_CONTROL, &scontrol)))
+			return rc;
+	ata_link_printk(link, KERN_ERR,
+			"XXX SControl after resume = %X, tries=%d\n",
+			scontrol, ATA_LINK_RESUME_TRIES - tries + 1);
+
 	if ((scontrol & 0xf0f) != 0x300) {
 		ata_link_printk(link, KERN_ERR,
 				"failed to resume link (SControl %X)\n",
@@ -6007,7 +6015,7 @@  static void async_port_probe(void *data, async_cookie_t cookie)
 
 		ehi->probe_mask |= ATA_ALL_DEVICES;
 		ehi->action |= ATA_EH_RESET | ATA_EH_LPM;
-		ehi->flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_QUIET;
+		ehi->flags |= ATA_EHI_NO_AUTOPSY/* | ATA_EHI_QUIET*/;
 
 		ap->pflags &= ~ATA_PFLAG_INITIALIZING;
 		ap->pflags |= ATA_PFLAG_LOADING;
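
For reading the SControl values that the added debug line prints: in the SATA register layout, bits 3:0 of SControl are DET, bits 7:4 are SPD and bits 11:8 are IPM. The 0xf0f mask in sata_link_resume() therefore ignores SPD, and 0x300 means DET=0 with IPM=3 (transitions to Partial and Slumber disabled). The sketch below mirrors that check in shell arithmetic; both helper names are hypothetical and exist only for illustration.

```shell
# Hypothetical helper mirroring the test in sata_link_resume().
# Takes an SControl value (e.g. 0x300) and succeeds if the masked
# value equals 0x300, i.e. the link counts as resumed.
link_resumed() {
    # 0xf0f keeps DET (bits 3:0) and IPM (bits 11:8), dropping SPD (bits 7:4).
    [ $(( $1 & 0xf0f )) -eq $(( 0x300 )) ]
}

# Split an SControl value into its three fields.
decode_scontrol() {
    printf 'DET=%d SPD=%d IPM=%d\n' \
        $(( $1 & 0xf )) $(( ($1 >> 4) & 0xf )) $(( ($1 >> 8) & 0xf ))
}
```

Note that the log prints the value with %X, so there is no 0x prefix; add one before passing it in, for example decode_scontrol 0x300.
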