Message ID | 4AA1ACF8.7030101@kernel.org |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
On 09/05/2009 02:12 AM, Tejun Heo wrote:
> Tim Blechmann wrote:
>>>>>>>> booting the machine today, one hd is missing again ... bootlog attached
>>>>>>> Hmmm... strange. I don't really see how it could be escaping. Can
>>>>>>> you please apply the attached patch? It still won't change the
>>>>>>> behavior but should be able to catch where it's escaping.
>>>>>> attached you find two bootlogs, for a correct boot, and with one hd
>>>>>> missing ...
>>>>> Heh heh, this is getting a bit embarrassing. Seems like I wasn't
>>>>> looking at the right path. Can you please try this one too? If it
>>>>> says "XXX D7 pulldown quick exit path" and then succeed to probe,
>>>>> that's the previous failure case so you don't need to keep trying to
>>>>> reproduce the problem.
>>>> i've attached the two boot logs again ...
>>> Okay, it was another wrong guess. Can you please try this one?
>>
>> unfortunately, i haven't been able to get a bootlog of a failure the
>> issue after rebooting like 20 times with yesterday's linus/master.
>> once i couldn't boot, since the root hd wasn't found, so i don't think,
>> the issue is solved, it just doesn't show very frequently ...
>>
>> the bootlog of a working system is attached, if i experience another
>> issue, i will send you another bootlog. since i am out of town for a few
>> days, it may take some time, though ...
>
> Alright, please keep me posted. Another possibility is that it's
> timing related and the PHY goes down briefly post-reset. I think I've
> found the code path but not sure yet and given how many times my hunch
> has been wrong on this case, not too confident either. Anyways, if
> it's timing related, too many printks could have thrown it off. If
> you can't reproduce the failure with the previous patch, please try
> this one and see whether it prints out "XXX: clearing to
> ATA_DEV_NONE" on failure.

with this patch, i could reproduce it again on the first boot. bootlog
attached.

cheers, tim

Hello,

Tim Blechmann wrote:
> with this patch, i could reproduce it again on the first boot. bootlog
> attached.

Thanks a lot for testing. The offending commit is 816ab897.

 commit 816ab89782ac139a8b65147cca990822bb7e8675
 Author: Tejun Heo <tj@kernel.org>
 Date:   Wed Oct 22 00:31:34 2008 +0900

    libata: set device class to NONE if phys_offline

    Reset methods don't have access to phys link status for slave links
    and may incorrectly indicate device presence causing unnecessary probe
    failures for unoccupied links. This patch clears device class to NONE
    during post-reset processing if phys link is offline.

    As on/offlineness semantics is strictly defined and used in multiple
    places by the core layer, this won't change behavior for drivers which
    don't use slave links.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>

The problem is that I don't really remember why I added this one back
then. This is incorrect because the condition should be dealt with
later in the reset logic. That didn't work quite as expected and I
ended up adding the above to work around that and it turned out wrong.
I'll dig deeper and find out what was the problem back then.

Thanks.

diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index a04488f..d0d0f88 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2673,8 +2673,10 @@ int ata_eh_reset(struct ata_link *link, int classify,
 				classes[dev->devno] = ATA_DEV_ATA;
 			else if (lflags & ATA_LFLAG_ASSUME_SEMB)
 				classes[dev->devno] = ATA_DEV_SEMB_UNSUP;
-		} else
+		} else {
+			ata_dev_printk(dev, KERN_INFO, "XXX clearing to ATA_DEV_NONE\n");
 			classes[dev->devno] = ATA_DEV_NONE;
+		}
 	}
 
 	/* record current link speed */
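
For readers following the thread, here is a minimal user-space sketch of the behaviour under discussion. It is not libata code. `enum dev_class`, `post_reset_class()` and `phy_online_at()` are hypothetical names made up for illustration, and the glitch window is an assumed parameter. The sketch only models the two ideas stated above: the post-reset step clears the class to NONE when the phys link reads offline, and, if the timing guess is right, a PHY that drops briefly after reset would make a present disk look absent.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for libata's device classes; not the kernel's definitions. */
enum dev_class { DEV_NONE, DEV_ATA };

/*
 * Simplified model of the post-reset step described in commit 816ab897:
 * if the phys link reads as offline, the class reported by the reset
 * method is discarded and the device is treated as absent.
 */
static enum dev_class post_reset_class(enum dev_class reported, bool phys_online)
{
	if (!phys_online)
		return DEV_NONE;	/* the branch the debug printk instruments */
	return reported;
}

/*
 * Hypothetical PHY that reads offline for the first glitch_ms milliseconds
 * after reset and online afterwards (models the "goes down briefly" guess).
 */
static bool phy_online_at(int sample_ms, int glitch_ms)
{
	assert(sample_ms >= 0 && glitch_ms >= 0);
	return sample_ms >= glitch_ms;
}
```

In this model, calling `post_reset_class(DEV_ATA, phy_online_at(0, 5))` samples the link inside the assumed glitch window and drops a disk that the reset method had already classified, while sampling at 10 ms keeps it. That is also consistent with the remark that extra printks could shift the timing enough to hide the failure.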