Message ID | 20130718181355.GA31889@rhlx01.hs-esslingen.de |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
Hi, forgot to mention that I had already added a libata.force=40c boot after my 80c config issue discovery which did successfully cause the "limiting to UDMA/33 due to 40c foo" dmesg yet where there's now initial but strong confirmation via reports that the resulting UDMA/33 limit did NOT manage to fix corruption issues, thus it's more strongly likely that the "bound to ill-suited pata_amd driver" (due to incorrectly configuring speeds, or due to not configuring speeds at all due to possibly BIOS not providing emulation of PCI register accesses to Geode bus, which would be properly supported by pata_cs5536 OTOH) thing is the actual reason and might fix it (fingers crossed). Thanks, Andreas Mohr -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On Thursday, July 18, 2013 08:25:41 PM Andreas Mohr wrote: > Hi, > > forgot to mention that I had already added a libata.force=40c boot > after my 80c config issue discovery > which did successfully cause the "limiting to UDMA/33 due to 40c foo" dmesg > yet where there's now initial but strong confirmation via reports > that the resulting UDMA/33 limit did NOT manage to fix corruption issues, > thus it's more strongly likely that the "bound to ill-suited pata_amd driver" > (due to incorrectly configuring speeds, or due to not configuring speeds > at all due to possibly BIOS not providing emulation of PCI register accesses > to Geode bus, which would be properly supported by pata_cs5536 OTOH) > thing is the actual reason and might fix it (fingers crossed). UDMA/66 (and higher) requires 80-wire cable to work (if the vendor states that UDMA/66 is supported then UDMA/100 should also work on CS5536). UDMA/33 should work just fine with 40-wire cable. Therefore this indeed sounds more like wrong driver being selected issue than a cable problem. Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jul 19, 2013 at 12:40:38PM +0200, Bartlomiej Zolnierkiewicz wrote: > > Hi, > > On Thursday, July 18, 2013 08:25:41 PM Andreas Mohr wrote: > > Hi, > > > > forgot to mention that I had already added a libata.force=40c boot > > after my 80c config issue discovery > > which did successfully cause the "limiting to UDMA/33 due to 40c foo" dmesg > > yet where there's now initial but strong confirmation via reports > > that the resulting UDMA/33 limit did NOT manage to fix corruption issues, > > thus it's more strongly likely that the "bound to ill-suited pata_amd driver" > > (due to incorrectly configuring speeds, or due to not configuring speeds > > at all due to possibly BIOS not providing emulation of PCI register accesses > > to Geode bus, which would be properly supported by pata_cs5536 OTOH) > > thing is the actual reason and might fix it (fingers crossed). > > UDMA/66 (and higher) requires 80-wire cable to work (if the vendor states > that UDMA/66 is supported then UDMA/100 should also work on CS5536). UDMA/33 > should work just fine with 40-wire cable. Therefore this indeed sounds more > like wrong driver being selected issue than a cable problem. Hohumm, news flash: We just found out that such image corruption (not sure yet whether it's an *identical* case of corruption though) also can occur when having the CF image written by a premium USB combo writer on a *completely different machine* (should have done that counter check somewhat earlier, sorry...). IOW, this quite likely frees the driver from being the culprit for our specific case, i.e. we're likely talking about a stupid software issue. BUT, I'm still suspecting that we have a severe case of having the wrong driver activated: . lsmod: sd_mod 25977 2 snd_cs5535audio 6485 0 ata_generic 2239 0 pata_cs5536 2294 0 pata_amd 6641 1 libata 117483 3 ata_generic,pata_cs5536,pata_amd scsi_mod 106093 2 sd_mod,libata cs5535_gpio 1756 0 Pray tell me, this does look like pata_cs5536 gets loaded due to also containing the PCI ID, yet then fails to bind, as opposed to pata_amd which does (reference count 1)? We also tried the pata_cs5536.msr=1 boot parm: [ 0.000000] Kernel command line: [..........] pata_cs5536.msr=1 libata.force=40c [............] , and that did NOT cause the only recognizeable pata_cs5536-triggered log message printk(KERN_ERR DRV_NAME ": Using MSR regs instead of PCI\n"); to occur in dmesg, IOW this driver *is* inactive. We only had a [ 68.861778] pata_amd 0000:00:0f.2: version 0.4.1 [ 68.864570] scsi0 : pata_amd [ 68.865240] scsi1 : pata_amd So, AFAICS, this leaves us with up to three (perhaps four?) corrections that one would want to have: - remove the woefully improper (forgotten!?!) CS5536 ID from pata_amd (this correction is what one would want to have, and this would work fine, right? Famous last words...) - do a cable correction patch for libata side (I do think that indicating 40c if even one device is 40c-only is the way to go [as long as libata cannot handle per-device settings], for safety reasons, and if so that's probably also preferrable to advertising an UNK value, since UNK would skip towards device-side cable detection which might be unreliable in case ATA ID values (i.e., ATA_ID_HW_CONFIG) happen to be incorrect, e.g. especially in the case of CF cards (OTOH it specifically looks for a 0x2000 bit being *set* for 80c indication, thus it should be reliable after all since you'd expect ignorant/older devices to provide zeroed fields only...) - and what about cable correction patch for IDE side? Since IDE layer cable probing sequence algo appears to be different ("why??"), this might require advertising a different cable flag - perhaps pata_cs5536 DMI quirk for that board, in case UDMA/100 (BIOS reports cable 0x03 value i.e. both 80c) happens to be too fast, but given the advanced state of our discussion (pata_cs5536 driver even completely inactive!!) and the DMI recognition issues that I reported (empty strings) I don't think that's relevant, at least not now Thanks, Andreas Mohr -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, July 19, 2013 02:26:53 PM Andreas Mohr wrote: > On Fri, Jul 19, 2013 at 12:40:38PM +0200, Bartlomiej Zolnierkiewicz wrote: > > > > Hi, > > > > On Thursday, July 18, 2013 08:25:41 PM Andreas Mohr wrote: > > > Hi, > > > > > > forgot to mention that I had already added a libata.force=40c boot > > > after my 80c config issue discovery > > > which did successfully cause the "limiting to UDMA/33 due to 40c foo" dmesg > > > yet where there's now initial but strong confirmation via reports > > > that the resulting UDMA/33 limit did NOT manage to fix corruption issues, > > > thus it's more strongly likely that the "bound to ill-suited pata_amd driver" > > > (due to incorrectly configuring speeds, or due to not configuring speeds > > > at all due to possibly BIOS not providing emulation of PCI register accesses > > > to Geode bus, which would be properly supported by pata_cs5536 OTOH) > > > thing is the actual reason and might fix it (fingers crossed). > > > > UDMA/66 (and higher) requires 80-wire cable to work (if the vendor states > > that UDMA/66 is supported then UDMA/100 should also work on CS5536). UDMA/33 > > should work just fine with 40-wire cable. Therefore this indeed sounds more > > like wrong driver being selected issue than a cable problem. > > > Hohumm, news flash: > > We just found out that such image corruption > (not sure yet whether it's an *identical* case of corruption though) > also can occur when having the CF image written by a premium USB combo writer > on a *completely different machine* (should have done that counter check > somewhat earlier, sorry...). > IOW, this quite likely frees the driver from being the culprit > for our specific case, i.e. we're likely talking about a stupid software issue. OK. > BUT, I'm still suspecting that we have a severe case of having the wrong driver activated: > > . lsmod: > sd_mod 25977 2 > snd_cs5535audio 6485 0 > ata_generic 2239 0 > pata_cs5536 2294 0 > pata_amd 6641 1 > libata 117483 3 ata_generic,pata_cs5536,pata_amd > scsi_mod 106093 2 sd_mod,libata > cs5535_gpio 1756 0 > > Pray tell me, this does look like pata_cs5536 gets loaded > due to also containing the PCI ID, yet then fails to bind, > as opposed to pata_amd which does (reference count 1)? Yes, pata_amd gets probed first and claims the device so pata_cs5536 can't bind to it when it gets loaded later. > We also tried the pata_cs5536.msr=1 boot parm: > > [ 0.000000] Kernel command line: [..........] pata_cs5536.msr=1 libata.force=40c [............] > > , and that did NOT cause the only recognizeable pata_cs5536-triggered > log message > printk(KERN_ERR DRV_NAME ": Using MSR regs instead of PCI\n"); > to occur in dmesg, IOW this driver *is* inactive. > > We only had a > > [ 68.861778] pata_amd 0000:00:0f.2: version 0.4.1 > [ 68.864570] scsi0 : pata_amd > [ 68.865240] scsi1 : pata_amd > > > > So, AFAICS, this leaves us with up to three (perhaps four?) corrections > that one would want to have: > > - remove the woefully improper (forgotten!?!) CS5536 ID from pata_amd > (this correction is what one would want to have, and this would work fine, > right? Famous last words...) Sounds fine to me given the description of the original commit 3957df6 ("pata_cs5536: ATA driver for Geode companion chip"): This is a driver for the ATA controller on the Geode CS5536 companion chip. The PCI device ID for this device was previously claimed by pata_amd.c but the PIO timings were not correct. This driver also works around a bug in some BIOSes that handle unaligned access to the PCI config registers poorly. Finally, the driver allows fallback to using MSR registers for configuration on BIOSes that are truly broken. Martin, what is your opinion on this? > - do a cable correction patch for libata side (I do think that indicating > 40c if even one device is 40c-only is the way to go [as long as libata > cannot handle per-device settings], for safety reasons, Why would libata need per-device cable setting? There is only one cable per-port and it is shared between master/slave devices and some fancy embedded implementations of the cable (CF slot + connector to slave device) are not changing this fundamental design issue AFAIK. > and if so that's probably also preferrable to advertising an UNK value, > since UNK would skip towards device-side cable detection > which might be unreliable in case ATA ID values (i.e., ATA_ID_HW_CONFIG) > happen to be incorrect, > e.g. especially in the case of CF cards (OTOH it specifically looks > for a 0x2000 bit being *set* for 80c indication, > thus it should be reliable after all since you'd expect ignorant/older > devices to provide zeroed fields only...) Most controllers have per-port cable detection and CS5536 is being an exception here. I don't know how this detection is implemented in the hardware / BIOS but it could be that it is already taking into the account the drive side cable detection and that is why we have two values per-port. Please also note that the cable detection is only one of components of selecting device transfer speed which also involves comparing maximum host and device capabilities. Indicating 40c in case if one device detects the cable to be 40c and the other one detects 80c would unnecessarily punish the faster device (which would then need usage of kernel parameter to make it work at the full speed). Moreover it is unlikely that the hardware / BIOS incorrectly detects 40c cable as 80c one and if this happens the error recovery should trigger on CRC transfer errors and downgrade the current speed. Therefore I would stronly suggest leaving pata_cs5536 cable detection as it is currently. > - and what about cable correction patch for IDE side? Since IDE layer > cable probing sequence algo appears to be different ("why??"), > this might require advertising a different cable flag I think that IDE CS5536 host driver doesn't need any changes w.r.t. cable detection currently. The IDE layer code is different for historical reasons and having different probe method than libata (ah, the wonders of having two pieces of code supporing the same hardware). > - perhaps pata_cs5536 DMI quirk for that board, in case UDMA/100 > (BIOS reports cable 0x03 value i.e. both 80c) happens to be too fast, > but given the advanced state of our discussion (pata_cs5536 driver > even completely inactive!!) and the DMI recognition issues that I reported > (empty strings) I don't think that's relevant, at least not now Agreed. Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On Fri, Jul 19, 2013 at 04:30:22PM +0200, Bartlomiej Zolnierkiewicz wrote: > On Friday, July 19, 2013 02:26:53 PM Andreas Mohr wrote: > > - do a cable correction patch for libata side (I do think that indicating > > 40c if even one device is 40c-only is the way to go [as long as libata > > cannot handle per-device settings], for safety reasons, > > Why would libata need per-device cable setting? There is only one cable > per-port and it is shared between master/slave devices and some fancy > embedded implementations of the cable (CF slot + connector to slave > device) are not changing this fundamental design issue AFAIK. I think it's simply because while the CF slot more or less obviously seems to be rated "80c capable"-equivalent (due to its short connection? c.f. various notebook situations), *extension cable*(!) connections being connected to 44pin connectors ("ATA-44" ? conflicts with ATA/44 speed naming though) seem to be rated as ATA/33 (https://de.wikipedia.org/wiki/ATA/ATAPI) or IOW UDMA 2 (see various pages of 44pin CF adapter vendors). So that means that you'd end up with a "mixed" capability primary port (however my BIOS advertises 80c for both connectors, which I suspect may not really be appropriate). If you are able to argue hard evidence that this primary port Master/Slave apparatus metallic layer connection is "one big shared resource" (i.e. from a HF / EMI POV) where behaviour ends up identical both for command and address timings for all devices for all things practical that matter, then yeah. But I have my doubts... (i.e., that it [address or command timings] actually is to be seen in a device-connection-specific light, to be reasonably safe). But https://en.wikipedia.org/wiki/Parallel_ATA "For all modern ATA host adapters this is not true, as modern ATA host adapters support independent device timing. This allows each device on the cable to transfer data at its own best speed. Even with older adapters without independent timing, this effect applies only to the data transfer phase of a read or write operation. This is usually the shortest part of a complete read or write operation." seems to tend towards indicating that libata *should* have a per-device form of .cable_detect(), unless "timing" != "cable type". And, OTOH, you simply could pose the argument that all that pseudo-intellectual reasoning does not matter, since CS5536 actively *chose* to offer *per-device-specific* cable type bits thus since it *does* offer such config possibility, libata *should* support such (unless CS5536 somehow happened to be wrong in offering such [useless?] liberty). One way to extend libata to support such functionality might be to have the driver set a flag to indicate that it has per-device cable type, and then let libata serve a .cable_detect_per_device() or some such instead. Or choose to rework all drivers to have a per-device-extended .cable_detect() instead (where most drivers would not [need to] make the per-device distinction i.e. simply return the same setting for both). > > and if so that's probably also preferrable to advertising an UNK value, > > since UNK would skip towards device-side cable detection > > which might be unreliable in case ATA ID values (i.e., ATA_ID_HW_CONFIG) > > happen to be incorrect, > > e.g. especially in the case of CF cards (OTOH it specifically looks > > for a 0x2000 bit being *set* for 80c indication, > > thus it should be reliable after all since you'd expect ignorant/older > > devices to provide zeroed fields only...) > > Most controllers have per-port cable detection and CS5536 is being > an exception here. I don't know how this detection is implemented in > the hardware / BIOS but it could be that it is already taking into > the account the drive side cable detection and that is why we have two > values per-port. Please also note that the cable detection is only one > of components of selecting device transfer speed which also involves > comparing maximum host and device capabilities. Yes, possibly because most controllers *do* have one single shared-medium cable connection location per port only, whereas CS5536 (it only offers one port in the first place, not two unlike many others!) seems to treat the two connectors (soldered-socket CF, and ATA-44 44pin, in the case of my system) differently, *for that single port*. CS5536 actually implicitly doing drive-side cable detection might be a reason for the separate bits as you say, but I doubt it (the controller would have to read out the device's ATA ID words, right??). > Indicating 40c in case if one device detects the cable to be 40c and > the other one detects 80c would unnecessarily punish the faster device > (which would then need usage of kernel parameter to make it work at > the full speed). Moreover it is unlikely that the hardware / BIOS > incorrectly detects 40c cable as 80c one and if this happens the error > recovery should trigger on CRC transfer errors and downgrade the current > speed. Therefore I would stronly suggest leaving pata_cs5536 cable > detection as it is currently. OK, those are somewhat good {counter arguments | safety mechanisms} (but see my point above about my ATA-44 connector getting advertised as 80c), so perhaps... BTW, does error recovery actively downgrade DMA mode on errors? (kernel-side? or hardware even?) Didn't know that, but if so then a more aggressive default configuration probably is acceptable. > > - and what about cable correction patch for IDE side? Since IDE layer > > cable probing sequence algo appears to be different ("why??"), > > this might require advertising a different cable flag > > I think that IDE CS5536 host driver doesn't need any changes w.r.t. > cable detection currently. > > The IDE layer code is different for historical reasons and having > different probe method than libata (ah, the wonders of having two > pieces of code supporing the same hardware). OK, so that leaves us with a possibly remaining debate on libata side only, and I'd argue for extending libata for per-device cable detect support, since that's arguably more precise and doesn't really cost anything (that callback likely is utter slowpath, init-only)... well, except for a sizeable amount of driver rework patches. Thanks, Andreas Mohr -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi, On Friday, July 19, 2013 06:45:31 PM Andreas Mohr wrote: > Hi, > > On Fri, Jul 19, 2013 at 04:30:22PM +0200, Bartlomiej Zolnierkiewicz wrote: > > On Friday, July 19, 2013 02:26:53 PM Andreas Mohr wrote: > > > - do a cable correction patch for libata side (I do think that indicating > > > 40c if even one device is 40c-only is the way to go [as long as libata > > > cannot handle per-device settings], for safety reasons, > > > > Why would libata need per-device cable setting? There is only one cable > > per-port and it is shared between master/slave devices and some fancy > > embedded implementations of the cable (CF slot + connector to slave > > device) are not changing this fundamental design issue AFAIK. > > I think it's simply because while the CF slot more or less obviously > seems to be rated "80c capable"-equivalent (due to its short connection? > c.f. various notebook situations), > *extension cable*(!) connections being connected to 44pin connectors > ("ATA-44" ? conflicts with ATA/44 speed naming though) > seem to be rated as ATA/33 (https://de.wikipedia.org/wiki/ATA/ATAPI) > or IOW UDMA 2 (see various pages of 44pin CF adapter vendors). > So that means that you'd end up with a "mixed" capability primary port > (however my BIOS advertises 80c for both connectors, > which I suspect may not really be appropriate). > > If you are able to argue hard evidence that this primary port Master/Slave > apparatus metallic layer connection is "one big shared resource" > (i.e. from a HF / EMI POV) > where behaviour ends up identical both for command and address timings > for all devices for all things practical that matter, then yeah. > But I have my doubts... (i.e., that it [address or command timings] > actually is to be seen in a device-connection-specific light, > to be reasonably safe). > > But > https://en.wikipedia.org/wiki/Parallel_ATA > > "For all modern ATA host adapters this is not true, as modern ATA host > adapters support independent device timing. This allows each device on > the cable to transfer data at its own best speed. Even with older > adapters without independent timing, this effect applies only to the > data transfer phase of a read or write operation. This is usually the > shortest part of a complete read or write operation." > seems to tend towards indicating that libata *should* have a per-device > form of .cable_detect(), unless "timing" != "cable type". and indeed "timing" != "cable type".. > And, OTOH, you simply could pose the argument > that all that pseudo-intellectual reasoning does not matter, > since CS5536 actively *chose* to offer *per-device-specific* cable type bits > thus since it *does* offer such config possibility, libata *should* > support such (unless CS5536 somehow happened to be wrong in offering such > [useless?] liberty). > > > One way to extend libata to support such functionality might be > to have the driver set a flag to indicate that it has per-device cable type, > and then let libata serve a .cable_detect_per_device() or some such > instead. > > Or choose to rework all drivers to have a per-device-extended > .cable_detect() instead (where most drivers would not [need to] make the > per-device distinction i.e. simply return the same setting for both). I still don't see much need for this rework currently given that: * such hardware setups are very rare (only CS5336 PATA controller gives you per-device cable type info, no other PATA controller does it AFAIK) * we have fallback to lower speeds on errors * even CS5536 BIOS describes the 44pin connector as 80c one and returns identical cable types for both master and slave devices (as reported by you in previous mail) IOW until there is a practical need to enhance cable detection to work per device it should not be done. > > > and if so that's probably also preferrable to advertising an UNK value, > > > since UNK would skip towards device-side cable detection > > > which might be unreliable in case ATA ID values (i.e., ATA_ID_HW_CONFIG) > > > happen to be incorrect, > > > e.g. especially in the case of CF cards (OTOH it specifically looks > > > for a 0x2000 bit being *set* for 80c indication, > > > thus it should be reliable after all since you'd expect ignorant/older > > > devices to provide zeroed fields only...) > > > > Most controllers have per-port cable detection and CS5536 is being > > an exception here. I don't know how this detection is implemented in > > the hardware / BIOS but it could be that it is already taking into > > the account the drive side cable detection and that is why we have two > > values per-port. Please also note that the cable detection is only one > > of components of selecting device transfer speed which also involves > > comparing maximum host and device capabilities. > > Yes, possibly because most controllers *do* have one single shared-medium > cable connection location per port only, whereas CS5536 (it only offers > one port in the first place, not two unlike many others!) > seems to treat the two connectors > (soldered-socket CF, and ATA-44 44pin, in the case of my system) > differently, *for that single port*. > > > CS5536 actually implicitly doing drive-side cable detection might be > a reason for the separate bits as you say, but I doubt it > (the controller would have to read out the device's ATA ID words, right??). > > > > Indicating 40c in case if one device detects the cable to be 40c and > > the other one detects 80c would unnecessarily punish the faster device > > (which would then need usage of kernel parameter to make it work at > > the full speed). Moreover it is unlikely that the hardware / BIOS > > incorrectly detects 40c cable as 80c one and if this happens the error > > recovery should trigger on CRC transfer errors and downgrade the current > > speed. Therefore I would stronly suggest leaving pata_cs5536 cable > > detection as it is currently. > > OK, those are somewhat good {counter arguments | safety mechanisms} > (but see my point above about my ATA-44 connector getting advertised as 80c), > so perhaps... > > BTW, does error recovery actively downgrade DMA mode on errors? > (kernel-side? or hardware even?) Kernel downgrades UDMA modes on CRC (checked by hardware) transfer errors. > Didn't know that, but if so then a more aggressive default configuration > probably is acceptable. > > > > > - and what about cable correction patch for IDE side? Since IDE layer > > > cable probing sequence algo appears to be different ("why??"), > > > this might require advertising a different cable flag > > > > I think that IDE CS5536 host driver doesn't need any changes w.r.t. > > cable detection currently. > > > > The IDE layer code is different for historical reasons and having > > different probe method than libata (ah, the wonders of having two > > pieces of code supporing the same hardware). > > OK, so that leaves us with a possibly remaining debate on libata side only, > and I'd argue for extending libata for per-device cable detect support, > since that's arguably more precise and doesn't really cost anything > (that callback likely is utter slowpath, init-only)... > well, except for a sizeable amount of driver rework patches. Until there is a demonstreable need for it (real hardware configuration requiring such changes to work) there is no need to do it IMHO. Instead I would like to come back to the wrong driver being selected issue and fix it (cause this is a real problem with the pata_amd driver taking control over CS5536 PATA controller and pata_cs5536 never being used). Best regards, -- Bartlomiej Zolnierkiewicz Samsung R&D Institute Poland Samsung Electronics -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
==================== While trying to debug a suspected UDMA config issue on PCM-9375 (some images written end up corrupted, which quite likely is due to the combined evaluation ending up in dmesg as UDMA/100 whereas sheets of this vendor quite strictly specify up to 66 only despite CS5536 doing 100 in general), I happened to stumble on the following semi-related thing: | Do you mean that PCM-9375 documentation states that only up to UDMA/66 | is supported? : Well, not that strictly after all (it doesn't actively speak out : against > 66, merely mentions 66 only). : Dito some other internet results. : And some even mention UDMA/33 for their two primary port devices : with a soldered-socket CF / ATA-44 combo. pata_cs5536.c: static int cs5536_cable_detect(struct ata_port *ap) { struct pci_dev *pdev = to_pci_dev(ap->host->dev); u32 cfg; cs5536_read(pdev, CFG, &cfg); if (cfg & IDE_CFG_CABLE) return ATA_CBL_PATA80; else return ATA_CBL_PATA40; } This, AFAICS, is not correct. | Could you use ATA_CBL_PATA_UNK here and re-try? | (This will cause drive-side detection to trigger.) : Hmmmm, not that easy. I don't have a readily available custom kernel build : here, that would require some effort to achieve. I later found that in cs5536.c commit, you had said: * Fix cable detection to report 80-wires cable if BIOS set it for any device on a port (IDE core will do drive-side cable detection later). ...except it doesn't ;-P | IDE and libata have some differences in their cable detection codes. :-P : If you happen to say so... :) ...at least not in libata (where the same cable_detect logic was done): static int cable_is_40wire(struct ata_port *ap): /* If the controller thinks we are 40 wire, we are. */ if (ap->cbl == ATA_CBL_PATA40) return 1; /* If the controller thinks we are 80 wire, we are. */ if (ap->cbl == ATA_CBL_PATA80 || ap->cbl == ATA_CBL_SATA) return 0; . . . /* If the controller doesn't know, we scan. * * Note: We look for all 40 wire detects at this point. Any * 80 wire detect is taken to be 80 wire cable because * - in many setups only the one drive (slave if present) will * give a valid detect * - if you have a non detect capable drive you don't want it * to colour the choice */ ata_for_each_link(link, ap, EDGE) { ata_for_each_dev(dev, link, ENABLED) { if (!ata_is_40wire(dev)) return 0; } } Note that given a prior ATA_CBL_PATA80 .cable_detect() result, we're long gone here prior to any ata_is_40wire() scan... I haven't determined yet whether my BIOS does in fact configure a mixed-config cable value indication (will gain access tomorrow), but this handling does seem problematic irrespective of that. : [no, it doesn't - it has a 0x03 value, i.e. both ports BIOS-advertised as 80c] Current very experimental commit found below (any comments? And obviously this would probably have to be corrected in IDE as well...). In addition to this fix I'm planning to add a DMI quirk to restrict PCM-9375 to UDMA/66 if this would happen to be an actually most precise way to successfully fix the observed corruption. | If the vendor specifies that only UDMA/66 is supported this is a good | idea. : This turned out to become more difficult since dmidecode result is... awful : (merely empty strings for all system ID entries of this box). | (OTOH the patch below seems to be too restrictive.) : Let me respectfully and tactfully disagree with that ;) : I believe we do have an "insufficient API" issue with libata : since for this controller, speed settings are per-device rather than per-port. : Thus IMHO we *should* return the combined conservative setting : (40c in case *any* device is 40c only) : in order to not end up with an excessively fast setting. : (OTOH your ATA_CBL_PATA_UNK value suggestion perhaps is more suitable indeed, : since this then eventually shifts processing to device-side cable detect : which might be the best handling (though certainly less flexibly complete) : given current libata limitations. : BTW, today I discovered even bigger black holes: : : $ grep PCI.*CS5536 *.c : pata_amd.c: { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_CS5536_IDE), : 9 }, : pata_cs5536.c: { PCI_VDEVICE(AMD, PCI_DEVICE_ID_AMD_CS5536_IDE), : }, : : I got hinted at this by an lspci output log which shockingly listed : that CS5536 IDE was "claimed" by **pata_amd** (only!), on a 2.6.32 kernel, : i.e. a kernel long after the CS5536-specific pata_cs5536.c was added : in addition to the prior generic pata_amd.c. : I don't have lsmod output of that machine yet, but even now it strongly looks : like this config there does have both drivers and binds to the : **wrong** one (which quite likely is the actual reason for the : major data corruption screwup here, when reading between the lines of : the original patch thread at : http://www.mail-archive.com/linux-ide@vger.kernel.org/msg11033.html : (pata_amd most likely will not configure timings correctly, : which seems to have been the original reason for writing pata_cs5536.c). : Or would this be merely a limitation of that lspci version listing : claims for a single module only, yet both modules actually being : (correctly/rightfully?) loaded? : I'm planning to *somehow* (even on this hard-coded and older setup) : achieve getting pata_amd blacklisted tomorrow (blacklisting mechanisms : are/were very inconvenient and inconsistent), : to see whether indeed it's a very painful unremoved duplicate PCI ID issue : (existing since 2007 - I seem to have a good hand for uncovering : awfully long-standing issues ;-P). : : BTW 9272dcc2 "pata_cs5536: Add support for non-X86_32 platforms" : happened to also mention "pata_amd also supports cs5536 IDE controller" : - probably someone ought to have managed to realize to voice a complaint : at that time already ;) | PS Could we please move this discussion to | [2]linux-ide@vger.kernel.org | mailing list if possible? : Indeed, I should just have done that. I had intended it to be a : preliminary private inquiry, but the usual consequences of that : (an ugly rewrite) are just too "challenging". : I chose to completely rewrite the mail since the reply contained : some weird non-ASCII indent operations (perhaps weird MUA?). Thanks! From 374506ab6c3a57bd8890b5192b2047d7b96cb542 Mon Sep 17 00:00:00 2001 From: Andreas Mohr <andim2@users.sf.net> Date: Wed, 17 Jul 2013 18:49:58 +0200 Subject: [PATCH] CS5536: fix overly optimistic cable detect (handle libata API imprecision). We unconditionally indicated 80-pin cable capability in case *any* of Master/Slave devices had 80c cable presence configure-indicated by the BIOS. This, however, does not seem at all appropriate since CS5536 does manage its primary IDE's master/slave devices independently, both in cable detect and timing programming. Since libata does not offer API support for such per-device configurations yet, we really need to restrict things to the lowest common denominator. Signed-off-by: Andreas Mohr <andim2@users.sf.net> --- drivers/ata/pata_cs5536.c | 34 +++++++++++++++++++++++++++++++--- 1 Datei geändert, 31 Zeilen hinzugefügt(+), 3 Zeilen entfernt(-) diff --git a/drivers/ata/pata_cs5536.c b/drivers/ata/pata_cs5536.c index 0448860..f58bf04 100644 --- a/drivers/ata/pata_cs5536.c +++ b/drivers/ata/pata_cs5536.c @@ -146,10 +146,38 @@ static int cs5536_cable_detect(struct ata_port *ap) cs5536_read(pdev, CFG, &cfg); - if (cfg & IDE_CFG_CABLE) - return ATA_CBL_PATA80; - else + /* + * FIXME libata API limitation: + * CS5536 data sheet says + * "The IDE interface timing is completely programmable. + * Timing control covers the command active and recover pulse + * widths, and command block register accesses. The IDE + * data transfer speed for each device on each channel can + * be independently programmed allowing high speed IDE + * peripherals to co-exist on the same channel as older, + * compatible devices." and it does have per-drive-separate + * cable detect and timing programming bits for primary Master/Slave + * and in fact some devices (e.g. PCM-9375) do have e.g. + * directly-soldered (read: ~80c) CompactFlash primary Master + * and a 44pin (read: 40c) IDE primary Slave. + * However, libata only offers whole-ata_port cable detect callback + * that's arguably too imprecise for such hardware capabilities. + * Given this imprecision, we arguably are required for now + * to indicate the lowest common denominator, i.e. 40c in case + * *any* of Master/Slave does not indicate 80c support, + * in order to avoid imposing incorrect timing to the + * per-device-separate timing programming. + * This care seems especially important given evidence of + * existing other timing issues of CS5536: + * see errata 47 "UDMA Mode 5 stability issues" in PDF + * "AMD Geode (tm) CS5536 Companion Device Silicon Revision B1 + * Specification Update". + */ + bool have_any_40c_only_device = (cfg & IDE_CFG_CABLE) < IDE_CFG_CABLE; + if (have_any_40c_only_device) return ATA_CBL_PATA40; + else + return ATA_CBL_PATA80; } /**