diff mbox

libata / IDE cs5536: 80c cable detect issue (and worse?)

Message ID 20130718181355.GA31889@rhlx01.hs-esslingen.de
State Not Applicable
Delegated to: David Miller
Headers show

Commit Message

Andreas Mohr July 18, 2013, 6:13 p.m. UTC
Hi,

[CC'd further affected/interested(?) people]

rewriting a private mail for public subsequent discussion
(sprinkling Bartlomiej's prior replies in between... - sorry
for the "challenging" quoting!).

Comments

Andreas Mohr July 18, 2013, 6:25 p.m. UTC | #1
Hi,

forgot to mention that I had already added a libata.force=40c boot
after my 80c config issue discovery
which did successfully cause the "limiting to UDMA/33 due to 40c foo" dmesg
yet where there's now initial but strong confirmation via reports
that the resulting UDMA/33 limit did NOT manage to fix corruption issues,
thus it's more strongly likely that the "bound to ill-suited pata_amd driver"
(due to incorrectly configuring speeds, or due to not configuring speeds
at all due to possibly BIOS not providing emulation of PCI register accesses
to Geode bus, which would be properly supported by pata_cs5536 OTOH)
thing is the actual reason and might fix it (fingers crossed).

Thanks,

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bartlomiej Zolnierkiewicz July 19, 2013, 10:40 a.m. UTC | #2
Hi,

On Thursday, July 18, 2013 08:25:41 PM Andreas Mohr wrote:
> Hi,
> 
> forgot to mention that I had already added a libata.force=40c boot
> after my 80c config issue discovery
> which did successfully cause the "limiting to UDMA/33 due to 40c foo" dmesg
> yet where there's now initial but strong confirmation via reports
> that the resulting UDMA/33 limit did NOT manage to fix corruption issues,
> thus it's more strongly likely that the "bound to ill-suited pata_amd driver"
> (due to incorrectly configuring speeds, or due to not configuring speeds
> at all due to possibly BIOS not providing emulation of PCI register accesses
> to Geode bus, which would be properly supported by pata_cs5536 OTOH)
> thing is the actual reason and might fix it (fingers crossed).

UDMA/66 (and higher) requires 80-wire cable to work (if the vendor states
that UDMA/66 is supported then UDMA/100 should also work on CS5536). UDMA/33
should work just fine with 40-wire cable. Therefore this indeed sounds more
like wrong driver being selected issue than a cable problem.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Mohr July 19, 2013, 12:26 p.m. UTC | #3
On Fri, Jul 19, 2013 at 12:40:38PM +0200, Bartlomiej Zolnierkiewicz wrote:
> 
> Hi,
> 
> On Thursday, July 18, 2013 08:25:41 PM Andreas Mohr wrote:
> > Hi,
> > 
> > forgot to mention that I had already added a libata.force=40c boot
> > after my 80c config issue discovery
> > which did successfully cause the "limiting to UDMA/33 due to 40c foo" dmesg
> > yet where there's now initial but strong confirmation via reports
> > that the resulting UDMA/33 limit did NOT manage to fix corruption issues,
> > thus it's more strongly likely that the "bound to ill-suited pata_amd driver"
> > (due to incorrectly configuring speeds, or due to not configuring speeds
> > at all due to possibly BIOS not providing emulation of PCI register accesses
> > to Geode bus, which would be properly supported by pata_cs5536 OTOH)
> > thing is the actual reason and might fix it (fingers crossed).
> 
> UDMA/66 (and higher) requires 80-wire cable to work (if the vendor states
> that UDMA/66 is supported then UDMA/100 should also work on CS5536). UDMA/33
> should work just fine with 40-wire cable. Therefore this indeed sounds more
> like wrong driver being selected issue than a cable problem.


Hohumm, news flash:

We just found out that such image corruption
(not sure yet whether it's an *identical* case of corruption though)
also can occur when having the CF image written by a premium USB combo writer
on a *completely different machine* (should have done that counter check
somewhat earlier, sorry...).
IOW, this quite likely frees the driver from being the culprit
for our specific case, i.e. we're likely talking about a stupid software issue.

BUT, I'm still suspecting that we have a severe case of having the wrong driver activated:

. lsmod:
sd_mod                 25977  2
snd_cs5535audio         6485  0
ata_generic             2239  0
pata_cs5536             2294  0
pata_amd                6641  1
libata                117483  3 ata_generic,pata_cs5536,pata_amd
scsi_mod              106093  2 sd_mod,libata
cs5535_gpio             1756  0

Pray tell me, this does look like pata_cs5536 gets loaded
due to also containing the PCI ID, yet then fails to bind,
as opposed to pata_amd which does (reference count 1)?

We also tried the pata_cs5536.msr=1 boot parm:

[    0.000000] Kernel command line: [..........] pata_cs5536.msr=1 libata.force=40c [............]

, and that did NOT cause the only recognizeable pata_cs5536-triggered
log message
                printk(KERN_ERR DRV_NAME ": Using MSR regs instead of PCI\n");
to occur in dmesg, IOW this driver *is* inactive.

We only had a

[   68.861778] pata_amd 0000:00:0f.2: version 0.4.1
[   68.864570] scsi0 : pata_amd
[   68.865240] scsi1 : pata_amd



So, AFAICS, this leaves us with up to three (perhaps four?) corrections
that one would want to have:

- remove the woefully improper (forgotten!?!) CS5536 ID from pata_amd
  (this correction is what one would want to have, and this would work fine,
  right? Famous last words...)

- do a cable correction patch for libata side (I do think that indicating
  40c if even one device is 40c-only is the way to go [as long as libata
  cannot handle per-device settings], for safety reasons,
  and if so that's probably also preferrable to advertising an UNK value,
  since UNK would skip towards device-side cable detection
  which might be unreliable in case ATA ID values (i.e., ATA_ID_HW_CONFIG)
  happen to be incorrect,
  e.g. especially in the case of CF cards (OTOH it specifically looks
  for a 0x2000 bit being *set* for 80c indication,
  thus it should be reliable after all since you'd expect ignorant/older
  devices to provide zeroed fields only...)

- and what about cable correction patch for IDE side? Since IDE layer
  cable probing sequence algo appears to be different ("why??"),
  this might require advertising a different cable flag

- perhaps pata_cs5536 DMI quirk for that board, in case UDMA/100
  (BIOS reports cable 0x03 value i.e. both 80c) happens to be too fast,
  but given the advanced state of our discussion (pata_cs5536 driver
  even completely inactive!!) and the DMI recognition issues that I reported
  (empty strings) I don't think that's relevant, at least not now


Thanks,

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bartlomiej Zolnierkiewicz July 19, 2013, 2:30 p.m. UTC | #4
On Friday, July 19, 2013 02:26:53 PM Andreas Mohr wrote:
> On Fri, Jul 19, 2013 at 12:40:38PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > 
> > Hi,
> > 
> > On Thursday, July 18, 2013 08:25:41 PM Andreas Mohr wrote:
> > > Hi,
> > > 
> > > forgot to mention that I had already added a libata.force=40c boot
> > > after my 80c config issue discovery
> > > which did successfully cause the "limiting to UDMA/33 due to 40c foo" dmesg
> > > yet where there's now initial but strong confirmation via reports
> > > that the resulting UDMA/33 limit did NOT manage to fix corruption issues,
> > > thus it's more strongly likely that the "bound to ill-suited pata_amd driver"
> > > (due to incorrectly configuring speeds, or due to not configuring speeds
> > > at all due to possibly BIOS not providing emulation of PCI register accesses
> > > to Geode bus, which would be properly supported by pata_cs5536 OTOH)
> > > thing is the actual reason and might fix it (fingers crossed).
> > 
> > UDMA/66 (and higher) requires 80-wire cable to work (if the vendor states
> > that UDMA/66 is supported then UDMA/100 should also work on CS5536). UDMA/33
> > should work just fine with 40-wire cable. Therefore this indeed sounds more
> > like wrong driver being selected issue than a cable problem.
> 
> 
> Hohumm, news flash:
> 
> We just found out that such image corruption
> (not sure yet whether it's an *identical* case of corruption though)
> also can occur when having the CF image written by a premium USB combo writer
> on a *completely different machine* (should have done that counter check
> somewhat earlier, sorry...).
> IOW, this quite likely frees the driver from being the culprit
> for our specific case, i.e. we're likely talking about a stupid software issue.

OK.

> BUT, I'm still suspecting that we have a severe case of having the wrong driver activated:
> 
> . lsmod:
> sd_mod                 25977  2
> snd_cs5535audio         6485  0
> ata_generic             2239  0
> pata_cs5536             2294  0
> pata_amd                6641  1
> libata                117483  3 ata_generic,pata_cs5536,pata_amd
> scsi_mod              106093  2 sd_mod,libata
> cs5535_gpio             1756  0
> 
> Pray tell me, this does look like pata_cs5536 gets loaded
> due to also containing the PCI ID, yet then fails to bind,
> as opposed to pata_amd which does (reference count 1)?

Yes, pata_amd gets probed first and claims the device so pata_cs5536
can't bind to it when it gets loaded later.

> We also tried the pata_cs5536.msr=1 boot parm:
> 
> [    0.000000] Kernel command line: [..........] pata_cs5536.msr=1 libata.force=40c [............]
> 
> , and that did NOT cause the only recognizeable pata_cs5536-triggered
> log message
>                 printk(KERN_ERR DRV_NAME ": Using MSR regs instead of PCI\n");
> to occur in dmesg, IOW this driver *is* inactive.
> 
> We only had a
> 
> [   68.861778] pata_amd 0000:00:0f.2: version 0.4.1
> [   68.864570] scsi0 : pata_amd
> [   68.865240] scsi1 : pata_amd
> 
> 
> 
> So, AFAICS, this leaves us with up to three (perhaps four?) corrections
> that one would want to have:
> 
> - remove the woefully improper (forgotten!?!) CS5536 ID from pata_amd
>   (this correction is what one would want to have, and this would work fine,
>   right? Famous last words...)

Sounds fine to me given the description of the original commit 3957df6
("pata_cs5536: ATA driver for Geode companion chip"):

    This is a driver for the ATA controller on the Geode CS5536 companion
    chip.  The PCI device ID for this device was previously claimed by
    pata_amd.c but the PIO timings were not correct.  This driver also
    works around a bug in some BIOSes that handle unaligned access to the
    PCI config registers poorly.  Finally, the driver allows fallback to
    using MSR registers for configuration on BIOSes that are truly
    broken.

Martin, what is your opinion on this?

> - do a cable correction patch for libata side (I do think that indicating
>   40c if even one device is 40c-only is the way to go [as long as libata
>   cannot handle per-device settings], for safety reasons,

Why would libata need per-device cable setting? There is only one cable
per-port and it is shared between master/slave devices and some fancy
embedded implementations of the cable (CF slot + connector to slave
device) are not changing this fundamental design issue AFAIK.

>   and if so that's probably also preferrable to advertising an UNK value,
>   since UNK would skip towards device-side cable detection
>   which might be unreliable in case ATA ID values (i.e., ATA_ID_HW_CONFIG)
>   happen to be incorrect,
>   e.g. especially in the case of CF cards (OTOH it specifically looks
>   for a 0x2000 bit being *set* for 80c indication,
>   thus it should be reliable after all since you'd expect ignorant/older
>   devices to provide zeroed fields only...)

Most controllers have per-port cable detection and CS5536 is being
an exception here. I don't know how this detection is implemented in
the hardware / BIOS but it could be that it is already taking into
the account the drive side cable detection and that is why we have two
values per-port. Please also note that the cable detection is only one
of components of selecting device transfer speed which also involves
comparing maximum host and device capabilities.

Indicating 40c in case if one device detects the cable to be 40c and
the other one detects 80c would unnecessarily punish the faster device
(which would then need usage of kernel parameter to make it work at
the full speed). Moreover it is unlikely that the hardware / BIOS
incorrectly detects 40c cable as 80c one and if this happens the error
recovery should trigger on CRC transfer errors and downgrade the current
speed. Therefore I would stronly suggest leaving pata_cs5536 cable
detection as it is currently.

> - and what about cable correction patch for IDE side? Since IDE layer
>   cable probing sequence algo appears to be different ("why??"),
>   this might require advertising a different cable flag

I think that IDE CS5536 host driver doesn't need any changes w.r.t.
cable detection currently.

The IDE layer code is different for historical reasons and having
different probe method than libata (ah, the wonders of having two
pieces of code supporing the same hardware).

> - perhaps pata_cs5536 DMI quirk for that board, in case UDMA/100
>   (BIOS reports cable 0x03 value i.e. both 80c) happens to be too fast,
>   but given the advanced state of our discussion (pata_cs5536 driver
>   even completely inactive!!) and the DMI recognition issues that I reported
>   (empty strings) I don't think that's relevant, at least not now

Agreed.

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Mohr July 19, 2013, 4:45 p.m. UTC | #5
Hi,

On Fri, Jul 19, 2013 at 04:30:22PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Friday, July 19, 2013 02:26:53 PM Andreas Mohr wrote:
> > - do a cable correction patch for libata side (I do think that indicating
> >   40c if even one device is 40c-only is the way to go [as long as libata
> >   cannot handle per-device settings], for safety reasons,
> 
> Why would libata need per-device cable setting? There is only one cable
> per-port and it is shared between master/slave devices and some fancy
> embedded implementations of the cable (CF slot + connector to slave
> device) are not changing this fundamental design issue AFAIK.

I think it's simply because while the CF slot more or less obviously
seems to be rated "80c capable"-equivalent (due to its short connection?
c.f. various notebook situations),
*extension cable*(!) connections being connected to 44pin connectors
("ATA-44" ? conflicts with ATA/44 speed naming though)
seem to be rated as ATA/33 (https://de.wikipedia.org/wiki/ATA/ATAPI)
or IOW UDMA 2 (see various pages of 44pin CF adapter vendors).
So that means that you'd end up with a "mixed" capability primary port
(however my BIOS advertises 80c for both connectors,
which I suspect may not really be appropriate).

If you are able to argue hard evidence that this primary port Master/Slave
apparatus metallic layer connection is "one big shared resource"
(i.e. from a HF / EMI POV)
where behaviour ends up identical both for command and address timings
for all devices for all things practical that matter, then yeah.
But I have my doubts... (i.e., that it [address or command timings]
actually is to be seen in a device-connection-specific light,
to be reasonably safe).


But
https://en.wikipedia.org/wiki/Parallel_ATA

"For all modern ATA host adapters this is not true, as modern ATA host
adapters support independent device timing. This allows each device on
the cable to transfer data at its own best speed. Even with older
adapters without independent timing, this effect applies only to the
data transfer phase of a read or write operation. This is usually the
shortest part of a complete read or write operation."

seems to tend towards indicating that libata *should* have a per-device
form of .cable_detect(), unless "timing" != "cable type".


And, OTOH, you simply could pose the argument
that all that pseudo-intellectual reasoning does not matter,
since CS5536 actively *chose* to offer *per-device-specific* cable type bits
thus since it *does* offer such config possibility, libata *should*
support such (unless CS5536 somehow happened to be wrong in offering such
[useless?] liberty).


One way to extend libata to support such functionality might be
to have the driver set a flag to indicate that it has per-device cable type,
and then let libata serve a .cable_detect_per_device() or some such
instead.

Or choose to rework all drivers to have a per-device-extended
.cable_detect() instead (where most drivers would not [need to] make the
per-device distinction i.e. simply return the same setting for both).


> >   and if so that's probably also preferrable to advertising an UNK value,
> >   since UNK would skip towards device-side cable detection
> >   which might be unreliable in case ATA ID values (i.e., ATA_ID_HW_CONFIG)
> >   happen to be incorrect,
> >   e.g. especially in the case of CF cards (OTOH it specifically looks
> >   for a 0x2000 bit being *set* for 80c indication,
> >   thus it should be reliable after all since you'd expect ignorant/older
> >   devices to provide zeroed fields only...)
> 
> Most controllers have per-port cable detection and CS5536 is being
> an exception here. I don't know how this detection is implemented in
> the hardware / BIOS but it could be that it is already taking into
> the account the drive side cable detection and that is why we have two
> values per-port. Please also note that the cable detection is only one
> of components of selecting device transfer speed which also involves
> comparing maximum host and device capabilities.

Yes, possibly because most controllers *do* have one single shared-medium
cable connection location per port only, whereas CS5536 (it only offers
one port in the first place, not two unlike many others!)
seems to treat the two connectors
(soldered-socket CF, and ATA-44 44pin, in the case of my system)
differently, *for that single port*.


CS5536 actually implicitly doing drive-side cable detection might be
a reason for the separate bits as you say, but I doubt it
(the controller would have to read out the device's ATA ID words, right??).


> Indicating 40c in case if one device detects the cable to be 40c and
> the other one detects 80c would unnecessarily punish the faster device
> (which would then need usage of kernel parameter to make it work at
> the full speed). Moreover it is unlikely that the hardware / BIOS
> incorrectly detects 40c cable as 80c one and if this happens the error
> recovery should trigger on CRC transfer errors and downgrade the current
> speed. Therefore I would stronly suggest leaving pata_cs5536 cable
> detection as it is currently.

OK, those are somewhat good {counter arguments | safety mechanisms}
(but see my point above about my ATA-44 connector getting advertised as 80c),
so perhaps...

BTW, does error recovery actively downgrade DMA mode on errors?
(kernel-side? or hardware even?)
Didn't know that, but if so then a more aggressive default configuration
probably is acceptable.


> > - and what about cable correction patch for IDE side? Since IDE layer
> >   cable probing sequence algo appears to be different ("why??"),
> >   this might require advertising a different cable flag
> 
> I think that IDE CS5536 host driver doesn't need any changes w.r.t.
> cable detection currently.
> 
> The IDE layer code is different for historical reasons and having
> different probe method than libata (ah, the wonders of having two
> pieces of code supporing the same hardware).

OK, so that leaves us with a possibly remaining debate on libata side only,
and I'd argue for extending libata for per-device cable detect support,
since that's arguably more precise and doesn't really cost anything
(that callback likely is utter slowpath, init-only)...
well, except for a sizeable amount of driver rework patches.


Thanks,

Andreas Mohr
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bartlomiej Zolnierkiewicz Aug. 6, 2013, 3:47 p.m. UTC | #6
Hi,

On Friday, July 19, 2013 06:45:31 PM Andreas Mohr wrote:
> Hi,
> 
> On Fri, Jul 19, 2013 at 04:30:22PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > On Friday, July 19, 2013 02:26:53 PM Andreas Mohr wrote:
> > > - do a cable correction patch for libata side (I do think that indicating
> > >   40c if even one device is 40c-only is the way to go [as long as libata
> > >   cannot handle per-device settings], for safety reasons,
> > 
> > Why would libata need per-device cable setting? There is only one cable
> > per-port and it is shared between master/slave devices and some fancy
> > embedded implementations of the cable (CF slot + connector to slave
> > device) are not changing this fundamental design issue AFAIK.
> 
> I think it's simply because while the CF slot more or less obviously
> seems to be rated "80c capable"-equivalent (due to its short connection?
> c.f. various notebook situations),
> *extension cable*(!) connections being connected to 44pin connectors
> ("ATA-44" ? conflicts with ATA/44 speed naming though)
> seem to be rated as ATA/33 (https://de.wikipedia.org/wiki/ATA/ATAPI)
> or IOW UDMA 2 (see various pages of 44pin CF adapter vendors).
> So that means that you'd end up with a "mixed" capability primary port
> (however my BIOS advertises 80c for both connectors,
> which I suspect may not really be appropriate).
> 
> If you are able to argue hard evidence that this primary port Master/Slave
> apparatus metallic layer connection is "one big shared resource"
> (i.e. from a HF / EMI POV)
> where behaviour ends up identical both for command and address timings
> for all devices for all things practical that matter, then yeah.
> But I have my doubts... (i.e., that it [address or command timings]
> actually is to be seen in a device-connection-specific light,
> to be reasonably safe).
> 
> But
> https://en.wikipedia.org/wiki/Parallel_ATA
> 
> "For all modern ATA host adapters this is not true, as modern ATA host
> adapters support independent device timing. This allows each device on
> the cable to transfer data at its own best speed. Even with older
> adapters without independent timing, this effect applies only to the
> data transfer phase of a read or write operation. This is usually the
> shortest part of a complete read or write operation."
> seems to tend towards indicating that libata *should* have a per-device
> form of .cable_detect(), unless "timing" != "cable type".

and indeed "timing" != "cable type"..

> And, OTOH, you simply could pose the argument
> that all that pseudo-intellectual reasoning does not matter,
> since CS5536 actively *chose* to offer *per-device-specific* cable type bits
> thus since it *does* offer such config possibility, libata *should*
> support such (unless CS5536 somehow happened to be wrong in offering such
> [useless?] liberty).
> 
> 
> One way to extend libata to support such functionality might be
> to have the driver set a flag to indicate that it has per-device cable type,
> and then let libata serve a .cable_detect_per_device() or some such
> instead.
> 
> Or choose to rework all drivers to have a per-device-extended
> .cable_detect() instead (where most drivers would not [need to] make the
> per-device distinction i.e. simply return the same setting for both).

I still don't see much need for this rework currently given that:
* such hardware setups are very rare (only CS5336 PATA controller gives
  you per-device cable type info, no other PATA controller does it AFAIK)
* we have fallback to lower speeds on errors
* even CS5536 BIOS describes the 44pin connector as 80c one and returns
  identical cable types for both master and slave devices (as reported by
  you in previous mail)

IOW until there is a practical need to enhance cable detection to work
per device it should not be done.

> > >   and if so that's probably also preferrable to advertising an UNK value,
> > >   since UNK would skip towards device-side cable detection
> > >   which might be unreliable in case ATA ID values (i.e., ATA_ID_HW_CONFIG)
> > >   happen to be incorrect,
> > >   e.g. especially in the case of CF cards (OTOH it specifically looks
> > >   for a 0x2000 bit being *set* for 80c indication,
> > >   thus it should be reliable after all since you'd expect ignorant/older
> > >   devices to provide zeroed fields only...)
> > 
> > Most controllers have per-port cable detection and CS5536 is being
> > an exception here. I don't know how this detection is implemented in
> > the hardware / BIOS but it could be that it is already taking into
> > the account the drive side cable detection and that is why we have two
> > values per-port. Please also note that the cable detection is only one
> > of components of selecting device transfer speed which also involves
> > comparing maximum host and device capabilities.
> 
> Yes, possibly because most controllers *do* have one single shared-medium
> cable connection location per port only, whereas CS5536 (it only offers
> one port in the first place, not two unlike many others!)
> seems to treat the two connectors
> (soldered-socket CF, and ATA-44 44pin, in the case of my system)
> differently, *for that single port*.
> 
> 
> CS5536 actually implicitly doing drive-side cable detection might be
> a reason for the separate bits as you say, but I doubt it
> (the controller would have to read out the device's ATA ID words, right??).
> 
> 
> > Indicating 40c in case if one device detects the cable to be 40c and
> > the other one detects 80c would unnecessarily punish the faster device
> > (which would then need usage of kernel parameter to make it work at
> > the full speed). Moreover it is unlikely that the hardware / BIOS
> > incorrectly detects 40c cable as 80c one and if this happens the error
> > recovery should trigger on CRC transfer errors and downgrade the current
> > speed. Therefore I would stronly suggest leaving pata_cs5536 cable
> > detection as it is currently.
> 
> OK, those are somewhat good {counter arguments | safety mechanisms}
> (but see my point above about my ATA-44 connector getting advertised as 80c),
> so perhaps...
> 
> BTW, does error recovery actively downgrade DMA mode on errors?
> (kernel-side? or hardware even?)

Kernel downgrades UDMA modes on CRC (checked by hardware) transfer errors.

> Didn't know that, but if so then a more aggressive default configuration
> probably is acceptable.
> 
> 
> > > - and what about cable correction patch for IDE side? Since IDE layer
> > >   cable probing sequence algo appears to be different ("why??"),
> > >   this might require advertising a different cable flag
> > 
> > I think that IDE CS5536 host driver doesn't need any changes w.r.t.
> > cable detection currently.
> > 
> > The IDE layer code is different for historical reasons and having
> > different probe method than libata (ah, the wonders of having two
> > pieces of code supporing the same hardware).
> 
> OK, so that leaves us with a possibly remaining debate on libata side only,
> and I'd argue for extending libata for per-device cable detect support,
> since that's arguably more precise and doesn't really cost anything
> (that callback likely is utter slowpath, init-only)...
> well, except for a sizeable amount of driver rework patches.

Until there is a demonstreable need for it (real hardware configuration
requiring such changes to work) there is no need to do it IMHO.

Instead I would like to come back to the wrong driver being selected
issue and fix it (cause this is a real problem with the pata_amd driver
taking control over CS5536 PATA controller and pata_cs5536 never being
used).

Best regards,
--
Bartlomiej Zolnierkiewicz
Samsung R&D Institute Poland
Samsung Electronics

--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

====================

While trying to debug a suspected UDMA config issue on PCM-9375
(some images written end up corrupted,
which quite likely is due to the combined evaluation ending up in dmesg as
UDMA/100 whereas sheets of this vendor quite strictly specify up to 66 only
despite CS5536 doing 100 in general),
I happened to stumble on the following semi-related thing:

| Do you mean that PCM-9375 documentation states that only up to UDMA/66
|   is supported?

: Well, not that strictly after all (it doesn't actively speak out
: against > 66, merely mentions 66 only).
: Dito some other internet results.
: And some even mention UDMA/33 for their two primary port devices
: with a soldered-socket CF / ATA-44 combo.



pata_cs5536.c:

static int cs5536_cable_detect(struct ata_port *ap)
{
        struct pci_dev *pdev = to_pci_dev(ap->host->dev);
        u32 cfg;

        cs5536_read(pdev, CFG, &cfg);

        if (cfg & IDE_CFG_CABLE)
                return ATA_CBL_PATA80;
        else
                return ATA_CBL_PATA40;
}


This, AFAICS, is not correct.

| Could you use ATA_CBL_PATA_UNK here and re-try?
|   (This will cause drive-side detection to trigger.)

: Hmmmm, not that easy. I don't have a readily available custom kernel build
: here, that would require some effort to achieve.



I later found that in cs5536.c commit, you had said:

    * Fix cable detection to report 80-wires cable if BIOS set it for any
      device on a port (IDE core will do drive-side cable detection later).

...except it doesn't ;-P

| IDE and libata have some differences in their cable detection codes. :-P

: If you happen to say so... :)



...at least not in libata (where the same cable_detect logic was done):

static int cable_is_40wire(struct ata_port *ap):

        /* If the controller thinks we are 40 wire, we are. */
        if (ap->cbl == ATA_CBL_PATA40)
                return 1;

        /* If the controller thinks we are 80 wire, we are. */
        if (ap->cbl == ATA_CBL_PATA80 || ap->cbl == ATA_CBL_SATA)
                return 0;
.
.
.
        /* If the controller doesn't know, we scan.
         *
         * Note: We look for all 40 wire detects at this point.  Any
         *       80 wire detect is taken to be 80 wire cable because
         * - in many setups only the one drive (slave if present) will
         *   give a valid detect
         * - if you have a non detect capable drive you don't want it
         *   to colour the choice
         */
        ata_for_each_link(link, ap, EDGE) {
                ata_for_each_dev(dev, link, ENABLED) {
                        if (!ata_is_40wire(dev))
                                return 0;
                }
        }


Note that given a prior ATA_CBL_PATA80 .cable_detect() result,
we're long gone here prior to any ata_is_40wire() scan...


I haven't determined yet whether my BIOS does in fact
configure a mixed-config cable value indication (will gain access tomorrow),
but this handling does seem problematic irrespective of that.

: [no, it doesn't - it has a 0x03 value, i.e. both ports BIOS-advertised as 80c]



Current very experimental commit found below (any comments?
And obviously this would probably have to be corrected in IDE as well...).


In addition to this fix I'm planning to add a DMI quirk to restrict
PCM-9375 to UDMA/66 if this would happen to be an actually most precise way
to successfully fix the observed corruption.

| If the vendor specifies that only UDMA/66 is supported this is a good
| idea.

: This turned out to become more difficult since dmidecode result is... awful
: (merely empty strings for all system ID entries of this box).


| (OTOH the patch below seems to be too restrictive.)

: Let me respectfully and tactfully disagree with that ;)
: I believe we do have an "insufficient API" issue with libata
: since for this controller, speed settings are per-device rather than per-port.
: Thus IMHO we *should* return the combined conservative setting
: (40c in case *any* device is 40c only)
: in order to not end up with an excessively fast setting.
: (OTOH your ATA_CBL_PATA_UNK value suggestion perhaps is more suitable indeed,
: since this then eventually shifts processing to device-side cable detect
: which might be the best handling (though certainly less flexibly complete)
: given current libata limitations.


: BTW, today I discovered even bigger black holes:
: 
: $ grep PCI.*CS5536 *.c
: pata_amd.c:     { PCI_VDEVICE(AMD,      PCI_DEVICE_ID_AMD_CS5536_IDE),
: 9 },
: pata_cs5536.c:  { PCI_VDEVICE(AMD,      PCI_DEVICE_ID_AMD_CS5536_IDE),
: },
: 
: I got hinted at this by an lspci output log which shockingly listed
: that CS5536 IDE was "claimed" by **pata_amd** (only!), on a 2.6.32 kernel,
: i.e. a kernel long after the CS5536-specific pata_cs5536.c was added
: in addition to the prior generic pata_amd.c.
: I don't have lsmod output of that machine yet, but even now it strongly looks
: like this config there does have both drivers and binds to the
: **wrong** one (which quite likely is the actual reason for the
: major data corruption screwup here, when reading between the lines of
: the original patch thread at
: http://www.mail-archive.com/linux-ide@vger.kernel.org/msg11033.html
: (pata_amd most likely will not configure timings correctly,
: which seems to have been the original reason for writing pata_cs5536.c).
: Or would this be merely a limitation of that lspci version listing
: claims for a single module only, yet both modules actually being
: (correctly/rightfully?) loaded?
: I'm planning to *somehow* (even on this hard-coded and older setup)
: achieve getting pata_amd blacklisted tomorrow (blacklisting mechanisms
: are/were very inconvenient and inconsistent),
: to see whether indeed it's a very painful unremoved duplicate PCI ID issue
: (existing since 2007 - I seem to have a good hand for uncovering
: awfully long-standing issues ;-P).
:
: BTW 9272dcc2 "pata_cs5536: Add support for non-X86_32 platforms"
: happened to also mention "pata_amd also supports cs5536 IDE controller"
: - probably someone ought to have managed to realize to voice a complaint
: at that time already ;)



| PS Could we please move this discussion to
| [2]linux-ide@vger.kernel.org
| mailing list if possible?

: Indeed, I should just have done that. I had intended it to be a
: preliminary private inquiry, but the usual consequences of that
: (an ugly rewrite) are just too "challenging".
: I chose to completely rewrite the mail since the reply contained
: some weird non-ASCII indent operations (perhaps weird MUA?).

Thanks!


From 374506ab6c3a57bd8890b5192b2047d7b96cb542 Mon Sep 17 00:00:00 2001
From: Andreas Mohr <andim2@users.sf.net>
Date: Wed, 17 Jul 2013 18:49:58 +0200
Subject: [PATCH] CS5536: fix overly optimistic cable detect (handle libata
 API imprecision).

We unconditionally indicated 80-pin cable capability
in case *any* of Master/Slave devices had 80c cable presence
configure-indicated by the BIOS.
This, however, does not seem at all appropriate since CS5536
does manage its primary IDE's master/slave devices independently,
both in cable detect and timing programming.
Since libata does not offer API support for such per-device
configurations yet, we really need to restrict things to the lowest
common denominator.

Signed-off-by: Andreas Mohr <andim2@users.sf.net>
---
 drivers/ata/pata_cs5536.c | 34 +++++++++++++++++++++++++++++++---
 1 Datei geändert, 31 Zeilen hinzugefügt(+), 3 Zeilen entfernt(-)

diff --git a/drivers/ata/pata_cs5536.c b/drivers/ata/pata_cs5536.c
index 0448860..f58bf04 100644
--- a/drivers/ata/pata_cs5536.c
+++ b/drivers/ata/pata_cs5536.c
@@ -146,10 +146,38 @@  static int cs5536_cable_detect(struct ata_port *ap)
 
 	cs5536_read(pdev, CFG, &cfg);
 
-	if (cfg & IDE_CFG_CABLE)
-		return ATA_CBL_PATA80;
-	else
+	/*
+	 * FIXME libata API limitation:
+	 * CS5536 data sheet says
+	 * "The IDE interface timing is completely programmable.
+	 * Timing control covers the command active and recover pulse
+	 * widths, and command block register accesses. The IDE
+	 * data transfer speed for each device on each channel can
+	 * be independently programmed allowing high speed IDE
+	 * peripherals to co-exist on the same channel as older,
+	 * compatible devices." and it does have per-drive-separate
+	 * cable detect and timing programming bits for primary Master/Slave
+	 * and in fact some devices (e.g. PCM-9375) do have e.g.
+	 * directly-soldered (read: ~80c) CompactFlash primary Master
+	 * and a 44pin (read: 40c) IDE primary Slave.
+	 * However, libata only offers whole-ata_port cable detect callback
+	 * that's arguably too imprecise for such hardware capabilities.
+	 * Given this imprecision, we arguably are required for now
+	 * to indicate the lowest common denominator, i.e. 40c in case
+	 * *any* of Master/Slave does not indicate 80c support,
+	 * in order to avoid imposing incorrect timing to the
+	 * per-device-separate timing programming.
+	 * This care seems especially important given evidence of
+	 * existing other timing issues of CS5536:
+	 * see errata 47 "UDMA Mode 5 stability issues" in PDF
+	 * "AMD Geode (tm) CS5536 Companion Device Silicon Revision B1
+	 * Specification Update".
+	 */
+	bool have_any_40c_only_device = (cfg & IDE_CFG_CABLE) < IDE_CFG_CABLE;
+	if (have_any_40c_only_device)
 		return ATA_CBL_PATA40;
+	else
+		return ATA_CBL_PATA80;
 }
 
 /**