
[0/2,RFC] Make delay before debouncing configurable

Message ID 20220113154635.17581-1-pmenzel@molgen.mpg.de

Message

Paul Menzel Jan. 13, 2022, 3:46 p.m. UTC
The 200 ms delay before debouncing the PHY was introduced for some buggy
old controllers. To decrease the boot time and come closer to instant
boot, add a parameter so users can override that delay.
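
In sketch form, the idea is to expose the delay as a libata module
parameter (illustrative only, not the exact patch):

```c
/* Illustrative sketch only: the parameter name follows the patch
 * title; the exact wiring in drivers/ata/libata-core.c may differ. */
#include <linux/module.h>

static int libata_debounce_delay_ms = 200;	/* current hard-coded value */
module_param_named(debounce_delay_ms, libata_debounce_delay_ms, int, 0444);
MODULE_PARM_DESC(debounce_delay_ms,
		 "Delay in ms before debouncing the PHY (default: 200)");
```

Users could then boot with `libata.debounce_delay_ms=0` to skip the
delay entirely.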

The current implementation has several drawbacks, and is just a proof of
concept, which some experienced Linux kernel developer can probably
implement in a better way.

One problem is that the warning is shown for each link and not just once
per controller.

Paul Menzel (2):
  ata: Add module parameter `debounce_delay_ms`
  ata: Warn about removal of debounce delay in Linux 5.19

 Documentation/admin-guide/kernel-parameters.txt |  6 ++++++
 drivers/ata/libata-core.c                       |  4 ++++
 drivers/ata/libata-sata.c                       | 12 +++++++++---
 drivers/ata/libata.h                            |  1 +
 4 files changed, 20 insertions(+), 3 deletions(-)

Comments

Damien Le Moal Jan. 14, 2022, 9:23 a.m. UTC | #1
On 1/14/22 00:46, Paul Menzel wrote:
> The 200 ms delay before debouncing the PHY was introduced for some buggy
> old controllers. To decrease the boot time and come closer to instant
> boot, add a parameter so users can override that delay.
> 
> The current implementation has several drawbacks, and is just a proof of
> concept, which some experienced Linux kernel developer can probably
> implement in a better way.

I do not think that a libata module parameter is the way to go with
this: libata is used by all drivers, so for a system that has multiple
adapters, different delays cannot be specified easily.

I am really thinking that the way to go about this is to remove the
200ms delay by default and add it only for drivers that request it with
a link flag. That is, ATA_LFLAG_NO_DEBOUNCE_DELAY needs to become
ATA_LFLAG_DEBOUNCE_DELAY.

The other large delay is the link stability check in
sata_link_debounce(). 100ms is added (more for the hotplug case) to
ensure that the SStatus register DET field provides a stable value. But
I cannot find any text in the AHCI and SATA IO specs that mandates such
a large delay.
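
Roughly, that stability check boils down to the following (a simplified
sketch of the sata_link_debounce() loop, not the verbatim kernel code;
params is the (interval, duration, timeout) triple in ms, e.g.
sata_deb_timing_normal):

```c
/* Simplified sketch of the sata_link_debounce() stability check; the
 * real code differs in details. DET (SStatus bits 3:0) must read back
 * the same value for "duration" ms before the link counts as stable,
 * which is where the 100ms (or more, for hotplug) comes from. */
static int debounce_sketch(struct ata_link *link, const unsigned long *params)
{
	unsigned long interval = params[0];	/* ms between reads */
	unsigned long duration = params[1];	/* ms DET must hold */
	unsigned long deadline = jiffies + msecs_to_jiffies(params[2]);
	unsigned long last_jiffies = jiffies;
	u32 cur, last_det;

	if (sata_scr_read(link, SCR_STATUS, &cur))
		return -EOPNOTSUPP;
	last_det = cur & 0xf;

	while (1) {
		ata_msleep(link->ap, interval);
		if (sata_scr_read(link, SCR_STATUS, &cur))
			return -EOPNOTSUPP;
		cur &= 0xf;

		if (cur == last_det) {
			/* DET held long enough: link is stable */
			if (time_after(jiffies, last_jiffies +
					msecs_to_jiffies(duration)))
				return 0;
		} else {
			/* DET changed: restart the stability window */
			last_det = cur;
			last_jiffies = jiffies;
		}
		if (time_after(jiffies, deadline))
			return -EBUSY;
	}
}
```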

I tried to address all of the above. Please have a look at the top 4
patches in the sata-timing branch of the libata tree:

git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata

The sata-timing branch is for now based on the libata for-5.17 branch.

The 200ms delay in sata_link_resume() is gone by default, replaced with
a 1ms delay (totally arbitrary). The 200ms delay is executed only if a
driver has the ATA_LFLAG_DEBOUNCE_DELAY link flag set.
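
In sketch form (simplified from the actual patches):

```c
/* Simplified sketch of the change in sata_link_resume(): the long
 * settle delay before debouncing is now opt-in per link. */
if (link->flags & ATA_LFLAG_DEBOUNCE_DELAY)
	ata_msleep(link->ap, 200);	/* controllers that need the old delay */
else
	ata_msleep(link->ap, 1);	/* totally arbitrary small delay */
```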

The next part is sata_link_debounce(): I *think* that we can assume that
a link is stable if we see cur_det == last_det == 3. In this case,
bailing out early seems to be fine, at least on my test box (Intel
dual-socket Xeon server with Intel AHCI chipset). But I only tested
boot/reboot. Hotplug/unplug and suspend/resume need to be tested, but I
need to go to the lab for that (working from home). Will try next week.
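
Relative to the debounce loop sketched above, the early exit amounts to
something like this (the exact placement is an assumption):

```c
/* Sketch of the early exit being tested: DET == 3 means "device
 * presence detected and Phy communication established", so two
 * consecutive reads of 3 are taken as a stable link. */
if (cur == 3 && last_det == 3)
	return 0;
```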

Could you give this branch a try and check how that improves device
scan times?

On my test box, which has *a lot* of drives, I see something like this:

Before:
[   16.696140] ata4: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200180 irq 341
[   17.527446] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
...
-> 831 ms to get the link ready

After:
[   15.957946] ata4: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200180 irq 341
[   16.245066] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
...
-> 287 ms to get the link ready

There are differences between the many HDDs & SSDs I have connected
though. There are a lot of scheduling side effects at play, so the gains
are variable in my case. A system with a single disk attached should be
used for proper evaluation.

Going forward, if more testing does not show any problems, I am thinking
of pushing these changes to for-next to get things tested more widely
and see who screams that they lost their drives :)
For now, I added the ATA_LFLAG_DEBOUNCE_DELAY to the ata_piix driver
only. Likely, this flag will be needed for most legacy/old adapters
(which I do not have).
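
Opting a driver back in is just a matter of setting the link flag at
init time, along these lines (illustrative; the actual ata_piix hunk may
differ):

```c
/* Illustrative sketch: mark every link of a legacy adapter as needing
 * the 200ms debounce delay; the actual ata_piix change may differ. */
static void piix_request_debounce_delay(struct ata_host *host)
{
	int i;

	for (i = 0; i < host->n_ports; i++) {
		struct ata_link *link;

		ata_for_each_link(link, host->ports[i], EDGE)
			link->flags |= ATA_LFLAG_DEBOUNCE_DELAY;
	}
}
```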

Cheers.
Robin H. Johnson Jan. 19, 2022, 5:57 p.m. UTC | #2
Hi, I was not originally in the thread, but I've run into hardware where
the delay was bumpy before, back when I did early porting around the
SATA PMP code (https://dev.gentoo.org/~robbat2/patches/libata-development/
if you want to see really old patches from 2006).

This series might be interesting, esp. as a code approach that didn't
get merged: it implements hotplug by polling:
https://dev.gentoo.org/~robbat2/patches/libata-development/2007/00-hp-poll/

On Fri, Jan 14, 2022 at 06:23:26PM +0900, Damien Le Moal wrote:
> On 1/14/22 00:46, Paul Menzel wrote:
> > The 200 ms delay before debouncing the PHY was introduced for some buggy
> > old controllers. To decrease the boot time and come closer to instant
> > boot, add a parameter so users can override that delay.
> > 
> > The current implementation has several drawbacks, and is just a proof of
> > concept, which some experienced Linux kernel developer can probably
> > implement in a better way.
> I do not think that a libata module parameter is the way to go with
> this: libata is used by all drivers, so for a system that has multiple
> adapters, different delays cannot be specified easily.
I think this is a key thing here; and I like that your patch moves to a
flag.

> I am really thinking that the way to go about this is to remove the
> 200ms delay by default and add it only for drivers that request it with
> a link flag. That is, ATA_LFLAG_NO_DEBOUNCE_DELAY needs to become
> ATA_LFLAG_DEBOUNCE_DELAY.
I agree that removing it by default is right, but I'd like to make one
additional request here:
Please add a libata.force= flag that lets users enable/disable the delay
per adapter/link.

I think this would be valuable to rule out issues where the debounce
delay is needed on the drive side more than the controller side, esp. in
cases of poorly implemented port multipliers as Tejun & I found back in
2006.

Maybe libata.force=X.Y:no_debounce_delay & libata.force=X.Y:force_debounce_delay

The ata_parse_force_one function as it stands can't handle a value
attached to a parameter, so you cannot get libata.force=X.Y:debounce_delay=N
without also improving ata_parse_force_one.
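
A rough sketch of the kind of change I mean (hypothetical helper, not
the current ata_parse_force_one code):

```c
#include <linux/kernel.h>
#include <linux/string.h>

/* Hypothetical sketch: split an option like "debounce_delay=200" into
 * its name and numeric value before the force-table lookup, so that
 * value-carrying options become possible. */
static int split_force_option(char *opt, const char **name, long *val)
{
	char *eq = strchr(opt, '=');

	*val = -1;			/* no value supplied */
	if (eq) {
		*eq = '\0';		/* terminate the name part */
		if (kstrtol(eq + 1, 10, val))
			return -EINVAL;
	}
	*name = opt;
	return 0;
}
```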

> The other large delay is the link stability check in
> sata_link_debounce(). 100ms is added (more for the hotplug case) to
> ensure that the SStatus register DET field provides a stable value. But
> I cannot find any text in the AHCI and SATA IO specs that mandates such
> a large delay.
Nice find!

> There are differences between the many HDDs & SSDs I have connected
> though. There are a lot of scheduling side effects at play, so the gains
> are variable in my case. A system with a single disk attached should be
> used for proper evaluation.
That likely gets the single-disk worst/best case, but I'm still worried
about port multipliers (sadly I don't have the worst-implemented ones
anymore, I sold them to some Windows users).
Damien Le Moal Jan. 20, 2022, 12:14 a.m. UTC | #3
On 2022/01/20 2:57, Robin H. Johnson wrote:
> Hi, I was not originally in the thread, but I've run into hardware where
> the delay was bumpy before, back when I did early porting around the
> SATA PMP code (https://dev.gentoo.org/~robbat2/patches/libata-development/
> if you want to see really old patches from 2006).
> 
> This series might be interesting, esp. as a code approach that didn't
> get merged: it implements hotplug by polling:
> https://dev.gentoo.org/~robbat2/patches/libata-development/2007/00-hp-poll/
> 
> On Fri, Jan 14, 2022 at 06:23:26PM +0900, Damien Le Moal wrote:
>> On 1/14/22 00:46, Paul Menzel wrote:
>>> The 200 ms delay before debouncing the PHY was introduced for some buggy
>>> old controllers. To decrease the boot time and come closer to instant
>>> boot, add a parameter so users can override that delay.
>>>
>>> The current implementation has several drawbacks, and is just a proof of
>>> concept, which some experienced Linux kernel developer can probably
>>> implement in a better way.
>> I do not think that a libata module parameter is the way to go with
>> this: libata is used by all drivers, so for a system that has multiple
>> adapters, different delays cannot be specified easily.
> I think this is a key thing here; and I like that your patch moves to a
> flag.
> 
>> I am really thinking that the way to go about this is to remove the
>> 200ms delay by default and add it only for drivers that request it with
>> a link flag. That is, ATA_LFLAG_NO_DEBOUNCE_DELAY needs to become
>> ATA_LFLAG_DEBOUNCE_DELAY.
> I agree that removing it by default is right, but I'd like to make one
> additional request here:
> Please add a libata.force= flag that lets users enable/disable the delay
> per adapter/link.
> 
> I think this would be valuable to rule out issues where the debounce
> delay is needed on the drive side more than the controller side, esp. in
> cases of poorly implemented port multipliers as Tejun & I found back in
> 2006.
> 
> Maybe libata.force=X.Y:no_debounce_delay & libata.force=X.Y:force_debounce_delay
> 
> The ata_parse_force_one function as it stands can't handle a value
> attached to a parameter, so you cannot get libata.force=X.Y:debounce_delay=N
> without also improving ata_parse_force_one.

Good point. I will look into adding this.

> 
>> The other large delay is the link stability check in
>> sata_link_debounce(). 100ms is added (more for the hotplug case) to
>> ensure that the SStatus register DET field provides a stable value. But
>> I cannot find any text in the AHCI and SATA IO specs that mandates such
>> a large delay.
> Nice find!
> 
>> There are differences between the many HDDs & SSDs I have connected
>> though. There are a lot of scheduling side effects at play, so the gains
>> are variable in my case. A system with a single disk attached should be
>> used for proper evaluation.
> That likely gets the single-disk worst/best case, but I'm still worried
> about port multipliers (sadly I don't have the worst-implemented ones
> anymore, I sold them to some Windows users).

:)

I have an e-SATA port-multiplier box in the lab. But I need to hook it
up to my test box, which means that I have to get out of the house for
once and go to the office :) Will do that. Port-multiplier tests are
also needed to complete Hannes' series renaming sysfs fields to match
the debug messages.
Paul Menzel Feb. 14, 2022, 7:09 a.m. UTC | #4
Dear Damien, dear Robin,


On 20.01.22 at 01:14, Damien Le Moal wrote:
> On 2022/01/20 2:57, Robin H. Johnson wrote:
>> Hi, I was not originally in the thread, but I've run into hardware where
>> the delay was bumpy before, back when I did early porting around the
>> SATA PMP code (https://dev.gentoo.org/~robbat2/patches/libata-development/
>> if you want to see really old patches from 2006).
>>
>> This series might be interesting, esp. as a code approach that didn't
>> get merged: it implements hotplug by polling:
>> https://dev.gentoo.org/~robbat2/patches/libata-development/2007/00-hp-poll/

Polling, with a warning when the polling time exceeds something like
10 ms so users can contact the hardware vendor, would indeed be the most
flexible solution.
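
Sketched out, with a made-up helper and the arbitrary 10 ms threshold:

```c
/* Hypothetical sketch: poll instead of sleeping a fixed delay, and
 * warn when a link took unusually long to become ready, so users can
 * report the hardware. poll_link_until_ready() is a made-up helper,
 * and 10 ms is an arbitrary threshold. */
static void poll_and_warn(struct ata_link *link)
{
	unsigned long start = jiffies;
	unsigned int elapsed_ms;

	poll_link_until_ready(link);
	elapsed_ms = jiffies_to_msecs(jiffies - start);
	if (elapsed_ms > 10)
		ata_link_warn(link, "link needed %u ms to become ready\n",
			      elapsed_ms);
}
```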

Robin, do you remember why these patches were not applied? Just lack of
time and review, or were there issues?

>> On Fri, Jan 14, 2022 at 06:23:26PM +0900, Damien Le Moal wrote:
>>> On 1/14/22 00:46, Paul Menzel wrote:
>>>> The 200 ms delay before debouncing the PHY was introduced for some buggy
>>>> old controllers. To decrease the boot time and come closer to instant
>>>> boot, add a parameter so users can override that delay.
>>>>
>>>> The current implementation has several drawbacks, and is just a proof of
>>>> concept, which some experienced Linux kernel developer can probably
>>>> implement in a better way.
>>> I do not think that a libata module parameter is the way to go with
>>> this: libata is used by all drivers, so for a system that has multiple
>>> adapters, different delays cannot be specified easily.
>> I think this is a key thing here; and I like that your patch moves to a
>> flag.

Indeed, I did not think of that.

>>> I am really thinking that the way to go about this is to remove the
>>> 200ms delay by default and add it only for drivers that request it with
>>> a link flag. That is, ATA_LFLAG_NO_DEBOUNCE_DELAY needs to become
>>> ATA_LFLAG_DEBOUNCE_DELAY.
>> I agree that removing it by default is right, but I'd like to make one
>> additional request here:
>> Please add a libata.force= flag that lets users enable/disable the delay
>> per adapter/link.
>>
>> I think this would be valuable to rule out issues where the debounce
>> delay is needed on the drive side more than the controller side, esp. in
>> cases of poorly implemented port multipliers as Tejun & I found back in
>> 2006.
>>
>> Maybe libata.force=X.Y:no_debounce_delay & libata.force=X.Y:force_debounce_delay
>>
>> The ata_parse_force_one function as it stands can't handle a value
>> attached to a parameter, so you cannot get libata.force=X.Y:debounce_delay=N
>> without also improving ata_parse_force_one.
> 
> Good point. I will look into adding this.

Awesome.

>>> The other large delay is the link stability check in
>>> sata_link_debounce(). 100ms is added (more for the hotplug case) to
>>> ensure that the SStatus register DET field provides a stable value. But
>>> I cannot find any text in the AHCI and SATA IO specs that mandates such
>>> a large delay.
>> Nice find!

Adding back Damien’s answer text:

>> I tried to address all of the above. Please have a look at the top 4
>> patches in the sata-timing branch of the libata tree:
>> 
>> git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata
>> 
>> The sata-timing branch is for now based on the libata for-5.17 branch.

Thank you for cooking this up. I tested this on the ASUS F2A85-M PRO
(AMD, 1022:7801), MSI B350M MORTAR (AMD, 1022:7901), and IBM S822LC
(Marvell, 1b4b:9235) with no issues and the expected decrease in boot
time.

>>> There are differences between the many HDDs & SSDs I have connected
>>> though. There are a lot of scheduling side effects at play, so the gains
>>> are variable in my case. A system with a single disk attached should be
>>> used for proper evaluation.
>> That likely gets the single-disk worst/best case, but I'm still worried
>> about port multipliers (sadly I don't have the worst-implemented ones
>> anymore, I sold them to some Windows users).
> 
> :)
> 
> I have an e-SATA port-multiplier box in the lab. But I need to hook it
> up to my test box, which means that I have to get out of the house for
> once and go to the office :) Will do that. Port-multiplier tests are
> also needed to complete Hannes' series renaming sysfs fields to match
> the debug messages.


Kind regards,

Paul
Robin H. Johnson Feb. 14, 2022, 5:50 p.m. UTC | #5
On Mon, Feb 14, 2022 at 08:09:53AM +0100, Paul Menzel wrote:
> >> This series might be interesting, esp. as a code approach that didn't
> >> get merged: it implements hotplug by polling:
> >> https://dev.gentoo.org/~robbat2/patches/libata-development/2007/00-hp-poll/
> Polling, with a warning when the polling time exceeds something like
> 10 ms so users can contact the hardware vendor, would indeed be the
> most flexible solution.
> 
> Robin, do you remember why these patches were not applied? Just lack of
> time and review, or were there issues?
Disagreement on the per-link configurability via sysfs, and then lack of
time to come to an agreed solution there.

The 2007 linux-ide list has some of that discussion, but there was also
some on IRC that is lost to time.
Damien Le Moal Feb. 25, 2022, 1:15 a.m. UTC | #6
On 2/24/22 23:34, Paul Menzel wrote:
[...]
>> Thank you for cooking this up. I tested this on the ASUS F2A85-M PRO
>> (AMD, 1022:7801), MSI B350M MORTAR (AMD, 1022:7901), and IBM S822LC
>> (Marvell, 1b4b:9235) with no issues and the expected decrease in boot
>> time.
> 
> There is still one issue I noticed. The MSI B350M MORTAR has nine(?)
> SATA ports, and only one is connected to an SSD. For some of the other
> unpopulated ports a delay of 100 ms happens (in Linux kernel threads,
> but still serially and not in parallel). Unfortunately, that system
> uses LUKS encryption, and as things are happening in the initrd, I do
> not know if the delays hold up the overall boot. I need to do more
> tests.
> 
> ```
> $ grep sata_link_hardreset 20220220-sata-hardreset.txt
>      0.706289 |    0)   scsi_eh-70   |               |  sata_link_hardreset() {
>      0.718497 |    0)   scsi_eh-92   |               |  sata_link_hardreset() {
>      0.728425 |    0)   scsi_eh-92   | # 9927.978 us |  } /* sata_link_hardreset */
>      0.811159 |    2)   scsi_eh-70   | @ 104870.3 us |  } /* sata_link_hardreset */
>      0.811329 |    2)   scsi_eh-72   |               |  sata_link_hardreset() {
>      0.920672 |    3)   scsi_eh-72   | @ 109343.5 us |  } /* sata_link_hardreset */
>      0.920915 |    2)   scsi_eh-78   |               |  sata_link_hardreset() {
>      1.024618 |    2)   scsi_eh-78   | @ 103703.7 us |  } /* sata_link_hardreset */
>      1.025027 |    0)   scsi_eh-80   |               |  sata_link_hardreset() {
>      1.128589 |    0)   scsi_eh-80   | @ 103561.6 us |  } /* sata_link_hardreset */
> ```

This looks like the delay for the link stability check in
sata_link_debounce(). 100ms is added (more for the hotplug case) to
ensure that the SStatus register DET field provides a stable value.

As mentioned in another email on this subject, I cannot find any text in
the AHCI and SATA IO specs that mandates such a large delay. I still
need to dig into the history of this delay and why it was added.