Message ID: 20220113154635.17581-1-pmenzel@molgen.mpg.de
Series: Make delay before debouncing configurable
On 1/14/22 00:46, Paul Menzel wrote:
> The 200 ms delay before debouncing the PHY was introduced for some
> buggy old controllers. To decrease the boot time to come closer to
> instant boot, add a parameter so users can override that delay.
>
> The current implementation has several drawbacks, and is just a proof
> of concept, which some experienced Linux kernel developer can probably
> implement in a better way.

I do not think that a libata module parameter is the way to go with this: libata is used by all drivers, so for a system that has multiple adapters, different delays cannot be specified easily.

I am really thinking that the way to go about this is to remove the 200ms delay by default and add it only for drivers that request it with a link flag. That is, ATA_LFLAG_NO_DEBOUNCE_DELAY needs to become ATA_LFLAG_DEBOUNCE_DELAY.

The other large delay is the link stability check in sata_link_debounce(). 100ms is added (more for the hotplug case) to ensure that the SStatus register DET field provides a stable value. But I cannot find any text in the AHCI and SATA-IO specs that mandates such a large delay.

I tried to address all of the above. Please have a look at the top 4 patches in the sata-timing branch of the libata tree:

git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata

The sata-timing branch is for now based on the libata for-5.17 branch. The 200ms delay in sata_link_resume() is gone by default, replaced with a 1ms delay (totally arbitrary). The 200ms delay is executed only if a driver has the ATA_LFLAG_DEBOUNCE_DELAY link flag set.

The next part is sata_link_debounce(): I *think* that we can assume that a link is stable if we see cur_det == last_det == 3. In this case, bailing out early seems to be fine, at least on my test box (Intel dual-socket Xeon server with Intel AHCI chipset). But I only tested boot/reboot. Hotplug/unplug and suspend/resume need to be tested, but I need to go to the lab for that (working from home). Will try next week.

Could you give this branch a try and check how it improves device scan times? On my test box, which has *a lot* of drives, I see something like this:

Before:

[   16.696140] ata4: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200180 irq 341
[   17.527446] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
...

-> 831 ms to get the link ready

After:

[   15.957946] ata4: SATA max UDMA/133 abar m524288@0x9d200000 port 0x9d200180 irq 341
[   16.245066] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
...

-> 287 ms to get the link ready

There are differences between the many HDDs & SSDs I have connected, though. There are a lot of scheduling side effects at play, so the gains are variable in my case. A system with a single disk attached should be used for proper evaluation.

Going forward, if more testing does not show any problems, I am thinking of pushing these changes to for-next to get things tested more widely and see who screams that they lost their drives :)

For now, I added the ATA_LFLAG_DEBOUNCE_DELAY flag to the ata_piix driver only. Likely, this flag will be needed for most legacy/old adapters (which I do not have).

Cheers.
Hi, not originally in the thread, but I've run into hardware where the delay was bumpy before, when I did early porting around SATA PMP code (https://dev.gentoo.org/~robbat2/patches/libata-development/ if you want to see really old patches from 2006).

This series, esp. a code approach that didn't get merged, might be interesting; it implements hotplug by polling: https://dev.gentoo.org/~robbat2/patches/libata-development/2007/00-hp-poll/

On Fri, Jan 14, 2022 at 06:23:26PM +0900, Damien Le Moal wrote:
> On 1/14/22 00:46, Paul Menzel wrote:
> > The 200 ms delay before debouncing the PHY was introduced for some
> > buggy old controllers. To decrease the boot time to come closer to
> > instant boot, add a parameter so users can override that delay.
> >
> > The current implementation has several drawbacks, and is just a
> > proof of concept, which some experienced Linux kernel developer can
> > probably implement in a better way.
> I do not think that a libata module parameter is the way to go with
> this: libata is used by all drivers, so for a system that has multiple
> adapters, different delays cannot be specified easily.

I think this is a key thing here, and I like that your patch moves to a flag.

> I am really thinking that the way to go about this is to remove the
> 200ms delay by default and add it only for drivers that request it
> with a link flag. That is, ATA_LFLAG_NO_DEBOUNCE_DELAY needs to become
> ATA_LFLAG_DEBOUNCE_DELAY.

I agree that removing it by default is right, but I'd like to make one additional request here: please add a libata.force= flag that lets users enable/disable the delay per adapter/link.

I think this would be valuable to rule out issues where the debounce delay is needed on the drive side more than the controller side, esp. in cases of poorly implemented port multipliers, as Tejun & I found back in 2006.

Maybe libata.force=X.Y:no_debounce_delay & libata.force=X.Y:force_debounce_delay.

The ata_parse_force_one function as it stands can't handle a parameter with a value, so you cannot get libata.force=X.Y:debounce_delay=N without also improving ata_parse_force_one.

> The other large delay is the link stability check in
> sata_link_debounce(). 100ms is added (more for the hotplug case) to
> ensure that the SStatus register DET field provides a stable value.
> But I cannot find any text in the AHCI and SATA-IO specs that mandates
> such a large delay.

Nice find!

> There are differences between the many HDDs & SSDs I have connected,
> though. There are a lot of scheduling side effects at play, so the
> gains are variable in my case. A system with a single disk attached
> should be used for proper evaluation.

That likely gets the single-disk worst/best case, but I'm still worried about port multipliers (sadly I don't have the worst-implemented ones anymore; I sold them to some Windows users).
On 2022/01/20 2:57, Robin H. Johnson wrote:
> Hi, not originally in the thread, but I've run into hardware where the
> delay was bumpy before, when I did early porting around SATA PMP code
> (https://dev.gentoo.org/~robbat2/patches/libata-development/ if you
> want to see really old patches from 2006).
>
> This series, esp. a code approach that didn't get merged, might be
> interesting; it implements hotplug by polling:
> https://dev.gentoo.org/~robbat2/patches/libata-development/2007/00-hp-poll/
>
> On Fri, Jan 14, 2022 at 06:23:26PM +0900, Damien Le Moal wrote:
>> On 1/14/22 00:46, Paul Menzel wrote:
>>> The 200 ms delay before debouncing the PHY was introduced for some
>>> buggy old controllers. To decrease the boot time to come closer to
>>> instant boot, add a parameter so users can override that delay.
>>>
>>> The current implementation has several drawbacks, and is just a
>>> proof of concept, which some experienced Linux kernel developer can
>>> probably implement in a better way.
>> I do not think that a libata module parameter is the way to go with
>> this: libata is used by all drivers, so for a system that has
>> multiple adapters, different delays cannot be specified easily.
> I think this is a key thing here, and I like that your patch moves to
> a flag.
>
>> I am really thinking that the way to go about this is to remove the
>> 200ms delay by default and add it only for drivers that request it
>> with a link flag. That is, ATA_LFLAG_NO_DEBOUNCE_DELAY needs to
>> become ATA_LFLAG_DEBOUNCE_DELAY.
> I agree that removing it by default is right, but I'd like to make one
> additional request here:
> Please add a libata.force= flag that lets users enable/disable the
> delay per adapter/link.
>
> I think this would be valuable to rule out issues where the debounce
> delay is needed on the drive side more than the controller side, esp.
> in cases of poorly implemented port multipliers, as Tejun & I found
> back in 2006.
>
> Maybe libata.force=X.Y:no_debounce_delay &
> libata.force=X.Y:force_debounce_delay
>
> The ata_parse_force_one function as it stands can't handle a parameter
> with a value, so you cannot get libata.force=X.Y:debounce_delay=N
> without also improving ata_parse_force_one.

Good point. I will look into adding this.

>> The other large delay is the link stability check in
>> sata_link_debounce(). 100ms is added (more for the hotplug case) to
>> ensure that the SStatus register DET field provides a stable value.
>> But I cannot find any text in the AHCI and SATA-IO specs that
>> mandates such a large delay.
> Nice find!
>
>> There are differences between the many HDDs & SSDs I have connected,
>> though. There are a lot of scheduling side effects at play, so the
>> gains are variable in my case. A system with a single disk attached
>> should be used for proper evaluation.
> That likely gets the single-disk worst/best case, but I'm still
> worried about port multipliers (sadly I don't have the
> worst-implemented ones anymore; I sold them to some Windows users)

:)

I have an eSATA port-multiplier box in the lab. But I need to hook it up to my test box, which means that I have to get out of home for once and go to the office :) Will do that. Port-multiplier tests are also needed to complete Hannes' series renaming sysfs fields to match the debug messages.
Dear Damien, dear Robin,

Am 20.01.22 um 01:14 schrieb Damien Le Moal:
> On 2022/01/20 2:57, Robin H. Johnson wrote:
>> Hi, not originally in the thread, but I've run into hardware where
>> the delay was bumpy before, when I did early porting around SATA PMP
>> code (https://dev.gentoo.org/~robbat2/patches/libata-development/ if
>> you want to see really old patches from 2006).
>>
>> This series, esp. a code approach that didn't get merged, might be
>> interesting; it implements hotplug by polling:
>> https://dev.gentoo.org/~robbat2/patches/libata-development/2007/00-hp-poll/

Polling, plus a warning when the polling time exceeds something like 10 ms so users can contact the hardware vendor, would indeed be the most flexible solution.

Robin, do you remember why these patches were not applied? Just lack of time and review, or were there issues?

>> On Fri, Jan 14, 2022 at 06:23:26PM +0900, Damien Le Moal wrote:
>>> On 1/14/22 00:46, Paul Menzel wrote:
>>>> The 200 ms delay before debouncing the PHY was introduced for some
>>>> buggy old controllers. To decrease the boot time to come closer to
>>>> instant boot, add a parameter so users can override that delay.
>>>>
>>>> The current implementation has several drawbacks, and is just a
>>>> proof of concept, which some experienced Linux kernel developer
>>>> can probably implement in a better way.
>>> I do not think that a libata module parameter is the way to go with
>>> this: libata is used by all drivers, so for a system that has
>>> multiple adapters, different delays cannot be specified easily.
>> I think this is a key thing here, and I like that your patch moves
>> to a flag.

Indeed, I did not think of that.

>>> I am really thinking that the way to go about this is to remove the
>>> 200ms delay by default and add it only for drivers that request it
>>> with a link flag. That is, ATA_LFLAG_NO_DEBOUNCE_DELAY needs to
>>> become ATA_LFLAG_DEBOUNCE_DELAY.
>> I agree that removing it by default is right, but I'd like to make
>> one additional request here:
>> Please add a libata.force= flag that lets users enable/disable the
>> delay per adapter/link.
>>
>> I think this would be valuable to rule out issues where the debounce
>> delay is needed on the drive side more than the controller side,
>> esp. in cases of poorly implemented port multipliers, as Tejun & I
>> found back in 2006.
>>
>> Maybe libata.force=X.Y:no_debounce_delay &
>> libata.force=X.Y:force_debounce_delay
>>
>> The ata_parse_force_one function as it stands can't handle a
>> parameter with a value, so you cannot get
>> libata.force=X.Y:debounce_delay=N without also improving
>> ata_parse_force_one.
>
> Good point. I will look into adding this.

Awesome.

>>> The other large delay is the link stability check in
>>> sata_link_debounce(). 100ms is added (more for the hotplug case) to
>>> ensure that the SStatus register DET field provides a stable value.
>>> But I cannot find any text in the AHCI and SATA-IO specs that
>>> mandates such a large delay.
>> Nice find!

Adding back Damien’s answer text:

>> I tried to address all of the above. Please have a look at the top 4
>> patches in the sata-timing branch of the libata tree:
>>
>> git@gitolite.kernel.org:pub/scm/linux/kernel/git/dlemoal/libata
>>
>> The sata-timing branch is for now based on libata for-5.17 branch.

Thank you for cooking this up. I tested this on the ASUS F2A85-M PRO (AMD, 1022:0x7801), MSI B350M MORTAR (AMD, 1022:0x7901), and IBM S822LC (Marvell, 1b4b:9235) with no issues and the expected decrease in boot time.

>>> There are differences between the many HDDs & SSDs I have connected,
>>> though. There are a lot of scheduling side effects at play, so the
>>> gains are variable in my case. A system with a single disk attached
>>> should be used for proper evaluation.
>> That likely gets the single-disk worst/best case, but I'm still
>> worried about port multipliers (sadly I don't have the
>> worst-implemented ones anymore; I sold them to some Windows users)
>
> :)
>
> I have an eSATA port-multiplier box in the lab. But I need to hook it
> up to my test box, which means that I have to get out of home for once
> and go to the office :) Will do that. Port-multiplier tests are also
> needed to complete Hannes' series renaming sysfs fields to match the
> debug messages.

Kind regards,

Paul
On Mon, Feb 14, 2022 at 08:09:53AM +0100, Paul Menzel wrote:
>>> This series, esp. a code approach that didn't get merged, might be
>>> interesting; it implements hotplug by polling:
>>> https://dev.gentoo.org/~robbat2/patches/libata-development/2007/00-hp-poll/
> Polling, plus a warning when the polling time exceeds something like
> 10 ms so users can contact the hardware vendor, would indeed be the
> most flexible solution.
>
> Robin, do you remember why these patches were not applied? Just lack
> of time and review, or were there issues?

Disagreement on the per-link configurability via sysfs, and then lack of time to come to an agreed solution there. The 2007 linux-ide list has some of that discussion, but there was also some on IRC that is lost to time.
On 2/24/22 23:34, Paul Menzel wrote:
[...]
>> Thank you for cooking this up. I tested this on the ASUS F2A85-M PRO
>> (AMD, 1022:0x7801), MSI B350M MORTAR (AMD, 1022:0x7901), and IBM
>> S822LC (Marvell, 1b4b:9235) with no issues and the expected decrease
>> in boot time.
>
> There is still one issue I noticed. The MSI B350M MORTAR has nine(?)
> SATA ports, and only one is connected to an SSD. For some of the other
> unpopulated ports a delay of 100 ms happens (in Linux kernel threads,
> but still serially and not in parallel). Unfortunately, that system
> uses LUKS encryption, and as things are happening in the initrd, I do
> not know if the delays would hold up the overall boot. I need to do
> more tests.
>
> ```
> $ grep sata_link_hardreset 20220220-sata-hardreset.txt
>  0.706289 | 0) scsi_eh-70 |               | sata_link_hardreset() {
>  0.718497 | 0) scsi_eh-92 |               | sata_link_hardreset() {
>  0.728425 | 0) scsi_eh-92 | # 9927.978 us | } /* sata_link_hardreset */
>  0.811159 | 2) scsi_eh-70 | @ 104870.3 us | } /* sata_link_hardreset */
>  0.811329 | 2) scsi_eh-72 |               | sata_link_hardreset() {
>  0.920672 | 3) scsi_eh-72 | @ 109343.5 us | } /* sata_link_hardreset */
>  0.920915 | 2) scsi_eh-78 |               | sata_link_hardreset() {
>  1.024618 | 2) scsi_eh-78 | @ 103703.7 us | } /* sata_link_hardreset */
>  1.025027 | 0) scsi_eh-80 |               | sata_link_hardreset() {
>  1.128589 | 0) scsi_eh-80 | @ 103561.6 us | } /* sata_link_hardreset */
> ```

This looks like the delay for the link stability check in sata_link_debounce(). 100ms is added (more for the hotplug case) to ensure that the SStatus register DET field provides a stable value. As mentioned in another email on this subject, I cannot find any text in the AHCI and SATA-IO specs that mandates such a large delay. I still need to dig into the history of this delay and why it was added.