Message ID | 95e1fc6e84cf42e094d4a8478c37b18e@FE-MBX1012.de.bosch.com |
---|---|
State | Not Applicable |
Delegated to: | Bjorn Helgaas |
Headers | show |
On Wed, Nov 02, 2016 at 09:26:55AM +0000, Koehrer Mathias (ETAS/ESW5) wrote: > On systems that have PCIe ASPM support in the BIOS enabled > the commit 387d37577fdd05e9472c20885464c2a53b3c945f may lead > to the situation that accesses to registers of enabled > PCIe devices are extremely slow. > This happens if the ACPI FADT declares incorrectly that the > system doesn't support PCIe ASPM even if this is enabled in > the BIOS. > In this case the function pcie_no_aspm() will be called. > However this sets the aspm_policy to POLICY_DEFAULT even > if CONFIG_PCIEASPM_PERFORMANCE has been configured. > As result, the ASPM on a PCIe may still be set even if > this is not expected. > > This happens e.g. on a HP workstation 800 G1 together with > an Intel dual port Ethernet server adapter i350 plugged in. > Whenever ASPM is enabled in the BIOS the access to the > LAN registers are really slow (read access: slower than 20us). > In this setup the LnkCap of the two LAN controllers and > of the integrated PCIe switch is set to "ASPM L1 Enabled" > even if the controller is configured to be up and running. > There has been a lengthy discussion on this performance issue > due to this issue on the linux-rt-users list: > http://marc.info/?l=linux-rt-users&m=147454824515022&w=2 > > This patch solves this issue by forcing to disable ASPM > if CONFIG_PCIEASPM_PERFORMANCE has been set. The 387d37577fdd changelog says that when ACPI_FADT_NO_ASPM is set, the expected behavior is for the OS to not touch the ASPM configuration, and that this is what Windows does. If I understand correctly, your proposal in this patch is to change this so that if ACPI_FADT_NO_ASPM is set and CONFIG_PCIEASPM_PERFORMANCE=y, we go ahead and disable ASPM. Only the firmware author can tell whether it's really a bug that this system sets ACPI_FADT_NO_ASPM. It could be that there's some platform issue that is avoided by leaving the BIOS ASPM configuration untouched. If it really *is* a firmware bug, and ACPI_FADT_NO_ASPM is not supposed to be set on this platform, CONFIG_PCIEASPM_PERFORMANCE doesn't feel like the right mechanism for working around it. It should be safe to enable that for all systems, and for other systems that set ACPI_FADT_NO_ASPM correctly, we should not touch ASPM configuration. Can you open a bugzilla at http://bugzilla.kernel.org and attach the complete dmesg log and "lspci -vv" output? This is a subtle area where we've had many break/fix cycles, and it's important to be able to go back and look at details of problems that motivated previous changes. BTW, the conventional commit reference format is 387d37577fdd ("PCI: Don't clear ASPM bits when the FADT declares it's unsupported") > Signed-off-by: Mathias Koehrer <mathias.koehrer@etas.com> > > --- > drivers/pci/pcie/aspm.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > Index: linux-4.8/drivers/pci/pcie/aspm.c > =================================================================== > --- linux-4.8.orig/drivers/pci/pcie/aspm.c > +++ linux-4.8/drivers/pci/pcie/aspm.c > @@ -79,10 +79,13 @@ static LIST_HEAD(link_list); > > #ifdef CONFIG_PCIEASPM_PERFORMANCE > static int aspm_policy = POLICY_PERFORMANCE; > +static int aspm_default_config_policy = POLICY_PERFORMANCE; > #elif defined CONFIG_PCIEASPM_POWERSAVE > static int aspm_policy = POLICY_POWERSAVE; > +static int aspm_default_config_policy = POLICY_POWERSAFE; > #else > static int aspm_policy; > +static int aspm_default_config_policy; > #endif > > static const char *policy_str[] = { > @@ -946,7 +949,7 @@ void pcie_no_aspm(void) > * (b) prevent userspace from changing policy > */ > if (!aspm_force) { > - aspm_policy = POLICY_DEFAULT; > + aspm_policy = aspm_default_config_policy; You would need to update the comment just above this to reflect the new code. > aspm_disabled = 1; > } > } -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Bjorn, thanks for the response. > > On systems that have PCIe ASPM support in the BIOS enabled the commit > > 387d37577fdd05e9472c20885464c2a53b3c945f may lead to the situation > > that accesses to registers of enabled PCIe devices are extremely slow. > > This happens if the ACPI FADT declares incorrectly that the system > > doesn't support PCIe ASPM even if this is enabled in the BIOS. > > In this case the function pcie_no_aspm() will be called. > > However this sets the aspm_policy to POLICY_DEFAULT even if > > CONFIG_PCIEASPM_PERFORMANCE has been configured. > > As result, the ASPM on a PCIe may still be set even if this is not > > expected. > > > > This happens e.g. on a HP workstation 800 G1 together with an Intel > > dual port Ethernet server adapter i350 plugged in. > > Whenever ASPM is enabled in the BIOS the access to the LAN registers > > are really slow (read access: slower than 20us). > > In this setup the LnkCap of the two LAN controllers and of the > > integrated PCIe switch is set to "ASPM L1 Enabled" > > even if the controller is configured to be up and running. > > There has been a lengthy discussion on this performance issue due to > > this issue on the linux-rt-users list: > > http://marc.info/?l=linux-rt-users&m=147454824515022&w=2 > > > > This patch solves this issue by forcing to disable ASPM if > > CONFIG_PCIEASPM_PERFORMANCE has been set. > > The 387d37577fdd changelog says that when ACPI_FADT_NO_ASPM is set, the > expected behavior is for the OS to not touch the ASPM configuration, and that this is > what Windows does. > > If I understand correctly, your proposal in this patch is to change this so that if > ACPI_FADT_NO_ASPM is set and CONFIG_PCIEASPM_PERFORMANCE=y, > we go ahead and disable ASPM. > > Only the firmware author can tell whether it's really a bug that this system sets > ACPI_FADT_NO_ASPM. It could be that there's some platform issue that is > avoided by leaving the BIOS ASPM configuration untouched. > > If it really *is* a firmware bug, and ACPI_FADT_NO_ASPM is not supposed to be > set on this platform, CONFIG_PCIEASPM_PERFORMANCE doesn't feel like the > right mechanism for working around it. It should be safe to enable that for all > systems, and for other systems that set ACPI_FADT_NO_ASPM correctly, we > should not touch ASPM configuration. > > Can you open a bugzilla at http://bugzilla.kernel.org and attach the complete dmesg > log and "lspci -vv" output? This is a subtle area where we've had many break/fix > cycles, and it's important to be able to go back and look at details of problems that > motivated previous changes. > I have create a new bug report (id=189951). https://bugzilla.kernel.org/show_bug.cgi?id=189951 Regards Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Dec 09, 2016 at 09:57:03AM +0000, Koehrer Mathias (ETAS/ESW5) wrote: > Hi Bjorn, > > thanks for the response. > > > > On systems that have PCIe ASPM support in the BIOS enabled the commit > > > 387d37577fdd05e9472c20885464c2a53b3c945f may lead to the situation > > > that accesses to registers of enabled PCIe devices are extremely slow. > > > This happens if the ACPI FADT declares incorrectly that the system > > > doesn't support PCIe ASPM even if this is enabled in the BIOS. > > > In this case the function pcie_no_aspm() will be called. > > > However this sets the aspm_policy to POLICY_DEFAULT even if > > > CONFIG_PCIEASPM_PERFORMANCE has been configured. > > > As result, the ASPM on a PCIe may still be set even if this is not > > > expected. > > > > > > This happens e.g. on a HP workstation 800 G1 together with an Intel > > > dual port Ethernet server adapter i350 plugged in. > > > Whenever ASPM is enabled in the BIOS the access to the LAN registers > > > are really slow (read access: slower than 20us). > > > In this setup the LnkCap of the two LAN controllers and of the > > > integrated PCIe switch is set to "ASPM L1 Enabled" > > > even if the controller is configured to be up and running. > > > There has been a lengthy discussion on this performance issue due to > > > this issue on the linux-rt-users list: > > > http://marc.info/?l=linux-rt-users&m=147454824515022&w=2 > > > > > > This patch solves this issue by forcing to disable ASPM if > > > CONFIG_PCIEASPM_PERFORMANCE has been set. > > > > The 387d37577fdd changelog says that when ACPI_FADT_NO_ASPM is set, the > > expected behavior is for the OS to not touch the ASPM configuration, and that this is > > what Windows does. > > > > If I understand correctly, your proposal in this patch is to change this so that if > > ACPI_FADT_NO_ASPM is set and CONFIG_PCIEASPM_PERFORMANCE=y, > > we go ahead and disable ASPM. > > > > Only the firmware author can tell whether it's really a bug that this system sets > > ACPI_FADT_NO_ASPM. It could be that there's some platform issue that is > > avoided by leaving the BIOS ASPM configuration untouched. > > > > If it really *is* a firmware bug, and ACPI_FADT_NO_ASPM is not supposed to be > > set on this platform, CONFIG_PCIEASPM_PERFORMANCE doesn't feel like the > > right mechanism for working around it. It should be safe to enable that for all > > systems, and for other systems that set ACPI_FADT_NO_ASPM correctly, we > > should not touch ASPM configuration. > > > > Can you open a bugzilla at http://bugzilla.kernel.org and attach the complete dmesg > > log and "lspci -vv" output? This is a subtle area where we've had many break/fix > > cycles, and it's important to be able to go back and look at details of problems that > > motivated previous changes. > > > I have create a new bug report (id=189951). > https://bugzilla.kernel.org/show_bug.cgi?id=189951 Thanks for that. In addition to yours, we have the following report, which was also bisected to 387d37577fdd: https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working The changelog for 387d37577fdd doesn't mention any actual bugs that it fixes. Should we consider reverting it? Can you shed any light on this, Matthew? The 387d37577fdd *does* say that it mimics the behavior of Windows. Is there any chance you could test that, Mathias? E.g., your report is that the NIC performs well in Linux when the BIOS PCIe energy saving feature is disabled, but poorly when the BIOS feature is enabled. Is it the same under Windows? I would sort of expect that the NIC performs well under Windows with energy saving enabled, because I would think HP would have tested that and noticed any problem, but of course I have no evidence either way. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 12, 2017 at 12:54 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > Thanks for that. In addition to yours, we have the following report, > which was also bisected to 387d37577fdd: > > https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working > > The changelog for 387d37577fdd doesn't mention any actual bugs that it > fixes. Should we consider reverting it? Can you shed any light on > this, Matthew? Without this commit we disable ASPM on devices that expect to have it enabled and increase idle power consumption by approximately 200% on some laptops. Windows drivers may explicitly change the ASPM state on devices or PCIe bridges, and if so we should be mimicing that behaviour as well. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[+cc Emmanuel because of https://bugzilla.kernel.org/show_bug.cgi?id=57331] On Thu, Jan 12, 2017 at 01:48:15PM -0800, Matthew Garrett wrote: > On Thu, Jan 12, 2017 at 12:54 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > > Thanks for that. In addition to yours, we have the following report, > > which was also bisected to 387d37577fdd: > > > > https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working > > > > The changelog for 387d37577fdd doesn't mention any actual bugs that it > > fixes. Should we consider reverting it? Can you shed any light on > > this, Matthew? > > Without this commit we disable ASPM on devices that expect to have it > enabled and increase idle power consumption by approximately 200% on > some laptops. Windows drivers may explicitly change the ASPM state on > devices or PCIe bridges, and if so we should be mimicing that > behaviour as well. Thanks, I should have asked for that information in the 387d37577fdd changelog, but it's good to have it now. On a system with ACPI_FADT_NO_ASPM set (as in both https://bugzilla.kernel.org/show_bug.cgi?id=189951 and https://bugzilla.kernel.org/show_bug.cgi?id=102311): - We set aspm_disabled = 1 (since 5fde244d39b8). - Before 387d37577fdd, we called pcie_clear_aspm(), which disabled all ASPM link states, even if aspm_disabled == 1. - After 387d37577fdd, we leave ASPM link states as BIOS configured them. This explains why the NIC performance issue is related to this commit. A driver can explicitly change the ASPM state with pci_disable_link_state(). But that doesn't do anything when ACPI_FADT_NO_ASPM (and thus aspm_disabled) is set. Are you suggesting that we should remove the aspm_disabled check and make pci_disable_link_state() work regardless of ACPI_FADT_NO_ASPM? It's only called by drivers that presumably know about their device issues, and one would think it would be relatively safe to *disable* ASPM states (or at least safer than *enabling* them). Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 12, 2017 at 3:17 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > A driver can explicitly change the ASPM state with > pci_disable_link_state(). But that doesn't do anything when > ACPI_FADT_NO_ASPM (and thus aspm_disabled) is set. > > Are you suggesting that we should remove the aspm_disabled check and > make pci_disable_link_state() work regardless of ACPI_FADT_NO_ASPM? > It's only called by drivers that presumably know about their device > issues, and one would think it would be relatively safe to *disable* > ASPM states (or at least safer than *enabling* them). That would be my gut feeling. Windows provides an API in .inf files to disable ASPM states on driver load, but it's unclear whether that does anything if the firmware asks that ASPM be left alone. Windows drivers obviously have the ability to hit the PCI registers directly anyway, and there's evidence that some do so. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Bjorn thanks for taking care of this issue. > The 387d37577fdd *does* say that it mimics the behavior of Windows. > Is there any chance you could test that, Mathias? Unfortunately no. Sorry for that. Regards Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 12, 2017 at 05:17:26PM -0600, Bjorn Helgaas wrote: > On Thu, Jan 12, 2017 at 01:48:15PM -0800, Matthew Garrett wrote: > > On Thu, Jan 12, 2017 at 12:54 PM, Bjorn Helgaas <helgaas@kernel.org> wrote: > > > > > Thanks for that. In addition to yours, we have the following report, > > > which was also bisected to 387d37577fdd: > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=102311 ASPM: ASMEDA asm1062 not working > > > > > > The changelog for 387d37577fdd doesn't mention any actual bugs that it > > > fixes. Should we consider reverting it? Can you shed any light on > > > this, Matthew? > > > > Without this commit we disable ASPM on devices that expect to have it > > enabled and increase idle power consumption by approximately 200% on > > some laptops. Windows drivers may explicitly change the ASPM state on > > devices or PCIe bridges, and if so we should be mimicing that > > behaviour as well. > > Thanks, I should have asked for that information in the 387d37577fdd > changelog, but it's good to have it now. > > On a system with ACPI_FADT_NO_ASPM set (as in both > https://bugzilla.kernel.org/show_bug.cgi?id=189951 and > https://bugzilla.kernel.org/show_bug.cgi?id=102311): > > - We set aspm_disabled = 1 (since 5fde244d39b8). > > - Before 387d37577fdd, we called pcie_clear_aspm(), which disabled > all ASPM link states, even if aspm_disabled == 1. > > - After 387d37577fdd, we leave ASPM link states as BIOS configured > them. This explains why the NIC performance issue is related to > this commit. I've been relying on the 387d37577fdd changelog, which says: Communications with a hardware vendor confirm that the expected behaviour on systems that set the FADT ASPM disable bit but which still grant full PCIe control is for the OS to leave any BIOS configuration intact and refuse to touch the ASPM bits. But I don't think the code actually matches this. After 387d37577fdd, if ACPI_FADT_NO_ASPM is set, we don't call pcie_clear_aspm(). We do call pcie_no_aspm(), which sets aspm_disabled = 1. Setting aspm_disabled affects pcie_aspm_powersave_config_link() and pcie_aspm_pm_state_change(), which are used in the runtime pci_enable_device() and pci_power_up(), and pci_set_power_state() paths. But it does not affect the pcie_aspm_init_link_state() path, where we may update ASPM common clock, L0s, and L1 control bits: pcie_aspm_init_link_state pcie_aspm_cap_init pcie_aspm_configure_common_clock # write PCI_EXP_LNKCTL_CCC, PCI_EXP_LNKCTL_RL pcie_clkpm_cap_init if (aspm_policy != POLICY_POWERSAVE) pcie_config_aspm_path while (link) pcie_config_aspm_link pcie_config_aspm_dev # write PCI_EXP_LNKCTL_ASPM_L0S, PCI_EXP_LNKCTL_ASPM_L1 link = link->parent pcie_set_clkpm # write PCI_EXP_LNKCTL_CLKREQ_EN So even if ACPI_FADT_NO_ASPM is set, I think we *do* touch the ASPM bits. -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-4.8/drivers/pci/pcie/aspm.c =================================================================== --- linux-4.8.orig/drivers/pci/pcie/aspm.c +++ linux-4.8/drivers/pci/pcie/aspm.c @@ -79,10 +79,13 @@ static LIST_HEAD(link_list); #ifdef CONFIG_PCIEASPM_PERFORMANCE static int aspm_policy = POLICY_PERFORMANCE; +static int aspm_default_config_policy = POLICY_PERFORMANCE; #elif defined CONFIG_PCIEASPM_POWERSAVE static int aspm_policy = POLICY_POWERSAVE; +static int aspm_default_config_policy = POLICY_POWERSAFE; #else static int aspm_policy; +static int aspm_default_config_policy; #endif static const char *policy_str[] = { @@ -946,7 +949,7 @@ void pcie_no_aspm(void) * (b) prevent userspace from changing policy */ if (!aspm_force) { - aspm_policy = POLICY_DEFAULT; + aspm_policy = aspm_default_config_policy; aspm_disabled = 1; } }
On systems that have PCIe ASPM support in the BIOS enabled the commit 387d37577fdd05e9472c20885464c2a53b3c945f may lead to the situation that accesses to registers of enabled PCIe devices are extremely slow. This happens if the ACPI FADT declares incorrectly that the system doesn't support PCIe ASPM even if this is enabled in the BIOS. In this case the function pcie_no_aspm() will be called. However this sets the aspm_policy to POLICY_DEFAULT even if CONFIG_PCIEASPM_PERFORMANCE has been configured. As result, the ASPM on a PCIe may still be set even if this is not expected. This happens e.g. on a HP workstation 800 G1 together with an Intel dual port Ethernet server adapter i350 plugged in. Whenever ASPM is enabled in the BIOS the access to the LAN registers are really slow (read access: slower than 20us). In this setup the LnkCap of the two LAN controllers and of the integrated PCIe switch is set to "ASPM L1 Enabled" even if the controller is configured to be up and running. There has been a lengthy discussion on this performance issue due to this issue on the linux-rt-users list: http://marc.info/?l=linux-rt-users&m=147454824515022&w=2 This patch solves this issue by forcing to disable ASPM if CONFIG_PCIEASPM_PERFORMANCE has been set. Signed-off-by: Mathias Koehrer <mathias.koehrer@etas.com> --- drivers/pci/pcie/aspm.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html