PCI: Add pci=safemode option

Message ID 1527650389-31575-1-git-send-email-okaya@codeaurora.org
State Not Applicable
Delegated to: Bjorn Helgaas
Headers show
Series
  • PCI: Add pci=safemode option
Related show

Commit Message

Sinan Kaya May 30, 2018, 3:19 a.m.
Adding pci=safemode kernel command line parameter to turn off all PCI
Express service driver as well as all optional PCIe features such as LTR,
Extended tags, Relaxed Ordering etc.

Also setting MPS configuration to PCIE_BUS_SAFE so that MPS and MRRS can be
reconfigured with by the kernel in case BIOS hands off a broken
configuration.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 Documentation/admin-guide/kernel-parameters.txt | 2 ++
 drivers/pci/pci.c                               | 7 +++++++
 drivers/pci/pci.h                               | 2 ++
 drivers/pci/pcie/portdrv_core.c                 | 2 +-
 drivers/pci/probe.c                             | 6 ++++++
 5 files changed, 18 insertions(+), 1 deletion(-)

Comments

Greg Kroah-Hartman May 30, 2018, 4:31 a.m. | #1
On Tue, May 29, 2018 at 11:19:41PM -0400, Sinan Kaya wrote:
> Adding pci=safemode kernel command line parameter to turn off all PCI
> Express service driver as well as all optional PCIe features such as LTR,
> Extended tags, Relaxed Ordering etc.
> 
> Also setting MPS configuration to PCIE_BUS_SAFE so that MPS and MRRS can be
> reconfigured with by the kernel in case BIOS hands off a broken
> configuration.

Why not fix the BIOS?  That's what sane platforms do :)

> 
> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> ---
>  Documentation/admin-guide/kernel-parameters.txt | 2 ++
>  drivers/pci/pci.c                               | 7 +++++++
>  drivers/pci/pci.h                               | 2 ++
>  drivers/pci/pcie/portdrv_core.c                 | 2 +-
>  drivers/pci/probe.c                             | 6 ++++++
>  5 files changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 641ec9c..247adbb 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3153,6 +3153,8 @@
>  		noari		do not use PCIe ARI.
>  		noats		[PCIE, Intel-IOMMU, AMD-IOMMU]
>  				do not use PCIe ATS (and IOMMU device IOTLB).
> +		safemode	turns of all optinal PCI features. Useful
> +				for bringup/troubleshooting.

s/optinal/optional/ ?

And you should explain what exactly in PCI is "optional".  Who defines
this and where is that list and what can go wrong if those options are
not enabled?

In looking at your patch, I can't determine that at all, so there's no
way that someone just looking at this sentence will be able to
understand.

thanks,

greg k-h
Sinan Kaya May 30, 2018, 4:41 a.m. | #2
On 5/29/2018 9:31 PM, Greg Kroah-Hartman wrote:
> On Tue, May 29, 2018 at 11:19:41PM -0400, Sinan Kaya wrote:
>> Adding pci=safemode kernel command line parameter to turn off all PCI
>> Express service driver as well as all optional PCIe features such as LTR,
>> Extended tags, Relaxed Ordering etc.
>>
>> Also setting MPS configuration to PCIE_BUS_SAFE so that MPS and MRRS can be
>> reconfigured with by the kernel in case BIOS hands off a broken
>> configuration.
> 
> Why not fix the BIOS?  That's what sane platforms do :)
> 
>>
>> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
>> ---
>>  Documentation/admin-guide/kernel-parameters.txt | 2 ++
>>  drivers/pci/pci.c                               | 7 +++++++
>>  drivers/pci/pci.h                               | 2 ++
>>  drivers/pci/pcie/portdrv_core.c                 | 2 +-
>>  drivers/pci/probe.c                             | 6 ++++++
>>  5 files changed, 18 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 641ec9c..247adbb 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -3153,6 +3153,8 @@
>>  		noari		do not use PCIe ARI.
>>  		noats		[PCIE, Intel-IOMMU, AMD-IOMMU]
>>  				do not use PCIe ATS (and IOMMU device IOTLB).
>> +		safemode	turns of all optinal PCI features. Useful
>> +				for bringup/troubleshooting.
> 
> s/optinal/optional/ ?

sure.

> 
> And you should explain what exactly in PCI is "optional".  Who defines
> this and where is that list and what can go wrong if those options are
> not enabled?

Bjorn and I discussed the need for such a "safe" mode feature when you
want to bring up PCI for a platform. You want to turn off everything as
a starter and just stick to bare minimum.

I can add a few words describing them. The goal of this option is to keep
base PCI features with MSI only. Things like PME, AER, ASPM, Extended
Tags, LTR, Relaxed Ordering, SRIOV are all considered optional. safemode
is certainly not intended for production environments. 

I can taint the kernel as a suggestion.

I defined minimum as just booting a device and to be able to do DMA traffic
only with 0 optimization

> 
> In looking at your patch, I can't determine that at all, so there's no
> way that someone just looking at this sentence will be able to
> understand.
> 
> thanks,
> 
> greg k-h
>
Greg Kroah-Hartman May 30, 2018, 4:55 a.m. | #3
On Tue, May 29, 2018 at 09:41:33PM -0700, Sinan Kaya wrote:
> On 5/29/2018 9:31 PM, Greg Kroah-Hartman wrote:
> > On Tue, May 29, 2018 at 11:19:41PM -0400, Sinan Kaya wrote:
> >> Adding pci=safemode kernel command line parameter to turn off all PCI
> >> Express service driver as well as all optional PCIe features such as LTR,
> >> Extended tags, Relaxed Ordering etc.
> >>
> >> Also setting MPS configuration to PCIE_BUS_SAFE so that MPS and MRRS can be
> >> reconfigured with by the kernel in case BIOS hands off a broken
> >> configuration.
> > 
> > Why not fix the BIOS?  That's what sane platforms do :)
> > 
> >>
> >> Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
> >> ---
> >>  Documentation/admin-guide/kernel-parameters.txt | 2 ++
> >>  drivers/pci/pci.c                               | 7 +++++++
> >>  drivers/pci/pci.h                               | 2 ++
> >>  drivers/pci/pcie/portdrv_core.c                 | 2 +-
> >>  drivers/pci/probe.c                             | 6 ++++++
> >>  5 files changed, 18 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> >> index 641ec9c..247adbb 100644
> >> --- a/Documentation/admin-guide/kernel-parameters.txt
> >> +++ b/Documentation/admin-guide/kernel-parameters.txt
> >> @@ -3153,6 +3153,8 @@
> >>  		noari		do not use PCIe ARI.
> >>  		noats		[PCIE, Intel-IOMMU, AMD-IOMMU]
> >>  				do not use PCIe ATS (and IOMMU device IOTLB).
> >> +		safemode	turns of all optinal PCI features. Useful
> >> +				for bringup/troubleshooting.
> > 
> > s/optinal/optional/ ?
> 
> sure.
> 
> > 
> > And you should explain what exactly in PCI is "optional".  Who defines
> > this and where is that list and what can go wrong if those options are
> > not enabled?
> 
> Bjorn and I discussed the need for such a "safe" mode feature when you
> want to bring up PCI for a platform. You want to turn off everything as
> a starter and just stick to bare minimum.
> 
> I can add a few words describing them. The goal of this option is to keep
> base PCI features with MSI only. Things like PME, AER, ASPM, Extended
> Tags, LTR, Relaxed Ordering, SRIOV are all considered optional. safemode
> is certainly not intended for production environments. 

Ok, then you should say that here, or somewhere, so that people know
this.  Otherwise people will see that "hey this lets my hardware boot!"
and then never change it :(

> I can taint the kernel as a suggestion.

I would not worry about that.

> I defined minimum as just booting a device and to be able to do DMA traffic
> only with 0 optimization

Ok, again, just document this really well, so that people do not have
questions and start wondering why their devices barely seem to work.

thanks,

greg k-h
Christoph Hellwig May 30, 2018, 7:37 a.m. | #4
On Tue, May 29, 2018 at 09:41:33PM -0700, Sinan Kaya wrote:
> Bjorn and I discussed the need for such a "safe" mode feature when you
> want to bring up PCI for a platform. You want to turn off everything as
> a starter and just stick to bare minimum.

Can we please make it a config option the instead of adding code
to every kernel?  Also maybe the bringup should be in the name
to make this more clear?
Sinan Kaya May 30, 2018, 7:44 a.m. | #5
On 2018-05-30 00:37, Christoph Hellwig wrote:
> On Tue, May 29, 2018 at 09:41:33PM -0700, Sinan Kaya wrote:
>> Bjorn and I discussed the need for such a "safe" mode feature when you
>> want to bring up PCI for a platform. You want to turn off everything 
>> as
>> a starter and just stick to bare minimum.
> 
> Can we please make it a config option the instead of adding code
> to every kernel?  Also maybe the bringup should be in the name
> to make this more clear?

One other requirement was to have a runtime option rather than compile 
time option.

When someone reported a problem, we wanted to be able to say "use this 
option and see if system boots" without doing any bisects or 
recompilation.

This would be the first step in troubleshooting a system to see if 
fundamental features are working.

I don't mind changing the name
Bjorn mentioned safe option. I made it safemode. I am looking at Bjorn 
for suggestions at this moment.
Greg Kroah-Hartman May 30, 2018, 7:48 a.m. | #6
On Wed, May 30, 2018 at 12:44:29AM -0700, okaya@codeaurora.org wrote:
> On 2018-05-30 00:37, Christoph Hellwig wrote:
> > On Tue, May 29, 2018 at 09:41:33PM -0700, Sinan Kaya wrote:
> > > Bjorn and I discussed the need for such a "safe" mode feature when you
> > > want to bring up PCI for a platform. You want to turn off everything
> > > as
> > > a starter and just stick to bare minimum.
> > 
> > Can we please make it a config option the instead of adding code
> > to every kernel?  Also maybe the bringup should be in the name
> > to make this more clear?
> 
> One other requirement was to have a runtime option rather than compile time
> option.
> 
> When someone reported a problem, we wanted to be able to say "use this
> option and see if system boots" without doing any bisects or recompilation.
> 
> This would be the first step in troubleshooting a system to see if
> fundamental features are working.

That makes sense, people can not rebuild their kernels for the most
part.  Putting it behind a config option would not make sense as it
would always have to be enabled.

> I don't mind changing the name Bjorn mentioned safe option. I made it
> safemode. I am looking at Bjorn for suggestions at this moment.

"minimal"?  "basic"?  "crippled"?
"my_hardware_is_so_borked_it_needs_this_option"?  :)

Naming is hard...

greg k-h
Sinan Kaya May 30, 2018, 7:56 a.m. | #7
On 2018-05-30 00:48, Greg Kroah-Hartman wrote:
> On Wed, May 30, 2018 at 12:44:29AM -0700, okaya@codeaurora.org wrote:
>> On 2018-05-30 00:37, Christoph Hellwig wrote:
>> > On Tue, May 29, 2018 at 09:41:33PM -0700, Sinan Kaya wrote:
>> > > Bjorn and I discussed the need for such a "safe" mode feature when you
>> > > want to bring up PCI for a platform. You want to turn off everything
>> > > as
>> > > a starter and just stick to bare minimum.
>> >
>> > Can we please make it a config option the instead of adding code
>> > to every kernel?  Also maybe the bringup should be in the name
>> > to make this more clear?
>> 
>> One other requirement was to have a runtime option rather than compile 
>> time
>> option.
>> 
>> When someone reported a problem, we wanted to be able to say "use this
>> option and see if system boots" without doing any bisects or 
>> recompilation.
>> 
>> This would be the first step in troubleshooting a system to see if
>> fundamental features are working.
> 
> That makes sense, people can not rebuild their kernels for the most
> part.  Putting it behind a config option would not make sense as it
> would always have to be enabled.
> 

Here is where the discussion took place. Last 5-10  messages should 
help.


https://bugzilla.kernel.org/show_bug.cgi?id=196197


>> I don't mind changing the name Bjorn mentioned safe option. I made it
>> safemode. I am looking at Bjorn for suggestions at this moment.
> 
> "minimal"?  "basic"?  "crippled"?
> "my_hardware_is_so_borked_it_needs_this_option"?  :)
> 
> Naming is hard...
> 
> greg k-h
Sinan Kaya May 30, 2018, 8:22 a.m. | #8
On 2018-05-30 00:56, okaya@codeaurora.org wrote:
> On 2018-05-30 00:48, Greg Kroah-Hartman wrote:
>> On Wed, May 30, 2018 at 12:44:29AM -0700, okaya@codeaurora.org wrote:
>>> On 2018-05-30 00:37, Christoph Hellwig wrote:
>>> > On Tue, May 29, 2018 at 09:41:33PM -0700, Sinan Kaya wrote:
>>> > > Bjorn and I discussed the need for such a "safe" mode feature when you
>>> > > want to bring up PCI for a platform. You want to turn off everything
>>> > > as
>>> > > a starter and just stick to bare minimum.
>>> >
>>> > Can we please make it a config option the instead of adding code
>>> > to every kernel?  Also maybe the bringup should be in the name
>>> > to make this more clear?
>>> 
>>> One other requirement was to have a runtime option rather than 
>>> compile time
>>> option.
>>> 
>>> When someone reported a problem, we wanted to be able to say "use 
>>> this
>>> option and see if system boots" without doing any bisects or 
>>> recompilation.
>>> 
>>> This would be the first step in troubleshooting a system to see if
>>> fundamental features are working.
>> 
>> That makes sense, people can not rebuild their kernels for the most
>> part.  Putting it behind a config option would not make sense as it
>> would always have to be enabled.
>> 
> 
> Here is where the discussion took place. Last 5-10  messages should 
> help.
> 
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=196197
> 

Some more paper trail for general awareness.

https://lkml.org/lkml/2018/5/3/509

> 
>>> I don't mind changing the name Bjorn mentioned safe option. I made it
>>> safemode. I am looking at Bjorn for suggestions at this moment.
>> 
>> "minimal"?  "basic"?  "crippled"?
>> "my_hardware_is_so_borked_it_needs_this_option"?  :)
>> 
>> Naming is hard...
>> 
>> greg k-h
Bjorn Helgaas May 30, 2018, 2:56 p.m. | #9
On Wed, May 30, 2018 at 12:44:29AM -0700, okaya@codeaurora.org wrote:
> On 2018-05-30 00:37, Christoph Hellwig wrote:
> > On Tue, May 29, 2018 at 09:41:33PM -0700, Sinan Kaya wrote:
> > > Bjorn and I discussed the need for such a "safe" mode feature when you
> > > want to bring up PCI for a platform. You want to turn off everything
> > > as
> > > a starter and just stick to bare minimum.
> > 
> > Can we please make it a config option the instead of adding code
> > to every kernel?  Also maybe the bringup should be in the name
> > to make this more clear?
> 
> One other requirement was to have a runtime option rather than compile time
> option.
> 
> When someone reported a problem, we wanted to be able to say "use this
> option and see if system boots" without doing any bisects or recompilation.
> 
> This would be the first step in troubleshooting a system to see if
> fundamental features are working.
> 
> I don't mind changing the name
> Bjorn mentioned safe option. I made it safemode. I am looking at Bjorn for
> suggestions at this moment.

This *was* my idea, but I'm starting to think it was a bad idea.

I don't want people to use pci= parameters as the normal way to get a
system to boot.  That would be a huge support hassle (putting things
in release notes, diagnosing problems when people forget it, etc).

But the parameters *are* useful for debugging.  If we had a
"pci=safemode" and it avoided some problem, the next step would be to
narrow it down by using the more specific flags (pci=nomsi, pci=noari,
pci=no_ext_tags, etc).  So I think 95% of the value is in the specific
flags, and a "pci=safemode" might add a little bit of value but at the
cost of a small but nagging maintenance concern and code clutter.

Bjorn
Sinan Kaya May 30, 2018, 3:28 p.m. | #10
On 5/30/2018 10:56 AM, Bjorn Helgaas wrote:
> This *was* my idea, but I'm starting to think it was a bad idea.
> 
> I don't want people to use pci= parameters as the normal way to get a
> system to boot.  That would be a huge support hassle (putting things
> in release notes, diagnosing problems when people forget it, etc).
> 
> But the parameters *are* useful for debugging.  If we had a
> "pci=safemode" and it avoided some problem, the next step would be to
> narrow it down by using the more specific flags (pci=nomsi, pci=noari,
> pci=no_ext_tags, etc).  So I think 95% of the value is in the specific
> flags, and a "pci=safemode" might add a little bit of value but at the
> cost of a small but nagging maintenance concern and code clutter.

OK. Let's try noXYZ feature. Can you enumerate the XYZ features that you
want to see?
Pavel Machek June 2, 2018, 5:43 p.m. | #11
Hi!

> > And you should explain what exactly in PCI is "optional".  Who defines
> > this and where is that list and what can go wrong if those options are
> > not enabled?
> 
> Bjorn and I discussed the need for such a "safe" mode feature when you
> want to bring up PCI for a platform. You want to turn off everything as
> a starter and just stick to bare minimum.
> 
> I can add a few words describing them. The goal of this option is to keep
> base PCI features with MSI only. Things like PME, AER, ASPM, Extended
> Tags, LTR, Relaxed Ordering, SRIOV are all considered optional. safemode
> is certainly not intended for production environments. 
> 
> I can taint the kernel as a suggestion.

I don't think tainting is required. even modern platforms should work
in the safe mode.
									Pavel
Sinan Kaya June 2, 2018, 5:57 p.m. | #12
On 2018-06-02 13:43, Pavel Machek wrote:
> Hi!
> 
>> > And you should explain what exactly in PCI is "optional".  Who defines
>> > this and where is that list and what can go wrong if those options are
>> > not enabled?
>> 
>> Bjorn and I discussed the need for such a "safe" mode feature when you
>> want to bring up PCI for a platform. You want to turn off everything 
>> as
>> a starter and just stick to bare minimum.
>> 
>> I can add a few words describing them. The goal of this option is to 
>> keep
>> base PCI features with MSI only. Things like PME, AER, ASPM, Extended
>> Tags, LTR, Relaxed Ordering, SRIOV are all considered optional. 
>> safemode
>> is certainly not intended for production environments.
>> 
>> I can taint the kernel as a suggestion.
> 
> I don't think tainting is required. even modern platforms should work
> in the safe mode.

Yeah, concern was getting used to the safe mode and never running the 
full stack to fix the actual issues like getting away with crappy 
hardware and firmware.

It becomes a support issue for the community.


> 									Pavel

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 641ec9c..247adbb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3153,6 +3153,8 @@ 
 		noari		do not use PCIe ARI.
 		noats		[PCIE, Intel-IOMMU, AMD-IOMMU]
 				do not use PCIe ATS (and IOMMU device IOTLB).
+		safemode	turns of all optinal PCI features. Useful
+				for bringup/troubleshooting.
 		pcie_scan_all	Scan all possible PCIe devices.  Otherwise we
 				only look for one device below a PCIe downstream
 				port.
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d27f771..11f0282 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -115,6 +115,9 @@  static bool pcie_ari_disabled;
 /* If set, the PCIe ATS capability will not be used. */
 static bool pcie_ats_disabled;
 
+/* If set, disables most of the optional PCI features */
+bool pci_safe_mode;
+
 bool pci_ats_disabled(void)
 {
 	return pcie_ats_disabled;
@@ -5845,6 +5848,10 @@  static int __init pci_setup(char *str)
 		if (*str && (str = pcibios_setup(str)) && *str) {
 			if (!strcmp(str, "nomsi")) {
 				pci_no_msi();
+			} else if (!strncmp(str, "safemode", 8)) {
+				pr_info("PCI: safe mode with minimum features\n");
+				pci_safe_mode = true;
+				pcie_bus_config = PCIE_BUS_SAFE;
 			} else if (!strncmp(str, "noats", 5)) {
 				pr_info("PCIe: ATS is disabled\n");
 				pcie_ats_disabled = true;
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index c358e7a0..4517bcd 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -8,6 +8,8 @@ 
 
 extern const unsigned char pcie_link_speed[];
 
+extern bool pci_safe_mode;
+
 bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
 
 /* Functions internal to the PCI core code */
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index a5b3b3a..9fe4ed6 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -311,7 +311,7 @@  int pcie_port_device_register(struct pci_dev *dev)
 
 	/* Get and check PCI Express port services */
 	capabilities = get_port_device_capability(dev);
-	if (!capabilities)
+	if (!capabilities || pci_safe_mode)
 		return 0;
 
 	pci_set_master(dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 3840207..295b79c 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2047,6 +2047,9 @@  static void pci_configure_device(struct pci_dev *dev)
 	struct hotplug_params hpp;
 	int ret;
 
+	if (pci_safe_mode)
+		return;
+
 	pci_configure_mps(dev);
 	pci_configure_extended_tags(dev, NULL);
 	pci_configure_relaxed_ordering(dev);
@@ -2213,6 +2216,9 @@  static void pci_init_capabilities(struct pci_dev *dev)
 	/* Setup MSI caps & disable MSI/MSI-X interrupts */
 	pci_msi_setup_pci_dev(dev);
 
+	if (pci_safe_mode)
+		return;
+
 	/* Buffers for saving PCIe and PCI-X capabilities */
 	pci_allocate_cap_save_buffers(dev);