diff mbox

PCI: disable auto realloc detection for lsi devices

Message ID 1423011246-14429-1-git-send-email-yinghai@kernel.org
State Changes Requested
Headers show

Commit Message

Yinghai Lu Feb. 4, 2015, 12:54 a.m. UTC
LSI cards do not work if pci core change pci bar vaules.
will need to disable/enable pcie link of bridge manually.

So disable auto realloc if there is any lsi device on that pci root
bus.

If BIOS is broken, solution would be booting pci=realloc and
1. check with lsi to get new hba firmware to handle BAR changing.
2. or disable/enable pcie link manually to get firmare reset.

Need to backport to kernel before 3.12 as this one can not be
applied directly.

Reported-by: Paul Johnson <pjay@nwtrail.com>
Tested-by: Paul Johnson <pjay@nwtrail.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92351
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+

---
 drivers/pci/setup-bus.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Bjorn Helgaas Feb. 7, 2015, 2:14 p.m. UTC | #1
On Tue, Feb 3, 2015 at 6:54 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> LSI cards do not work if pci core change pci bar vaules.
> will need to disable/enable pcie link of bridge manually.

Maybe.  There's no analysis to substantiate this theory in the
bugzilla, and there's no link to email discussion that shows what
experiments you did.

The BIOS sets the BAR.  Why does the device work when the BIOS sets
the BAR but not when Linux does?

> So disable auto realloc if there is any lsi device on that pci root
> bus.
>
> If BIOS is broken, solution would be booting pci=realloc and
> 1. check with lsi to get new hba firmware to handle BAR changing.
> 2. or disable/enable pcie link manually to get firmare reset.
>
> Need to backport to kernel before 3.12 as this one can not be
> applied directly.
>
> Reported-by: Paul Johnson <pjay@nwtrail.com>
> Tested-by: Paul Johnson <pjay@nwtrail.com>
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=92351
> Signed-off-by: Yinghai Lu <yinghai@kernel.org>
> Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
>
> ---
>  drivers/pci/setup-bus.c |   17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> Index: linux-2.6/drivers/pci/setup-bus.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/setup-bus.c
> +++ linux-2.6/drivers/pci/setup-bus.c
> @@ -1548,14 +1548,31 @@ static int iov_resources_unassigned(stru
>         return 0;
>  }
>
> +static int strange_dev_check(struct pci_dev *dev, void *data)
> +{
> +       bool *found = data;
> +
> +       /* LSI device firmware is not happy with changing BAR value */
> +       if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC) {
> +               *found = true;
> +               return 1; /* return early from pci_walk_bus() */
> +       }

This is pretty ugly.  Do you have evidence that *all* LSI devices have
an issue like this?

Can you use the usual quirk style rather than polluting common code
paths like this?

There's an IORESOURCE_PCI_FIXED flag that might be useful in this situation.

> +
> +       return 0;
> +}
>  static enum enable_type pci_realloc_detect(struct pci_bus *bus,
>                          enum enable_type enable_local)
>  {
>         bool unassigned = false;
> +       bool found = false;
>
>         if (enable_local != undefined)
>                 return enable_local;
>
> +       pci_walk_bus(bus, strange_dev_check, &found);
> +       if (found)
> +               return enable_local;
> +
>         pci_walk_bus(bus, iov_resources_unassigned, &unassigned);
>         if (unassigned)
>                 return auto_enabled;
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Feb. 7, 2015, 7:42 p.m. UTC | #2
On Sat, Feb 7, 2015 at 6:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:

> The BIOS sets the BAR.  Why does the device work when the BIOS sets
> the BAR but not when Linux does?

If you reset the firmware with disable/enable pcie link, the new BAR will
work.
...
>
>
> This is pretty ugly.  Do you have evidence that *all* LSI devices have
> an issue like this?

I hit the problem when I was working on pci bus number allocation ...
all lsi cards that I touch including the ones that use mpt2sas and megaraid_sas
have this problem.

I guess they are sharing some core code in the firmware.

>
> Can you use the usual quirk style rather than polluting common code
> paths like this?
>
> There's an IORESOURCE_PCI_FIXED flag that might be useful in this situation.
>

Let me check if i can have that work when pci=realloc is passed via command.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas Feb. 7, 2015, 9:12 p.m. UTC | #3
[+cc Paul]

On Sat, Feb 7, 2015 at 1:42 PM, Yinghai Lu <yinghai@kernel.org> wrote:
> On Sat, Feb 7, 2015 at 6:14 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>
>> The BIOS sets the BAR.  Why does the device work when the BIOS sets
>> the BAR but not when Linux does?
>
> If you reset the firmware with disable/enable pcie link, the new BAR will
> work.

I remember experiments like that with changing the bus number on an
LSI device (https://bugzilla.kernel.org/show_bug.cgi?id=84281,
http://lkml.kernel.org/r/ghiol53r9u.fsf@lena.gouders.net).  Did you do
similar experiments with link enable/disable when changing the BAR?

It sounds like you think it's standard practice for BIOSes to disable
and re-enable the link after programming BARs?  I'm not confident
about that because I don't think there's anything in the spec that
would suggest that.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Feb. 7, 2015, 10:03 p.m. UTC | #4
On Sat, Feb 7, 2015 at 1:12 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:

> It sounds like you think it's standard practice for BIOSes to disable
> and re-enable the link after programming BARs?  I'm not confident
> about that because I don't think there's anything in the spec that
> would suggest that.

That should be Firmware bug, it could save the BAR or bus info, and use them
to trap some operation from host.

I still remember that hotplug sometime does not with some LSI cards, as
firmware trap the pci conf access and not respond in time.
If I erase the firmware, the silicon always respond to host pci conf in time.

I think we can not say they violate the spec, but that is definitely not a good
practice of development.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

Index: linux-2.6/drivers/pci/setup-bus.c
===================================================================
--- linux-2.6.orig/drivers/pci/setup-bus.c
+++ linux-2.6/drivers/pci/setup-bus.c
@@ -1548,14 +1548,31 @@  static int iov_resources_unassigned(stru
 	return 0;
 }
 
+static int strange_dev_check(struct pci_dev *dev, void *data)
+{
+	bool *found = data;
+
+	/* LSI device firmware is not happy with changing BAR value */
+	if (dev->vendor == PCI_VENDOR_ID_LSI_LOGIC) {
+		*found = true;
+		return 1; /* return early from pci_walk_bus() */
+	}
+
+	return 0;
+}
 static enum enable_type pci_realloc_detect(struct pci_bus *bus,
 			 enum enable_type enable_local)
 {
 	bool unassigned = false;
+	bool found = false;
 
 	if (enable_local != undefined)
 		return enable_local;
 
+	pci_walk_bus(bus, strange_dev_check, &found);
+	if (found)
+		return enable_local;
+
 	pci_walk_bus(bus, iov_resources_unassigned, &unassigned);
 	if (unassigned)
 		return auto_enabled;