Message ID | 20170830142454.10971-4-jglauber@cavium.com |
---|---|
State | Superseded |
Series | Workaround for bus/slot reset on Cavium cn8xxx root ports |
On Wed, 30 Aug 2017 16:24:54 +0200
Jan Glauber <jglauber@cavium.com> wrote:

> Root ports of cn8xxx do not function after a slot reset when used with
> some e1000e and LSI HBA devices. Add a quirk to prevent slot reset on
> these root ports.
>
> Signed-off-by: Jan Glauber <jglauber@cavium.com>
> ---
>  drivers/pci/quirks.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 85191b8..6679971 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -845,6 +845,22 @@ static void quirk_cavium_sriov_rnm_link(struct pci_dev *dev)
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, 0xa018, quirk_cavium_sriov_rnm_link);
>  #endif
>  
> +/*
> + * Root port on some Cavium CN8xxx chips do not successfully complete
> + * a bus reset when used with certain types of child devices. Config
> + * space access to the child may quit responding. Flag all devices under
> + * the secondary bus as non-resettable.
> + */
> +static void quirk_CN8xxx_secondary_bus(struct pci_dev *dev)
> +{
> +	struct pci_dev *pdev;
> +
> +	dev_warn(&dev->dev, "Cavium CN8xxx quirk detected; reset for devices on secondary bus disabled\n");
> +	list_for_each_entry(pdev, &dev->subordinate->devices, bus_list)
> +		pdev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
> +}
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_CN8xxx_secondary_bus);
> +
>  /*
>   * Some settings of MMRBC can lead to data corruption so block changes.
>   * See AMD 8131 HyperTransport PCI-X Tunnel Revision Guide

This doesn't seem reliable, doesn't the user just need to remove and
reprobe the slot and the device would re-appear without this flag set?

Thanks,

Alex
On Wed, Aug 30, 2017 at 08:40:12AM -0600, Alex Williamson wrote:
> On Wed, 30 Aug 2017 16:24:54 +0200
> Jan Glauber <jglauber@cavium.com> wrote:
>
> > Root ports of cn8xxx do not function after a slot reset when used with
> > some e1000e and LSI HBA devices. Add a quirk to prevent slot reset on
> > these root ports.
> >
> > [...]
>
> This doesn't seem reliable, doesn't the user just need to remove and
> reprobe the slot and the device would re-appear without this flag set?

No, I tried before to disable the slot with "echo 0 > /sys/bus/pci/slots/3/power"
but that does not work as it is not supported.

I'm not familiar with the quirk types, would another one be better
suited here (even if we don't have the problem you described)?

thanks,
Jan

> Thanks,
>
> Alex
On Thu, 31 Aug 2017 11:40:52 +0200
Jan Glauber <jan.glauber@caviumnetworks.com> wrote:

> On Wed, Aug 30, 2017 at 08:40:12AM -0600, Alex Williamson wrote:
> > On Wed, 30 Aug 2017 16:24:54 +0200
> > Jan Glauber <jglauber@cavium.com> wrote:
> >
> > > Root ports of cn8xxx do not function after a slot reset when used with
> > > some e1000e and LSI HBA devices. Add a quirk to prevent slot reset on
> > > these root ports.
> > >
> > > [...]
> >
> > This doesn't seem reliable, doesn't the user just need to remove and
> > reprobe the slot and the device would re-appear without this flag set?
>
> No, I tried before to disable the slot with "echo 0 > /sys/bus/pci/slots/3/power"
> but that does not work as it is not supported.
>
> I'm not familiar with the quirk types, would another one be better
> suited here (even if we don't have the problem you described)?

The scenario I'm mentioning is to "echo 1 > /sys/bus/pci/devices/<some
device under the slot>/remove", then "echo <that device address> >
/sys/bus/pci/rescan". This would break the ordering implicit in using
a fixup defined for the root port. It seems like it'd make a lot more
sense to add a test on the parent bridge more similar to how the bus
reset works. It's not the subordinate devices imposing the
no-bus-reset flag, it's the bridge device and the objects and code
should support and reflect that. Thanks,

Alex
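Alex's objection comes down to timing. A fixup declared for the root port runs for the root port itself, and it stamps the flag onto whatever children exist at that moment. A child that is later removed and rescanned is allocated fresh, and nothing re-applies the flag. Below is a minimal userspace model of that ordering, not kernel code. `struct model_dev`, `MODEL_NO_BUS_RESET`, and every helper are hypothetical stand-ins for `struct pci_dev` and `PCI_DEV_FLAGS_NO_BUS_RESET`, and the flag value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Userspace model only, not kernel code. The struct and flag value are
 * illustrative stand-ins for struct pci_dev / PCI_DEV_FLAGS_NO_BUS_RESET.
 */
#define MODEL_NO_BUS_RESET (1u << 0)

struct model_dev {
	unsigned int dev_flags;
	struct model_dev *bridge;	/* upstream bridge; NULL on a root bus */
};

/* Enumeration or rescan: a freshly allocated child starts with no flags. */
static struct model_dev *model_add_child(struct model_dev *bridge)
{
	struct model_dev *d = calloc(1, sizeof(*d));

	if (!d)
		abort();
	d->bridge = bridge;
	return d;
}

/* Check matching the posted patch: look only at the child's own flag. */
static bool child_flag_allows_reset(const struct model_dev *child)
{
	return !(child->dev_flags & MODEL_NO_BUS_RESET);
}

/* Check matching the review suggestion: look at the upstream bridge. */
static bool bridge_flag_allows_reset(const struct model_dev *child)
{
	const struct model_dev *br = child->bridge;

	return !(br && (br->dev_flags & MODEL_NO_BUS_RESET));
}

/*
 * The root-port quirk runs once, then the child is removed and rescanned.
 * The root port is never re-added, so its quirk does not run again.
 * Returns true if a reset would be allowed after the rescan.
 */
static bool reset_allowed_after_rescan(bool flag_bridge)
{
	struct model_dev root = { 0 };
	struct model_dev *child = model_add_child(&root);
	bool allowed;

	if (flag_bridge)
		root.dev_flags |= MODEL_NO_BUS_RESET;	/* quirk tags the bridge */
	else
		child->dev_flags |= MODEL_NO_BUS_RESET;	/* quirk tags children */

	free(child);				/* "remove" */
	child = model_add_child(&root);		/* "rescan": flags start at 0 */
	assert(child->bridge == &root);

	allowed = flag_bridge ? bridge_flag_allows_reset(child)
			      : child_flag_allows_reset(child);
	free(child);
	return allowed;
}
```

In this model, tagging the children loses the flag across a remove/rescan cycle. Tagging the bridge keeps the answer stable, which is the direction Alex proposes.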
On Thu, Aug 31, 2017 at 10:01:30AM -0600, Alex Williamson wrote:
> On Thu, 31 Aug 2017 11:40:52 +0200
> Jan Glauber <jan.glauber@caviumnetworks.com> wrote:
>
> > On Wed, Aug 30, 2017 at 08:40:12AM -0600, Alex Williamson wrote:
> > > On Wed, 30 Aug 2017 16:24:54 +0200
> > > Jan Glauber <jglauber@cavium.com> wrote:
> > >
> > > > Root ports of cn8xxx do not function after a slot reset when used with
> > > > some e1000e and LSI HBA devices. Add a quirk to prevent slot reset on
> > > > these root ports.
> > > >
> > > > [...]
> > >
> > > This doesn't seem reliable, doesn't the user just need to remove and
> > > reprobe the slot and the device would re-appear without this flag set?
> >
> > No, I tried before to disable the slot with "echo 0 > /sys/bus/pci/slots/3/power"
> > but that does not work as it is not supported.
> >
> > I'm not familiar with the quirk types, would another one be better
> > suited here (even if we don't have the problem you described)?
>
> The scenario I'm mentioning is to "echo 1 > /sys/bus/pci/devices/<some
> device under the slot>/remove", then "echo <that device address> >
> /sys/bus/pci/rescan". This would break the ordering implicit in using
> a fixup defined for the root port. It seems like it'd make a lot more
> sense to add a test on the parent bridge more similar to how the bus
> reset works. It's not the subordinate devices imposing the
> no-bus-reset flag, it's the bridge device and the objects and code
> should support and reflect that. Thanks,

Doing "echo <that device address> > /sys/bus/pci/rescan" after the remove
did not work for me, but maybe the format of the device address needs to
be different. Anyway, the sequence

echo 1 > /sys/bus/pci/devices/<some device under the slot>/remove
echo 1 > /sys/bus/pci/rescan

still triggers the panic as you mentioned above.

I agree that the subordinate devices are not causing the issue, but I
still need to make pci_slot_resetable() return false in our case.

So what if we add an additional check like:

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index fdf65a6..389db4b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4389,6 +4389,9 @@ static bool pci_slot_resetable(struct pci_slot *slot)
 {
 	struct pci_dev *dev;
 
+	if (slot->bus->self & PCI_DEV_FLAGS_NO_BUS_RESET)
+		return false;
+
 	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
 		if (!dev->slot || dev->slot != slot)
 			continue;

--Jan
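The reproduction Jan describes can also be driven from a small C program. The kernel's sysfs ABI documentation states that both `remove` and `rescan` act on any non-zero value such as `1`. `rescan` does not take a device address, which likely explains why the first attempt failed. Treat this as a sketch that assumes root privileges. `sysfs_write()`, `remove_and_rescan()`, and the `sysfs_root` parameter are hypothetical helpers, and the device address is a placeholder in domain:bus:device.function form.

```c
#include <assert.h>
#include <stdio.h>

/* Write `val` to the attribute at `path`; 0 on success, -1 on error. */
static int sysfs_write(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fputs(val, f) == EOF)
		ret = -1;
	/* stdio buffers the write, so an error from the store can show up here. */
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}

/*
 * Hot-remove one device, then rescan all PCI buses. `sysfs_root` is
 * normally "/sys"; `bdf` is a placeholder such as "0000:01:00.0".
 * Needs root. Returns 0 on success, -1 on any failure.
 */
static int remove_and_rescan(const char *sysfs_root, const char *bdf)
{
	char path[256];
	int n;

	assert(sysfs_root != NULL && bdf != NULL);

	n = snprintf(path, sizeof(path), "%s/bus/pci/devices/%s/remove",
		     sysfs_root, bdf);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	if (sysfs_write(path, "1") != 0)
		return -1;

	/* rescan takes a non-zero value, not a device address. */
	n = snprintf(path, sizeof(path), "%s/bus/pci/rescan", sysfs_root);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return sysfs_write(path, "1");
}
```

On an affected system, `remove_and_rescan("/sys", "0000:01:00.0")` would be expected to recreate the unflagged child from the thread. The address shown is only a placeholder.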
On Thu, Sep 07, 2017 at 09:40:11AM +0200, Jan Glauber wrote:
> So what if we add an additional check like:
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index fdf65a6..389db4b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -4389,6 +4389,9 @@ static bool pci_slot_resetable(struct pci_slot *slot)
>  {
>  	struct pci_dev *dev;
>  
> +	if (slot->bus->self & PCI_DEV_FLAGS_NO_BUS_RESET)
> +		return false;
> +
>  	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
>  		if (!dev->slot || dev->slot != slot)
>  			continue;

Obviously I meant:

	if (slot->bus->self->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)

--Jan
On Thu, 7 Sep 2017 09:49:04 +0200
Jan Glauber <jan.glauber@caviumnetworks.com> wrote:

> On Thu, Sep 07, 2017 at 09:40:11AM +0200, Jan Glauber wrote:
> > So what if we add an additional check like:
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index fdf65a6..389db4b 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -4389,6 +4389,9 @@ static bool pci_slot_resetable(struct pci_slot *slot)
> >  {
> >  	struct pci_dev *dev;
> >  
> > +	if (slot->bus->self & PCI_DEV_FLAGS_NO_BUS_RESET)
> > +		return false;
> > +
> >  	list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> >  		if (!dev->slot || dev->slot != slot)
> >  			continue;
>
> Obviously I meant:
> 	if (slot->bus->self->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)

Much better. Perhaps even incorporate the bus->self check for good
measure; is it possible to have a slot on a root bus?

Taking different approaches for bus vs slot reset should have been a
giant red flag that something is wrong. Thanks,

Alex
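Jan's first version applied `&` directly to the `struct pci_dev *` pointer, which C rejects. The corrected line tests `dev_flags` instead. Alex's extra `bus->self` check guards the case where `self` is NULL, which is how the kernel represents a root bus. Below is a minimal userspace model of the combined test. The `model_*` structs are hypothetical and keep only the fields the check touches, and the flag value is illustrative rather than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model, not kernel structs; the flag value is illustrative. */
#define MODEL_NO_BUS_RESET (1u << 0)

struct model_dev {
	unsigned int dev_flags;
};

struct model_bus {
	struct model_dev *self;		/* upstream bridge; NULL on a root bus */
};

struct model_slot {
	struct model_bus *bus;
};

/*
 * Bridge test from the thread, with the NULL guard Alex suggests:
 *   first try:  slot->bus->self & FLAG             (pointer & int: rejected)
 *   corrected:  slot->bus->self->dev_flags & FLAG
 *   guarded:    self && (self->dev_flags & FLAG)
 * The kernel's pci_slot_resetable() then walks the devices in the slot;
 * that part is omitted here.
 */
static bool model_slot_bridge_allows_reset(const struct model_slot *slot)
{
	const struct model_dev *self;

	assert(slot != NULL && slot->bus != NULL);
	self = slot->bus->self;
	if (self && (self->dev_flags & MODEL_NO_BUS_RESET))
		return false;
	return true;
}
```

The guard returns early on a flagged bridge, so the decision no longer depends on per-child state. The remove/rescan sequence from earlier in the thread therefore cannot clear it.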
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 85191b8..6679971 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -845,6 +845,22 @@ static void quirk_cavium_sriov_rnm_link(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, 0xa018, quirk_cavium_sriov_rnm_link);
 #endif
 
+/*
+ * Root port on some Cavium CN8xxx chips do not successfully complete
+ * a bus reset when used with certain types of child devices. Config
+ * space access to the child may quit responding. Flag all devices under
+ * the secondary bus as non-resettable.
+ */
+static void quirk_CN8xxx_secondary_bus(struct pci_dev *dev)
+{
+	struct pci_dev *pdev;
+
+	dev_warn(&dev->dev, "Cavium CN8xxx quirk detected; reset for devices on secondary bus disabled\n");
+	list_for_each_entry(pdev, &dev->subordinate->devices, bus_list)
+		pdev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_CAVIUM, 0xa100, quirk_CN8xxx_secondary_bus);
+
 /*
  * Some settings of MMRBC can lead to data corruption so block changes.
  * See AMD 8131 HyperTransport PCI-X Tunnel Revision Guide
Root ports of cn8xxx do not function after a slot reset when used with
some e1000e and LSI HBA devices. Add a quirk to prevent slot reset on
these root ports.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 drivers/pci/quirks.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)