mbox series

[RFC,0/9] PCIe Hotplug Slot Emulation driver

Message ID 1581120007-5280-1-git-send-email-jonathan.derrick@intel.com
Headers show
Series PCIe Hotplug Slot Emulation driver | expand

Message

Jon Derrick Feb. 7, 2020, 11:59 p.m. UTC
This set adds an emulation driver for PCIe Hotplug. There may be platforms with
specific configurations that can support hotplug but don't provide the logical
slot hotplug hardware. For instance, the platform may use an
electrically-tolerant interposer between the slot and the device.

This driver utilizes the pci-bridge-emul architecture to manage register reads
and writes. The underlying functionality of the hotplug emulation driver uses
the Data Link Layer Link Active Reporting mechanism in a polling loop, but can
tolerate other event sources such as AER or DPC.

When enabled and a slot is managed by the driver, all port services are managed
by the kernel. This is done to ensure that firmware hotplug and error
architecture does not (correctly) halt/machine check the system when hotplug is
performed on a non-hotplug slot.

The driver offers two active mode: Auto and Force.
auto: The driver will bind to non-hotplug slots
force: The driver will bind to all slots and overrides the slot's services

There are three kernel params:
pciehp.pciehp_emul_mode={off, auto, force}
pciehp.pciehp_emul_time=<msecs polling time> (def 1000, min 100, max 60000)
pciehp.pciehp_emul_ports=<PCI [S]BDF/ID format string>

The pciehp_emul_ports kernel parameter takes a semi-colon tokenized string
representing PCI [S]BDFs and IDs. The pciehp_emul_mode will then be applied to
only those slots, leaving other slots unmanaged by pciehp_emul.

The string follows the pci_dev_str_match() format:

  [<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
  pci:<vendor>:<device>[:<subvendor>:<subdevice>]

When using the path format, the path for the device can be obtained
using 'lspci -t' and further specified using the upstream bridge and the
downstream port's device-function to be more robust against bus
renumbering.

When using the vendor-device format, a value of '0' in any field acts as
a wildcard for that field, matching all values.

The driver is enabled with CONFIG_HOTPLUG_PCI_PCIE_EMUL=y.

The driver should be considered 'use at own risk' unless the platform/hardware
vendor recommends this mode.

Jon Derrick (9):
  PCI: pci-bridge-emul: Update PCIe register behaviors
  PCI: pci-bridge-emul: Eliminate reserved member
  PCI: pci-bridge-emul: Provide a helper to set behavior
  PCI: pciehp: Indirect slot register operations
  PCI: Add pcie_port_slot_emulated stub
  PCI: pciehp: Expose the poll loop to other drivers
  PCI: Move pci_dev_str_match to search.c
  PCI: pciehp: Add hotplug slot emulation driver
  PCI: pciehp: Wire up pcie_port_emulate_slot and pciehp_emul

 drivers/pci/hotplug/Makefile      |   4 +
 drivers/pci/hotplug/pciehp.h      |  28 +++
 drivers/pci/hotplug/pciehp_emul.c | 378 ++++++++++++++++++++++++++++++++++++++
 drivers/pci/hotplug/pciehp_hpc.c  | 136 ++++++++++----
 drivers/pci/pci-acpi.c            |   3 +
 drivers/pci/pci-bridge-emul.c     |  95 +++++-----
 drivers/pci/pci-bridge-emul.h     |  10 +
 drivers/pci/pci.c                 | 163 ----------------
 drivers/pci/pcie/Kconfig          |  14 ++
 drivers/pci/pcie/portdrv_core.c   |  14 +-
 drivers/pci/probe.c               |   2 +-
 drivers/pci/search.c              | 162 ++++++++++++++++
 include/linux/pci.h               |   8 +
 13 files changed, 775 insertions(+), 242 deletions(-)
 create mode 100644 drivers/pci/hotplug/pciehp_emul.c

Comments

Christoph Hellwig Feb. 10, 2020, 7:01 a.m. UTC | #1
On Fri, Feb 07, 2020 at 04:59:58PM -0700, Jon Derrick wrote:
> This set adds an emulation driver for PCIe Hotplug. There may be platforms with
> specific configurations that can support hotplug but don't provide the logical
> slot hotplug hardware. For instance, the platform may use an
> electrically-tolerant interposer between the slot and the device.

The code seems like one giant hack to me.  What is the real life
use case for this?  Another Intel chipset fuckup like vmd or the ahci
remapping?
Jon Derrick Feb. 10, 2020, 3:05 p.m. UTC | #2
On Mon, 2020-02-10 at 08:01 +0100, Christoph Hellwig wrote:
> On Fri, Feb 07, 2020 at 04:59:58PM -0700, Jon Derrick wrote:
> > This set adds an emulation driver for PCIe Hotplug. There may be platforms with
> > specific configurations that can support hotplug but don't provide the logical
> > slot hotplug hardware. For instance, the platform may use an
> > electrically-tolerant interposer between the slot and the device.
> 
> The code seems like one giant hack to me.  What is the real life
> use case for this?  Another Intel chipset fuckup like vmd or the ahci
> remapping?
> 
Exactly as the cover letter describes. An interposer being used on a
non-hotplug slot.
Christoph Hellwig Feb. 10, 2020, 4:58 p.m. UTC | #3
On Mon, Feb 10, 2020 at 03:05:47PM +0000, Derrick, Jonathan wrote:
> > The code seems like one giant hack to me.  What is the real life
> > use case for this?  Another Intel chipset fuckup like vmd or the ahci
> > remapping?
> > 
> Exactly as the cover letter describes. An interposer being used on a
> non-hotplug slot.

That isn't a use a case, that iѕ a description of the implementation.
Why would you want this code?
Jon Derrick Feb. 10, 2020, 5:09 p.m. UTC | #4
On Mon, 2020-02-10 at 17:58 +0100, hch@lst.de wrote:
> On Mon, Feb 10, 2020 at 03:05:47PM +0000, Derrick, Jonathan wrote:
> > > The code seems like one giant hack to me.  What is the real life
> > > use case for this?  Another Intel chipset fuckup like vmd or the ahci
> > > remapping?
> > > 
> > Exactly as the cover letter describes. An interposer being used on a
> > non-hotplug slot.
> 
> That isn't a use a case, that iѕ a description of the implementation.
> Why would you want this code?
It allows non-hotplug slots to take advantage of the kernel's robust
hotplug ecosystem, if the platform configuration can tolerate the
events. This could also reduce BOM cost by eliminating some slot
controllers. Say you had something that only needed to be hotplugged
very infrequently, like RAIDed OS drives, versus something needing to
be hotplugged very frequently like data drives.


Granted it probably could be fit into pciehp_poll, but it seemed to
have a different objective (emulating slot)
Bjorn Helgaas March 28, 2020, 9:51 p.m. UTC | #5
[+cc Stuart, Lukas]

On Fri, Feb 07, 2020 at 04:59:58PM -0700, Jon Derrick wrote:
> This set adds an emulation driver for PCIe Hotplug. There may be platforms with
> specific configurations that can support hotplug but don't provide the logical
> slot hotplug hardware. For instance, the platform may use an
> electrically-tolerant interposer between the slot and the device.
> 
> This driver utilizes the pci-bridge-emul architecture to manage register reads
> and writes. The underlying functionality of the hotplug emulation driver uses
> the Data Link Layer Link Active Reporting mechanism in a polling loop, but can
> tolerate other event sources such as AER or DPC.
> 
> When enabled and a slot is managed by the driver, all port services are managed
> by the kernel. This is done to ensure that firmware hotplug and error
> architecture does not (correctly) halt/machine check the system when hotplug is
> performed on a non-hotplug slot.
> 
> The driver offers two active mode: Auto and Force.
> auto: The driver will bind to non-hotplug slots
> force: The driver will bind to all slots and overrides the slot's services
> 
> There are three kernel params:
> pciehp.pciehp_emul_mode={off, auto, force}
> pciehp.pciehp_emul_time=<msecs polling time> (def 1000, min 100, max 60000)
> pciehp.pciehp_emul_ports=<PCI [S]BDF/ID format string>
> 
> The pciehp_emul_ports kernel parameter takes a semi-colon tokenized string
> representing PCI [S]BDFs and IDs. The pciehp_emul_mode will then be applied to
> only those slots, leaving other slots unmanaged by pciehp_emul.
> 
> The string follows the pci_dev_str_match() format:
> 
>   [<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
>   pci:<vendor>:<device>[:<subvendor>:<subdevice>]
> 
> When using the path format, the path for the device can be obtained
> using 'lspci -t' and further specified using the upstream bridge and the
> downstream port's device-function to be more robust against bus
> renumbering.
> 
> When using the vendor-device format, a value of '0' in any field acts as
> a wildcard for that field, matching all values.
> 
> The driver is enabled with CONFIG_HOTPLUG_PCI_PCIE_EMUL=y.
> 
> The driver should be considered 'use at own risk' unless the platform/hardware
> vendor recommends this mode.

There's a lot of good work in here, and I don't claim to understand
the use case and all the benefits.

But it seems like quite a lot of additional code and complexity in an
area that's already pretty subtle, so I'm not yet convinced that it's
all worthwhile.

It seems like this would rely on Data Link Layer Link Active
Reporting.  Is that something we could add to pciehp as a generic
feature without making a separate driver for it?  I haven't looked at
this for a while, but I would assume that if we find out that a link
went down, pciehp could/should be smart enough to notice that even if
it didn't come via the usual pciehp Slot Status path.

> Jon Derrick (9):
>   PCI: pci-bridge-emul: Update PCIe register behaviors
>   PCI: pci-bridge-emul: Eliminate reserved member
>   PCI: pci-bridge-emul: Provide a helper to set behavior
>   PCI: pciehp: Indirect slot register operations
>   PCI: Add pcie_port_slot_emulated stub
>   PCI: pciehp: Expose the poll loop to other drivers
>   PCI: Move pci_dev_str_match to search.c
>   PCI: pciehp: Add hotplug slot emulation driver
>   PCI: pciehp: Wire up pcie_port_emulate_slot and pciehp_emul
> 
>  drivers/pci/hotplug/Makefile      |   4 +
>  drivers/pci/hotplug/pciehp.h      |  28 +++
>  drivers/pci/hotplug/pciehp_emul.c | 378 ++++++++++++++++++++++++++++++++++++++
>  drivers/pci/hotplug/pciehp_hpc.c  | 136 ++++++++++----
>  drivers/pci/pci-acpi.c            |   3 +
>  drivers/pci/pci-bridge-emul.c     |  95 +++++-----
>  drivers/pci/pci-bridge-emul.h     |  10 +
>  drivers/pci/pci.c                 | 163 ----------------
>  drivers/pci/pcie/Kconfig          |  14 ++
>  drivers/pci/pcie/portdrv_core.c   |  14 +-
>  drivers/pci/probe.c               |   2 +-
>  drivers/pci/search.c              | 162 ++++++++++++++++
>  include/linux/pci.h               |   8 +
>  13 files changed, 775 insertions(+), 242 deletions(-)
>  create mode 100644 drivers/pci/hotplug/pciehp_emul.c
> 
> -- 
> 1.8.3.1
>
Jon Derrick March 30, 2020, 5:43 p.m. UTC | #6
Hi Bjorn,

On Sat, 2020-03-28 at 16:51 -0500, Bjorn Helgaas wrote:
> [+cc Stuart, Lukas]
> 
> On Fri, Feb 07, 2020 at 04:59:58PM -0700, Jon Derrick wrote:
> > This set adds an emulation driver for PCIe Hotplug. There may be platforms with
> > specific configurations that can support hotplug but don't provide the logical
> > slot hotplug hardware. For instance, the platform may use an
> > electrically-tolerant interposer between the slot and the device.
> > 
> > This driver utilizes the pci-bridge-emul architecture to manage register reads
> > and writes. The underlying functionality of the hotplug emulation driver uses
> > the Data Link Layer Link Active Reporting mechanism in a polling loop, but can
> > tolerate other event sources such as AER or DPC.
> > 
> > When enabled and a slot is managed by the driver, all port services are managed
> > by the kernel. This is done to ensure that firmware hotplug and error
> > architecture does not (correctly) halt/machine check the system when hotplug is
> > performed on a non-hotplug slot.
> > 
> > The driver offers two active mode: Auto and Force.
> > auto: The driver will bind to non-hotplug slots
> > force: The driver will bind to all slots and overrides the slot's services
> > 
> > There are three kernel params:
> > pciehp.pciehp_emul_mode={off, auto, force}
> > pciehp.pciehp_emul_time=<msecs polling time> (def 1000, min 100, max 60000)
> > pciehp.pciehp_emul_ports=<PCI [S]BDF/ID format string>
> > 
> > The pciehp_emul_ports kernel parameter takes a semi-colon tokenized string
> > representing PCI [S]BDFs and IDs. The pciehp_emul_mode will then be applied to
> > only those slots, leaving other slots unmanaged by pciehp_emul.
> > 
> > The string follows the pci_dev_str_match() format:
> > 
> >   [<domain>:]<bus>:<device>.<func>[/<device>.<func>]*
> >   pci:<vendor>:<device>[:<subvendor>:<subdevice>]
> > 
> > When using the path format, the path for the device can be obtained
> > using 'lspci -t' and further specified using the upstream bridge and the
> > downstream port's device-function to be more robust against bus
> > renumbering.
> > 
> > When using the vendor-device format, a value of '0' in any field acts as
> > a wildcard for that field, matching all values.
> > 
> > The driver is enabled with CONFIG_HOTPLUG_PCI_PCIE_EMUL=y.
> > 
> > The driver should be considered 'use at own risk' unless the platform/hardware
> > vendor recommends this mode.
> 
> There's a lot of good work in here, and I don't claim to understand
> the use case and all the benefits.
I've received more info that the customer use case is an AIC that
breaks out 1-4 M.2 cards which have been made hotplug tolerant.


> 
> But it seems like quite a lot of additional code and complexity in an
> area that's already pretty subtle, so I'm not yet convinced that it's
> all worthwhile.
> 
> It seems like this would rely on Data Link Layer Link Active
> Reporting.  Is that something we could add to pciehp as a generic
> feature without making a separate driver for it?  I haven't looked at
> this for a while, but I would assume that if we find out that a link
> went down, pciehp could/should be smart enough to notice that even if
> it didn't come via the usual pciehp Slot Status path.
I had a plan to do V2 by intercepting bus_ops rather than indirecting
slot_ops in pciehp. That should touch /a lot/ less code.

The problem I saw with adding DLLLA as a primary signal in pciehp is
that most of the pciehp boilerplate relies on valid Slot register
logic. I don't know how reliable pciehp will be if there's no backing
slot register logic, emulated or real. Consider how many slot
capability reads are in hpc.

I could add a non-slot flag check to each of those callers, but it
might be worse than the emulation alternative.

What do you think?

Thanks

> 
> > Jon Derrick (9):
> >   PCI: pci-bridge-emul: Update PCIe register behaviors
> >   PCI: pci-bridge-emul: Eliminate reserved member
> >   PCI: pci-bridge-emul: Provide a helper to set behavior
> >   PCI: pciehp: Indirect slot register operations
> >   PCI: Add pcie_port_slot_emulated stub
> >   PCI: pciehp: Expose the poll loop to other drivers
> >   PCI: Move pci_dev_str_match to search.c
> >   PCI: pciehp: Add hotplug slot emulation driver
> >   PCI: pciehp: Wire up pcie_port_emulate_slot and pciehp_emul
> > 
> >  drivers/pci/hotplug/Makefile      |   4 +
> >  drivers/pci/hotplug/pciehp.h      |  28 +++
> >  drivers/pci/hotplug/pciehp_emul.c | 378 ++++++++++++++++++++++++++++++++++++++
> >  drivers/pci/hotplug/pciehp_hpc.c  | 136 ++++++++++----
> >  drivers/pci/pci-acpi.c            |   3 +
> >  drivers/pci/pci-bridge-emul.c     |  95 +++++-----
> >  drivers/pci/pci-bridge-emul.h     |  10 +
> >  drivers/pci/pci.c                 | 163 ----------------
> >  drivers/pci/pcie/Kconfig          |  14 ++
> >  drivers/pci/pcie/portdrv_core.c   |  14 +-
> >  drivers/pci/probe.c               |   2 +-
> >  drivers/pci/search.c              | 162 ++++++++++++++++
> >  include/linux/pci.h               |   8 +
> >  13 files changed, 775 insertions(+), 242 deletions(-)
> >  create mode 100644 drivers/pci/hotplug/pciehp_emul.c
> > 
> > -- 
> > 1.8.3.1
> >
Christoph Hellwig March 30, 2020, 5:49 p.m. UTC | #7
On Mon, Mar 30, 2020 at 05:43:33PM +0000, Derrick, Jonathan wrote:
> > There's a lot of good work in here, and I don't claim to understand
> > the use case and all the benefits.
> I've received more info that the customer use case is an AIC that
> breaks out 1-4 M.2 cards which have been made hotplug tolerant.

Which sounds completely bogus.  M.2 cards are eletrically not designed
for this at all, and neither is the AIC.  If people want to support
similar use cases they need to get them standardized by PCI SIG
and handled by hotplug capable slots.
Bjorn Helgaas April 1, 2020, 9:45 p.m. UTC | #8
On Mon, Mar 30, 2020 at 12:43 PM Derrick, Jonathan
<jonathan.derrick@intel.com> wrote:
> On Sat, 2020-03-28 at 16:51 -0500, Bjorn Helgaas wrote:
> > On Fri, Feb 07, 2020 at 04:59:58PM -0700, Jon Derrick wrote:
> > > This set adds an emulation driver for PCIe Hotplug. There may be platforms with
> > > specific configurations that can support hotplug but don't provide the logical
> > > slot hotplug hardware. For instance, the platform may use an
> > > electrically-tolerant interposer between the slot and the device.
> > > ...

> > There's a lot of good work in here, and I don't claim to understand
> > the use case and all the benefits.
> I've received more info that the customer use case is an AIC that
> breaks out 1-4 M.2 cards which have been made hotplug tolerant.
>
> > But it seems like quite a lot of additional code and complexity in an
> > area that's already pretty subtle, so I'm not yet convinced that it's
> > all worthwhile.
> >
> > It seems like this would rely on Data Link Layer Link Active
> > Reporting.  Is that something we could add to pciehp as a generic
> > feature without making a separate driver for it?  I haven't looked at
> > this for a while, but I would assume that if we find out that a link
> > went down, pciehp could/should be smart enough to notice that even if
> > it didn't come via the usual pciehp Slot Status path.
> I had a plan to do V2 by intercepting bus_ops rather than indirecting
> slot_ops in pciehp. That should touch /a lot/ less code.

I assume this is something like pci_bus_set_ops() or
pci_bus_set_aer_ops()?  Probably touches less code, but I'm not really
a fan of either of those current situations because they make things
magical -- there's a lot of stuff going on behind the curtain, and it
makes another thing to consider when we evaluate every pciehp change.

> The problem I saw with adding DLLLA as a primary signal in pciehp is
> that most of the pciehp boilerplate relies on valid Slot register
> logic. I don't know how reliable pciehp will be if there's no backing
> slot register logic, emulated or real. Consider how many slot
> capability reads are in hpc.
>
> I could add a non-slot flag check to each of those callers, but it
> might be worse than the emulation alternative.

I see what you mean -- there are quite a few reads of PCI_EXP_SLTSTA.
I'm not 100% sure all of those really need to exist.  I would expect
that we'd read it once in the ISR and then operate on that value.  So
maybe there's some middle ground of restructuring to remove some of
those reads and making the remaining few more generic.

All that being said, I'm also sympathetic to Christoph's concern about
cluttering pciehp to deal with non-standard topologies.  At some point
if you need to do non-standard things you may have to write and
maintain your own drivers.  I don't know where that point is yet.

Bjorn