[RFC,00/11] PCI: hotplug: Movable bus numbers

Message ID 20191024172157.878735-1-s.miroshnichenko@yadro.com

Sergei Miroshnichenko Oct. 24, 2019, 5:21 p.m. UTC
To allow hotplugging bridges, the kernel or the BIOS/bootloader/firmware
reserves extra bus numbers per slot, but this range may not be enough for a
large bridge and/or nested bridges when hot-adding a chassis full of devices.

This patchset proposes an approach similar to movable BARs: bus numbers are
no longer reserved; instead, the kernel moves the "tail" of the PCI tree by
one whenever a new bus number is needed.

When something like this is going to happen:
                                                                   *LARGE*
 +-[0020:00]---00.0-[01-20]--+-00.0-[02-08]--+-00.0-[03]--   <--  *NESTED*
 |                           |               +-01.0-[04]--        *BRIDGE*
 |                           |               +-02.0-[05]--
 |                           |               +-03.0-[06]--
 |                           |               +-04.0-[07]--
 |                           |               \-05.0-[08]--
 ...

, this will result in the following:

 +-[0020:00]---00.0-[01-22]--+-00.0-[02-22]--+-00.0-[03-1d]----04.0-[04-1d]--+-00.0-[05]--
 |                           |               |                               +-04.0-[06]--
 |                           |               |                               +-09.0-[07]--
 |                           |               |                               +-0c.0-[08-19]----00.0-[09-19]--+-01.0-[0a]--
 |                           |               |                               |                               ...
 |                           |               |                               |                               \-11.0-[19]--
 |                           |               |                               ...
 |                           |               |                               \-15.0-[1d]--
 |                           |               +-01.0-[1e]--  <-- Renamed from 04
 |                           |               +-02.0-[1f]--  <-- Renamed from 05
 |                           |               +-03.0-[20]--  <-- Renamed from 06
 |                           |               +-04.0-[21]--  <-- Renamed from 07
 |                           |               \-05.0-[22]--  <-- Renamed from 08
 ...


This looks to be safe within the kernel, because drivers don't use the raw
PCI BDF ID. We've tested it on our x86 and PowerNV machines: mass storage
devices holding root filesystems and network adapters simply continue working
while their bus numbers are moved.

But then there is userspace:

 - procfs entries:

    % ls -la /proc/bus/pci/*
    /proc/bus/pci/00:
    00.0
    02.0
    ...
    1f.4
    1f.6

    /proc/bus/pci/04:
    00.0

    /proc/bus/pci/40:
    00.0

 - sysfs entries:

    % ls -la /sys/devices/pci0000:00/
    0000:00:00.0
    0000:00:02.0
    ...
    0000:00:1f.3
    0000:00:1f.4
    0000:00:1f.6

    % ls -la /sys/devices/pci0000:00/0000:00:1c.6/0000:04:00.0/driver
    driver -> ../../../../bus/pci/drivers/iwlwifi

 - sysfs symlinks:

    % ls -la /sys/bus/pci/devices
    0000:00:00.0 -> ../../../devices/pci0000:00/0000:00:00.0
    0000:00:02.0 -> ../../../devices/pci0000:00/0000:00:02.0
    ...
    0000:04:00.0 -> ../../../devices/pci0000:00/0000:00:1c.6/0000:04:00.0
    0000:40:00.0 -> ../../../devices/pci0000:00/0000:00:1d.2/0000:40:00.0


These patches alter the kernel's public API and some internals so that these
entries can be removed before a bus number changes, and recreated after the
device has changed its BDF.

On one hand, this makes hotplug predictable, cross-platform, and independent
of non-kernel components (BIOS, bootloader, etc.); on the other hand, it is a
severe ABI violation.

Probably udev should get a new action, such as "rename", in addition to
"add" and "remove".

Would it be feasible to have this feature disabled by default, with an option
to enable it via a kernel command-line argument like this:

  pci=realloc,movable_buses

?

This code is a follow-up to the "PCI: Allow BAR movement during hotplug"
series (v6).

Sergey Miroshnichenko (11):
  PCI: sysfs: Nullify freed pointers
  PCI: proc: Nullify a freed pointer
  drivers: base: Make bus_add_device() public
  drivers: base: Make device_{add|remove}_class_symlinks() public
  drivers: base: Add bus_disconnect_device()
  powerpc/pci: Enable assigning bus numbers instead of reading them from
    DT
  powerpc/pci: Don't reduce the host bridge bus range
  PCI: Allow expanding the bridges
  PCI: hotplug: Add initial support for movable bus numbers
  PCI: hotplug: movable bus numbers: rename proc and sysfs entries
  PCI: hotplug: movable bus numbers: compact the gaps in numbering

 .../admin-guide/kernel-parameters.txt         |   3 +
 arch/powerpc/kernel/pci-common.c              |   1 -
 arch/powerpc/kernel/pci_dn.c                  |   5 +
 arch/powerpc/platforms/powernv/eeh-powernv.c  |   3 +-
 drivers/base/base.h                           |   1 -
 drivers/base/bus.c                            |  37 +++
 drivers/base/core.c                           |   6 +-
 drivers/pci/pci-sysfs.c                       |   7 +-
 drivers/pci/pci.c                             |   3 +
 drivers/pci/pci.h                             |   2 +
 drivers/pci/probe.c                           | 291 +++++++++++++++++-
 drivers/pci/proc.c                            |   1 +
 include/linux/device.h                        |   5 +
 13 files changed, 351 insertions(+), 14 deletions(-)