diff mbox

[V5] docs: add PCIe devices placement guidelines

Message ID 1478007587-4560-1-git-send-email-marcel@redhat.com
State New
Headers show

Commit Message

Marcel Apfelbaum Nov. 1, 2016, 1:39 p.m. UTC
Proposes best practices on how to use PCI Express/PCI device
in PCI Express based machines and explain the reasoning behind them.

Reviewed-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
---
 
Hi,

v4->v5:
 - Addressed Laine's comments:
   - Advertize (slot,chassis) parameters as mandatory
   - Stated the Downstream ports are not hot-pluggable
   - Other minor typos

v3->v4:
 - Addressed minor typos spotted by Laszlo, thanks!

v2->v3:
 - Addressed the comments from Andrea Bolognani and Laszlo Ersek, which are
   much appreciated!
 - Added links to presentations that may help the understanding of the document.

RFC->v2:
 - Addressed a lot of comments from the reviewers (many thanks to all, especially to Laszlo)


Thanks,
Marcel
 

 docs/pcie.txt | 311 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 311 insertions(+)
 create mode 100644 docs/pcie.txt

Comments

Marcel Apfelbaum Nov. 9, 2016, 2:30 p.m. UTC | #1
On 11/01/2016 03:39 PM, Marcel Apfelbaum wrote:
> Proposes best practices on how to use PCI Express/PCI device
> in PCI Express based machines and explain the reasoning behind them.
>


Hi Michael,

Can you please apply this doc for 2.8 ?

Thanks,
Marcel

> Reviewed-by: Laszlo Ersek <lersek@redhat.com>
> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
> ---
>
> Hi,
>
> v4->v5:
>  - Addressed Laine's comments:
>    - Advertize (slot,chassis) parameters as mandatory
>    - Stated the Downstream ports are not hot-pluggable
>    - Other minor typos
>
> v3->v4:
>  - Addressed minor typos spotted by Laszlo, thanks!
>
> v2->v3:
>  - Addressed the comments from Andrea Bolognani and Laszlo Ersek, which are
>    much appreciated!
>  - Added links to presentations that may help the understanding of the document.
>
> RFC->v2:
>  - Addressed a lot of comments from the reviewers (many thanks to all, especially to Laszlo)
>
>
> Thanks,
> Marcel
>
>
>  docs/pcie.txt | 311 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 311 insertions(+)
>  create mode 100644 docs/pcie.txt
>
> diff --git a/docs/pcie.txt b/docs/pcie.txt
> new file mode 100644
> index 0000000..1d9bd18
> --- /dev/null
> +++ b/docs/pcie.txt
> @@ -0,0 +1,311 @@
> +PCI EXPRESS GUIDELINES
> +======================
> +
> +1. Introduction
> +================
> +The doc proposes best practices on how to use PCI Express/PCI device
> +in PCI Express based machines and explains the reasoning behind them.
> +
> +The following presentations accompany this document:
> + (1) Q35 overview.
> +     http://wiki.qemu.org/images/4/4e/Q35.pdf
> + (2) A comparison between PCI and PCI Express technologies.
> +     http://wiki.qemu.org/images/f/f6/PCIvsPCIe.pdf
> +
> +Note: The usage examples are not intended to replace the full
> +documentation, please use QEMU help to retrieve all options.
> +
> +2. Device placement strategy
> +============================
> +QEMU does not have a clear socket-device matching mechanism
> +and allows any PCI/PCI Express device to be plugged into any
> +PCI/PCI Express slot.
> +Plugging a PCI device into a PCI Express slot might not always work and
> +is weird anyway since it cannot be done for "bare metal".
> +Plugging a PCI Express device into a PCI slot will hide the Extended
> +Configuration Space thus is also not recommended.
> +
> +The recommendation is to separate the PCI Express and PCI hierarchies.
> +PCI Express devices should be plugged only into PCI Express Root Ports and
> +PCI Express Downstream ports.
> +
> +2.1 Root Bus (pcie.0)
> +=====================
> +Place only the following kinds of devices directly on the Root Complex:
> +    (1) PCI Devices (e.g. network card, graphics card, IDE controller),
> +        not controllers. Place only legacy PCI devices on
> +        the Root Complex. These will be considered Integrated Endpoints.
> +        Note: Integrated Endpoints are not hot-pluggable.
> +
> +        Although the PCI Express spec does not forbid PCI Express devices as
> +        Integrated Endpoints, existing hardware mostly integrates legacy PCI
> +        devices with the Root Complex. Guest OSes are suspected to behave
> +        strangely when PCI Express devices are integrated
> +        with the Root Complex.
> +
> +    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
> +        hierarchies.
> +
> +    (3) DMI-PCI Bridges (i82801b11-bridge), for starting legacy PCI
> +        hierarchies.
> +
> +    (4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses
> +        are needed.
> +
> +   pcie.0 bus
> +   ----------------------------------------------------------------------------
> +        |                |                    |                  |
> +   -----------   ------------------   ------------------   --------------
> +   | PCI Dev |   | PCIe Root Port |   | DMI-PCI Bridge |   |  pxb-pcie  |
> +   -----------   ------------------   ------------------   --------------
> +
> +2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use:
> +          -device <dev>[,bus=pcie.0]
> +2.1.2 To expose a new PCI Express Root Bus use:
> +          -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
> +      Only PCI Express Root Ports and DMI-PCI bridges can be connected
> +      to the pcie.1 bus:
> +          -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z]                                     \
> +          -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1
> +
> +
> +2.2 PCI Express only hierarchy
> +==============================
> +Always use PCI Express Root Ports to start PCI Express hierarchies.
> +
> +A PCI Express Root bus supports up to 32 devices. Since each
> +PCI Express Root Port is a function and a multi-function
> +device may support up to 8 functions, the maximum possible
> +number of PCI Express Root Ports per PCI Express Root Bus is 256.
> +
> +Prefer grouping PCI Express Root Ports into multi-function devices
> +to keep a simple flat hierarchy that is enough for most scenarios.
> +Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
> +if there is no more room for PCI Express Root Ports.
> +Please see section 4. for further justifications.
> +
> +Plug only PCI Express devices into PCI Express Ports.
> +
> +
> +   pcie.0 bus
> +   ----------------------------------------------------------------------------------
> +        |                 |                                    |
> +   -------------    -------------                        -------------
> +   | Root Port |    | Root Port |                        | Root Port |
> +   ------------     -------------                        -------------
> +         |                            -------------------------|------------------------
> +    ------------                      |                 -----------------              |
> +    | PCIe Dev |                      |    PCI Express  | Upstream Port |              |
> +    ------------                      |      Switch     -----------------              |
> +                                      |                  |            |                |
> +                                      |    -------------------    -------------------  |
> +                                      |    | Downstream Port |    | Downstream Port |  |
> +                                      |    -------------------    -------------------  |
> +                                      -------------|-----------------------|------------
> +                                             ------------
> +                                             | PCIe Dev |
> +                                             ------------
> +
> +2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
> +          -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
> +          -device <dev>,bus=root_port1
> +2.2.2 Using multi-function PCI Express Root Ports:
> +      -device ioh3420,id=root_port1,multifunction=on,chassis=x,slot=y[,bus=pcie.0][,addr=z.0] \
> +      -device ioh3420,id=root_port2,chassis=x1,slot=y1[,bus=pcie.0][,addr=z.1] \
> +      -device ioh3420,id=root_port3,chassis=x2,slot=y2[,bus=pcie.0][,addr=z.2] \
> +2.2.2 Plugging a PCI Express device into a Switch:
> +      -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
> +      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]          \
> +      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1]] \
> +      -device <dev>,bus=downstream_port1
> +
> +Notes:
> +  - (slot, chassis) pair is mandatory and must be
> +     unique for each PCI Express Root Port.
> +  - 'addr' parameter can be 0 for all the examples above.
> +
> +
> +2.3 PCI only hierarchy
> +======================
> +Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
> +but, as mentioned in section 5, doing so means the legacy PCI
> +device in question will be incapable of hot-unplugging.
> +Besides that use DMI-PCI Bridges (i82801b11-bridge) in combination
> +with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.
> +
> +Prefer flat hierarchies. For most scenarios a single DMI-PCI Bridge
> +(having 32 slots) and several PCI-PCI Bridges attached to it
> +(each supporting also 32 slots) will support hundreds of legacy devices.
> +The recommendation is to populate one PCI-PCI Bridge under the DMI-PCI Bridge
> +until is full and then plug a new PCI-PCI Bridge...
> +
> +   pcie.0 bus
> +   ----------------------------------------------
> +        |                            |
> +   -----------               ------------------
> +   | PCI Dev |               | DMI-PCI BRIDGE |
> +   ----------                ------------------
> +                               |            |
> +                  ------------------    ------------------
> +                  | PCI-PCI Bridge |    | PCI-PCI Bridge |   ...
> +                  ------------------    ------------------
> +                                         |           |
> +                                  -----------     -----------
> +                                  | PCI Dev |     | PCI Dev |
> +                                  -----------     -----------
> +
> +2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use:
> +      -device <dev>[,bus=pcie.0]
> +2.3.2 Plugging a PCI device into a PCI-PCI Bridge:
> +      -device i82801b11-bridge,id=dmi_pci_bridge1[,bus=pcie.0]                        \
> +      -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]   \
> +      -device <dev>,bus=pci_bridge1[,addr=x]
> +      Note that 'addr' cannot be 0 unless shpc=off parameter is passed to
> +      the PCI Bridge.
> +
> +3. IO space issues
> +===================
> +The PCI Express Root Ports and PCI Express Downstream ports are seen by
> +Firmware/Guest OS as PCI-PCI Bridges. As required by the PCI spec, each
> +such Port should be reserved a 4K IO range for, even though only one
> +(multifunction) device can be plugged into each Port. This results in
> +poor IO space utilization.
> +
> +The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
> +by not allocating IO space for each PCI Express Root / PCI Express
> +Downstream port if:
> +    (1) the port is empty, or
> +    (2) the device behind the port has no IO BARs.
> +
> +The IO space is very limited, to 65536 byte-wide IO ports, and may even be
> +fragmented by fixed IO ports owned by platform devices resulting in at most
> +10 PCI Express Root Ports or PCI Express Downstream Ports per system
> +if devices with IO BARs are used in the PCI Express hierarchy. Using the
> +proposed device placing strategy solves this issue by using only
> +PCI Express devices within PCI Express hierarchy.
> +
> +The PCI Express spec requires that PCI Express devices work properly
> +without using IO ports. The PCI hierarchy has no such limitations.
> +
> +
> +4. Bus numbers issues
> +======================
> +Each PCI domain can have up to only 256 buses and the QEMU PCI Express
> +machines do not support multiple PCI domains even if extra Root
> +Complexes (pxb-pcie) are used.
> +
> +Each element of the PCI Express hierarchy (Root Complexes,
> +PCI Express Root Ports, PCI Express Downstream/Upstream ports)
> +uses one bus number. Since only one (multifunction) device
> +can be attached to a PCI Express Root Port or PCI Express Downstream
> +Port it is advised to plan in advance for the expected number of
> +devices to prevent bus number starvation.
> +
> +Avoiding PCI Express Switches (and thereby striving for a 'flatter' PCI
> +Express hierarchy) enables the hierarchy to not spend bus numbers on
> +Upstream Ports.
> +
> +The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
> +number space. All bus numbers assigned to the buses recursively behind a
> +given pxb-pcie device's root bus must fit between the bus_nr property of
> +that pxb-pcie device, and the lowest of the higher bus_nr properties
> +that the command line sets for other pxb-pcie devices.
> +
> +
> +5. Hot-plug
> +============
> +The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
> +do not support hot-plug, so any devices plugged into Root Complexes
> +cannot be hot-plugged/hot-unplugged:
> +    (1) PCI Express Integrated Endpoints
> +    (2) PCI Express Root Ports
> +    (3) DMI-PCI Bridges
> +    (4) pxb-pcie
> +
> +Be aware that PCI Express Downstream Ports can't be hot-plugged into
> +an existing PCI Express Upstream Port.
> +
> +PCI devices can be hot-plugged into PCI-PCI Bridges. The PCI hot-plug is ACPI
> +based and can work side by side with the PCI Express native hot-plug.
> +
> +PCI Express devices can be natively hot-plugged/hot-unplugged into/from
> +PCI Express Root Ports (and PCI Express Downstream Ports).
> +
> +5.1 Planning for hot-plug:
> +    (1) PCI hierarchy
> +        Leave enough PCI-PCI Bridge slots empty or add one
> +        or more empty PCI-PCI Bridges to the DMI-PCI Bridge.
> +
> +        For each such PCI-PCI Bridge the Guest Firmware is expected to reserve
> +        4K IO space and 2M MMIO range to be used for all devices behind it.
> +
> +        Because of the hard IO limit of around 10 PCI Bridges (~ 40K space)
> +        per system don't use more than 9 PCI-PCI Bridges, leaving 4K for the
> +        Integrated Endpoints. (The PCI Express Hierarchy needs no IO space).
> +
> +    (2) PCI Express hierarchy:
> +        Leave enough PCI Express Root Ports empty. Use multifunction
> +        PCI Express Root Ports (up to 8 ports per pcie.0 slot)
> +        on the Root Complex(es), for keeping the
> +        hierarchy as flat as possible, thereby saving PCI bus numbers.
> +        Don't use PCI Express Switches if you don't have
> +        to, each one of those uses an extra PCI bus (for its Upstream Port)
> +        that could be put to better use with another Root Port or Downstream
> +        Port, which may come handy for hot-plugging another device.
> +
> +
> +5.3 Hot-plug example:
> +Using HMP: (add -monitor stdio to QEMU command line)
> +  device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI Bridge Id/>
> +
> +
> +6. Device assignment
> +====================
> +Host devices are mostly PCI Express and should be plugged only into
> +PCI Express Root Ports or PCI Express Downstream Ports.
> +PCI-PCI Bridge slots can be used for legacy PCI host devices.
> +
> +6.1 How to detect if a device is PCI Express:
> +  > lspci -s 03:00.0 -v (as root)
> +
> +    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
> +    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
> +    Flags: bus master, fast devsel, latency 0, IRQ 50
> +    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
> +    Capabilities: [c8] Power Management version 3
> +    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
> +    Capabilities: [40] Express Endpoint, MSI 00
> +    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +    Capabilities: [100] Advanced Error Reporting
> +    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
> +    Capabilities: [14c] Latency Tolerance Reporting
> +    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014
> +
> +If you can see the "Express Endpoint" capability in the
> +output, then the device is indeed PCI Express.
> +
> +
> +7. Virtio devices
> +=================
> +Virtio devices plugged into the PCI hierarchy or as Integrated Endpoints
> +will remain PCI and have transitional behaviour as default.
> +Transitional virtio devices work in both IO and MMIO modes depending on
> +the guest support. The Guest firmware will assign both IO and MMIO resources
> +to transitional virtio devices.
> +
> +Virtio devices plugged into PCI Express ports are PCI Express devices and
> +have "1.0" behavior by default without IO support.
> +In both cases disable-legacy and disable-modern properties can be used
> +to override the behaviour.
> +
> +Note that setting disable-legacy=off will enable legacy mode (enabling
> +legacy behavior) for PCI Express virtio devices causing them to
> +require IO space, which, given the limited available IO space, may quickly
> +lead to resource exhaustion, and is therefore strongly discouraged.
> +
> +
> +8. Conclusion
> +==============
> +The proposal offers a usage model that is easy to understand and follow
> +and at the same time overcomes the PCI Express architecture limitations.
> +
>
diff mbox

Patch

diff --git a/docs/pcie.txt b/docs/pcie.txt
new file mode 100644
index 0000000..1d9bd18
--- /dev/null
+++ b/docs/pcie.txt
@@ -0,0 +1,311 @@ 
+PCI EXPRESS GUIDELINES
+======================
+
+1. Introduction
+================
+The doc proposes best practices on how to use PCI Express/PCI device
+in PCI Express based machines and explains the reasoning behind them.
+
+The following presentations accompany this document:
+ (1) Q35 overview.
+     http://wiki.qemu.org/images/4/4e/Q35.pdf
+ (2) A comparison between PCI and PCI Express technologies.
+     http://wiki.qemu.org/images/f/f6/PCIvsPCIe.pdf
+
+Note: The usage examples are not intended to replace the full
+documentation, please use QEMU help to retrieve all options.
+
+2. Device placement strategy
+============================
+QEMU does not have a clear socket-device matching mechanism
+and allows any PCI/PCI Express device to be plugged into any
+PCI/PCI Express slot.
+Plugging a PCI device into a PCI Express slot might not always work and
+is weird anyway since it cannot be done for "bare metal".
+Plugging a PCI Express device into a PCI slot will hide the Extended
+Configuration Space thus is also not recommended.
+
+The recommendation is to separate the PCI Express and PCI hierarchies.
+PCI Express devices should be plugged only into PCI Express Root Ports and
+PCI Express Downstream ports.
+
+2.1 Root Bus (pcie.0)
+=====================
+Place only the following kinds of devices directly on the Root Complex:
+    (1) PCI Devices (e.g. network card, graphics card, IDE controller),
+        not controllers. Place only legacy PCI devices on
+        the Root Complex. These will be considered Integrated Endpoints.
+        Note: Integrated Endpoints are not hot-pluggable.
+
+        Although the PCI Express spec does not forbid PCI Express devices as
+        Integrated Endpoints, existing hardware mostly integrates legacy PCI
+        devices with the Root Complex. Guest OSes are suspected to behave
+        strangely when PCI Express devices are integrated
+        with the Root Complex.
+
+    (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
+        hierarchies.
+
+    (3) DMI-PCI Bridges (i82801b11-bridge), for starting legacy PCI
+        hierarchies.
+
+    (4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses
+        are needed.
+
+   pcie.0 bus
+   ----------------------------------------------------------------------------
+        |                |                    |                  |
+   -----------   ------------------   ------------------   --------------
+   | PCI Dev |   | PCIe Root Port |   | DMI-PCI Bridge |   |  pxb-pcie  |
+   -----------   ------------------   ------------------   --------------
+
+2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use:
+          -device <dev>[,bus=pcie.0]
+2.1.2 To expose a new PCI Express Root Bus use:
+          -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
+      Only PCI Express Root Ports and DMI-PCI bridges can be connected
+      to the pcie.1 bus:
+          -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z]                                     \
+          -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1
+
+
+2.2 PCI Express only hierarchy
+==============================
+Always use PCI Express Root Ports to start PCI Express hierarchies.
+
+A PCI Express Root bus supports up to 32 devices. Since each
+PCI Express Root Port is a function and a multi-function
+device may support up to 8 functions, the maximum possible
+number of PCI Express Root Ports per PCI Express Root Bus is 256.
+
+Prefer grouping PCI Express Root Ports into multi-function devices
+to keep a simple flat hierarchy that is enough for most scenarios.
+Only use PCI Express Switches (x3130-upstream, xio3130-downstream)
+if there is no more room for PCI Express Root Ports.
+Please see section 4. for further justifications.
+
+Plug only PCI Express devices into PCI Express Ports.
+
+
+   pcie.0 bus
+   ----------------------------------------------------------------------------------
+        |                 |                                    |
+   -------------    -------------                        -------------
+   | Root Port |    | Root Port |                        | Root Port |
+   ------------     -------------                        -------------
+         |                            -------------------------|------------------------
+    ------------                      |                 -----------------              |
+    | PCIe Dev |                      |    PCI Express  | Upstream Port |              |
+    ------------                      |      Switch     -----------------              |
+                                      |                  |            |                |
+                                      |    -------------------    -------------------  |
+                                      |    | Downstream Port |    | Downstream Port |  |
+                                      |    -------------------    -------------------  |
+                                      -------------|-----------------------|------------
+                                             ------------
+                                             | PCIe Dev |
+                                             ------------
+
+2.2.1 Plugging a PCI Express device into a PCI Express Root Port:
+          -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
+          -device <dev>,bus=root_port1
+2.2.2 Using multi-function PCI Express Root Ports:
+      -device ioh3420,id=root_port1,multifunction=on,chassis=x,slot=y[,bus=pcie.0][,addr=z.0] \
+      -device ioh3420,id=root_port2,chassis=x1,slot=y1[,bus=pcie.0][,addr=z.1] \
+      -device ioh3420,id=root_port3,chassis=x2,slot=y2[,bus=pcie.0][,addr=z.2] \
+2.2.2 Plugging a PCI Express device into a Switch:
+      -device ioh3420,id=root_port1,chassis=x,slot=y[,bus=pcie.0][,addr=z]  \
+      -device x3130-upstream,id=upstream_port1,bus=root_port1[,addr=x]          \
+      -device xio3130-downstream,id=downstream_port1,bus=upstream_port1,chassis=x1,slot=y1[,addr=z1]] \
+      -device <dev>,bus=downstream_port1
+
+Notes:
+  - (slot, chassis) pair is mandatory and must be
+     unique for each PCI Express Root Port.
+  - 'addr' parameter can be 0 for all the examples above.
+
+
+2.3 PCI only hierarchy
+======================
+Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
+but, as mentioned in section 5, doing so means the legacy PCI
+device in question will be incapable of hot-unplugging.
+Besides that use DMI-PCI Bridges (i82801b11-bridge) in combination
+with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.
+
+Prefer flat hierarchies. For most scenarios a single DMI-PCI Bridge
+(having 32 slots) and several PCI-PCI Bridges attached to it
+(each supporting also 32 slots) will support hundreds of legacy devices.
+The recommendation is to populate one PCI-PCI Bridge under the DMI-PCI Bridge
+until is full and then plug a new PCI-PCI Bridge...
+
+   pcie.0 bus
+   ----------------------------------------------
+        |                            |
+   -----------               ------------------
+   | PCI Dev |               | DMI-PCI BRIDGE |
+   ----------                ------------------
+                               |            |
+                  ------------------    ------------------
+                  | PCI-PCI Bridge |    | PCI-PCI Bridge |   ...
+                  ------------------    ------------------
+                                         |           |
+                                  -----------     -----------
+                                  | PCI Dev |     | PCI Dev |
+                                  -----------     -----------
+
+2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use:
+      -device <dev>[,bus=pcie.0]
+2.3.2 Plugging a PCI device into a PCI-PCI Bridge:
+      -device i82801b11-bridge,id=dmi_pci_bridge1[,bus=pcie.0]                        \
+      -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]   \
+      -device <dev>,bus=pci_bridge1[,addr=x]
+      Note that 'addr' cannot be 0 unless shpc=off parameter is passed to
+      the PCI Bridge.
+
+3. IO space issues
+===================
+The PCI Express Root Ports and PCI Express Downstream ports are seen by
+Firmware/Guest OS as PCI-PCI Bridges. As required by the PCI spec, each
+such Port should be reserved a 4K IO range for, even though only one
+(multifunction) device can be plugged into each Port. This results in
+poor IO space utilization.
+
+The firmware used by QEMU (SeaBIOS/OVMF) may try further optimizations
+by not allocating IO space for each PCI Express Root / PCI Express
+Downstream port if:
+    (1) the port is empty, or
+    (2) the device behind the port has no IO BARs.
+
+The IO space is very limited, to 65536 byte-wide IO ports, and may even be
+fragmented by fixed IO ports owned by platform devices resulting in at most
+10 PCI Express Root Ports or PCI Express Downstream Ports per system
+if devices with IO BARs are used in the PCI Express hierarchy. Using the
+proposed device placing strategy solves this issue by using only
+PCI Express devices within PCI Express hierarchy.
+
+The PCI Express spec requires that PCI Express devices work properly
+without using IO ports. The PCI hierarchy has no such limitations.
+
+
+4. Bus numbers issues
+======================
+Each PCI domain can have up to only 256 buses and the QEMU PCI Express
+machines do not support multiple PCI domains even if extra Root
+Complexes (pxb-pcie) are used.
+
+Each element of the PCI Express hierarchy (Root Complexes,
+PCI Express Root Ports, PCI Express Downstream/Upstream ports)
+uses one bus number. Since only one (multifunction) device
+can be attached to a PCI Express Root Port or PCI Express Downstream
+Port it is advised to plan in advance for the expected number of
+devices to prevent bus number starvation.
+
+Avoiding PCI Express Switches (and thereby striving for a 'flatter' PCI
+Express hierarchy) enables the hierarchy to not spend bus numbers on
+Upstream Ports.
+
+The bus_nr properties of the pxb-pcie devices partition the 0..255 bus
+number space. All bus numbers assigned to the buses recursively behind a
+given pxb-pcie device's root bus must fit between the bus_nr property of
+that pxb-pcie device, and the lowest of the higher bus_nr properties
+that the command line sets for other pxb-pcie devices.
+
+
+5. Hot-plug
+============
+The PCI Express root buses (pcie.0 and the buses exposed by pxb-pcie devices)
+do not support hot-plug, so any devices plugged into Root Complexes
+cannot be hot-plugged/hot-unplugged:
+    (1) PCI Express Integrated Endpoints
+    (2) PCI Express Root Ports
+    (3) DMI-PCI Bridges
+    (4) pxb-pcie
+
+Be aware that PCI Express Downstream Ports can't be hot-plugged into
+an existing PCI Express Upstream Port.
+
+PCI devices can be hot-plugged into PCI-PCI Bridges. The PCI hot-plug is ACPI
+based and can work side by side with the PCI Express native hot-plug.
+
+PCI Express devices can be natively hot-plugged/hot-unplugged into/from
+PCI Express Root Ports (and PCI Express Downstream Ports).
+
+5.1 Planning for hot-plug:
+    (1) PCI hierarchy
+        Leave enough PCI-PCI Bridge slots empty or add one
+        or more empty PCI-PCI Bridges to the DMI-PCI Bridge.
+
+        For each such PCI-PCI Bridge the Guest Firmware is expected to reserve
+        4K IO space and 2M MMIO range to be used for all devices behind it.
+
+        Because of the hard IO limit of around 10 PCI Bridges (~ 40K space)
+        per system don't use more than 9 PCI-PCI Bridges, leaving 4K for the
+        Integrated Endpoints. (The PCI Express Hierarchy needs no IO space).
+
+    (2) PCI Express hierarchy:
+        Leave enough PCI Express Root Ports empty. Use multifunction
+        PCI Express Root Ports (up to 8 ports per pcie.0 slot)
+        on the Root Complex(es), for keeping the
+        hierarchy as flat as possible, thereby saving PCI bus numbers.
+        Don't use PCI Express Switches if you don't have
+        to, each one of those uses an extra PCI bus (for its Upstream Port)
+        that could be put to better use with another Root Port or Downstream
+        Port, which may come handy for hot-plugging another device.
+
+
+5.3 Hot-plug example:
+Using HMP: (add -monitor stdio to QEMU command line)
+  device_add <dev>,id=<id>,bus=<PCI Express Root Port Id/PCI Express Downstream Port Id/PCI-PCI Bridge Id/>
+
+
+6. Device assignment
+====================
+Host devices are mostly PCI Express and should be plugged only into
+PCI Express Root Ports or PCI Express Downstream Ports.
+PCI-PCI Bridge slots can be used for legacy PCI host devices.
+
+6.1 How to detect if a device is PCI Express:
+  > lspci -s 03:00.0 -v (as root)
+
+    03:00.0 Network controller: Intel Corporation Wireless 7260 (rev 83)
+    Subsystem: Intel Corporation Dual Band Wireless-AC 7260
+    Flags: bus master, fast devsel, latency 0, IRQ 50
+    Memory at f0400000 (64-bit, non-prefetchable) [size=8K]
+    Capabilities: [c8] Power Management version 3
+    Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
+    Capabilities: [40] Express Endpoint, MSI 00
+    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    Capabilities: [100] Advanced Error Reporting
+    Capabilities: [140] Device Serial Number 7c-7a-91-ff-ff-90-db-20
+    Capabilities: [14c] Latency Tolerance Reporting
+    Capabilities: [154] Vendor Specific Information: ID=cafe Rev=1 Len=014 
+
+If you can see the "Express Endpoint" capability in the
+output, then the device is indeed PCI Express.
+
+
+7. Virtio devices
+=================
+Virtio devices plugged into the PCI hierarchy or as Integrated Endpoints
+will remain PCI and have transitional behaviour as default.
+Transitional virtio devices work in both IO and MMIO modes depending on
+the guest support. The Guest firmware will assign both IO and MMIO resources
+to transitional virtio devices.
+
+Virtio devices plugged into PCI Express ports are PCI Express devices and
+have "1.0" behavior by default without IO support.
+In both cases disable-legacy and disable-modern properties can be used
+to override the behaviour.
+
+Note that setting disable-legacy=off will enable legacy mode (enabling
+legacy behavior) for PCI Express virtio devices causing them to
+require IO space, which, given the limited available IO space, may quickly
+lead to resource exhaustion, and is therefore strongly discouraged.
+
+
+8. Conclusion
+==============
+The proposal offers a usage model that is easy to understand and follow
+and at the same time overcomes the PCI Express architecture limitations.
+