
[v4,5/5] docs: update documentation considering PCIE-PCI bridge

Message ID 1501964858-5159-6-git-send-email-zuban32s@gmail.com
State New

Commit Message

Aleksandr Bezzubikov Aug. 5, 2017, 8:27 p.m. UTC
Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com>
---
 docs/pcie.txt            |  49 +++++++++++----------
 docs/pcie_pci_bridge.txt | 110 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+), 23 deletions(-)
 create mode 100644 docs/pcie_pci_bridge.txt

Comments

Laszlo Ersek Aug. 8, 2017, 3:11 p.m. UTC | #1
one comment below

On 08/05/17 22:27, Aleksandr Bezzubikov wrote:

> +Capability layout (defined in include/hw/pci/pci_bridge.h):
> +
> +    uint8_t id;     Standard PCI capability header field
> +    uint8_t next;   Standard PCI capability header field
> +    uint8_t len;    Standard PCI vendor-specific capability header field
> +
> +    uint8_t type;   Red Hat vendor-specific capability type
> +                    List of currently existing types:
> +                        QEMU_RESERVE = 1
> +
> +
> +    uint32_t bus_res;   Minimum number of buses to reserve
> +
> +    uint64_t io;        IO space to reserve
> +    uint64_t mem        Non-prefetchable memory to reserve
> +    uint64_t mem_pref;  Prefetchable memory to reserve

(I apologize if I missed any concrete points from the past messages
regarding this structure.)

How is the firmware supposed to know whether the prefetchable MMIO
reservation should be made in 32-bit or 64-bit address space? If we
reserve prefetchable MMIO outside of the 32-bit address space, then
hot-plugging a device without 64-bit MMIO support could fail.

My earlier request, to distinguish "prefetchable_32" from
"prefetchable_64" (mutually exclusively), was so that firmware would
know whether to restrict the MMIO reservation to 32-bit address space.

This is based on an earlier email from Alex to me:

On 10/03/16 18:01, Alex Williamson wrote:
> I don't think there's such a thing as a 64-bit non-prefetchable
> aperture.  In fact, there are not separate 32 and 64 bit prefetchable
> apertures.  The apertures are:
>
> I/O base/limit - (default 16bit, may be 32bit)
> Memory base/limit - (32bit only, non-prefetchable)
> Prefetchable Memory base/limit - (default 32bit, may be 64bit)
>
> This is according to Table 3-2 in the PCI-to-PCI bridge spec rev 1.2.

I don't care much about the 16-bit vs. 32-bit IO difference (that's
entirely academic, and the Platform Init spec doesn't even provide a way
for OVMF to express such a difference). However, the optional
restriction to 32-bit matters for the prefetchable MMIO aperture.

Other than this, the patch looks good to me, and I'm ready to R-b.

Thanks!
Laszlo
Aleksandr Bezzubikov Aug. 8, 2017, 7:21 p.m. UTC | #2
2017-08-08 18:11 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:
> one comment below
>
> On 08/05/17 22:27, Aleksandr Bezzubikov wrote:
>
>> +Capability layout (defined in include/hw/pci/pci_bridge.h):
>> +
>> +    uint8_t id;     Standard PCI capability header field
>> +    uint8_t next;   Standard PCI capability header field
>> +    uint8_t len;    Standard PCI vendor-specific capability header field
>> +
>> +    uint8_t type;   Red Hat vendor-specific capability type
>> +                    List of currently existing types:
>> +                        QEMU_RESERVE = 1
>> +
>> +
>> +    uint32_t bus_res;   Minimum number of buses to reserve
>> +
>> +    uint64_t io;        IO space to reserve
>> +    uint64_t mem        Non-prefetchable memory to reserve
>> +    uint64_t mem_pref;  Prefetchable memory to reserve
>
> (I apologize if I missed any concrete points from the past messages
> regarding this structure.)
>
> How is the firmware supposed to know whether the prefetchable MMIO
> reservation should be made in 32-bit or 64-bit address space? If we
> reserve prefetchable MMIO outside of the 32-bit address space, then
> hot-plugging a device without 64-bit MMIO support could fail.
>
> My earlier request, to distinguish "prefetchable_32" from
> "prefetchable_64" (mutually exclusively), was so that firmware would
> know whether to restrict the MMIO reservation to 32-bit address space.

IIUC, now (in SeaBIOS at least) we just assign these PREF registers
unconditionally, so the decision about the mode can be made based on a
nonzero UPPER_PREF_LIMIT register.
My idea was the same - we can just check whether the value fits into
16 bits (the PREF_LIMIT register size, i.e. 32-bit MMIO). Do we really
need separate fields for that?
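
For illustration, the check described above might look like the
following (a sketch only, not actual SeaBIOS code; the function name is
made up):

#include <stdbool.h>
#include <stdint.h>

/* Treat a prefetchable reservation as a 64-bit request whenever its
 * window cannot end below 4 GiB, i.e. whenever the upper 32 bits of the
 * limit would be nonzero and the UPPER_PREF_BASE/LIMIT registers are
 * needed. */
static bool pref_reservation_is_64bit(uint64_t pref_base, uint64_t pref_size)
{
    uint64_t pref_limit = pref_base + pref_size - 1;

    return (pref_limit >> 32) != 0;
}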

>
> This is based on an earlier email from Alex to me:
>
> On 10/03/16 18:01, Alex Williamson wrote:
>> I don't think there's such a thing as a 64-bit non-prefetchable
>> aperture.  In fact, there are not separate 32 and 64 bit prefetchable
>> apertures.  The apertures are:
>>
>> I/O base/limit - (default 16bit, may be 32bit)
>> Memory base/limit - (32bit only, non-prefetchable)
>> Prefetchable Memory base/limit - (default 32bit, may be 64bit)
>>
>> This is according to Table 3-2 in the PCI-to-PCI bridge spec rev 1.2.
>
> I don't care much about the 16-bit vs. 32-bit IO difference (that's
> entirely academic and the Platform Spec init doesn't even provide a way
> for OVMF to express such a difference). However, the optional
> restriction to 32-bit matters for the prefetchable MMIO aperture.
>
> Other than this, the patch looks good to me, and I'm ready to R-b.
>
> Thanks!
> Laszlo
Laszlo Ersek Aug. 9, 2017, 10:18 a.m. UTC | #3
On 08/08/17 21:21, Aleksandr Bezzubikov wrote:
> 2017-08-08 18:11 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:
>> one comment below
>>
>> On 08/05/17 22:27, Aleksandr Bezzubikov wrote:
>>
>>> +Capability layout (defined in include/hw/pci/pci_bridge.h):
>>> +
>>> +    uint8_t id;     Standard PCI capability header field
>>> +    uint8_t next;   Standard PCI capability header field
>>> +    uint8_t len;    Standard PCI vendor-specific capability header field
>>> +
>>> +    uint8_t type;   Red Hat vendor-specific capability type
>>> +                    List of currently existing types:
>>> +                        QEMU_RESERVE = 1
>>> +
>>> +
>>> +    uint32_t bus_res;   Minimum number of buses to reserve
>>> +
>>> +    uint64_t io;        IO space to reserve
>>> +    uint64_t mem        Non-prefetchable memory to reserve
>>> +    uint64_t mem_pref;  Prefetchable memory to reserve
>>
>> (I apologize if I missed any concrete points from the past messages
>> regarding this structure.)
>>
>> How is the firmware supposed to know whether the prefetchable MMIO
>> reservation should be made in 32-bit or 64-bit address space? If we
>> reserve prefetchable MMIO outside of the 32-bit address space, then
>> hot-plugging a device without 64-bit MMIO support could fail.
>>
>> My earlier request, to distinguish "prefetchable_32" from
>> "prefetchable_64" (mutually exclusively), was so that firmware would
>> know whether to restrict the MMIO reservation to 32-bit address
>> space.
>
> IIUC now (in SeaBIOS at least) we just assign this PREF registers
> unconditionally,
> so the decision about the mode can be made basing on !=0
> UPPER_PREF_LIMIT register.
> My idea was the same - we can just check if the value doesn't fit into
> 16-bit (PREF_LIMIT reg size, 32-bit MMIO). Do we really need separate
> fields for that?

The PciBusDxe driver in edk2 tracks 32-bit and 64-bit MMIO resources
separately from each other, and other (independent) logic exists in it
that, on some conditions, allocates 64-bit MMIO BARs from 32-bit address
space. This is just to say that the distinction is intentional in
PciBusDxe.

Furthermore, the Platform Init spec v1.6 says the following (this is
what OVMF will have to comply with, in the "platform hook" called by
PciBusDxe):

> 12.6 PCI Hot Plug PCI Initialization Protocol
> EFI_PCI_HOT_PLUG_INIT_PROTOCOL.GetResourcePadding()
> ...
> Padding  The amount of resource padding that is required by the PCI
>          bus under the control of the specified HPC. Because the
>          caller does not know the size of this buffer, this buffer is
>          allocated by the callee and freed by the caller.
> ...
> The padding is returned in the form of ACPI (2.0 & 3.0) resource
> descriptors. The exact definition of each of the fields is the same as
> in the
> EFI_PCI_HOST_BRIDGE_RESOURCE_ALLOCATION_PROTOCOL.SubmitResources()
> function. See the section 10.8 for the definition of this function.

Following that pointer:

> 10.8 PCI HostBridge Code Definitions
> 10.8.2 PCI Host Bridge Resource Allocation Protocol
>
> Table 8. ACPI 2.0 & 3.0 QWORD Address Space Descriptor Usage
>
> Byte    Byte    Data  Description
> Offset  Length
> ...
> 0x03    0x01          Resource type:
>                         0: Memory range
>                         1: I/O range
>                         2: Bus number range
> ...
> 0x05    0x01          Type-specific flags. Ignored except as defined
>                       in Table 3-3 and Table 3-4 below.
>
> 0x06    0x08          Address Space Granularity. Used to differentiate
>                       between a 32-bit memory request and a 64-bit
>                       memory request. For a 32-bit memory request,
>                       this field should be set to 32. For a 64-bit
>                       memory request, this field should be set to 64.
>                       Ignored for I/O and bus resource requests.
>                       Ignored during GetProposedResources().

The "Table 3-3" and "Table 3-4" references under "Type-specific flags"
are out of date (spec bug); in reality those are:
- Table 10. I/O Resource Flag (Resource Type = 1) Usage,
- Table 11. Memory Resource Flag (Resource Type = 0) Usage.

The latter is relevant here:

> Table 11. Memory Resource Flag (Resource Type = 0) Usage
>
> Bits      Meaning
> ...
> Bit[2:1]  _MEM. Memory attributes.
>           Value and Meaning:
>             0 The memory is nonprefetchable.
>             1 Invalid.
>             2 Invalid.
>             3 The memory is prefetchable.
>           Note: The interpretation of these bits is somewhat different
>           from the ACPI Specification. According to the ACPI
>           Specification, a value of 0 implies noncacheable memory and
>           the value of 3 indicates prefetchable and cacheable memory.

So whatever OVMF sees in the capability, it must be able to translate to
the above representation.
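
For illustration, filling in the prefetchable-memory padding descriptor
described by these tables could look roughly like this (a hand-rolled
sketch; the struct and field names are made up, edk2 has its own types
for this):

#include <stdint.h>

#pragma pack(1)
typedef struct {
    uint8_t  desc;          /* 0x8A: QWORD Address Space Descriptor        */
    uint16_t len;           /* 0x2B: number of bytes after this field      */
    uint8_t  res_type;      /* 0: memory, 1: I/O, 2: bus number range      */
    uint8_t  gen_flag;      /* general flags                               */
    uint8_t  specific_flag; /* Table 11: _MEM in bits[2:1], 3=prefetchable */
    uint64_t granularity;   /* 32 for a 32-bit request, 64 for 64-bit      */
    uint64_t min;
    uint64_t max;
    uint64_t translation;
    uint64_t length;        /* amount of padding requested                 */
} QWORD_DESC;
#pragma pack()

/* Request 'size' bytes of prefetchable MMIO padding, restricted to the
 * 32-bit address space unless 'is64' is set. */
static void fill_pref_padding(QWORD_DESC *d, uint64_t size, int is64)
{
    d->desc          = 0x8A;
    d->len           = 0x2B;
    d->res_type      = 0;        /* memory range           */
    d->gen_flag      = 0;
    d->specific_flag = 3 << 1;   /* _MEM = 3: prefetchable */
    d->granularity   = is64 ? 64 : 32;
    d->min = d->max = d->translation = 0;
    d->length        = size;
}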

Thanks
Laszlo

>
>>
>> This is based on an earlier email from Alex to me:
>>
>> On 10/03/16 18:01, Alex Williamson wrote:
>>> I don't think there's such a thing as a 64-bit non-prefetchable
>>> aperture.  In fact, there are not separate 32 and 64 bit
>>> prefetchable apertures.  The apertures are:
>>>
>>> I/O base/limit - (default 16bit, may be 32bit)
>>> Memory base/limit - (32bit only, non-prefetchable)
>>> Prefetchable Memory base/limit - (default 32bit, may be 64bit)
>>>
>>> This is according to Table 3-2 in the PCI-to-PCI bridge spec rev
>>> 1.2.
>>
>> I don't care much about the 16-bit vs. 32-bit IO difference (that's
>> entirely academic and the Platform Spec init doesn't even provide a
>> way for OVMF to express such a difference). However, the optional
>> restriction to 32-bit matters for the prefetchable MMIO aperture.
>>
>> Other than this, the patch looks good to me, and I'm ready to R-b.
>>
>> Thanks!
>> Laszlo
Aleksandr Bezzubikov Aug. 9, 2017, 4:52 p.m. UTC | #4
2017-08-09 13:18 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:
> On 08/08/17 21:21, Aleksandr Bezzubikov wrote:
>> 2017-08-08 18:11 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:
>>> one comment below
>>>
>>> On 08/05/17 22:27, Aleksandr Bezzubikov wrote:
>>>
>>>> +Capability layout (defined in include/hw/pci/pci_bridge.h):
>>>> +
>>>> +    uint8_t id;     Standard PCI capability header field
>>>> +    uint8_t next;   Standard PCI capability header field
>>>> +    uint8_t len;    Standard PCI vendor-specific capability header field
>>>> +
>>>> +    uint8_t type;   Red Hat vendor-specific capability type
>>>> +                    List of currently existing types:
>>>> +                        QEMU_RESERVE = 1
>>>> +
>>>> +
>>>> +    uint32_t bus_res;   Minimum number of buses to reserve
>>>> +
>>>> +    uint64_t io;        IO space to reserve
>>>> +    uint64_t mem        Non-prefetchable memory to reserve
>>>> +    uint64_t mem_pref;  Prefetchable memory to reserve
>>>
>>> (I apologize if I missed any concrete points from the past messages
>>> regarding this structure.)
>>>
>>> How is the firmware supposed to know whether the prefetchable MMIO
>>> reservation should be made in 32-bit or 64-bit address space? If we
>>> reserve prefetchable MMIO outside of the 32-bit address space, then
>>> hot-plugging a device without 64-bit MMIO support could fail.
>>>
>>> My earlier request, to distinguish "prefetchable_32" from
>>> "prefetchable_64" (mutually exclusively), was so that firmware would
>>> know whether to restrict the MMIO reservation to 32-bit address
>>> space.
>>
>> IIUC now (in SeaBIOS at least) we just assign this PREF registers
>> unconditionally,
>> so the decision about the mode can be made basing on !=0
>> UPPER_PREF_LIMIT register.
>> My idea was the same - we can just check if the value doesn't fit into
>> 16-bit (PREF_LIMIT reg size, 32-bit MMIO). Do we really need separate
>> fields for that?
>
> The PciBusDxe driver in edk2 tracks 32-bit and 64-bit MMIO resources
> separately from each other, and other (independent) logic exists in it
> that, on some conditions, allocates 64-bit MMIO BARs from 32-bit address
> space. This is just to say that the distinction is intentional in
> PciBusDxe.
>
> Furthermore, the Platform Init spec v1.6 says the following (this is
> what OVMF will have to comply with, in the "platform hook" called by
> PciBusDxe):
>
>> 12.6 PCI Hot Plug PCI Initialization Protocol
>> EFI_PCI_HOT_PLUG_INIT_PROTOCOL.GetResourcePadding()
>> ...
>> Padding  The amount of resource padding that is required by the PCI
>>          bus under the control of the specified HPC. Because the
>>          caller does not know the size of this buffer, this buffer is
>>          allocated by the callee and freed by the caller.
>> ...
>> The padding is returned in the form of ACPI (2.0 & 3.0) resource
>> descriptors. The exact definition of each of the fields is the same as
>> in the
>> EFI_PCI_HOST_BRIDGE_RESOURCE_ALLOCATION_PROTOCOL.SubmitResources()
>> function. See the section 10.8 for the definition of this function.
>
> Following that pointer:
>
>> 10.8 PCI HostBridge Code Definitions
>> 10.8.2 PCI Host Bridge Resource Allocation Protocol
>>
>> Table 8. ACPI 2.0 & 3.0 QWORD Address Space Descriptor Usage
>>
>> Byte    Byte    Data  Description
>> Offset  Length
>> ...
>> 0x03    0x01          Resource type:
>>                         0: Memory range
>>                         1: I/O range
>>                         2: Bus number range
>> ...
>> 0x05    0x01          Type-specific flags. Ignored except as defined
>>                       in Table 3-3 and Table 3-4 below.
>>
>> 0x06    0x08          Address Space Granularity. Used to differentiate
>>                       between a 32-bit memory request and a 64-bit
>>                       memory request. For a 32-bit memory request,
>>                       this field should be set to 32. For a 64-bit
>>                       memory request, this field should be set to 64.
>>                       Ignored for I/O and bus resource requests.
>>                       Ignored during GetProposedResources().
>
> The "Table 3-3" and "Table 3-4" references under "Type-specific flags"
> are out of date (spec bug); in reality those are:
> - Table 10. I/O Resource Flag (Resource Type = 1) Usage,
> - Table 11. Memory Resource Flag (Resource Type = 0) Usage.
>
> The latter is relevant here:
>
>> Table 11. Memory Resource Flag (Resource Type = 0) Usage
>>
>> Bits      Meaning
>> ...
>> Bit[2:1]  _MEM. Memory attributes.
>>           Value and Meaning:
>>             0 The memory is nonprefetchable.
>>             1 Invalid.
>>             2 Invalid.
>>             3 The memory is prefetchable.
>>           Note: The interpretation of these bits is somewhat different
>>           from the ACPI Specification. According to the ACPI
>>           Specification, a value of 0 implies noncacheable memory and
>>           the value of 3 indicates prefetchable and cacheable memory.
>
> So whatever OVMF sees in the capability, it must be able to translate to
> the above representation.

OK, I got it.
Then I suggest this part of the cap look like

uint64_t mem_pref_32;
uint64_t mem_pref_64;

The 'mem_pref_32' field could be uint32_t, but this would require 4-byte
padding, so which looks preferable here - a uint64_t for a 32-bit value,
or 4-byte padding in the middle of the capability?

>
> Thanks
> Laszlo
>
>>
>>>
>>> This is based on an earlier email from Alex to me:
>>>
>>> On 10/03/16 18:01, Alex Williamson wrote:
>>>> I don't think there's such a thing as a 64-bit non-prefetchable
>>>> aperture.  In fact, there are not separate 32 and 64 bit
>>>> prefetchable apertures.  The apertures are:
>>>>
>>>> I/O base/limit - (default 16bit, may be 32bit)
>>>> Memory base/limit - (32bit only, non-prefetchable)
>>>> Prefetchable Memory base/limit - (default 32bit, may be 64bit)
>>>>
>>>> This is according to Table 3-2 in the PCI-to-PCI bridge spec rev
>>>> 1.2.
>>>
>>> I don't care much about the 16-bit vs. 32-bit IO difference (that's
>>> entirely academic and the Platform Spec init doesn't even provide a
>>> way for OVMF to express such a difference). However, the optional
>>> restriction to 32-bit matters for the prefetchable MMIO aperture.
>>>
>>> Other than this, the patch looks good to me, and I'm ready to R-b.
>>>
>>> Thanks!
>>> Laszlo
Laszlo Ersek Aug. 9, 2017, 5:44 p.m. UTC | #5
On 08/09/17 18:52, Aleksandr Bezzubikov wrote:
> 2017-08-09 13:18 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:
>> On 08/08/17 21:21, Aleksandr Bezzubikov wrote:
>>> 2017-08-08 18:11 GMT+03:00 Laszlo Ersek <lersek@redhat.com>:
>>>> one comment below
>>>>
>>>> On 08/05/17 22:27, Aleksandr Bezzubikov wrote:
>>>>
>>>>> +Capability layout (defined in include/hw/pci/pci_bridge.h):
>>>>> +
>>>>> +    uint8_t id;     Standard PCI capability header field
>>>>> +    uint8_t next;   Standard PCI capability header field
>>>>> +    uint8_t len;    Standard PCI vendor-specific capability header field
>>>>> +
>>>>> +    uint8_t type;   Red Hat vendor-specific capability type
>>>>> +                    List of currently existing types:
>>>>> +                        QEMU_RESERVE = 1
>>>>> +
>>>>> +
>>>>> +    uint32_t bus_res;   Minimum number of buses to reserve
>>>>> +
>>>>> +    uint64_t io;        IO space to reserve
>>>>> +    uint64_t mem        Non-prefetchable memory to reserve
>>>>> +    uint64_t mem_pref;  Prefetchable memory to reserve
>>>>
>>>> (I apologize if I missed any concrete points from the past messages
>>>> regarding this structure.)
>>>>
>>>> How is the firmware supposed to know whether the prefetchable MMIO
>>>> reservation should be made in 32-bit or 64-bit address space? If we
>>>> reserve prefetchable MMIO outside of the 32-bit address space, then
>>>> hot-plugging a device without 64-bit MMIO support could fail.
>>>>
>>>> My earlier request, to distinguish "prefetchable_32" from
>>>> "prefetchable_64" (mutually exclusively), was so that firmware would
>>>> know whether to restrict the MMIO reservation to 32-bit address
>>>> space.
>>>
>>> IIUC now (in SeaBIOS at least) we just assign this PREF registers
>>> unconditionally,
>>> so the decision about the mode can be made basing on !=0
>>> UPPER_PREF_LIMIT register.
>>> My idea was the same - we can just check if the value doesn't fit into
>>> 16-bit (PREF_LIMIT reg size, 32-bit MMIO). Do we really need separate
>>> fields for that?
>>
>> The PciBusDxe driver in edk2 tracks 32-bit and 64-bit MMIO resources
>> separately from each other, and other (independent) logic exists in it
>> that, on some conditions, allocates 64-bit MMIO BARs from 32-bit address
>> space. This is just to say that the distinction is intentional in
>> PciBusDxe.
>>
>> Furthermore, the Platform Init spec v1.6 says the following (this is
>> what OVMF will have to comply with, in the "platform hook" called by
>> PciBusDxe):
>>
>>> 12.6 PCI Hot Plug PCI Initialization Protocol
>>> EFI_PCI_HOT_PLUG_INIT_PROTOCOL.GetResourcePadding()
>>> ...
>>> Padding  The amount of resource padding that is required by the PCI
>>>          bus under the control of the specified HPC. Because the
>>>          caller does not know the size of this buffer, this buffer is
>>>          allocated by the callee and freed by the caller.
>>> ...
>>> The padding is returned in the form of ACPI (2.0 & 3.0) resource
>>> descriptors. The exact definition of each of the fields is the same as
>>> in the
>>> EFI_PCI_HOST_BRIDGE_RESOURCE_ALLOCATION_PROTOCOL.SubmitResources()
>>> function. See the section 10.8 for the definition of this function.
>>
>> Following that pointer:
>>
>>> 10.8 PCI HostBridge Code Definitions
>>> 10.8.2 PCI Host Bridge Resource Allocation Protocol
>>>
>>> Table 8. ACPI 2.0 & 3.0 QWORD Address Space Descriptor Usage
>>>
>>> Byte    Byte    Data  Description
>>> Offset  Length
>>> ...
>>> 0x03    0x01          Resource type:
>>>                         0: Memory range
>>>                         1: I/O range
>>>                         2: Bus number range
>>> ...
>>> 0x05    0x01          Type-specific flags. Ignored except as defined
>>>                       in Table 3-3 and Table 3-4 below.
>>>
>>> 0x06    0x08          Address Space Granularity. Used to differentiate
>>>                       between a 32-bit memory request and a 64-bit
>>>                       memory request. For a 32-bit memory request,
>>>                       this field should be set to 32. For a 64-bit
>>>                       memory request, this field should be set to 64.
>>>                       Ignored for I/O and bus resource requests.
>>>                       Ignored during GetProposedResources().
>>
>> The "Table 3-3" and "Table 3-4" references under "Type-specific flags"
>> are out of date (spec bug); in reality those are:
>> - Table 10. I/O Resource Flag (Resource Type = 1) Usage,
>> - Table 11. Memory Resource Flag (Resource Type = 0) Usage.
>>
>> The latter is relevant here:
>>
>>> Table 11. Memory Resource Flag (Resource Type = 0) Usage
>>>
>>> Bits      Meaning
>>> ...
>>> Bit[2:1]  _MEM. Memory attributes.
>>>           Value and Meaning:
>>>             0 The memory is nonprefetchable.
>>>             1 Invalid.
>>>             2 Invalid.
>>>             3 The memory is prefetchable.
>>>           Note: The interpretation of these bits is somewhat different
>>>           from the ACPI Specification. According to the ACPI
>>>           Specification, a value of 0 implies noncacheable memory and
>>>           the value of 3 indicates prefetchable and cacheable memory.
>>
>> So whatever OVMF sees in the capability, it must be able to translate to
>> the above representation.
> 
> OK, I got it.
> Then I suggest this part of the cap look like
> 
> uint64_t mem_pref_32;
> uint64_t mem_pref_64;
> 
> 'mem_pref_32' field can be uint32_t, but this will require 4-byte padding,
> so what looks more preferable here - uint64_t for 32-bit value or
> 4-byte padding in the middle
> of the capapbility?

The last field before this part is "uint64_t io", and it is naturally
aligned. So, how about:

- uint32_t mem;         /* non-prefetchable, 32-bit only */
- uint32_t mem_pref_32; /* prefetchable, 32-bit,
                         * mutually exclusive with mem_pref_64
                         */
- uint64_t mem_pref_64; /* prefetchable, 64-bit,
                         * mutually exclusive with mem_pref_32
                         */

Again, the comments to the right come from the email that I got earlier
from Alex Williamson (which he wrote "according to Table 3-2 in the
PCI-to-PCI bridge spec rev 1.2").

IOW, "mem" need not be uint64_t, it can be uint32_t just as well, and
then we don't need padding for "mem_pref_32" either.

(I also think "uint64_t io" is overkill, but I care precious little
about IO reservation, beyond *disabling* it :) I intend to implement
"io" as well, of course.)

Thanks!
Laszlo

Patch

diff --git a/docs/pcie.txt b/docs/pcie.txt
index 5bada24..76b85ec 100644
--- a/docs/pcie.txt
+++ b/docs/pcie.txt
@@ -46,7 +46,7 @@  Place only the following kinds of devices directly on the Root Complex:
     (2) PCI Express Root Ports (ioh3420), for starting exclusively PCI Express
         hierarchies.
 
-    (3) DMI-PCI Bridges (i82801b11-bridge), for starting legacy PCI
+    (3) PCI Express to PCI Bridge (pcie-pci-bridge), for starting legacy PCI
         hierarchies.
 
     (4) Extra Root Complexes (pxb-pcie), if multiple PCI Express Root Buses
@@ -55,18 +55,18 @@  Place only the following kinds of devices directly on the Root Complex:
    pcie.0 bus
    ----------------------------------------------------------------------------
         |                |                    |                  |
-   -----------   ------------------   ------------------   --------------
-   | PCI Dev |   | PCIe Root Port |   | DMI-PCI Bridge |   |  pxb-pcie  |
-   -----------   ------------------   ------------------   --------------
+   -----------   ------------------   -------------------   --------------
+   | PCI Dev |   | PCIe Root Port |   | PCIe-PCI Bridge |   |  pxb-pcie  |
+   -----------   ------------------   -------------------   --------------
 
 2.1.1 To plug a device into pcie.0 as a Root Complex Integrated Endpoint use:
           -device <dev>[,bus=pcie.0]
 2.1.2 To expose a new PCI Express Root Bus use:
           -device pxb-pcie,id=pcie.1,bus_nr=x[,numa_node=y][,addr=z]
-      Only PCI Express Root Ports and DMI-PCI bridges can be connected
-      to the pcie.1 bus:
+      PCI Express Root Ports and PCI Express to PCI bridges can be
+      connected to the pcie.1 bus:
           -device ioh3420,id=root_port1[,bus=pcie.1][,chassis=x][,slot=y][,addr=z]                                     \
-          -device i82801b11-bridge,id=dmi_pci_bridge1,bus=pcie.1
+          -device pcie-pci-bridge,id=pcie_pci_bridge1,bus=pcie.1
 
 
 2.2 PCI Express only hierarchy
@@ -130,24 +130,24 @@  Notes:
 Legacy PCI devices can be plugged into pcie.0 as Integrated Endpoints,
 but, as mentioned in section 5, doing so means the legacy PCI
 device in question will be incapable of hot-unplugging.
-Besides that use DMI-PCI Bridges (i82801b11-bridge) in combination
-with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.
+Besides that, use PCI Express to PCI Bridges (pcie-pci-bridge) in
+combination with PCI-PCI Bridges (pci-bridge) to start PCI hierarchies.
 
-Prefer flat hierarchies. For most scenarios a single DMI-PCI Bridge
+Prefer flat hierarchies. For most scenarios a single PCI Express to PCI Bridge
 (having 32 slots) and several PCI-PCI Bridges attached to it
 (each supporting also 32 slots) will support hundreds of legacy devices.
-The recommendation is to populate one PCI-PCI Bridge under the DMI-PCI Bridge
-until is full and then plug a new PCI-PCI Bridge...
+The recommendation is to populate one PCI-PCI Bridge under the
+PCI Express to PCI Bridge until it is full and then plug a new PCI-PCI Bridge...
 
    pcie.0 bus
    ----------------------------------------------
         |                            |
-   -----------               ------------------
-   | PCI Dev |               | DMI-PCI BRIDGE |
-   ----------                ------------------
+   -----------               -------------------
+   | PCI Dev |               | PCIe-PCI Bridge |
+   -----------               -------------------
                                |            |
                   ------------------    ------------------
-                  | PCI-PCI Bridge |    | PCI-PCI Bridge |   ...
+                  | PCI-PCI Bridge |    | PCI-PCI Bridge |
                   ------------------    ------------------
                                          |           |
                                   -----------     -----------
@@ -157,11 +157,11 @@  until is full and then plug a new PCI-PCI Bridge...
 2.3.1 To plug a PCI device into pcie.0 as an Integrated Endpoint use:
       -device <dev>[,bus=pcie.0]
 2.3.2 Plugging a PCI device into a PCI-PCI Bridge:
-      -device i82801b11-bridge,id=dmi_pci_bridge1[,bus=pcie.0]                        \
-      -device pci-bridge,id=pci_bridge1,bus=dmi_pci_bridge1[,chassis_nr=x][,addr=y]   \
+      -device pcie-pci-bridge,id=pcie_pci_bridge1[,bus=pcie.0] \
+      -device pci-bridge,id=pci_bridge1,bus=pcie_pci_bridge1[,chassis_nr=x][,addr=y] \
       -device <dev>,bus=pci_bridge1[,addr=x]
       Note that 'addr' cannot be 0 unless shpc=off parameter is passed to
-      the PCI Bridge.
+      the PCI Bridge/PCI Express to PCI Bridge.
 
 3. IO space issues
 ===================
@@ -219,14 +219,16 @@  do not support hot-plug, so any devices plugged into Root Complexes
 cannot be hot-plugged/hot-unplugged:
     (1) PCI Express Integrated Endpoints
     (2) PCI Express Root Ports
-    (3) DMI-PCI Bridges
+    (3) PCI Express to PCI Bridges
     (4) pxb-pcie
 
 Be aware that PCI Express Downstream Ports can't be hot-plugged into
 an existing PCI Express Upstream Port.
 
-PCI devices can be hot-plugged into PCI-PCI Bridges. The PCI hot-plug is ACPI
-based and can work side by side with the PCI Express native hot-plug.
+PCI devices can be hot-plugged into PCI Express to PCI and PCI-PCI Bridges.
+The PCI hot-plug into a PCI-PCI bridge is ACPI-based, whereas hot-plug into
+a PCI Express to PCI bridge is SHPC-based. Both can work side by side with
+the PCI Express native hot-plug.
 
 PCI Express devices can be natively hot-plugged/hot-unplugged into/from
 PCI Express Root Ports (and PCI Express Downstream Ports).
@@ -234,10 +236,11 @@  PCI Express Root Ports (and PCI Express Downstream Ports).
 5.1 Planning for hot-plug:
     (1) PCI hierarchy
         Leave enough PCI-PCI Bridge slots empty or add one
-        or more empty PCI-PCI Bridges to the DMI-PCI Bridge.
+        or more empty PCI-PCI Bridges to the PCI Express to PCI Bridge.
 
         For each such PCI-PCI Bridge the Guest Firmware is expected to reserve
         4K IO space and 2M MMIO range to be used for all devices behind it.
+        An appropriate PCI capability is provided for this; see pcie_pci_bridge.txt.
 
         Because of the hard IO limit of around 10 PCI Bridges (~ 40K space)
         per system don't use more than 9 PCI-PCI Bridges, leaving 4K for the
diff --git a/docs/pcie_pci_bridge.txt b/docs/pcie_pci_bridge.txt
new file mode 100644
index 0000000..89d6754
--- /dev/null
+++ b/docs/pcie_pci_bridge.txt
@@ -0,0 +1,110 @@ 
+Generic PCI Express to PCI Bridge
+=================================
+
+Description
+===========
+The PCIE-to-PCI bridge is a new way to create legacy PCI
+hierarchies on Q35 machines.
+
+Previously, the Intel DMI-to-PCI bridge was used for this purpose,
+but due to its strict limitations - no hot-plug support, no
+cross-platform and no cross-architecture support - the new generic
+PCIE-to-PCI bridge should now be used instead for attaching legacy
+PCI devices to a PCI Express machine.
+
+This generic PCIE-PCI bridge is a cross-platform device. It
+can be hot-plugged into an appropriate root port (this requires
+additional actions, see the 'PCIE-PCI bridge hot-plug' section),
+and it supports hot-plug of devices into the bridge itself
+(with some limitations, see below).
+
+Hot-plug of legacy PCI devices into the bridge
+is provided by the bridge's built-in Standard Hot-Plug Controller,
+though it still has some limitations, see below.
+
+PCIE-PCI bridge hot-plug
+========================
+Guest OSes require extra effort to enable PCIE-PCI bridge hot-plug.
+The motivation: at init time, a PCI Express root port that has no
+device plugged in gets no spare buses reserved, and so cannot provide
+a bus to a device hot-plugged later.
+
+To solve this problem we reserve additional buses at the firmware
+level. Currently only SeaBIOS is supported.
+The number of buses to reserve is delivered through a special
+Red Hat vendor-specific PCI capability, added to any root port
+that is intended to have a PCIE-PCI bridge hot-plugged into it.
+
+Capability layout (defined in include/hw/pci/pci_bridge.h):
+
+    uint8_t id;     Standard PCI capability header field
+    uint8_t next;   Standard PCI capability header field
+    uint8_t len;    Standard PCI vendor-specific capability header field
+
+    uint8_t type;   Red Hat vendor-specific capability type
+                    List of currently existing types:
+                        QEMU_RESERVE = 1
+
+
+    uint32_t bus_res;   Minimum number of buses to reserve
+
+    uint64_t io;        IO space to reserve
+    uint64_t mem;       Non-prefetchable memory to reserve
+    uint64_t mem_pref;  Prefetchable memory to reserve
+
+If any reservation field is equal to -1 then this kind of reservation is not
+needed and must be ignored by firmware.
+
+At the moment this capability is used only in the QEMU generic PCIe root
+port (-device pcie-root-port). The capability construction function takes
+all reservation field values from the corresponding device properties. By
+default all of them are set to -1, leaving the root port's behavior unchanged.
+
+Usage
+=====
+A detailed command line would be:
+
+[qemu-bin + storage options] \
+-m 2G \
+-device ioh3420,bus=pcie.0,id=rp1 \
+-device ioh3420,bus=pcie.0,id=rp2 \
+-device pcie-root-port,bus=pcie.0,id=rp3,bus-reserve=1 \
+-device pcie-pci-bridge,id=br1,bus=rp1 \
+-device pcie-pci-bridge,id=br2,bus=rp2 \
+-device e1000,bus=br1,addr=8
+
+Then, in the monitor, the following commands can be executed:
+device_add pcie-pci-bridge,id=br3,bus=rp3
+device_add e1000,bus=br2,addr=1
+device_add e1000,bus=br3,addr=1
+
+Here you have:
+ (1) Cold-plugged:
+    - Root ports: 1 QEMU generic root port with the capability mentioned above,
+                  2 ioh3420 root ports;
+    - 2 PCIE-PCI bridges plugged into 2 different root ports;
+    - e1000 plugged into the first bridge.
+ (2) Hot-plugged:
+    - PCIE-PCI bridge, plugged into QEMU generic root port;
+    - 2 e1000 cards, one plugged into the cold-plugged PCIE-PCI bridge,
+                     another plugged into the hot-plugged bridge.
+
+Limitations
+===========
+The PCIE-PCI bridge can be hot-plugged only into a pcie-root-port that
+has a proper 'bus-reserve' property value, to provide a secondary bus for
+the hot-plugged bridge.
+
+Windows 7 and older versions don't support hot-plugging devices into the PCIE-PCI bridge.
+To enable device hot-plug into the bridge on Linux there are 3 ways:
+1) Build the shpchp module with this patch: http://www.spinics.net/lists/linux-pci/msg63052.html
+2) Use kernel 4.14+, where the patch mentioned above is already merged.
+3) Set the 'msi' property to off - this forces the bridge to use legacy INTx,
+    which allows the bridge to notify the OS about hot-plug events without having
+    BUSMASTER set.
+
+Implementation
+==============
+The PCIE-PCI bridge is based on the PCI-PCI bridge, but it also picks up
+PCI Express features as a PCI Express device (is_express=1).
+
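
For reference, the capability layout documented in the new file above
corresponds roughly to the following C struct (a sketch only; as the
review discussion above shows, the single prefetchable field was
expected to be split into 32-bit and 64-bit variants in a later
revision):

#include <stdint.h>

typedef struct __attribute__((packed)) {
    uint8_t  id;        /* standard PCI capability header field          */
    uint8_t  next;      /* standard PCI capability header field          */
    uint8_t  len;       /* standard vendor-specific capability length    */
    uint8_t  type;      /* Red Hat capability type, QEMU_RESERVE = 1     */
    uint32_t bus_res;   /* minimum number of buses to reserve, -1 = none */
    uint64_t io;        /* IO space to reserve, -1 = none                */
    uint64_t mem;       /* non-prefetchable memory to reserve, -1 = none */
    uint64_t mem_pref;  /* prefetchable memory to reserve, -1 = none     */
} QemuReserveCapV4;     /* struct name is illustrative */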