[V2] pci: fixes to allow booting from extra root pci buses.

Message ID	1434029828-31954-1-git-send-email-marcel@redhat.com
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: Marcel Apfelbaum <marcel@redhat.com> To: seabios@seabios.org Date: Thu, 11 Jun 2015 16:37:08 +0300 Message-Id: <1434029828-31954-1-git-send-email-marcel@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: marcel@redhat.com, kevin@koconnor.net, lersek@redhat.com, qemu-devel@nongnu.org, mst@redhat.com Subject: [Qemu-devel] [PATCH V2] pci: fixes to allow booting from extra root pci buses. Precedence: list Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Marcel Apfelbaum June 11, 2015, 1:37 p.m. UTC

The fixes solves the following issue:
The PXB device exposes a new  pci root bridge with the
fw path:  /pci-root@4/..., in which 4 is the root bus number.
Before this patch the fw path was wrongly computed:
    /pci-root@1/pci@i0cf8/...
Fix the above issues: Correct the bus number and remove the
extra host bridge description.

The IEEE Std 1275-1994:

  IEEE Standard for Boot (Initialization Configuration)
    Firmware: Core Requirements and Practices
      3.2.1.1 Node names
          Each node in the device tree is identified by a node name
          using the following notation:
              driver-name@unit-address:device-arguments

          The driver name field is a sequence of between one and 31
          letters [...]. By convention, this name includes the name of
          the device’s manufacturer and the device’s model name separated by
          a “,”.

          The unit address field is the text representation of the
          physical address of the device within the address space
          defined by its parent node. The form of the text
          representation is bus-dependent.
    3.2.1.2 Path names
        A particular node is uniquely identified by describing its position
        in the device tree by completely specifying the
        path from the root node through all intermediate nodes to the node
        in question. The textual representation of a
        such a path is called a device path. Device paths are composed as
        follows:
        /node-name0/node-name1/ ... /node-nameN
        When Open Firmware is searching for a particular node, and either
        the driver name or @unit-address portion of
        the node name is not given, Open Firmware shall arbitrarily choose a
        node matching the portion that is present.

Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
---
v1->v2:
  - fixed a bug preventing the boot of devices behind the main host bridge
    This approach leaves previous code paths intact so it will work the same
    as before if no PXB is present.
  - Added spec for fw path naming conventions. (Michael S. Tsirkin)

 src/boot.c   | 3 ++-
 src/hw/pci.c | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

Laszlo Ersek June 11, 2015, 1:57 p.m. UTC | #1

On 06/11/15 15:37, Marcel Apfelbaum wrote:
> The fixes solves the following issue:
> The PXB device exposes a new  pci root bridge with the
> fw path:  /pci-root@4/..., in which 4 is the root bus number.
> Before this patch the fw path was wrongly computed:
>     /pci-root@1/pci@i0cf8/...
> Fix the above issues: Correct the bus number and remove the
> extra host bridge description.
> 
> The IEEE Std 1275-1994:
> 
>   IEEE Standard for Boot (Initialization Configuration)
>     Firmware: Core Requirements and Practices
>       3.2.1.1 Node names
>           Each node in the device tree is identified by a node name
>           using the following notation:
>               driver-name@unit-address:device-arguments
> 
>           The driver name field is a sequence of between one and 31
>           letters [...]. By convention, this name includes the name of
>           the device’s manufacturer and the device’s model name separated by
>           a “,”.
> 
>           The unit address field is the text representation of the
>           physical address of the device within the address space
>           defined by its parent node. The form of the text
>           representation is bus-dependent.
>     3.2.1.2 Path names
>         A particular node is uniquely identified by describing its position
>         in the device tree by completely specifying the
>         path from the root node through all intermediate nodes to the node
>         in question. The textual representation of a
>         such a path is called a device path. Device paths are composed as
>         follows:
>         /node-name0/node-name1/ ... /node-nameN
>         When Open Firmware is searching for a particular node, and either
>         the driver name or @unit-address portion of
>         the node name is not given, Open Firmware shall arbitrarily choose a
>         node matching the portion that is present.
> 
> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
> ---
> v1->v2:
>   - fixed a bug preventing the boot of devices behind the main host bridge
>     This approach leaves previous code paths intact so it will work the same
>     as before if no PXB is present.
>   - Added spec for fw path naming conventions. (Michael S. Tsirkin)
> 
>  src/boot.c   | 3 ++-
>  src/hw/pci.c | 2 +-
>  2 files changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/src/boot.c b/src/boot.c
> index ec59c37..e241d1c 100644
> --- a/src/boot.c
> +++ b/src/boot.c
> @@ -114,7 +114,8 @@ build_pci_path(char *buf, int max, const char *devname, struct pci_device *pci)
>      } else {
>          if (pci->rootbus)
>              p += snprintf(p, max, "/pci-root@%x", pci->rootbus);
> -        p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
> +        else
> +            p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
>      }
>  
>      int dev = pci_bdf_to_dev(pci->bdf), fn = pci_bdf_to_fn(pci->bdf);
> diff --git a/src/hw/pci.c b/src/hw/pci.c
> index 0379b55..9e77af4 100644
> --- a/src/hw/pci.c
> +++ b/src/hw/pci.c
> @@ -133,7 +133,7 @@ pci_probe_devices(void)
>                  if (bus != lastbus)
>                      rootbuses++;
>                  lastbus = bus;
> -                rootbus = rootbuses;
> +                rootbus = bus;
>                  if (bus > MaxPCIBus)
>                      MaxPCIBus = bus;
>              } else {
> 

I think the commit message is somewhat overkill, but I'll leave that to
Michael. :)

Regardig the rootbus question. As far as I can see, the last hunk
changes the dev->rootbus assignment for parentless devices, so they pick
up the last bus rather than the number of buses found.

Then, the only difference this makes is in build_pci_path() -- I grepped
the tree for whole-word "rootbus".

(It's quite a serendipity that my v3 qemu patchset produces exactly this
pattern in OFW device paths, without my then-knowledge of the SeaBIOS
code. Although, admittedly, the fw_name := "pci-root" change there was
suggested by Marcel.)

So, unless I'm missing something:

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

Kevin O'Connor June 11, 2015, 1:58 p.m. UTC | #2

On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> The fixes solves the following issue:
> The PXB device exposes a new  pci root bridge with the
> fw path:  /pci-root@4/..., in which 4 is the root bus number.
> Before this patch the fw path was wrongly computed:
>     /pci-root@1/pci@i0cf8/...
> Fix the above issues: Correct the bus number and remove the
> extra host bridge description.

Why is that wrong?  The previous path looks correct to me.

> The IEEE Std 1275-1994:
> 
>   IEEE Standard for Boot (Initialization Configuration)
>     Firmware: Core Requirements and Practices
>       3.2.1.1 Node names
>           Each node in the device tree is identified by a node name
>           using the following notation:
>               driver-name@unit-address:device-arguments
> 
>           The driver name field is a sequence of between one and 31
>           letters [...]. By convention, this name includes the name of
>           the device’s manufacturer and the device’s model name separated by
>           a “,”.
> 
>           The unit address field is the text representation of the
>           physical address of the device within the address space
>           defined by its parent node. The form of the text
>           representation is bus-dependent.

Note the "physical address" part in the above.  Your patch changes the
"pci-root@" syntax to use a logical address instead of a physical
address.  That is, unless I've missed something, SeaBIOS today uses a
physical address (the n'th root bus) and the patch would change it to
use a logical address.

One of the goals of using an "openfirmware" like address was so that
they would be stable across boots (the same mechanism is also used
with coreboot).  Using a physical address is key for this, because
simply adding or removing a PCI device could cause the logical PCI
bridge enumeration to change - and that would mess up the bootorder
list if it was based on logical addresses.

-Kevin

Marcel Apfelbaum June 11, 2015, 2:12 p.m. UTC | #3

On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>> The fixes solves the following issue:
>> The PXB device exposes a new  pci root bridge with the
>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>> Before this patch the fw path was wrongly computed:
>>      /pci-root@1/pci@i0cf8/...
>> Fix the above issues: Correct the bus number and remove the
>> extra host bridge description.
>
> Why is that wrong?  The previous path looks correct to me.
The prev path includes both the extra root bridge and *then* the usual host bridge.
  /pci-root@1/pci@i0cf8/   ...
     ^ new       ^ regular  ^ devices

Since the new pci root bridge (and bus) is on "paralel" with the regular one.
it is not correct to add it to the path.

The architecture is:
  /<host bridge>/devices...
  /extra root bridge/devices...
  /extra root bridge/devices...
And not
/extra root bridge//<host bridge>/devices

Thanks,
Marcel



>
>> The IEEE Std 1275-1994:
>>
>>    IEEE Standard for Boot (Initialization Configuration)
>>      Firmware: Core Requirements and Practices
>>        3.2.1.1 Node names
>>            Each node in the device tree is identified by a node name
>>            using the following notation:
>>                driver-name@unit-address:device-arguments
>>
>>            The driver name field is a sequence of between one and 31
>>            letters [...]. By convention, this name includes the name of
>>            the device’s manufacturer and the device’s model name separated by
>>            a “,”.
>>
>>            The unit address field is the text representation of the
>>            physical address of the device within the address space
>>            defined by its parent node. The form of the text
>>            representation is bus-dependent.
>
> Note the "physical address" part in the above.  Your patch changes the
> "pci-root@" syntax to use a logical address instead of a physical
> address.  That is, unless I've missed something, SeaBIOS today uses a
> physical address (the n'th root bus) and the patch would change it to
> use a logical address.
>
> One of the goals of using an "openfirmware" like address was so that
> they would be stable across boots (the same mechanism is also used
> with coreboot).  Using a physical address is key for this, because
> simply adding or removing a PCI device could cause the logical PCI
> bridge enumeration to change - and that would mess up the bootorder
> list if it was based on logical addresses.
>
> -Kevin
>

Kevin O'Connor June 11, 2015, 2:24 p.m. UTC | #4

On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
> On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
> >On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> >>The fixes solves the following issue:
> >>The PXB device exposes a new  pci root bridge with the
> >>fw path:  /pci-root@4/..., in which 4 is the root bus number.
> >>Before this patch the fw path was wrongly computed:
> >>     /pci-root@1/pci@i0cf8/...
> >>Fix the above issues: Correct the bus number and remove the
> >>extra host bridge description.
> >
> >Why is that wrong?  The previous path looks correct to me.
> The prev path includes both the extra root bridge and *then* the usual host bridge.
>  /pci-root@1/pci@i0cf8/   ...
>     ^ new       ^ regular  ^ devices
> 
> Since the new pci root bridge (and bus) is on "paralel" with the regular one.
> it is not correct to add it to the path.
> 
> The architecture is:
>  /<host bridge>/devices...
>  /extra root bridge/devices...
>  /extra root bridge/devices...
> And not
> /extra root bridge//<host bridge>/devices

Your patch changed both the "/extra root bridge/devices..." part and
the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
correct IMO.

Does open-firmware have any examples for PCI paths and in particular
PCI paths when there are multiple root-buses?

It's possible to replace the "pci@i0cf8" with "pci-root@1" but that
seems odd as the extra root bus is accessible via io accesses to
0x0cf8.

Another option would be to place the pci-root@1 behind the pci@i0cf8
as in "/pci@i0cf8/pci-root@1/...".  Or, the root bus could be appended
to the host bridge as in "/pci@i0cf8,1/...".

-Kevin

Laszlo Ersek June 11, 2015, 2:35 p.m. UTC | #5

On 06/11/15 15:58, Kevin O'Connor wrote:
> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>> The fixes solves the following issue:
>> The PXB device exposes a new  pci root bridge with the
>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>> Before this patch the fw path was wrongly computed:
>>     /pci-root@1/pci@i0cf8/...
>> Fix the above issues: Correct the bus number and remove the
>> extra host bridge description.
> 
> Why is that wrong?  The previous path looks correct to me.
> 
>> The IEEE Std 1275-1994:
>>
>>   IEEE Standard for Boot (Initialization Configuration)
>>     Firmware: Core Requirements and Practices
>>       3.2.1.1 Node names
>>           Each node in the device tree is identified by a node name
>>           using the following notation:
>>               driver-name@unit-address:device-arguments
>>
>>           The driver name field is a sequence of between one and 31
>>           letters [...]. By convention, this name includes the name of
>>           the device’s manufacturer and the device’s model name separated by
>>           a “,”.
>>
>>           The unit address field is the text representation of the
>>           physical address of the device within the address space
>>           defined by its parent node. The form of the text
>>           representation is bus-dependent.
> 
> Note the "physical address" part in the above.  Your patch changes the
> "pci-root@" syntax to use a logical address instead of a physical
> address.  That is, unless I've missed something, SeaBIOS today uses a
> physical address (the n'th root bus) and the patch would change it to
> use a logical address.
> 
> One of the goals of using an "openfirmware" like address was so that
> they would be stable across boots (the same mechanism is also used
> with coreboot).  Using a physical address is key for this, because
> simply adding or removing a PCI device could cause the logical PCI
> bridge enumeration to change - and that would mess up the bootorder
> list if it was based on logical addresses.

There are two questions here. The first is the inclusion of the
"pci@i0cf8" node even if a "pci-root@x" node is present in front of it.
The hunk that changes that is not your main concern, right? (And Marcel
just described that hunk in more detail.)

The other question is how "x" is selected in "pci-root@x".

On the QEMU side, and in OVMF, "x" is keyed off of the bus_nr property.
If you change that property from (say) 3 to 4, then the device paths
exported by QEMU will change. However, the location (in the PCI
hierarchy) of all the affected devices will *also* change at once, and
their auto-enumerated, firmware-side device paths will reflect that.
Therefore the new "bootorder" fw_cfg entries will match the freshly
generated firmware-side device paths.

So why is this not stable? If you change the hardware without
automatically updating any stashed firmware-side device paths, then
things will fall apart without "bootorder" entries in the picture anyway.

Also, assuming you key off "x" of the running counter that counts root
buses as they are found during enumeration, that's a possibility too,
but I don't see how it gives more stability. If you insert a new root
bus (with a device on it) between to preexistent ones, that will offset
all the "x" values for the root buses that come after it by one.

In UEFI at least (I'm not speaking about OVMF in particular, but the
UEFI spec), there is a "short-form device path" concept for hard drive
and USB boot options. For hard disks, it is practically a relative
device path that lacks the path fragment from the root node until just
before the GPT partition identifier. The idea being, if you plug your
SCSI controller in another PCI slot, the change in the full device path
will be local to the path fragment that is not captured in the
(persistent) boot option. The GPT GUID can identify the partition
uniquely in the system wherever it exists, so it can be booted even
without fully enumerating all devices and reproducing all the default
boot options.

Short of such a "uniquely identifying relative devpath" trick, I don't
think stability in firmware-stashed (ie. not regenerated) device paths
exists in general, if the underlying hardware configuration is changed.

In summary: I think we could modify both QEMU and OVMF to use the
"serial numbers" of the extra PCI root buses, in increasing bus number
order, instead of their actual bus numbers, for identifying them. That's
just a convention. Then the second hunk of this patch would not be
necessary for SeaBIOS. But I think this convention would be only less
logical, and not more stable.

Can you please elaborate? I'm confused.

Thanks
Laszlo

Marcel Apfelbaum June 11, 2015, 2:36 p.m. UTC | #6

On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
> On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
>> On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
>>> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>>>> The fixes solves the following issue:
>>>> The PXB device exposes a new  pci root bridge with the
>>>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>>>> Before this patch the fw path was wrongly computed:
>>>>      /pci-root@1/pci@i0cf8/...
>>>> Fix the above issues: Correct the bus number and remove the
>>>> extra host bridge description.
>>>
>>> Why is that wrong?  The previous path looks correct to me.
>> The prev path includes both the extra root bridge and *then* the usual host bridge.
>>   /pci-root@1/pci@i0cf8/   ...
>>      ^ new       ^ regular  ^ devices
>>
>> Since the new pci root bridge (and bus) is on "paralel" with the regular one.
>> it is not correct to add it to the path.
>>
>> The architecture is:
>>   /<host bridge>/devices...
>>   /extra root bridge/devices...
>>   /extra root bridge/devices...
>> And not
>> /extra root bridge//<host bridge>/devices
>
> Your patch changed both the "/extra root bridge/devices..." part and
> the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
> correct IMO.
Why? @1 should be the unit address which is the text representation
of the physical address, in our case the slot. Since the bus number
in our case is 4, I think /pci-root@4/ is the 'correct' address.
>
> Does open-firmware have any examples for PCI paths and in particular
> PCI paths when there are multiple root-buses?
Maybe Laszlo can say more, but we both agreed that this would be the
berst representation of extra root buses on both OVMF and Seabios.
>
> It's possible to replace the "pci@i0cf8" with "pci-root@1" but that
> seems odd as the extra root bus is accessible via io accesses to
> 0x0cf8.
While this is true, /pci-root@[...]/ may represent also other kind of host
bridges not only PXBs. But we can change this of course, as long as OVMF can also
work with it.

>
> Another option would be to place the pci-root@1 behind the pci@i0cf8
> as in "/pci@i0cf8/pci-root@1/...".  Or, the root bus could be appended
> to the host bridge as in "/pci@i0cf8,1/...".
The latest representation makes sense to me,  but "/pci@i0cf8,4/...", after comma
the bus number.

Laszlo, will this work for OVMF?

Thanks,
Marcel

>
> -Kevin
>

Laszlo Ersek June 11, 2015, 3 p.m. UTC | #7

On 06/11/15 16:36, Marcel Apfelbaum wrote:
> On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
>> On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
>>> On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
>>>> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>>>>> The fixes solves the following issue:
>>>>> The PXB device exposes a new  pci root bridge with the
>>>>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>>>>> Before this patch the fw path was wrongly computed:
>>>>>      /pci-root@1/pci@i0cf8/...
>>>>> Fix the above issues: Correct the bus number and remove the
>>>>> extra host bridge description.
>>>>
>>>> Why is that wrong?  The previous path looks correct to me.
>>> The prev path includes both the extra root bridge and *then* the
>>> usual host bridge.
>>>   /pci-root@1/pci@i0cf8/   ...
>>>      ^ new       ^ regular  ^ devices
>>>
>>> Since the new pci root bridge (and bus) is on "paralel" with the
>>> regular one.
>>> it is not correct to add it to the path.
>>>
>>> The architecture is:
>>>   /<host bridge>/devices...
>>>   /extra root bridge/devices...
>>>   /extra root bridge/devices...
>>> And not
>>> /extra root bridge//<host bridge>/devices
>>
>> Your patch changed both the "/extra root bridge/devices..." part and
>> the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
>> correct IMO.
> Why? @1 should be the unit address which is the text representation
> of the physical address, in our case the slot. Since the bus number
> in our case is 4, I think /pci-root@4/ is the 'correct' address.
>>
>> Does open-firmware have any examples for PCI paths and in particular
>> PCI paths when there are multiple root-buses?
> Maybe Laszlo can say more, but we both agreed that this would be the
> berst representation of extra root buses on both OVMF and Seabios.

The

  PCI Bus Binding to:
  IEEE Std 1275-1994 Standard for Boot
  (Initialization Configuration) Firmware

document (binding) does speak about this, as far as I can see, in

  2.2.1. Physical Address Formats

It first gives a "Numerical Representation" in device tree format (same
thing as in DTB / FDT), and then a "Text Representation" with references
to "Numerical Representation". It is *completely* Greek to me. It took
me minutes of staring just to vaguely understand how the current

  i0cf8

unit address comes together.

I've always treated the OFW devpaths that QEMU generates only
*syntactically* conformant to the (base) OFW spec, and never considered
the particular bindings 100% binding. That said, if someone finds where
the PCI binding defines unit addresses for *root* buses, please let me
know, just for reference.

>> It's possible to replace the "pci@i0cf8" with "pci-root@1" but that
>> seems odd as the extra root bus is accessible via io accesses to
>> 0x0cf8.
> While this is true, /pci-root@[...]/ may represent also other kind of host
> bridges not only PXBs. But we can change this of course, as long as OVMF
> can also
> work with it.
> 
>>
>> Another option would be to place the pci-root@1 behind the pci@i0cf8
>> as in "/pci@i0cf8/pci-root@1/...".  Or, the root bus could be appended
>> to the host bridge as in "/pci@i0cf8,1/...".
> The latest representation makes sense to me,  but "/pci@i0cf8,4/...",
> after comma
> the bus number.
> 
> Laszlo, will this work for OVMF?

With the v3 patchset for QEMU, we could probably easily generate the
"i0cf8,4" unit address inside the PXB device model itself. (Of course
exactly what number should stand after the comma remains a question.)

Parsing it in OVMF is doable, albeit somewhat ugly.

In any case, I'm not convinced at all why this is a better idea than the
proposal in this patch.

Laszlo

Kevin O'Connor June 11, 2015, 4:48 p.m. UTC | #8

On Thu, Jun 11, 2015 at 04:35:33PM +0200, Laszlo Ersek wrote:
> On 06/11/15 15:58, Kevin O'Connor wrote:
> > On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> >> The fixes solves the following issue:
> >> The PXB device exposes a new  pci root bridge with the
> >> fw path:  /pci-root@4/..., in which 4 is the root bus number.
> >> Before this patch the fw path was wrongly computed:
> >>     /pci-root@1/pci@i0cf8/...
> >> Fix the above issues: Correct the bus number and remove the
> >> extra host bridge description.
> > 
> > Why is that wrong?  The previous path looks correct to me.
> > 
> >> The IEEE Std 1275-1994:
> >>
> >>   IEEE Standard for Boot (Initialization Configuration)
> >>     Firmware: Core Requirements and Practices
> >>       3.2.1.1 Node names
> >>           Each node in the device tree is identified by a node name
> >>           using the following notation:
> >>               driver-name@unit-address:device-arguments
> >>
> >>           The driver name field is a sequence of between one and 31
> >>           letters [...]. By convention, this name includes the name of
> >>           the device’s manufacturer and the device’s model name separated by
> >>           a “,”.
> >>
> >>           The unit address field is the text representation of the
> >>           physical address of the device within the address space
> >>           defined by its parent node. The form of the text
> >>           representation is bus-dependent.
> > 
> > Note the "physical address" part in the above.  Your patch changes the
> > "pci-root@" syntax to use a logical address instead of a physical
> > address.  That is, unless I've missed something, SeaBIOS today uses a
> > physical address (the n'th root bus) and the patch would change it to
> > use a logical address.
> > 
> > One of the goals of using an "openfirmware" like address was so that
> > they would be stable across boots (the same mechanism is also used
> > with coreboot).  Using a physical address is key for this, because
> > simply adding or removing a PCI device could cause the logical PCI
> > bridge enumeration to change - and that would mess up the bootorder
> > list if it was based on logical addresses.
> 
> There are two questions here. The first is the inclusion of the
> "pci@i0cf8" node even if a "pci-root@x" node is present in front of it.
> The hunk that changes that is not your main concern, right? (And Marcel
> just described that hunk in more detail.)
> 
> The other question is how "x" is selected in "pci-root@x".
> 
> On the QEMU side, and in OVMF, "x" is keyed off of the bus_nr property.
> If you change that property from (say) 3 to 4, then the device paths
> exported by QEMU will change. However, the location (in the PCI
> hierarchy) of all the affected devices will *also* change at once, and
> their auto-enumerated, firmware-side device paths will reflect that.
> Therefore the new "bootorder" fw_cfg entries will match the freshly
> generated firmware-side device paths.
> 
> So why is this not stable? If you change the hardware without
> automatically updating any stashed firmware-side device paths, then
> things will fall apart without "bootorder" entries in the picture anyway.
> 
> Also, assuming you key off "x" of the running counter that counts root
> buses as they are found during enumeration, that's a possibility too,
> but I don't see how it gives more stability. If you insert a new root
> bus (with a device on it) between to preexistent ones, that will offset
> all the "x" values for the root buses that come after it by one.

The SeaBIOS code is used on both virtual machines and real machines.
The bus number is something that is generated by software and it is
not assured to be stable between boots.  (For example, if someone adds
a PCI device to their machine between boots then every bus number in
the system might be different on the next boot.)  The open firmware
paths go to great length to avoid arbitrary bus numbers today - for
example:

/pci@i0cf8/pci-bridge@1/usb@1,2/hub@3/storage@1/channel@0/disk@0,0

Given the complexity to avoid arbitrary bus numbers I'm confused why
one would want to add them.

> In UEFI at least (I'm not speaking about OVMF in particular, but the
> UEFI spec), there is a "short-form device path" concept for hard drive
> and USB boot options. For hard disks, it is practically a relative
> device path that lacks the path fragment from the root node until just
> before the GPT partition identifier. The idea being, if you plug your
> SCSI controller in another PCI slot, the change in the full device path
> will be local to the path fragment that is not captured in the
> (persistent) boot option. The GPT GUID can identify the partition
> uniquely in the system wherever it exists, so it can be booted even
> without fully enumerating all devices and reproducing all the default
> boot options.
> 
> Short of such a "uniquely identifying relative devpath" trick, I don't
> think stability in firmware-stashed (ie. not regenerated) device paths
> exists in general, if the underlying hardware configuration is changed.

I'm not sure why you say that - it works just fine.  The open firmware
device paths relate a physical path to the given hardware and as long
as one doesn't alter that physical path it will be the same path on
every boot.  (Specifically, one can add or remove unrelated PCI
devices, USB devices, etc. without impacting the open firmware paths
to devices not modified.)

> In summary: I think we could modify both QEMU and OVMF to use the
> "serial numbers" of the extra PCI root buses, in increasing bus number
> order, instead of their actual bus numbers, for identifying them. That's
> just a convention. Then the second hunk of this patch would not be
> necessary for SeaBIOS. But I think this convention would be only less
> logical, and not more stable.
> 
> Can you please elaborate? I'm confused.

-Kevin

Kevin O'Connor June 11, 2015, 4:54 p.m. UTC | #9

On Thu, Jun 11, 2015 at 05:36:06PM +0300, Marcel Apfelbaum wrote:
> On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
> >On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
> >>On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
> >>>On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> >>>>The fixes solves the following issue:
> >>>>The PXB device exposes a new  pci root bridge with the
> >>>>fw path:  /pci-root@4/..., in which 4 is the root bus number.
> >>>>Before this patch the fw path was wrongly computed:
> >>>>     /pci-root@1/pci@i0cf8/...
> >>>>Fix the above issues: Correct the bus number and remove the
> >>>>extra host bridge description.
> >>>
> >>>Why is that wrong?  The previous path looks correct to me.
> >>The prev path includes both the extra root bridge and *then* the usual host bridge.
> >>  /pci-root@1/pci@i0cf8/   ...
> >>     ^ new       ^ regular  ^ devices
> >>
> >>Since the new pci root bridge (and bus) is on "paralel" with the regular one.
> >>it is not correct to add it to the path.
> >>
> >>The architecture is:
> >>  /<host bridge>/devices...
> >>  /extra root bridge/devices...
> >>  /extra root bridge/devices...
> >>And not
> >>/extra root bridge//<host bridge>/devices
> >
> >Your patch changed both the "/extra root bridge/devices..." part and
> >the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
> >correct IMO.
> Why? @1 should be the unit address which is the text representation
> of the physical address, in our case the slot. Since the bus number
> in our case is 4, I think /pci-root@4/ is the 'correct' address.

On real machines, the firmware assigns the 4 - it's not a physical
address; it's a logical address (like all bus numbers in PCI).  The
firmware might assign a totally different number on the next boot.

-Kevin

Marcel Apfelbaum June 11, 2015, 5:46 p.m. UTC | #10

On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
> On Thu, Jun 11, 2015 at 05:36:06PM +0300, Marcel Apfelbaum wrote:
>> On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
>>> On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
>>>> On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
>>>>> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>>>>>> The fixes solves the following issue:
>>>>>> The PXB device exposes a new  pci root bridge with the
>>>>>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>>>>>> Before this patch the fw path was wrongly computed:
>>>>>>      /pci-root@1/pci@i0cf8/...
>>>>>> Fix the above issues: Correct the bus number and remove the
>>>>>> extra host bridge description.
>>>>>
>>>>> Why is that wrong?  The previous path looks correct to me.
>>>> The prev path includes both the extra root bridge and *then* the usual host bridge.
>>>>   /pci-root@1/pci@i0cf8/   ...
>>>>      ^ new       ^ regular  ^ devices
>>>>
>>>> Since the new pci root bridge (and bus) is on "paralel" with the regular one.
>>>> it is not correct to add it to the path.
>>>>
>>>> The architecture is:
>>>>   /<host bridge>/devices...
>>>>   /extra root bridge/devices...
>>>>   /extra root bridge/devices...
>>>> And not
>>>> /extra root bridge//<host bridge>/devices
>>>
>>> Your patch changed both the "/extra root bridge/devices..." part and
>>> the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
>>> correct IMO.
>> Why? @1 should be the unit address which is the text representation
>> of the physical address, in our case the slot. Since the bus number
>> in our case is 4, I think /pci-root@4/ is the 'correct' address.
>
> On real machines, the firmware assigns the 4 - it's not a physical
> address; it's a logical address (like all bus numbers in PCI).  The
> firmware might assign a totally different number on the next boot.
Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
try to understand it.

I looked up a real hardware machine and it seemed to me that the extra pci root numbers
are provided in the ACPI tables, meaning by the vendor, not the fw.
In this case QEMU is the vendor, i440fx is the machine, right?

I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI roots*.
They are doing it for the pci-2-pci bridges of course.
I saw that Seabios is trying to "guess" the root-buses by going over all the 0-0xff range
and probing all the slots, looking for devices. So it expects the hw to be hardwired regarding
PCI root buses.
Is my understanding incorrect?

Thanks,
Marcel






>
> -Kevin
>

Laszlo Ersek June 11, 2015, 6:34 p.m. UTC | #11

On 06/11/15 19:46, Marcel Apfelbaum wrote:
> On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
>> On Thu, Jun 11, 2015 at 05:36:06PM +0300, Marcel Apfelbaum wrote:
>>> On 06/11/2015 05:24 PM, Kevin O'Connor wrote:
>>>> On Thu, Jun 11, 2015 at 05:12:33PM +0300, Marcel Apfelbaum wrote:
>>>>> On 06/11/2015 04:58 PM, Kevin O'Connor wrote:
>>>>>> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>>>>>>> The fixes solves the following issue:
>>>>>>> The PXB device exposes a new  pci root bridge with the
>>>>>>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>>>>>>> Before this patch the fw path was wrongly computed:
>>>>>>>      /pci-root@1/pci@i0cf8/...
>>>>>>> Fix the above issues: Correct the bus number and remove the
>>>>>>> extra host bridge description.
>>>>>>
>>>>>> Why is that wrong?  The previous path looks correct to me.
>>>>> The prev path includes both the extra root bridge and *then* the
>>>>> usual host bridge.
>>>>>   /pci-root@1/pci@i0cf8/   ...
>>>>>      ^ new       ^ regular  ^ devices
>>>>>
>>>>> Since the new pci root bridge (and bus) is on "paralel" with the
>>>>> regular one.
>>>>> it is not correct to add it to the path.
>>>>>
>>>>> The architecture is:
>>>>>   /<host bridge>/devices...
>>>>>   /extra root bridge/devices...
>>>>>   /extra root bridge/devices...
>>>>> And not
>>>>> /extra root bridge//<host bridge>/devices
>>>>
>>>> Your patch changed both the "/extra root bridge/devices..." part and
>>>> the "@1" part.  The change of the "@1" in "/pci-root@1/" is not
>>>> correct IMO.
>>> Why? @1 should be the unit address which is the text representation
>>> of the physical address, in our case the slot. Since the bus number
>>> in our case is 4, I think /pci-root@4/ is the 'correct' address.
>>
>> On real machines, the firmware assigns the 4 - it's not a physical
>> address; it's a logical address (like all bus numbers in PCI).  The
>> firmware might assign a totally different number on the next boot.
> Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
> try to understand it.
> 
> I looked up a real hardware machine and it seemed to me that the extra
> pci root numbers
> are provided in the ACPI tables, meaning by the vendor, not the fw.
> In this case QEMU is the vendor, i440fx is the machine, right?
> 
> I am not aware that Seabios/OVMF are deciding the bus numbers for the
> *PCI roots*.
> They are doing it for the pci-2-pci bridges of course.
> I saw that Seabios is trying to "guess" the root-buses by going over all
> the 0-0xff range
> and probing all the slots, looking for devices. So it expects the hw to
> be hardwired regarding
> PCI root buses.

This is exactly how I understood it.

We're not interested in placing such bus numbers in device paths that
are assigned during PCI enumeration. (Like subordinate bus numbers.)
We're talking about the root bus numbers.

OVMF implements the same kind of probing that SeaBIOS does (based on
natural language description from Michael and Marcel, not on the actual
code). Devices on the root buses respond without any prior bus number
assignments. Therefore it makes sense to place those root bus numbers
into device paths.

The bus numbers assignable by the firmware come from the intervals
*between* the (fixed-in-hardware) root bus numbers. As I understand it,
for two adjacent root bus numbers R1 and R2, the both-sides-exclusive
interval (R1, R2) is available for secondary bus number assignment, to
non-root buses that are (recursively) behind PCI bridges that hang off
the R1 root bus (ie. the LHS of the interval).

We don't care about such firmware-assigned bus numbers at all, but R1,
R2 etc. must be communicated to the firmware *somehow* in order to
identify devices for booting.

Since R1, R2 etc are not *assigned* by the firmware, only detected (the
assignment happens in QEMU, and also by the hw vendor in case of
physical hardware), R1, R2 etc are permanent as long as the physical
configuration does not change. Hence they qualify for the physical
addressing nature of OFW device paths. We've just been looking for a
*syntax* to express them.

> Is my understanding incorrect?

FWIW I'm relieved that at least the two of us have been understanding
each other ;)

Laszlo

Laszlo Ersek June 11, 2015, 6:38 p.m. UTC | #12

On 06/11/15 18:48, Kevin O'Connor wrote:
> On Thu, Jun 11, 2015 at 04:35:33PM +0200, Laszlo Ersek wrote:
>> On 06/11/15 15:58, Kevin O'Connor wrote:
>>> On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
>>>> The fixes solves the following issue:
>>>> The PXB device exposes a new  pci root bridge with the
>>>> fw path:  /pci-root@4/..., in which 4 is the root bus number.
>>>> Before this patch the fw path was wrongly computed:
>>>>     /pci-root@1/pci@i0cf8/...
>>>> Fix the above issues: Correct the bus number and remove the
>>>> extra host bridge description.
>>>
>>> Why is that wrong?  The previous path looks correct to me.
>>>
>>>> The IEEE Std 1275-1994:
>>>>
>>>>   IEEE Standard for Boot (Initialization Configuration)
>>>>     Firmware: Core Requirements and Practices
>>>>       3.2.1.1 Node names
>>>>           Each node in the device tree is identified by a node name
>>>>           using the following notation:
>>>>               driver-name@unit-address:device-arguments
>>>>
>>>>           The driver name field is a sequence of between one and 31
>>>>           letters [...]. By convention, this name includes the name of
>>>>           the device’s manufacturer and the device’s model name separated by
>>>>           a “,”.
>>>>
>>>>           The unit address field is the text representation of the
>>>>           physical address of the device within the address space
>>>>           defined by its parent node. The form of the text
>>>>           representation is bus-dependent.
>>>
>>> Note the "physical address" part in the above.  Your patch changes the
>>> "pci-root@" syntax to use a logical address instead of a physical
>>> address.  That is, unless I've missed something, SeaBIOS today uses a
>>> physical address (the n'th root bus) and the patch would change it to
>>> use a logical address.
>>>
>>> One of the goals of using an "openfirmware" like address was so that
>>> they would be stable across boots (the same mechanism is also used
>>> with coreboot).  Using a physical address is key for this, because
>>> simply adding or removing a PCI device could cause the logical PCI
>>> bridge enumeration to change - and that would mess up the bootorder
>>> list if it was based on logical addresses.
>>
>> There are two questions here. The first is the inclusion of the
>> "pci@i0cf8" node even if a "pci-root@x" node is present in front of it.
>> The hunk that changes that is not your main concern, right? (And Marcel
>> just described that hunk in more detail.)
>>
>> The other question is how "x" is selected in "pci-root@x".
>>
>> On the QEMU side, and in OVMF, "x" is keyed off of the bus_nr property.
>> If you change that property from (say) 3 to 4, then the device paths
>> exported by QEMU will change. However, the location (in the PCI
>> hierarchy) of all the affected devices will *also* change at once, and
>> their auto-enumerated, firmware-side device paths will reflect that.
>> Therefore the new "bootorder" fw_cfg entries will match the freshly
>> generated firmware-side device paths.
>>
>> So why is this not stable? If you change the hardware without
>> automatically updating any stashed firmware-side device paths, then
>> things will fall apart without "bootorder" entries in the picture anyway.
>>
>> Also, assuming you key off "x" of the running counter that counts root
>> buses as they are found during enumeration, that's a possibility too,
>> but I don't see how it gives more stability. If you insert a new root
>> bus (with a device on it) between to preexistent ones, that will offset
>> all the "x" values for the root buses that come after it by one.
> 
> The SeaBIOS code is used on both virtual machines and real machines.
> The bus number is something that is generated by software

Not the root bus numbers, as far as I understand.

(Please see the rest of my reply in the other sub-thread.)

Thanks
Laszlo

> and it is
> not assured to be stable between boots.  (For example, if someone adds
> a PCI device to their machine between boots then every bus number in
> the system might be different on the next boot.)  The open firmware
> paths go to great length to avoid arbitrary bus numbers today - for
> example:
> 
> /pci@i0cf8/pci-bridge@1/usb@1,2/hub@3/storage@1/channel@0/disk@0,0
> 
> Given the complexity to avoid arbitrary bus numbers I'm confused why
> one would want to add them.
> 
>> In UEFI at least (I'm not speaking about OVMF in particular, but the
>> UEFI spec), there is a "short-form device path" concept for hard drive
>> and USB boot options. For hard disks, it is practically a relative
>> device path that lacks the path fragment from the root node until just
>> before the GPT partition identifier. The idea being, if you plug your
>> SCSI controller in another PCI slot, the change in the full device path
>> will be local to the path fragment that is not captured in the
>> (persistent) boot option. The GPT GUID can identify the partition
>> uniquely in the system wherever it exists, so it can be booted even
>> without fully enumerating all devices and reproducing all the default
>> boot options.
>>
>> Short of such a "uniquely identifying relative devpath" trick, I don't
>> think stability in firmware-stashed (ie. not regenerated) device paths
>> exists in general, if the underlying hardware configuration is changed.
> 
> I'm not sure why you say that - it works just fine.  The open firmware
> device paths relate a physical path to the given hardware and as long
> as one doesn't alter that physical path it will be the same path on
> every boot.  (Specifically, one can add or remove unrelated PCI
> devices, USB devices, etc. without impacting the open firmware paths
> to devices not modified.)
> 
>> In summary: I think we could modify both QEMU and OVMF to use the
>> "serial numbers" of the extra PCI root buses, in increasing bus number
>> order, instead of their actual bus numbers, for identifying them. That's
>> just a convention. Then the second hunk of this patch would not be
>> necessary for SeaBIOS. But I think this convention would be only less
>> logical, and not more stable.
>>
>> Can you please elaborate? I'm confused.
> 
> -Kevin
> 
> _______________________________________________
> SeaBIOS mailing list
> SeaBIOS@seabios.org
> http://www.seabios.org/mailman/listinfo/seabios
>

Kevin O'Connor June 11, 2015, 7:10 p.m. UTC | #13

On Thu, Jun 11, 2015 at 08:46:01PM +0300, Marcel Apfelbaum wrote:
> On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
> >On real machines, the firmware assigns the 4 - it's not a physical
> >address; it's a logical address (like all bus numbers in PCI).  The
> >firmware might assign a totally different number on the next boot.
> Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
> try to understand it.
> 
> I looked up a real hardware machine and it seemed to me that the extra pci root numbers
> are provided in the ACPI tables, meaning by the vendor, not the fw.
> In this case QEMU is the vendor, i440fx is the machine, right?
> 
> I am not aware that Seabios/OVMF are deciding the bus numbers for the *PCI roots*.

So, I'm also not an expert on this.  It seems to be a fairly esoteric
area of PC initialization.

My understanding is that extra PCI roots are configured by coreboot
outside of the normal PCI bridge mechanism.  They are configured by
assigning a base bus number and range (similar to the way PCI bridges
are configured).  All the PCI roots see all the PCI traffic, but they
only forward those requests that fall within their assigned bus range.

On each boot, coreboot might decide to assign a different bus id to
the extra roots (for example, if a device with a PCI bridge is
inserted and it's bus allocation causes bus ids to shift).
Technically, coreboot could even change the order extra buses are
assigned bus ids, but doesn't today.

This was seen on several AMD systems - I'm told at least some Intel
systems have multiple root buses, but the bus numbers are just hard
wired.

> They are doing it for the pci-2-pci bridges of course.
> I saw that Seabios is trying to "guess" the root-buses by going over all the 0-0xff range
> and probing all the slots, looking for devices. So it expects the hw to be hardwired regarding
> PCI root buses.
> Is my understanding incorrect?

SeaBIOS doesn't assign the extra PCI bus numbers on real hardware (nor
even regular PCI bridge numbers) - that's all handled by coreboot.

Under coreboot, SeaBIOS scans the PCI buses to figure out what
coreboot assigned - it doesn't mean the assignments are hard wired.

-Kevin

Kevin O'Connor June 11, 2015, 7:24 p.m. UTC | #14

On Thu, Jun 11, 2015 at 08:34:56PM +0200, Laszlo Ersek wrote:
> On 06/11/15 19:46, Marcel Apfelbaum wrote:
> > On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
> >> On real machines, the firmware assigns the 4 - it's not a physical
> >> address; it's a logical address (like all bus numbers in PCI).  The
> >> firmware might assign a totally different number on the next boot.
> > Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
> > try to understand it.
> > 
> > I looked up a real hardware machine and it seemed to me that the extra
> > pci root numbers
> > are provided in the ACPI tables, meaning by the vendor, not the fw.
> > In this case QEMU is the vendor, i440fx is the machine, right?
> > 
> > I am not aware that Seabios/OVMF are deciding the bus numbers for the
> > *PCI roots*.
> > They are doing it for the pci-2-pci bridges of course.
> > I saw that Seabios is trying to "guess" the root-buses by going over all
> > the 0-0xff range
> > and probing all the slots, looking for devices. So it expects the hw to
> > be hardwired regarding
> > PCI root buses.
> 
> This is exactly how I understood it.
> 
> We're not interested in placing such bus numbers in device paths that
> are assigned during PCI enumeration. (Like subordinate bus numbers.)
> We're talking about the root bus numbers.
> 
> OVMF implements the same kind of probing that SeaBIOS does (based on
> natural language description from Michael and Marcel, not on the actual
> code). Devices on the root buses respond without any prior bus number
> assignments.

Alas, that is not correct.  Coreboot supports several AMD boards that
have multiple southbridge chips which provide independent PCI root
buses.  These chips have to be configured and assigned a bus number
prior to use (which coreboot does).

-Kevin

Gerd Hoffmann June 12, 2015, 6 a.m. UTC | #15

Hi,

> On each boot, coreboot might decide to assign a different bus id to
> the extra roots (for example, if a device with a PCI bridge is
> inserted and it's bus allocation causes bus ids to shift).
> Technically, coreboot could even change the order extra buses are
> assigned bus ids, but doesn't today.
> 
> This was seen on several AMD systems - I'm told at least some Intel
> systems have multiple root buses, but the bus numbers are just hard
> wired.

This is how the qemu pxb works: root bus numbers are a config option for
the root bridge device, i.e. from the guest point of view they are
hard-wired.

cheers,
  Gerd

Laszlo Ersek June 12, 2015, 9:25 a.m. UTC | #16

On 06/11/15 21:24, Kevin O'Connor wrote:
> On Thu, Jun 11, 2015 at 08:34:56PM +0200, Laszlo Ersek wrote:
>> On 06/11/15 19:46, Marcel Apfelbaum wrote:
>>> On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
>>>> On real machines, the firmware assigns the 4 - it's not a physical
>>>> address; it's a logical address (like all bus numbers in PCI).  The
>>>> firmware might assign a totally different number on the next boot.
>>> Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
>>> try to understand it.
>>>
>>> I looked up a real hardware machine and it seemed to me that the extra
>>> pci root numbers
>>> are provided in the ACPI tables, meaning by the vendor, not the fw.
>>> In this case QEMU is the vendor, i440fx is the machine, right?
>>>
>>> I am not aware that Seabios/OVMF are deciding the bus numbers for the
>>> *PCI roots*.
>>> They are doing it for the pci-2-pci bridges of course.
>>> I saw that Seabios is trying to "guess" the root-buses by going over all
>>> the 0-0xff range
>>> and probing all the slots, looking for devices. So it expects the hw to
>>> be hardwired regarding
>>> PCI root buses.
>>
>> This is exactly how I understood it.
>>
>> We're not interested in placing such bus numbers in device paths that
>> are assigned during PCI enumeration. (Like subordinate bus numbers.)
>> We're talking about the root bus numbers.
>>
>> OVMF implements the same kind of probing that SeaBIOS does (based on
>> natural language description from Michael and Marcel, not on the actual
>> code). Devices on the root buses respond without any prior bus number
>> assignments.
> 
> Alas, that is not correct.  Coreboot supports several AMD boards that
> have multiple southbridge chips which provide independent PCI root
> buses.  These chips have to be configured and assigned a bus number
> prior to use (which coreboot does).

Thanks.

Assuming such a physical hardware configuration, and that Coreboot
configures the root buses before the SeaBIOS payload is launched: how
does Coreboot identify a device, on a nonzero root bus, for SeaBIOS to
boot from? Is that possible at all, or is the user expected to configure
/ select that in SeaBIOS exclusively?

Our use case does not include Coreboot (as far as I can tell), but I'm
trying to find some parallels here.

* In the "QEMU without Coreboot" case, QEMU is the component that sets
up the root buses for the firmware. Therefore it has all knowledge about
the root buses. OVMF is meant solely for QEMU "hardware", therefore it
has a full understanding with QEMU. QEMU can refer to root buses in the
"bootorder" fw_cfg file because it owns both the root buses and the
"bootorder" fw_cfg file, and OVMF can trust them to match.

* In the "physical hardware with Coreboot" case, Coreboot is the
component that sets up the root buses for the firmware (SeaBIOS).
Coreboot *could* refer to the root buses in some boot order file (a cbfs
file I guess?) -- if such a feature existed between Coreboot and SeaBIOS
-- because Coreboot would own both the root buses and the (theoretical)
cbfs boot order file. Hence SeaBIOS could trust them to match.

Assuming there is no such feature between Coreboot and SeaBIOS (ie. one
that would parallel our QEMU use case on physical hardware), what
solution would you find acceptable for the case when QEMU basically
promises "I know where you'll find those root buses, and the bootorder
fw_cfg file will match them"?

Could we simply make this patch conditional on runningOnQEMU()?

Thanks
Laszlo

Marcel Apfelbaum June 12, 2015, 12:17 p.m. UTC | #17

On 06/12/2015 09:00 AM, Gerd Hoffmann wrote:
>    Hi,
>
>> On each boot, coreboot might decide to assign a different bus id to
>> the extra roots (for example, if a device with a PCI bridge is
>> inserted and it's bus allocation causes bus ids to shift).
>> Technically, coreboot could even change the order extra buses are
>> assigned bus ids, but doesn't today.
>>
>> This was seen on several AMD systems - I'm told at least some Intel
>> systems have multiple root buses, but the bus numbers are just hard
>> wired.
>
> This is how the qemu pxb works: root bus numbers are a config option for
> the root bridge device, i.e. from the guest point of view they are
> hard-wired.
Exactly. In our case, the HW assigns the PXB bus bumber, and again, I saw this
also on real HW with multiple buses, the bus nr comes from ACPI, meaning the vendor.

Let's focus on the problem in hand:
We need a way for QEMU to write some fw path on bootorder fw_config file and
both Seabios/OVMF need to know how to correctly map this to the actual device.

If the boot device is behind a PXI extra root bus, there is a need not only to
differentiate the root bus but also to know *which one*. So we need the bus number,
what other way is there? As Gerd mentioned, the PXB bus number is provided in QEMU command line,
meaning hard-wired.

We can of course, as Laszlo suggested, add an extra condition the use of this path:
/pci-root@bus-br/ on running in QEMU in order not to interfere with other HW. Less pretty but
more robust.

Thanks,
Marcel

>
> cheers,
>    Gerd
>
>

Kevin O'Connor June 12, 2015, 1:03 p.m. UTC | #18

On Fri, Jun 12, 2015 at 11:25:50AM +0200, Laszlo Ersek wrote:
> On 06/11/15 21:24, Kevin O'Connor wrote:
> > On Thu, Jun 11, 2015 at 08:34:56PM +0200, Laszlo Ersek wrote:
> >> On 06/11/15 19:46, Marcel Apfelbaum wrote:
> >>> On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
> >>>> On real machines, the firmware assigns the 4 - it's not a physical
> >>>> address; it's a logical address (like all bus numbers in PCI).  The
> >>>> firmware might assign a totally different number on the next boot.
> >>> Now I am confused. Don't get me wrong, I am not an expert on fw, I hardly
> >>> try to understand it.
> >>>
> >>> I looked up a real hardware machine and it seemed to me that the extra
> >>> pci root numbers
> >>> are provided in the ACPI tables, meaning by the vendor, not the fw.
> >>> In this case QEMU is the vendor, i440fx is the machine, right?
> >>>
> >>> I am not aware that Seabios/OVMF are deciding the bus numbers for the
> >>> *PCI roots*.
> >>> They are doing it for the pci-2-pci bridges of course.
> >>> I saw that Seabios is trying to "guess" the root-buses by going over all
> >>> the 0-0xff range
> >>> and probing all the slots, looking for devices. So it expects the hw to
> >>> be hardwired regarding
> >>> PCI root buses.
> >>
> >> This is exactly how I understood it.
> >>
> >> We're not interested in placing such bus numbers in device paths that
> >> are assigned during PCI enumeration. (Like subordinate bus numbers.)
> >> We're talking about the root bus numbers.
> >>
> >> OVMF implements the same kind of probing that SeaBIOS does (based on
> >> natural language description from Michael and Marcel, not on the actual
> >> code). Devices on the root buses respond without any prior bus number
> >> assignments.
> > 
> > Alas, that is not correct.  Coreboot supports several AMD boards that
> > have multiple southbridge chips which provide independent PCI root
> > buses.  These chips have to be configured and assigned a bus number
> > prior to use (which coreboot does).
> 
> Thanks.
> 
> Assuming such a physical hardware configuration, and that Coreboot
> configures the root buses before the SeaBIOS payload is launched: how
> does Coreboot identify a device, on a nonzero root bus, for SeaBIOS to
> boot from? Is that possible at all, or is the user expected to configure
> / select that in SeaBIOS exclusively?

Coreboot does not provide information on what to boot.  It's task is
low level hardware initialization.  It's the job of SeaBIOS to boot
the OS (and determine which media, etc to boot from).  SeaBIOS gets
boot preference information from a static configuration file
(bootorder) stored in flash (cbfs).

> Assuming there is no such feature between Coreboot and SeaBIOS (ie. one
> that would parallel our QEMU use case on physical hardware), what
> solution would you find acceptable for the case when QEMU basically
> promises "I know where you'll find those root buses, and the bootorder
> fw_cfg file will match them"?

We currently go to great lengths to avoid logical identifiers in
bootorder and I'm confused why we would wish to add them now.  Bus
number is not currently used anywhere in bootorder because (in the
general case) it's an arbitrary identifier that's not stable between
boots and (in the general case) may not be stable even within a boot.

I understand that in this specific case (extra root buses on QEMU) it
is stable within a boot, but it seems strange that we'd want to define
the interface knowing it's a poor choice in the general case.

As for what I would suggest - well, SeaBIOS has already supported
multiple root buses for years and already has a mechanism for
deterministically specifying a device on an extra root bus.  (By
specifying the N'th extra root bus instead of specifying the logical
id given to that bus).  This is by no means a perfect solution and
it's certainly open to change - but the current proposed patches
appear to be regressions to me.

> Could we simply make this patch conditional on runningOnQEMU()?

It's possible.  I'd certainly prefer to avoid adding special cases if
possible.

-Kevin

Kevin O'Connor June 12, 2015, 1:23 p.m. UTC | #19

On Fri, Jun 12, 2015 at 03:17:27PM +0300, Marcel Apfelbaum wrote:
> On 06/12/2015 09:00 AM, Gerd Hoffmann wrote:
> >>On each boot, coreboot might decide to assign a different bus id to
> >>the extra roots (for example, if a device with a PCI bridge is
> >>inserted and it's bus allocation causes bus ids to shift).
> >>Technically, coreboot could even change the order extra buses are
> >>assigned bus ids, but doesn't today.
> >>
> >>This was seen on several AMD systems - I'm told at least some Intel
> >>systems have multiple root buses, but the bus numbers are just hard
> >>wired.
> >
> >This is how the qemu pxb works: root bus numbers are a config option for
> >the root bridge device, i.e. from the guest point of view they are
> >hard-wired.
> Exactly. In our case, the HW assigns the PXB bus bumber, and again,
> I saw this also on real HW with multiple buses, the bus nr comes
> from ACPI, meaning the vendor.

I'm confused where ACPI comes into this.  In all cases I know of, the
firmware generates the ACPI tables to match the hardware.  I've never
heard of hardware configuring itself from the ACPI tables.

> Let's focus on the problem in hand: We need a way for QEMU to write
> some fw path on bootorder fw_config file and both Seabios/OVMF need
> to know how to correctly map this to the actual device.
> 
> If the boot device is behind a PXI extra root bus, there is a need
> not only to differentiate the root bus but also to know *which
> one*. So we need the bus number, what other way is there?

The submitted patch changed the mechanism already in SeaBIOS.  I'm not
claiming the existing mechanism was perfect, but lets not claim that
it's not possible either.

>As Gerd
> mentioned, the PXB bus number is provided in QEMU command line,
> meaning hard-wired.
> 
> We can of course, as Laszlo suggested, add an extra condition the
> use of this path: /pci-root@bus-br/ on running in QEMU in order not
> to interfere with other HW. Less pretty but more robust.

-Kevin

Laszlo Ersek June 12, 2015, 3:45 p.m. UTC | #20

On 06/12/15 15:03, Kevin O'Connor wrote
> On Fri, Jun 12, 2015 at 11:25:50AM +0200, Laszlo Ersek wrote:
>> On 06/11/15 21:24, Kevin O'Connor wrote:
>>> On Thu, Jun 11, 2015 at 08:34:56PM +0200, Laszlo Ersek wrote:
>>>> On 06/11/15 19:46, Marcel Apfelbaum wrote:
>>>>> On 06/11/2015 07:54 PM, Kevin O'Connor wrote:
>>>>>> On real machines, the firmware assigns the 4 - it's not a
>>>>>> physical address; it's a logical address (like all bus numbers in
>>>>>> PCI).  The firmware might assign a totally different number on
>>>>>> the next boot.
>>>>> Now I am confused. Don't get me wrong, I am not an expert on fw, I
>>>>> hardly try to understand it.
>>>>>
>>>>> I looked up a real hardware machine and it seemed to me that the
>>>>> extra pci root numbers are provided in the ACPI tables, meaning by
>>>>> the vendor, not the fw. In this case QEMU is the vendor, i440fx is
>>>>> the machine, right?
>>>>>
>>>>> I am not aware that Seabios/OVMF are deciding the bus numbers for
>>>>> the *PCI roots*.
>>>>> They are doing it for the pci-2-pci bridges of course.
>>>>> I saw that Seabios is trying to "guess" the root-buses by going
>>>>> over all the 0-0xff range and probing all the slots, looking for
>>>>> devices. So it expects the hw to be hardwired regarding PCI root
>>>>> buses.
>>>>
>>>> This is exactly how I understood it.
>>>>
>>>> We're not interested in placing such bus numbers in device paths
>>>> that are assigned during PCI enumeration. (Like subordinate bus
>>>> numbers.) We're talking about the root bus numbers.
>>>>
>>>> OVMF implements the same kind of probing that SeaBIOS does (based
>>>> on natural language description from Michael and Marcel, not on the
>>>> actual code). Devices on the root buses respond without any prior
>>>> bus number assignments.
>>>
>>> Alas, that is not correct.  Coreboot supports several AMD boards
>>> that have multiple southbridge chips which provide independent PCI
>>> root buses.  These chips have to be configured and assigned a bus
>>> number prior to use (which coreboot does).
>>
>> Thanks.
>>
>> Assuming such a physical hardware configuration, and that Coreboot
>> configures the root buses before the SeaBIOS payload is launched: how
>> does Coreboot identify a device, on a nonzero root bus, for SeaBIOS
>> to boot from? Is that possible at all, or is the user expected to
>> configure / select that in SeaBIOS exclusively?
>
> Coreboot does not provide information on what to boot.  It's task is
> low level hardware initialization.  It's the job of SeaBIOS to boot
> the OS (and determine which media, etc to boot from).  SeaBIOS gets
> boot preference information from a static configuration file
> (bootorder) stored in flash (cbfs).
>
>> Assuming there is no such feature between Coreboot and SeaBIOS (ie.
>> one that would parallel our QEMU use case on physical hardware), what
>> solution would you find acceptable for the case when QEMU basically
>> promises "I know where you'll find those root buses, and the
>> bootorder fw_cfg file will match them"?
>
> We currently go to great lengths to avoid logical identifiers in
> bootorder and I'm confused why we would wish to add them now.  Bus
> number is not currently used anywhere in bootorder because (in the
> general case) it's an arbitrary identifier that's not stable between
> boots and (in the general case) may not be stable even within a boot.
>
> I understand that in this specific case (extra root buses on QEMU) it
> is stable within a boot, but it seems strange that we'd want to define
> the interface knowing it's a poor choice in the general case.
>
> As for what I would suggest - well, SeaBIOS has already supported
> multiple root buses for years and already has a mechanism for
> deterministically specifying a device on an extra root bus.  (By
> specifying the N'th extra root bus instead of specifying the logical
> id given to that bus).  This is by no means a perfect solution and
> it's certainly open to change - but the current proposed patches
> appear to be regressions to me.
>
>> Could we simply make this patch conditional on runningOnQEMU()?
>
> It's possible.  I'd certainly prefer to avoid adding special cases if
> possible.

Okay. Let's compare the two options we appear to have:

(1) A patch like this for SeaBIOS:

> diff --git a/src/boot.c b/src/boot.c
> index ec59c37..c7fd091 100644
> --- a/src/boot.c
> +++ b/src/boot.c
> @@ -114,7 +114,8 @@ build_pci_path(char *buf, int max, const char *devname, struct pci_device *pci)
>      } else {
>          if (pci->rootbus)
>              p += snprintf(p, max, "/pci-root@%x", pci->rootbus);
> -        p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
> +        if (!runningOnQEMU() || !pci->rootbus)
> +            p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
>      }
>
>      int dev = pci_bdf_to_dev(pci->bdf), fn = pci_bdf_to_fn(pci->bdf);
> diff --git a/src/hw/pci.c b/src/hw/pci.c
> index 0379b55..169a040 100644
> --- a/src/hw/pci.c
> +++ b/src/hw/pci.c
> @@ -13,6 +13,7 @@
>  #include "string.h" // memset
>  #include "util.h" // udelay
>  #include "x86.h" // outl
> +#include "fw/paravirt.h" // runningOnQEMU
>
>  void pci_config_writel(u16 bdf, u32 addr, u32 val)
>  {
> @@ -133,7 +134,7 @@ pci_probe_devices(void)
>                  if (bus != lastbus)
>                      rootbuses++;
>                  lastbus = bus;
> -                rootbus = rootbuses;
> +                rootbus = runningOnQEMU() ? bus : rootbuses;
>                  if (bus > MaxPCIBus)
>                      MaxPCIBus = bus;
>              } else {

(2) The QEMU command line and the effects the command line has on the
virtual hardware should not change. However, all of the following have
to be updated:

- the "explicit_ofw_unit_address" property assignments in
  pxb_dev_initfn() have to be done separately, after all PXBs have been
  seen, and sorted. This probably requires another "machine init done"
  notifier.

- the _UID assignments in build_ssdt() need to reflect the exact same
  values

- OVMF's root bridge driver needs to generate the same _UID values in
  the PciRoot() device path nodes

- OVMF's boot order library must consider the /pci-root@N/pci@i0cf8/...
  format, where the root bus is the N'th extra root bus (in hex
  notation).

Basically, we need to keep the bus_nr=N user interface, and the effects
it has on the virtual hardware, intact, but translate the numbers that
are exposed via fw_cfg *and* ACPI (because those must match!) from
"identifier" to "serial number after sorting by identifier"; in practice
replicating the detection traversal that SeaBIOS does.

Doesn't seem impossible (unless Marcel raises a design-level issue with
it), but I'll have to withdraw for a while and research it.

Thanks
Laszlo

Kevin O'Connor June 12, 2015, 6:40 p.m. UTC | #21

On Fri, Jun 12, 2015 at 05:45:04PM +0200, Laszlo Ersek wrote:
> On 06/12/15 15:03, Kevin O'Connor wrote
> > As for what I would suggest - well, SeaBIOS has already supported
> > multiple root buses for years and already has a mechanism for
> > deterministically specifying a device on an extra root bus.  (By
> > specifying the N'th extra root bus instead of specifying the logical
> > id given to that bus).  This is by no means a perfect solution and
> > it's certainly open to change - but the current proposed patches
> > appear to be regressions to me.
> >
> >> Could we simply make this patch conditional on runningOnQEMU()?
> >
> > It's possible.  I'd certainly prefer to avoid adding special cases if
> > possible.
> 
> Okay. Let's compare the two options we appear to have:
> 
> (1) A patch like this for SeaBIOS:
> 
> > diff --git a/src/boot.c b/src/boot.c
> > index ec59c37..c7fd091 100644
> > --- a/src/boot.c
> > +++ b/src/boot.c
> > @@ -114,7 +114,8 @@ build_pci_path(char *buf, int max, const char *devname, struct pci_device *pci)
> >      } else {
> >          if (pci->rootbus)
> >              p += snprintf(p, max, "/pci-root@%x", pci->rootbus);
> > -        p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
> > +        if (!runningOnQEMU() || !pci->rootbus)
> > +            p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
> >      }
> >
> >      int dev = pci_bdf_to_dev(pci->bdf), fn = pci_bdf_to_fn(pci->bdf);
> > diff --git a/src/hw/pci.c b/src/hw/pci.c
> > index 0379b55..169a040 100644
> > --- a/src/hw/pci.c
> > +++ b/src/hw/pci.c
> > @@ -13,6 +13,7 @@
> >  #include "string.h" // memset
> >  #include "util.h" // udelay
> >  #include "x86.h" // outl
> > +#include "fw/paravirt.h" // runningOnQEMU
> >
> >  void pci_config_writel(u16 bdf, u32 addr, u32 val)
> >  {
> > @@ -133,7 +134,7 @@ pci_probe_devices(void)
> >                  if (bus != lastbus)
> >                      rootbuses++;
> >                  lastbus = bus;
> > -                rootbus = rootbuses;
> > +                rootbus = runningOnQEMU() ? bus : rootbuses;
> >                  if (bus > MaxPCIBus)
> >                      MaxPCIBus = bus;
> >              } else {

If we went down this path, I hope we could agree on the same prefix
and thus limit the runningOnQEMU() to just the second part.  Was there
a concern with "/pci@i0cf8,%x/"?

> (2) The QEMU command line and the effects the command line has on the
> virtual hardware should not change. However, all of the following have
> to be updated:
> 
> - the "explicit_ofw_unit_address" property assignments in
>   pxb_dev_initfn() have to be done separately, after all PXBs have been
>   seen, and sorted. This probably requires another "machine init done"
>   notifier.

I admit the sorting of pxb objects just to reverse engineer what
SeaBIOS expects would not be fun.  Doesn't QEMU have to sort the buses
anyway to know which secondary bus ranges are associated with each
root bus?

> - the _UID assignments in build_ssdt() need to reflect the exact same
>   values
> 
> - OVMF's root bridge driver needs to generate the same _UID values in
>   the PciRoot() device path nodes
> 
> - OVMF's boot order library must consider the /pci-root@N/pci@i0cf8/...
>   format, where the root bus is the N'th extra root bus (in hex
>   notation).
> 
> Basically, we need to keep the bus_nr=N user interface, and the effects
> it has on the virtual hardware, intact, but translate the numbers that
> are exposed via fw_cfg *and* ACPI (because those must match!) from
> "identifier" to "serial number after sorting by identifier"; in practice
> replicating the detection traversal that SeaBIOS does.

Why does fw_cfg and ACPI have to match?

-Kevin

Laszlo Ersek June 12, 2015, 8:13 p.m. UTC | #22

On 06/12/15 20:40, Kevin O'Connor wrote:
> On Fri, Jun 12, 2015 at 05:45:04PM +0200, Laszlo Ersek wrote:
>> On 06/12/15 15:03, Kevin O'Connor wrote
>>> As for what I would suggest - well, SeaBIOS has already supported
>>> multiple root buses for years and already has a mechanism for
>>> deterministically specifying a device on an extra root bus.  (By
>>> specifying the N'th extra root bus instead of specifying the logical
>>> id given to that bus).  This is by no means a perfect solution and
>>> it's certainly open to change - but the current proposed patches
>>> appear to be regressions to me.
>>>
>>>> Could we simply make this patch conditional on runningOnQEMU()?
>>>
>>> It's possible.  I'd certainly prefer to avoid adding special cases if
>>> possible.
>>
>> Okay. Let's compare the two options we appear to have:
>>
>> (1) A patch like this for SeaBIOS:
>>
>>> diff --git a/src/boot.c b/src/boot.c
>>> index ec59c37..c7fd091 100644
>>> --- a/src/boot.c
>>> +++ b/src/boot.c
>>> @@ -114,7 +114,8 @@ build_pci_path(char *buf, int max, const char *devname, struct pci_device *pci)
>>>      } else {
>>>          if (pci->rootbus)
>>>              p += snprintf(p, max, "/pci-root@%x", pci->rootbus);
>>> -        p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
>>> +        if (!runningOnQEMU() || !pci->rootbus)
>>> +            p += snprintf(p, buf+max-p, "%s", FW_PCI_DOMAIN);
>>>      }
>>>
>>>      int dev = pci_bdf_to_dev(pci->bdf), fn = pci_bdf_to_fn(pci->bdf);
>>> diff --git a/src/hw/pci.c b/src/hw/pci.c
>>> index 0379b55..169a040 100644
>>> --- a/src/hw/pci.c
>>> +++ b/src/hw/pci.c
>>> @@ -13,6 +13,7 @@
>>>  #include "string.h" // memset
>>>  #include "util.h" // udelay
>>>  #include "x86.h" // outl
>>> +#include "fw/paravirt.h" // runningOnQEMU
>>>
>>>  void pci_config_writel(u16 bdf, u32 addr, u32 val)
>>>  {
>>> @@ -133,7 +134,7 @@ pci_probe_devices(void)
>>>                  if (bus != lastbus)
>>>                      rootbuses++;
>>>                  lastbus = bus;
>>> -                rootbus = rootbuses;
>>> +                rootbus = runningOnQEMU() ? bus : rootbuses;
>>>                  if (bus > MaxPCIBus)
>>>                      MaxPCIBus = bus;
>>>              } else {
> 
> If we went down this path, I hope we could agree on the same prefix
> and thus limit the runningOnQEMU() to just the second part.  Was there
> a concern with "/pci@i0cf8,%x/"?

I don't recall any specific concern, but if we want to present either
/pci@i0cf8,%x/, or the pattern SeaBIOS currently expects, then in QEMU
the same stuff has to be poked at anyway.

> 
>> (2) The QEMU command line and the effects the command line has on the
>> virtual hardware should not change. However, all of the following have
>> to be updated:
>>
>> - the "explicit_ofw_unit_address" property assignments in
>>   pxb_dev_initfn() have to be done separately, after all PXBs have been
>>   seen, and sorted. This probably requires another "machine init done"
>>   notifier.
> 
> I admit the sorting of pxb objects just to reverse engineer what
> SeaBIOS expects would not be fun.

Actually it's kinda fun. :)

> Doesn't QEMU have to sort the buses
> anyway to know which secondary bus ranges are associated with each
> root bus?

I don't think so.

OVMF does the same probing (in the same order) as SeaBIOS for the root
buses, and the intervals between each pair are handed to edk2's PCI bus
driver (which is independent of the OVMF platform code). This latter
driver performs the assignments / allocations from the allowed interval.

>> - the _UID assignments in build_ssdt() need to reflect the exact same
>>   values
>>
>> - OVMF's root bridge driver needs to generate the same _UID values in
>>   the PciRoot() device path nodes
>>
>> - OVMF's boot order library must consider the /pci-root@N/pci@i0cf8/...
>>   format, where the root bus is the N'th extra root bus (in hex
>>   notation).
>>
>> Basically, we need to keep the bus_nr=N user interface, and the effects
>> it has on the virtual hardware, intact, but translate the numbers that
>> are exposed via fw_cfg *and* ACPI (because those must match!) from
>> "identifier" to "serial number after sorting by identifier"; in practice
>> replicating the detection traversal that SeaBIOS does.
> 
> Why does fw_cfg and ACPI have to match?

You will like the explanation for that. I'm about to test a new QEMU
series, and I'll CC you on the relevant patches. The last patch in the
series answers this question.

(In a nutshell: OVMF translates OFW devpath fragments to UEFI devpath
fragments, for boot option matching. The UEFI devpath fragments in
question start with a PciRoot() ACPI node, and the _UID value in that
initial node must match what QEMU hands down and OVMF uses in
translation. Finally, the UEFI spec requires that such a _UID value, in
the UEFI devpath, actually identify the valid ACPI object -- and that
ACPI object comes from QEMU's generator.)

If everything goes well, we might not need to change SeaBIOS at all.
(... Hope dies last :))

Thanks
Laszlo

Michael S. Tsirkin June 14, 2015, 12:05 p.m. UTC | #23

On Fri, Jun 12, 2015 at 02:40:10PM -0400, Kevin O'Connor wrote:
> > (2) The QEMU command line and the effects the command line has on the
> > virtual hardware should not change. However, all of the following have
> > to be updated:
> > 
> > - the "explicit_ofw_unit_address" property assignments in
> >   pxb_dev_initfn() have to be done separately, after all PXBs have been
> >   seen, and sorted. This probably requires another "machine init done"
> >   notifier.
> 
> I admit the sorting of pxb objects just to reverse engineer what
> SeaBIOS expects would not be fun.  Doesn't QEMU have to sort the buses
> anyway to know which secondary bus ranges are associated with each
> root bus?

I guess it can be done - though this means it will break if
we ever support hot-plugging of these roots.
But more importantly, if the sort is by the bus number,
then how is it better than just using the bus number directly?

Michael S. Tsirkin June 14, 2015, 12:10 p.m. UTC | #24

On Thu, Jun 11, 2015 at 12:48:22PM -0400, Kevin O'Connor wrote:
> On Thu, Jun 11, 2015 at 04:35:33PM +0200, Laszlo Ersek wrote:
> > On 06/11/15 15:58, Kevin O'Connor wrote:
> > > On Thu, Jun 11, 2015 at 04:37:08PM +0300, Marcel Apfelbaum wrote:
> > >> The fixes solves the following issue:
> > >> The PXB device exposes a new  pci root bridge with the
> > >> fw path:  /pci-root@4/..., in which 4 is the root bus number.
> > >> Before this patch the fw path was wrongly computed:
> > >>     /pci-root@1/pci@i0cf8/...
> > >> Fix the above issues: Correct the bus number and remove the
> > >> extra host bridge description.
> > > 
> > > Why is that wrong?  The previous path looks correct to me.
> > > 
> > >> The IEEE Std 1275-1994:
> > >>
> > >>   IEEE Standard for Boot (Initialization Configuration)
> > >>     Firmware: Core Requirements and Practices
> > >>       3.2.1.1 Node names
> > >>           Each node in the device tree is identified by a node name
> > >>           using the following notation:
> > >>               driver-name@unit-address:device-arguments
> > >>
> > >>           The driver name field is a sequence of between one and 31
> > >>           letters [...]. By convention, this name includes the name of
> > >>           the device’s manufacturer and the device’s model name separated by
> > >>           a “,”.
> > >>
> > >>           The unit address field is the text representation of the
> > >>           physical address of the device within the address space
> > >>           defined by its parent node. The form of the text
> > >>           representation is bus-dependent.
> > > 
> > > Note the "physical address" part in the above.  Your patch changes the
> > > "pci-root@" syntax to use a logical address instead of a physical
> > > address.  That is, unless I've missed something, SeaBIOS today uses a
> > > physical address (the n'th root bus) and the patch would change it to
> > > use a logical address.
> > > 
> > > One of the goals of using an "openfirmware" like address was so that
> > > they would be stable across boots (the same mechanism is also used
> > > with coreboot).  Using a physical address is key for this, because
> > > simply adding or removing a PCI device could cause the logical PCI
> > > bridge enumeration to change - and that would mess up the bootorder
> > > list if it was based on logical addresses.
> > 
> > There are two questions here. The first is the inclusion of the
> > "pci@i0cf8" node even if a "pci-root@x" node is present in front of it.
> > The hunk that changes that is not your main concern, right? (And Marcel
> > just described that hunk in more detail.)
> > 
> > The other question is how "x" is selected in "pci-root@x".
> > 
> > On the QEMU side, and in OVMF, "x" is keyed off of the bus_nr property.
> > If you change that property from (say) 3 to 4, then the device paths
> > exported by QEMU will change. However, the location (in the PCI
> > hierarchy) of all the affected devices will *also* change at once, and
> > their auto-enumerated, firmware-side device paths will reflect that.
> > Therefore the new "bootorder" fw_cfg entries will match the freshly
> > generated firmware-side device paths.
> > 
> > So why is this not stable? If you change the hardware without
> > automatically updating any stashed firmware-side device paths, then
> > things will fall apart without "bootorder" entries in the picture anyway.
> > 
> > Also, assuming you key off "x" of the running counter that counts root
> > buses as they are found during enumeration, that's a possibility too,
> > but I don't see how it gives more stability. If you insert a new root
> > bus (with a device on it) between to preexistent ones, that will offset
> > all the "x" values for the root buses that come after it by one.
> 
> The SeaBIOS code is used on both virtual machines and real machines.
> The bus number is something that is generated by software and it is
> not assured to be stable between boots.  (For example, if someone adds
> a PCI device to their machine between boots then every bus number in
> the system might be different on the next boot.)  The open firmware
> paths go to great length to avoid arbitrary bus numbers today - for
> example:
> 
> /pci@i0cf8/pci-bridge@1/usb@1,2/hub@3/storage@1/channel@0/disk@0,0
> 
> Given the complexity to avoid arbitrary bus numbers I'm confused why
> one would want to add them.

Could you give an example real-hardware path when there are multiple
roots though?
I'd like to make sure what qemu generates matches that.


> > In UEFI at least (I'm not speaking about OVMF in particular, but the
> > UEFI spec), there is a "short-form device path" concept for hard drive
> > and USB boot options. For hard disks, it is practically a relative
> > device path that lacks the path fragment from the root node until just
> > before the GPT partition identifier. The idea being, if you plug your
> > SCSI controller in another PCI slot, the change in the full device path
> > will be local to the path fragment that is not captured in the
> > (persistent) boot option. The GPT GUID can identify the partition
> > uniquely in the system wherever it exists, so it can be booted even
> > without fully enumerating all devices and reproducing all the default
> > boot options.
> > 
> > Short of such a "uniquely identifying relative devpath" trick, I don't
> > think stability in firmware-stashed (ie. not regenerated) device paths
> > exists in general, if the underlying hardware configuration is changed.
> 
> I'm not sure why you say that - it works just fine.  The open firmware
> device paths relate a physical path to the given hardware and as long
> as one doesn't alter that physical path it will be the same path on
> every boot.  (Specifically, one can add or remove unrelated PCI
> devices, USB devices, etc. without impacting the open firmware paths
> to devices not modified.)
> 
> > In summary: I think we could modify both QEMU and OVMF to use the
> > "serial numbers" of the extra PCI root buses, in increasing bus number
> > order, instead of their actual bus numbers, for identifying them. That's
> > just a convention. Then the second hunk of this patch would not be
> > necessary for SeaBIOS. But I think this convention would be only less
> > logical, and not more stable.
> > 
> > Can you please elaborate? I'm confused.
> 
> -Kevin

Kevin O'Connor June 14, 2015, 1:47 p.m. UTC | #25

On Sun, Jun 14, 2015 at 02:10:22PM +0200, Michael S. Tsirkin wrote:
> On Thu, Jun 11, 2015 at 12:48:22PM -0400, Kevin O'Connor wrote:
> > The SeaBIOS code is used on both virtual machines and real machines.
> > The bus number is something that is generated by software and it is
> > not assured to be stable between boots.  (For example, if someone adds
> > a PCI device to their machine between boots then every bus number in
> > the system might be different on the next boot.)  The open firmware
> > paths go to great length to avoid arbitrary bus numbers today - for
> > example:
> > 
> > /pci@i0cf8/pci-bridge@1/usb@1,2/hub@3/storage@1/channel@0/disk@0,0
> > 
> > Given the complexity to avoid arbitrary bus numbers I'm confused why
> > one would want to add them.
> 
> Could you give an example real-hardware path when there are multiple
> roots though?
> I'd like to make sure what qemu generates matches that.

I don't have the hardware, but I've asked a user that does to send in
a log.

Here's a real world example of a search path that is generated today
for bus 0:

01.249: Searching bootorder for: /pci@i0cf8/*@11/drive@0/disk@0

Here's what SeaBIOS is coded to produce for a similar device on the
first extra pci root bus instead:

01.249: Searching bootorder for: /pci-root@1/pci@i0cf8/*@11/drive@0/disk@0

Placing "pci@i0cf8" after "pci-root@1" is admittedly hokey, and I
don't have any issue with changing it.

-Kevin

Kevin O'Connor June 14, 2015, 2:50 p.m. UTC | #26

On Sun, Jun 14, 2015 at 02:05:52PM +0200, Michael S. Tsirkin wrote:
> On Fri, Jun 12, 2015 at 02:40:10PM -0400, Kevin O'Connor wrote:
> > > (2) The QEMU command line and the effects the command line has on the
> > > virtual hardware should not change. However, all of the following have
> > > to be updated:
> > > 
> > > - the "explicit_ofw_unit_address" property assignments in
> > >   pxb_dev_initfn() have to be done separately, after all PXBs have been
> > >   seen, and sorted. This probably requires another "machine init done"
> > >   notifier.
> > 
> > I admit the sorting of pxb objects just to reverse engineer what
> > SeaBIOS expects would not be fun.  Doesn't QEMU have to sort the buses
> > anyway to know which secondary bus ranges are associated with each
> > root bus?
> 
> I guess it can be done - though this means it will break if
> we ever support hot-plugging of these roots.

As I understand it, the use case for multiple PCI roots is large
servers that process a lot of IO.  For example, if you have a server
with two 8-core cpu chips and two 40 gig ethernet cards, it might be a
significant performance boost if the ethernet traffic from one card is
on a different bus from the other card.  These systems (at least in
the AMD case) have multiple southbridge chips on the motherboard that
each implement a PCI bus and each southbridge chip is associated with
a particular CPU.  This allows (in theory) a particular CPU to process
the IO from it's associated PCI bus without IO contention.  I'm not an
expert on this, but that's my general understanding.

I'm not aware of real world hardware with hot-plugable root buses.
Should it come about then some kind of OS visible spec would be needed
for the OS to identify and enumerate newly added buses, and I suppose
we could figure out how to handle it once that type of thing happens.

> But more importantly, if the sort is by the bus number,
> then how is it better than just using the bus number directly?

Coreboot supports real machines with multiple southbridge chips.  The
number of southbridge chips on the motherboard is static, but the bus
id associated with them is not.  So, SeaBIOS attempts to give a unique
id to the southbridge chip that doesn't rely on the bus id (it's the
N'th chip instead of the chip's current bus id).

I asked the coreboot developers about this again and they reiterated
that bus id is dynamic and it could change between boots.

The SeaBIOS scheme is not perfect of course - for example nothing
technically stops coreboot from assigning the bus ids in a different
order from one boot to the next and if a bus doesn't show any devices
on it at all then it would skew the ordering.  Neither happens in
practice today.

All of the above aside, I'm confused why one would want to add the
logical bus-id to the open firmware style physical topology.  We know
the bus-id is a logical value in the general case - I find it
confusing to use it in a description of physical topology.  To wit,
what happens if OSes learn how to alter the bus id of extra root buses
(eg, to support hot plugging them) - then we'd have built an interface
that's not stable even within a single boot.

-Kevin

Michael S. Tsirkin June 14, 2015, 6:06 p.m. UTC | #27

On Sun, Jun 14, 2015 at 10:50:22AM -0400, Kevin O'Connor wrote:
> On Sun, Jun 14, 2015 at 02:05:52PM +0200, Michael S. Tsirkin wrote:
> > On Fri, Jun 12, 2015 at 02:40:10PM -0400, Kevin O'Connor wrote:
> > > > (2) The QEMU command line and the effects the command line has on the
> > > > virtual hardware should not change. However, all of the following have
> > > > to be updated:
> > > > 
> > > > - the "explicit_ofw_unit_address" property assignments in
> > > >   pxb_dev_initfn() have to be done separately, after all PXBs have been
> > > >   seen, and sorted. This probably requires another "machine init done"
> > > >   notifier.
> > > 
> > > I admit the sorting of pxb objects just to reverse engineer what
> > > SeaBIOS expects would not be fun.  Doesn't QEMU have to sort the buses
> > > anyway to know which secondary bus ranges are associated with each
> > > root bus?
> > 
> > I guess it can be done - though this means it will break if
> > we ever support hot-plugging of these roots.
> 
> As I understand it, the use case for multiple PCI roots is large
> servers that process a lot of IO.

For better or worse, guest OS-es assume that numa locality
can only be specified for PCI roots.

So the use case is to specify numa locality for virtual
devices.


> For example, if you have a server
> with two 8-core cpu chips and two 40 gig ethernet cards, it might be a
> significant performance boost if the ethernet traffic from one card is
> on a different bus from the other card.  These systems (at least in
> the AMD case) have multiple southbridge chips on the motherboard that
> each implement a PCI bus and each southbridge chip is associated with
> a particular CPU.  This allows (in theory) a particular CPU to process
> the IO from it's associated PCI bus without IO contention.  I'm not an
> expert on this, but that's my general understanding.
> 
> I'm not aware of real world hardware with hot-plugable root buses.
> Should it come about then some kind of OS visible spec would be needed
> for the OS to identify and enumerate newly added buses, and I suppose
> we could figure out how to handle it once that type of thing happens.
> 
> > But more importantly, if the sort is by the bus number,
> > then how is it better than just using the bus number directly?
> 
> Coreboot supports real machines with multiple southbridge chips.  The
> number of southbridge chips on the motherboard is static, but the bus
> id associated with them is not.  So, SeaBIOS attempts to give a unique
> id to the southbridge chip that doesn't rely on the bus id (it's the
> N'th chip instead of the chip's current bus id).
> 
> I asked the coreboot developers about this again and they reiterated
> that bus id is dynamic and it could change between boots.
> 
> The SeaBIOS scheme is not perfect of course - for example nothing
> technically stops coreboot from assigning the bus ids in a different
> order from one boot to the next and if a bus doesn't show any devices
> on it at all then it would skew the ordering.  Neither happens in
> practice today.
> 
> All of the above aside, I'm confused why one would want to add the
> logical bus-id to the open firmware style physical topology.  We know
> the bus-id is a logical value in the general case - I find it
> confusing to use it in a description of physical topology.  To wit,
> what happens if OSes learn how to alter the bus id of extra root buses
> (eg, to support hot plugging them) - then we'd have built an interface
> that's not stable even within a single boot.
> 
> -Kevin

To summarise, you feel that modifying bus id without reordering
bus ids between roots is likely, modifications that would
cause reordering are unlikely, thus counting bus ids
in order gives a stable index. Is that right?

To be on the safe side, it would be nice to have bios skip some
fields/properties when parsing paths, so that if we want to use another
id in the future, we can supply both id types.  I haven't looked at the
parsing code - maybe it does this already?

Kevin O'Connor June 14, 2015, 6:21 p.m. UTC | #28

On Sun, Jun 14, 2015 at 08:06:22PM +0200, Michael S. Tsirkin wrote:
> To summarise, you feel that modifying bus id without reordering
> bus ids between roots is likely, modifications that would
> cause reordering are unlikely, thus counting bus ids
> in order gives a stable index. Is that right?

Yes.

> To be on the safe side, it would be nice to have bios skip some
> fields/properties when parsing paths, so that if we want to use another
> id in the future, we can supply both id types.  I haven't looked at the
> parsing code - maybe it does this already?

SeaBIOS already does that.  (SeaBIOS doesn't parse the bootorder file
- it generates a "glob" like pattern for each device and then sees if
that pattern matches a line in the bootorder file.)

Also, I don't have a strong objection to Laszlo's SeaBIOS patch (the
one that does the runningOnQEMU() check).  (I still think it's quirky
to use bus-id in the file, but it's not a show stopper if it's just
for QEMU.)

-Kevin

Benjamin Herrenschmidt June 14, 2015, 9:39 p.m. UTC | #29

On Sun, 2015-06-14 at 20:06 +0200, Michael S. Tsirkin wrote:

> > As I understand it, the use case for multiple PCI roots is large
> > servers that process a lot of IO.
> 
> For better or worse, guest OS-es assume that numa locality
> can only be specified for PCI roots.
> 
> So the use case is to specify numa locality for virtual
> devices.

In addition, I'd add that on ppc "pseries", the standard way of
hotplugging devices is to hotplug PCI roots (ie, a virtual host bridge
with the new device(s) below it).  

> > I'm not aware of real world hardware with hot-plugable root buses.
> > Should it come about then some kind of OS visible spec would be needed
> > for the OS to identify and enumerate newly added buses, and I suppose
> > we could figure out how to handle it once that type of thing happens.

On IBM ppc systems, both exist. IE, real HW hot pluggable roots (on
older systems mostly, GX based drawers with PCI host bridges in them
nowadays, we tend to use PCIe cable card based drawers), and in
virtualized systems, hot plugging root bridges is the standard way
PowerVM (aka pHyp) uses for hotplug which we need to support in
qemu/pseries at some point.

> > > But more importantly, if the sort is by the bus number,
> > > then how is it better than just using the bus number directly?

PCI Bus number makes no sense. Any root can have the whole range of bus
numbers 0...255 and the bus number assignment is under control of the
guest anyway. Or are you talking about a different bus number (somewhat
picked up the conversation half way...)

> To summarise, you feel that modifying bus id without reordering
> bus ids between roots is likely, modifications that would
> cause reordering are unlikely, thus counting bus ids
> in order gives a stable index. Is that right?
> 
> To be on the safe side, it would be nice to have bios skip some
> fields/properties when parsing paths, so that if we want to use another
> id in the future, we can supply both id types.  I haven't looked at the
> parsing code - maybe it does this already?

Ben.

Kevin O'Connor June 14, 2015, 9:59 p.m. UTC | #30

On Mon, Jun 15, 2015 at 07:39:18AM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2015-06-14 at 20:06 +0200, Michael S. Tsirkin wrote:
> 
> > > As I understand it, the use case for multiple PCI roots is large
> > > servers that process a lot of IO.
> > 
> > For better or worse, guest OS-es assume that numa locality
> > can only be specified for PCI roots.
> > 
> > So the use case is to specify numa locality for virtual
> > devices.
> 
> In addition, I'd add that on ppc "pseries", the standard way of
> hotplugging devices is to hotplug PCI roots (ie, a virtual host bridge
> with the new device(s) below it).  
> 
> > > I'm not aware of real world hardware with hot-plugable root buses.
> > > Should it come about then some kind of OS visible spec would be needed
> > > for the OS to identify and enumerate newly added buses, and I suppose
> > > we could figure out how to handle it once that type of thing happens.
> 
> On IBM ppc systems, both exist. IE, real HW hot pluggable roots (on
> older systems mostly, GX based drawers with PCI host bridges in them
> nowadays, we tend to use PCIe cable card based drawers), and in
> virtualized systems, hot plugging root bridges is the standard way
> PowerVM (aka pHyp) uses for hotplug which we need to support in
> qemu/pseries at some point.
> 
> > > > But more importantly, if the sort is by the bus number,
> > > > then how is it better than just using the bus number directly?
> 
> PCI Bus number makes no sense. Any root can have the whole range of bus
> numbers 0...255 and the bus number assignment is under control of the
> guest anyway. Or are you talking about a different bus number (somewhat
> picked up the conversation half way...)

There are x86 systems with multiple separate PCI root buses where one
can access the pci config space of all the buses using the same 0x0cf8
IO space.  During system setup, the multiple PCI root buses are each
configured to only respond to PCI config accesses within its range of
bus numbers.  So if "root1" is configured for bus ids between 64-128,
then it will only forward the request if the bus id in the request is
between 64-128.

I suspect in your PPC example that the separate root buses all had
separate io/memory space as well and thus were completely separate.
(That is, they don't share the equivalent of IO 0x0cf8.)  If so,
that's different from how the x86 qemu code and the x86 systems I was
discussing above work.

-Kevin

Benjamin Herrenschmidt June 15, 2015, 2:50 a.m. UTC | #31

On Sun, 2015-06-14 at 17:59 -0400, Kevin O'Connor wrote:
> There are x86 systems with multiple separate PCI root buses where one
> can access the pci config space of all the buses using the same 0x0cf8
> IO space.  During system setup, the multiple PCI root buses are each
> configured to only respond to PCI config accesses within its range of
> bus numbers.  So if "root1" is configured for bus ids between 64-128,
> then it will only forward the request if the bus id in the request is
> between 64-128.
> 
> I suspect in your PPC example that the separate root buses all had
> separate io/memory space as well and thus were completely separate.
> (That is, they don't share the equivalent of IO 0x0cf8.)  If so,
> that's different from how the x86 qemu code and the x86 systems I was
> discussing above work.

Correct, my point is that qemu shouldn't be made to rely on the stable
bus numbers. Why not use the mmconfig address instead ? That way you can
factor the bus number in via an offset if it's relevant or provide a
completely different address if the busses are separate.

Ben.

Gerd Hoffmann June 15, 2015, 6:01 a.m. UTC | #32

On Fr, 2015-06-12 at 09:23 -0400, Kevin O'Connor wrote:
> On Fri, Jun 12, 2015 at 03:17:27PM +0300, Marcel Apfelbaum wrote:
> > On 06/12/2015 09:00 AM, Gerd Hoffmann wrote:
> > >>On each boot, coreboot might decide to assign a different bus id to
> > >>the extra roots (for example, if a device with a PCI bridge is
> > >>inserted and it's bus allocation causes bus ids to shift).
> > >>Technically, coreboot could even change the order extra buses are
> > >>assigned bus ids, but doesn't today.
> > >>
> > >>This was seen on several AMD systems - I'm told at least some Intel
> > >>systems have multiple root buses, but the bus numbers are just hard
> > >>wired.
> > >
> > >This is how the qemu pxb works: root bus numbers are a config option for
> > >the root bridge device, i.e. from the guest point of view they are
> > >hard-wired.
> > Exactly. In our case, the HW assigns the PXB bus bumber, and again,
> > I saw this also on real HW with multiple buses, the bus nr comes
> > from ACPI, meaning the vendor.
> 
> I'm confused where ACPI comes into this.  In all cases I know of, the
> firmware generates the ACPI tables to match the hardware.  I've never
> heard of hardware configuring itself from the ACPI tables.

We have basically the same model in qemu, except that it isn't the
firmware but qemu generating the tables (and qemu looks at the registers
programmed by the firmware to make sure things match).

The pxb has no registers to program, the hardware just shows up on a bus
number (qemu cfg, hard-wired for the guest).  ACPI must specify it so
the guest OS finds it.  When passing bus numbers via fw_cfg the must
match acpi of course.

I'm wondering whenever things become easier if we add config registers
to the pxb, where the firmware can program the bus number range and we
can use the config register base as a way to specify which pxb we are
referring to ?

cheers,
  Gerd

Gerd Hoffmann June 15, 2015, 6:50 a.m. UTC | #33

Hi,

> I'm wondering whenever things become easier if we add config registers
> to the pxb, where the firmware can program the bus number range and we
> can use the config register base as a way to specify which pxb we are
> referring to ?

... and, while thinking about ben's reply elsewhere in this thread,
maybe even decouple the whole thing from the primary root bus?  You
can't program the devices via 0x0cf8 then, but we could add a mmconfig
bar to the host bridge device ...

cheers,
  Gerd

Michael S. Tsirkin June 15, 2015, 8:22 a.m. UTC | #34

On Mon, Jun 15, 2015 at 12:50:08PM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2015-06-14 at 17:59 -0400, Kevin O'Connor wrote:
> > There are x86 systems with multiple separate PCI root buses where one
> > can access the pci config space of all the buses using the same 0x0cf8
> > IO space.  During system setup, the multiple PCI root buses are each
> > configured to only respond to PCI config accesses within its range of
> > bus numbers.  So if "root1" is configured for bus ids between 64-128,
> > then it will only forward the request if the bus id in the request is
> > between 64-128.
> > 
> > I suspect in your PPC example that the separate root buses all had
> > separate io/memory space as well and thus were completely separate.
> > (That is, they don't share the equivalent of IO 0x0cf8.)  If so,
> > that's different from how the x86 qemu code and the x86 systems I was
> > discussing above work.
> 
> Correct, my point is that qemu shouldn't be made to rely on the stable
> bus numbers. Why not use the mmconfig address instead ?

These are traditional pci buses, so they don't have an mmconfig.

> That way you can
> factor the bus number in via an offset if it's relevant or provide a
> completely different address if the busses are separate.
> 
> Ben.
>

Marcel Apfelbaum June 15, 2015, 9:02 a.m. UTC | #35

On 06/15/2015 09:50 AM, Gerd Hoffmann wrote:
>    Hi,
>
>> I'm wondering whenever things become easier if we add config registers
>> to the pxb, where the firmware can program the bus number range and we
>> can use the config register base as a way to specify which pxb we are
>> referring to ?
>
> ... and, while thinking about ben's reply elsewhere in this thread,
> maybe even decouple the whole thing from the primary root bus?  You
> can't program the devices via 0x0cf8 then, but we could add a mmconfig
> bar to the host bridge device ...

Hi Gerd,

I thought about it of course, but it seemed to me to be over-kill and
no real reason to do it. I would hove done it if:
- there would be a "generic" spec for such a host-bridge specifying
   at least the registers for the bus number. I didn't find anything =>
   => Seabios should be aware of a special QEMU device and look for it... ugly
- I would have seen how Seabios/coreboot program the above bus number.
   Again, didn't find the host-bridge programming code.
At last, I followed a 'real' PXB device, some old Intel snooping host bridge.

As always, I am open to ideas, but, the latest patches from Laszlo follows
Seabios way, and if OVMF can also handle it maybe we can stop:
"The use of logical bus number instead of bus index" discussion and
start arguing on something else. :)

I just want to say thank you to everybody involved,
a lot of information and good guidelines surfaced during this talk.

Until the next patch...
Thanks,
Marcel

>
> cheers,
>    Gerd
>
>

Michael S. Tsirkin June 15, 2015, 9:43 a.m. UTC | #36

On Mon, Jun 15, 2015 at 08:01:08AM +0200, Gerd Hoffmann wrote:
> On Fr, 2015-06-12 at 09:23 -0400, Kevin O'Connor wrote:
> > On Fri, Jun 12, 2015 at 03:17:27PM +0300, Marcel Apfelbaum wrote:
> > > On 06/12/2015 09:00 AM, Gerd Hoffmann wrote:
> > > >>On each boot, coreboot might decide to assign a different bus id to
> > > >>the extra roots (for example, if a device with a PCI bridge is
> > > >>inserted and it's bus allocation causes bus ids to shift).
> > > >>Technically, coreboot could even change the order extra buses are
> > > >>assigned bus ids, but doesn't today.
> > > >>
> > > >>This was seen on several AMD systems - I'm told at least some Intel
> > > >>systems have multiple root buses, but the bus numbers are just hard
> > > >>wired.
> > > >
> > > >This is how the qemu pxb works: root bus numbers are a config option for
> > > >the root bridge device, i.e. from the guest point of view they are
> > > >hard-wired.
> > > Exactly. In our case, the HW assigns the PXB bus bumber, and again,
> > > I saw this also on real HW with multiple buses, the bus nr comes
> > > from ACPI, meaning the vendor.
> > 
> > I'm confused where ACPI comes into this.  In all cases I know of, the
> > firmware generates the ACPI tables to match the hardware.  I've never
> > heard of hardware configuring itself from the ACPI tables.
> 
> We have basically the same model in qemu, except that it isn't the
> firmware but qemu generating the tables (and qemu looks at the registers
> programmed by the firmware to make sure things match).
> 
> The pxb has no registers to program, the hardware just shows up on a bus
> number (qemu cfg, hard-wired for the guest).  ACPI must specify it so
> the guest OS finds it.  When passing bus numbers via fw_cfg the must
> match acpi of course.
> 
> I'm wondering whenever things become easier if we add config registers
> to the pxb, where the firmware can program the bus number range and we
> can use the config register base as a way to specify which pxb we are
> referring to ?
> 
> cheers,
>   Gerd
> 

But then we'll need a bunch of fw cfg entries to let guest
discover the extra roots and their bus ranges.

Gerd Hoffmann June 15, 2015, 10:18 a.m. UTC | #37

Hi,

> > I'm wondering whenever things become easier if we add config registers
> > to the pxb, where the firmware can program the bus number range and we
> > can use the config register base as a way to specify which pxb we are
> > referring to ?

> But then we'll need a bunch of fw cfg entries to let guest
> discover the extra roots and their bus ranges.

We could add them to the pxb host bridge device (1b36:0009).  Then we
don't need any fw_cfg stuff, seabios could simply lookup/setup things in
the 1b36:0009 pci config space ...

cheers,
  Gerd

Michael S. Tsirkin June 15, 2015, 10:26 a.m. UTC | #38

On Mon, Jun 15, 2015 at 12:18:16PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > > I'm wondering whenever things become easier if we add config registers
> > > to the pxb, where the firmware can program the bus number range and we
> > > can use the config register base as a way to specify which pxb we are
> > > referring to ?
> 
> > But then we'll need a bunch of fw cfg entries to let guest
> > discover the extra roots and their bus ranges.
> 
> We could add them to the pxb host bridge device (1b36:0009).  Then we
> don't need any fw_cfg stuff, seabios could simply lookup/setup things in
> the 1b36:0009 pci config space ...
> 
> cheers,
>   Gerd

Sure but then it's not a fixed address so you can't use these to name
boot devices.

[V2] pci: fixes to allow booting from extra root pci buses.

Commit Message

Comments

Patch