
Re: [PATCH] PCI: Bus number from the bridge, not the device

Message ID 20101108162633.GA7962@redhat.com
State New

Commit Message

Michael S. Tsirkin Nov. 8, 2010, 4:26 p.m. UTC
On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote:
> On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > pcibus_dev_print() was erroneously retrieving the device bus
> > > number from the secondary bus number offset of the device
> > > instead of the bridge above the device.  This ends up landing
> > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > ramblock naming, so changing it can affect migration.  However,
> > > I've only seen this byte be non-zero for an assigned device,
> > > which can't migrate anyway, so hopefully we won't run into
> > > any issues.
> > > 
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > 
> > Good catch. Applied.
> > I don't really see why we put the dev path
> > in the bus object: why not let device supply its name?
> 
> Because the device name is not unique.  This came about from the
> discussion about how to create a canonical device path that Gleb and
> Markus are again trying to hash out.  If we go up to the bus and get the
> bus address, we have a VM unique name.  Unfortunately, it's difficult to
> define what the bus should print in all cases (ISA), but since they
> don't do hotplug and typically don't allocate ramblocks, we can mostly
> ignore it for this use case.
> 
> > And I think this will affect nested bridges. However they are currently
> > broken anyway: we really must convert to topological names as bus number
> > is guest-assigned - they don't have to be unique, even.
> 
> Yes, nested bridges are a problem.  How can the seg/bus/devfn not be
> unique?

Bus numbers for nested bridges are guest assigned. We start with 0 after reset.

> > What does fixing this involve? Just changing pcibus_get_dev_path?
> 
> How do you plan to fix it?  Don't forget that migration depends on these
> names, so some kind of compatibility layer would be required.  Thanks,
> 
> Alex

Replace bus number with slot numbers of parent bridges up to the root.
This works for root bridge in a compatible way because bus number there
is hard-coded to 0.
IMO nested bridges are broken anyway, no way to be compatible there.
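
For illustration (hypothetical numbers, not taken from the patch): a device
at slot 5, function 0 behind a bridge plugged into slot 3 of the root bus in
domain 0 is named today using the guest-assigned secondary bus number, e.g.
0000:01:05.0, while the slot-chain scheme would name it along the lines of
0000:00:03:05.0. A device sitting directly on the root bus is intended to
keep a name of the current shape, e.g. 0000:00:05.0, since the ":00" stands
in for the root bus number.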


Gleb, Markus, I think the following should be sufficient for PCI.  What
do you think?  Also - do we need to update QMP/monitor to teach them to
work with these paths?

This is on top of Alex's patch, completely untested.


pci: fix device path for devices behind nested bridges

We were using bus number in the device path, which is clearly
broken as this number is guest-assigned for all devices
except the root.

Fix by using hierarchical list of slots, walking the path
from root down to device, instead. Add :00 as bus number
so that if there are no nested bridges, this is compatible
with what we have now.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Comments

Alex Williamson Nov. 8, 2010, 4:36 p.m. UTC | #1
On Mon, 2010-11-08 at 18:26 +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote:
> > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > number from the secondary bus number offset of the device
> > > > instead of the bridge above the device.  This ends of landing
> > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > ramblock naming, so changing it can effect migration.  However,
> > > > I've only seen this byte be non-zero for an assigned device,
> > > > which can't migrate anyway, so hopefully we won't run into
> > > > any issues.
> > > > 
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > 
> > > Good catch. Applied.
> > > I don't really see why do we put the dev path
> > > in the bus object: why not let device supply its name?
> > 
> > Because the device name is not unique.  This came about from the
> > discussion about how to create a canonical device path that Gleb and
> > Markus are again trying to hash out.  If we go up to the bus and get the
> > bus address, we have a VM unique name.  Unfortunately, it's difficult to
> > define what the bus should print in all cases (ISA), but since they
> > don't do hotplug and typically don't allocate ramblocks, we can mostly
> > ignore it for this use case.
> > 
> > > And I think this will affect nested bridges. However they are currently
> > > broken anyway: we really must convert to topological names as bus number
> > > is guest-assigned - they don't have to be unique, even.
> > 
> > Yes, nested bridges are a problem.  How can the seg/bus/devfn not be
> > unique?
> 
> Bus numbers for nested bridges are guest assigned. We start with 0 after reset.

Right, invalid bus numbers are not unique.

> > > What does fixing this involve? Just changing pcibus_get_dev_path?
> > 
> > How do you plan to fix it?  Don't forget that migration depends on these
> > names, so some kind of compatibility layer would be required.  Thanks,
> > 
> > Alex
> 
> Replace bus number with slot numbers of parent bridges up to the root.
> This works for root bridge in a compatible way because bus number there
> is hard-coded to 0.
> IMO nested bridges are broken anyway, no way to be compatible there.
> 
> 
> Gleb, Markus, I think the following should be sufficient for PCI.  What
> do you think?  Also - do we need to update QMP/monitor to teach them to
> work with these paths?
> 
> This is on top of Alex's patch, completely untested.

This function was originally intended to be recursive, so that each bus
could call the parent until we hit a canonical name.  It just so happens
that the root bus provides that for non-nested devices.  So I think
instead we should have a different function for bridges that appends to
what the root get_dev_path returns.

Alex
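
To illustrate the appending scheme Alex describes, here is a minimal
hypothetical sketch (not a posted patch; it reuses the QEMU helpers visible
in the patch quoted below, assumes the hw/pci.c context for declarations,
and the exact string format it produces is precisely what the rest of the
thread debates):

static char *pcibus_get_dev_path(DeviceState *dev)
{
    PCIDevice *d = container_of(dev, PCIDevice, qdev);
    char local[16];
    char buf[64];
    char *parent_path, *path;
    size_t len;

    /* This device's own component: slot.function on its bus. */
    snprintf(local, sizeof(local), "%02x.%x",
             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));

    if (!d->bus->parent_dev) {
        /* Root bus: keep today's domain:bus prefix so that existing
         * (non-nested) devices keep their current names. */
        snprintf(buf, sizeof(buf), "%04x:%02x:%s",
                 pci_find_domain(d->bus), pci_bus_num(d->bus), local);
        return strdup(buf);
    }

    /* Nested bus: the bridge above is itself a PCI device, so ask it
     * for its path and append this device's component to it. */
    parent_path = pcibus_get_dev_path(&d->bus->parent_dev->qdev);
    len = strlen(parent_path) + 1 + strlen(local) + 1;
    path = malloc(len);
    snprintf(path, len, "%s:%s", parent_path, local);
    free(parent_path);
    return path;
}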

> pci: fix device path for devices behind nested bridges
> 
> We were using bus number in the device path, which is clearly
> broken as this number is guest-assigned for all devices
> except the root.
> 
> Fix by using hierarchical list of slots, walking the path
> from root down to device, instead. Add :00 as bus number
> so that if there are no nested bridges, this is compatible
> with what we have now.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 7d12473..fa98d94 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
>  
>  static char *pcibus_get_dev_path(DeviceState *dev)
>  {
> -    PCIDevice *d = (PCIDevice *)dev;
> -    char path[16];
> -
> -    snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> -             pci_find_domain(d->bus), pci_bus_num(d->bus),
> -             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> -
> -    return strdup(path);
> +    PCIDevice *d = container_of(dev, PCIDevice, qdev);
> +    PCIDevice *t;
> +    int slot_depth;
> +    /* Path format: Domain:00:Slot:Slot....:Slot.Function.
> +     * 00 is added here to make this format compatible with
> +     * domain:Bus:Slot.Func for systems without nested PCI bridges.
> +     * Slot list specifies the slot numbers for all devices on the
> +     * path from root to the specific device. */
> +    int domain_len = strlen("DDDD:00");
> +    int func_len = strlen(".F");
> +    int slot_len = strlen(":SS");
> +    int path_len;
> +    char *path, *p;
> +
> +    /* Calculate # of slots on path between device and root. */;
> +    slot_depth = 0;
> +    for (t = d; t; t = t->bus->parent_dev)
> +        ++slot_depth;
> +
> +    path_len = domain_len + bus_len + slot_len * slot_depth + func_len;
> +
> +    /* Allocate memory, fill in the terminating null byte. */
> +    path = malloc(path_len + 1 /* For '\0' */);
> +    path[path_len] = '\0';
> +
> +    /* First field is the domain. */
> +    snprintf(path, domain_len, "%04x", pci_find_domain(d->bus));
> +
> +    /* Leave space for slot numbers and fill in function number. */
> +    p = path + domain_len + slot_len * slot_depth;
> +    snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn));
> +
> +    /* Fill in slot numbers. We walk up from device to root, so need to print
> +     * them in the reverse order, last to first. */
> +    for (t = d; t; t = t->bus->parent_dev) {
> +        p -= slot_len;
> +        snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn));
> +    }
> +
> +    return path;
>  }
>
Michael S. Tsirkin Nov. 8, 2010, 4:48 p.m. UTC | #2
On Mon, Nov 08, 2010 at 09:36:59AM -0700, Alex Williamson wrote:
> On Mon, 2010-11-08 at 18:26 +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote:
> > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > > number from the secondary bus number offset of the device
> > > > > instead of the bridge above the device.  This ends of landing
> > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > > ramblock naming, so changing it can effect migration.  However,
> > > > > I've only seen this byte be non-zero for an assigned device,
> > > > > which can't migrate anyway, so hopefully we won't run into
> > > > > any issues.
> > > > > 
> > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > 
> > > > Good catch. Applied.
> > > > I don't really see why do we put the dev path
> > > > in the bus object: why not let device supply its name?
> > > 
> > > Because the device name is not unique.  This came about from the
> > > discussion about how to create a canonical device path that Gleb and
> > > Markus are again trying to hash out.  If we go up to the bus and get the
> > > bus address, we have a VM unique name.  Unfortunately, it's difficult to
> > > define what the bus should print in all cases (ISA), but since they
> > > don't do hotplug and typically don't allocate ramblocks, we can mostly
> > > ignore it for this use case.
> > > 
> > > > And I think this will affect nested bridges. However they are currently
> > > > broken anyway: we really must convert to topological names as bus number
> > > > is guest-assigned - they don't have to be unique, even.
> > > 
> > > Yes, nested bridges are a problem.  How can the seg/bus/devfn not be
> > > unique?
> > 
> > Bus numbers for nested bridges are guest assigned. We start with 0 after reset.
> 
> Right, invalid bus numbers are not unique.

Well, you can call them invalid, but this is the only
number there is until the guest assigns something :)

> > > > What does fixing this involve? Just changing pcibus_get_dev_path?
> > > 
> > > How do you plan to fix it?  Don't forget that migration depends on these
> > > names, so some kind of compatibility layer would be required.  Thanks,
> > > 
> > > Alex
> > 
> > Replace bus number with slot numbers of parent bridges up to the root.
> > This works for root bridge in a compatible way because bus number there
> > is hard-coded to 0.
> > IMO nested bridges are broken anyway, no way to be compatible there.
> > 
> > 
> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > work with these paths?
> > 
> > This is on top of Alex's patch, completely untested.
> 
> This function was originally intended to be recursive, so that each bus
> could call the parent until we hit a canonical name.  It just so happens
> that the root bus provides that for non-nested devices.  So I think
> instead we should have a different function for bridges that appends to
> what the root get_dev_path returns.
> 
> Alex

This won't be compatible: the format that we have is
DDDD:BB:SS.F
which means that string concatenation cannot get you from a bus path
to a device path (remember that a bridge is a PCI device too).
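
For example (illustrative numbers): a bridge at slot 3, function 0 on the
root bus is named 0000:00:03.0 today; simply appending a child at slot 5,
function 0 to that would give 0000:00:03.0:05.0, while existing non-nested
devices must keep the flat DDDD:BB:SS.F form for migration compatibility.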

And frankly, if we can make it non-recursive I'd rather have it
that way. We are not solving the Tower of Hanoi here.
Gleb Natapov Nov. 8, 2010, 5 p.m. UTC | #3
On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote:
> > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > number from the secondary bus number offset of the device
> > > > instead of the bridge above the device.  This ends of landing
> > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > ramblock naming, so changing it can effect migration.  However,
> > > > I've only seen this byte be non-zero for an assigned device,
> > > > which can't migrate anyway, so hopefully we won't run into
> > > > any issues.
> > > > 
> > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > 
> > > Good catch. Applied.
> > > I don't really see why do we put the dev path
> > > in the bus object: why not let device supply its name?
> > 
> > Because the device name is not unique.  This came about from the
> > discussion about how to create a canonical device path that Gleb and
> > Markus are again trying to hash out.  If we go up to the bus and get the
> > bus address, we have a VM unique name.  Unfortunately, it's difficult to
> > define what the bus should print in all cases (ISA), but since they
> > don't do hotplug and typically don't allocate ramblocks, we can mostly
> > ignore it for this use case.
> > 
> > > And I think this will affect nested bridges. However they are currently
> > > broken anyway: we really must convert to topological names as bus number
> > > is guest-assigned - they don't have to be unique, even.
> > 
> > Yes, nested bridges are a problem.  How can the seg/bus/devfn not be
> > unique?
> 
> Bus numbers for nested bridges are guest assigned. We start with 0 after reset.
> 
> > > What does fixing this involve? Just changing pcibus_get_dev_path?
> > 
> > How do you plan to fix it?  Don't forget that migration depends on these
> > names, so some kind of compatibility layer would be required.  Thanks,
> > 
> > Alex
> 
> Replace bus number with slot numbers of parent bridges up to the root.
> This works for root bridge in a compatible way because bus number there
> is hard-coded to 0.
> IMO nested bridges are broken anyway, no way to be compatible there.
> 
> 
> Gleb, Markus, I think the following should be sufficient for PCI.  What
> do you think?  Also - do we need to update QMP/monitor to teach them to
> work with these paths?
> 

I no longer use the bus's get_dev_path callback for my purpose (I added
another one, get_fw_dev_path). Since get_dev_path is used for migration, it
has to be implemented for all buses to work properly. And by properly I mean
produce the full path from the system root to the device itself,
recursively. But what I learned is that by changing the get_dev_path output
you will break migration from older guests to newer ones (something we have
to support). And of course, as Alex said already, you need to traverse
bridges recursively, and the domain does not provide any meaningful
information.
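
As a rough illustration of the "full path from the system root" idea, a
hypothetical walker might look like this (names and qdev fields such as
parent_bus, parent, info->get_dev_path and info->name are assumptions about
the qdev code of that time, not code from any posted series):

static char *qdev_get_full_dev_path(DeviceState *dev)
{
    char path[256] = "";
    char tmp[256];
    DeviceState *d = dev;

    /* Walk from the device up towards the system root, prepending the
     * component each parent bus contributes for its child device. */
    while (d) {
        char *component = NULL;

        if (d->parent_bus && d->parent_bus->info->get_dev_path) {
            component = d->parent_bus->info->get_dev_path(d);
        }
        snprintf(tmp, sizeof(tmp), "/%s%s",
                 component ? component : d->info->name, path);
        pstrcpy(path, sizeof(path), tmp);
        free(component);

        d = d->parent_bus ? d->parent_bus->parent : NULL;
    }
    return strdup(path);
}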

> This is on top of Alex's patch, completely untested.
> 
> 
> pci: fix device path for devices behind nested bridges
> 
> We were using bus number in the device path, which is clearly
> broken as this number is guest-assigned for all devices
> except the root.
> 
> Fix by using hierarchical list of slots, walking the path
> from root down to device, instead. Add :00 as bus number
> so that if there are no nested bridges, this is compatible
> with what we have now.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 7d12473..fa98d94 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
>  
>  static char *pcibus_get_dev_path(DeviceState *dev)
>  {
> -    PCIDevice *d = (PCIDevice *)dev;
> -    char path[16];
> -
> -    snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> -             pci_find_domain(d->bus), pci_bus_num(d->bus),
> -             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> -
> -    return strdup(path);
> +    PCIDevice *d = container_of(dev, PCIDevice, qdev);
> +    PCIDevice *t;
> +    int slot_depth;
> +    /* Path format: Domain:00:Slot:Slot....:Slot.Function.
> +     * 00 is added here to make this format compatible with
> +     * domain:Bus:Slot.Func for systems without nested PCI bridges.
> +     * Slot list specifies the slot numbers for all devices on the
> +     * path from root to the specific device. */
> +    int domain_len = strlen("DDDD:00");
> +    int func_len = strlen(".F");
> +    int slot_len = strlen(":SS");
> +    int path_len;
> +    char *path, *p;
> +
> +    /* Calculate # of slots on path between device and root. */;
> +    slot_depth = 0;
> +    for (t = d; t; t = t->bus->parent_dev)
> +        ++slot_depth;
> +
> +    path_len = domain_len + bus_len + slot_len * slot_depth + func_len;
> +
> +    /* Allocate memory, fill in the terminating null byte. */
> +    path = malloc(path_len + 1 /* For '\0' */);
> +    path[path_len] = '\0';
> +
> +    /* First field is the domain. */
> +    snprintf(path, domain_len, "%04x", pci_find_domain(d->bus));
> +
> +    /* Leave space for slot numbers and fill in function number. */
> +    p = path + domain_len + slot_len * slot_depth;
> +    snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn));
> +
> +    /* Fill in slot numbers. We walk up from device to root, so need to print
> +     * them in the reverse order, last to first. */
> +    for (t = d; t; t = t->bus->parent_dev) {
> +        p -= slot_len;
> +        snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn));
> +    }
> +
> +    return path;
>  }
>  

--
			Gleb.
Michael S. Tsirkin Nov. 8, 2010, 5:08 p.m. UTC | #4
On Mon, Nov 08, 2010 at 07:00:15PM +0200, Gleb Natapov wrote:
> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote:
> > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > > number from the secondary bus number offset of the device
> > > > > instead of the bridge above the device.  This ends of landing
> > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > > ramblock naming, so changing it can effect migration.  However,
> > > > > I've only seen this byte be non-zero for an assigned device,
> > > > > which can't migrate anyway, so hopefully we won't run into
> > > > > any issues.
> > > > > 
> > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > 
> > > > Good catch. Applied.
> > > > I don't really see why do we put the dev path
> > > > in the bus object: why not let device supply its name?
> > > 
> > > Because the device name is not unique.  This came about from the
> > > discussion about how to create a canonical device path that Gleb and
> > > Markus are again trying to hash out.  If we go up to the bus and get the
> > > bus address, we have a VM unique name.  Unfortunately, it's difficult to
> > > define what the bus should print in all cases (ISA), but since they
> > > don't do hotplug and typically don't allocate ramblocks, we can mostly
> > > ignore it for this use case.
> > > 
> > > > And I think this will affect nested bridges. However they are currently
> > > > broken anyway: we really must convert to topological names as bus number
> > > > is guest-assigned - they don't have to be unique, even.
> > > 
> > > Yes, nested bridges are a problem.  How can the seg/bus/devfn not be
> > > unique?
> > 
> > Bus numbers for nested bridges are guest assigned. We start with 0 after reset.
> > 
> > > > What does fixing this involve? Just changing pcibus_get_dev_path?
> > > 
> > > How do you plan to fix it?  Don't forget that migration depends on these
> > > names, so some kind of compatibility layer would be required.  Thanks,
> > > 
> > > Alex
> > 
> > Replace bus number with slot numbers of parent bridges up to the root.
> > This works for root bridge in a compatible way because bus number there
> > is hard-coded to 0.
> > IMO nested bridges are broken anyway, no way to be compatible there.
> > 
> > 
> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > work with these paths?
> > 
> 
> I am no longer use bus's get_dev_path callback for my purpose (I added
> another one get_fw_dev_path)

Why? Because get_dev_path is unstable? To avoid changing it for migration?
So with this patch, you get to use it again.
IMO having two ways to address devices is a bad idea.

> since it is used for migration

what is used for migration?

> it should
> be done for all buses to work properly. And by properly I mean produce
> full path from system root to the device itself recursively.

This is what the code does. Recursion is not needed here though.

> But what I
> learned is that by changing get_dev_path output you will break migration
> from older guest to newer once (something we have to support).

Well I think migration for systems with nested buses is broken anyway:
it's just a new feature where we simply did not figure out the migration
angle before the last release.
Without nesting my code returns exactly the existing output,
so no, it's not broken.

> And of
> course, as Alex said already, you need to traverse bridges recursively

Well this is an implementation detail.  A loop is functionally equivalent,
only cleaner and easier to understand.

> and
> domain does not provide any meaningful information.

I don't understand yet, sorry. The code I posted returns exactly
what's there today. If the domain does not provide any meaningful
information, how do things work then? And how do you address
roots on a system with multiple roots?

> > This is on top of Alex's patch, completely untested.
> > 
> > 
> > pci: fix device path for devices behind nested bridges
> > 
> > We were using bus number in the device path, which is clearly
> > broken as this number is guest-assigned for all devices
> > except the root.
> > 
> > Fix by using hierarchical list of slots, walking the path
> > from root down to device, instead. Add :00 as bus number
> > so that if there are no nested bridges, this is compatible
> > with what we have now.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > diff --git a/hw/pci.c b/hw/pci.c
> > index 7d12473..fa98d94 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> >  
> >  static char *pcibus_get_dev_path(DeviceState *dev)
> >  {
> > -    PCIDevice *d = (PCIDevice *)dev;
> > -    char path[16];
> > -
> > -    snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > -             pci_find_domain(d->bus), pci_bus_num(d->bus),
> > -             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> > -
> > -    return strdup(path);
> > +    PCIDevice *d = container_of(dev, PCIDevice, qdev);
> > +    PCIDevice *t;
> > +    int slot_depth;
> > +    /* Path format: Domain:00:Slot:Slot....:Slot.Function.
> > +     * 00 is added here to make this format compatible with
> > +     * domain:Bus:Slot.Func for systems without nested PCI bridges.
> > +     * Slot list specifies the slot numbers for all devices on the
> > +     * path from root to the specific device. */
> > +    int domain_len = strlen("DDDD:00");
> > +    int func_len = strlen(".F");
> > +    int slot_len = strlen(":SS");
> > +    int path_len;
> > +    char *path, *p;
> > +
> > +    /* Calculate # of slots on path between device and root. */;
> > +    slot_depth = 0;
> > +    for (t = d; t; t = t->bus->parent_dev)
> > +        ++slot_depth;
> > +
> > +    path_len = domain_len + bus_len + slot_len * slot_depth + func_len;
> > +
> > +    /* Allocate memory, fill in the terminating null byte. */
> > +    path = malloc(path_len + 1 /* For '\0' */);
> > +    path[path_len] = '\0';
> > +
> > +    /* First field is the domain. */
> > +    snprintf(path, domain_len, "%04x", pci_find_domain(d->bus));
> > +
> > +    /* Leave space for slot numbers and fill in function number. */
> > +    p = path + domain_len + slot_len * slot_depth;
> > +    snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn));
> > +
> > +    /* Fill in slot numbers. We walk up from device to root, so need to print
> > +     * them in the reverse order, last to first. */
> > +    for (t = d; t; t = t->bus->parent_dev) {
> > +        p -= slot_len;
> > +        snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn));
> > +    }
> > +
> > +    return path;
> >  }
> >  
> 
> --
> 			Gleb.
Gleb Natapov Nov. 8, 2010, 5:27 p.m. UTC | #5
On Mon, Nov 08, 2010 at 07:08:37PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 08, 2010 at 07:00:15PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 08, 2010 at 07:52:12AM -0700, Alex Williamson wrote:
> > > > On Mon, 2010-11-08 at 13:22 +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Oct 04, 2010 at 03:53:11PM -0600, Alex Williamson wrote:
> > > > > > pcibus_dev_print() was erroneously retrieving the device bus
> > > > > > number from the secondary bus number offset of the device
> > > > > > instead of the bridge above the device.  This ends of landing
> > > > > > in the 2nd byte of the 3rd BAR for devices, which thankfully
> > > > > > is usually zero.  pcibus_get_dev_path() copied this code,
> > > > > > inheriting the same bug.  pcibus_get_dev_path() is used for
> > > > > > ramblock naming, so changing it can effect migration.  However,
> > > > > > I've only seen this byte be non-zero for an assigned device,
> > > > > > which can't migrate anyway, so hopefully we won't run into
> > > > > > any issues.
> > > > > > 
> > > > > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > > > 
> > > > > Good catch. Applied.
> > > > > I don't really see why do we put the dev path
> > > > > in the bus object: why not let device supply its name?
> > > > 
> > > > Because the device name is not unique.  This came about from the
> > > > discussion about how to create a canonical device path that Gleb and
> > > > Markus are again trying to hash out.  If we go up to the bus and get the
> > > > bus address, we have a VM unique name.  Unfortunately, it's difficult to
> > > > define what the bus should print in all cases (ISA), but since they
> > > > don't do hotplug and typically don't allocate ramblocks, we can mostly
> > > > ignore it for this use case.
> > > > 
> > > > > And I think this will affect nested bridges. However they are currently
> > > > > broken anyway: we really must convert to topological names as bus number
> > > > > is guest-assigned - they don't have to be unique, even.
> > > > 
> > > > Yes, nested bridges are a problem.  How can the seg/bus/devfn not be
> > > > unique?
> > > 
> > > Bus numbers for nested bridges are guest assigned. We start with 0 after reset.
> > > 
> > > > > What does fixing this involve? Just changing pcibus_get_dev_path?
> > > > 
> > > > How do you plan to fix it?  Don't forget that migration depends on these
> > > > names, so some kind of compatibility layer would be required.  Thanks,
> > > > 
> > > > Alex
> > > 
> > > Replace bus number with slot numbers of parent bridges up to the root.
> > > This works for root bridge in a compatible way because bus number there
> > > is hard-coded to 0.
> > > IMO nested bridges are broken anyway, no way to be compatible there.
> > > 
> > > 
> > > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > > do you think?  Also - do we need to update QMP/monitor to teach them to
> > > work with these paths?
> > > 
> > 
> > I am no longer use bus's get_dev_path callback for my purpose (I added
> > another one get_fw_dev_path)
> 
> Why? Because get_dev_path is unstable? to avoid changing for migration?
Because to get a functional get_dev_path() we need to implement it
recursively (hint: the domain should not be part of the PCI bus's
get_dev_path but of its parent bus instead). And since get_dev_path() is
used in migration code, you have an all-or-nothing situation. Either you
implement it properly for all devices or for none.
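
A sketch of the split hinted at here (hypothetical, not from any posted
series): the PCI bus would contribute only its local component, and the
domain/root identification would come from whatever bus the host bridge
itself sits on, composed by a generic walker such as the one sketched
earlier in the thread:

static char *pcibus_get_dev_path(DeviceState *dev)
{
    PCIDevice *d = container_of(dev, PCIDevice, qdev);
    char component[8];

    /* Only the address on this bus: no domain, no bus number. The
     * bridges and the host bridge above contribute their own
     * components when the full path is composed. */
    snprintf(component, sizeof(component), "%02x.%x",
             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
    return strdup(component);
}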

> So with this patch, you get to use it again.
Unfortunately not.

> IMO two ways to address devices is a bad idea.
> 
> > since it is used for migration
> 
> what is used for migration?
> 
The get_dev_path() callback is used to create a unique instance id. Actually
it is the only use of it, so a simple grep will show you this :)

> > it should
> > be done for all buses to work properly. And by properly I mean produce
> > full path from system root to the device itself recursively.
> 
> This is what the code does. Recursion is not needed here though.
> 
Recursion is needed to create the full device path. I am not interested in
only PCI here. In fact the PCI one is the easiest.

> > But what I
> > learned is that by changing get_dev_path output you will break migration
> > from older guest to newer once (something we have to support).
> 
> Well I think migration for sytems with nested buses are broken anyway:
> it's just a new feature where we simply did not figure out the migration
> andgle before the last release.
> Without nesting my code returns exactly the existing output
> so no, it's not broken.
> 
It does not drop the bus id part? If without bridges the output is exactly
the same, then migration is OK with the patch.


> > And of
> > course, as Alex said already, you need to traverse bridges recursively
> 
> Well this is an implementation detail.  loop is functionally equivalent,
> only cleaner and easier to understand.
> 
With that I can agree. I am referring more to recursion down the device
tree.

> > and
> > domain does not provide any meaningful information.
> 
> I dont understand yet, sorry. Code I posted returns exactly
> what's there today. If domain does not provide any meaningful
> information, How do things work then? And how do you address
> roots on a system with multiple roots?
> 
The current code is broken. Look at my patch series.

> > > This is on top of Alex's patch, completely untested.
> > > 
> > > 
> > > pci: fix device path for devices behind nested bridges
> > > 
> > > We were using bus number in the device path, which is clearly
> > > broken as this number is guest-assigned for all devices
> > > except the root.
> > > 
> > > Fix by using hierarchical list of slots, walking the path
> > > from root down to device, instead. Add :00 as bus number
> > > so that if there are no nested bridges, this is compatible
> > > with what we have now.
> > > 
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > 
> > > diff --git a/hw/pci.c b/hw/pci.c
> > > index 7d12473..fa98d94 100644
> > > --- a/hw/pci.c
> > > +++ b/hw/pci.c
> > > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> > >  
> > >  static char *pcibus_get_dev_path(DeviceState *dev)
> > >  {
> > > -    PCIDevice *d = (PCIDevice *)dev;
> > > -    char path[16];
> > > -
> > > -    snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > > -             pci_find_domain(d->bus), pci_bus_num(d->bus),
> > > -             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> > > -
> > > -    return strdup(path);
> > > +    PCIDevice *d = container_of(dev, PCIDevice, qdev);
> > > +    PCIDevice *t;
> > > +    int slot_depth;
> > > +    /* Path format: Domain:00:Slot:Slot....:Slot.Function.
> > > +     * 00 is added here to make this format compatible with
> > > +     * domain:Bus:Slot.Func for systems without nested PCI bridges.
> > > +     * Slot list specifies the slot numbers for all devices on the
> > > +     * path from root to the specific device. */
> > > +    int domain_len = strlen("DDDD:00");
> > > +    int func_len = strlen(".F");
> > > +    int slot_len = strlen(":SS");
> > > +    int path_len;
> > > +    char *path, *p;
> > > +
> > > +    /* Calculate # of slots on path between device and root. */;
> > > +    slot_depth = 0;
> > > +    for (t = d; t; t = t->bus->parent_dev)
> > > +        ++slot_depth;
> > > +
> > > +    path_len = domain_len + bus_len + slot_len * slot_depth + func_len;
> > > +
> > > +    /* Allocate memory, fill in the terminating null byte. */
> > > +    path = malloc(path_len + 1 /* For '\0' */);
> > > +    path[path_len] = '\0';
> > > +
> > > +    /* First field is the domain. */
> > > +    snprintf(path, domain_len, "%04x", pci_find_domain(d->bus));
> > > +
> > > +    /* Leave space for slot numbers and fill in function number. */
> > > +    p = path + domain_len + slot_len * slot_depth;
> > > +    snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn));
> > > +
> > > +    /* Fill in slot numbers. We walk up from device to root, so need to print
> > > +     * them in the reverse order, last to first. */
> > > +    for (t = d; t; t = t->bus->parent_dev) {
> > > +        p -= slot_len;
> > > +        snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn));
> > > +    }
> > > +
> > > +    return path;
> > >  }
> > >  
> > 
> > --
> > 			Gleb.

--
			Gleb.
Isaku Yamahata Nov. 9, 2010, 2:41 a.m. UTC | #6
On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> Replace bus number with slot numbers of parent bridges up to the root.
> This works for root bridge in a compatible way because bus number there
> is hard-coded to 0.
> IMO nested bridges are broken anyway, no way to be compatible there.
> 
> 
> Gleb, Markus, I think the following should be sufficient for PCI.  What
> do you think?  Also - do we need to update QMP/monitor to teach them to
> work with these paths?
> 
> This is on top of Alex's patch, completely untested.
> 
> 
> pci: fix device path for devices behind nested bridges
> 
> We were using bus number in the device path, which is clearly
> broken as this number is guest-assigned for all devices
> except the root.
> 
> Fix by using hierarchical list of slots, walking the path
> from root down to device, instead. Add :00 as bus number
> so that if there are no nested bridges, this is compatible
> with what we have now.

This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
because a pci-to-pci bridge is itself a pci function.
So the format should be
Domain:00:Slot.Function:Slot.Function....:Slot.Function
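
For example (illustrative numbers): a device at slot 5, function 0 behind
function 1 of a multifunction bridge at slot 3 would need to be
0000:00:03.1:05.0; a slot-only chain would collapse the two bridge
functions 03.0 and 03.1 into the same ":3" component.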

thanks,

> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> diff --git a/hw/pci.c b/hw/pci.c
> index 7d12473..fa98d94 100644
> --- a/hw/pci.c
> +++ b/hw/pci.c
> @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
>  
>  static char *pcibus_get_dev_path(DeviceState *dev)
>  {
> -    PCIDevice *d = (PCIDevice *)dev;
> -    char path[16];
> -
> -    snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> -             pci_find_domain(d->bus), pci_bus_num(d->bus),
> -             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> -
> -    return strdup(path);
> +    PCIDevice *d = container_of(dev, PCIDevice, qdev);
> +    PCIDevice *t;
> +    int slot_depth;
> +    /* Path format: Domain:00:Slot:Slot....:Slot.Function.
> +     * 00 is added here to make this format compatible with
> +     * domain:Bus:Slot.Func for systems without nested PCI bridges.
> +     * Slot list specifies the slot numbers for all devices on the
> +     * path from root to the specific device. */
> +    int domain_len = strlen("DDDD:00");
> +    int func_len = strlen(".F");
> +    int slot_len = strlen(":SS");
> +    int path_len;
> +    char *path, *p;
> +
> +    /* Calculate # of slots on path between device and root. */;
> +    slot_depth = 0;
> +    for (t = d; t; t = t->bus->parent_dev)
> +        ++slot_depth;
> +
> +    path_len = domain_len + bus_len + slot_len * slot_depth + func_len;
> +
> +    /* Allocate memory, fill in the terminating null byte. */
> +    path = malloc(path_len + 1 /* For '\0' */);
> +    path[path_len] = '\0';
> +
> +    /* First field is the domain. */
> +    snprintf(path, domain_len, "%04x", pci_find_domain(d->bus));
> +
> +    /* Leave space for slot numbers and fill in function number. */
> +    p = path + domain_len + slot_len * slot_depth;
> +    snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn));
> +
> +    /* Fill in slot numbers. We walk up from device to root, so need to print
> +     * them in the reverse order, last to first. */
> +    for (t = d; t; t = t->bus->parent_dev) {
> +        p -= slot_len;
> +        snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn));
> +    }
> +
> +    return path;
>  }
>  
>
Michael S. Tsirkin Nov. 9, 2010, 11:53 a.m. UTC | #7
On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > Replace bus number with slot numbers of parent bridges up to the root.
> > This works for root bridge in a compatible way because bus number there
> > is hard-coded to 0.
> > IMO nested bridges are broken anyway, no way to be compatible there.
> > 
> > 
> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > work with these paths?
> > 
> > This is on top of Alex's patch, completely untested.
> > 
> > 
> > pci: fix device path for devices behind nested bridges
> > 
> > We were using bus number in the device path, which is clearly
> > broken as this number is guest-assigned for all devices
> > except the root.
> > 
> > Fix by using hierarchical list of slots, walking the path
> > from root down to device, instead. Add :00 as bus number
> > so that if there are no nested bridges, this is compatible
> > with what we have now.
> 
> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
> because pci-to-pci bridge is pci function.
> So the format should be
> Domain:00:Slot.Function:Slot.Function....:Slot.Function
> 
> thanks,

Hmm, interesting. If we do this we aren't backwards compatible
though, so maybe we could try using openfirmware paths, just as well.
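
For illustration only (hypothetical path; the exact node names would depend
on the implementation): an Open Firmware style path identifies each device
by node@unit-address along the topology, e.g.
/pci@i0cf8/pci-bridge@3/ethernet@5,0 for a NIC at slot 5, function 0 behind
a bridge in slot 3, with no guest-assigned bus numbers anywhere in the name.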

> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > diff --git a/hw/pci.c b/hw/pci.c
> > index 7d12473..fa98d94 100644
> > --- a/hw/pci.c
> > +++ b/hw/pci.c
> > @@ -1826,13 +1826,45 @@ static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
> >  
> >  static char *pcibus_get_dev_path(DeviceState *dev)
> >  {
> > -    PCIDevice *d = (PCIDevice *)dev;
> > -    char path[16];
> > -
> > -    snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
> > -             pci_find_domain(d->bus), pci_bus_num(d->bus),
> > -             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
> > -
> > -    return strdup(path);
> > +    PCIDevice *d = container_of(dev, PCIDevice, qdev);
> > +    PCIDevice *t;
> > +    int slot_depth;
> > +    /* Path format: Domain:00:Slot:Slot....:Slot.Function.
> > +     * 00 is added here to make this format compatible with
> > +     * domain:Bus:Slot.Func for systems without nested PCI bridges.
> > +     * Slot list specifies the slot numbers for all devices on the
> > +     * path from root to the specific device. */
> > +    int domain_len = strlen("DDDD:00");
> > +    int func_len = strlen(".F");
> > +    int slot_len = strlen(":SS");
> > +    int path_len;
> > +    char *path, *p;
> > +
> > +    /* Calculate # of slots on path between device and root. */;
> > +    slot_depth = 0;
> > +    for (t = d; t; t = t->bus->parent_dev)
> > +        ++slot_depth;
> > +
> > +    path_len = domain_len + bus_len + slot_len * slot_depth + func_len;
> > +
> > +    /* Allocate memory, fill in the terminating null byte. */
> > +    path = malloc(path_len + 1 /* For '\0' */);
> > +    path[path_len] = '\0';
> > +
> > +    /* First field is the domain. */
> > +    snprintf(path, domain_len, "%04x", pci_find_domain(d->bus));
> > +
> > +    /* Leave space for slot numbers and fill in function number. */
> > +    p = path + domain_len + slot_len * slot_depth;
> > +    snprintf(p, func_len, ".%02x", PCI_FUNC(d->devfn));
> > +
> > +    /* Fill in slot numbers. We walk up from device to root, so need to print
> > +     * them in the reverse order, last to first. */
> > +    for (t = d; t; t = t->bus->parent_dev) {
> > +        p -= slot_len;
> > +        snprintf(p, slot_len, ":%x", PCI_SLOT(t->devfn));
> > +    }
> > +
> > +    return path;
> >  }
> >  
> > 
> 
> -- 
> yamahata
Markus Armbruster Nov. 19, 2010, 5:02 p.m. UTC | #8
"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
>> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
>> > Replace bus number with slot numbers of parent bridges up to the root.
>> > This works for root bridge in a compatible way because bus number there
>> > is hard-coded to 0.
>> > IMO nested bridges are broken anyway, no way to be compatible there.
>> > 
>> > 
>> > Gleb, Markus, I think the following should be sufficient for PCI.  What
>> > do you think?  Also - do we need to update QMP/monitor to teach them to
>> > work with these paths?
>> > 
>> > This is on top of Alex's patch, completely untested.
>> > 
>> > 
>> > pci: fix device path for devices behind nested bridges
>> > 
>> > We were using bus number in the device path, which is clearly
>> > broken as this number is guest-assigned for all devices
>> > except the root.
>> > 
>> > Fix by using hierarchical list of slots, walking the path
>> > from root down to device, instead. Add :00 as bus number
>> > so that if there are no nested bridges, this is compatible
>> > with what we have now.
>> 
>> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
>> because pci-to-pci bridge is pci function.
>> So the format should be
>> Domain:00:Slot.Function:Slot.Function....:Slot.Function
>> 
>> thanks,
>
> Hmm, interesting. If we do this we aren't backwards compatible
> though, so maybe we could try using openfirmware paths, just as well.

Whatever we do, we need to make it work for all (qdevified) devices and
buses.

It should also be possible to use canonical addressing with device_add &
friends.  I.e. permit naming a device by (a unique abbreviation of) its
canonical address in addition to naming it by its user-defined ID.  For
instance, something like

   device_del /pci/@1,1

in addition to

   device_del ID

Open Firmware is a useful source of inspiration there, but should it
come into conflict with usability, we should let usability win.
Gleb Natapov Nov. 19, 2010, 8:38 p.m. UTC | #9
On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
> >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> >> > Replace bus number with slot numbers of parent bridges up to the root.
> >> > This works for root bridge in a compatible way because bus number there
> >> > is hard-coded to 0.
> >> > IMO nested bridges are broken anyway, no way to be compatible there.
> >> > 
> >> > 
> >> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> >> > do you think?  Also - do we need to update QMP/monitor to teach them to
> >> > work with these paths?
> >> > 
> >> > This is on top of Alex's patch, completely untested.
> >> > 
> >> > 
> >> > pci: fix device path for devices behind nested bridges
> >> > 
> >> > We were using bus number in the device path, which is clearly
> >> > broken as this number is guest-assigned for all devices
> >> > except the root.
> >> > 
> >> > Fix by using hierarchical list of slots, walking the path
> >> > from root down to device, instead. Add :00 as bus number
> >> > so that if there are no nested bridges, this is compatible
> >> > with what we have now.
> >> 
> >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
> >> because pci-to-pci bridge is pci function.
> >> So the format should be
> >> Domain:00:Slot.Function:Slot.Function....:Slot.Function
> >> 
> >> thanks,
> >
> > Hmm, interesting. If we do this we aren't backwards compatible
> > though, so maybe we could try using openfirmware paths, just as well.
> 
> Whatever we do, we need to make it work for all (qdevified) devices and
> buses.
> 
> It should also be possible to use canonical addressing with device_add &
> friends.  I.e. permit naming a device by (a unique abbreviation of) its
> canonical address in addition to naming it by its user-defined ID.  For
> instance, something like
> 
>    device_del /pci/@1,1
> 
FWIW openbios allows this kind of abbreviation.

> in addition to
> 
>    device_del ID
> 
> Open Firmware is a useful source of inspiration there, but should it
> come into conflict with usability, we should let usability win.

--
			Gleb.
Michael S. Tsirkin Nov. 20, 2010, 8:17 p.m. UTC | #10
On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote:
> On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > 
> > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
> > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > >> > Replace bus number with slot numbers of parent bridges up to the root.
> > >> > This works for root bridge in a compatible way because bus number there
> > >> > is hard-coded to 0.
> > >> > IMO nested bridges are broken anyway, no way to be compatible there.
> > >> > 
> > >> > 
> > >> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > >> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > >> > work with these paths?
> > >> > 
> > >> > This is on top of Alex's patch, completely untested.
> > >> > 
> > >> > 
> > >> > pci: fix device path for devices behind nested bridges
> > >> > 
> > >> > We were using bus number in the device path, which is clearly
> > >> > broken as this number is guest-assigned for all devices
> > >> > except the root.
> > >> > 
> > >> > Fix by using hierarchical list of slots, walking the path
> > >> > from root down to device, instead. Add :00 as bus number
> > >> > so that if there are no nested bridges, this is compatible
> > >> > with what we have now.
> > >> 
> > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
> > >> because pci-to-pci bridge is pci function.
> > >> So the format should be
> > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function
> > >> 
> > >> thanks,
> > >
> > > Hmm, interesting. If we do this we aren't backwards compatible
> > > though, so maybe we could try using openfirmware paths, just as well.
> > 
> > Whatever we do, we need to make it work for all (qdevified) devices and
> > buses.
> > 
> > It should also be possible to use canonical addressing with device_add &
> > friends.  I.e. permit naming a device by (a unique abbreviation of) its
> > canonical address in addition to naming it by its user-defined ID.  For
> > instance, something like
> > 
> >    device_del /pci/@1,1
> > 
> FWIW openbios allows this kind of abbreviation.
> 
> > in addition to
> > 
> >    device_del ID
> > 
> > Open Firmware is a useful source of inspiration there, but should it
> > come into conflict with usability, we should let usability win.
> 
> --
> 			Gleb.


I think that the domain (PCI segment group), bus, slot, function way to
address pci devices is still the most familiar and the easiest to map to
functionality in the guests.  Qemu is buggy at the moment in that it
uses the bus addresses assigned by the guest and not the ones in ACPI,
but that can be fixed.

That should be enough for e.g. device_del. We do have the need to
describe the topology when we interface with firmware, e.g. to describe
the ACPI tables themselves to qemu (this is what Gleb's patches deal
with), but that's probably the only case.
Gleb Natapov Nov. 21, 2010, 8:32 a.m. UTC | #11
On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote:
> On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote:
> > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > 
> > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
> > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > > >> > Replace bus number with slot numbers of parent bridges up to the root.
> > > >> > This works for root bridge in a compatible way because bus number there
> > > >> > is hard-coded to 0.
> > > >> > IMO nested bridges are broken anyway, no way to be compatible there.
> > > >> > 
> > > >> > 
> > > >> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > > >> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > > >> > work with these paths?
> > > >> > 
> > > >> > This is on top of Alex's patch, completely untested.
> > > >> > 
> > > >> > 
> > > >> > pci: fix device path for devices behind nested bridges
> > > >> > 
> > > >> > We were using bus number in the device path, which is clearly
> > > >> > broken as this number is guest-assigned for all devices
> > > >> > except the root.
> > > >> > 
> > > >> > Fix by using hierarchical list of slots, walking the path
> > > >> > from root down to device, instead. Add :00 as bus number
> > > >> > so that if there are no nested bridges, this is compatible
> > > >> > with what we have now.
> > > >> 
> > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
> > > >> because pci-to-pci bridge is pci function.
> > > >> So the format should be
> > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function
> > > >> 
> > > >> thanks,
> > > >
> > > > Hmm, interesting. If we do this we aren't backwards compatible
> > > > though, so maybe we could try using openfirmware paths, just as well.
> > > 
> > > Whatever we do, we need to make it work for all (qdevified) devices and
> > > buses.
> > > 
> > > It should also be possible to use canonical addressing with device_add &
> > > friends.  I.e. permit naming a device by (a unique abbreviation of) its
> > > canonical address in addition to naming it by its user-defined ID.  For
> > > instance, something like
> > > 
> > >    device_del /pci/@1,1
> > > 
> > FWIW openbios allows this kind of abbreviation.
> > 
> > > in addition to
> > > 
> > >    device_del ID
> > > 
> > > Open Firmware is a useful source of inspiration there, but should it
> > > come into conflict with usability, we should let usability win.
> > 
> > --
> > 			Gleb.
> 
> 
> I think that the domain (PCI segment group), bus, slot, function way to
> address pci devices is still the most familiar and the easiest to map to
Most familiar to whom? It looks like you identify yourself with most
qemu users, but if most qemu users are like you then qemu does not have
enough users :) Most users that consider themselves to be "advanced" may
know what eth1 or /dev/sdb means. This doesn't mean we should provide a
"device_del eth1" or "device_add /dev/sdb" command though.

More important is that "domain" (encoded as a number like you did)
and "bus number" have no meaning inside qemu. So while I have said many
times that I don't care too much about the exact CLI syntax, it should at
least make sense. It can use an id to specify the PCI bus in the CLI like
this: device_del pci.0:1.1. Or it can even use the device id too, like
this: device_del pci.0:ide.0. Or it can use the HW topology as in an OF
device path. But doing ad-hoc device enumeration inside qemu and then
using it for the CLI is not it.

> functionality in the guests.  Qemu is buggy in the moment in that is
> uses the bus addresses assigned by guest and not the ones in ACPI,
> but that can be fixed.
It looks like you confused ACPI _SEG with something it isn't. The ACPI
spec says that a PCI segment group is a purely software concept managed by
system firmware. In fact one segment may include multiple PCI host
bridges. _SEG is not what the OSPM uses to tie a HW resource to an ACPI
resource. It uses _CRS (Current Resource Settings) for that, just like OF.
No surprise there.

> 
> That should be enough for e.g. device_del. We do have the need to
> describe the topology when we interface with firmware, e.g. to describe
> the ACPI tables themselves to qemu (this is what Gleb's patches deal
> with), but that's probably the only case.
> 
Describing the HW topology is the only way to unambiguously describe a
device to something or someone outside qemu and have persistent device
naming between different HW configurations.

--
			Gleb.
Michael S. Tsirkin Nov. 21, 2010, 9:50 a.m. UTC | #12
On Sun, Nov 21, 2010 at 10:32:11AM +0200, Gleb Natapov wrote:
> On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote:
> > On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote:
> > > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote:
> > > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > > 
> > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
> > > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > > > >> > Replace bus number with slot numbers of parent bridges up to the root.
> > > > >> > This works for root bridge in a compatible way because bus number there
> > > > >> > is hard-coded to 0.
> > > > >> > IMO nested bridges are broken anyway, no way to be compatible there.
> > > > >> > 
> > > > >> > 
> > > > >> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > > > >> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > > > >> > work with these paths?
> > > > >> > 
> > > > >> > This is on top of Alex's patch, completely untested.
> > > > >> > 
> > > > >> > 
> > > > >> > pci: fix device path for devices behind nested bridges
> > > > >> > 
> > > > >> > We were using bus number in the device path, which is clearly
> > > > >> > broken as this number is guest-assigned for all devices
> > > > >> > except the root.
> > > > >> > 
> > > > >> > Fix by using hierarchical list of slots, walking the path
> > > > >> > from root down to device, instead. Add :00 as bus number
> > > > >> > so that if there are no nested bridges, this is compatible
> > > > >> > with what we have now.
> > > > >> 
> > > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
> > > > >> because pci-to-pci bridge is pci function.
> > > > >> So the format should be
> > > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function
> > > > >> 
> > > > >> thanks,
> > > > >
> > > > > Hmm, interesting. If we do this we aren't backwards compatible
> > > > > though, so maybe we could try using openfirmware paths, just as well.
> > > > 
> > > > Whatever we do, we need to make it work for all (qdevified) devices and
> > > > buses.
> > > > 
> > > > It should also be possible to use canonical addressing with device_add &
> > > > friends.  I.e. permit naming a device by (a unique abbreviation of) its
> > > > canonical address in addition to naming it by its user-defined ID.  For
> > > > instance, something like
> > > > 
> > > >    device_del /pci/@1,1
> > > > 
> > > FWIW openbios allows this kind of abbreviation.
> > > 
> > > > in addition to
> > > > 
> > > >    device_del ID
> > > > 
> > > > Open Firmware is a useful source of inspiration there, but should it
> > > > come into conflict with usability, we should let usability win.
> > > 
> > > --
> > > 			Gleb.
> > 
> > 
> > I think that the domain (PCI segment group), bus, slot, function way to
> > address pci devices is still the most familiar and the easiest to map to
> Most familiar to whom?

The guests.
For the CLI, we need an easy way to map a device in the guest to the
device in qemu and back.

> It looks like you identify yourself with most of
> qemu users, but if most qemu users are like you then qemu has not enough
> users :) Most users that consider themselves to be "advanced" may know
> what eth1 or /dev/sdb means. This doesn't mean we should provide
> "device_del eth1" or "device_add /dev/sdb" command though. 
> 
> More important is that "domain" (encoded as number like you used to)
> and "bus number" has no meaning from inside qemu.
> So while I said many
> times I don't care about exact CLI syntax to much it should make sense
> at least. It can use id to specify PCI bus in CLI like this:
> device_del pci.0:1.1. Or it can even use device id too like this:
> device_del pci.0:ide.0. Or it can use HW topology like in FO device
> path. But doing ah-hoc device enumeration inside qemu and then using it
> for CLI is not it.
> 
> > functionality in the guests.  Qemu is buggy in the moment in that is
> > uses the bus addresses assigned by guest and not the ones in ACPI,
> > but that can be fixed.
> It looks like you confused ACPI _SEG for something it isn't.

Maybe I did. This is what linux does:

struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root
*root)
{
        struct acpi_device *device = root->device;
        int domain = root->segment;
        int busnum = root->secondary.start;

And I think this is consistent with the spec.
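
To make it concrete what that buys us on the CLI side: the segment and
bus taken from ACPI, plus slot.function, render into exactly the address
string users already see in the guest (lspci and friends).  A minimal
sketch, plain C with made-up values, nothing kernel- or qemu-specific:

#include <stdio.h>

/* Render a (segment, bus, slot, function) tuple the way guests usually
 * present it, e.g. 0000:00:03.0 in lspci output. */
static void print_pci_addr(unsigned seg, unsigned bus,
                           unsigned slot, unsigned fn)
{
    printf("%04x:%02x:%02x.%x\n", seg, bus, slot, fn);
}

int main(void)
{
    print_pci_addr(0, 0, 3, 0);    /* hypothetical device in slot 3 on bus 0 */
    return 0;
}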

> ACPI spec
> says that PCI segment group is purely software concept managed by system
> firmware. In fact one segment may include multiple PCI host bridges.

It can't I think:
	Multiple Host Bridges

	A platform may have multiple PCI Express or PCI-X host bridges. The base
	address for the
	MMCONFIG space for these host bridges may need to be allocated at
	different locations. In such
	cases, using MCFG table and _CBA method as defined in this section means
	that each of these host
	bridges must be in its own PCI Segment Group.


> _SEG
> is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS
> (Current Resource Settings) for that just like OF. No surprise there.

OSPM uses both I think.

All I see linux do with CRS is get the bus number range.
And the spec says, e.g.:

	  the memory mapped configuration base
	address (always corresponds to bus number 0) for the PCI Segment Group
	of the host bridge is provided by _CBA and the bus range covered by the
	base address is indicated by the corresponding bus range specified in
	_CRS.


> > 
> > That should be enough for e.g. device_del. We do have the need to
> > describe the topology when we interface with firmware, e.g. to describe
> > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > with), but that's probably the only case.
> > 
> Describing HW topology is the only way to unambiguously describe device to
> something or someone outside qemu and have persistent device naming
> between different HW configuration.

Not really, since ACPI is a binary blob programmed by qemu.

> --
> 			Gleb.
Gleb Natapov Nov. 21, 2010, 10:19 a.m. UTC | #13
On Sun, Nov 21, 2010 at 11:50:18AM +0200, Michael S. Tsirkin wrote:
> On Sun, Nov 21, 2010 at 10:32:11AM +0200, Gleb Natapov wrote:
> > On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote:
> > > On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote:
> > > > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote:
> > > > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > > > 
> > > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
> > > > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > > > > >> > Replace bus number with slot numbers of parent bridges up to the root.
> > > > > >> > This works for root bridge in a compatible way because bus number there
> > > > > >> > is hard-coded to 0.
> > > > > >> > IMO nested bridges are broken anyway, no way to be compatible there.
> > > > > >> > 
> > > > > >> > 
> > > > > >> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > > > > >> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > > > > >> > work with these paths?
> > > > > >> > 
> > > > > >> > This is on top of Alex's patch, completely untested.
> > > > > >> > 
> > > > > >> > 
> > > > > >> > pci: fix device path for devices behind nested bridges
> > > > > >> > 
> > > > > >> > We were using bus number in the device path, which is clearly
> > > > > >> > broken as this number is guest-assigned for all devices
> > > > > >> > except the root.
> > > > > >> > 
> > > > > >> > Fix by using hierarchical list of slots, walking the path
> > > > > >> > from root down to device, instead. Add :00 as bus number
> > > > > >> > so that if there are no nested bridges, this is compatible
> > > > > >> > with what we have now.
> > > > > >> 
> > > > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
> > > > > >> because pci-to-pci bridge is pci function.
> > > > > >> So the format should be
> > > > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function
> > > > > >> 
> > > > > >> thanks,
> > > > > >
> > > > > > Hmm, interesting. If we do this we aren't backwards compatible
> > > > > > though, so maybe we could try using openfirmware paths, just as well.
> > > > > 
> > > > > Whatever we do, we need to make it work for all (qdevified) devices and
> > > > > buses.
> > > > > 
> > > > > It should also be possible to use canonical addressing with device_add &
> > > > > friends.  I.e. permit naming a device by (a unique abbreviation of) its
> > > > > canonical address in addition to naming it by its user-defined ID.  For
> > > > > instance, something like
> > > > > 
> > > > >    device_del /pci/@1,1
> > > > > 
> > > > FWIW openbios allows this kind of abbreviation.
> > > > 
> > > > > in addition to
> > > > > 
> > > > >    device_del ID
> > > > > 
> > > > > Open Firmware is a useful source of inspiration there, but should it
> > > > > come into conflict with usability, we should let usability win.
> > > > 
> > > > --
> > > > 			Gleb.
> > > 
> > > 
> > > I think that the domain (PCI segment group), bus, slot, function way to
> > > address pci devices is still the most familiar and the easiest to map to
> > Most familiar to whom?
> 
> The guests.
Which one? There are many guests. Your favorite?

> For CLI, we need an easy way to map a device in guest to the
> device in qemu and back.
Then use eth0, /dev/sdb, or even C:. Your way is no less broken, since what
you are saying is "let's use the name that the guest assigned to a device". 

> 
> > It looks like you identify yourself with most of
> > qemu users, but if most qemu users are like you then qemu has not enough
> > users :) Most users that consider themselves to be "advanced" may know
> > what eth1 or /dev/sdb means. This doesn't mean we should provide
> > "device_del eth1" or "device_add /dev/sdb" command though. 
> > 
> > More important is that "domain" (encoded as number like you used to)
> > and "bus number" has no meaning from inside qemu.
> > So while I said many
> > times I don't care about exact CLI syntax to much it should make sense
> > at least. It can use id to specify PCI bus in CLI like this:
> > device_del pci.0:1.1. Or it can even use device id too like this:
> > device_del pci.0:ide.0. Or it can use HW topology like in FO device
> > path. But doing ah-hoc device enumeration inside qemu and then using it
> > for CLI is not it.
> > 
> > > functionality in the guests.  Qemu is buggy in the moment in that is
> > > uses the bus addresses assigned by guest and not the ones in ACPI,
> > > but that can be fixed.
> > It looks like you confused ACPI _SEG for something it isn't.
> 
> Maybe I did. This is what linux does:
> 
> struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root
> *root)
> {
>         struct acpi_device *device = root->device;
>         int domain = root->segment;
>         int busnum = root->secondary.start;
> 
> And I think this is consistent with the spec.
> 
It means that one domain may include several host bridges. At that level
a domain is defined as something that has a unique name for each device
inside it, thus no two buses in one segment/domain can have the same bus
number. This is what the PCI spec tells you. 

And this further shows that using "domain" as defined by the guest is a
very bad idea. 

> > ACPI spec
> > says that PCI segment group is purely software concept managed by system
> > firmware. In fact one segment may include multiple PCI host bridges.
> 
> It can't I think:
Read the _BBN definition:
 The _BBN object is located under a PCI host bridge and must be unique for
 every host bridge within a segment since it is the PCI bus number.

Clearly the above speaks about multiple host bridges within a segment.

> 	Multiple Host Bridges
> 
> 	A platform may have multiple PCI Express or PCI-X host bridges. The base
> 	address for the
> 	MMCONFIG space for these host bridges may need to be allocated at
> 	different locations. In such
> 	cases, using MCFG table and _CBA method as defined in this section means
> 	that each of these host
> 	bridges must be in its own PCI Segment Group.
> 
This is not from the ACPI spec, but without going too deep into it, the
above paragraph talks about a particular case where each host bridge must
be in its own PCI Segment Group, which is definite proof that in other
cases multiple host bridges can be in one segment group.

> 
> > _SEG
> > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS
> > (Current Resource Settings) for that just like OF. No surprise there.
> 
> OSPM uses both I think.
> 
> All I see linux do with CRS is get the bus number range.
So let's assume that the HW has two PCI host bridges and ACPI has:
        Device(PCI0) {
            Name (_HID, EisaId ("PNP0A03"))
            Name (_SEG, 0x00)
        }
        Device(PCI1) {
            Name (_HID, EisaId ("PNP0A03"))
            Name (_SEG, 0x01)
        }
I.e. no _CRS to describe resources. How do you think the OSPM knows which
of the two PCI host bridges is PCI0 and which one is PCI1?


> And the spec says, e.g.:
> 
> 	  the memory mapped configuration base
> 	address (always corresponds to bus number 0) for the PCI Segment Group
> 	of the host bridge is provided by _CBA and the bus range covered by the
> 	base address is indicated by the corresponding bus range specified in
> 	_CRS.
> 
I don't see how it is relevant. And _CBA is defined only for PCI Express.
Let's solve the problem for PCI first and then move to PCI Express. Jumping
from one to the other distracts us from the main discussion.

> 
> > > 
> > > That should be enough for e.g. device_del. We do have the need to
> > > describe the topology when we interface with firmware, e.g. to describe
> > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > with), but that's probably the only case.
> > > 
> > Describing HW topology is the only way to unambiguously describe device to
> > something or someone outside qemu and have persistent device naming
> > between different HW configuration.
> 
> Not really, since ACPI is a binary blob programmed by qemu.
> 
ACPI is part of the guest, not qemu. Just saying "not really" doesn't
prove much. I still haven't seen any proposal from you that actually
solves the problem. No, "let's use guest naming" is not it. There is no
such thing as "The Guest". 

--
			Gleb.
Michael S. Tsirkin Nov. 21, 2010, 11:53 a.m. UTC | #14
On Sun, Nov 21, 2010 at 12:19:03PM +0200, Gleb Natapov wrote:
> On Sun, Nov 21, 2010 at 11:50:18AM +0200, Michael S. Tsirkin wrote:
> > On Sun, Nov 21, 2010 at 10:32:11AM +0200, Gleb Natapov wrote:
> > > On Sat, Nov 20, 2010 at 10:17:09PM +0200, Michael S. Tsirkin wrote:
> > > > On Fri, Nov 19, 2010 at 10:38:42PM +0200, Gleb Natapov wrote:
> > > > > On Fri, Nov 19, 2010 at 06:02:58PM +0100, Markus Armbruster wrote:
> > > > > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > > > > > 
> > > > > > > On Tue, Nov 09, 2010 at 11:41:43AM +0900, Isaku Yamahata wrote:
> > > > > > >> On Mon, Nov 08, 2010 at 06:26:33PM +0200, Michael S. Tsirkin wrote:
> > > > > > >> > Replace bus number with slot numbers of parent bridges up to the root.
> > > > > > >> > This works for root bridge in a compatible way because bus number there
> > > > > > >> > is hard-coded to 0.
> > > > > > >> > IMO nested bridges are broken anyway, no way to be compatible there.
> > > > > > >> > 
> > > > > > >> > 
> > > > > > >> > Gleb, Markus, I think the following should be sufficient for PCI.  What
> > > > > > >> > do you think?  Also - do we need to update QMP/monitor to teach them to
> > > > > > >> > work with these paths?
> > > > > > >> > 
> > > > > > >> > This is on top of Alex's patch, completely untested.
> > > > > > >> > 
> > > > > > >> > 
> > > > > > >> > pci: fix device path for devices behind nested bridges
> > > > > > >> > 
> > > > > > >> > We were using bus number in the device path, which is clearly
> > > > > > >> > broken as this number is guest-assigned for all devices
> > > > > > >> > except the root.
> > > > > > >> > 
> > > > > > >> > Fix by using hierarchical list of slots, walking the path
> > > > > > >> > from root down to device, instead. Add :00 as bus number
> > > > > > >> > so that if there are no nested bridges, this is compatible
> > > > > > >> > with what we have now.
> > > > > > >> 
> > > > > > >> This format, Domain:00:Slot:Slot....:Slot.Function, doesn't work
> > > > > > >> because pci-to-pci bridge is pci function.
> > > > > > >> So the format should be
> > > > > > >> Domain:00:Slot.Function:Slot.Function....:Slot.Function
> > > > > > >> 
> > > > > > >> thanks,
> > > > > > >
> > > > > > > Hmm, interesting. If we do this we aren't backwards compatible
> > > > > > > though, so maybe we could try using openfirmware paths, just as well.
> > > > > > 
> > > > > > Whatever we do, we need to make it work for all (qdevified) devices and
> > > > > > buses.
> > > > > > 
> > > > > > It should also be possible to use canonical addressing with device_add &
> > > > > > friends.  I.e. permit naming a device by (a unique abbreviation of) its
> > > > > > canonical address in addition to naming it by its user-defined ID.  For
> > > > > > instance, something like
> > > > > > 
> > > > > >    device_del /pci/@1,1
> > > > > > 
> > > > > FWIW openbios allows this kind of abbreviation.
> > > > > 
> > > > > > in addition to
> > > > > > 
> > > > > >    device_del ID
> > > > > > 
> > > > > > Open Firmware is a useful source of inspiration there, but should it
> > > > > > come into conflict with usability, we should let usability win.
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > > 
> > > > 
> > > > I think that the domain (PCI segment group), bus, slot, function way to
> > > > address pci devices is still the most familiar and the easiest to map to
> > > Most familiar to whom?
> > 
> > The guests.
> Which one? There are many guests. Your favorite?
> 
> > For CLI, we need an easy way to map a device in guest to the
> > device in qemu and back.
> Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what
> you are saying is "lets use name that guest assigned to a device". 

No I am saying let's use the name that our ACPI tables assigned.

> > 
> > > It looks like you identify yourself with most of
> > > qemu users, but if most qemu users are like you then qemu has not enough
> > > users :) Most users that consider themselves to be "advanced" may know
> > > what eth1 or /dev/sdb means. This doesn't mean we should provide
> > > "device_del eth1" or "device_add /dev/sdb" command though. 
> > > 
> > > More important is that "domain" (encoded as number like you used to)
> > > and "bus number" has no meaning from inside qemu.
> > > So while I said many
> > > times I don't care about exact CLI syntax to much it should make sense
> > > at least. It can use id to specify PCI bus in CLI like this:
> > > device_del pci.0:1.1. Or it can even use device id too like this:
> > > device_del pci.0:ide.0. Or it can use HW topology like in FO device
> > > path. But doing ah-hoc device enumeration inside qemu and then using it
> > > for CLI is not it.
> > > 
> > > > functionality in the guests.  Qemu is buggy in the moment in that is
> > > > uses the bus addresses assigned by guest and not the ones in ACPI,
> > > > but that can be fixed.
> > > It looks like you confused ACPI _SEG for something it isn't.
> > 
> > Maybe I did. This is what linux does:
> > 
> > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root
> > *root)
> > {
> >         struct acpi_device *device = root->device;
> >         int domain = root->segment;
> >         int busnum = root->secondary.start;
> > 
> > And I think this is consistent with the spec.
> > 
> It means that one domain may include several host bridges.
> At that level
> domain is defined as something that have unique name for each device
> inside it thus no two buses in one segment/domain can have same bus
> number. This is what PCI spec tells you. 

And that really is enough for the CLI, because all we need is to locate the
specific slot in a unique way.

> And this further shows that using "domain" as defined by guest is very
> bad idea. 

As defined by ACPI, really.

> > > ACPI spec
> > > says that PCI segment group is purely software concept managed by system
> > > firmware. In fact one segment may include multiple PCI host bridges.
> > 
> > It can't I think:
> Read _BBN definition:
>  The _BBN object is located under a PCI host bridge and must be unique for
>  every host bridge within a segment since it is the PCI bus number.
> 
> Clearly above speaks about multiple host bridge within a segment.

Yes, it looks like the firmware spec allows that.

> > 	Multiple Host Bridges
> > 
> > 	A platform may have multiple PCI Express or PCI-X host bridges. The base
> > 	address for the
> > 	MMCONFIG space for these host bridges may need to be allocated at
> > 	different locations. In such
> > 	cases, using MCFG table and _CBA method as defined in this section means
> > 	that each of these host
> > 	bridges must be in its own PCI Segment Group.
> > 
> This is not from ACPI spec,

PCI Firmware Specification 3.0

> but without going to deep into it above
> paragraph talks about some particular case when each host bridge must
> be in its own PCI Segment Group with is a definite prove that in other
> cases multiple host bridges can be in on segment group.

I stand corrected. I think you are right. But note that if they are,
they must have distinct bus numbers assigned by ACPI.

> > 
> > > _SEG
> > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS
> > > (Current Resource Settings) for that just like OF. No surprise there.
> > 
> > OSPM uses both I think.
> > 
> > All I see linux do with CRS is get the bus number range.
> So lets assume that HW has two PCI host bridges and ACPI has:
>         Device(PCI0) {
>             Name (_HID, EisaId ("PNP0A03"))
>             Name (_SEG, 0x00)
>         }
>         Device(PCI1) {
>             Name (_HID, EisaId ("PNP0A03"))
>             Name (_SEG, 0x01)
>         }
> I.e no _CRS to describe resources. How do you think OSPM knows which of
> two pci host bridges is PCI0 and which one is PCI1?

You must be able to uniquely address any bridge using the combination of _SEG
and _BBN.

> > And the spec says, e.g.:
> > 
> > 	  the memory mapped configuration base
> > 	address (always corresponds to bus number 0) for the PCI Segment Group
> > 	of the host bridge is provided by _CBA and the bus range covered by the
> > 	base address is indicated by the corresponding bus range specified in
> > 	_CRS.
> > 
> Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets
> solve the problem for PCI first and then move to PCI Express. Jumping from one
> to another destruct us from main discussion.

I think this is what confuses us.  As long as you are using cf8/cfc there's no
concept of a domain really.
Thus:
	/pci@i0cf8

is probably enough for BIOS boot because we'll need to make root bus numbers
unique for legacy guests/option ROMs.  But this is not a hardware requirement
and might become easier to ignore with EFI.
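
To spell out why I say there is no domain at the cf8/cfc level: the
CONFIG_ADDRESS dword simply has no field for a segment, only bus, device,
function and register.  A minimal sketch of the encoding, plain C, just
to show the layout (not meant to poke real ports):

#include <stdio.h>
#include <stdint.h>

/* Legacy CONFIG_ADDRESS layout written to port 0xcf8:
 * bit 31 enable, bits 23:16 bus, 15:11 device, 10:8 function, 7:2 register.
 * Note there is nowhere to put a segment/domain number. */
static uint32_t config_address(uint8_t bus, uint8_t dev,
                               uint8_t fn, uint8_t reg)
{
    return (1u << 31)
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1f) << 11)
         | ((uint32_t)(fn & 0x7) << 8)
         | (reg & 0xfc);
}

int main(void)
{
    /* hypothetical example: bus 0, device 3, function 0, register 0 */
    printf("0x%08x\n", config_address(0, 3, 0, 0));
    return 0;
}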

> > 
> > > > 
> > > > That should be enough for e.g. device_del. We do have the need to
> > > > describe the topology when we interface with firmware, e.g. to describe
> > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > with), but that's probably the only case.
> > > > 
> > > Describing HW topology is the only way to unambiguously describe device to
> > > something or someone outside qemu and have persistent device naming
> > > between different HW configuration.
> > 
> > Not really, since ACPI is a binary blob programmed by qemu.
> > 
> APCI is part of the guest, not qemu.

Yes it runs in the guest but it's generated by qemu. On real hardware,
it's supplied by the motherboard.

> Just saying "not really" doesn't
> prove much. I still haven't seen any proposition from you that actually
> solve the problem. No, "lets use guest naming" is not it. There is no
> such thing as "The Guest". 
> 
> --
> 			Gleb.

I am sorry if I didn't make this clear.  I think we should use the domain:bus
pair to name the root device. As these are unique and
Gleb Natapov Nov. 21, 2010, 12:50 p.m. UTC | #15
On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote:
> > > The guests.
> > Which one? There are many guests. Your favorite?
> > 
> > > For CLI, we need an easy way to map a device in guest to the
> > > device in qemu and back.
> > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what
> > you are saying is "lets use name that guest assigned to a device". 
> 
> No I am saying let's use the name that our ACPI tables assigned.
> 
ACPI does not assign any name. In the best case ACPI tables describe resources
used by a device. And not all guests qemu supports have support for ACPI. Qemu
even supports machine types that do not support ACPI.
 
> > > 
> > > 
> > > > It looks like you identify yourself with most of
> > > > qemu users, but if most qemu users are like you then qemu has not enough
> > > > users :) Most users that consider themselves to be "advanced" may know
> > > > what eth1 or /dev/sdb means. This doesn't mean we should provide
> > > > "device_del eth1" or "device_add /dev/sdb" command though. 
> > > > 
> > > > More important is that "domain" (encoded as number like you used to)
> > > > and "bus number" has no meaning from inside qemu.
> > > > So while I said many
> > > > times I don't care about exact CLI syntax to much it should make sense
> > > > at least. It can use id to specify PCI bus in CLI like this:
> > > > device_del pci.0:1.1. Or it can even use device id too like this:
> > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device
> > > > path. But doing ah-hoc device enumeration inside qemu and then using it
> > > > for CLI is not it.
> > > > 
> > > > > functionality in the guests.  Qemu is buggy in the moment in that is
> > > > > uses the bus addresses assigned by guest and not the ones in ACPI,
> > > > > but that can be fixed.
> > > > It looks like you confused ACPI _SEG for something it isn't.
> > > 
> > > Maybe I did. This is what linux does:
> > > 
> > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root
> > > *root)
> > > {
> > >         struct acpi_device *device = root->device;
> > >         int domain = root->segment;
> > >         int busnum = root->secondary.start;
> > > 
> > > And I think this is consistent with the spec.
> > > 
> > It means that one domain may include several host bridges.
> > At that level
> > domain is defined as something that have unique name for each device
> > inside it thus no two buses in one segment/domain can have same bus
> > number. This is what PCI spec tells you. 
> 
> And that really is enough for CLI because all we need is locate the
> specific slot in a unique way.
> 
At the qemu level we do not have bus numbers. They are assigned by the guest.
So inside a guest domain:bus:slot.func points you to a device, but qemu
does not enumerate buses.

> > And this further shows that using "domain" as defined by guest is very
> > bad idea. 
> 
> As defined by ACPI, really.
> 
ACPI is part of the guest software and may not even be present in the
guest. How is it relevant?

> > > > ACPI spec
> > > > says that PCI segment group is purely software concept managed by system
> > > > firmware. In fact one segment may include multiple PCI host bridges.
> > > 
> > > It can't I think:
> > Read _BBN definition:
> >  The _BBN object is located under a PCI host bridge and must be unique for
> >  every host bridge within a segment since it is the PCI bus number.
> > 
> > Clearly above speaks about multiple host bridge within a segment.
> 
> Yes, it looks like the firmware spec allows that.
It even has an explicit example that shows it.

> 
> > > 	Multiple Host Bridges
> > > 
> > > 	A platform may have multiple PCI Express or PCI-X host bridges. The base
> > > 	address for the
> > > 	MMCONFIG space for these host bridges may need to be allocated at
> > > 	different locations. In such
> > > 	cases, using MCFG table and _CBA method as defined in this section means
> > > 	that each of these host
> > > 	bridges must be in its own PCI Segment Group.
> > > 
> > This is not from ACPI spec,
> 
> PCI Firmware Specification 3.0
> 
> > but without going to deep into it above
> > paragraph talks about some particular case when each host bridge must
> > be in its own PCI Segment Group with is a definite prove that in other
> > cases multiple host bridges can be in on segment group.
> 
> I stand corrected. I think you are right. But note that if they are,
> they must have distinct bus numbers assigned by ACPI.
ACPI does not assign any numbers. The BIOS enumerates buses and assigns
numbers. ACPI, in the base case, describes to the OSPM what the BIOS did.
Qemu sits one layer below all this and does not enumerate PCI buses. Even
if we made it do so, there is no way to guarantee that the guest will
enumerate them in the same order, since there is more than one way to do
the enumeration. I have repeated this to you numerous times already.
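
To make the ordering point concrete, here is a toy sketch (plain C, a
made-up two-bridge topology, not real firmware code): the very same
topology gets different bus numbers depending only on the order the
slots are scanned in, while the slot numbers stay stable:

#include <stdio.h>

struct bridge { int slot; int bus; };

/* Hand out secondary bus numbers in whatever order we visit the slots. */
static void number_buses(struct bridge *b, int n, int ascending)
{
    int next = 1;                      /* bus 0 is the root bus */
    for (int i = 0; i < n; i++) {
        int idx = ascending ? i : n - 1 - i;
        b[idx].bus = next++;
    }
}

int main(void)
{
    struct bridge br[] = { { 1, 0 }, { 2, 0 } };   /* bridges in slots 1 and 2 */

    number_buses(br, 2, 1);
    printf("ascending scan:  slot 1 -> bus %d, slot 2 -> bus %d\n",
           br[0].bus, br[1].bus);

    number_buses(br, 2, 0);
    printf("descending scan: slot 1 -> bus %d, slot 2 -> bus %d\n",
           br[0].bus, br[1].bus);
    return 0;
}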

> 
> > > 
> > > > _SEG
> > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS
> > > > (Current Resource Settings) for that just like OF. No surprise there.
> > > 
> > > OSPM uses both I think.
> > > 
> > > All I see linux do with CRS is get the bus number range.
> > So lets assume that HW has two PCI host bridges and ACPI has:
> >         Device(PCI0) {
> >             Name (_HID, EisaId ("PNP0A03"))
> >             Name (_SEG, 0x00)
> >         }
> >         Device(PCI1) {
> >             Name (_HID, EisaId ("PNP0A03"))
> >             Name (_SEG, 0x01)
> >         }
> > I.e no _CRS to describe resources. How do you think OSPM knows which of
> > two pci host bridges is PCI0 and which one is PCI1?
> 
> You must be able to uniquely address any bridge using the combination of _SEG
> and _BBN.

Not at all. And saying "you must be able" without actually showing how
doesn't prove anything. _SEG is relevant only for those host bridges
that support MMCONFIG (not all of them do, and none that qemu supports
does yet). _SEG points to a specific entry in the MCFG table, and the
MCFG entry holds the base address of the MMCONFIG space for the bridge
(this address is configured by the guest). That is all _SEG really does,
no magic at all. _BBN returns the bus number assigned by the BIOS to the
host bridge. Nothing qemu-visible again. So _SEG and _BBN give you two
numbers assigned by the guest FW. Nothing qemu can use to identify a device.
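
What qemu does know is its own topology, and that is enough to name any
device without guest help.  A minimal sketch (plain C, the structures
and names are made up for illustration, this is not qemu code) of
building a path from the root's config-access address plus the
slot.function chain of the bridges in between:

#include <stdio.h>
#include <string.h>

/* Hypothetical device node: parent link plus the slot.function it
 * occupies on the parent bus.  The root carries the address of its
 * config access mechanism instead of a slot. */
struct node {
    const struct node *parent;
    unsigned slot, fn;
    const char *root_name;       /* e.g. "pci@i0cf8", root only */
};

static void format_path(const struct node *n, char *buf, size_t len)
{
    if (!n->parent) {
        snprintf(buf, len, "/%s", n->root_name);
        return;
    }
    format_path(n->parent, buf, len);
    size_t used = strlen(buf);
    snprintf(buf + used, len - used, "/%02x.%x", n->slot, n->fn);
}

int main(void)
{
    struct node root   = { NULL,    0, 0, "pci@i0cf8" };
    struct node bridge = { &root,   1, 0, NULL };
    struct node dev    = { &bridge, 3, 0, NULL };

    char path[128];
    format_path(&dev, path, sizeof(path));
    printf("%s\n", path);        /* prints /pci@i0cf8/01.0/03.0 */
    return 0;
}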

> 
> > > And the spec says, e.g.:
> > > 
> > > 	  the memory mapped configuration base
> > > 	address (always corresponds to bus number 0) for the PCI Segment Group
> > > 	of the host bridge is provided by _CBA and the bus range covered by the
> > > 	base address is indicated by the corresponding bus range specified in
> > > 	_CRS.
> > > 
> > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets
> > solve the problem for PCI first and then move to PCI Express. Jumping from one
> > to another destruct us from main discussion.
> 
> I think this is what confuses us.  As long as you are using cf8/cfc there's no
> concept of a domain really.
> Thus:
> 	/pci@i0cf8
> 
> is probably enough for BIOS boot because we'll need to make root bus numbers
> unique for legacy guests/option ROMs.  But this is not a hardware requirement
> and might become easier to ignore eith EFI.
> 
You do not need MMCONFIG to have multiple PCI domains. You can have one
configured via the standard cf8/cfc, another one at ef8/efc, and one more
at MMIO fce00000, and you can address all of them:
/pci@i0cf8
/pci@i0ef8
/pci@fce00000

And each one of those PCI domains can have 256 subbridges.
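
A sketch of what I mean, with the three examples above (plain C, the
config_read here is only a stub standing in for the outl/inl pair or the
MMIO access a real guest would do): each config access mechanism is its
own domain, and inside each one the full 0..255 bus range is available
again:

#include <stdio.h>
#include <stdint.h>

/* Stub: a real guest would do an outl/inl pair on config_base/config_base+4
 * for the cf8-style bridges, or an MMIO read for the fce00000 one. */
static uint32_t config_read(uint32_t config_base, uint8_t bus,
                            uint8_t dev, uint8_t fn, uint8_t reg)
{
    (void)config_base; (void)bus; (void)dev; (void)fn; (void)reg;
    return 0xffffffff;           /* pretend nothing answered */
}

int main(void)
{
    /* Three independent host bridges, i.e. three domains, addressed by
     * where their config mechanism lives, not by an assigned number. */
    uint32_t domains[] = { 0x0cf8, 0x0ef8, 0xfce00000 };

    for (unsigned d = 0; d < 3; d++)
        for (unsigned bus = 0; bus < 256; bus++)
            (void)config_read(domains[d], (uint8_t)bus, 0, 0, 0);

    printf("scanned 3 domains x 256 buses each\n");
    return 0;
}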

> > > 
> > > > > 
> > > > > That should be enough for e.g. device_del. We do have the need to
> > > > > describe the topology when we interface with firmware, e.g. to describe
> > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > > with), but that's probably the only case.
> > > > > 
> > > > Describing HW topology is the only way to unambiguously describe device to
> > > > something or someone outside qemu and have persistent device naming
> > > > between different HW configuration.
> > > 
> > > Not really, since ACPI is a binary blob programmed by qemu.
> > > 
> > APCI is part of the guest, not qemu.
> 
> Yes it runs in the guest but it's generated by qemu. On real hardware,
> it's supplied by the motherboard.
> 
It is not generated by qemu. Parts of it depend on the HW and other parts
depend on how the BIOS configures the HW. _BBN, for instance, is clearly
defined to return the address assigned by the BIOS.


> > Just saying "not really" doesn't
> > prove much. I still haven't seen any proposition from you that actually
> > solve the problem. No, "lets use guest naming" is not it. There is no
> > such thing as "The Guest". 
> > 
> > --
> > 			Gleb.
> 
> I am sorry if I didn't make this clear.  I think we should use the domain:bus
> pair to name the root device. As these are unique and 
> 
You forgot to complete the sentence :) But you made it clear enough and
it is incorrect. The domain:bus pair is not only not unique, it does not
exist in qemu at all and as such can't be used to address a device. It is
a product of the HW enumeration done by a guest OS, just like eth0 or C:.

--
			Gleb.
Michael S. Tsirkin Nov. 21, 2010, 2:48 p.m. UTC | #16
On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote:
> On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote:
> > > > The guests.
> > > Which one? There are many guests. Your favorite?
> > > 
> > > > For CLI, we need an easy way to map a device in guest to the
> > > > device in qemu and back.
> > > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what
> > > you are saying is "lets use name that guest assigned to a device". 
> > 
> > No I am saying let's use the name that our ACPI tables assigned.
> > 
> ACPI does not assign any name. In a best case ACPI tables describe resources
> used by a device.

Not only that. The bus number and the segment aren't resources as such.
They describe addressing.

> And not all guests qemu supports has support for ACPI. Qemu
> even support machines types that do not support ACPI.

So? Different machines -> different names.

> > > > 
> > > > 
> > > > > It looks like you identify yourself with most of
> > > > > qemu users, but if most qemu users are like you then qemu has not enough
> > > > > users :) Most users that consider themselves to be "advanced" may know
> > > > > what eth1 or /dev/sdb means. This doesn't mean we should provide
> > > > > "device_del eth1" or "device_add /dev/sdb" command though. 
> > > > > 
> > > > > More important is that "domain" (encoded as number like you used to)
> > > > > and "bus number" has no meaning from inside qemu.
> > > > > So while I said many
> > > > > times I don't care about exact CLI syntax to much it should make sense
> > > > > at least. It can use id to specify PCI bus in CLI like this:
> > > > > device_del pci.0:1.1. Or it can even use device id too like this:
> > > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device
> > > > > path. But doing ah-hoc device enumeration inside qemu and then using it
> > > > > for CLI is not it.
> > > > > 
> > > > > > functionality in the guests.  Qemu is buggy in the moment in that is
> > > > > > uses the bus addresses assigned by guest and not the ones in ACPI,
> > > > > > but that can be fixed.
> > > > > It looks like you confused ACPI _SEG for something it isn't.
> > > > 
> > > > Maybe I did. This is what linux does:
> > > > 
> > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root
> > > > *root)
> > > > {
> > > >         struct acpi_device *device = root->device;
> > > >         int domain = root->segment;
> > > >         int busnum = root->secondary.start;
> > > > 
> > > > And I think this is consistent with the spec.
> > > > 
> > > It means that one domain may include several host bridges.
> > > At that level
> > > domain is defined as something that have unique name for each device
> > > inside it thus no two buses in one segment/domain can have same bus
> > > number. This is what PCI spec tells you. 
> > 
> > And that really is enough for CLI because all we need is locate the
> > specific slot in a unique way.
> > 
> At qemu level we do not have bus numbers. They are assigned by a guest.
> So inside a guest domain:bus:slot.func points you to a device, but in
> qemu does not enumerate buses.
> 
> > > And this further shows that using "domain" as defined by guest is very
> > > bad idea. 
> > 
> > As defined by ACPI, really.
> > 
> ACPI is a part of a guest software that may not event present in the
> guest. How is it relevant?

It's relevant because this is what guests use. To access the root
device with cf8/cfc you need to know the bus number assigned to it
by firmware. How that was assigned is of interest to BIOS/ACPI but not
really interesting to the user or, I suspect, guest OS.

> > > > > ACPI spec
> > > > > says that PCI segment group is purely software concept managed by system
> > > > > firmware. In fact one segment may include multiple PCI host bridges.
> > > > 
> > > > It can't I think:
> > > Read _BBN definition:
> > >  The _BBN object is located under a PCI host bridge and must be unique for
> > >  every host bridge within a segment since it is the PCI bus number.
> > > 
> > > Clearly above speaks about multiple host bridge within a segment.
> > 
> > Yes, it looks like the firmware spec allows that.
> It even have explicit example that shows it.
> 
> > 
> > > > 	Multiple Host Bridges
> > > > 
> > > > 	A platform may have multiple PCI Express or PCI-X host bridges. The base
> > > > 	address for the
> > > > 	MMCONFIG space for these host bridges may need to be allocated at
> > > > 	different locations. In such
> > > > 	cases, using MCFG table and _CBA method as defined in this section means
> > > > 	that each of these host
> > > > 	bridges must be in its own PCI Segment Group.
> > > > 
> > > This is not from ACPI spec,
> > 
> > PCI Firmware Specification 3.0
> > 
> > > but without going to deep into it above
> > > paragraph talks about some particular case when each host bridge must
> > > be in its own PCI Segment Group with is a definite prove that in other
> > > cases multiple host bridges can be in on segment group.
> > 
> > I stand corrected. I think you are right. But note that if they are,
> > they must have distinct bus numbers assigned by ACPI.
> ACPI does not assign any numbers.

For all root PCI devices the firmware must supply a _BBN number. This is the
bus number, isn't it? For nested buses, this is optional.

> Bios enumerates buses and assign
> numbers.

There's no standard way to enumerate PCI root devices in the guest AFAIK.
The spec says:
	Firmware must configure all Host Bridges in the systems, even if
	they are not connected to a console or boot device. Firmware must
	configure Host Bridges in order to allow operating systems to use the
	devices below the Host Bridges. This is because the Host Bridges
	programming model is not defined by the PCI Specifications. 


> ACPI, in a base case, describes what BIOS did to OSPM. Qemu sits
> one layer below all this and does not enumerate PC buses. Even if we make
> it to do so there is not way to guaranty that guest will enumerate them
> in the same order since there is more then one way to do enumeration. I
> repeated this numerous times to you already.

ACPI is really part of the motherboard. Calling it the guest just
confuses things. Guest OS can override bus numbering for nested buses
but not for root buses.

> > 
> > > > 
> > > > > _SEG
> > > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS
> > > > > (Current Resource Settings) for that just like OF. No surprise there.
> > > > 
> > > > OSPM uses both I think.
> > > > 
> > > > All I see linux do with CRS is get the bus number range.
> > > So lets assume that HW has two PCI host bridges and ACPI has:
> > >         Device(PCI0) {
> > >             Name (_HID, EisaId ("PNP0A03"))
> > >             Name (_SEG, 0x00)
> > >         }
> > >         Device(PCI1) {
> > >             Name (_HID, EisaId ("PNP0A03"))
> > >             Name (_SEG, 0x01)
> > >         }
> > > I.e no _CRS to describe resources. How do you think OSPM knows which of
> > > two pci host bridges is PCI0 and which one is PCI1?
> > 
> > You must be able to uniquely address any bridge using the combination of _SEG
> > and _BBN.
> 
> No at all. And saying "you must be able" without actually show how
> doesn't prove anything. _SEG is relevant only for those host bridges
> that support MMCONFIG (not all of them do, and none that qemu support
> does yet). _SEG points to specific entry in MCFG table and MCFG entry
> holds base address for MMCONFIG space for the bridge (this address
> is configured by a guest). This is all _SEG does really, no magic at
> all. _BBN returns bus number assigned by the BIOS to host bridge. Nothing
> qemu visible again.
> So _SEG and _BBN gives you two numbers assigned by
> a guest FW. Nothing qemu can use to identify a device.

This FW is given to the guest by qemu. It only assigns bus numbers
because qemu told it to do so.

> > 
> > > > And the spec says, e.g.:
> > > > 
> > > > 	  the memory mapped configuration base
> > > > 	address (always corresponds to bus number 0) for the PCI Segment Group
> > > > 	of the host bridge is provided by _CBA and the bus range covered by the
> > > > 	base address is indicated by the corresponding bus range specified in
> > > > 	_CRS.
> > > > 
> > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets
> > > solve the problem for PCI first and then move to PCI Express. Jumping from one
> > > to another destruct us from main discussion.
> > 
> > I think this is what confuses us.  As long as you are using cf8/cfc there's no
> > concept of a domain really.
> > Thus:
> > 	/pci@i0cf8
> > 
> > is probably enough for BIOS boot because we'll need to make root bus numbers
> > unique for legacy guests/option ROMs.  But this is not a hardware requirement
> > and might become easier to ignore eith EFI.
> > 
> You do not need MMCONFIG to have multiple PCI domains. You can have one
> configured via standard cf8/cfc and another one on ef8/efc and one more
> at mmio fce00000 and you can address all of them:
> /pci@i0cf8
> /pci@i0ef8
> /pci@fce00000
> 
> And each one of those PCI domains can have 256 subbridges.

Will common guests such as windows or linux be able to use them? This
seems to be outside the scope of the PCI Firmware specification, which
says that bus numbers must be unique.

> > > > 
> > > > > > 
> > > > > > That should be enough for e.g. device_del. We do have the need to
> > > > > > describe the topology when we interface with firmware, e.g. to describe
> > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > > > with), but that's probably the only case.
> > > > > > 
> > > > > Describing HW topology is the only way to unambiguously describe device to
> > > > > something or someone outside qemu and have persistent device naming
> > > > > between different HW configuration.
> > > > 
> > > > Not really, since ACPI is a binary blob programmed by qemu.
> > > > 
> > > APCI is part of the guest, not qemu.
> > 
> > Yes it runs in the guest but it's generated by qemu. On real hardware,
> > it's supplied by the motherboard.
> > 
> It is not generated by qemu. Parts of it depend on HW and other part depend
> on how BIOS configure HW. _BBN for instance is clearly defined to return
> address assigned bu the BIOS.

BIOS is supplied on the motherboard and in our case by qemu as well.
There's no standard way for the BIOS to assign a bus number to the PCI root,
so it does it in a device-specific way. Why should a management tool
or a CLI user care about these? As far as they are concerned
we could use some PV scheme to find root devices and assign bus
numbers, and it would be exactly the same.

> > > Just saying "not really" doesn't
> > > prove much. I still haven't seen any proposition from you that actually
> > > solve the problem. No, "lets use guest naming" is not it. There is no
> > > such thing as "The Guest". 
> > > 
> > > --
> > > 			Gleb.
> > 
> > I am sorry if I didn't make this clear.  I think we should use the domain:bus
> > pair to name the root device. As these are unique and 
> > 
> You forgot to complete the sentence :) But you made it clear enough and
> it is incorrect. domain:bus pair not only not unique they do not exist
> in qemu at all

Sure they do. The domain maps to the MCFG address for Express; the bus is used
for cf8/cfc addressing. They are assigned by the BIOS, but since the BIOS
is supplied with the hardware the point is moot.

> and as such can't be used to address device. They are
> product of HW enumeration done by a guest OS just like eth0 or C:.
> 
> --
> 			Gleb.

There's a huge difference between BIOS and guest OS, and between bus
numbers of pci root and of nested bridges.

Describing hardware io ports makes sense if you are trying to
communicate data from qemu to the BIOS.  But the rest of the world might
not care.
Gleb Natapov Nov. 21, 2010, 4:01 p.m. UTC | #17
On Sun, Nov 21, 2010 at 04:48:44PM +0200, Michael S. Tsirkin wrote:
> On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote:
> > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote:
> > > > > The guests.
> > > > Which one? There are many guests. Your favorite?
> > > > 
> > > > > For CLI, we need an easy way to map a device in guest to the
> > > > > device in qemu and back.
> > > > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what
> > > > you are saying is "lets use name that guest assigned to a device". 
> > > 
> > > No I am saying let's use the name that our ACPI tables assigned.
> > > 
> > ACPI does not assign any name. In a best case ACPI tables describe resources
> > used by a device.
> 
> Not only that. bus number and segment aren't resources as such.
> They describe addressing.
> 
> > And not all guests qemu supports has support for ACPI. Qemu
> > even support machines types that do not support ACPI.
> 
> So? Different machines -> different names.
> 
You want to have a different CLI for different types of machines qemu
supports?

> > > > > 
> > > > > 
> > > > > > It looks like you identify yourself with most of
> > > > > > qemu users, but if most qemu users are like you then qemu has not enough
> > > > > > users :) Most users that consider themselves to be "advanced" may know
> > > > > > what eth1 or /dev/sdb means. This doesn't mean we should provide
> > > > > > "device_del eth1" or "device_add /dev/sdb" command though. 
> > > > > > 
> > > > > > More important is that "domain" (encoded as number like you used to)
> > > > > > and "bus number" has no meaning from inside qemu.
> > > > > > So while I said many
> > > > > > times I don't care about exact CLI syntax to much it should make sense
> > > > > > at least. It can use id to specify PCI bus in CLI like this:
> > > > > > device_del pci.0:1.1. Or it can even use device id too like this:
> > > > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device
> > > > > > path. But doing ah-hoc device enumeration inside qemu and then using it
> > > > > > for CLI is not it.
> > > > > > 
> > > > > > > functionality in the guests.  Qemu is buggy in the moment in that is
> > > > > > > uses the bus addresses assigned by guest and not the ones in ACPI,
> > > > > > > but that can be fixed.
> > > > > > It looks like you confused ACPI _SEG for something it isn't.
> > > > > 
> > > > > Maybe I did. This is what linux does:
> > > > > 
> > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root
> > > > > *root)
> > > > > {
> > > > >         struct acpi_device *device = root->device;
> > > > >         int domain = root->segment;
> > > > >         int busnum = root->secondary.start;
> > > > > 
> > > > > And I think this is consistent with the spec.
> > > > > 
> > > > It means that one domain may include several host bridges.
> > > > At that level
> > > > domain is defined as something that have unique name for each device
> > > > inside it thus no two buses in one segment/domain can have same bus
> > > > number. This is what PCI spec tells you. 
> > > 
> > > And that really is enough for CLI because all we need is locate the
> > > specific slot in a unique way.
> > > 
> > At qemu level we do not have bus numbers. They are assigned by a guest.
> > So inside a guest domain:bus:slot.func points you to a device, but in
> > qemu does not enumerate buses.
> > 
> > > > And this further shows that using "domain" as defined by guest is very
> > > > bad idea. 
> > > 
> > > As defined by ACPI, really.
> > > 
> > ACPI is a part of a guest software that may not event present in the
> > guest. How is it relevant?
> 
> It's relevant because this is what guests use. To access the root
> device with cf8/cfc you need to know the bus number assigned to it
> by firmware. How that was assigned is of interest to BIOS/ACPI but not
> really interesting to the user or, I suspect, guest OS.
> 
Of course this is incorrect. The OS can re-enumerate PCI if it wishes; Linux
has a command-line option just for that. And saying that ACPI is relevant
because this is what guest software uses, in reply to a sentence stating that
not all guests even use ACPI, is, well, strange.

And ACPI describes only the HW present at boot time. What if you hot-plug a
root PCI bridge? How does non-existent PCI naming help you?

> > > > > > ACPI spec
> > > > > > says that PCI segment group is purely software concept managed by system
> > > > > > firmware. In fact one segment may include multiple PCI host bridges.
> > > > > 
> > > > > It can't I think:
> > > > Read _BBN definition:
> > > >  The _BBN object is located under a PCI host bridge and must be unique for
> > > >  every host bridge within a segment since it is the PCI bus number.
> > > > 
> > > > Clearly above speaks about multiple host bridge within a segment.
> > > 
> > > Yes, it looks like the firmware spec allows that.
> > It even have explicit example that shows it.
> > 
> > > 
> > > > > 	Multiple Host Bridges
> > > > > 
> > > > > 	A platform may have multiple PCI Express or PCI-X host bridges. The base
> > > > > 	address for the
> > > > > 	MMCONFIG space for these host bridges may need to be allocated at
> > > > > 	different locations. In such
> > > > > 	cases, using MCFG table and _CBA method as defined in this section means
> > > > > 	that each of these host
> > > > > 	bridges must be in its own PCI Segment Group.
> > > > > 
> > > > This is not from ACPI spec,
> > > 
> > > PCI Firmware Specification 3.0
> > > 
> > > > but without going to deep into it above
> > > > paragraph talks about some particular case when each host bridge must
> > > > be in its own PCI Segment Group with is a definite prove that in other
> > > > cases multiple host bridges can be in on segment group.
> > > 
> > > I stand corrected. I think you are right. But note that if they are,
> > > they must have distinct bus numbers assigned by ACPI.
> > ACPI does not assign any numbers.
> 
> For all root pci devices firmware must supply BBN number. This is the
> bus number, isn't it? For nested buses, this is optional.
Nonsense. _BBN is optional and is not present in the Seabios DSDT. As far
as I can tell it is only needed if a PCI segment group has more than one
PCI host bridge.
 
> 
> > Bios enumerates buses and assign
> > numbers.
> 
> There's no standard way to enumerate pci root devices in guest AFAIK.
> The spec says:
> 	Firmware must configure all Host Bridges in the systems, even if
> 	they are not connected to a console or boot device. Firmware must
> 	configure Host Bridges in order to allow operating systems to use the
> 	devices below the Host Bridges. This is because the Host Bridges
> 	programming model is not defined by the PCI Specifications. 
> 
> 
A guest should be aware of the HW to use it, be it through the BIOS or a driver.

> > ACPI, in a base case, describes what BIOS did to OSPM. Qemu sits
> > one layer below all this and does not enumerate PC buses. Even if we make
> > it to do so there is not way to guaranty that guest will enumerate them
> > in the same order since there is more then one way to do enumeration. I
> > repeated this numerous times to you already.
> 
> ACPI is really part of the motherboard. Calling it the guest just
> confuses things. Guest OS can override bus numbering for nested buses
> but not for root buses.
> 
If calling ACPI part of the guest confuses you then you are already
confused. A guest OS can do whatever it wishes with any enumeration the FW
did, if it knows better.

> > > 
> > > > > 
> > > > > > _SEG
> > > > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS
> > > > > > (Current Resource Settings) for that just like OF. No surprise there.
> > > > > 
> > > > > OSPM uses both I think.
> > > > > 
> > > > > All I see linux do with CRS is get the bus number range.
> > > > So lets assume that HW has two PCI host bridges and ACPI has:
> > > >         Device(PCI0) {
> > > >             Name (_HID, EisaId ("PNP0A03"))
> > > >             Name (_SEG, 0x00)
> > > >         }
> > > >         Device(PCI1) {
> > > >             Name (_HID, EisaId ("PNP0A03"))
> > > >             Name (_SEG, 0x01)
> > > >         }
> > > > I.e no _CRS to describe resources. How do you think OSPM knows which of
> > > > two pci host bridges is PCI0 and which one is PCI1?
> > > 
> > > You must be able to uniquely address any bridge using the combination of _SEG
> > > and _BBN.
> > 
> > No at all. And saying "you must be able" without actually show how
> > doesn't prove anything. _SEG is relevant only for those host bridges
> > that support MMCONFIG (not all of them do, and none that qemu support
> > does yet). _SEG points to specific entry in MCFG table and MCFG entry
> > holds base address for MMCONFIG space for the bridge (this address
> > is configured by a guest). This is all _SEG does really, no magic at
> > all. _BBN returns bus number assigned by the BIOS to host bridge. Nothing
> > qemu visible again.
> > So _SEG and _BBN gives you two numbers assigned by
> > a guest FW. Nothing qemu can use to identify a device.
> 
> This FW is given to guest by qemu. It only assigns bus numbers
> because qemu told it to do so.
Seabios is just a guest qemu ships. There are other firmwares for qemu:
Bochs BIOS, OpenFirmware, EFI. All of them were developed outside of the
qemu project and all of them are usable without qemu. You can't consider
them part of qemu any more than Linux/Windows with virtio drivers.

> 
> > > 
> > > > > And the spec says, e.g.:
> > > > > 
> > > > > 	  the memory mapped configuration base
> > > > > 	address (always corresponds to bus number 0) for the PCI Segment Group
> > > > > 	of the host bridge is provided by _CBA and the bus range covered by the
> > > > > 	base address is indicated by the corresponding bus range specified in
> > > > > 	_CRS.
> > > > > 
> > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets
> > > > solve the problem for PCI first and then move to PCI Express. Jumping from one
> > > > to another destruct us from main discussion.
> > > 
> > > I think this is what confuses us.  As long as you are using cf8/cfc there's no
> > > concept of a domain really.
> > > Thus:
> > > 	/pci@i0cf8
> > > 
> > > is probably enough for BIOS boot because we'll need to make root bus numbers
> > > unique for legacy guests/option ROMs.  But this is not a hardware requirement
> > > and might become easier to ignore eith EFI.
> > > 
> > You do not need MMCONFIG to have multiple PCI domains. You can have one
> > configured via standard cf8/cfc and another one on ef8/efc and one more
> > at mmio fce00000 and you can address all of them:
> > /pci@i0cf8
> > /pci@i0ef8
> > /pci@fce00000
> > 
> > And each one of those PCI domains can have 256 subbridges.
> 
> Will common guests such as windows or linux be able to use them? This
With proper drivers, yes. There is HW with more than one PCI bus and I
think qemu emulates some of it (PPC Mac, for instance). 

> seems to be outside the scope of the PCI Firmware specification, which
> says that bus numbers must be unique.
They must be unique per PCI segment group.

> 
> > > > > 
> > > > > > > 
> > > > > > > That should be enough for e.g. device_del. We do have the need to
> > > > > > > describe the topology when we interface with firmware, e.g. to describe
> > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > > > > with), but that's probably the only case.
> > > > > > > 
> > > > > > Describing HW topology is the only way to unambiguously describe device to
> > > > > > something or someone outside qemu and have persistent device naming
> > > > > > between different HW configuration.
> > > > > 
> > > > > Not really, since ACPI is a binary blob programmed by qemu.
> > > > > 
> > > > APCI is part of the guest, not qemu.
> > > 
> > > Yes it runs in the guest but it's generated by qemu. On real hardware,
> > > it's supplied by the motherboard.
> > > 
> > It is not generated by qemu. Parts of it depend on HW and other part depend
> > on how BIOS configure HW. _BBN for instance is clearly defined to return
> > address assigned bu the BIOS.
> 
> BIOS is supplied on the motherboard and in our case by qemu as well.
You can replace the MB BIOS with coreboot+seabios on some of them.
Manufacturers don't want you to do it and make it hard to do, but
otherwise this is just software, not some magic dust.

> There's no standard way for BIOS to assign bus number to the pci root,
> so it does it in device-specific way. Why should a management tool
> or a CLI user care about these? As far as they are concerned
> we could use some PV scheme to find root devices and assign bus
> numbers, and it would be exactly the same.
> 
Go write a KVM userspace that does that. AFAIK there is a project out
there that tries to do that. No luck so far. Your world view is very
x86/Linux centric. You need to broaden it a little bit. Next time you
propose something, ask yourself whether it will work with qemu-sparc,
qemu-ppc, qemu-amd.


> > > > Just saying "not really" doesn't
> > > > prove much. I still haven't seen any proposition from you that actually
> > > > solve the problem. No, "lets use guest naming" is not it. There is no
> > > > such thing as "The Guest". 
> > > > 
> > > > --
> > > > 			Gleb.
> > > 
> > > I am sorry if I didn't make this clear.  I think we should use the domain:bus
> > > pair to name the root device. As these are unique and 
> > > 
> > You forgot to complete the sentence :) But you made it clear enough and
> > it is incorrect. domain:bus pair not only not unique they do not exist
> > in qemu at all
> 
> Sure they do. domain maps to mcfg address for express. bus is used for
MCFG is optional as far as I can see. You can compile out MMCONFIG
support on Linux.

> cf8/cfc addressing. They are assigned by BIOS but since BIOS
> is supplied with hardware the point is moot.
Most PC hardware is supplied with Windows, so what? The BIOS is code that
runs in the guest. It is part of the guest. Every line of code executed by
the vcpu belongs to the guest. No need to redefine things to prove your point.

> 
> > and as such can't be used to address device. They are
> > product of HW enumeration done by a guest OS just like eth0 or C:.
> > 
> > --
> > 			Gleb.
> 
> There's a huge difference between BIOS and guest OS,
Not true.

>                                            and between bus
> numbers of pci root and of nested bridges.
Really? What is it?

> 
> Describing hardware io ports makes sense if you are trying to
> communicate data from qemu to the BIOS.  But the rest of the world might
> not care.
> 
The part of the world that manages HW cares. You may need to add a device
from the monitor before the first line of the BIOS is even executed. How
can you rely on BIOS enumeration of devices in this case?


--
			Gleb.
Michael S. Tsirkin Nov. 21, 2010, 4:38 p.m. UTC | #18
On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote:
> On Sun, Nov 21, 2010 at 04:48:44PM +0200, Michael S. Tsirkin wrote:
> > On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote:
> > > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote:
> > > > > > The guests.
> > > > > Which one? There are many guests. Your favorite?
> > > > > 
> > > > > > For CLI, we need an easy way to map a device in guest to the
> > > > > > device in qemu and back.
> > > > > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what
> > > > > you are saying is "lets use name that guest assigned to a device". 
> > > > 
> > > > No I am saying let's use the name that our ACPI tables assigned.
> > > > 
> > > ACPI does not assign any name. In a best case ACPI tables describe resources
> > > used by a device.
> > 
> > Not only that. bus number and segment aren't resources as such.
> > They describe addressing.
> > 
> > > And not all guests qemu supports has support for ACPI. Qemu
> > > even support machines types that do not support ACPI.
> > 
> > So? Different machines -> different names.
> > 
> You want to have different cli for different type of machines qemu
> supports?

Different device names.

> > > > > > 
> > > > > > 
> > > > > > > It looks like you identify yourself with most of
> > > > > > > qemu users, but if most qemu users are like you then qemu has not enough
> > > > > > > users :) Most users that consider themselves to be "advanced" may know
> > > > > > > what eth1 or /dev/sdb means. This doesn't mean we should provide
> > > > > > > "device_del eth1" or "device_add /dev/sdb" command though. 
> > > > > > > 
> > > > > > > More important is that "domain" (encoded as number like you used to)
> > > > > > > and "bus number" has no meaning from inside qemu.
> > > > > > > So while I said many
> > > > > > > times I don't care about exact CLI syntax to much it should make sense
> > > > > > > at least. It can use id to specify PCI bus in CLI like this:
> > > > > > > device_del pci.0:1.1. Or it can even use device id too like this:
> > > > > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device
> > > > > > > path. But doing ah-hoc device enumeration inside qemu and then using it
> > > > > > > for CLI is not it.
> > > > > > > 
> > > > > > > > functionality in the guests.  Qemu is buggy in the moment in that is
> > > > > > > > uses the bus addresses assigned by guest and not the ones in ACPI,
> > > > > > > > but that can be fixed.
> > > > > > > It looks like you confused ACPI _SEG for something it isn't.
> > > > > > 
> > > > > > Maybe I did. This is what linux does:
> > > > > > 
> > > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root
> > > > > > *root)
> > > > > > {
> > > > > >         struct acpi_device *device = root->device;
> > > > > >         int domain = root->segment;
> > > > > >         int busnum = root->secondary.start;
> > > > > > 
> > > > > > And I think this is consistent with the spec.
> > > > > > 
> > > > > It means that one domain may include several host bridges.
> > > > > At that level
> > > > > domain is defined as something that have unique name for each device
> > > > > inside it thus no two buses in one segment/domain can have same bus
> > > > > number. This is what PCI spec tells you. 
> > > > 
> > > > And that really is enough for CLI because all we need is locate the
> > > > specific slot in a unique way.
> > > > 
> > > At qemu level we do not have bus numbers. They are assigned by a guest.
> > > So inside a guest domain:bus:slot.func points you to a device, but in
> > > qemu does not enumerate buses.
> > > 
> > > > > And this further shows that using "domain" as defined by guest is very
> > > > > bad idea. 
> > > > 
> > > > As defined by ACPI, really.
> > > > 
> > > ACPI is a part of a guest software that may not event present in the
> > > guest. How is it relevant?
> > 
> > It's relevant because this is what guests use. To access the root
> > device with cf8/cfc you need to know the bus number assigned to it
> > by firmware. How that was assigned is of interest to BIOS/ACPI but not
> > really interesting to the user or, I suspect, guest OS.
> > 
> Of course this is incorrect. OS can re-enumerate PCI if it wishes. Linux
> have cmd just for that.

I haven't looked, but I suspect Linux will simply assume cf8/cfc and
start doing it from there. If that doesn't get you the root
device you wanted, tough.

> And saying that ACPI is relevant because this is
> what guest software use in a reply to sentence that states that not all
> guest even use ACPI is, well, strange.
> 
> And ACPI describes only HW that present at boot time. What if you
> hot-plugged root pci bridge? How non existent PCI naming helps you?

that's described by ACPI as well.

> > > > > > > ACPI spec
> > > > > > > says that PCI segment group is purely software concept managed by system
> > > > > > > firmware. In fact one segment may include multiple PCI host bridges.
> > > > > > 
> > > > > > It can't I think:
> > > > > Read _BBN definition:
> > > > >  The _BBN object is located under a PCI host bridge and must be unique for
> > > > >  every host bridge within a segment since it is the PCI bus number.
> > > > > 
> > > > > Clearly above speaks about multiple host bridge within a segment.
> > > > 
> > > > Yes, it looks like the firmware spec allows that.
> > > It even have explicit example that shows it.
> > > 
> > > > 
> > > > > > 	Multiple Host Bridges
> > > > > > 
> > > > > > 	A platform may have multiple PCI Express or PCI-X host bridges. The base
> > > > > > 	address for the
> > > > > > 	MMCONFIG space for these host bridges may need to be allocated at
> > > > > > 	different locations. In such
> > > > > > 	cases, using MCFG table and _CBA method as defined in this section means
> > > > > > 	that each of these host
> > > > > > 	bridges must be in its own PCI Segment Group.
> > > > > > 
> > > > > This is not from ACPI spec,
> > > > 
> > > > PCI Firmware Specification 3.0
> > > > 
> > > > > but without going to deep into it above
> > > > > paragraph talks about some particular case when each host bridge must
> > > > > be in its own PCI Segment Group with is a definite prove that in other
> > > > > cases multiple host bridges can be in on segment group.
> > > > 
> > > > I stand corrected. I think you are right. But note that if they are,
> > > > they must have distinct bus numbers assigned by ACPI.
> > > ACPI does not assign any numbers.
> > 
> > For all root pci devices firmware must supply BBN number. This is the
> > bus number, isn't it? For nested buses, this is optional.
> Nonsense. _BBN is optional and does not present in Seabios DSDT.

The spec says it's not optional for host bridges:

	Firmware must report Host Bridges in the ACPI name space. Each Host
	Bridge object must contain the following objects:
	   _HID and _CID
	   _CRS to determine all resources consumed and produced (passed through
	to the secondary bus)
	   by the host bridge. Firmware allocates resources (Memory Addresses,
	I/O Port, etc.) to Host
	   Bridges. The _CRS descriptor informs the operating system of the
	resources it may use for
	   configuring devices below the Host Bridge.
	   ● _TRA, _TTP, and _TRS translation offsets to inform the operating
	system of the mapping
		between the primary bus and the secondary bus.
	   _PRT and the interrupt descriptor to determine interrupt routing.

  	 _BBN to obtain a bus number.

So Seabios seems to be out of spec.

> As far
> as I can tell it is only needed if PCI segment group has more then one
> pci host bridges.

No. Because cfc/cf8 are not aware of _SEG.
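
For illustration only: with MMCONFIG the segment matters because each segment
group gets its own ECAM window from the MCFG table, while cf8/cfc has no field
where a segment could even be encoded. A minimal sketch, with a made-up
segment-to-base table standing in for MCFG:

    #include <stdint.h>

    /* Made-up segment -> ECAM base mapping; real firmware publishes this
     * via the ACPI MCFG table. */
    static uint64_t mcfg_base_for_segment(uint16_t seg)
    {
        static const uint64_t bases[] = { 0xe0000000ull, 0xf0000000ull };
        return seg < 2 ? bases[seg] : 0;
    }

    /* Each function gets a 4K window: base + bus*1M + dev*32K + fn*4K. */
    uint64_t ecam_config_addr(uint16_t seg, uint8_t bus, uint8_t dev,
                              uint8_t fn, uint16_t offset)
    {
        return mcfg_base_for_segment(seg)
               + ((uint64_t)bus << 20)
               + ((uint64_t)dev << 15)
               + ((uint64_t)fn  << 12)
               + (offset & 0xfff);
    }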

> > 
> > > Bios enumerates buses and assign
> > > numbers.
> > 
> > There's no standard way to enumerate pci root devices in guest AFAIK.
> > The spec says:
> > 	Firmware must configure all Host Bridges in the systems, even if
> > 	they are not connected to a console or boot device. Firmware must
> > 	configure Host Bridges in order to allow operating systems to use the
> > 	devices below the Host Bridges. This is because the Host Bridges
> > 	programming model is not defined by the PCI Specifications. 
> > 
> > 
> Guest should be aware of HW to use it. Be it through bios or driver.

Why should it? You take the bus number, stick it in cf8/cfc, and
you get a config cycle. No magic HW awareness needed.
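
A minimal sketch of that point (illustration only, not qemu or firmware code;
outl()/inl() are the usual x86 port helpers from <sys/io.h> and need iopl()
privileges to actually run). Note there is no field anywhere in this address
where a segment could go:

    #include <stdint.h>
    #include <sys/io.h>   /* outl(), inl() on Linux/x86 */

    /* Type-1 config read: the bus number is just bits 23:16 of whatever
     * gets written to CONFIG_ADDRESS at 0xcf8. */
    uint32_t pci_conf1_read32(uint8_t bus, uint8_t dev, uint8_t fn,
                              uint8_t offset)
    {
        uint32_t addr = (1u << 31)                 /* enable bit */
                      | ((uint32_t)bus << 16)
                      | ((uint32_t)dev << 11)
                      | ((uint32_t)fn  << 8)
                      | (offset & 0xfc);
        outl(addr, 0xcf8);                         /* CONFIG_ADDRESS */
        return inl(0xcfc);                         /* CONFIG_DATA */
    }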

> > > ACPI, in a base case, describes what BIOS did to OSPM. Qemu sits
> > > one layer below all this and does not enumerate PC buses. Even if we make
> > > it to do so there is not way to guaranty that guest will enumerate them
> > > in the same order since there is more then one way to do enumeration. I
> > > repeated this numerous times to you already.
> > 
> > ACPI is really part of the motherboard. Calling it the guest just
> > confuses things. Guest OS can override bus numbering for nested buses
> > but not for root buses.
> > 
> If calling ACPI part of a guest confuses you then you are already
> confused. Guest OS can do whatever it wishes with any enumeration FW did
> if it knows better.
> 
> > > > 
> > > > > > 
> > > > > > > _SEG
> > > > > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS
> > > > > > > (Current Resource Settings) for that just like OF. No surprise there.
> > > > > > 
> > > > > > OSPM uses both I think.
> > > > > > 
> > > > > > All I see linux do with CRS is get the bus number range.
> > > > > So lets assume that HW has two PCI host bridges and ACPI has:
> > > > >         Device(PCI0) {
> > > > >             Name (_HID, EisaId ("PNP0A03"))
> > > > >             Name (_SEG, 0x00)
> > > > >         }
> > > > >         Device(PCI1) {
> > > > >             Name (_HID, EisaId ("PNP0A03"))
> > > > >             Name (_SEG, 0x01)
> > > > >         }
> > > > > I.e no _CRS to describe resources. How do you think OSPM knows which of
> > > > > two pci host bridges is PCI0 and which one is PCI1?
> > > > 
> > > > You must be able to uniquely address any bridge using the combination of _SEG
> > > > and _BBN.
> > > 
> > > No at all. And saying "you must be able" without actually show how
> > > doesn't prove anything. _SEG is relevant only for those host bridges
> > > that support MMCONFIG (not all of them do, and none that qemu support
> > > does yet). _SEG points to specific entry in MCFG table and MCFG entry
> > > holds base address for MMCONFIG space for the bridge (this address
> > > is configured by a guest). This is all _SEG does really, no magic at
> > > all. _BBN returns bus number assigned by the BIOS to host bridge. Nothing
> > > qemu visible again.
> > > So _SEG and _BBN gives you two numbers assigned by
> > > a guest FW. Nothing qemu can use to identify a device.
> > 
> > This FW is given to guest by qemu. It only assigns bus numbers
> > because qemu told it to do so.
> Seabios is just a guest qemu ships. There are other FW for qemu. Bochs
> bios, openfirmware, efi. All of them where developed outside of qemu
> project and all of them are usable without qemu. You can't consider them
> be part of qemu any more then Linux/Windows with virtio drivers.
> 
> > 
> > > > 
> > > > > > And the spec says, e.g.:
> > > > > > 
> > > > > > 	  the memory mapped configuration base
> > > > > > 	address (always corresponds to bus number 0) for the PCI Segment Group
> > > > > > 	of the host bridge is provided by _CBA and the bus range covered by the
> > > > > > 	base address is indicated by the corresponding bus range specified in
> > > > > > 	_CRS.
> > > > > > 
> > > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets
> > > > > solve the problem for PCI first and then move to PCI Express. Jumping from one
> > > > > to another destruct us from main discussion.
> > > > 
> > > > I think this is what confuses us.  As long as you are using cf8/cfc there's no
> > > > concept of a domain really.
> > > > Thus:
> > > > 	/pci@i0cf8
> > > > 
> > > > is probably enough for BIOS boot because we'll need to make root bus numbers
> > > > unique for legacy guests/option ROMs.  But this is not a hardware requirement
> > > > and might become easier to ignore eith EFI.
> > > > 
> > > You do not need MMCONFIG to have multiple PCI domains. You can have one
> > > configured via standard cf8/cfc and another one on ef8/efc and one more
> > > at mmio fce00000 and you can address all of them:
> > > /pci@i0cf8
> > > /pci@i0ef8
> > > /pci@fce00000
> > > 
> > > And each one of those PCI domains can have 256 subbridges.
> > 
> > Will common guests such as windows or linux be able to use them? This
> With proper drivers yes. There is HW with more then one PCI bus and I
> think qemu emulates some of it (PPC MAC for instance). 
> 
> > seems to be outside the scope of the PCI Firmware specification, which
> > says that bus numbers must be unique.
> They must be unique per PCI segment group.
> 
> > 
> > > > > > 
> > > > > > > > 
> > > > > > > > That should be enough for e.g. device_del. We do have the need to
> > > > > > > > describe the topology when we interface with firmware, e.g. to describe
> > > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > > > > > with), but that's probably the only case.
> > > > > > > > 
> > > > > > > Describing HW topology is the only way to unambiguously describe device to
> > > > > > > something or someone outside qemu and have persistent device naming
> > > > > > > between different HW configuration.
> > > > > > 
> > > > > > Not really, since ACPI is a binary blob programmed by qemu.
> > > > > > 
> > > > > APCI is part of the guest, not qemu.
> > > > 
> > > > Yes it runs in the guest but it's generated by qemu. On real hardware,
> > > > it's supplied by the motherboard.
> > > > 
> > > It is not generated by qemu. Parts of it depend on HW and other part depend
> > > on how BIOS configure HW. _BBN for instance is clearly defined to return
> > > address assigned bu the BIOS.
> > 
> > BIOS is supplied on the motherboard and in our case by qemu as well.
> You can replace MB bios by coreboot+seabios on some of them.
> Manufacturer don't want you to do it and make it hard to do, but
> otherwise this is just software, not some magic dust.
> 
> > There's no standard way for BIOS to assign bus number to the pci root,
> > so it does it in device-specific way. Why should a management tool
> > or a CLI user care about these? As far as they are concerned
> > we could use some PV scheme to find root devices and assign bus
> > numbers, and it would be exactly the same.
> > 
> Go write KVM userspace that does that. AFAIK there is project out there
> that tries to do that. No luck so far. Your world view is very x86/Linux
> centric. You need to broaden it a little bit. Next time you propose
> something ask yourself will it work with qemu-sparc, qemu-ppc, qemu-amd.
> 
> 
> > > > > Just saying "not really" doesn't
> > > > > prove much. I still haven't seen any proposition from you that actually
> > > > > solve the problem. No, "lets use guest naming" is not it. There is no
> > > > > such thing as "The Guest". 
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > > 
> > > > I am sorry if I didn't make this clear.  I think we should use the domain:bus
> > > > pair to name the root device. As these are unique and 
> > > > 
> > > You forgot to complete the sentence :) But you made it clear enough and
> > > it is incorrect. domain:bus pair not only not unique they do not exist
> > > in qemu at all
> > 
> > Sure they do. domain maps to mcfg address for express. bus is used for
> mcfg is optional as far as I can see. You can compile out MMCONFIG
> support on Linux.
> 
> > cf8/cfc addressing. They are assigned by BIOS but since BIOS
> > is supplied with hardware the point is moot.
> Most PC hardware is supplied with Windows, so what? BIOS is a code that
> runs in a guest. It is part of a guest. Every line of code executed by
> vcpu belongs to a guest. No need to redefine things to prove you point.
> 
> > 
> > > and as such can't be used to address device. They are
> > > product of HW enumeration done by a guest OS just like eth0 or C:.
> > > 
> > > --
> > > 			Gleb.
> > 
> > There's a huge difference between BIOS and guest OS,
> Not true.
> 
> >                                            and between bus
> > numbers of pci root and of nested bridges.
> Really? What is it?
> 
> > 
> > Describing hardware io ports makes sense if you are trying to
> > communicate data from qemu to the BIOS.  But the rest of the world might
> > not care.
> > 
> The part of the world that manage HW cares. You may need to add device
> from monitor before first line of BIOS is event executed. How can you
> rely on BIOS enumerate of devices in this case?
> 
> 
> --
> 			Gleb.
Gleb Natapov Nov. 21, 2010, 5:28 p.m. UTC | #19
On Sun, Nov 21, 2010 at 06:38:31PM +0200, Michael S. Tsirkin wrote:
> On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote:
> > On Sun, Nov 21, 2010 at 04:48:44PM +0200, Michael S. Tsirkin wrote:
> > > On Sun, Nov 21, 2010 at 02:50:14PM +0200, Gleb Natapov wrote:
> > > > On Sun, Nov 21, 2010 at 01:53:26PM +0200, Michael S. Tsirkin wrote:
> > > > > > > The guests.
> > > > > > Which one? There are many guests. Your favorite?
> > > > > > 
> > > > > > > For CLI, we need an easy way to map a device in guest to the
> > > > > > > device in qemu and back.
> > > > > > Then use eth0, /dev/sdb, or even C:. Your way is not less broken since what
> > > > > > you are saying is "lets use name that guest assigned to a device". 
> > > > > 
> > > > > No I am saying let's use the name that our ACPI tables assigned.
> > > > > 
> > > > ACPI does not assign any name. In a best case ACPI tables describe resources
> > > > used by a device.
> > > 
> > > Not only that. bus number and segment aren't resources as such.
> > > They describe addressing.
> > > 
> > > > And not all guests qemu supports has support for ACPI. Qemu
> > > > even support machines types that do not support ACPI.
> > > 
> > > So? Different machines -> different names.
> > > 
> > You want to have different cli for different type of machines qemu
> > supports?
> 
> Different device names.
> 
You mean that on qemu-sparc you will use a different device naming scheme
for deleting a PCI device?!

> > > > > > > 
> > > > > > > 
> > > > > > > > It looks like you identify yourself with most of
> > > > > > > > qemu users, but if most qemu users are like you then qemu has not enough
> > > > > > > > users :) Most users that consider themselves to be "advanced" may know
> > > > > > > > what eth1 or /dev/sdb means. This doesn't mean we should provide
> > > > > > > > "device_del eth1" or "device_add /dev/sdb" command though. 
> > > > > > > > 
> > > > > > > > More important is that "domain" (encoded as number like you used to)
> > > > > > > > and "bus number" has no meaning from inside qemu.
> > > > > > > > So while I said many
> > > > > > > > times I don't care about exact CLI syntax to much it should make sense
> > > > > > > > at least. It can use id to specify PCI bus in CLI like this:
> > > > > > > > device_del pci.0:1.1. Or it can even use device id too like this:
> > > > > > > > device_del pci.0:ide.0. Or it can use HW topology like in FO device
> > > > > > > > path. But doing ah-hoc device enumeration inside qemu and then using it
> > > > > > > > for CLI is not it.
> > > > > > > > 
> > > > > > > > > functionality in the guests.  Qemu is buggy in the moment in that is
> > > > > > > > > uses the bus addresses assigned by guest and not the ones in ACPI,
> > > > > > > > > but that can be fixed.
> > > > > > > > It looks like you confused ACPI _SEG for something it isn't.
> > > > > > > 
> > > > > > > Maybe I did. This is what linux does:
> > > > > > > 
> > > > > > > struct pci_bus * __devinit pci_acpi_scan_root(struct acpi_pci_root
> > > > > > > *root)
> > > > > > > {
> > > > > > >         struct acpi_device *device = root->device;
> > > > > > >         int domain = root->segment;
> > > > > > >         int busnum = root->secondary.start;
> > > > > > > 
> > > > > > > And I think this is consistent with the spec.
> > > > > > > 
> > > > > > It means that one domain may include several host bridges.
> > > > > > At that level
> > > > > > domain is defined as something that have unique name for each device
> > > > > > inside it thus no two buses in one segment/domain can have same bus
> > > > > > number. This is what PCI spec tells you. 
> > > > > 
> > > > > And that really is enough for CLI because all we need is locate the
> > > > > specific slot in a unique way.
> > > > > 
> > > > At qemu level we do not have bus numbers. They are assigned by a guest.
> > > > So inside a guest domain:bus:slot.func points you to a device, but in
> > > > qemu does not enumerate buses.
> > > > 
> > > > > > And this further shows that using "domain" as defined by guest is very
> > > > > > bad idea. 
> > > > > 
> > > > > As defined by ACPI, really.
> > > > > 
> > > > ACPI is a part of a guest software that may not event present in the
> > > > guest. How is it relevant?
> > > 
> > > It's relevant because this is what guests use. To access the root
> > > device with cf8/cfc you need to know the bus number assigned to it
> > > by firmware. How that was assigned is of interest to BIOS/ACPI but not
> > > really interesting to the user or, I suspect, guest OS.
> > > 
> > Of course this is incorrect. OS can re-enumerate PCI if it wishes. Linux
> > have cmd just for that.
> 
> I haven't looked but I suspect linux will simply assume cf8/cfc and
> and start doing it from there. If that doesn't get you the root
> device you wanted, tough.
> 
Linux runs on many platforms, and on most of them there is no such thing
as IO space. Also, I believe Linux on x86 supported multi-root without
mmconfig in the past. It may be enough to describe in ACPI the IO resource
the second PCI host bridge uses for Linux to be able to use it.

> > And saying that ACPI is relevant because this is
> > what guest software use in a reply to sentence that states that not all
> > guest even use ACPI is, well, strange.
> > 
> > And ACPI describes only HW that present at boot time. What if you
> > hot-plugged root pci bridge? How non existent PCI naming helps you?
> 
> that's described by ACPI as well.
> 
> > > > > > > > ACPI spec
> > > > > > > > says that PCI segment group is purely software concept managed by system
> > > > > > > > firmware. In fact one segment may include multiple PCI host bridges.
> > > > > > > 
> > > > > > > It can't I think:
> > > > > > Read _BBN definition:
> > > > > >  The _BBN object is located under a PCI host bridge and must be unique for
> > > > > >  every host bridge within a segment since it is the PCI bus number.
> > > > > > 
> > > > > > Clearly above speaks about multiple host bridge within a segment.
> > > > > 
> > > > > Yes, it looks like the firmware spec allows that.
> > > > It even have explicit example that shows it.
> > > > 
> > > > > 
> > > > > > > 	Multiple Host Bridges
> > > > > > > 
> > > > > > > 	A platform may have multiple PCI Express or PCI-X host bridges. The base
> > > > > > > 	address for the
> > > > > > > 	MMCONFIG space for these host bridges may need to be allocated at
> > > > > > > 	different locations. In such
> > > > > > > 	cases, using MCFG table and _CBA method as defined in this section means
> > > > > > > 	that each of these host
> > > > > > > 	bridges must be in its own PCI Segment Group.
> > > > > > > 
> > > > > > This is not from ACPI spec,
> > > > > 
> > > > > PCI Firmware Specification 3.0
> > > > > 
> > > > > > but without going to deep into it above
> > > > > > paragraph talks about some particular case when each host bridge must
> > > > > > be in its own PCI Segment Group with is a definite prove that in other
> > > > > > cases multiple host bridges can be in on segment group.
> > > > > 
> > > > > I stand corrected. I think you are right. But note that if they are,
> > > > > they must have distinct bus numbers assigned by ACPI.
> > > > ACPI does not assign any numbers.
> > > 
> > > For all root pci devices firmware must supply BBN number. This is the
> > > bus number, isn't it? For nested buses, this is optional.
> > Nonsense. _BBN is optional and does not present in Seabios DSDT.
> 
> The spec says it's not optional for host bridges:
> 
> 	Firmware must report Host Bridges in the ACPI name space. Each Host
> 	Bridge object must contain the following objects:
> 	   _HID and _CID
> 	   _CRS to determine all resources consumed and produced (passed through
> 	to the secondary bus)
> 	   by the host bridge. Firmware allocates resources (Memory Addresses,
> 	I/O Port, etc.) to Host
> 	   Bridges. The _CRS descriptor informs the operating system of the
> 	resources it may use for
> 	   configuring devices below the Host Bridge.
> 	   ● _TRA, _TTP, and _TRS translation offsets to inform the operating
> 	system of the mapping
> 		between the primary bus and the secondary bus.
> 	   _PRT and the interrupt descriptor to determine interrupt routing.
> 
>   	 _BBN to obtain a bus number.
> 
> so seabios seems to be out of spec.
> 

Looks like it is. Well, the implementation will be easy: just return 0. Also,
the same spec says that firmware doesn't have to fully enumerate all PCI buses:
"Firmware is not required to configure devices other than boot and
console devices."

 
> > As far
> > as I can tell it is only needed if PCI segment group has more then one
> > pci host bridges.
> 
> No. Because cfc/cf8 are not aware of _SEG.
More importantly, qemu is not aware of _SEG. As the spec says: "it is purely a
software concept managed by system firmware".

> 
> > > 
> > > > Bios enumerates buses and assign
> > > > numbers.
> > > 
> > > There's no standard way to enumerate pci root devices in guest AFAIK.
> > > The spec says:
> > > 	Firmware must configure all Host Bridges in the systems, even if
> > > 	they are not connected to a console or boot device. Firmware must
> > > 	configure Host Bridges in order to allow operating systems to use the
> > > 	devices below the Host Bridges. This is because the Host Bridges
> > > 	programming model is not defined by the PCI Specifications. 
> > > 
> > > 
> > Guest should be aware of HW to use it. Be it through bios or driver.
> 
> Why should it? You get bus number and stiuck it in cf8/cfc,
> you get a config cycle. No magic HW awareness needed.
> 
SW has to be aware of cf8/cfc; where else would it stick the bus number?
Another PCI host bridge may sit at another IO or MMIO address, and SW
has to be aware of that too so it can stick a bus number there as well.
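
To make that concrete, here is a sketch along the lines of the /pci@i0cf8 and
/pci@i0ef8 example earlier in the thread (only the 0xcf8/0xcfc pair is the
standard PC one; the 0xef8/0xefc pair is hypothetical):

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/io.h>

    /* Each host bridge brings its own CONFIG_ADDRESS/CONFIG_DATA pair, so a
     * bus number only means something relative to the pair you poke. */
    struct host_bridge {
        uint16_t addr_port;
        uint16_t data_port;
    };

    static const struct host_bridge roots[] = {
        { 0x0cf8, 0x0cfc },   /* standard PC pair */
        { 0x0ef8, 0x0efc },   /* hypothetical second root */
    };

    uint16_t probe_vendor(const struct host_bridge *hb, uint8_t bus,
                          uint8_t dev, uint8_t fn)
    {
        outl((1u << 31) | ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
             ((uint32_t)fn << 8), hb->addr_port);
        return inl(hb->data_port) & 0xffff;   /* 0xffff means nothing there */
    }

    /* Walk each root's bus 0 and report which slots answer. */
    void scan_roots(void)
    {
        for (unsigned r = 0; r < sizeof(roots) / sizeof(roots[0]); r++)
            for (uint8_t dev = 0; dev < 32; dev++)
                if (probe_vendor(&roots[r], 0, dev, 0) != 0xffff)
                    printf("root %u: device in slot %u\n", r, dev);
    }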

You ignored all the other points below. Why?

> > > > ACPI, in a base case, describes what BIOS did to OSPM. Qemu sits
> > > > one layer below all this and does not enumerate PC buses. Even if we make
> > > > it to do so there is not way to guaranty that guest will enumerate them
> > > > in the same order since there is more then one way to do enumeration. I
> > > > repeated this numerous times to you already.
> > > 
> > > ACPI is really part of the motherboard. Calling it the guest just
> > > confuses things. Guest OS can override bus numbering for nested buses
> > > but not for root buses.
> > > 
> > If calling ACPI part of a guest confuses you then you are already
> > confused. Guest OS can do whatever it wishes with any enumeration FW did
> > if it knows better.
> > 
> > > > > 
> > > > > > > 
> > > > > > > > _SEG
> > > > > > > > is not what OSPM uses to tie HW resource to ACPI resource. It used _CRS
> > > > > > > > (Current Resource Settings) for that just like OF. No surprise there.
> > > > > > > 
> > > > > > > OSPM uses both I think.
> > > > > > > 
> > > > > > > All I see linux do with CRS is get the bus number range.
> > > > > > So lets assume that HW has two PCI host bridges and ACPI has:
> > > > > >         Device(PCI0) {
> > > > > >             Name (_HID, EisaId ("PNP0A03"))
> > > > > >             Name (_SEG, 0x00)
> > > > > >         }
> > > > > >         Device(PCI1) {
> > > > > >             Name (_HID, EisaId ("PNP0A03"))
> > > > > >             Name (_SEG, 0x01)
> > > > > >         }
> > > > > > I.e no _CRS to describe resources. How do you think OSPM knows which of
> > > > > > two pci host bridges is PCI0 and which one is PCI1?
> > > > > 
> > > > > You must be able to uniquely address any bridge using the combination of _SEG
> > > > > and _BBN.
> > > > 
> > > > No at all. And saying "you must be able" without actually show how
> > > > doesn't prove anything. _SEG is relevant only for those host bridges
> > > > that support MMCONFIG (not all of them do, and none that qemu support
> > > > does yet). _SEG points to specific entry in MCFG table and MCFG entry
> > > > holds base address for MMCONFIG space for the bridge (this address
> > > > is configured by a guest). This is all _SEG does really, no magic at
> > > > all. _BBN returns bus number assigned by the BIOS to host bridge. Nothing
> > > > qemu visible again.
> > > > So _SEG and _BBN gives you two numbers assigned by
> > > > a guest FW. Nothing qemu can use to identify a device.
> > > 
> > > This FW is given to guest by qemu. It only assigns bus numbers
> > > because qemu told it to do so.
> > Seabios is just a guest qemu ships. There are other FW for qemu. Bochs
> > bios, openfirmware, efi. All of them where developed outside of qemu
> > project and all of them are usable without qemu. You can't consider them
> > be part of qemu any more then Linux/Windows with virtio drivers.
> > 
> > > 
> > > > > 
> > > > > > > And the spec says, e.g.:
> > > > > > > 
> > > > > > > 	  the memory mapped configuration base
> > > > > > > 	address (always corresponds to bus number 0) for the PCI Segment Group
> > > > > > > 	of the host bridge is provided by _CBA and the bus range covered by the
> > > > > > > 	base address is indicated by the corresponding bus range specified in
> > > > > > > 	_CRS.
> > > > > > > 
> > > > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets
> > > > > > solve the problem for PCI first and then move to PCI Express. Jumping from one
> > > > > > to another destruct us from main discussion.
> > > > > 
> > > > > I think this is what confuses us.  As long as you are using cf8/cfc there's no
> > > > > concept of a domain really.
> > > > > Thus:
> > > > > 	/pci@i0cf8
> > > > > 
> > > > > is probably enough for BIOS boot because we'll need to make root bus numbers
> > > > > unique for legacy guests/option ROMs.  But this is not a hardware requirement
> > > > > and might become easier to ignore eith EFI.
> > > > > 
> > > > You do not need MMCONFIG to have multiple PCI domains. You can have one
> > > > configured via standard cf8/cfc and another one on ef8/efc and one more
> > > > at mmio fce00000 and you can address all of them:
> > > > /pci@i0cf8
> > > > /pci@i0ef8
> > > > /pci@fce00000
> > > > 
> > > > And each one of those PCI domains can have 256 subbridges.
> > > 
> > > Will common guests such as windows or linux be able to use them? This
> > With proper drivers yes. There is HW with more then one PCI bus and I
> > think qemu emulates some of it (PPC MAC for instance). 
> > 
> > > seems to be outside the scope of the PCI Firmware specification, which
> > > says that bus numbers must be unique.
> > They must be unique per PCI segment group.
> > 
> > > 
> > > > > > > 
> > > > > > > > > 
> > > > > > > > > That should be enough for e.g. device_del. We do have the need to
> > > > > > > > > describe the topology when we interface with firmware, e.g. to describe
> > > > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > > > > > > with), but that's probably the only case.
> > > > > > > > > 
> > > > > > > > Describing HW topology is the only way to unambiguously describe device to
> > > > > > > > something or someone outside qemu and have persistent device naming
> > > > > > > > between different HW configuration.
> > > > > > > 
> > > > > > > Not really, since ACPI is a binary blob programmed by qemu.
> > > > > > > 
> > > > > > APCI is part of the guest, not qemu.
> > > > > 
> > > > > Yes it runs in the guest but it's generated by qemu. On real hardware,
> > > > > it's supplied by the motherboard.
> > > > > 
> > > > It is not generated by qemu. Parts of it depend on HW and other part depend
> > > > on how BIOS configure HW. _BBN for instance is clearly defined to return
> > > > address assigned bu the BIOS.
> > > 
> > > BIOS is supplied on the motherboard and in our case by qemu as well.
> > You can replace MB bios by coreboot+seabios on some of them.
> > Manufacturer don't want you to do it and make it hard to do, but
> > otherwise this is just software, not some magic dust.
> > 
> > > There's no standard way for BIOS to assign bus number to the pci root,
> > > so it does it in device-specific way. Why should a management tool
> > > or a CLI user care about these? As far as they are concerned
> > > we could use some PV scheme to find root devices and assign bus
> > > numbers, and it would be exactly the same.
> > > 
> > Go write KVM userspace that does that. AFAIK there is project out there
> > that tries to do that. No luck so far. Your world view is very x86/Linux
> > centric. You need to broaden it a little bit. Next time you propose
> > something ask yourself will it work with qemu-sparc, qemu-ppc, qemu-amd.
> > 
> > 
> > > > > > Just saying "not really" doesn't
> > > > > > prove much. I still haven't seen any proposition from you that actually
> > > > > > solve the problem. No, "lets use guest naming" is not it. There is no
> > > > > > such thing as "The Guest". 
> > > > > > 
> > > > > > --
> > > > > > 			Gleb.
> > > > > 
> > > > > I am sorry if I didn't make this clear.  I think we should use the domain:bus
> > > > > pair to name the root device. As these are unique and 
> > > > > 
> > > > You forgot to complete the sentence :) But you made it clear enough and
> > > > it is incorrect. domain:bus pair not only not unique they do not exist
> > > > in qemu at all
> > > 
> > > Sure they do. domain maps to mcfg address for express. bus is used for
> > mcfg is optional as far as I can see. You can compile out MMCONFIG
> > support on Linux.
> > 
> > > cf8/cfc addressing. They are assigned by BIOS but since BIOS
> > > is supplied with hardware the point is moot.
> > Most PC hardware is supplied with Windows, so what? BIOS is a code that
> > runs in a guest. It is part of a guest. Every line of code executed by
> > vcpu belongs to a guest. No need to redefine things to prove you point.
> > 
> > > 
> > > > and as such can't be used to address device. They are
> > > > product of HW enumeration done by a guest OS just like eth0 or C:.
> > > > 
> > > > --
> > > > 			Gleb.
> > > 
> > > There's a huge difference between BIOS and guest OS,
> > Not true.
> > 
> > >                                            and between bus
> > > numbers of pci root and of nested bridges.
> > Really? What is it?
> > 
> > > 
> > > Describing hardware io ports makes sense if you are trying to
> > > communicate data from qemu to the BIOS.  But the rest of the world might
> > > not care.
> > > 
> > The part of the world that manage HW cares. You may need to add device
> > from monitor before first line of BIOS is event executed. How can you
> > rely on BIOS enumerate of devices in this case?
> > 
> > 
> > --
> > 			Gleb.

--
			Gleb.
Michael S. Tsirkin Nov. 21, 2010, 6:22 p.m. UTC | #20
On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote:
> > This FW is given to guest by qemu. It only assigns bus numbers
> > because qemu told it to do so.
> Seabios is just a guest qemu ships. There are other FW for qemu.

We don't support them though, do we?

> Bochs
> bios, openfirmware, efi. All of them where developed outside of qemu
> project and all of them are usable without qemu. You can't consider them
> be part of qemu any more then Linux/Windows with virtio drivers.

You can also burn LinuxBIOS onto your motherboard. If you do, you
void your warranty.


> > 
> > > > 
> > > > > > And the spec says, e.g.:
> > > > > > 
> > > > > > 	  the memory mapped configuration base
> > > > > > 	address (always corresponds to bus number 0) for the PCI Segment Group
> > > > > > 	of the host bridge is provided by _CBA and the bus range covered by the
> > > > > > 	base address is indicated by the corresponding bus range specified in
> > > > > > 	_CRS.
> > > > > > 
> > > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets
> > > > > solve the problem for PCI first and then move to PCI Express. Jumping from one
> > > > > to another destruct us from main discussion.
> > > > 
> > > > I think this is what confuses us.  As long as you are using cf8/cfc there's no
> > > > concept of a domain really.
> > > > Thus:
> > > > 	/pci@i0cf8
> > > > 
> > > > is probably enough for BIOS boot because we'll need to make root bus numbers
> > > > unique for legacy guests/option ROMs.  But this is not a hardware requirement
> > > > and might become easier to ignore eith EFI.
> > > > 
> > > You do not need MMCONFIG to have multiple PCI domains. You can have one
> > > configured via standard cf8/cfc and another one on ef8/efc and one more
> > > at mmio fce00000 and you can address all of them:
> > > /pci@i0cf8
> > > /pci@i0ef8
> > > /pci@fce00000

Isn't the mmio one relocatable?

> > > 
> > > And each one of those PCI domains can have 256 subbridges.
> > 
> > Will common guests such as windows or linux be able to use them? This
> With proper drivers yes. There is HW with more then one PCI bus and I
> think qemu emulates some of it (PPC MAC for instance). 

MAC is probably just poking in a couple of predefined places.  That's
not enumeration.

> > seems to be outside the scope of the PCI Firmware specification, which
> > says that bus numbers must be unique.
> They must be unique per PCI segment group.

We've come full circle, haven't we? I am saying we should let users
specify the PCI segment group + bus as opposed to the IO port, which they
don't use.
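
For what it's worth, the two spellings being argued about would look roughly
like this for the same hypothetical device (slot 3, function 0 on the standard
root); only /pci@i0cf8 itself is taken from the thread, the child component
spelling is made up:

    #include <stdio.h>

    int main(void)
    {
        unsigned seg = 0, bus = 0, dev = 3, fn = 0;

        /* segment:bus:device.function, as printed by lspci/sysfs */
        printf("%04x:%02x:%02x.%x\n", seg, bus, dev, fn);

        /* topological, firmware-style path rooted at the config io port;
         * the "dev@" node name here is purely illustrative */
        printf("/pci@i0cf8/dev@%x,%x\n", dev, fn);
        return 0;
    }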

> > 
> > > > > > 
> > > > > > > > 
> > > > > > > > That should be enough for e.g. device_del. We do have the need to
> > > > > > > > describe the topology when we interface with firmware, e.g. to describe
> > > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > > > > > with), but that's probably the only case.
> > > > > > > > 
> > > > > > > Describing HW topology is the only way to unambiguously describe device to
> > > > > > > something or someone outside qemu and have persistent device naming
> > > > > > > between different HW configuration.
> > > > > > 
> > > > > > Not really, since ACPI is a binary blob programmed by qemu.
> > > > > > 
> > > > > APCI is part of the guest, not qemu.
> > > > 
> > > > Yes it runs in the guest but it's generated by qemu. On real hardware,
> > > > it's supplied by the motherboard.
> > > > 
> > > It is not generated by qemu. Parts of it depend on HW and other part depend
> > > on how BIOS configure HW. _BBN for instance is clearly defined to return
> > > address assigned bu the BIOS.
> > 
> > BIOS is supplied on the motherboard and in our case by qemu as well.
> You can replace MB bios by coreboot+seabios on some of them.
> Manufacturer don't want you to do it and make it hard to do, but
> otherwise this is just software, not some magic dust.

They support common hardware but not all features will automatically work.

> > There's no standard way for BIOS to assign bus number to the pci root,
> > so it does it in device-specific way. Why should a management tool
> > or a CLI user care about these? As far as they are concerned
> > we could use some PV scheme to find root devices and assign bus
> > numbers, and it would be exactly the same.
> > 
> Go write KVM userspace that does that. AFAIK there is project out there
> that tries to do that. No luck so far. Your world view is very x86/Linux
> centric. You need to broaden it a little bit. Next time you propose
> something ask yourself will it work with qemu-sparc, qemu-ppc, qemu-amd.

Your view is very qemu-centric. Ask yourself whether what you propose
will work with libvirt. Better yet, ask the libvirt developers.

> > > > > Just saying "not really" doesn't
> > > > > prove much. I still haven't seen any proposition from you that actually
> > > > > solve the problem. No, "lets use guest naming" is not it. There is no
> > > > > such thing as "The Guest". 
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > > 
> > > > I am sorry if I didn't make this clear.  I think we should use the domain:bus
> > > > pair to name the root device. As these are unique and 
> > > > 
> > > You forgot to complete the sentence :) But you made it clear enough and
> > > it is incorrect. domain:bus pair not only not unique they do not exist
> > > in qemu at all
> > 
> > Sure they do. domain maps to mcfg address for express. bus is used for
> mcfg is optional as far as I can see. You can compile out MMCONFIG
> support on Linux.

This doesn't mean that you will be able to address all devices
on all systems if you do.

> > cf8/cfc addressing. They are assigned by BIOS but since BIOS
> > is supplied with hardware the point is moot.
> Most PC hardware is supplied with Windows, so what? BIOS is a code that
> runs in a guest. It is part of a guest. Every line of code executed by
> vcpu belongs to a guest. No need to redefine things to prove you point.

Call it what you want. I care about the payload that the user supplies.
BIOS and ACPI are implementation details of qemu needed to run said
payload smoothly. Said payload might not know or care about the
IO address used to program the host PCI bridge.

> > 
> > > and as such can't be used to address device. They are
> > > product of HW enumeration done by a guest OS just like eth0 or C:.
> > > 
> > > --
> > > 			Gleb.
> > 
> > There's a huge difference between BIOS and guest OS,
> Not true.
> 
> >                                            and between bus
> > numbers of pci root and of nested bridges.
> Really? What is it?

Look at the spec citations I provided. The bus number for the root must be
set by firmware (or default to 0). The rest can be done by the OS.
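
For concreteness, "the rest can be done by the OS" amounts to three config
writes into a nested bridge's bus-number registers. pci_conf1_write8() is
assumed to be the write counterpart of a cf8/cfc config read and is not
defined here:

    #include <stdint.h>

    /* Bus-number registers in a PCI-to-PCI bridge's (header type 1) config space. */
    #define PCI_PRIMARY_BUS      0x18
    #define PCI_SECONDARY_BUS    0x19
    #define PCI_SUBORDINATE_BUS  0x1a

    /* Assumed helper: byte-wide type-1 config write via cf8/cfc. */
    void pci_conf1_write8(uint8_t bus, uint8_t dev, uint8_t fn,
                          uint8_t offset, uint8_t val);

    void renumber_bridge(uint8_t bus, uint8_t dev, uint8_t fn,
                         uint8_t secondary, uint8_t subordinate)
    {
        pci_conf1_write8(bus, dev, fn, PCI_PRIMARY_BUS, bus);
        pci_conf1_write8(bus, dev, fn, PCI_SECONDARY_BUS, secondary);
        pci_conf1_write8(bus, dev, fn, PCI_SUBORDINATE_BUS, subordinate);
    }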

> > 
> > Describing hardware io ports makes sense if you are trying to
> > communicate data from qemu to the BIOS.  But the rest of the world might
> > not care.
> > 
> The part of the world that manage HW cares.

You really want libvirt to deal with ioports?  Why only that?
Let them emulate PCI while we are at it?

> You may need to add device
> from monitor before first line of BIOS is event executed. How can you
> rely on BIOS enumerate of devices in this case?

We can control the bus numbers that the BIOS will assign to PCI roots.
It's probably required to make them stable anyway.

> 
> --
> 			Gleb.
Gleb Natapov Nov. 21, 2010, 7:29 p.m. UTC | #21
On Sun, Nov 21, 2010 at 08:22:03PM +0200, Michael S. Tsirkin wrote:
> On Sun, Nov 21, 2010 at 06:01:11PM +0200, Gleb Natapov wrote:
> > > This FW is given to guest by qemu. It only assigns bus numbers
> > > because qemu told it to do so.
> > Seabios is just a guest qemu ships. There are other FW for qemu.
> 
> We don't support them though, do we?
> 
Who is "we"? As the qemu community, we support any guest code.

> > Bochs
> > bios, openfirmware, efi. All of them where developed outside of qemu
> > project and all of them are usable without qemu. You can't consider them
> > be part of qemu any more then Linux/Windows with virtio drivers.
> 
> You can also burn linuxbios onto your motherboard. If you do you
> void your warranty.
And? What is your point?

> 
> 
> > > 
> > > > > 
> > > > > > > And the spec says, e.g.:
> > > > > > > 
> > > > > > > 	  the memory mapped configuration base
> > > > > > > 	address (always corresponds to bus number 0) for the PCI Segment Group
> > > > > > > 	of the host bridge is provided by _CBA and the bus range covered by the
> > > > > > > 	base address is indicated by the corresponding bus range specified in
> > > > > > > 	_CRS.
> > > > > > > 
> > > > > > Don't see how it is relevant. And _CBA is defined only for PCI Express. Lets
> > > > > > solve the problem for PCI first and then move to PCI Express. Jumping from one
> > > > > > to another destruct us from main discussion.
> > > > > 
> > > > > I think this is what confuses us.  As long as you are using cf8/cfc there's no
> > > > > concept of a domain really.
> > > > > Thus:
> > > > > 	/pci@i0cf8
> > > > > 
> > > > > is probably enough for BIOS boot because we'll need to make root bus numbers
> > > > > unique for legacy guests/option ROMs.  But this is not a hardware requirement
> > > > > and might become easier to ignore eith EFI.
> > > > > 
> > > > You do not need MMCONFIG to have multiple PCI domains. You can have one
> > > > configured via standard cf8/cfc and another one on ef8/efc and one more
> > > > at mmio fce00000 and you can address all of them:
> > > > /pci@i0cf8
> > > > /pci@i0ef8
> > > > /pci@fce00000
> 
> Isn't the mmio one relocatable?
> 
No. If it is, there is a way to specify so.


> > > > 
> > > > And each one of those PCI domains can have 256 subbridges.
> > > 
> > > Will common guests such as windows or linux be able to use them? This
> > With proper drivers yes. There is HW with more then one PCI bus and I
> > think qemu emulates some of it (PPC MAC for instance). 
> 
> MAC is probably just poking in a couple of predefined places.  That's
> not enumeration.
> 
The system bus is not enumerable on a PC either. That is why you need ACPI to
describe resources, or you just poke into predefined places (0cf8-0cff in
the case of PCI).

> > > seems to be outside the scope of the PCI Firmware specification, which
> > > says that bus numbers must be unique.
> > They must be unique per PCI segment group.
> 
> We've come full circle, didn't we? i am saying we should let users
> specify PCI Segment group+bus as opposed to the io port, which they
> don't use.
> 
When qemu creates the device model there is no "bus" number assigned to
PCI bridges, and thus a "bus" number is meaningless for qemu. A PCI segment
group has no HW meaning at all (you seem to ignore this completely). So
let's say you run qemu with -S and want to delete or add a device. Where do
you get the bus number and PCI segment group number to specify it?
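
As an aside, the topological alternative both sides keep circling around can
be sketched in a few lines; the structures below are hypothetical, not qemu's
qdev types. The point is that every component comes from the static slot
layout, so nothing in the name depends on what a guest later writes into
config space:

    #include <stdio.h>

    /* Hypothetical PCI tree: each device knows its slot/function and the
     * bridge above it; the root carries its own identifier (e.g. the
     * config-space io base). */
    struct pci_node {
        const struct pci_node *parent;  /* NULL for the host bridge */
        unsigned slot, fn;
        const char *root_id;            /* only set on the root, e.g. "i0cf8" */
    };

    static void format_path(const struct pci_node *dev, char *buf, size_t len)
    {
        if (!dev->parent) {
            snprintf(buf, len, "/pci@%s", dev->root_id);
            return;
        }
        char up[256];
        format_path(dev->parent, up, sizeof(up));
        snprintf(buf, len, "%s/%02x.%x", up, dev->slot, dev->fn);
    }

    int main(void)
    {
        struct pci_node root   = { .root_id = "i0cf8" };
        struct pci_node bridge = { .parent = &root,   .slot = 0x1e, .fn = 0 };
        struct pci_node nic    = { .parent = &bridge, .slot = 0x03, .fn = 0 };

        char path[256];
        format_path(&nic, path, sizeof(path));
        printf("%s\n", path);   /* prints /pci@i0cf8/1e.0/03.0 */
        return 0;
    }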

> > > 
> > > > > > > 
> > > > > > > > > 
> > > > > > > > > That should be enough for e.g. device_del. We do have the need to
> > > > > > > > > describe the topology when we interface with firmware, e.g. to describe
> > > > > > > > > the ACPI tables themselves to qemu (this is what Gleb's patches deal
> > > > > > > > > with), but that's probably the only case.
> > > > > > > > > 
> > > > > > > > Describing HW topology is the only way to unambiguously describe device to
> > > > > > > > something or someone outside qemu and have persistent device naming
> > > > > > > > between different HW configuration.
> > > > > > > 
> > > > > > > Not really, since ACPI is a binary blob programmed by qemu.
> > > > > > > 
> > > > > > APCI is part of the guest, not qemu.
> > > > > 
> > > > > Yes it runs in the guest but it's generated by qemu. On real hardware,
> > > > > it's supplied by the motherboard.
> > > > > 
> > > > It is not generated by qemu. Parts of it depend on HW and other part depend
> > > > on how BIOS configure HW. _BBN for instance is clearly defined to return
> > > > address assigned bu the BIOS.
> > > 
> > > BIOS is supplied on the motherboard and in our case by qemu as well.
> > You can replace MB bios by coreboot+seabios on some of them.
> > Manufacturer don't want you to do it and make it hard to do, but
> > otherwise this is just software, not some magic dust.
> 
> They support common hardware but not all features will automatically work.
So what? The same can be said about Linux on a lot of HW.

> 
> > > There's no standard way for BIOS to assign bus number to the pci root,
> > > so it does it in device-specific way. Why should a management tool
> > > or a CLI user care about these? As far as they are concerned
> > > we could use some PV scheme to find root devices and assign bus
> > > numbers, and it would be exactly the same.
> > > 
> > Go write KVM userspace that does that. AFAIK there is project out there
> > that tries to do that. No luck so far. Your world view is very x86/Linux
> > centric. You need to broaden it a little bit. Next time you propose
> > something ask yourself will it work with qemu-sparc, qemu-ppc, qemu-amd.
> 
> Your view is very qemu centric. Ask yourself whether what you propose
> will work with libvirt. Better yet, ask libvirt developers.
My view is qemu-centric because we are developing qemu here. Anything that
works in the qemu monitor will work with libvirt or any other management software.

> 
> > > > > > Just saying "not really" doesn't
> > > > > > prove much. I still haven't seen any proposition from you that actually
> > > > > > solve the problem. No, "lets use guest naming" is not it. There is no
> > > > > > such thing as "The Guest". 
> > > > > > 
> > > > > > --
> > > > > > 			Gleb.
> > > > > 
> > > > > I am sorry if I didn't make this clear.  I think we should use the domain:bus
> > > > > pair to name the root device. As these are unique and 
> > > > > 
> > > > You forgot to complete the sentence :) But you made it clear enough and
> > > > it is incorrect. domain:bus pair not only not unique they do not exist
> > > > in qemu at all
> > > 
> > > Sure they do. domain maps to mcfg address for express. bus is used for
> > mcfg is optional as far as I can see. You can compile out MMCONFIG
> > support on Linux.
> 
> This doesn't mean that you will be able to address all devices
> on all systems if you do.
> 
Why should qemu care? That's the guest's problem.

> > > cf8/cfc addressing. They are assigned by BIOS but since BIOS
> > > is supplied with hardware the point is moot.
> > Most PC hardware is supplied with Windows, so what? BIOS is a code that
> > runs in a guest. It is part of a guest. Every line of code executed by
> > vcpu belongs to a guest. No need to redefine things to prove you point.
> 
> Call it what you want. I care about the payload that the user supplies.
> BIOS and ACPI are implementation details of qemu needed to run said
> payload smoothly. Said payload might not know or care about
> IO address used to program host pci bridge.
And your point is? What does "said payload" have to do with the qemu CLI? Qemu
CLI behaviour should not depend on guest code. That is a path to
madness.

> 
> > > 
> > > > and as such can't be used to address device. They are
> > > > product of HW enumeration done by a guest OS just like eth0 or C:.
> > > > 
> > > > --
> > > > 			Gleb.
> > > 
> > > There's a huge difference between BIOS and guest OS,
> > Not true.
> > 
> > >                                            and between bus
> > > numbers of pci root and of nested bridges.
> > Really? What is it?
> 
> Look at the spec citations I provided. bus number for root must be set by
> firmware (or default to 0). The rest can be done by the OS.
> 
So there is no "huge difference" at all. This is just a good old bus
number. Two host bridges may have the same bus number. Some buses may not be
enumerated by a guest (neither FW nor OS) at all and should still be
addressable from the qemu CLI.

> > > 
> > > Describing hardware io ports makes sense if you are trying to
> > > communicate data from qemu to the BIOS.  But the rest of the world might
> > > not care.
> > > 
> > The part of the world that manage HW cares.
> 
> You really want libvirt to deal with ioports?  Why only that?
No, I want them to deal with data that makes sense. And I stated in the first
mail, after which you started this flame war, that OF paths are not the only
way that makes sense for the qemu CLI. Using qdev device ids makes perfect
sense too. So the CLI may look like "device_add pci.0:pci-bridge.1:ide.0"
or similar. They may even prefer it since they already deal with ids on
the command line. Using bus addresses that may not even be assigned, on the
other hand, makes no sense.

> Let them emulate PCI while we are at it?
Don't be ridiculous. Actually, it is you who wants to teach management
how bus numbers are assigned to PCI buses. I want to teach them about
device topology, which they should know already since they control how
the machine is created.

> 
> > You may need to add device
> > from monitor before first line of BIOS is event executed. How can you
> > rely on BIOS enumerate of devices in this case?
> 
> We can control the bus numbers that the BIOS will assign to pci root.
And what if you need to insert a device that sits behind a pci-to-pci bridge?
Do you want to control that bus address too? And a bus number does not
unambiguously describe a PCI root, since two roots can have the same bus
number just fine.

> It's probably required to make them stable anyway.
> 
Why?

--
			Gleb.
Michael S. Tsirkin Nov. 21, 2010, 8:39 p.m. UTC | #22
On Sun, Nov 21, 2010 at 09:29:59PM +0200, Gleb Natapov wrote:
> > We can control the bus numbers that the BIOS will assign to pci root.
> And if you need to insert device that sits behind pci-to-pci bridge? Do
> you want to control that bus address too? And bus number do not unambiguously
> describe pci root since two roots can have same bus number just fine.

They have to have different segment numbers.

> > It's probably required to make them stable anyway.
> > 
> Why?

To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
Gleb Natapov Nov. 22, 2010, 7:37 a.m. UTC | #23
On Sun, Nov 21, 2010 at 10:39:31PM +0200, Michael S. Tsirkin wrote:
> On Sun, Nov 21, 2010 at 09:29:59PM +0200, Gleb Natapov wrote:
> > > We can control the bus numbers that the BIOS will assign to pci root.
> > And if you need to insert device that sits behind pci-to-pci bridge? Do
> > you want to control that bus address too? And bus number do not unambiguously
> > describe pci root since two roots can have same bus number just fine.
> 
> They have to have different segment numbers.
> 
AKA PCI domains, AKA one more number assigned by a guest.

> > > It's probably required to make them stable anyway.
> > > 
> > Why?
> 
> To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> 
Why should qemu care?

--
			Gleb.
Michael S. Tsirkin Nov. 22, 2010, 8:16 a.m. UTC | #24
On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > It's probably required to make them stable anyway.
> > > > 
> > > Why?
> > 
> > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > 
> Why should qemu care?

Stable bus numbering is a feature *users* care about, because
some Guest OSes get confused when a card gets moved to another
bus.

> --
> 			Gleb.
Gleb Natapov Nov. 22, 2010, 1:04 p.m. UTC | #25
On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > It's probably required to make them stable anyway.
> > > > > 
> > > > Why?
> > > 
> > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > 
> > Why should qemu care?
> 
> Stable bus numbering is a feature *users* care about, because
> some Guest OSes get confused when a card gets moved to another
> bus.
> 
So if the user cares about it, they should not change the HW configuration of QEMU.
I guess those OSes know how to handle hot-pluggable equipment, otherwise
they would get confused on real HW too. Why should QEMU care to preserve
something in the face of a configuration change?

--
			Gleb.
Michael S. Tsirkin Nov. 22, 2010, 2:50 p.m. UTC | #26
On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > It's probably required to make them stable anyway.
> > > > > > 
> > > > > Why?
> > > > 
> > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > 
> > > Why should qemu care?
> > 
> > Stable bus numbering is a feature *users* care about, because
> > some Guest OSes get confused when a card gets moved to another
> > bus.
> > 
> So if user cares about it it should not change HW configuration of QEMU.
> I guess those OSes knows how to handle hot-pluggable equipment otherwise
> they will get confused on real HW too. Why QEMU should care to preserve
> something in a face of configuration change?
> 
> --
> 			Gleb.

We've been there, haven't we? See
http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
Gleb Natapov Nov. 22, 2010, 2:52 p.m. UTC | #27
On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > It's probably required to make them stable anyway.
> > > > > > > 
> > > > > > Why?
> > > > > 
> > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > > 
> > > > Why should qemu care?
> > > 
> > > Stable bus numbering is a feature *users* care about, because
> > > some Guest OSes get confused when a card gets moved to another
> > > bus.
> > > 
> > So if the user cares about it, they should not change the HW configuration
> > of QEMU. I guess those OSes know how to handle hot-pluggable equipment,
> > otherwise they would get confused on real HW too. Why should QEMU care to
> > preserve something in the face of a configuration change?
> > 
> > --
> > 			Gleb.
> 
> We've been there, haven't we? See
> http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
> 
This is about stable HW configuration.

--
			Gleb.
Michael S. Tsirkin Nov. 22, 2010, 2:56 p.m. UTC | #28
On Mon, Nov 22, 2010 at 04:52:32PM +0200, Gleb Natapov wrote:
> On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > > It's probably required to make them stable anyway.
> > > > > > > > 
> > > > > > > Why?
> > > > > > 
> > > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > > > 
> > > > > Why should qemu care?
> > > > 
> > > > Stable bus numbering is a feature *users* care about, because
> > > > some Guest OSes get confused when a card gets moved to another
> > > > bus.
> > > > 
> > > So if the user cares about it, they should not change the HW configuration
> > > of QEMU. I guess those OSes know how to handle hot-pluggable equipment,
> > > otherwise they would get confused on real HW too. Why should QEMU care to
> > > preserve something in the face of a configuration change?
> > > 
> > > --
> > > 			Gleb.
> > 
> > We've been there, haven't we? See
> > http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
> > 
> This is about stable HW configuration.

Exactly. We have the same need for nested bridges and devices behind
them.

> --
> 			Gleb.
Gleb Natapov Nov. 22, 2010, 2:58 p.m. UTC | #29
On Mon, Nov 22, 2010 at 04:56:16PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 22, 2010 at 04:52:32PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > > > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > > > It's probably required to make them stable anyway.
> > > > > > > > > 
> > > > > > > > Why?
> > > > > > > 
> > > > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > > > > 
> > > > > > Why should qemu care?
> > > > > 
> > > > > Stable bus numbering is a feature *users* care about, because
> > > > > some Guest OSes get confused when a card gets moved to another
> > > > > bus.
> > > > > 
> > > > So if the user cares about it, they should not change the HW configuration
> > > > of QEMU. I guess those OSes know how to handle hot-pluggable equipment,
> > > > otherwise they would get confused on real HW too. Why should QEMU care to
> > > > preserve something in the face of a configuration change?
> > > > 
> > > > --
> > > > 			Gleb.
> > > 
> > > We've been there, haven't we? See
> > > http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
> > > 
> > This is about stable HW configuration.
> 
> Exactly. We have the same need for nested bridges and devices behind
> them.
> 
And now you are talking about topology, not guest-assigned bus numbers.
So suddenly you are on my side? I don't even get what you are arguing
about at this point.

--
			Gleb.
Michael S. Tsirkin Nov. 22, 2010, 4:41 p.m. UTC | #30
On Mon, Nov 22, 2010 at 04:58:11PM +0200, Gleb Natapov wrote:
> On Mon, Nov 22, 2010 at 04:56:16PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Nov 22, 2010 at 04:52:32PM +0200, Gleb Natapov wrote:
> > > On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> > > > On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > > > > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > > > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > > > > It's probably required to make them stable anyway.
> > > > > > > > > > 
> > > > > > > > > Why?
> > > > > > > > 
> > > > > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > > > > > 
> > > > > > > Why should qemu care?
> > > > > > 
> > > > > > Stable bus numbering is a feature *users* care about, because
> > > > > > some Guest OSes get confused when a card gets moved to another
> > > > > > bus.
> > > > > > 
> > > > > So if the user cares about it, they should not change the HW configuration
> > > > > of QEMU. I guess those OSes know how to handle hot-pluggable equipment,
> > > > > otherwise they would get confused on real HW too. Why should QEMU care to
> > > > > preserve something in the face of a configuration change?
> > > > > 
> > > > > --
> > > > > 			Gleb.
> > > > 
> > > > We've been there, haven't we? See
> > > > http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
> > > > 
> > > This is about stable HW configuration.
> > 
> > Exactly. We have the same need for nested bridges and devices behind
> > them.
> > 
> And now you are talking about topology, not guest-assigned bus numbers.

I suspect that if bus numbers change, it has the same effect on the guest
as a topology change. I need to test though; I'm not sure.

> So suddenly you are on my side? I don't even get what you are arguing
> about at this point.

By this time, I forgot, too :).

> --
> 			Gleb.
Gleb Natapov Nov. 22, 2010, 5:01 p.m. UTC | #31
On Mon, Nov 22, 2010 at 06:41:28PM +0200, Michael S. Tsirkin wrote:
> On Mon, Nov 22, 2010 at 04:58:11PM +0200, Gleb Natapov wrote:
> > On Mon, Nov 22, 2010 at 04:56:16PM +0200, Michael S. Tsirkin wrote:
> > > On Mon, Nov 22, 2010 at 04:52:32PM +0200, Gleb Natapov wrote:
> > > > On Mon, Nov 22, 2010 at 04:50:29PM +0200, Michael S. Tsirkin wrote:
> > > > > On Mon, Nov 22, 2010 at 03:04:51PM +0200, Gleb Natapov wrote:
> > > > > > On Mon, Nov 22, 2010 at 10:16:57AM +0200, Michael S. Tsirkin wrote:
> > > > > > > On Mon, Nov 22, 2010 at 09:37:07AM +0200, Gleb Natapov wrote:
> > > > > > > > > > > It's probably required to make them stable anyway.
> > > > > > > > > > > 
> > > > > > > > > > Why?
> > > > > > > > > 
> > > > > > > > > To avoid bus renumbering on reboot after you add a pci-to-pci bridge.
> > > > > > > > > 
> > > > > > > > Why should qemu care?
> > > > > > > 
> > > > > > > Stable bus numbering is a feature *users* care about, because
> > > > > > > some Guest OSes get confused when a card gets moved to another
> > > > > > > bus.
> > > > > > > 
> > > > > > So if the user cares about it, they should not change the HW configuration
> > > > > > of QEMU. I guess those OSes know how to handle hot-pluggable equipment,
> > > > > > otherwise they would get confused on real HW too. Why should QEMU care to
> > > > > > preserve something in the face of a configuration change?
> > > > > > 
> > > > > > --
> > > > > > 			Gleb.
> > > > > 
> > > > > We've been there, haven't we? See
> > > > > http://fedoraproject.org/wiki/Features/KVM_Stable_PCI_Addresses#KVM_Stable_PCI_Addresses
> > > > > 
> > > > This is about stable HW configuration.
> > > 
> > > Exactly. We have the same need for nested bridges and devices behind
> > > them.
> > > 
> > And now you are talking about topology, not guest-assigned bus numbers.
> 
> I suspect that if bus numbers change, it has the same effect on the guest
> as a topology change. I need to test though; I'm not sure.
> 
Hard to believe. Unplugging a card with an internal pci-to-pci bridge may
change the bus numbering.

> > So suddenly you are on my side? I don't even get what you are arguing
> > about at this point.
> 
> By this time, I forgot, too :).
> 
:)

--
			Gleb.
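
For reference, a sketch of the strings the patch below is meant to produce
(hypothetical slot numbers, single PCI domain assumed):

/* Illustrative examples only -- not output captured from QEMU.
 *
 * Device in slot 3, function 0, directly on the root bus of domain 0:
 *     "0000:00:03.0"     (same shape as the old domain:bus:slot.func string
 *                         whenever the bus number is 0)
 *
 * Same device behind a pci-to-pci bridge plugged into slot 7 of the root bus:
 *     "0000:00:07:03.0"  (":00" stands in for the bus field, followed by one
 *                         ":SS" component per bridge level plus the device)
 */
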
diff mbox

Patch

diff --git a/hw/pci.c b/hw/pci.c
index 7d12473..fa98d94 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -1826,13 +1826,49 @@  static void pcibus_dev_print(Monitor *mon, DeviceState *dev, int indent)
 
 static char *pcibus_get_dev_path(DeviceState *dev)
 {
-    PCIDevice *d = (PCIDevice *)dev;
-    char path[16];
-
-    snprintf(path, sizeof(path), "%04x:%02x:%02x.%x",
-             pci_find_domain(d->bus), pci_bus_num(d->bus),
-             PCI_SLOT(d->devfn), PCI_FUNC(d->devfn));
-
-    return strdup(path);
+    PCIDevice *d = container_of(dev, PCIDevice, qdev);
+    PCIDevice *t;
+    int slot_depth;
+    /* Path format: Domain:00:Slot:Slot....:Slot.Function.
+     * 00 is added here to make this format compatible with
+     * domain:Bus:Slot.Func for systems without nested PCI bridges.
+     * Slot list specifies the slot numbers for all devices on the
+     * path from root to the specific device. */
+    int domain_len = strlen("DDDD:00");
+    int func_len = strlen(".F");
+    int slot_len = strlen(":SS");
+    int path_len;
+    char *path, *p;
+    char slot[sizeof(":SS")]; /* One ":SS" component plus '\0'. */
+
+    /* Calculate # of slots on path between device and root. */
+    slot_depth = 0;
+    for (t = d; t; t = t->bus->parent_dev) {
+        ++slot_depth;
+    }
+
+    path_len = domain_len + slot_len * slot_depth + func_len;
+
+    /* Allocate memory, fill in the terminating null byte. */
+    path = malloc(path_len + 1 /* For '\0' */);
+    path[path_len] = '\0';
+
+    /* First field is the domain, followed by the fixed ":00" bus number. */
+    snprintf(path, domain_len + 1, "%04x:00", pci_find_domain(d->bus));
+
+    /* Leave space for slot numbers and fill in function number. */
+    p = path + domain_len + slot_len * slot_depth;
+    snprintf(p, func_len + 1, ".%x", PCI_FUNC(d->devfn));
+
+    /* Fill in slot numbers. We walk up from device to root, so need to print
+     * them in the reverse order, last to first. */
+    for (t = d; t; t = t->bus->parent_dev) {
+        p -= slot_len;
+        /* Copy via scratch buffer so '\0' can't clobber the next field. */
+        snprintf(slot, sizeof(slot), ":%02x", PCI_SLOT(t->devfn));
+        memcpy(p, slot, slot_len);
+    }
+
+    return path;
 }