diff mbox

Add support for reading SMBIOS Slot number for PCI devices

Message ID 1436565766-21820-1-git-send-email-jordan_hargrave@dell.com
State Changes Requested
Headers show

Commit Message

Jordan Hargrave July 10, 2015, 10:02 p.m. UTC
From: Jordan Hargrave <Jordan_Hargrave@dell.com>

There currently isn't an easy way to determine which PCI devices belong to
system slots.  This patch adds support to read SMBIOS Type 9 (System Slots).

Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com>
---
 drivers/firmware/dmi_scan.c | 39 ++++++++++++++++++++
 drivers/pci/pci-label.c     | 89 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/dmi.h         |  1 +
 3 files changed, 129 insertions(+)

Comments

Jean Delvare July 13, 2015, 7:35 a.m. UTC | #1
Hi Jordan,

On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
> From: Jordan Hargrave <Jordan_Hargrave@dell.com>
> 
> There currently isn't an easy way to determine which PCI devices belong to
> system slots.  This patch adds support to read SMBIOS Type 9 (System Slots).

I'm wondering, can't you use dmidecode or libsmbios to retrieve the
same information?
Jordan Hargrave July 13, 2015, 3:11 p.m. UTC | #2
On Mon, Jul 13, 2015 at 2:35 AM, Jean Delvare <jdelvare@suse.de> wrote:
>
> Hi Jordan,
>
 On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
 > From: Jordan Hargrave <Jordan_Hargrave@dell.com>
> >
> > There currently isn't an easy way to determine which PCI devices belong
> > to
> > system slots.  This patch adds support to read SMBIOS Type 9 (System
> > Slots).
>
> I'm wondering, can't you use dmidecode or libsmbios to retrieve the
> same information?
>
> --
> Jean Delvare
> SUSE L3 Support

You can but it's as not easy to determine the slot number for leaf
devices on bridges.  Eventually planning on using this for pulling
slot number for identifying network cards and disk numbering for
systemd
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas July 21, 2015, 4:57 p.m. UTC | #3
On Mon, Jul 13, 2015 at 09:57:32AM -0500, Jordan Hargrave wrote:
> On Mon, Jul 13, 2015 at 2:35 AM, Jean Delvare <jdelvare@suse.de> wrote:
> 
> > Hi Jordan,
> >
> > On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
> > > From: Jordan Hargrave <Jordan_Hargrave@dell.com>
> > >
> > > There currently isn't an easy way to determine which PCI devices belong
> > to
> > > system slots.  This patch adds support to read SMBIOS Type 9 (System
> > Slots).
> >
> > I'm wondering, can't you use dmidecode or libsmbios to retrieve the
> > same information?
> >
> > --
> > Jean Delvare
> > SUSE L3 Support
> >
> 
> You can but it's as not easy to determine the slot number for leaf devices
> on bridges.  Eventually planning on using this for pulling slot number for
> identifying network cards and disk numbering for systemd

Can you outline the problems with using dmidecode or libsmbios?
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jordan Hargrave July 21, 2015, 5:31 p.m. UTC | #4
On Tue, Jul 21, 2015 at 11:57 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Mon, Jul 13, 2015 at 09:57:32AM -0500, Jordan Hargrave wrote:
>> On Mon, Jul 13, 2015 at 2:35 AM, Jean Delvare <jdelvare@suse.de> wrote:
>>
>> > Hi Jordan,
>> >
>> > On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
>> > > From: Jordan Hargrave <Jordan_Hargrave@dell.com>
>> > >
>> > > There currently isn't an easy way to determine which PCI devices belong
>> > to
>> > > system slots.  This patch adds support to read SMBIOS Type 9 (System
>> > Slots).
>> >
>> > I'm wondering, can't you use dmidecode or libsmbios to retrieve the
>> > same information?
>> >
>> > --
>> > Jean Delvare
>> > SUSE L3 Support
>> >
>>
>> You can but it's as not easy to determine the slot number for leaf devices
>> on bridges.  Eventually planning on using this for pulling slot number for
>> identifying network cards and disk numbering for systemd
>
> Can you outline the problems with using dmidecode or libsmbios?

Neither dmidecode nor libsmbios report the slot number for devices
behind bridges in a slot.  I'm wanting to use this sysfs variable to
get slot numbers for systemd, so using libsmbios and dmidecode aren't
very useful.  We already report the index for embedded devices in
pci-label.c, this code should have gone in at the same time.

For example.  The SMBIOS entry for slot 3 is 40:00.0 There is a
quad-port NIC in the slot with a bridge at 40:00.0

42:00.0 Bridge (sec=43, sub=45)
43:02.0 Bridge (sec=44, sub=44)
43:04.0 Bridge (sec=45, sub=45)
44:00.0 Ethernet
44:00.1 Ethernet
45:00.0 Ethernet
45:00.1 Ethernet

So dmidecode only returns the slot number for 42:00.0 but not any
child devices.  This code will provide a 'slot' sysfs variable that
reports '3' for all devices under and including the bridge.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas July 21, 2015, 6:09 p.m. UTC | #5
On Fri, Jul 10, 2015 at 05:02:46PM -0500, Jordan Hargrave wrote:
> From: Jordan Hargrave <Jordan_Hargrave@dell.com>
> 
> There currently isn't an easy way to determine which PCI devices belong to
> system slots.  This patch adds support to read SMBIOS Type 9 (System Slots).
> 
> Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com>
> ---
>  drivers/firmware/dmi_scan.c | 39 ++++++++++++++++++++
>  drivers/pci/pci-label.c     | 89 +++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/dmi.h         |  1 +
>  3 files changed, 129 insertions(+)
> 
> diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
> index ac1ce4a..8f95897 100644
> --- a/drivers/firmware/dmi_scan.c
> +++ b/drivers/firmware/dmi_scan.c
> @@ -356,6 +356,41 @@ static void __init dmi_save_extended_devices(const struct dmi_header *dm)
>  	dmi_save_one_device(*d & 0x7f, dmi_string_nosave(dm, *(d - 1)));
>  }
>  
> +static void __init dmi_save_dev_slot(int instance, int segment, int bus, int devfn, const char *name)
> +{
> +	struct dmi_dev_onboard *slot;
> +	
> +	slot = dmi_alloc(sizeof(*slot) + strlen(name) + 1);
> +	if (!slot) {
> +		printk(KERN_ERR "dmi_save_system_slot: out of memory.\n");

You've got lots of data that might be useful in the printk: segment, bus,
devfn, name.

> +		return;
> +	}
> +	slot->instance = instance;
> +	slot->segment = segment;
> +	slot->bus = bus;
> +	slot->devfn = devfn;
> +
> +	strcpy((char *)&slot[1], name);
> +	slot->dev.type = DMI_DEV_TYPE_SYSTEM_SLOT;
> +	slot->dev.name = (char *)&slot[1];
> +	slot->dev.device_data = slot;
> +
> +	list_add(&slot->dev.list, &dmi_devices);
> +}
> +
> +
> +static void __init dmi_save_system_slot(const struct dmi_header *dm)
> +{
> +	const char *name;
> +	const u8 *d = (u8*)dm;
> +	
> +	if (dm->type == DMI_ENTRY_SYSTEM_SLOT && dm->length >= 0x11) {
> +		name = dmi_string_nosave(dm, *(d + 0x04));
> +		dmi_save_dev_slot(*(u16 *)(d + 0x09), *(u16 *)(d + 0xD), 
> +				  *(d + 0xF), *(d + 0x10), name);
> +	}
> +}
> +
>  static void __init count_mem_devices(const struct dmi_header *dm, void *v)
>  {
>  	if (dm->type != DMI_ENTRY_MEM_DEVICE)
> @@ -437,6 +472,10 @@ static void __init dmi_decode(const struct dmi_header *dm, void *dummy)
>  		break;
>  	case 41:	/* Onboard Devices Extended Information */
>  		dmi_save_extended_devices(dm);
> +		break;
> +	case 9:         /* System Slots */
> +		dmi_save_system_slot(dm);
> +		break;
>  	}
>  }
>  
> diff --git a/drivers/pci/pci-label.c b/drivers/pci/pci-label.c
> index 024b5c1..38dfb45 100644
> --- a/drivers/pci/pci-label.c
> +++ b/drivers/pci/pci-label.c
> @@ -125,14 +125,103 @@ static struct attribute_group smbios_attr_group = {
>  	.is_visible = smbios_instance_string_exist,
>  };
>  
> +static int smbios_getslot(struct pci_dev *pdev)
> +{
> +	struct pci_dev *sdev;
> +	struct dmi_dev_onboard *dslot;
> +	const struct dmi_device *dmi;
> +	u8 sub, sec, bus;
> +
> +	dmi = NULL;
> +	if (pdev->is_virtfn) {
> +		/* Get Physical device for SR-IOV */
> +		pdev = pdev->physfn;
> +	}

Use pci_physfn().

> +	bus = pdev->bus->number;
> +	while ((dmi = dmi_find_device(DMI_DEV_TYPE_SYSTEM_SLOT, NULL, dmi)) != NULL) {

This loop doesn't match my naive expectation of how we should find a slot.
I expected something like:

  - Look for DMI info that's an exact match for my D:B:D.F
  - Walk bridges upstream towards the root, looking for a subtree that
    includes pdev

> +		sec = sub = -1;
> +
> +		dslot = dmi->device_data;
> +		sdev = pci_get_domain_bus_and_slot(dslot->segment, dslot->bus, dslot->devfn);

This acquires a reference on the dev returned, so you need to dispose of
that somewhere.

> +		if (sdev == NULL)
> +			continue;
> +
> +		if (sdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +			pci_read_config_byte(sdev, PCI_SECONDARY_BUS, &sec);
> +			pci_read_config_byte(sdev, PCI_SUBORDINATE_BUS, &sub);

We should not have to read this out of config space; busn_res in struct
pci_bus has this information already.

> +			if (bus >= sec && bus <= sub) {
> +				/* device is child of bridge */
> +				return dslot->instance;

If you have a slot name for 0000:00:00.0 and a device at 0001:00:00.0, this
will erroneously associate the domain 0 name with the domain 1 device.

> +			}
> +		}
> +		if (pci_domain_nr(sdev->bus) == pci_domain_nr(pdev->bus) &&
> +		    sdev->bus->number == pdev->bus->number &&
> +		    PCI_SLOT(sdev->devfn) == PCI_SLOT(pdev->devfn)) {
> +			/* Match domain:bus:dev.  If PCIE root, only match function */

The PCIe part of this comment doesn't seem to match the code.

> +			if (PCI_FUNC(sdev->devfn) == PCI_FUNC(pdev->devfn) ||
> +			    pci_pcie_type(sdev) != PCI_EXP_TYPE_ROOT_PORT) {
> +				return dslot->instance;
> +			}
> +		}
> +	}
> +	return -ENODEV;
> +}
> + 
> +static ssize_t smbiosslot_show(struct device *dev,
> +			       struct device_attribute *attr, char *buf)
> +{
> + 	struct pci_dev *pdev;
> +	int slot;
> + 
> + 	pdev = to_pci_dev(dev);
> +	slot = smbios_getslot(pdev);
> +	if (slot > 0) {
> +		return scnprintf(buf, PAGE_SIZE, "%d\n", slot);
> +	}
> +	return 0;
> +}
> +
> +static umode_t smbios_slot_exist(struct kobject *kobj, struct attribute *attr,
> +				 int n)
> +{
> +	struct device *dev;
> +	struct pci_dev *pdev;
> +
> +	dev = container_of(kobj, struct device, kobj);
> +	pdev = to_pci_dev(dev);
> +
> +	return (smbios_getslot(pdev) > 0) ? S_IRUGO : 0;
> +}
> +
> +static struct device_attribute smbios_attr_slot = {
> +	.attr = {.name = "slot", .mode = 0444},
> +	.show = smbiosslot_show,
> +};
> +
> +static struct attribute *smbios_slot_attributes[] = {
> +	&smbios_attr_slot.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group smbios_slot_attr_group = {
> +	.attrs = smbios_slot_attributes,
> +	.is_visible = smbios_slot_exist,
> +};
> +
>  static int pci_create_smbiosname_file(struct pci_dev *pdev)
>  {
> +	int rc;
> +
> +	rc = sysfs_create_group(&pdev->dev.kobj, &smbios_slot_attr_group);
> +	if (rc != 0)
> +		return rc;
>  	return sysfs_create_group(&pdev->dev.kobj, &smbios_attr_group);
>  }
>  
>  static void pci_remove_smbiosname_file(struct pci_dev *pdev)
>  {
>  	sysfs_remove_group(&pdev->dev.kobj, &smbios_attr_group);
> +	sysfs_remove_group(&pdev->dev.kobj, &smbios_slot_attr_group);
>  }
>  #else
>  static inline int pci_create_smbiosname_file(struct pci_dev *pdev)
> diff --git a/include/linux/dmi.h b/include/linux/dmi.h
> index 5055ac3..09f42e7 100644
> --- a/include/linux/dmi.h
> +++ b/include/linux/dmi.h
> @@ -22,6 +22,7 @@ enum dmi_device_type {
>  	DMI_DEV_TYPE_IPMI = -1,
>  	DMI_DEV_TYPE_OEM_STRING = -2,
>  	DMI_DEV_TYPE_DEV_ONBOARD = -3,
> +	DMI_DEV_TYPE_SYSTEM_SLOT = -4,
>  };
>  
>  enum dmi_entry_type {
> -- 
> 1.8.3.1
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jordan Hargrave July 21, 2015, 6:56 p.m. UTC | #6
On Tue, Jul 21, 2015 at 1:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Fri, Jul 10, 2015 at 05:02:46PM -0500, Jordan Hargrave wrote:
>> From: Jordan Hargrave <Jordan_Hargrave@dell.com>
>>
>> There currently isn't an easy way to determine which PCI devices belong to
>> system slots.  This patch adds support to read SMBIOS Type 9 (System Slots).
>>
>> Signed-off-by: Jordan Hargrave <Jordan_Hargrave@dell.com>
>> ---
>>  drivers/firmware/dmi_scan.c | 39 ++++++++++++++++++++
>>  drivers/pci/pci-label.c     | 89 +++++++++++++++++++++++++++++++++++++++++++++
>>  include/linux/dmi.h         |  1 +
>>  3 files changed, 129 insertions(+)
>>
>> diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
>> index ac1ce4a..8f95897 100644
>> --- a/drivers/firmware/dmi_scan.c
>> +++ b/drivers/firmware/dmi_scan.c
>> @@ -356,6 +356,41 @@ static void __init dmi_save_extended_devices(const struct dmi_header *dm)
>>       dmi_save_one_device(*d & 0x7f, dmi_string_nosave(dm, *(d - 1)));
>>  }
>>
>> +static void __init dmi_save_dev_slot(int instance, int segment, int bus, int devfn, const char *name)
>> +{
>> +     struct dmi_dev_onboard *slot;
>> +
>> +     slot = dmi_alloc(sizeof(*slot) + strlen(name) + 1);
>> +     if (!slot) {
>> +             printk(KERN_ERR "dmi_save_system_slot: out of memory.\n");
>
> You've got lots of data that might be useful in the printk: segment, bus,
> devfn, name.
>
>> +             return;
>> +     }
>> +     slot->instance = instance;
>> +     slot->segment = segment;
>> +     slot->bus = bus;
>> +     slot->devfn = devfn;
>> +
>> +     strcpy((char *)&slot[1], name);
>> +     slot->dev.type = DMI_DEV_TYPE_SYSTEM_SLOT;
>> +     slot->dev.name = (char *)&slot[1];
>> +     slot->dev.device_data = slot;
>> +
>> +     list_add(&slot->dev.list, &dmi_devices);
>> +}
>> +
>> +
>> +static void __init dmi_save_system_slot(const struct dmi_header *dm)
>> +{
>> +     const char *name;
>> +     const u8 *d = (u8*)dm;
>> +
>> +     if (dm->type == DMI_ENTRY_SYSTEM_SLOT && dm->length >= 0x11) {
>> +             name = dmi_string_nosave(dm, *(d + 0x04));
>> +             dmi_save_dev_slot(*(u16 *)(d + 0x09), *(u16 *)(d + 0xD),
>> +                               *(d + 0xF), *(d + 0x10), name);
>> +     }
>> +}
>> +
>>  static void __init count_mem_devices(const struct dmi_header *dm, void *v)
>>  {
>>       if (dm->type != DMI_ENTRY_MEM_DEVICE)
>> @@ -437,6 +472,10 @@ static void __init dmi_decode(const struct dmi_header *dm, void *dummy)
>>               break;
>>       case 41:        /* Onboard Devices Extended Information */
>>               dmi_save_extended_devices(dm);
>> +             break;
>> +     case 9:         /* System Slots */
>> +             dmi_save_system_slot(dm);
>> +             break;
>>       }
>>  }
>>
>> diff --git a/drivers/pci/pci-label.c b/drivers/pci/pci-label.c
>> index 024b5c1..38dfb45 100644
>> --- a/drivers/pci/pci-label.c
>> +++ b/drivers/pci/pci-label.c
>> @@ -125,14 +125,103 @@ static struct attribute_group smbios_attr_group = {
>>       .is_visible = smbios_instance_string_exist,
>>  };
>>
>> +static int smbios_getslot(struct pci_dev *pdev)
>> +{
>> +     struct pci_dev *sdev;
>> +     struct dmi_dev_onboard *dslot;
>> +     const struct dmi_device *dmi;
>> +     u8 sub, sec, bus;
>> +
>> +     dmi = NULL;
>> +     if (pdev->is_virtfn) {
>> +             /* Get Physical device for SR-IOV */
>> +             pdev = pdev->physfn;
>> +     }
>
> Use pci_physfn().
>
>> +     bus = pdev->bus->number;
>> +     while ((dmi = dmi_find_device(DMI_DEV_TYPE_SYSTEM_SLOT, NULL, dmi)) != NULL) {
>
> This loop doesn't match my naive expectation of how we should find a slot.
> I expected something like:
>
>   - Look for DMI info that's an exact match for my D:B:D.F
>   - Walk bridges upstream towards the root, looking for a subtree that
>     includes pdev
>
It's a top-down approach vs bottom up.  This way we only need to
search a single pci device for the slot... saves having to search up
the tree for all pci devices that *aren't* on a slot (eg embedded
devices will always search up to root without finding a slot entry).

>> +             sec = sub = -1;
>> +
>> +             dslot = dmi->device_data;
>> +             sdev = pci_get_domain_bus_and_slot(dslot->segment, dslot->bus, dslot->devfn);
>
> This acquires a reference on the dev returned, so you need to dispose of
> that somewhere.

Ok

>
>> +             if (sdev == NULL)
>> +                     continue;
>> +
>> +             if (sdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
>> +                     pci_read_config_byte(sdev, PCI_SECONDARY_BUS, &sec);
>> +                     pci_read_config_byte(sdev, PCI_SUBORDINATE_BUS, &sub);
>
> We should not have to read this out of config space; busn_res in struct
> pci_bus has this information already.
>

Ok

>> +                     if (bus >= sec && bus <= sub) {
>> +                             /* device is child of bridge */
>> +                             return dslot->instance;
>
> If you have a slot name for 0000:00:00.0 and a device at 0001:00:00.0, this
> will erroneously associate the domain 0 name with the domain 1 device.
>
OK will put in domain check earlier.

>> +                     }
>> +             }
>> +             if (pci_domain_nr(sdev->bus) == pci_domain_nr(pdev->bus) &&
>> +                 sdev->bus->number == pdev->bus->number &&
>> +                 PCI_SLOT(sdev->devfn) == PCI_SLOT(pdev->devfn)) {
>> +                     /* Match domain:bus:dev.  If PCIE root, only match function */
>
> The PCIe part of this comment doesn't seem to match the code.
>

There can be multifunction devices with pcie root ports
00:01.0 PCIE Root (SMBIOS Slot 2)
00:01.1 PCIE Root (SMBIOS Slot 3)
00:02.2 PCIE Root
etc

and multifunction devices on a single device
01:00.0 Ethernet (SMBIOS slot points to function 0 only)
01:00.1 Ethernet

If SMBIOS slot points to PCIE root port, only match exact function
number. Otherwise match any function number.

>> +                     if (PCI_FUNC(sdev->devfn) == PCI_FUNC(pdev->devfn) ||
>> +                         pci_pcie_type(sdev) != PCI_EXP_TYPE_ROOT_PORT) {
>> +                             return dslot->instance;
>> +                     }
>> +             }
>> +     }
>> +     return -ENODEV;
>> +}
>> +
>> +static ssize_t smbiosslot_show(struct device *dev,
>> +                            struct device_attribute *attr, char *buf)
>> +{
>> +     struct pci_dev *pdev;
>> +     int slot;
>> +
>> +     pdev = to_pci_dev(dev);
>> +     slot = smbios_getslot(pdev);
>> +     if (slot > 0) {
>> +             return scnprintf(buf, PAGE_SIZE, "%d\n", slot);
>> +     }
>> +     return 0;
>> +}
>> +
>> +static umode_t smbios_slot_exist(struct kobject *kobj, struct attribute *attr,
>> +                              int n)
>> +{
>> +     struct device *dev;
>> +     struct pci_dev *pdev;
>> +
>> +     dev = container_of(kobj, struct device, kobj);
>> +     pdev = to_pci_dev(dev);
>> +
>> +     return (smbios_getslot(pdev) > 0) ? S_IRUGO : 0;
>> +}
>> +
>> +static struct device_attribute smbios_attr_slot = {
>> +     .attr = {.name = "slot", .mode = 0444},
>> +     .show = smbiosslot_show,
>> +};
>> +
>> +static struct attribute *smbios_slot_attributes[] = {
>> +     &smbios_attr_slot.attr,
>> +     NULL,
>> +};
>> +
>> +static struct attribute_group smbios_slot_attr_group = {
>> +     .attrs = smbios_slot_attributes,
>> +     .is_visible = smbios_slot_exist,
>> +};
>> +
>>  static int pci_create_smbiosname_file(struct pci_dev *pdev)
>>  {
>> +     int rc;
>> +
>> +     rc = sysfs_create_group(&pdev->dev.kobj, &smbios_slot_attr_group);
>> +     if (rc != 0)
>> +             return rc;
>>       return sysfs_create_group(&pdev->dev.kobj, &smbios_attr_group);
>>  }
>>
>>  static void pci_remove_smbiosname_file(struct pci_dev *pdev)
>>  {
>>       sysfs_remove_group(&pdev->dev.kobj, &smbios_attr_group);
>> +     sysfs_remove_group(&pdev->dev.kobj, &smbios_slot_attr_group);
>>  }
>>  #else
>>  static inline int pci_create_smbiosname_file(struct pci_dev *pdev)
>> diff --git a/include/linux/dmi.h b/include/linux/dmi.h
>> index 5055ac3..09f42e7 100644
>> --- a/include/linux/dmi.h
>> +++ b/include/linux/dmi.h
>> @@ -22,6 +22,7 @@ enum dmi_device_type {
>>       DMI_DEV_TYPE_IPMI = -1,
>>       DMI_DEV_TYPE_OEM_STRING = -2,
>>       DMI_DEV_TYPE_DEV_ONBOARD = -3,
>> +     DMI_DEV_TYPE_SYSTEM_SLOT = -4,
>>  };
>>
>>  enum dmi_entry_type {
>> --
>> 1.8.3.1
>>
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas July 22, 2015, 1:09 a.m. UTC | #7
On Tue, Jul 21, 2015 at 12:31:35PM -0500, Jordan Hargrave wrote:
> On Tue, Jul 21, 2015 at 11:57 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > On Mon, Jul 13, 2015 at 09:57:32AM -0500, Jordan Hargrave wrote:
> >> On Mon, Jul 13, 2015 at 2:35 AM, Jean Delvare <jdelvare@suse.de> wrote:
> >>
> >> > Hi Jordan,
> >> >
> >> > On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
> >> > > From: Jordan Hargrave <Jordan_Hargrave@dell.com>
> >> > >
> >> > > There currently isn't an easy way to determine which PCI devices belong
> >> > to
> >> > > system slots.  This patch adds support to read SMBIOS Type 9 (System
> >> > Slots).
> >> >
> >> > I'm wondering, can't you use dmidecode or libsmbios to retrieve the
> >> > same information?
> >> >
> >> > --
> >> > Jean Delvare
> >> > SUSE L3 Support
> >> >
> >>
> >> You can but it's as not easy to determine the slot number for leaf devices
> >> on bridges.  Eventually planning on using this for pulling slot number for
> >> identifying network cards and disk numbering for systemd
> >
> > Can you outline the problems with using dmidecode or libsmbios?
> 
> Neither dmidecode nor libsmbios report the slot number for devices
> behind bridges in a slot.  

True, but it's straightforward to walk up the PCI tree in sysfs, e.g.,
given a path like /sys/devices/pci0000:00/0000:00:1c.5/0000:03:00.0/, it's
easy to see what the upstream bridges are.

> I'm wanting to use this sysfs variable to
> get slot numbers for systemd, so using libsmbios and dmidecode aren't
> very useful.

If you want this in systemd, I see why you wouldn't want a command like
dmidecode.  Help me understand the problem with libsmbios.  Is it not
useful because (a) systemd doesn't want to link with it, or (b) libsmbios
doesn't have the right information, or (c) something else?

It doesn't *look* like this is using any information that is only available
in the kernel, so in principle it seems like this could be done in
user-space.

> We already report the index for embedded devices in
> pci-label.c, this code should have gone in at the same time.
> 
> For example.  The SMBIOS entry for slot 3 is 40:00.0 There is a
> quad-port NIC in the slot with a bridge at 40:00.0
> 
> 42:00.0 Bridge (sec=43, sub=45)
> 43:02.0 Bridge (sec=44, sub=44)
> 43:04.0 Bridge (sec=45, sub=45)
> 44:00.0 Ethernet
> 44:00.1 Ethernet
> 45:00.0 Ethernet
> 45:00.1 Ethernet
> 
> So dmidecode only returns the slot number for 42:00.0 but not any
> child devices.  This code will provide a 'slot' sysfs variable that
> reports '3' for all devices under and including the bridge.

What if the card in slot 3 is an adapter leading to an external PCI
chassis?  Wouldn't we then have a 'slot' file for every card in that
chassis, all containing '3'?  This sounds confusing, although it is true
that they all would be connected via the system board slot 3.

Also, we do have the /sys/bus/pci/slots/ hierarchy already.  If we do put
something like this in the kernel, how would it relate to that hierarchy?
Could this SMBIOS stuff be integrated into that somehow?

We have a bit of a hodge-podge of slot names already, and I'd like to
simplify things if we can.

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jordan Hargrave July 22, 2015, 8:07 p.m. UTC | #8
On Tue, Jul 21, 2015 at 8:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> On Tue, Jul 21, 2015 at 12:31:35PM -0500, Jordan Hargrave wrote:
>> On Tue, Jul 21, 2015 at 11:57 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> > On Mon, Jul 13, 2015 at 09:57:32AM -0500, Jordan Hargrave wrote:
>> >> On Mon, Jul 13, 2015 at 2:35 AM, Jean Delvare <jdelvare@suse.de> wrote:
>> >>
>> >> > Hi Jordan,
>> >> >
>> >> > On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
>> >> > > From: Jordan Hargrave <Jordan_Hargrave@dell.com>
>> >> > >
>> >> > > There currently isn't an easy way to determine which PCI devices belong
>> >> > to
>> >> > > system slots.  This patch adds support to read SMBIOS Type 9 (System
>> >> > Slots).
>> >> >
>> >> > I'm wondering, can't you use dmidecode or libsmbios to retrieve the
>> >> > same information?
>> >> >
>> >> > --
>> >> > Jean Delvare
>> >> > SUSE L3 Support
>> >> >
>> >>
>> >> You can but it's as not easy to determine the slot number for leaf devices
>> >> on bridges.  Eventually planning on using this for pulling slot number for
>> >> identifying network cards and disk numbering for systemd
>> >
>> > Can you outline the problems with using dmidecode or libsmbios?
>>
>> Neither dmidecode nor libsmbios report the slot number for devices
>> behind bridges in a slot.
>
> True, but it's straightforward to walk up the PCI tree in sysfs, e.g.,
> given a path like /sys/devices/pci0000:00/0000:00:1c.5/0000:03:00.0/, it's
> easy to see what the upstream bridges are.
>
It makes it more complicated I think. You have to check all functions
on all devices as well on the walk to the root.

Would it look something like this?
while (pdev) {
  if (pdev->bus->number == dslot->bus && PCI_SLOT(pdev->devfn) ==
PCI_SLOT(dslot->devfn) &&
     (pci_pcie_type(pdev) != PCI_ROOT_PORT || PCI_FUNC(pdev->devfn) ==
PCI_FUNC(dslot->devfn))
    return dslot->instance;
  if (!pdev->bus->parent)
    break;
  pdev = pdev->bus->parent->self;
}

>> I'm wanting to use this sysfs variable to
>> get slot numbers for systemd, so using libsmbios and dmidecode aren't
>> very useful.
>
> If you want this in systemd, I see why you wouldn't want a command like
> dmidecode.  Help me understand the problem with libsmbios.  Is it not
> useful because (a) systemd doesn't want to link with it, or (b) libsmbios
> doesn't have the right information, or (c) something else?
>
Linking with libsmbios would be a problem, and libsmbios doesn't have
this info anyway.

> It doesn't *look* like this is using any information that is only available
> in the kernel, so in principle it seems like this could be done in
> user-space.
>

I actually just got an email from someone who needs to determine the
card slot number in their driver... so I've added an external callable
'pci_get_smbios_slot' function to enable this.

>> We already report the index for embedded devices in
>> pci-label.c, this code should have gone in at the same time.
>>
>> For example.  The SMBIOS entry for slot 3 is 40:00.0 There is a
>> quad-port NIC in the slot with a bridge at 40:00.0
>>
>> 42:00.0 Bridge (sec=43, sub=45)
>> 43:02.0 Bridge (sec=44, sub=44)
>> 43:04.0 Bridge (sec=45, sub=45)
>> 44:00.0 Ethernet
>> 44:00.1 Ethernet
>> 45:00.0 Ethernet
>> 45:00.1 Ethernet
>>
>> So dmidecode only returns the slot number for 42:00.0 but not any
>> child devices.  This code will provide a 'slot' sysfs variable that
>> reports '3' for all devices under and including the bridge.
>
> What if the card in slot 3 is an adapter leading to an external PCI
> chassis?  Wouldn't we then have a 'slot' file for every card in that
> chassis, all containing '3'?  This sounds confusing, although it is true
> that they all would be connected via the system board slot 3.
>
Yes that is correct.  Unless SMBIOS had a table of the second chassis.

> Also, we do have the /sys/bus/pci/slots/ hierarchy already.  If we do put
> something like this in the kernel, how would it relate to that hierarchy?
> Could this SMBIOS stuff be integrated into that somehow?
>

/sys/bus/pci/slots only map a slot to a single PCI device.  systemd
does use /sys/bus/pci/slots but it can't see the slot number on cards
with bridges as the actual slot number is a parent.  And that's not
easily to determine a parent device from the slots interface.  I'd
really like something generic here. I'm also looking for some method
for reporting bay/enclosure ID for PCIe-SSD devices.

> We have a bit of a hodge-podge of slot names already, and I'd like to
> simplify things if we can.
>
> Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bjorn Helgaas July 23, 2015, 5:24 p.m. UTC | #9
[+cc Matthew for PCIe-SSD perspective]

On Wed, Jul 22, 2015 at 03:07:46PM -0500, Jordan Hargrave wrote:
> On Tue, Jul 21, 2015 at 8:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> > On Tue, Jul 21, 2015 at 12:31:35PM -0500, Jordan Hargrave wrote:
> >> On Tue, Jul 21, 2015 at 11:57 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> >> > On Mon, Jul 13, 2015 at 09:57:32AM -0500, Jordan Hargrave wrote:
> >> >> On Mon, Jul 13, 2015 at 2:35 AM, Jean Delvare <jdelvare@suse.de> wrote:
> >> >>
> >> >> > Hi Jordan,
> >> >> >
> >> >> > On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
> >> >> > > From: Jordan Hargrave <Jordan_Hargrave@dell.com>
> >> >> > >
> >> >> > > There currently isn't an easy way to determine which PCI devices belong
> >> >> > to
> >> >> > > system slots.  This patch adds support to read SMBIOS Type 9 (System
> >> >> > Slots).
> >> >> >
> >> >> > I'm wondering, can't you use dmidecode or libsmbios to retrieve the
> >> >> > same information?
> >> >> >
> >> >> > --
> >> >> > Jean Delvare
> >> >> > SUSE L3 Support
> >> >> >
> >> >>
> >> >> You can but it's as not easy to determine the slot number for leaf devices
> >> >> on bridges.  Eventually planning on using this for pulling slot number for
> >> >> identifying network cards and disk numbering for systemd
> >> >
> >> > Can you outline the problems with using dmidecode or libsmbios?
> >>
> >> Neither dmidecode nor libsmbios report the slot number for devices
> >> behind bridges in a slot.
> >
> > True, but it's straightforward to walk up the PCI tree in sysfs, e.g.,
> > given a path like /sys/devices/pci0000:00/0000:00:1c.5/0000:03:00.0/, it's
> > easy to see what the upstream bridges are.
> >
> It makes it more complicated I think. You have to check all functions
> on all devices as well on the walk to the root.
> 
> Would it look something like this?
> while (pdev) {
>   if (pdev->bus->number == dslot->bus && PCI_SLOT(pdev->devfn) ==
> PCI_SLOT(dslot->devfn) &&
>      (pci_pcie_type(pdev) != PCI_ROOT_PORT || PCI_FUNC(pdev->devfn) ==
> PCI_FUNC(dslot->devfn))
>     return dslot->instance;
>   if (!pdev->bus->parent)
>     break;
>   pdev = pdev->bus->parent->self;
> }
> 
> >> I'm wanting to use this sysfs variable to
> >> get slot numbers for systemd, so using libsmbios and dmidecode aren't
> >> very useful.
> >
> > If you want this in systemd, I see why you wouldn't want a command like
> > dmidecode.  Help me understand the problem with libsmbios.  Is it not
> > useful because (a) systemd doesn't want to link with it, or (b) libsmbios
> > doesn't have the right information, or (c) something else?
> >
> Linking with libsmbios would be a problem, and libsmbios doesn't have
> this info anyway.
> 
> > It doesn't *look* like this is using any information that is only available
> > in the kernel, so in principle it seems like this could be done in
> > user-space.
> >
> 
> I actually just got an email from someone who needs to determine the
> card slot number in their driver... so I've added an external callable
> 'pci_get_smbios_slot' function to enable this.
> 
> >> We already report the index for embedded devices in
> >> pci-label.c, this code should have gone in at the same time.
> >>
> >> For example.  The SMBIOS entry for slot 3 is 40:00.0 There is a
> >> quad-port NIC in the slot with a bridge at 40:00.0
> >>
> >> 42:00.0 Bridge (sec=43, sub=45)
> >> 43:02.0 Bridge (sec=44, sub=44)
> >> 43:04.0 Bridge (sec=45, sub=45)
> >> 44:00.0 Ethernet
> >> 44:00.1 Ethernet
> >> 45:00.0 Ethernet
> >> 45:00.1 Ethernet
> >>
> >> So dmidecode only returns the slot number for 42:00.0 but not any
> >> child devices.  This code will provide a 'slot' sysfs variable that
> >> reports '3' for all devices under and including the bridge.
> >
> > What if the card in slot 3 is an adapter leading to an external PCI
> > chassis?  Wouldn't we then have a 'slot' file for every card in that
> > chassis, all containing '3'?  This sounds confusing, although it is true
> > that they all would be connected via the system board slot 3.
> >
> Yes that is correct.  Unless SMBIOS had a table of the second chassis.
> 
> > Also, we do have the /sys/bus/pci/slots/ hierarchy already.  If we do put
> > something like this in the kernel, how would it relate to that hierarchy?
> > Could this SMBIOS stuff be integrated into that somehow?
> >
> 
> /sys/bus/pci/slots only map a slot to a single PCI device.  systemd
> does use /sys/bus/pci/slots but it can't see the slot number on cards
> with bridges as the actual slot number is a parent.  And that's not
> easily to determine a parent device from the slots interface.  I'd
> really like something generic here. I'm also looking for some method
> for reporting bay/enclosure ID for PCIe-SSD devices.
> 
> > We have a bit of a hodge-podge of slot names already, and I'd like to
> > simplify things if we can.

We aren't really converging here yet.

We need to figure out exactly what has to be done in the kernel because it
can't be done in user-space.  So far, your patch *looks* like it could (in
principle) be done in user-space, given a sysfs PCI hierarchy and the
SMBIOS information.

The goal ("determine which PCI devices belong to system slots") is
definitely generic and useful.  We have quite a bit of slot stuff already
in the kernel, and some is already exposed via sysfs, so if we're still
missing what you need, it seems like the current code is off the rails
somewhere and should be fixed.

Can we explore what you need in a little more detail, with some concrete
examples?  I heard:

  - Systemd uses /sys/bus/pci/slots but gets the wrong slot information for
    cards with bridges on them.  Can you give an example including all the
    PCI devices involved and the related /sys/bus/pci/slots info so we can
    see exactly what's wrong?

  - A driver needs the card slot number.  A slot number seems like a user
    interface thing, so I'm surprised that a driver would be concerned with
    it.  And of course, SMBIOS is an arch-specific thing, so the driver
    would have to be able to get along without the slot number anyway.

  - PCIe-SSD bay/enclosure IDs.  Where would these IDs come from, and what
    sort of reporting mechanism are you looking for?  How are these
    structured?  Is there a separate PCIe device for every bay/enclosure
    ID, so there would be a 1:1 mapping from PCIe device to ID?

Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jordan Hargrave July 24, 2015, 2:31 a.m. UTC | #10
On Thu, Jul 23, 2015 at 12:24 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
> [+cc Matthew for PCIe-SSD perspective]
>
> On Wed, Jul 22, 2015 at 03:07:46PM -0500, Jordan Hargrave wrote:
>> On Tue, Jul 21, 2015 at 8:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> > On Tue, Jul 21, 2015 at 12:31:35PM -0500, Jordan Hargrave wrote:
>> >> On Tue, Jul 21, 2015 at 11:57 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> >> > On Mon, Jul 13, 2015 at 09:57:32AM -0500, Jordan Hargrave wrote:
>> >> >> On Mon, Jul 13, 2015 at 2:35 AM, Jean Delvare <jdelvare@suse.de> wrote:
>> >> >>
>> >> >> > Hi Jordan,
>> >> >> >
>> >> >> > On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
>> >> >> > > From: Jordan Hargrave <Jordan_Hargrave@dell.com>
>> >> >> > >
>> >> >> > > There currently isn't an easy way to determine which PCI devices belong
>> >> >> > to
>> >> >> > > system slots.  This patch adds support to read SMBIOS Type 9 (System
>> >> >> > Slots).
>> >> >> >
>> >> >> > I'm wondering, can't you use dmidecode or libsmbios to retrieve the
>> >> >> > same information?
>> >> >> >
>> >> >> > --
>> >> >> > Jean Delvare
>> >> >> > SUSE L3 Support
>> >> >> >
>> >> >>
>> >> >> You can but it's as not easy to determine the slot number for leaf devices
>> >> >> on bridges.  Eventually planning on using this for pulling slot number for
>> >> >> identifying network cards and disk numbering for systemd
>> >> >
>> >> > Can you outline the problems with using dmidecode or libsmbios?
>> >>
>> >> Neither dmidecode nor libsmbios report the slot number for devices
>> >> behind bridges in a slot.
>> >
>> > True, but it's straightforward to walk up the PCI tree in sysfs, e.g.,
>> > given a path like /sys/devices/pci0000:00/0000:00:1c.5/0000:03:00.0/, it's
>> > easy to see what the upstream bridges are.
>> >
>> It makes it more complicated I think. You have to check all functions
>> on all devices as well on the walk to the root.
>>
>> Would it look something like this?
>> while (pdev) {
>>   if (pdev->bus->number == dslot->bus && PCI_SLOT(pdev->devfn) ==
>> PCI_SLOT(dslot->devfn) &&
>>      (pci_pcie_type(pdev) != PCI_ROOT_PORT || PCI_FUNC(pdev->devfn) ==
>> PCI_FUNC(dslot->devfn))
>>     return dslot->instance;
>>   if (!pdev->bus->parent)
>>     break;
>>   pdev = pdev->bus->parent->self;
>> }
>>
>> >> I'm wanting to use this sysfs variable to
>> >> get slot numbers for systemd, so using libsmbios and dmidecode aren't
>> >> very useful.
>> >
>> > If you want this in systemd, I see why you wouldn't want a command like
>> > dmidecode.  Help me understand the problem with libsmbios.  Is it not
>> > useful because (a) systemd doesn't want to link with it, or (b) libsmbios
>> > doesn't have the right information, or (c) something else?
>> >
>> Linking with libsmbios would be a problem, and libsmbios doesn't have
>> this info anyway.
>>
>> > It doesn't *look* like this is using any information that is only available
>> > in the kernel, so in principle it seems like this could be done in
>> > user-space.
>> >
>>
>> I actually just got an email from someone who needs to determine the
>> card slot number in their driver... so I've added an external callable
>> 'pci_get_smbios_slot' function to enable this.
>>
>> >> We already report the index for embedded devices in
>> >> pci-label.c, this code should have gone in at the same time.
>> >>
>> >> For example.  The SMBIOS entry for slot 3 is 40:00.0 There is a
>> >> quad-port NIC in the slot with a bridge at 40:00.0
>> >>
>> >> 42:00.0 Bridge (sec=43, sub=45)
>> >> 43:02.0 Bridge (sec=44, sub=44)
>> >> 43:04.0 Bridge (sec=45, sub=45)
>> >> 44:00.0 Ethernet
>> >> 44:00.1 Ethernet
>> >> 45:00.0 Ethernet
>> >> 45:00.1 Ethernet
>> >>
>> >> So dmidecode only returns the slot number for 42:00.0 but not any
>> >> child devices.  This code will provide a 'slot' sysfs variable that
>> >> reports '3' for all devices under and including the bridge.
>> >
>> > What if the card in slot 3 is an adapter leading to an external PCI
>> > chassis?  Wouldn't we then have a 'slot' file for every card in that
>> > chassis, all containing '3'?  This sounds confusing, although it is true
>> > that they all would be connected via the system board slot 3.
>> >
>> Yes that is correct.  Unless SMBIOS had a table of the second chassis.
>>
>> > Also, we do have the /sys/bus/pci/slots/ hierarchy already.  If we do put
>> > something like this in the kernel, how would it relate to that hierarchy?
>> > Could this SMBIOS stuff be integrated into that somehow?
>> >
>>
>> /sys/bus/pci/slots only map a slot to a single PCI device.  systemd
>> does use /sys/bus/pci/slots but it can't see the slot number on cards
>> with bridges as the actual slot number is a parent.  And that's not
>> easily to determine a parent device from the slots interface.  I'd
>> really like something generic here. I'm also looking for some method
>> for reporting bay/enclosure ID for PCIe-SSD devices.
>>
>> > We have a bit of a hodge-podge of slot names already, and I'd like to
>> > simplify things if we can.
>
> We aren't really converging here yet.
>
> We need to figure out exactly what has to be done in the kernel because it
> can't be done in user-space.  So far, your patch *looks* like it could (in
> principle) be done in user-space, given a sysfs PCI hierarchy and the
> SMBIOS information.

It's a pain to do in user space as you have to read DMI information,
then traverse PCI hierarchy until you find a match.  Every utility or
code that wants to match a PCI device to a SMBIOS slot must reinvent
the wheel.  biosdevname has its own code to read the entire SMBIOS
table. It then reads in all pci devices and attempts to do a mapping.
Why do this when the kernel already has this info easily available.

>
> The goal ("determine which PCI devices belong to system slots") is
> definitely generic and useful.  We have quite a bit of slot stuff already
> in the kernel, and some is already exposed via sysfs, so if we're still
> missing what you need, it seems like the current code is off the rails
> somewhere and should be fixed.
>
> Can we explore what you need in a little more detail, with some concrete
> examples?  I heard:
>
>   - Systemd uses /sys/bus/pci/slots but gets the wrong slot information for
>     cards with bridges on them.  Can you give an example including all the
>     PCI devices involved and the related /sys/bus/pci/slots info so we can
>     see exactly what's wrong?

I created a quick and dirty module that populates /sys/bus/pci/slots:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/dmi.h>

static int __init kslot_init(void)
{
        const struct dmi_device *dmi;
        struct dmi_dev_onboard *dslot;
        struct pci_dev *sdev;

        dmi = NULL;
        while ((dmi = dmi_find_device(DMI_DEV_TYPE_SYSTEM_SLOT, NULL,
dmi)) != NULL) {
                dslot = dmi->device_data;
                sdev = pci_get_domain_bus_and_slot(dslot->segment, dslot->bus,
                                             dslot->devfn);
                if (!sdev)
                        continue;
                pci_create_slot(sdev->bus, dslot->instance, dslot->dev.name,
                                NULL);
                pci_dev_put(sdev);
        }
        return 0;
}

static void __exit kslot_exit(void)
{
}

module_init(kslot_init);
module_exit(kslot_exit);

MODULE_AUTHOR("jordan_hargrave@dell.com");
MODULE_LICENSE("GPL");
===

# ls -l /sys/bus/pci/slots
total 0
drwxr-xr-x 2 root root 0 Jul 23 21:43 PCI1
drwxr-xr-x 2 root root 0 Jul 23 21:43 PCI2
drwxr-xr-x 2 root root 0 Jul 23 21:43 PCI3

# for X in /sys/bus/pci/slots/* ; do echo -n "$X = "; cat $X/address; done
/sys/bus/pci/slots/PCI1 = 0000:41:01
/sys/bus/pci/slots/PCI2 = 0000:42:02
/sys/bus/pci/slots/PCI3 = 0000:04:03

So one problem with this already is the string that gets output. It's
<domain>:<bus>:<slot>   so you still have the problem of what about
bus 43-45 that are children of bus 42.

So wrote a script:
#!/usr/bin/bash
for X in /sys/bus/pci/devices/* ; do
    RX=$(readlink -f $X)
    for Z in /sys/bus/pci/slots/* ; do
        NAME=$(basename $Z)
        ADDR=$(cat $Z/address | cut -f1-2 -d:)
        if [[ $RX =~ $ADDR ]] ; then
            echo $RX,$NAME
        fi
    done
done

for each PCI device you have to iterate all devices in
/sys/bus/pci/slots, and munge the 'address' to actually find a match

output is:
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0,PCI3 [SMBIOS entry]
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.1,PCI3
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.2,PCI3
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.3,PCI3
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.4,PCI3
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.5,PCI3
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.6,PCI3
/sys/devices/pci0000:00/0000:00:03.0/0000:04:00.7,PCI3
/sys/devices/pci0000:40/0000:40:01.0/0000:41:00.0,PCI1 [SMBIOS entry]
/sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0,PCI2 [SMBIOS entry]
/sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0,PCI2
/sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:04.0,PCI2
/sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0/0000:44:00.0,PCI2
/sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0/0000:44:00.1,PCI2
/sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:04.0/0000:45:00.0,PCI2
/sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:04.0/0000:45:00.1,PCI2

And I'm not totally convinced this will work on some weird
configurations I've seen of SMBIOS entries.  as in the B:D:F must
match, not just the bus number.

>
>   - A driver needs the card slot number.  A slot number seems like a user
>     interface thing, so I'm surprised that a driver would be concerned with
>     it.  And of course, SMBIOS is an arch-specific thing, so the driver
>     would have to be able to get along without the slot number anyway.
>
pci_get_smbios_slot would just return -ENODEV or something on systems
without SMBIOS.

>   - PCIe-SSD bay/enclosure IDs.  Where would these IDs come from, and what
>     sort of reporting mechanism are you looking for?  How are these
>     structured?  Is there a separate PCIe device for every bay/enclosure
>     ID, so there would be a 1:1 mapping from PCIe device to ID?
>
On our system the enclosure IDs actually come from IPMI commands.
However you must query every PCI Bus number in the system to get the
correct enclosure mapping.  Currently yes there is a 1:1 mapping.

> Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jordan Hargrave July 24, 2015, 3:11 a.m. UTC | #11
On Thu, Jul 23, 2015 at 9:31 PM, Jordan Hargrave <jharg93@gmail.com> wrote:
> On Thu, Jul 23, 2015 at 12:24 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>> [+cc Matthew for PCIe-SSD perspective]
>>
>> On Wed, Jul 22, 2015 at 03:07:46PM -0500, Jordan Hargrave wrote:
>>> On Tue, Jul 21, 2015 at 8:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> > On Tue, Jul 21, 2015 at 12:31:35PM -0500, Jordan Hargrave wrote:
>>> >> On Tue, Jul 21, 2015 at 11:57 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> >> > On Mon, Jul 13, 2015 at 09:57:32AM -0500, Jordan Hargrave wrote:
>>> >> >> On Mon, Jul 13, 2015 at 2:35 AM, Jean Delvare <jdelvare@suse.de> wrote:
>>> >> >>
>>> >> >> > Hi Jordan,
>>> >> >> >
>>> >> >> > On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
>>> >> >> > > From: Jordan Hargrave <Jordan_Hargrave@dell.com>
>>> >> >> > >
>>> >> >> > > There currently isn't an easy way to determine which PCI devices belong
>>> >> >> > to
>>> >> >> > > system slots.  This patch adds support to read SMBIOS Type 9 (System
>>> >> >> > Slots).
>>> >> >> >
>>> >> >> > I'm wondering, can't you use dmidecode or libsmbios to retrieve the
>>> >> >> > same information?
>>> >> >> >
>>> >> >> > --
>>> >> >> > Jean Delvare
>>> >> >> > SUSE L3 Support
>>> >> >> >
>>> >> >>
>>> >> >> You can but it's as not easy to determine the slot number for leaf devices
>>> >> >> on bridges.  Eventually planning on using this for pulling slot number for
>>> >> >> identifying network cards and disk numbering for systemd
>>> >> >
>>> >> > Can you outline the problems with using dmidecode or libsmbios?
>>> >>
>>> >> Neither dmidecode nor libsmbios report the slot number for devices
>>> >> behind bridges in a slot.
>>> >
>>> > True, but it's straightforward to walk up the PCI tree in sysfs, e.g.,
>>> > given a path like /sys/devices/pci0000:00/0000:00:1c.5/0000:03:00.0/, it's
>>> > easy to see what the upstream bridges are.
>>> >
>>> It makes it more complicated I think. You have to check all functions
>>> on all devices as well on the walk to the root.
>>>
>>> Would it look something like this?
>>> while (pdev) {
>>>   if (pdev->bus->number == dslot->bus && PCI_SLOT(pdev->devfn) ==
>>> PCI_SLOT(dslot->devfn) &&
>>>      (pci_pcie_type(pdev) != PCI_ROOT_PORT || PCI_FUNC(pdev->devfn) ==
>>> PCI_FUNC(dslot->devfn))
>>>     return dslot->instance;
>>>   if (!pdev->bus->parent)
>>>     break;
>>>   pdev = pdev->bus->parent->self;
>>> }
>>>
>>> >> I'm wanting to use this sysfs variable to
>>> >> get slot numbers for systemd, so using libsmbios and dmidecode aren't
>>> >> very useful.
>>> >
>>> > If you want this in systemd, I see why you wouldn't want a command like
>>> > dmidecode.  Help me understand the problem with libsmbios.  Is it not
>>> > useful because (a) systemd doesn't want to link with it, or (b) libsmbios
>>> > doesn't have the right information, or (c) something else?
>>> >
>>> Linking with libsmbios would be a problem, and libsmbios doesn't have
>>> this info anyway.
>>>
>>> > It doesn't *look* like this is using any information that is only available
>>> > in the kernel, so in principle it seems like this could be done in
>>> > user-space.
>>> >
>>>
>>> I actually just got an email from someone who needs to determine the
>>> card slot number in their driver... so I've added an external callable
>>> 'pci_get_smbios_slot' function to enable this.
>>>
>>> >> We already report the index for embedded devices in
>>> >> pci-label.c, this code should have gone in at the same time.
>>> >>
>>> >> For example.  The SMBIOS entry for slot 3 is 40:00.0 There is a
>>> >> quad-port NIC in the slot with a bridge at 40:00.0
>>> >>
>>> >> 42:00.0 Bridge (sec=43, sub=45)
>>> >> 43:02.0 Bridge (sec=44, sub=44)
>>> >> 43:04.0 Bridge (sec=45, sub=45)
>>> >> 44:00.0 Ethernet
>>> >> 44:00.1 Ethernet
>>> >> 45:00.0 Ethernet
>>> >> 45:00.1 Ethernet
>>> >>
>>> >> So dmidecode only returns the slot number for 42:00.0 but not any
>>> >> child devices.  This code will provide a 'slot' sysfs variable that
>>> >> reports '3' for all devices under and including the bridge.
>>> >
>>> > What if the card in slot 3 is an adapter leading to an external PCI
>>> > chassis?  Wouldn't we then have a 'slot' file for every card in that
>>> > chassis, all containing '3'?  This sounds confusing, although it is true
>>> > that they all would be connected via the system board slot 3.
>>> >
>>> Yes that is correct.  Unless SMBIOS had a table of the second chassis.
>>>
>>> > Also, we do have the /sys/bus/pci/slots/ hierarchy already.  If we do put
>>> > something like this in the kernel, how would it relate to that hierarchy?
>>> > Could this SMBIOS stuff be integrated into that somehow?
>>> >
>>>
>>> /sys/bus/pci/slots only map a slot to a single PCI device.  systemd
>>> does use /sys/bus/pci/slots but it can't see the slot number on cards
>>> with bridges as the actual slot number is a parent.  And that's not
>>> easily to determine a parent device from the slots interface.  I'd
>>> really like something generic here. I'm also looking for some method
>>> for reporting bay/enclosure ID for PCIe-SSD devices.
>>>
>>> > We have a bit of a hodge-podge of slot names already, and I'd like to
>>> > simplify things if we can.
>>
>> We aren't really converging here yet.
>>
>> We need to figure out exactly what has to be done in the kernel because it
>> can't be done in user-space.  So far, your patch *looks* like it could (in
>> principle) be done in user-space, given a sysfs PCI hierarchy and the
>> SMBIOS information.
>
> It's a pain to do in user space as you have to read DMI information,
> then traverse PCI hierarchy until you find a match.  Every utility or
> code that wants to match a PCI device to a SMBIOS slot must reinvent
> the wheel.  biosdevname has its own code to read the entire SMBIOS
> table. It then reads in all pci devices and attempts to do a mapping.
> Why do this when the kernel already has this info easily available.
>
>>
>> The goal ("determine which PCI devices belong to system slots") is
>> definitely generic and useful.  We have quite a bit of slot stuff already
>> in the kernel, and some is already exposed via sysfs, so if we're still
>> missing what you need, it seems like the current code is off the rails
>> somewhere and should be fixed.
>>
>> Can we explore what you need in a little more detail, with some concrete
>> examples?  I heard:
>>
>>   - Systemd uses /sys/bus/pci/slots but gets the wrong slot information for
>>     cards with bridges on them.  Can you give an example including all the
>>     PCI devices involved and the related /sys/bus/pci/slots info so we can
>>     see exactly what's wrong?
>
> I created a quick and dirty module that populates /sys/bus/pci/slots:
>
> #include <linux/kernel.h>
> #include <linux/module.h>
> #include <linux/pci.h>
> #include <linux/dmi.h>
>
> static int __init kslot_init(void)
> {
>         const struct dmi_device *dmi;
>         struct dmi_dev_onboard *dslot;
>         struct pci_dev *sdev;
>
>         dmi = NULL;
>         while ((dmi = dmi_find_device(DMI_DEV_TYPE_SYSTEM_SLOT, NULL,
> dmi)) != NULL) {
>                 dslot = dmi->device_data;
>                 sdev = pci_get_domain_bus_and_slot(dslot->segment, dslot->bus,
>                                              dslot->devfn);
>                 if (!sdev)
>                         continue;
>                 pci_create_slot(sdev->bus, dslot->instance, dslot->dev.name,
>                                 NULL);
>                 pci_dev_put(sdev);
>         }
>         return 0;
> }
>
> static void __exit kslot_exit(void)
> {
> }
>
> module_init(kslot_init);
> module_exit(kslot_exit);
>
> MODULE_AUTHOR("jordan_hargrave@dell.com");
> MODULE_LICENSE("GPL");
> ===
>
> # ls -l /sys/bus/pci/slots
> total 0
> drwxr-xr-x 2 root root 0 Jul 23 21:43 PCI1
> drwxr-xr-x 2 root root 0 Jul 23 21:43 PCI2
> drwxr-xr-x 2 root root 0 Jul 23 21:43 PCI3
>
> # for X in /sys/bus/pci/slots/* ; do echo -n "$X = "; cat $X/address; done
> /sys/bus/pci/slots/PCI1 = 0000:41:01
> /sys/bus/pci/slots/PCI2 = 0000:42:02
> /sys/bus/pci/slots/PCI3 = 0000:04:03
>
> So one problem with this already is the string that gets output. It's
> <domain>:<bus>:<slot>   so you still have the problem of what about
> bus 43-45 that are children of bus 42.
>
> So wrote a script:
> #!/usr/bin/bash
> for X in /sys/bus/pci/devices/* ; do
>     RX=$(readlink -f $X)
>     for Z in /sys/bus/pci/slots/* ; do
>         NAME=$(basename $Z)
>         ADDR=$(cat $Z/address | cut -f1-2 -d:)
>         if [[ $RX =~ $ADDR ]] ; then
>             echo $RX,$NAME
>         fi
>     done
> done
>
> for each PCI device you have to iterate all devices in
> /sys/bus/pci/slots, and munge the 'address' to actually find a match
>
> output is:
> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0,PCI3 [SMBIOS entry]
> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.1,PCI3
> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.2,PCI3
> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.3,PCI3
> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.4,PCI3
> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.5,PCI3
> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.6,PCI3
> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.7,PCI3
> /sys/devices/pci0000:40/0000:40:01.0/0000:41:00.0,PCI1 [SMBIOS entry]
> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0,PCI2 [SMBIOS entry]
> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0,PCI2
> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:04.0,PCI2
> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0/0000:44:00.0,PCI2
> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0/0000:44:00.1,PCI2
> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:04.0/0000:45:00.0,PCI2
> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:04.0/0000:45:00.1,PCI2
>
> And I'm not totally convinced this will work on some weird
> configurations I've seen of SMBIOS entries.  as in the B:D:F must
> match, not just the bus number.

Yeah just verified this won't work on this config.

I have a lspci output that defines the following bridges
(segment:bus:dev:func -> secondary:subordinate)
0000:00:01.0 -> 01:01
0000:00:02.0 -> 02:02
0000:00:03.0 -> 03:03 [SMBIOS Slot entry 1]
0000:00:03.2 -> 04:04 [SMBIOS Slot entry 2]
0000:00:1c.0 -> 05:05
0000:00:1c.4 -> 06:06

The SMBIOS table defines the following entries:
Descriptor: MEZZ1  ID:1 0000:00:03.0
Descriptor: MEZZ2  ID:2 0000:00:03.2

creating /sys/bus/pci/slots:
# ls -l /sys/bus/pci/slots/
total 0
drwxr-xr-x 2 root root 0 Jul 23 23:04 MEZZ1
drwxr-xr-x 2 root root 0 Jul 23 23:04 MEZZ2

# for X in /sys/bus/pci/slots/* ; do echo -n "$X = "; cat $X/address; done
/sys/bus/pci/slots/MEZZ1 = 0000:00:01
/sys/bus/pci/slots/MEZZ2 = 0000:00:02

Both entries are showing Bus 0 only... not 0:3.0 and 0:3.2
>
>>
>>   - A driver needs the card slot number.  A slot number seems like a user
>>     interface thing, so I'm surprised that a driver would be concerned with
>>     it.  And of course, SMBIOS is an arch-specific thing, so the driver
>>     would have to be able to get along without the slot number anyway.
>>
> pci_get_smbios_slot would just return -ENODEV or something on systems
> without SMBIOS.
>
>>   - PCIe-SSD bay/enclosure IDs.  Where would these IDs come from, and what
>>     sort of reporting mechanism are you looking for?  How are these
>>     structured?  Is there a separate PCIe device for every bay/enclosure
>>     ID, so there would be a 1:1 mapping from PCIe device to ID?
>>
> On our system the enclosure IDs actually come from IPMI commands.
> However you must query every PCI Bus number in the system to get the
> correct enclosure mapping.  Currently yes there is a 1:1 mapping.
>
>> Bjorn
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jordan Hargrave Nov. 10, 2015, 2:40 p.m. UTC | #12
On Thu, Jul 23, 2015 at 10:11 PM, Jordan Hargrave <jharg93@gmail.com> wrote:
> On Thu, Jul 23, 2015 at 9:31 PM, Jordan Hargrave <jharg93@gmail.com> wrote:
>> On Thu, Jul 23, 2015 at 12:24 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>> [+cc Matthew for PCIe-SSD perspective]
>>>
>>> On Wed, Jul 22, 2015 at 03:07:46PM -0500, Jordan Hargrave wrote:
>>>> On Tue, Jul 21, 2015 at 8:09 PM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>> > On Tue, Jul 21, 2015 at 12:31:35PM -0500, Jordan Hargrave wrote:
>>>> >> On Tue, Jul 21, 2015 at 11:57 AM, Bjorn Helgaas <bhelgaas@google.com> wrote:
>>>> >> > On Mon, Jul 13, 2015 at 09:57:32AM -0500, Jordan Hargrave wrote:
>>>> >> >> On Mon, Jul 13, 2015 at 2:35 AM, Jean Delvare <jdelvare@suse.de> wrote:
>>>> >> >>
>>>> >> >> > Hi Jordan,
>>>> >> >> >
>>>> >> >> > On Fri, 10 Jul 2015 17:02:46 -0500, Jordan Hargrave wrote:
>>>> >> >> > > From: Jordan Hargrave <Jordan_Hargrave@dell.com>
>>>> >> >> > >
>>>> >> >> > > There currently isn't an easy way to determine which PCI devices belong
>>>> >> >> > to
>>>> >> >> > > system slots.  This patch adds support to read SMBIOS Type 9 (System
>>>> >> >> > Slots).
>>>> >> >> >
>>>> >> >> > I'm wondering, can't you use dmidecode or libsmbios to retrieve the
>>>> >> >> > same information?
>>>> >> >> >
>>>> >> >> > --
>>>> >> >> > Jean Delvare
>>>> >> >> > SUSE L3 Support
>>>> >> >> >
>>>> >> >>
>>>> >> >> You can but it's as not easy to determine the slot number for leaf devices
>>>> >> >> on bridges.  Eventually planning on using this for pulling slot number for
>>>> >> >> identifying network cards and disk numbering for systemd
>>>> >> >
>>>> >> > Can you outline the problems with using dmidecode or libsmbios?
>>>> >>
>>>> >> Neither dmidecode nor libsmbios report the slot number for devices
>>>> >> behind bridges in a slot.
>>>> >
>>>> > True, but it's straightforward to walk up the PCI tree in sysfs, e.g.,
>>>> > given a path like /sys/devices/pci0000:00/0000:00:1c.5/0000:03:00.0/, it's
>>>> > easy to see what the upstream bridges are.
>>>> >
>>>> It makes it more complicated I think. You have to check all functions
>>>> on all devices as well on the walk to the root.
>>>>
>>>> Would it look something like this?
>>>> while (pdev) {
>>>>   if (pdev->bus->number == dslot->bus && PCI_SLOT(pdev->devfn) ==
>>>> PCI_SLOT(dslot->devfn) &&
>>>>      (pci_pcie_type(pdev) != PCI_ROOT_PORT || PCI_FUNC(pdev->devfn) ==
>>>> PCI_FUNC(dslot->devfn))
>>>>     return dslot->instance;
>>>>   if (!pdev->bus->parent)
>>>>     break;
>>>>   pdev = pdev->bus->parent->self;
>>>> }
>>>>
>>>> >> I'm wanting to use this sysfs variable to
>>>> >> get slot numbers for systemd, so using libsmbios and dmidecode aren't
>>>> >> very useful.
>>>> >
>>>> > If you want this in systemd, I see why you wouldn't want a command like
>>>> > dmidecode.  Help me understand the problem with libsmbios.  Is it not
>>>> > useful because (a) systemd doesn't want to link with it, or (b) libsmbios
>>>> > doesn't have the right information, or (c) something else?
>>>> >
>>>> Linking with libsmbios would be a problem, and libsmbios doesn't have
>>>> this info anyway.
>>>>
>>>> > It doesn't *look* like this is using any information that is only available
>>>> > in the kernel, so in principle it seems like this could be done in
>>>> > user-space.
>>>> >
>>>>
>>>> I actually just got an email from someone who needs to determine the
>>>> card slot number in their driver... so I've added an external callable
>>>> 'pci_get_smbios_slot' function to enable this.
>>>>
>>>> >> We already report the index for embedded devices in
>>>> >> pci-label.c, this code should have gone in at the same time.
>>>> >>
>>>> >> For example.  The SMBIOS entry for slot 3 is 40:00.0 There is a
>>>> >> quad-port NIC in the slot with a bridge at 40:00.0
>>>> >>
>>>> >> 42:00.0 Bridge (sec=43, sub=45)
>>>> >> 43:02.0 Bridge (sec=44, sub=44)
>>>> >> 43:04.0 Bridge (sec=45, sub=45)
>>>> >> 44:00.0 Ethernet
>>>> >> 44:00.1 Ethernet
>>>> >> 45:00.0 Ethernet
>>>> >> 45:00.1 Ethernet
>>>> >>
>>>> >> So dmidecode only returns the slot number for 42:00.0 but not any
>>>> >> child devices.  This code will provide a 'slot' sysfs variable that
>>>> >> reports '3' for all devices under and including the bridge.
>>>> >
>>>> > What if the card in slot 3 is an adapter leading to an external PCI
>>>> > chassis?  Wouldn't we then have a 'slot' file for every card in that
>>>> > chassis, all containing '3'?  This sounds confusing, although it is true
>>>> > that they all would be connected via the system board slot 3.
>>>> >
>>>> Yes that is correct.  Unless SMBIOS had a table of the second chassis.
>>>>
>>>> > Also, we do have the /sys/bus/pci/slots/ hierarchy already.  If we do put
>>>> > something like this in the kernel, how would it relate to that hierarchy?
>>>> > Could this SMBIOS stuff be integrated into that somehow?
>>>> >
>>>>
>>>> /sys/bus/pci/slots only map a slot to a single PCI device.  systemd
>>>> does use /sys/bus/pci/slots but it can't see the slot number on cards
>>>> with bridges as the actual slot number is a parent.  And that's not
>>>> easily to determine a parent device from the slots interface.  I'd
>>>> really like something generic here. I'm also looking for some method
>>>> for reporting bay/enclosure ID for PCIe-SSD devices.
>>>>
>>>> > We have a bit of a hodge-podge of slot names already, and I'd like to
>>>> > simplify things if we can.
>>>
>>> We aren't really converging here yet.
>>>
>>> We need to figure out exactly what has to be done in the kernel because it
>>> can't be done in user-space.  So far, your patch *looks* like it could (in
>>> principle) be done in user-space, given a sysfs PCI hierarchy and the
>>> SMBIOS information.
>>
>> It's a pain to do in user space as you have to read DMI information,
>> then traverse PCI hierarchy until you find a match.  Every utility or
>> code that wants to match a PCI device to a SMBIOS slot must reinvent
>> the wheel.  biosdevname has its own code to read the entire SMBIOS
>> table. It then reads in all pci devices and attempts to do a mapping.
>> Why do this when the kernel already has this info easily available.
>>
>>>
>>> The goal ("determine which PCI devices belong to system slots") is
>>> definitely generic and useful.  We have quite a bit of slot stuff already
>>> in the kernel, and some is already exposed via sysfs, so if we're still
>>> missing what you need, it seems like the current code is off the rails
>>> somewhere and should be fixed.
>>>
>>> Can we explore what you need in a little more detail, with some concrete
>>> examples?  I heard:
>>>
>>>   - Systemd uses /sys/bus/pci/slots but gets the wrong slot information for
>>>     cards with bridges on them.  Can you give an example including all the
>>>     PCI devices involved and the related /sys/bus/pci/slots info so we can
>>>     see exactly what's wrong?
>>
>> I created a quick and dirty module that populates /sys/bus/pci/slots:
>>
>> #include <linux/kernel.h>
>> #include <linux/module.h>
>> #include <linux/pci.h>
>> #include <linux/dmi.h>
>>
>> static int __init kslot_init(void)
>> {
>>         const struct dmi_device *dmi;
>>         struct dmi_dev_onboard *dslot;
>>         struct pci_dev *sdev;
>>
>>         dmi = NULL;
>>         while ((dmi = dmi_find_device(DMI_DEV_TYPE_SYSTEM_SLOT, NULL,
>> dmi)) != NULL) {
>>                 dslot = dmi->device_data;
>>                 sdev = pci_get_domain_bus_and_slot(dslot->segment, dslot->bus,
>>                                              dslot->devfn);
>>                 if (!sdev)
>>                         continue;
>>                 pci_create_slot(sdev->bus, dslot->instance, dslot->dev.name,
>>                                 NULL);
>>                 pci_dev_put(sdev);
>>         }
>>         return 0;
>> }
>>
>> static void __exit kslot_exit(void)
>> {
>> }
>>
>> module_init(kslot_init);
>> module_exit(kslot_exit);
>>
>> MODULE_AUTHOR("jordan_hargrave@dell.com");
>> MODULE_LICENSE("GPL");
>> ===
>>
>> # ls -l /sys/bus/pci/slots
>> total 0
>> drwxr-xr-x 2 root root 0 Jul 23 21:43 PCI1
>> drwxr-xr-x 2 root root 0 Jul 23 21:43 PCI2
>> drwxr-xr-x 2 root root 0 Jul 23 21:43 PCI3
>>
>> # for X in /sys/bus/pci/slots/* ; do echo -n "$X = "; cat $X/address; done
>> /sys/bus/pci/slots/PCI1 = 0000:41:01
>> /sys/bus/pci/slots/PCI2 = 0000:42:02
>> /sys/bus/pci/slots/PCI3 = 0000:04:03
>>
>> So one problem with this already is the string that gets output. It's
>> <domain>:<bus>:<slot>   so you still have the problem of what about
>> bus 43-45 that are children of bus 42.
>>
>> So wrote a script:
>> #!/usr/bin/bash
>> for X in /sys/bus/pci/devices/* ; do
>>     RX=$(readlink -f $X)
>>     for Z in /sys/bus/pci/slots/* ; do
>>         NAME=$(basename $Z)
>>         ADDR=$(cat $Z/address | cut -f1-2 -d:)
>>         if [[ $RX =~ $ADDR ]] ; then
>>             echo $RX,$NAME
>>         fi
>>     done
>> done
>>
>> for each PCI device you have to iterate all devices in
>> /sys/bus/pci/slots, and munge the 'address' to actually find a match
>>
>> output is:
>> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.0,PCI3 [SMBIOS entry]
>> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.1,PCI3
>> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.2,PCI3
>> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.3,PCI3
>> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.4,PCI3
>> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.5,PCI3
>> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.6,PCI3
>> /sys/devices/pci0000:00/0000:00:03.0/0000:04:00.7,PCI3
>> /sys/devices/pci0000:40/0000:40:01.0/0000:41:00.0,PCI1 [SMBIOS entry]
>> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0,PCI2 [SMBIOS entry]
>> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0,PCI2
>> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:04.0,PCI2
>> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0/0000:44:00.0,PCI2
>> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:02.0/0000:44:00.1,PCI2
>> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:04.0/0000:45:00.0,PCI2
>> /sys/devices/pci0000:40/0000:40:03.0/0000:42:00.0/0000:43:04.0/0000:45:00.1,PCI2
>>
>> And I'm not totally convinced this will work on some weird
>> configurations I've seen of SMBIOS entries.  as in the B:D:F must
>> match, not just the bus number.
>
> Yeah just verified this won't work on this config.
>
> I have a lspci output that defines the following bridges
> (segment:bus:dev:func -> secondary:subordinate)
> 0000:00:01.0 -> 01:01
> 0000:00:02.0 -> 02:02
> 0000:00:03.0 -> 03:03
> 0000:00:03.2 -> 04:04
> 0000:00:1c.0 -> 05:05
> 0000:00:1c.4 -> 06:06
>
> The SMBIOS table defines the following entries:
> Descriptor: MEZZ1  ID:1 0000:00:03.0
> Descriptor: MEZZ2  ID:2 0000:00:03.2
>
> creating /sys/bus/pci/slots:
> # ls -l /sys/bus/pci/slots/
> total 0
> drwxr-xr-x 2 root root 0 Jul 23 23:04 MEZZ1
> drwxr-xr-x 2 root root 0 Jul 23 23:04 MEZZ2
>
> # for X in /sys/bus/pci/slots/* ; do echo -n "$X = "; cat $X/address; done
> /sys/bus/pci/slots/MEZZ1 = 0000:00:01
> /sys/bus/pci/slots/MEZZ2 = 0000:00:02
>
> Both entries are showing Bus 0 only... not 0:3.0 and 0:3.2
>>
>>>
>>>   - A driver needs the card slot number.  A slot number seems like a user
>>>     interface thing, so I'm surprised that a driver would be concerned with
>>>     it.  And of course, SMBIOS is an arch-specific thing, so the driver
>>>     would have to be able to get along without the slot number anyway.
>>>
>> pci_get_smbios_slot would just return -ENODEV or something on systems
>> without SMBIOS.
>>
>>>   - PCIe-SSD bay/enclosure IDs.  Where would these IDs come from, and what
>>>     sort of reporting mechanism are you looking for?  How are these
>>>     structured?  Is there a separate PCIe device for every bay/enclosure
>>>     ID, so there would be a 1:1 mapping from PCIe device to ID?
>>>
>> On our system the enclosure IDs actually come from IPMI commands.
>> However you must query every PCI Bus number in the system to get the
>> correct enclosure mapping.  Currently yes there is a 1:1 mapping.
>>
>>> Bjorn

The systemd maintainers are still refusing to add in code to parse
SMBIOS directly, so we need for this to be in the kernel as a sysfs
variable. Perferrably per-PCI device instead of /sys/bus/pci/XXX/slots
as that only gets populated for the upstream device if a card has one
or more levels of bridges.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index ac1ce4a..8f95897 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -356,6 +356,41 @@  static void __init dmi_save_extended_devices(const struct dmi_header *dm)
 	dmi_save_one_device(*d & 0x7f, dmi_string_nosave(dm, *(d - 1)));
 }
 
+static void __init dmi_save_dev_slot(int instance, int segment, int bus, int devfn, const char *name)
+{
+	struct dmi_dev_onboard *slot;
+	
+	slot = dmi_alloc(sizeof(*slot) + strlen(name) + 1);
+	if (!slot) {
+		printk(KERN_ERR "dmi_save_system_slot: out of memory.\n");
+		return;
+	}
+	slot->instance = instance;
+	slot->segment = segment;
+	slot->bus = bus;
+	slot->devfn = devfn;
+
+	strcpy((char *)&slot[1], name);
+	slot->dev.type = DMI_DEV_TYPE_SYSTEM_SLOT;
+	slot->dev.name = (char *)&slot[1];
+	slot->dev.device_data = slot;
+
+	list_add(&slot->dev.list, &dmi_devices);
+}
+
+
+static void __init dmi_save_system_slot(const struct dmi_header *dm)
+{
+	const char *name;
+	const u8 *d = (u8*)dm;
+	
+	if (dm->type == DMI_ENTRY_SYSTEM_SLOT && dm->length >= 0x11) {
+		name = dmi_string_nosave(dm, *(d + 0x04));
+		dmi_save_dev_slot(*(u16 *)(d + 0x09), *(u16 *)(d + 0xD), 
+				  *(d + 0xF), *(d + 0x10), name);
+	}
+}
+
 static void __init count_mem_devices(const struct dmi_header *dm, void *v)
 {
 	if (dm->type != DMI_ENTRY_MEM_DEVICE)
@@ -437,6 +472,10 @@  static void __init dmi_decode(const struct dmi_header *dm, void *dummy)
 		break;
 	case 41:	/* Onboard Devices Extended Information */
 		dmi_save_extended_devices(dm);
+		break;
+	case 9:         /* System Slots */
+		dmi_save_system_slot(dm);
+		break;
 	}
 }
 
diff --git a/drivers/pci/pci-label.c b/drivers/pci/pci-label.c
index 024b5c1..38dfb45 100644
--- a/drivers/pci/pci-label.c
+++ b/drivers/pci/pci-label.c
@@ -125,14 +125,103 @@  static struct attribute_group smbios_attr_group = {
 	.is_visible = smbios_instance_string_exist,
 };
 
+static int smbios_getslot(struct pci_dev *pdev)
+{
+	struct pci_dev *sdev;
+	struct dmi_dev_onboard *dslot;
+	const struct dmi_device *dmi;
+	u8 sub, sec, bus;
+
+	dmi = NULL;
+	if (pdev->is_virtfn) {
+		/* Get Physical device for SR-IOV */
+		pdev = pdev->physfn;
+	}
+	bus = pdev->bus->number;
+	while ((dmi = dmi_find_device(DMI_DEV_TYPE_SYSTEM_SLOT, NULL, dmi)) != NULL) {
+		sec = sub = -1;
+
+		dslot = dmi->device_data;
+		sdev = pci_get_domain_bus_and_slot(dslot->segment, dslot->bus, dslot->devfn);
+		if (sdev == NULL)
+			continue;
+
+		if (sdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+			pci_read_config_byte(sdev, PCI_SECONDARY_BUS, &sec);
+			pci_read_config_byte(sdev, PCI_SUBORDINATE_BUS, &sub);
+			if (bus >= sec && bus <= sub) {
+				/* device is child of bridge */
+				return dslot->instance;
+			}
+		}
+		if (pci_domain_nr(sdev->bus) == pci_domain_nr(pdev->bus) &&
+		    sdev->bus->number == pdev->bus->number &&
+		    PCI_SLOT(sdev->devfn) == PCI_SLOT(pdev->devfn)) {
+			/* Match domain:bus:dev.  If PCIE root, only match function */
+			if (PCI_FUNC(sdev->devfn) == PCI_FUNC(pdev->devfn) ||
+			    pci_pcie_type(sdev) != PCI_EXP_TYPE_ROOT_PORT) {
+				return dslot->instance;
+			}
+		}
+	}
+	return -ENODEV;
+}
+ 
+static ssize_t smbiosslot_show(struct device *dev,
+			       struct device_attribute *attr, char *buf)
+{
+ 	struct pci_dev *pdev;
+	int slot;
+ 
+ 	pdev = to_pci_dev(dev);
+	slot = smbios_getslot(pdev);
+	if (slot > 0) {
+		return scnprintf(buf, PAGE_SIZE, "%d\n", slot);
+	}
+	return 0;
+}
+
+static umode_t smbios_slot_exist(struct kobject *kobj, struct attribute *attr,
+				 int n)
+{
+	struct device *dev;
+	struct pci_dev *pdev;
+
+	dev = container_of(kobj, struct device, kobj);
+	pdev = to_pci_dev(dev);
+
+	return (smbios_getslot(pdev) > 0) ? S_IRUGO : 0;
+}
+
+static struct device_attribute smbios_attr_slot = {
+	.attr = {.name = "slot", .mode = 0444},
+	.show = smbiosslot_show,
+};
+
+static struct attribute *smbios_slot_attributes[] = {
+	&smbios_attr_slot.attr,
+	NULL,
+};
+
+static struct attribute_group smbios_slot_attr_group = {
+	.attrs = smbios_slot_attributes,
+	.is_visible = smbios_slot_exist,
+};
+
 static int pci_create_smbiosname_file(struct pci_dev *pdev)
 {
+	int rc;
+
+	rc = sysfs_create_group(&pdev->dev.kobj, &smbios_slot_attr_group);
+	if (rc != 0)
+		return rc;
 	return sysfs_create_group(&pdev->dev.kobj, &smbios_attr_group);
 }
 
 static void pci_remove_smbiosname_file(struct pci_dev *pdev)
 {
 	sysfs_remove_group(&pdev->dev.kobj, &smbios_attr_group);
+	sysfs_remove_group(&pdev->dev.kobj, &smbios_slot_attr_group);
 }
 #else
 static inline int pci_create_smbiosname_file(struct pci_dev *pdev)
diff --git a/include/linux/dmi.h b/include/linux/dmi.h
index 5055ac3..09f42e7 100644
--- a/include/linux/dmi.h
+++ b/include/linux/dmi.h
@@ -22,6 +22,7 @@  enum dmi_device_type {
 	DMI_DEV_TYPE_IPMI = -1,
 	DMI_DEV_TYPE_OEM_STRING = -2,
 	DMI_DEV_TYPE_DEV_ONBOARD = -3,
+	DMI_DEV_TYPE_SYSTEM_SLOT = -4,
 };
 
 enum dmi_entry_type {