diff mbox

[RFC] PCI: enable and disable sriov support via sysfs at per device level

Message ID 1349134020-62152-1-git-send-email-ddutile@redhat.com
State Superseded
Headers show

Commit Message

Don Dutile Oct. 1, 2012, 11:27 p.m. UTC
Provide files under sysfs to determine the max number of vfs
an SRIOV-capable PCIe device supports, and methods to enable and
disable the vfs on a per device basis.

Currently, VF enablement by SRIOV-capable PCIe devices is done
in driver-specific module parameters.  If not setup in modprobe files,
it requires admin to unload & reload PF drivers with number of desired
VFs to enable.  Additionally, the enablement is system wide: all
devices controlled by the same driver have the same number of VFs
enabled.  Although the latter is probably desired, there are PCI
configurations setup by system BIOS that may not enable that to occur.

Three files are created if a PCIe device has SRIOV support:
sriov_max_vfs -- cat-ing this file returns the maximum number
                 of VFs a PCIe device supports.
sriov_enable_vfs -- echo'ing a number to this file enables this
                    number of VFs for this given PCIe device.
                 -- cat-ing this file will return the number of VFs
                    currently enabled on this PCIe device.
sriov_disable_vfs -- echo-ing any number other than 0 disables all
                     VFs associated with this PCIe device.

VF enable and disablement is invoked much like other PCIe
configuration functions -- via registered callbacks in the driver,
i.e., probe, release, etc.

Note: I haven't had a chance to refactor an SRIOV PF driver
      to test against; hoping to try ixgbe or igb in next couple days.
      To date, just tested that cat-ing sriov_max_vfs works, and
      cat-ing sriov_enable_vfs returns the correct number when
      a PF driver has been loaded with VFs enabled via per-driver param,
      e.g., modprobe igb max_vfs=4

Send comments and I'll integrate as needed, while modifying
a PF driver to use this interface.

Signed-off-by: Donald Dutile <ddutile@redhat.com>

---
 drivers/pci/pci-sysfs.c | 186 ++++++++++++++++++++++++++++++++++++++++++++----
 include/linux/pci.h     |   2 +
 2 files changed, 175 insertions(+), 13 deletions(-)

Comments

Yuval Mintz Oct. 2, 2012, 7:32 a.m. UTC | #1
> +static void pci_sriov_remove_sysfs_dev_files(struct pci_dev *dev)
> +{
> +	int pos;
> +
> +	if ((dev->is_physfn) && pci_is_pcie(dev)) {
> +            pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> +            if (pos)
> +		device_remove_file(&dev->dev, pci_dev_sriov_attrs);

Shouldn't it iterate over the attributes when removing them?



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile Oct. 2, 2012, 5:38 p.m. UTC | #2
On 10/02/2012 03:32 AM, Yuval Mintz wrote:
>> +static void pci_sriov_remove_sysfs_dev_files(struct pci_dev *dev)
>> +{
>> +	int pos;
>> +
>> +	if ((dev->is_physfn)&&  pci_is_pcie(dev)) {
>> +            pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>> +            if (pos)
>> +		device_remove_file(&dev->dev, pci_dev_sriov_attrs);
>
> Shouldn't it iterate over the attributes when removing them?
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

good catch; that should be:
  +	int pos, i;
     ....
  +                for (i = 0; attr_name(pci_dev_sriov_attrs[i]); i++)
  +			device_remove_file(&dev->dev, &pci_dev_sriov_attrs);

it doesn't remove the files in reverse order, but neither does
the device-generic code for dev-attrs.


Another question for RFC:
-- above shows I didn't test with hot-plug
-- which makes me wonder/think if [get,put]_device([dev,parent]) needed
    during [create,remove]_file() calls.

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Oct. 2, 2012, 8:01 p.m. UTC | #3
On Mon, Oct 1, 2012 at 4:27 PM, Donald Dutile <ddutile@redhat.com> wrote:
> Provide files under sysfs to determine the max number of vfs
> an SRIOV-capable PCIe device supports, and methods to enable and
> disable the vfs on a per device basis.
>
> Currently, VF enablement by SRIOV-capable PCIe devices is done
> in driver-specific module parameters.  If not setup in modprobe files,
> it requires admin to unload & reload PF drivers with number of desired
> VFs to enable.  Additionally, the enablement is system wide: all
> devices controlled by the same driver have the same number of VFs
> enabled.  Although the latter is probably desired, there are PCI
> configurations setup by system BIOS that may not enable that to occur.
>
> Three files are created if a PCIe device has SRIOV support:
> sriov_max_vfs -- cat-ing this file returns the maximum number
>                  of VFs a PCIe device supports.
> sriov_enable_vfs -- echo'ing a number to this file enables this
>                     number of VFs for this given PCIe device.
>                  -- cat-ing this file will return the number of VFs
>                     currently enabled on this PCIe device.
> sriov_disable_vfs -- echo-ing any number other than 0 disables all
>                      VFs associated with this PCIe device.
>
> VF enable and disablement is invoked much like other PCIe
> configuration functions -- via registered callbacks in the driver,
> i.e., probe, release, etc.
>
> Note: I haven't had a chance to refactor an SRIOV PF driver
>       to test against; hoping to try ixgbe or igb in next couple days.
>       To date, just tested that cat-ing sriov_max_vfs works, and
>       cat-ing sriov_enable_vfs returns the correct number when
>       a PF driver has been loaded with VFs enabled via per-driver param,
>       e.g., modprobe igb max_vfs=4
>
> Send comments and I'll integrate as needed, while modifying
> a PF driver to use this interface.
>
> Signed-off-by: Donald Dutile <ddutile@redhat.com>
>
> ---
>  drivers/pci/pci-sysfs.c | 186 ++++++++++++++++++++++++++++++++++++++++++++----
>  include/linux/pci.h     |   2 +
>  2 files changed, 175 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
> index 6869009..1b5eab7 100644
> --- a/drivers/pci/pci-sysfs.c
> +++ b/drivers/pci/pci-sysfs.c
> @@ -404,6 +404,152 @@ static ssize_t d3cold_allowed_show(struct device *dev,
>  }
>  #endif
>
> +
> +#ifdef CONFIG_PCI_IOV
> +static ssize_t sriov_max_vfs_show(struct device *dev,
> +                                 struct device_attribute *attr,
> +                                 char *buf)
> +{
> +       struct pci_dev *pdev;
> +
> +       pdev = to_pci_dev(dev);
> +       return sprintf (buf, "%u\n", pdev->sriov->total);
> +}
> +
> +bool pci_dev_has_sriov(struct pci_dev *pdev)
> +{
> +       int ret = false;
> +       int pos;
> +
> +        if (!pci_is_pcie(pdev))
> +               goto out;
> +
> +       pos =  pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
> +       if (pos)
> +               ret = true;
> +out:
> +       return ret;
> +}
> +
> +static ssize_t sriov_enable_vfs_show(struct device *dev,
> +                                    struct device_attribute *attr,
> +                                    char *buf)
> +{
> +       struct pci_dev *pdev;
> +       int nr_virtfn = 0;
> +
> +       pdev = to_pci_dev(dev);
> +
> +       if (pci_dev_has_sriov(pdev))
> +               nr_virtfn = pdev->sriov->nr_virtfn;
> +
> +       return sprintf (buf, "%u\n", nr_virtfn);
> +}
> +
> +static ssize_t sriov_enable_vfs_store(struct device *dev,
> +                                     struct device_attribute *attr,
> +                                     const char *buf, size_t count)
> +{
> +       struct pci_dev *pdev;
> +       int num_vf_enabled = 0;
> +       unsigned long num_vfs;
> +       pdev = to_pci_dev(dev);
> +
> +       if (!pci_dev_has_sriov(pdev))
> +               goto out;
> +
> +       /* Requested VFs to enable < max_vfs
> +        * and none enabled already
> +        */
> +       if (strict_strtoul(buf, 0, &num_vfs) < 0)
> +                       return -EINVAL;
> +
> +       if ((num_vfs > pdev->sriov->total) ||
> +           (pdev->sriov->nr_virtfn != 0))
> +               return -EINVAL;
> +
> +       /* Ready for lift-off! */
> +       if (pdev->driver && pdev->driver->sriov_enable) {
> +               num_vf_enabled = pdev->driver->sriov_enable(pdev, num_vfs);
> +       }
> +
> +out:
> +       if (num_vf_enabled != num_vfs)
> +               printk(KERN_WARNING
> +                       "%s: %04x:%02x:%02x.%d: Only %d VFs enabled \n",
> +                       pci_name(pdev), pci_domain_nr(pdev->bus),
> +                       pdev->bus->number, PCI_SLOT(pdev->devfn),
> +                       PCI_FUNC(pdev->devfn), num_vf_enabled);
> +
> +       return count;
> +}
> +
> +static ssize_t sriov_disable_vfs_store(struct device *dev,
> +                                      struct device_attribute *attr,
> +                                      const char *buf, size_t count)
> +{
> +       struct pci_dev *pdev;
> +
> +       pdev = to_pci_dev(dev);
> +
> +       /* make sure sriov device & at least 1 vf enabled */
> +       if (!pci_dev_has_sriov(pdev) ||
> +          (pdev->sriov->nr_virtfn == 0))
> +               goto out;
> +
> +       /* Ready for landing! */
> +       if (pdev->driver && pdev->driver->sriov_disable) {
> +               pdev->driver->sriov_disable(pdev);
> +       }
> +out:
> +       return count;
> +}
> +
> +struct device_attribute pci_dev_sriov_attrs[] = {
> +       __ATTR_RO(sriov_max_vfs),
> +       __ATTR(sriov_enable_vfs, (S_IRUGO|S_IWUSR|S_IWGRP),
> +               sriov_enable_vfs_show, sriov_enable_vfs_store),
> +       __ATTR(sriov_disable_vfs, (S_IWUSR|S_IWGRP),
> +               NULL, sriov_disable_vfs_store),
> +       __ATTR_NULL,
> +};
> +
> +static int pci_sriov_create_sysfs_dev_files(struct pci_dev *dev)
> +{
> +       int pos, i;
> +       int retval=0;
> +
> +       if ((dev->is_physfn) && pci_is_pcie(dev)) {
> +            pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> +            if (pos) {
> +               for (i = 0; attr_name(pci_dev_sriov_attrs[i]); i++) {
> +                   retval = device_create_file(&dev->dev, &pci_dev_sriov_attrs[i]);
> +                   if (retval) {
> +                       while (--i >= 0)
> +                            device_remove_file(&dev->dev, &pci_dev_sriov_attrs[i]);
> +                       break;
> +                   }
> +               }
> +           }
> +       }
> +       return retval;

> +}
> +
> +static void pci_sriov_remove_sysfs_dev_files(struct pci_dev *dev)
> +{
> +       int pos;
> +
> +       if ((dev->is_physfn) && pci_is_pcie(dev)) {
> +            pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
> +            if (pos)
> +               device_remove_file(&dev->dev, pci_dev_sriov_attrs);
> +       }
> +}
...
> @@ -1203,14 +1365,18 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
>                         goto error;
>                 dev->reset_fn = 1;
>         }
> +
> +       /* SRIOV */
> +       retval = pci_sriov_create_sysfs_dev_files(dev);
> +       if (retval)
> +               goto error;
> +

No, You should not use function for that.

According to Greg, you should use group visible to do that.

Please check the patches  that i send out before.

here I attached them again.

Yinghai
Don Dutile Oct. 2, 2012, 8:23 p.m. UTC | #4
On 10/02/2012 04:01 PM, Yinghai Lu wrote:
> On Mon, Oct 1, 2012 at 4:27 PM, Donald Dutile<ddutile@redhat.com>  wrote:
>> Provide files under sysfs to determine the max number of vfs
>> an SRIOV-capable PCIe device supports, and methods to enable and
>> disable the vfs on a per device basis.
>>
>> Currently, VF enablement by SRIOV-capable PCIe devices is done
>> in driver-specific module parameters.  If not setup in modprobe files,
>> it requires admin to unload&  reload PF drivers with number of desired
>> VFs to enable.  Additionally, the enablement is system wide: all
>> devices controlled by the same driver have the same number of VFs
>> enabled.  Although the latter is probably desired, there are PCI
>> configurations setup by system BIOS that may not enable that to occur.
>>
>> Three files are created if a PCIe device has SRIOV support:
>> sriov_max_vfs -- cat-ing this file returns the maximum number
>>                   of VFs a PCIe device supports.
>> sriov_enable_vfs -- echo'ing a number to this file enables this
>>                      number of VFs for this given PCIe device.
>>                   -- cat-ing this file will return the number of VFs
>>                      currently enabled on this PCIe device.
>> sriov_disable_vfs -- echo-ing any number other than 0 disables all
>>                       VFs associated with this PCIe device.
>>
>> VF enable and disablement is invoked much like other PCIe
>> configuration functions -- via registered callbacks in the driver,
>> i.e., probe, release, etc.
>>
>> Note: I haven't had a chance to refactor an SRIOV PF driver
>>        to test against; hoping to try ixgbe or igb in next couple days.
>>        To date, just tested that cat-ing sriov_max_vfs works, and
>>        cat-ing sriov_enable_vfs returns the correct number when
>>        a PF driver has been loaded with VFs enabled via per-driver param,
>>        e.g., modprobe igb max_vfs=4
>>
>> Send comments and I'll integrate as needed, while modifying
>> a PF driver to use this interface.
>>
>> Signed-off-by: Donald Dutile<ddutile@redhat.com>
>>
>> ---
>>   drivers/pci/pci-sysfs.c | 186 ++++++++++++++++++++++++++++++++++++++++++++----
>>   include/linux/pci.h     |   2 +
>>   2 files changed, 175 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
>> index 6869009..1b5eab7 100644
>> --- a/drivers/pci/pci-sysfs.c
>> +++ b/drivers/pci/pci-sysfs.c
>> @@ -404,6 +404,152 @@ static ssize_t d3cold_allowed_show(struct device *dev,
>>   }
>>   #endif
>>
>> +
>> +#ifdef CONFIG_PCI_IOV
>> +static ssize_t sriov_max_vfs_show(struct device *dev,
>> +                                 struct device_attribute *attr,
>> +                                 char *buf)
>> +{
>> +       struct pci_dev *pdev;
>> +
>> +       pdev = to_pci_dev(dev);
>> +       return sprintf (buf, "%u\n", pdev->sriov->total);
>> +}
>> +
>> +bool pci_dev_has_sriov(struct pci_dev *pdev)
>> +{
>> +       int ret = false;
>> +       int pos;
>> +
>> +        if (!pci_is_pcie(pdev))
>> +               goto out;
>> +
>> +       pos =  pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
>> +       if (pos)
>> +               ret = true;
>> +out:
>> +       return ret;
>> +}
>> +
>> +static ssize_t sriov_enable_vfs_show(struct device *dev,
>> +                                    struct device_attribute *attr,
>> +                                    char *buf)
>> +{
>> +       struct pci_dev *pdev;
>> +       int nr_virtfn = 0;
>> +
>> +       pdev = to_pci_dev(dev);
>> +
>> +       if (pci_dev_has_sriov(pdev))
>> +               nr_virtfn = pdev->sriov->nr_virtfn;
>> +
>> +       return sprintf (buf, "%u\n", nr_virtfn);
>> +}
>> +
>> +static ssize_t sriov_enable_vfs_store(struct device *dev,
>> +                                     struct device_attribute *attr,
>> +                                     const char *buf, size_t count)
>> +{
>> +       struct pci_dev *pdev;
>> +       int num_vf_enabled = 0;
>> +       unsigned long num_vfs;
>> +       pdev = to_pci_dev(dev);
>> +
>> +       if (!pci_dev_has_sriov(pdev))
>> +               goto out;
>> +
>> +       /* Requested VFs to enable<  max_vfs
>> +        * and none enabled already
>> +        */
>> +       if (strict_strtoul(buf, 0,&num_vfs)<  0)
>> +                       return -EINVAL;
>> +
>> +       if ((num_vfs>  pdev->sriov->total) ||
>> +           (pdev->sriov->nr_virtfn != 0))
>> +               return -EINVAL;
>> +
>> +       /* Ready for lift-off! */
>> +       if (pdev->driver&&  pdev->driver->sriov_enable) {
>> +               num_vf_enabled = pdev->driver->sriov_enable(pdev, num_vfs);
>> +       }
>> +
>> +out:
>> +       if (num_vf_enabled != num_vfs)
>> +               printk(KERN_WARNING
>> +                       "%s: %04x:%02x:%02x.%d: Only %d VFs enabled \n",
>> +                       pci_name(pdev), pci_domain_nr(pdev->bus),
>> +                       pdev->bus->number, PCI_SLOT(pdev->devfn),
>> +                       PCI_FUNC(pdev->devfn), num_vf_enabled);
>> +
>> +       return count;
>> +}
>> +
>> +static ssize_t sriov_disable_vfs_store(struct device *dev,
>> +                                      struct device_attribute *attr,
>> +                                      const char *buf, size_t count)
>> +{
>> +       struct pci_dev *pdev;
>> +
>> +       pdev = to_pci_dev(dev);
>> +
>> +       /* make sure sriov device&  at least 1 vf enabled */
>> +       if (!pci_dev_has_sriov(pdev) ||
>> +          (pdev->sriov->nr_virtfn == 0))
>> +               goto out;
>> +
>> +       /* Ready for landing! */
>> +       if (pdev->driver&&  pdev->driver->sriov_disable) {
>> +               pdev->driver->sriov_disable(pdev);
>> +       }
>> +out:
>> +       return count;
>> +}
>> +
>> +struct device_attribute pci_dev_sriov_attrs[] = {
>> +       __ATTR_RO(sriov_max_vfs),
>> +       __ATTR(sriov_enable_vfs, (S_IRUGO|S_IWUSR|S_IWGRP),
>> +               sriov_enable_vfs_show, sriov_enable_vfs_store),
>> +       __ATTR(sriov_disable_vfs, (S_IWUSR|S_IWGRP),
>> +               NULL, sriov_disable_vfs_store),
>> +       __ATTR_NULL,
>> +};
>> +
>> +static int pci_sriov_create_sysfs_dev_files(struct pci_dev *dev)
>> +{
>> +       int pos, i;
>> +       int retval=0;
>> +
>> +       if ((dev->is_physfn)&&  pci_is_pcie(dev)) {
>> +            pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>> +            if (pos) {
>> +               for (i = 0; attr_name(pci_dev_sriov_attrs[i]); i++) {
>> +                   retval = device_create_file(&dev->dev,&pci_dev_sriov_attrs[i]);
>> +                   if (retval) {
>> +                       while (--i>= 0)
>> +                            device_remove_file(&dev->dev,&pci_dev_sriov_attrs[i]);
>> +                       break;
>> +                   }
>> +               }
>> +           }
>> +       }
>> +       return retval;
>
>> +}
>> +
>> +static void pci_sriov_remove_sysfs_dev_files(struct pci_dev *dev)
>> +{
>> +       int pos;
>> +
>> +       if ((dev->is_physfn)&&  pci_is_pcie(dev)) {
>> +            pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
>> +            if (pos)
>> +               device_remove_file(&dev->dev, pci_dev_sriov_attrs);
>> +       }
>> +}
> ...
>> @@ -1203,14 +1365,18 @@ static int pci_create_capabilities_sysfs(struct pci_dev *dev)
>>                          goto error;
>>                  dev->reset_fn = 1;
>>          }
>> +
>> +       /* SRIOV */
>> +       retval = pci_sriov_create_sysfs_dev_files(dev);
>> +       if (retval)
>> +               goto error;
>> +
>
> No, You should not use function for that.
>
> According to Greg, you should use group visible to do that.
>
> Please check the patches  that i send out before.
>
> here I attached them again.
>
> Yinghai
Greg: Why not use the above functions?
       and what are 'visible groups' ??
I tried to follow how other optional files were added,
which didn't involve groups or visibility. ... well, ok,
I followed optional symlinks in iov.c... which didn't use
group attributes, and these file seemed no different than other
pcie attributes which is handled in the pci-sysfs.c module
in this fashion.

Yinghai: As for looking at your patches, I did that.... and
they made no sense, i.e., I don't see the value/use/need of
'is_visible'.    maybe if there was some decent
documentation/comments on this stuff, I could
confidently use them.... if they exist, please point me
in the that direction.  if not, then point me to better examples.
Looking in Documentation, I didn't find anything to help.
Doing a grep on is_visible, all I see are flags being
added to subsystem-specific struct's, and the use of device_[create,remove]_files().



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Oct. 2, 2012, 8:33 p.m. UTC | #5
On Tue, Oct 2, 2012 at 1:23 PM, Don Dutile <ddutile@redhat.com> wrote:

>
> Yinghai: As for looking at your patches, I did that.... and
> they made no sense, i.e., I don't see the value/use/need of
> 'is_visible'.

sysfs frame work use it during create sysfs files for that group.

> maybe if there was some decent
> documentation/comments on this stuff, I could
> confidently use them.... if they exist, please point me
> in the that direction.  if not, then point me to better examples.
> Looking in Documentation,

sriov_xxx.patch should create one file, you could just expand them to
make three.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Oct. 2, 2012, 8:39 p.m. UTC | #6
On Tue, Oct 02, 2012 at 04:23:40PM -0400, Don Dutile wrote:
> Greg: Why not use the above functions?

What "above functions"?  You make a lot of calls in the code :)

You are creating an attribute group, and then manually walking through
it to create/destroy the file?  That's not good, use the in-kernel
functions to do that automatically for you.

Also, you need to do this _before_ the device shows up to userspace,
otherwise you have a race that you just lost with udev and the like.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile Oct. 2, 2012, 9:06 p.m. UTC | #7
On 10/02/2012 04:39 PM, Greg Kroah-Hartman wrote:
> On Tue, Oct 02, 2012 at 04:23:40PM -0400, Don Dutile wrote:
>> Greg: Why not use the above functions?
>
> What "above functions"?  You make a lot of calls in the code :)
>
> You are creating an attribute group, and then manually walking through
> it to create/destroy the file?  That's not good, use the in-kernel
> functions to do that automatically for you.
>
> Also, you need to do this _before_ the device shows up to userspace,
> otherwise you have a race that you just lost with udev and the like.
>
> thanks,
>
> greg k-h

So, why is it ok to create the 'reset' file after the device shows up?
Like reset, SRIOV is optional.  The sysfs files are designed to only
be used by userspace utils to enable/disable VFs after the system is up.

Or should the design allow a udev rule to be created that if one
of these new sriov files exists, then perform a set of read-max-vf &
enable n-vf ops ?



--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Oct. 3, 2012, 3:10 a.m. UTC | #8
On Tue, Oct 02, 2012 at 05:06:34PM -0400, Don Dutile wrote:
> On 10/02/2012 04:39 PM, Greg Kroah-Hartman wrote:
> >On Tue, Oct 02, 2012 at 04:23:40PM -0400, Don Dutile wrote:
> >>Greg: Why not use the above functions?
> >
> >What "above functions"?  You make a lot of calls in the code :)
> >
> >You are creating an attribute group, and then manually walking through
> >it to create/destroy the file?  That's not good, use the in-kernel
> >functions to do that automatically for you.
> >
> >Also, you need to do this _before_ the device shows up to userspace,
> >otherwise you have a race that you just lost with udev and the like.
> >
> >thanks,
> >
> >greg k-h
> 
> So, why is it ok to create the 'reset' file after the device shows up?

It's not, that should be fixed, patches are always welcome.

> Like reset, SRIOV is optional.  The sysfs files are designed to only
> be used by userspace utils to enable/disable VFs after the system is up.
> 
> Or should the design allow a udev rule to be created that if one
> of these new sriov files exists, then perform a set of read-max-vf &
> enable n-vf ops ?

Yes it should.  Never create sysfs files after the device is present in
the system, otherwise userspace never knows about it.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Oct. 3, 2012, 4:58 a.m. UTC | #9
On Tue, Oct 2, 2012 at 8:10 PM, Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
> On Tue, Oct 02, 2012 at 05:06:34PM -0400, Don Dutile wrote:
>>
>> So, why is it ok to create the 'reset' file after the device shows up?
>
> It's not, that should be fixed, patches are always welcome.
>

I tried to move "reset" to device type, but it did not work.

Anyway will send out two patches for dev_type and vga again.

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Konrad Wilk Oct. 3, 2012, 1 p.m. UTC | #10
>> Also, you need to do this _before_ the device shows up to userspace,
>> otherwise you have a race that you just lost with udev and the like.
>>
>> thanks,
>>
>> greg k-h
>
>
> So, why is it ok to create the 'reset' file after the device shows up?
> Like reset, SRIOV is optional.  The sysfs files are designed to only
> be used by userspace utils to enable/disable VFs after the system is up.

It should (the attribute) be there when the device is created. Otherwise you end
up with the case when udev could call 'reset' attribute the moment the
SR-IOV device shows up. Or the userspace utility could.

> Or should the design allow a udev rule to be created that if one
> of these new sriov files exists, then perform a set of read-max-vf &
> enable n-vf ops ?

Well, that depends on the ingenuity of the udev rules writers.  But we should
also make sure that the concept is race-free. You don't want the udev
rules to have 'sleep 1' to work-around the kernel code.
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Don Dutile Oct. 3, 2012, 1:18 p.m. UTC | #11
On 10/02/2012 11:10 PM, Greg Kroah-Hartman wrote:
> On Tue, Oct 02, 2012 at 05:06:34PM -0400, Don Dutile wrote:
>> On 10/02/2012 04:39 PM, Greg Kroah-Hartman wrote:
>>> On Tue, Oct 02, 2012 at 04:23:40PM -0400, Don Dutile wrote:
>>>> Greg: Why not use the above functions?
>>>
>>> What "above functions"?  You make a lot of calls in the code :)
>>>
>>> You are creating an attribute group, and then manually walking through
>>> it to create/destroy the file?  That's not good, use the in-kernel
>>> functions to do that automatically for you.
>>>
>>> Also, you need to do this _before_ the device shows up to userspace,
>>> otherwise you have a race that you just lost with udev and the like.
>>>
>>> thanks,
>>>
>>> greg k-h
>>
>> So, why is it ok to create the 'reset' file after the device shows up?
>
> It's not, that should be fixed, patches are always welcome.
>
>> Like reset, SRIOV is optional.  The sysfs files are designed to only
>> be used by userspace utils to enable/disable VFs after the system is up.
>>
>> Or should the design allow a udev rule to be created that if one
>> of these new sriov files exists, then perform a set of read-max-vf&
>> enable n-vf ops ?
>
> Yes it should.  Never create sysfs files after the device is present in
> the system, otherwise userspace never knows about it.
>
> thanks,
>
> greg k-h

ok, thanks for feedback and bit of education ....
still looking for more on this 'visible' tagging though.

so, I guess Bjorn ought to add 'cleaning up pci-sysfs' to
his PCI to-do list...

maybe once i figure out the proper sriov sysfs code, I
can duplicate that effort in the reset,vga & pm-capabilities
section.

- Don
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Oct. 3, 2012, 5:51 p.m. UTC | #12
First two patches are old patches in my queue.
They introduce pci_dev_type and use it with vga in sysfs.

other two will add set_max_vfs in sysfs for device that support siov.

last one is draft version for ixgbe, still need ixgbe guys to sort it
out.

Thanks

Yinghai


Yinghai Lu (5):
  PCI: Add pci_dev_type
  PCI, sys: Use is_visable() with boot_vga attribute for pci_dev
  PCI: add set_max_vfs in pci_driver ops
  PCI: Add max_vfs in sysfs per pci device where supports
  ixgbe: add driver set_max_vfs support

 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    2 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   44 +++++++++---
 drivers/pci/pci-sysfs.c                       |   96 ++++++++++++++++++++++---
 drivers/pci/pci.h                             |    1 +
 drivers/pci/probe.c                           |    1 +
 include/linux/pci.h                           |    2 +
 6 files changed, 126 insertions(+), 20 deletions(-)
Yinghai Lu Oct. 3, 2012, 5:55 p.m. UTC | #13
On Wed, Oct 3, 2012 at 6:18 AM, Don Dutile <ddutile@redhat.com> wrote:
> maybe once i figure out the proper sriov sysfs code, I
> can duplicate that effort in the reset,vga & pm-capabilities
> section.

I resent my patcheset in plain mail instead of attached form.

I am thinking we should ixgbe guys to sort out ixgbe change to use
set_max_vfs ops.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Oct. 3, 2012, 5:57 p.m. UTC | #14
On Wed, Oct 3, 2012 at 10:51 AM, Yinghai Lu <yinghai@kernel.org> wrote:
> Need ixgbe guys to close the loop to use set_max_vfs instead
> kernel parameters.

Sorry, I should put RFC in the subject line for this one.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rose, Gregory V Oct. 3, 2012, 6:16 p.m. UTC | #15
-----Original Message-----
> From: yhlu.kernel@gmail.com [mailto:yhlu.kernel@gmail.com] On Behalf Of
> Yinghai Lu
> Sent: Wednesday, October 03, 2012 10:56 AM
> To: Don Dutile
> Cc: Greg Kroah-Hartman; linux-pci@vger.kernel.org; bhelgaas@google.com;
> yuvalmin@broadcom.com; bhutchings@solarflare.com; Rose, Gregory V;
> davem@davemloft.net
> Subject: Re: [RFC] PCI: enable and disable sriov support via sysfs at per
> device level
> 
> On Wed, Oct 3, 2012 at 6:18 AM, Don Dutile <ddutile@redhat.com> wrote:
> > maybe once i figure out the proper sriov sysfs code, I can duplicate
> > that effort in the reset,vga & pm-capabilities section.
> 
> I resent my patcheset in plain mail instead of attached form.
> 
> I am thinking we should ixgbe guys to sort out ixgbe change to use
> set_max_vfs ops.

I'm willing to do that but I'm sort of waiting for things to settle out a bit and see which set of patches everyone settles on before diving into it.

- Greg

> 
> Thanks
> 
> Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Yinghai Lu Oct. 3, 2012, 6:28 p.m. UTC | #16
On Wed, Oct 3, 2012 at 11:16 AM, Rose, Gregory V
<gregory.v.rose@intel.com> wrote:
>>
>> I resent my patcheset in plain mail instead of attached form.
>>
>> I am thinking we should ixgbe guys to sort out ixgbe change to use
>> set_max_vfs ops.
>
> I'm willing to do that but I'm sort of waiting for things to settle out a bit and see which set of patches everyone settles on before diving into it.

That would be great !

Actually your changes to ixgbe should be critical changes.

others change to drivers/pci/pci-sysfs.c or adding 3 or 1 methods in driver ops
is not that really matter.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c
index 6869009..1b5eab7 100644
--- a/drivers/pci/pci-sysfs.c
+++ b/drivers/pci/pci-sysfs.c
@@ -404,6 +404,152 @@  static ssize_t d3cold_allowed_show(struct device *dev,
 }
 #endif
 
+
+#ifdef CONFIG_PCI_IOV
+static ssize_t sriov_max_vfs_show(struct device *dev,
+				  struct device_attribute *attr, 
+				  char *buf)
+{
+	struct pci_dev *pdev;
+
+	pdev = to_pci_dev(dev);
+	return sprintf (buf, "%u\n", pdev->sriov->total);
+}
+
+bool pci_dev_has_sriov(struct pci_dev *pdev)
+{
+	int ret = false;
+	int pos;
+
+        if (!pci_is_pcie(pdev))
+		goto out;
+
+	pos =  pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_SRIOV);
+	if (pos)
+		ret = true;
+out:
+	return ret;
+}
+
+static ssize_t sriov_enable_vfs_show(struct device *dev,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct pci_dev *pdev;
+	int nr_virtfn = 0;
+
+	pdev = to_pci_dev(dev);
+
+	if (pci_dev_has_sriov(pdev))
+		nr_virtfn = pdev->sriov->nr_virtfn;
+
+	return sprintf (buf, "%u\n", nr_virtfn);
+}
+
+static ssize_t sriov_enable_vfs_store(struct device *dev,
+				      struct device_attribute *attr,
+				      const char *buf, size_t count)
+{
+	struct pci_dev *pdev;
+	int num_vf_enabled = 0;
+	unsigned long num_vfs;
+	pdev = to_pci_dev(dev);
+	
+	if (!pci_dev_has_sriov(pdev))
+		goto out;
+
+	/* Requested VFs to enable < max_vfs 
+	 * and none enabled already
+	 */
+	if (strict_strtoul(buf, 0, &num_vfs) < 0)
+			return -EINVAL;
+
+	if ((num_vfs > pdev->sriov->total) ||
+	    (pdev->sriov->nr_virtfn != 0))
+		return -EINVAL;
+
+	/* Ready for lift-off! */
+	if (pdev->driver && pdev->driver->sriov_enable) {
+		num_vf_enabled = pdev->driver->sriov_enable(pdev, num_vfs);
+	}
+
+out:
+	if (num_vf_enabled != num_vfs)
+		printk(KERN_WARNING 
+			"%s: %04x:%02x:%02x.%d: Only %d VFs enabled \n",
+			pci_name(pdev), pci_domain_nr(pdev->bus),
+			pdev->bus->number, PCI_SLOT(pdev->devfn),
+			PCI_FUNC(pdev->devfn), num_vf_enabled);
+
+	return count;
+}
+
+static ssize_t sriov_disable_vfs_store(struct device *dev,
+				       struct device_attribute *attr,
+				       const char *buf, size_t count)
+{
+	struct pci_dev *pdev;
+
+	pdev = to_pci_dev(dev);
+
+	/* make sure sriov device & at least 1 vf enabled */
+	if (!pci_dev_has_sriov(pdev) ||
+	   (pdev->sriov->nr_virtfn == 0))
+		goto out;
+
+	/* Ready for landing! */
+	if (pdev->driver && pdev->driver->sriov_disable) {
+		pdev->driver->sriov_disable(pdev);
+	}
+out:
+	return count;
+}
+
+struct device_attribute pci_dev_sriov_attrs[] = {
+	__ATTR_RO(sriov_max_vfs),
+	__ATTR(sriov_enable_vfs, (S_IRUGO|S_IWUSR|S_IWGRP), 
+		sriov_enable_vfs_show, sriov_enable_vfs_store),
+	__ATTR(sriov_disable_vfs, (S_IWUSR|S_IWGRP), 
+		NULL, sriov_disable_vfs_store),
+	__ATTR_NULL,
+};
+
+static int pci_sriov_create_sysfs_dev_files(struct pci_dev *dev)
+{
+	int pos, i;
+	int retval=0;
+
+	if ((dev->is_physfn) && pci_is_pcie(dev)) {
+            pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
+            if (pos) {
+		for (i = 0; attr_name(pci_dev_sriov_attrs[i]); i++) {
+		    retval = device_create_file(&dev->dev, &pci_dev_sriov_attrs[i]);
+		    if (retval) {
+			while (--i >= 0)
+			     device_remove_file(&dev->dev, &pci_dev_sriov_attrs[i]);
+			break;
+		    }
+		}
+	    }
+	}
+	return retval;
+}
+
+static void pci_sriov_remove_sysfs_dev_files(struct pci_dev *dev)
+{
+	int pos;
+
+	if ((dev->is_physfn) && pci_is_pcie(dev)) {
+            pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_SRIOV);
+            if (pos)
+		device_remove_file(&dev->dev, pci_dev_sriov_attrs);
+	}
+}
+#else
+static int pci_sriov_create_sysfs_dev_files(struct pci_dev *dev) { return 0; }
+static void pci_sriov_remove_sysfs_dev_files(struct pci_dev *dev) { return; }
+#endif /* CONFIG_PCI_IOV */
+
 struct device_attribute pci_dev_attrs[] = {
 	__ATTR_RO(resource),
 	__ATTR_RO(vendor),
@@ -1169,6 +1315,22 @@  static ssize_t reset_store(struct device *dev,
 
 static struct device_attribute reset_attr = __ATTR(reset, 0200, NULL, reset_store);
 
+static void pci_reset_remove_sysfs_dev_file(struct pci_dev *dev)
+{
+	if (dev->reset_fn) {
+		device_remove_file(&dev->dev, &reset_attr);
+		dev->reset_fn = 0;
+	}
+}
+
+static void pci_vpd_remove_sysfs_dev_file(struct pci_dev *dev)
+{
+	if (dev->vpd && dev->vpd->attr) {
+		sysfs_remove_bin_file(&dev->dev.kobj, dev->vpd->attr);
+		kfree(dev->vpd->attr);
+	}
+}
+
 static int pci_create_capabilities_sysfs(struct pci_dev *dev)
 {
 	int retval;
@@ -1203,14 +1365,18 @@  static int pci_create_capabilities_sysfs(struct pci_dev *dev)
 			goto error;
 		dev->reset_fn = 1;
 	}
+
+	/* SRIOV */
+	retval = pci_sriov_create_sysfs_dev_files(dev);
+	if (retval)
+		goto error;
+
 	return 0;
 
 error:
+	pci_reset_remove_sysfs_dev_file(dev);
 	pcie_aspm_remove_sysfs_dev_files(dev);
-	if (dev->vpd && dev->vpd->attr) {
-		sysfs_remove_bin_file(&dev->dev.kobj, dev->vpd->attr);
-		kfree(dev->vpd->attr);
-	}
+	pci_vpd_remove_sysfs_dev_file(dev);
 
 	return retval;
 }
@@ -1303,16 +1469,10 @@  err:
 
 static void pci_remove_capabilities_sysfs(struct pci_dev *dev)
 {
-	if (dev->vpd && dev->vpd->attr) {
-		sysfs_remove_bin_file(&dev->dev.kobj, dev->vpd->attr);
-		kfree(dev->vpd->attr);
-	}
-
+	pci_sriov_remove_sysfs_dev_files(dev);
+	pci_reset_remove_sysfs_dev_file(dev);
 	pcie_aspm_remove_sysfs_dev_files(dev);
-	if (dev->reset_fn) {
-		device_remove_file(&dev->dev, &reset_attr);
-		dev->reset_fn = 0;
-	}
+	pci_vpd_remove_sysfs_dev_file(dev);
 }
 
 /**
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5faa831..29e10aa 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -596,6 +596,8 @@  struct pci_driver {
 	int  (*resume_early) (struct pci_dev *dev);
 	int  (*resume) (struct pci_dev *dev);	                /* Device woken up */
 	void (*shutdown) (struct pci_dev *dev);
+        int (*sriov_enable) (struct pci_dev *dev, int num_vfs);    /* PF pci dev */
+        void (*sriov_disable) (struct pci_dev *dev); 
 	struct pci_error_handlers *err_handler;
 	struct device_driver	driver;
 	struct pci_dynids dynids;