
pci: Limit VPD reads for all Intel Ethernet devices

Message ID 20150519000042.56109.69779.stgit@mdrustad-wks.jf.intel.com
State Changes Requested

Commit Message

Rustad, Mark D May 19, 2015, midnight UTC
To save boot time and some memory, limit VPD size to the maximum
possible for all Intel Ethernet devices that have VPD, which is 1K.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/pci/quirks.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)



Comments

Alexander Duyck May 19, 2015, 3:54 p.m. UTC | #1
On 05/18/2015 05:00 PM, Mark D Rustad wrote:
> To save boot time and some memory, limit VPD size to the maximum
> possible for all Intel Ethernet devices that have VPD, which is 1K.
>
> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
>   drivers/pci/quirks.c |    7 +++++--
>   1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index c6dc1dfd25d5..4fabbeda964a 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -1903,12 +1903,15 @@ static void quirk_netmos(struct pci_dev *dev)
>   DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
>   			 PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
>   
> -static void quirk_e100_interrupt(struct pci_dev *dev)
> +static void quirk_intel_enet(struct pci_dev *dev)
>   {
>   	u16 command, pmcsr;
>   	u8 __iomem *csr;
>   	u8 cmd_hi;
>   
> +	if (dev->vpd)
> +		dev->vpd->len = 0x400;
> +
>   	switch (dev->device) {
>   	/* PCI IDs taken from drivers/net/e100.c */
>   	case 0x1029:
> @@ -1967,7 +1970,7 @@ static void quirk_e100_interrupt(struct pci_dev *dev)
>   	iounmap(csr);
>   }
>   DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
> -			PCI_CLASS_NETWORK_ETHERNET, 8, quirk_e100_interrupt);
> +			      PCI_CLASS_NETWORK_ETHERNET, 8, quirk_intel_enet);
>   
>   /*
>    * The 82575 and 82598 may experience data corruption issues when transitioning
>

I wasn't a fan of the first VPD patch and this clinches it.  What I 
would recommend doing is identifying all of the functions for a given 
device that share a VPD and then eliminating the VPD structure for all 
but the first function.  By doing that, the OS should treat the other 
functions as though their VPD areas don't exist.

The way I would code it would be to scan for any other functions with 
the same bus and device number.  If one with a lower function number 
has a VPD area, we delete our own VPD area; if one with a higher 
function number has a VPD area, we delete that one's VPD area; if the 
search returns our own function number, we resume the search from 
there; and we do nothing if no other functions with VPD are found.  The 
general idea is to sort things until the function closest to function 0 
is the only one with a VPD area.
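
A rough sketch of that approach (simplified to checking lower-numbered
functions only, since the fixup runs once per function; the function
name and the use of vpd->len to hide an area are illustrative, not code
from this thread) might look like:

static void quirk_shared_vpd(struct pci_dev *dev)
{
	unsigned int fn;

	if (!dev->vpd || !dev->multifunction)
		return;

	/* If a lower-numbered function on the same slot already exposes
	 * the shared VPD, hide this function's copy.
	 */
	for (fn = 0; fn < PCI_FUNC(dev->devfn); fn++) {
		struct pci_dev *other = pci_get_slot(dev->bus,
				PCI_DEVFN(PCI_SLOT(dev->devfn), fn));

		if (other && other->vpd && other->vpd->len) {
			dev->vpd->len = 0;	/* defer to the other function */
			pci_dev_put(other);
			return;
		}
		pci_dev_put(other);
	}
}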

Artificially limiting the size of the VPD does nothing but cut off 
possibly useful data.  You would be better off providing all of the 
data on only the first function than providing only partial data on 
all functions and adding extra lock overhead.

- Alex
Rustad, Mark D May 19, 2015, 4:19 p.m. UTC | #2
> On May 19, 2015, at 8:54 AM, Alexander Duyck <alexander.h.duyck@redhat.com> wrote:
> 
> On 05/18/2015 05:00 PM, Mark D Rustad wrote:
>> To save boot time and some memory, limit VPD size to the maximum
>> possible for all Intel Ethernet devices that have VPD, which is 1K.
>> 
>> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>> ---
>>  drivers/pci/quirks.c |    7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>> index c6dc1dfd25d5..4fabbeda964a 100644
>> --- a/drivers/pci/quirks.c
>> +++ b/drivers/pci/quirks.c
>> @@ -1903,12 +1903,15 @@ static void quirk_netmos(struct pci_dev *dev)
>>  DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
>>  			 PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
>>  -static void quirk_e100_interrupt(struct pci_dev *dev)
>> +static void quirk_intel_enet(struct pci_dev *dev)
>>  {
>>  	u16 command, pmcsr;
>>  	u8 __iomem *csr;
>>  	u8 cmd_hi;
>>  +	if (dev->vpd)
>> +		dev->vpd->len = 0x400;
>> +
>>  	switch (dev->device) {
>>  	/* PCI IDs taken from drivers/net/e100.c */
>>  	case 0x1029:

> I wasn't a fan of the first VPD patch and this clinches it.  What I would recommend doing is identifying all of the functions for a given device that share a VPD and then eliminate the VPD structure for all but the first function.  By doing that the OS should treat the other functions as though their VPD areas don't exist.

Please, let's discuss only *this* patch in this thread. The patches are not related except that they both have to do with VPD.

<snip>

> Artificially limiting the size of the VPD does nothing but cut off possibly useful data. You would be better off providing all of the data on only the first function than providing only partial data on all functions and adding extra lock overhead.

This limit only limits the maximum that the OS will read to what is architecturally possible in these devices. Yes, PCIe architecturally provides for the possibility of more, but these devices do not. More repeating data can be read, however slowly, but there is no possibility of useful content beyond the first 1K. If this limit were set to 0x100, which is more in line with what the actual usage is, it would be an artificial limit, but at 1K it is not. Oh, and it does include devices made by others that incorporate Intel Ethernet silicon, not just Intel-built devices.

Since this quirk function was already being run for every Intel Ethernet device, this seemed like a trivial thing to do to speed up booting a bit. It has the greatest effect with 82599 devices. Newer devices will respond to reads faster.

--
Mark Rustad, Networking Division, Intel Corporation
Alexander Duyck May 19, 2015, 5:50 p.m. UTC | #3
On 05/19/2015 09:19 AM, Rustad, Mark D wrote:
>> On May 19, 2015, at 8:54 AM, Alexander Duyck <alexander.h.duyck@redhat.com> wrote:
>>
>> On 05/18/2015 05:00 PM, Mark D Rustad wrote:
>>> To save boot time and some memory, limit VPD size to the maximum
>>> possible for all Intel Ethernet devices that have VPD, which is 1K.
>>>
>>> Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
>>> Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
>>> ---
>>>   drivers/pci/quirks.c |    7 +++++--
>>>   1 file changed, 5 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
>>> index c6dc1dfd25d5..4fabbeda964a 100644
>>> --- a/drivers/pci/quirks.c
>>> +++ b/drivers/pci/quirks.c
>>> @@ -1903,12 +1903,15 @@ static void quirk_netmos(struct pci_dev *dev)
>>>   DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
>>>   			 PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
>>>   -static void quirk_e100_interrupt(struct pci_dev *dev)
>>> +static void quirk_intel_enet(struct pci_dev *dev)
>>>   {
>>>   	u16 command, pmcsr;
>>>   	u8 __iomem *csr;
>>>   	u8 cmd_hi;
>>>   +	if (dev->vpd)
>>> +		dev->vpd->len = 0x400;
>>> +
>>>   	switch (dev->device) {
>>>   	/* PCI IDs taken from drivers/net/e100.c */
>>>   	case 0x1029:
>> I wasn't a fan of the first VPD patch and this clinches it.  What I would recommend doing is identifying all of the functions for a given device that share a VPD and then eliminate the VPD structure for all but the first function.  By doing that the OS should treat the other functions as though their VPD areas don't exist.
> Please, let's discuss only *this* patch in this thread. The patches are not related except that they both have to do with VPD.
>
> <snip>

These two patches are very much related.  The fact is you are 
implementing this to "save boot time and memory", which implies that 
VPD accesses are very slow.  One good reason why VPD accesses might be 
slower is that accesses are now serialized per bus due to your earlier 
patch.  This likely means that other vendors will want to do the same 
thing, which is why I would say that the first patch needs to be replaced.

>> Artificially limiting the size of the VPD does nothing but cut off possibly useful data. You would be better off providing all of the data on only the first function than providing only partial data on all functions and adding extra lock overhead.
> This limit only limits the maximum that the OS will read to what is architecturally possible in these devices. Yes, PCIe architecturally provides for the possibility of more, but these devices do not. More repeating data can be read, however slowly, but there is no possibility of useful content beyond the first 1K. If this limit were set to 0x100, which is more in line with what the actual usage is, it would be an artificial limit, but at 1K it is not. Oh, and it does include devices made by others that incorporate Intel Ethernet silicon, not just Intel-built devices.

As per section 3.4.4 of the X540 datasheet, the upper addressable range 
for the VPD section is 0xFFF, which means the upper limit for the 
hardware is 0x1000, not 0x400.
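
For reference, that would make the hunk above a one-line change (an
illustrative revision reflecting that limit, not a hunk from this
thread):

	/* Cap VPD reads at the addressable range of these devices
	 * (0x000-0xFFF) rather than at 0x400.
	 */
	if (dev->vpd)
		dev->vpd->len = 0x1000;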

> Since this quirk function was already being run for every Intel Ethernet device, this seemed like a trivial thing to do to speed up booting a bit. It has the greatest effect with 82599 devices. Newer devices will respond to reads faster.

This is run for EVERY Intel Ethernet device.  Not just some of the 1G or 
10G server parts, everything.  This casts a very wide net and is 
highly likely to introduce some sort of regression.  It includes every 
3rd party part out there based on Intel silicon, and messing with 
something like this is kind of a big issue.  That is why I say you are 
better off limiting the VPD access to one function instead of allowing 
only partial access from any Intel Ethernet function.

The fact is VPD is not vital.  It is simply product data for inventory 
tracking and the like.  You are much better off simply limiting this to 
one function for your parts, since all of the Intel Ethernet silicon 
implements one VPD area that is shared between all functions in a device.

- Alex

Rustad, Mark D May 19, 2015, 6:38 p.m. UTC | #4
> On May 19, 2015, at 10:50 AM, Alexander Duyck <alexander.h.duyck@redhat.com> wrote:
> 
> These two patches are very much related.

They are only related because I saw an opportunity to do this while working on the other issue. That is the only relationship.

<snip material on the other patch>

>>> Artificially limiting the size of the VPD does nothing but cut off possibly useful data. You would be better off providing all of the data on only the first function than providing only partial data on all functions and adding extra lock overhead.
>> This limit only limits the maximum that the OS will read to what is architecturally possible in these devices. Yes, PCIe architecturally provides for the possibility of more, but these devices do not. More repeating data can be read, however slowly, but there is no possibility of useful content beyond the first 1K. If this limit were set to 0x100, which is more in line with what the actual usage is, it would be an artificial limit, but at 1K it is not. Oh, and it does include devices made by others that incorporate Intel Ethernet silicon, not just Intel-built devices.
> 
> As per section 3.4.4 of the X540 datasheet the upper addressable range for the VPD section is 0xFFF which means the upper limit for the hardware is 0x1000, not 0x400.

Ok. I have no problem changing it to that. I had been looking at other specs, but 0x1000 really is a hard limit.

<snip more material mostly relating to the other patch>

--
Mark Rustad, Networking Division, Intel Corporation
Alexander Duyck May 19, 2015, 8:39 p.m. UTC | #5
On 05/19/2015 11:38 AM, Rustad, Mark D wrote:
>> On May 19, 2015, at 10:50 AM, Alexander Duyck <alexander.h.duyck@redhat.com> wrote:
>>
>> These two patches are very much related.
> They are only related because I saw an opportunity to do this while working on the other issue. That is the only relationship.
>
> <snip material on the other patch>
>
>>>> Artificially limiting the size of the VPD does nothing but cut off possibly useful data. You would be better off providing all of the data on only the first function than providing only partial data on all functions and adding extra lock overhead.
>>> This limit only limits the maximum that the OS will read to what is architecturally possible in these devices. Yes, PCIe architecturally provides for the possibility of more, but these devices do not. More repeating data can be read, however slowly, but there is no possibility of useful content beyond the first 1K. If this limit were set to 0x100, which is more in line with what the actual usage is, it would be an artificial limit, but at 1K it is not. Oh, and it does include devices made by others that incorporate Intel Ethernet silicon, not just Intel-built devices.
>> As per section 3.4.4 of the X540 datasheet the upper addressable range for the VPD section is 0xFFF which means the upper limit for the hardware is 0x1000, not 0x400.
> Ok. I have no problem changing it to that. I had been looking at other specs, but 0x1000 really is a hard limit.
>
> <snip more material mostly relating to the other patch>

So how does this improve boot time anyway?  The original patch 
description said this improved boot time and reduced memory usage, but 
I have yet to find where any of those gains would actually occur.  If 
you can point me in that direction, I might have a better idea of the 
motivations behind this.

- Alex

Rustad, Mark D May 19, 2015, 9:04 p.m. UTC | #6
> On May 19, 2015, at 1:39 PM, Alexander Duyck <alexander.h.duyck@redhat.com> wrote:
> 
> So how does this improve boot time anyway?  The original patch description said this improved boot time and reduced memory usage but I have yet to find where any of those gains would actually occur.  If you can point me in that direction I might have a better idea of the motivations behind this.

I have to admit that you won't find them in the kernel, but systems that read VPD in the course of starting up will benefit by not slowly reading lots of useless data. At least those reads will be slow on any device that reads VPD from slow hardware. Since the kernel would never perform any of those reads itself, in isolation the kernel has no benefit. However, the system gets a benefit.

I can delete the reference to a memory benefit, since it would be a fleeting benefit at most and not a kernel memory footprint issue.

This is not a big deal for me. If you don't see the value, drop it. Some people are interested in speeding up boot. Apparently VPD is now being accessed in parallel by user space as a result of such work, exposing the problem addressed in the other patch. I saw this opportunity while trying to resolve that issue. If it isn't worth it, drop it.

--
Mark Rustad, Networking Division, Intel Corporation
Alexander H Duyck May 19, 2015, 9:17 p.m. UTC | #7
On 05/19/2015 02:04 PM, Rustad, Mark D wrote:
>> On May 19, 2015, at 1:39 PM, Alexander Duyck <alexander.h.duyck@redhat.com> wrote:
>>
>> So how does this improve boot time anyway?  The original patch description said this improved boot time and reduced memory usage but I have yet to find where any of those gains would actually occur.  If you can point me in that direction I might have a better idea of the motivations behind this.
> I have to admit that you won't find them in the kernel, but systems that read VPD in the course of starting up will benefit by not reading lots of useless stuff so slowly. At least those reads will be slow on any device that reads VPD from slow hardware. Since the kernel would never perform any of those reads itself, in isolation the kernel has no benefit. However the system gets a benefit.
>
> I can delete the reference to a memory benefit, since it would be a fleeting benefit at most and not a kernel memory footprint issue.
>
> This is not a big deal for me. If you don't see the value, drop it. Some people are interested in speeding up boot. Apparently VPD is now being accessed in parallel by user space as a result of such work, exposing the problem addressed in the other patch. I saw this opportunity while trying to resolve that issue. If it isn't worth it, drop it.

Any chance you could point me toward the software in question?  Just 
wondering because it seems like what you are fixing with this is an 
implementation issue in the application, since you really shouldn't be 
accessing areas outside the scope of the VPD data structure, and what 
happens if you do is undefined.

- Alex
Rustad, Mark D May 19, 2015, 10:43 p.m. UTC | #8
> On May 19, 2015, at 2:17 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> 
> Any chance you could point me toward the software in question?  Just wondering because it seems like what you are fixing with this is an implementation issue in the application since you really shouldn't be accessing areas outside the scope of the VPD data structure, and doing so is undefined in terms of what happens if you do.

I don't have it, but if you dump VPD via sysfs, you will see that it comes out as 32K in size. The kernel just blindly provides access to the full 32K space provided by the spec. I'm sure that we agree that the kernel should not go parse it and find the actual size. If it is read via stdio, say fread, each read access would be whatever buffer size it chooses to use.
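
To illustrate why that matters, a naive userspace dump (hypothetical
device address; any function with VPD would do) just reads the sysfs
node until EOF, so it walks the entire exposed window:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/bus/pci/devices/0000:01:00.0/vpd", "rb");
	char buf[4096];
	size_t n, total = 0;

	if (!f)
		return 1;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		total += n;	/* 32768 bytes with the unrestricted size */
	fclose(f);
	printf("read %zu bytes of VPD\n", total);
	return 0;
}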

If you looked at the quirks, you might have noticed that Broadcom limited the VPD access for some devices for functional reasons. That is what gave me the idea for limiting access to what was possibly there. With the existing Intel Ethernet quirk, it seemed like a simple thing to do.

--
Mark Rustad, Networking Division, Intel Corporation
Alexander Duyck May 19, 2015, 11:42 p.m. UTC | #9
On 05/19/2015 03:43 PM, Rustad, Mark D wrote:
>> On May 19, 2015, at 2:17 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
>>
>> Any chance you could point me toward the software in question?  Just wondering because it seems like what you are fixing with this is an implementation issue in the application since you really shouldn't be accessing areas outside the scope of the VPD data structure, and doing so is undefined in terms of what happens if you do.
> I don't have it, but if you dump VPD via sysfs you will see it comes out as 32k in size. The kernel just blindly provides access to the full 32K space provided by the spec. I'm sure that we agree that the kernel should not go parse it and find the actual size. If it is read via stdio, say fread, the read access would be whatever buffer size it chooses to use.
>
> If you looked at the quirks, you might have noticed that Broadcom limited the VPD access for some devices for functional reasons. That is what gave me the idea for limiting access to what was possibly there. With the existing Intel Ethernet quirk, it seemed like a simple thing to do.

Actually, we probably should be parsing through the VPD data.  The PCIe 
spec doesn't define what happens if you read past the end marker, and I 
suspect most applications are probably performing sequential reads of 
the data instead of just accessing offsets anyway, since that is how 
this is really meant to be accessed.  So if we moved this to a sequenced 
interface instead of a memory-mapped style interface, it would probably 
work out better anyway, since we could perform multiple reads in 
sequence instead of one at a time.
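
A sketch of that kind of parse (illustrative only, not code from this
thread): walk the VPD resource tags and treat the offset just past the
end tag as the effective length:

static size_t vpd_size_from_tags(const unsigned char *buf, size_t max)
{
	size_t off = 0;

	while (off < max) {
		unsigned char tag = buf[off];

		if (tag & 0x80) {
			/* Large resource: tag byte plus 2-byte little-endian length */
			size_t len;

			if (off + 3 > max)
				break;
			len = buf[off + 1] | (buf[off + 2] << 8);
			off += 3 + len;
		} else {
			/* Small resource: the end tag (0x78) terminates the list */
			if ((tag & 0x78) == 0x78)
				return off + 1;
			off += 1 + (tag & 0x07);
		}
	}
	return max;	/* no end tag found; fall back to the full window */
}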

- Alex


Patch

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index c6dc1dfd25d5..4fabbeda964a 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -1903,12 +1903,15 @@  static void quirk_netmos(struct pci_dev *dev)
 DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_NETMOS, PCI_ANY_ID,
 			 PCI_CLASS_COMMUNICATION_SERIAL, 8, quirk_netmos);
 
-static void quirk_e100_interrupt(struct pci_dev *dev)
+static void quirk_intel_enet(struct pci_dev *dev)
 {
 	u16 command, pmcsr;
 	u8 __iomem *csr;
 	u8 cmd_hi;
 
+	if (dev->vpd)
+		dev->vpd->len = 0x400;
+
 	switch (dev->device) {
 	/* PCI IDs taken from drivers/net/e100.c */
 	case 0x1029:
@@ -1967,7 +1970,7 @@  static void quirk_e100_interrupt(struct pci_dev *dev)
 	iounmap(csr);
 }
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_INTEL, PCI_ANY_ID,
-			PCI_CLASS_NETWORK_ETHERNET, 8, quirk_e100_interrupt);
+			      PCI_CLASS_NETWORK_ETHERNET, 8, quirk_intel_enet);
 
 /*
  * The 82575 and 82598 may experience data corruption issues when transitioning