PCI: minor performance optimization

Message ID 1448016301-20944-1-git-send-email-caoj.fnst@cn.fujitsu.com
State New

Commit Message

Cao jin Nov. 20, 2015, 10:45 a.m. UTC
1. Add a parameter check to pci_add_capability2(), as it is a public API.
2. As the spec says, each capability must be DWORD aligned, so the search loop
   in pci_find_space() can advance 4 bytes per iteration instead of 1.

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
---
 hw/pci/pci.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Michael S. Tsirkin Nov. 20, 2015, 10:45 a.m. UTC | #1
On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
> 1. Do param check in pci_add_capability2(), as it is a public API.

Separate patch pls.

> 2. As spec says, each capability must be DWORD aligned, so an optimization can
>    be done via Loop Unrolling.

Why do we want to optimize it?

Cao jin Nov. 20, 2015, 11:04 a.m. UTC | #2
On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
>> 1. Do param check in pci_add_capability2(), as it is a public API.
>
> Separate patch pls.

OK

>
>> 2. As spec says, each capability must be DWORD aligned, so an optimization can
>>     be done via Loop Unrolling.
>
> Why do we want to optimize it?
>

For a tiny performance improvement from fewer loop iterations. Take the PCI
Express capability (60 bytes at most) as an example: the loop may run 60 times
today, and with this change only 15 times, a quarter as many.

Michael S. Tsirkin Nov. 20, 2015, 11:26 a.m. UTC | #3
On Fri, Nov 20, 2015 at 07:04:07PM +0800, Cao jin wrote:
> 
> 
> On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
> >On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
> >>1. Do param check in pci_add_capability2(), as it is a public API.
> >
> >Separate patch pls.
> 
> OK
> 
> >
> >>2. As spec says, each capability must be DWORD aligned, so an optimization can
> >>    be done via Loop Unrolling.
> >
> >Why do we want to optimize it?
> >
> 
> For a tiny performance improvement from fewer loop iterations. Take the PCI
> Express capability (60 bytes at most) as an example: the loop may run 60
> times today, and with this change only 15 times, a quarter as many.

But who cares? This is not a data path operation.

Cao jin Nov. 20, 2015, 11:58 a.m. UTC | #4
On 11/20/2015 07:26 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 20, 2015 at 07:04:07PM +0800, Cao jin wrote:
>>
>>
>> On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
>>> On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
>>>
>>>> 2. As spec says, each capability must be DWORD aligned, so an optimization can
>>>>     be done via Loop Unrolling.
>>>
>>> Why do we want to optimize it?
>>>
>>
>> For a tiny performance improvement from fewer loop iterations. Take the PCI
>> Express capability (60 bytes at most) as an example: the loop may run 60
>> times today, and with this change only 15 times, a quarter as many.
>
> But who cares? This is not a data path operation.

It is a tiny thing I found when browsing the code. When I found that several
places look like this, I thought it might do QEMU some good to change them,
and I CCed you because it doesn't look like a simple trivial patch.

So, hey Michael, if you don't like this kind of optimization, that's OK,
forget it. But it leaves me a little confused about which kinds of patches
should be CCed to you.

Michael S. Tsirkin Nov. 20, 2015, 1:30 p.m. UTC | #5
On Fri, Nov 20, 2015 at 07:58:01PM +0800, Cao jin wrote:
> 
> 
> On 11/20/2015 07:26 PM, Michael S. Tsirkin wrote:
> >On Fri, Nov 20, 2015 at 07:04:07PM +0800, Cao jin wrote:
> >>
> >>
> >>On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
> >>>On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
> >>>
> >>>>2. As spec says, each capability must be DWORD aligned, so an optimization can
> >>>>    be done via Loop Unrolling.
> >>>
> >>>Why do we want to optimize it?
> >>>
> >>
> >>For a tiny performance improvement from fewer loop iterations. Take the PCI
> >>Express capability (60 bytes at most) as an example: the loop may run 60
> >>times today, and with this change only 15 times, a quarter as many.
> >
> >But who cares? This is not a data path operation.
> 
> It is a tiny thing I found when browsing the code. When I found that several
> places look like this, I thought it might do QEMU some good to change them,
> and I CCed you because it doesn't look like a simple trivial patch.
> 
> So, hey Michael, if you don't like this kind of optimization, that's OK,
> forget it. But it leaves me a little confused about which kinds of patches
> should be CCed to you.

Optimization patches should normally include performance numbers
if they are to be merged.
Try to come up with a benchmark and you will realize that the speed of
this function has no effect under even half way realistic conditions.
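
A rough sketch of what such a benchmark could look like, written as a standalone model and not as QEMU code. The constants are assumed to be 0x40 and 0x100, the global `used` array stands in for `pdev->used`, and `bench_find_space_ns` is a hypothetical helper; no measured numbers are claimed here.

```c
#include <stdint.h>
#include <time.h>

/* Assumed values for this sketch: 64-byte header, 256-byte config space. */
#define PCI_CONFIG_HEADER_SIZE 0x40
#define PCI_CONFIG_SPACE_SIZE  0x100

/* Hypothetical stand-in for pdev->used. */
static uint8_t used[PCI_CONFIG_SPACE_SIZE];

/* Byte-wise first-fit search modeled on the unpatched loop. */
static uint8_t find_space(uint8_t size)
{
    int offset = PCI_CONFIG_HEADER_SIZE;
    int i;

    for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
        if (used[i]) {
            offset = i + 1;
        } else if (i - offset + 1 == size) {
            return offset;
        }
    }
    return 0;
}

/* Average CPU time per call, in nanoseconds, over `calls` searches. */
static double bench_find_space_ns(long calls, uint8_t size)
{
    volatile unsigned sink = 0;   /* keeps the result observable */
    clock_t start, end;
    long n;

    start = clock();
    for (n = 0; n < calls; n++) {
        /* Touch the map on every pass to discourage hoisting the call. */
        used[PCI_CONFIG_SPACE_SIZE - 1] = (uint8_t)(n & 1);
        sink += find_space(size);
    }
    end = clock();
    (void)sink;

    return (double)(end - start) / CLOCKS_PER_SEC * 1e9 / (double)calls;
}
```

Calling `bench_find_space_ns(1000000, 60)` gives a per-call cost that can then be set against how often a device actually adds a capability, which is the comparison the review asks for.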

Cao jin Nov. 21, 2015, 7:22 a.m. UTC | #6
On 11/20/2015 09:30 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 20, 2015 at 07:58:01PM +0800, Cao jin wrote:
>>
>>
>> On 11/20/2015 07:26 PM, Michael S. Tsirkin wrote:
>>> On Fri, Nov 20, 2015 at 07:04:07PM +0800, Cao jin wrote:
>>>>
>>>>
>>>> On 11/20/2015 06:45 PM, Michael S. Tsirkin wrote:
>>>>> On Fri, Nov 20, 2015 at 06:45:01PM +0800, Cao jin wrote:
>>>>>
>>>>>> 2. As spec says, each capability must be DWORD aligned, so an optimization can
>>>>>>     be done via Loop Unrolling.
>>>>>
>>>>> Why do we want to optimize it?
>>>>>
>>>>
>>>> For a tiny performance improvement from fewer loop iterations. Take the PCI
>>>> Express capability (60 bytes at most) as an example: the loop may run 60
>>>> times today, and with this change only 15 times, a quarter as many.
>>>
>>> But who cares? This is not a data path operation.
>>
>> It is a tiny thing I found when browsing the code. When I found that several
>> places look like this, I thought it might do QEMU some good to change them,
>> and I CCed you because it doesn't look like a simple trivial patch.
>>
>> So, hey Michael, if you don't like this kind of optimization, that's OK,
>> forget it. But it leaves me a little confused about which kinds of patches
>> should be CCed to you.
>
> Optimization patches should normally include performance numbers
> if they are to be merged.
> Try to come up with a benchmark and you will realize that the speed of
> this function has no effect under even half way realistic conditions.
>

Maybe you are right. OK, I will send the param check patch to qemu-trivial.

Patch

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 168b9cc..1e99603 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1924,13 +1924,15 @@  PCIDevice *pci_create_simple(PCIBus *bus, int devfn, const char *name)
 static uint8_t pci_find_space(PCIDevice *pdev, uint8_t size)
 {
     int offset = PCI_CONFIG_HEADER_SIZE;
-    int i;
-    for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
+    int i = PCI_CONFIG_HEADER_SIZE;
+
+    for (; i < PCI_CONFIG_SPACE_SIZE; i = i + 4) {
         if (pdev->used[i])
-            offset = i + 1;
-        else if (i - offset + 1 == size)
+            offset = i + 4;
+        else if (i - offset >= size)
             return offset;
     }
+
     return 0;
 }
 
@@ -2144,6 +2146,8 @@  int pci_add_capability2(PCIDevice *pdev, uint8_t cap_id,
     uint8_t *config;
     int i, overlapping_cap;
 
+    assert(size > 0);
+
     if (!offset) {
         offset = pci_find_space(pdev, size);
         if (!offset) {
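
For readers who want to compare the two sides of the first hunk, the following is a minimal standalone model, not QEMU code. `ModelDevice`, `find_space_bytewise`, `find_space_stride4` and `mark_used` are hypothetical names, and the constants are assumed to be 0x40 and 0x100. The `assert(size > 0)` sits inside the model functions only for convenience; the patch places it in pci_add_capability2().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed values for this sketch: 64-byte header, 256-byte config space. */
#define PCI_CONFIG_HEADER_SIZE 0x40
#define PCI_CONFIG_SPACE_SIZE  0x100

/* Hypothetical stand-in for the one PCIDevice field the search reads. */
typedef struct {
    uint8_t used[PCI_CONFIG_SPACE_SIZE];
} ModelDevice;

/* Byte-wise first-fit search, modeled on the "-" lines of the hunk.
 * With size == 0 the equality test below can never hold, hence the assert. */
static uint8_t find_space_bytewise(const ModelDevice *pdev, uint8_t size)
{
    int offset = PCI_CONFIG_HEADER_SIZE;
    int i;

    assert(size > 0);
    for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; ++i) {
        if (pdev->used[i]) {
            offset = i + 1;
        } else if (i - offset + 1 == size) {
            return offset;
        }
    }
    return 0;
}

/* 4-byte stride search, modeled on the "+" lines of the hunk. */
static uint8_t find_space_stride4(const ModelDevice *pdev, uint8_t size)
{
    int offset = PCI_CONFIG_HEADER_SIZE;
    int i;

    assert(size > 0);
    for (i = PCI_CONFIG_HEADER_SIZE; i < PCI_CONFIG_SPACE_SIZE; i += 4) {
        if (pdev->used[i]) {
            offset = i + 4;
        } else if (i - offset >= size) {
            return offset;
        }
    }
    return 0;
}

/* Mark the byte range [start, start + len) as occupied. */
static void mark_used(ModelDevice *pdev, int start, int len)
{
    memset(pdev->used + start, 1, (size_t)len);
}
```

In this model both loops return 0x40 for a 4-byte request on an empty map. With 0x44..0x47 occupied, the byte-wise loop still returns 0x40, while the stride loop returns 0x48, because `i - offset >= size` only succeeds after one more free dword has been inspected. For the same reason, when only 0xFC..0xFF is free, the byte-wise loop returns 0xFC and the stride loop returns 0. So the two versions are not equivalent in this model, separately from the performance question raised in the thread.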