[ovs-dev,v7,0/5] dpcls func ptrs & optimizations
mbox series

Message ID 20190411125604.70050-1-harry.van.haaren@intel.com
Headers show
Series
  • dpcls func ptrs & optimizations
Related show

Message

Harry van Haaren April 11, 2019, 12:55 p.m. UTC
TL;DR:
1) As per v6, the function pointer rework for dpcls
2) Last 2 patches include specialized scalar optimizations

Running with Eth/IPv4/UDP traffic should show performance improvements,
with EMC/SMC disabled (so just DPCLS traffic), on a simple test case there
is a > 15% speedup. Please test this patchset, and report back numbers!


Patchset Details;
The code is split into 5 patches to make the code traceable during
review, as the resulting code is quite different to today's dpcls_lookup.

Checkpatch flags two warnings, which I believe to not be sanely fixable
due to the way MACROs accept arguments.

Running TESTSUITE shows all passing, with ~28 tests being skipped, which
is the same as before this patchset.


Per patch details:
1) Refactor dpcls_lookup and the subtable for flexibility.
In particular, add a function pointer to the subtable
structure, which enables "plugging-in" a lookup function
at runtime. This enables a number of optimizations in future.

2) and 3)
With the function pointer in place, we refactor the existing
dpcls_lookup matching code into its own function, and later its
own file. To split it to its own file requires making various
dpcls data-structures available in the dpif-netdev.h header.

4)
Refactor the existing code, to favour compute of flat arrays of
miniflows, instead of the MACRO based iteration. This simplifies
the code itself, and makes future optimizations possible due to
simplified loop structures, and loop trip counts pass in via
function arguments. See commit message for more details.

5)
This patch implements a select few specialized functions, for handling
miniflows with 5-1, 4-1, and 4-0 miniflow unit bit patterns. More of
these types of functions can (and should) be added to accelerate other
patterns of subtable lookups! See commit message for more details.


As always: feedback, suggestions, performance numbers all welcome!
Regards -Harry


Harry van Haaren (5):
  dpif-netdev: implement function pointers/subtable
  dpif-netdev: move dpcls lookup structures to .h
  dpif-netdev: split out generic lookup function
  dpif-netdev: refactor generic implementation
  dpif-netdev: add specialized generic scalar functions

 lib/automake.mk                  |   1 +
 lib/dpif-netdev-lookup-generic.c | 342 +++++++++++++++++++++++++++++++
 lib/dpif-netdev.c                | 128 +++---------
 lib/dpif-netdev.h                |  83 ++++++++
 4 files changed, 456 insertions(+), 98 deletions(-)
 create mode 100644 lib/dpif-netdev-lookup-generic.c

Comments

Ilya Maximets April 15, 2019, 8:32 a.m. UTC | #1
On 11.04.2019 15:55, Harry van Haaren wrote:
> TL;DR:
> 1) As per v6, the function pointer rework for dpcls
> 2) Last 2 patches include specialized scalar optimizations
> 
> Running with Eth/IPv4/UDP traffic should show performance improvements,
> with EMC/SMC disabled (so just DPCLS traffic), on a simple test case there
> is a > 15% speedup. Please test this patchset, and report back numbers!
> 
> 
> Patchset Details;
> The code is split into 5 patches to make the code traceable during
> review, as the resulting code is quite different to today's dpcls_lookup.
> 
> Checkpatch flags two warnings, which I believe to not be sanely fixable
> due to the way MACROs accept arguments.
> 
> Running TESTSUITE shows all passing, with ~28 tests being skipped, which
> is the same as before this patchset.
> 
> 
> Per patch details:
> 1) Refactor dpcls_lookup and the subtable for flexibility.
> In particular, add a function pointer to the subtable
> structure, which enables "plugging-in" a lookup function
> at runtime. This enables a number of optimizations in future.
> 
> 2) and 3)
> With the function pointer in place, we refactor the existing
> dpcls_lookup matching code into its own function, and later its
> own file. To split it to its own file requires making various
> dpcls data-structures available in the dpif-netdev.h header.
> 
> 4)
> Refactor the existing code, to favour compute of flat arrays of
> miniflows, instead of the MACRO based iteration. This simplifies
> the code itself, and makes future optimizations possible due to
> simplified loop structures, and loop trip counts pass in via
> function arguments. See commit message for more details.
> 
> 5)
> This patch implements a select few specialized functions, for handling
> miniflows with 5-1, 4-1, and 4-0 miniflow unit bit patterns. More of
> these types of functions can (and should) be added to accelerate other
> patterns of subtable lookups! See commit message for more details.
> 
> 
> As always: feedback, suggestions, performance numbers all welcome!
> Regards -Harry
> 
> 
> Harry van Haaren (5):
>   dpif-netdev: implement function pointers/subtable
>   dpif-netdev: move dpcls lookup structures to .h
>   dpif-netdev: split out generic lookup function
>   dpif-netdev: refactor generic implementation
>   dpif-netdev: add specialized generic scalar functions
> 
>  lib/automake.mk                  |   1 +
>  lib/dpif-netdev-lookup-generic.c | 342 +++++++++++++++++++++++++++++++
>  lib/dpif-netdev.c                | 128 +++---------
>  lib/dpif-netdev.h                |  83 ++++++++
>  4 files changed, 456 insertions(+), 98 deletions(-)
>  create mode 100644 lib/dpif-netdev-lookup-generic.c
> 

Hi. This patch-set fails the sparse build:

libtool: compile:  env "REAL_CC=gcc -std=gnu99" "CHECK=sparse  -I ./include/sparse -m64 -I /usr/local/include -I /usr/include/x86_64-linux-gnu  " cgcc -target=x86_64 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -Werror -Wsparse-error -g -O2 -MT lib/dpif-netdev-lookup-generic.lo -MD -MP -MF lib/.deps/dpif-netdev-lookup-generic.Tpo -c lib/dpif-netdev-lookup-generic.c -o lib/dpif-netdev-lookup-generic.o
lib/dpif-netdev-lookup-generic.c:203:43: error: Variable length array is used.
lib/dpif-netdev-lookup-generic.c:210:23: error: Variable length array is used.
make[2]: *** [lib/dpif-netdev-lookup-generic.lo] Error 1

https://travis-ci.org/ovsrobot/ovs/jobs/518798999

Sparse and MSVC doesn't like variable length arrays.

Best regards, Ilya Maximets.
Harry van Haaren April 15, 2019, 6:50 p.m. UTC | #2
> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets@samsung.com]
> Sent: Monday, April 15, 2019 9:33 AM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; ovs-dev@openvswitch.org
> Cc: Stokes, Ian <ian.stokes@intel.com>; aconole@redhat.com;
> echaudro@redhat.com
> Subject: Re: [PATCH v7 0/5] dpcls func ptrs & optimizations
> 
> On 11.04.2019 15:55, Harry van Haaren wrote:
<snip>
> > Harry van Haaren (5):
> >   dpif-netdev: implement function pointers/subtable
> >   dpif-netdev: move dpcls lookup structures to .h
> >   dpif-netdev: split out generic lookup function
> >   dpif-netdev: refactor generic implementation
> >   dpif-netdev: add specialized generic scalar functions
> >
> >  lib/automake.mk                  |   1 +
> >  lib/dpif-netdev-lookup-generic.c | 342 +++++++++++++++++++++++++++++++
> >  lib/dpif-netdev.c                | 128 +++---------
> >  lib/dpif-netdev.h                |  83 ++++++++
> >  4 files changed, 456 insertions(+), 98 deletions(-)
> >  create mode 100644 lib/dpif-netdev-lookup-generic.c
> >
> 
> Hi. This patch-set fails the sparse build:
> 
> libtool: compile:  env "REAL_CC=gcc -std=gnu99" "CHECK=sparse  -I
> ./include/sparse -m64 -I /usr/local/include -I /usr/include/x86_64-linux-gnu
> " cgcc -target=x86_64 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib
> -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith
> -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-
> cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-
> prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -
> Werror -Wsparse-error -g -O2 -MT lib/dpif-netdev-lookup-generic.lo -MD -MP -
> MF lib/.deps/dpif-netdev-lookup-generic.Tpo -c lib/dpif-netdev-lookup-
> generic.c -o lib/dpif-netdev-lookup-generic.o
> lib/dpif-netdev-lookup-generic.c:203:43: error: Variable length array is
> used.
> lib/dpif-netdev-lookup-generic.c:210:23: error: Variable length array is
> used.
> make[2]: *** [lib/dpif-netdev-lookup-generic.lo] Error 1
> 
> https://travis-ci.org/ovsrobot/ovs/jobs/518798999
> 
> Sparse and MSVC doesn't like variable length arrays.

Thanks for the heads up - I'll add sparse to my test builds here, and add #ifdefs around the VLAs (like other places in the code). Will fix in V+1.

Any performance numbers/gains there?
Ilya Maximets April 16, 2019, 2:23 p.m. UTC | #3
On 15.04.2019 21:50, Van Haaren, Harry wrote:
>> -----Original Message-----
>> From: Ilya Maximets [mailto:i.maximets@samsung.com]
>> Sent: Monday, April 15, 2019 9:33 AM
>> To: Van Haaren, Harry <harry.van.haaren@intel.com>; ovs-dev@openvswitch.org
>> Cc: Stokes, Ian <ian.stokes@intel.com>; aconole@redhat.com;
>> echaudro@redhat.com
>> Subject: Re: [PATCH v7 0/5] dpcls func ptrs & optimizations
>>
>> On 11.04.2019 15:55, Harry van Haaren wrote:
> <snip>
>>> Harry van Haaren (5):
>>>   dpif-netdev: implement function pointers/subtable
>>>   dpif-netdev: move dpcls lookup structures to .h
>>>   dpif-netdev: split out generic lookup function
>>>   dpif-netdev: refactor generic implementation
>>>   dpif-netdev: add specialized generic scalar functions
>>>
>>>  lib/automake.mk                  |   1 +
>>>  lib/dpif-netdev-lookup-generic.c | 342 +++++++++++++++++++++++++++++++
>>>  lib/dpif-netdev.c                | 128 +++---------
>>>  lib/dpif-netdev.h                |  83 ++++++++
>>>  4 files changed, 456 insertions(+), 98 deletions(-)
>>>  create mode 100644 lib/dpif-netdev-lookup-generic.c
>>>
>>
>> Hi. This patch-set fails the sparse build:
>>
>> libtool: compile:  env "REAL_CC=gcc -std=gnu99" "CHECK=sparse  -I
>> ./include/sparse -m64 -I /usr/local/include -I /usr/include/x86_64-linux-gnu
>> " cgcc -target=x86_64 -DHAVE_CONFIG_H -I. -I ./include -I ./include -I ./lib
>> -I ./lib -Wstrict-prototypes -Wall -Wextra -Wno-sign-compare -Wpointer-arith
>> -Wformat -Wformat-security -Wswitch-enum -Wunused-parameter -Wbad-function-
>> cast -Wcast-align -Wstrict-prototypes -Wold-style-definition -Wmissing-
>> prototypes -Wmissing-field-initializers -fno-strict-aliasing -Wshadow -
>> Werror -Wsparse-error -g -O2 -MT lib/dpif-netdev-lookup-generic.lo -MD -MP -
>> MF lib/.deps/dpif-netdev-lookup-generic.Tpo -c lib/dpif-netdev-lookup-
>> generic.c -o lib/dpif-netdev-lookup-generic.o
>> lib/dpif-netdev-lookup-generic.c:203:43: error: Variable length array is
>> used.
>> lib/dpif-netdev-lookup-generic.c:210:23: error: Variable length array is
>> used.
>> make[2]: *** [lib/dpif-netdev-lookup-generic.lo] Error 1
>>
>> https://travis-ci.org/ovsrobot/ovs/jobs/518798999
>>
>> Sparse and MSVC doesn't like variable length arrays.
> 
> Thanks for the heads up - I'll add sparse to my test builds here, and add #ifdefs around the VLAs (like other places in the code). Will fix in V+1.

If you have a github, you may allow travis to make most of sanity checks for you. 

> 
> Any performance numbers/gains there?

Sorry, had no chance to run perf tests yet.