mbox series

[ovs-dev,v4,0/4] dpcls subtable miniflow optimizations

Message ID 20190121132241.76471-1-harry.van.haaren@intel.com
Headers show
Series dpcls subtable miniflow optimizations | expand

Message

Van Haaren, Harry Jan. 21, 2019, 1:22 p.m. UTC
Hi Folks,

This patchset is a v4, changes from v3 are a fix to the issue reported
by Ilya (see v3 patchset for details). Note that this patchset enables
the work as presented at OVS Conf last December, particularly this is
the function pointer part: https://www.youtube.com/watch?v=5-MDlpUIOBE

The work contained in this patchset achieves the following;

Patch 1:
Refactor dpcls_lookup and the subtable for flexibility.
In particular, add a function pointer to the subtable
structure, which enables "plugging-in" a lookup function
at runtime. This enables a number of optimizations in future.

Patch 2 & 3:
With the function pointer in place, we refactor the existing
dpcls_lookup matching code into its own function, and later its
own file. To split it to its own file requires making various
dpcls data-structures available in the dpif-netdev.h header.

Patch 4:
Re-implement and optimize dpcls_rule_matches_key() by removing
the "loopy-branch-ness" of the FOR_EACH() macros used. Instead
a popcount() approach is used, which is much more CPU performance
friendly, due to reduced branches/loads-stores and total work done.

Performance:
Patches 1, 2 and 3 are performance neutral in testing here. The
fourth patch provides a significant performance improvement when
dpcls or SMC are processing packets.


Feedback, reviews, performance numbers weclomed! -Harry


Harry van Haaren (4):
  dpif-netdev: implement function pointers/subtable
  dpif-netdev: move dpcls lookup structures to .h
  dpif-netdev: split out generic lookup function
  dpif-netdev: optimized dpcls_rule_matches_key()

 lib/automake.mk                  |   1 +
 lib/dpif-netdev-lookup-generic.c |  95 ++++++++++++++++++++++
 lib/dpif-netdev.c                | 124 +++--------------------------
 lib/dpif-netdev.h                | 130 +++++++++++++++++++++++++++++++
 4 files changed, 238 insertions(+), 112 deletions(-)
 create mode 100644 lib/dpif-netdev-lookup-generic.c

Comments

Aaron Conole Jan. 21, 2019, 4:01 p.m. UTC | #1
Harry van Haaren <harry.van.haaren@intel.com> writes:

> Hi Folks,
>
> This patchset is a v4, changes from v3 are a fix to the issue reported
> by Ilya (see v3 patchset for details). Note that this patchset enables
> the work as presented at OVS Conf last December, particularly this is
> the function pointer part: https://www.youtube.com/watch?v=5-MDlpUIOBE

Hi Harry,

I'm seeing quite a few failures associated with this patch set (see
below log for an example):

   https://api.travis-ci.org/v3/job/482434899/log.txt

Seems to be with gcc and the test suite.  The clang builds don't seem to
be impacted.  I haven't dived deep into the failures, but the tests
which are failing.

> The work contained in this patchset achieves the following;
>
> Patch 1:
> Refactor dpcls_lookup and the subtable for flexibility.
> In particular, add a function pointer to the subtable
> structure, which enables "plugging-in" a lookup function
> at runtime. This enables a number of optimizations in future.
>
> Patch 2 & 3:
> With the function pointer in place, we refactor the existing
> dpcls_lookup matching code into its own function, and later its
> own file. To split it to its own file requires making various
> dpcls data-structures available in the dpif-netdev.h header.
>
> Patch 4:
> Re-implement and optimize dpcls_rule_matches_key() by removing
> the "loopy-branch-ness" of the FOR_EACH() macros used. Instead
> a popcount() approach is used, which is much more CPU performance
> friendly, due to reduced branches/loads-stores and total work done.
>
> Performance:
> Patches 1, 2 and 3 are performance neutral in testing here. The
> fourth patch provides a significant performance improvement when
> dpcls or SMC are processing packets.
>
>
> Feedback, reviews, performance numbers weclomed! -Harry
>
>
> Harry van Haaren (4):
>   dpif-netdev: implement function pointers/subtable
>   dpif-netdev: move dpcls lookup structures to .h
>   dpif-netdev: split out generic lookup function
>   dpif-netdev: optimized dpcls_rule_matches_key()
>
>  lib/automake.mk                  |   1 +
>  lib/dpif-netdev-lookup-generic.c |  95 ++++++++++++++++++++++
>  lib/dpif-netdev.c                | 124 +++--------------------------
>  lib/dpif-netdev.h                | 130 +++++++++++++++++++++++++++++++
>  4 files changed, 238 insertions(+), 112 deletions(-)
>  create mode 100644 lib/dpif-netdev-lookup-generic.c
Ilya Maximets Jan. 21, 2019, 4:19 p.m. UTC | #2
On 21.01.2019 19:01, Aaron Conole wrote:
> Harry van Haaren <harry.van.haaren@intel.com> writes:
> 
>> Hi Folks,
>>
>> This patchset is a v4, changes from v3 are a fix to the issue reported
>> by Ilya (see v3 patchset for details). Note that this patchset enables
>> the work as presented at OVS Conf last December, particularly this is
>> the function pointer part: https://www.youtube.com/watch?v=5-MDlpUIOBE
> 
> Hi Harry,
> 
> I'm seeing quite a few failures associated with this patch set (see
> below log for an example):
> 
>    https://api.travis-ci.org/v3/job/482434899/log.txt

The most of failures are because of ukey installation failures:

2019-01-21T15:06:03.609Z|00267|ofproto_dpif_upcall|WARN|upcall_cb failure: ukey installation fails

And also because some packets was missed in stats (probably dropped because
of processing issues)

> 
> Seems to be with gcc and the test suite.  The clang builds don't seem to
> be impacted.

We're not running testsuite with clang in Travis, actually, regardless of
'TESTSUITE' variable. That is the reason of green clang builds.

> I haven't dived deep into the failures, but the tests
> which are failing.
> 
>> The work contained in this patchset achieves the following;
>>
>> Patch 1:
>> Refactor dpcls_lookup and the subtable for flexibility.
>> In particular, add a function pointer to the subtable
>> structure, which enables "plugging-in" a lookup function
>> at runtime. This enables a number of optimizations in future.
>>
>> Patch 2 & 3:
>> With the function pointer in place, we refactor the existing
>> dpcls_lookup matching code into its own function, and later its
>> own file. To split it to its own file requires making various
>> dpcls data-structures available in the dpif-netdev.h header.
>>
>> Patch 4:
>> Re-implement and optimize dpcls_rule_matches_key() by removing
>> the "loopy-branch-ness" of the FOR_EACH() macros used. Instead
>> a popcount() approach is used, which is much more CPU performance
>> friendly, due to reduced branches/loads-stores and total work done.
>>
>> Performance:
>> Patches 1, 2 and 3 are performance neutral in testing here. The
>> fourth patch provides a significant performance improvement when
>> dpcls or SMC are processing packets.
>>
>>
>> Feedback, reviews, performance numbers weclomed! -Harry
>>
>>
>> Harry van Haaren (4):
>>   dpif-netdev: implement function pointers/subtable
>>   dpif-netdev: move dpcls lookup structures to .h
>>   dpif-netdev: split out generic lookup function
>>   dpif-netdev: optimized dpcls_rule_matches_key()
>>
>>  lib/automake.mk                  |   1 +
>>  lib/dpif-netdev-lookup-generic.c |  95 ++++++++++++++++++++++
>>  lib/dpif-netdev.c                | 124 +++--------------------------
>>  lib/dpif-netdev.h                | 130 +++++++++++++++++++++++++++++++
>>  4 files changed, 238 insertions(+), 112 deletions(-)
>>  create mode 100644 lib/dpif-netdev-lookup-generic.c
> 
>
Aaron Conole Jan. 21, 2019, 4:55 p.m. UTC | #3
Ilya Maximets <i.maximets@samsung.com> writes:

> On 21.01.2019 19:01, Aaron Conole wrote:
>> Harry van Haaren <harry.van.haaren@intel.com> writes:
>> 
>>> Hi Folks,
>>>
>>> This patchset is a v4, changes from v3 are a fix to the issue reported
>>> by Ilya (see v3 patchset for details). Note that this patchset enables
>>> the work as presented at OVS Conf last December, particularly this is
>>> the function pointer part: https://www.youtube.com/watch?v=5-MDlpUIOBE
>> 
>> Hi Harry,
>> 
>> I'm seeing quite a few failures associated with this patch set (see
>> below log for an example):
>> 
>>    https://api.travis-ci.org/v3/job/482434899/log.txt
>
> The most of failures are because of ukey installation failures:
>
> 2019-01-21T15:06:03.609Z|00267|ofproto_dpif_upcall|WARN|upcall_cb
> failure: ukey installation fails
>
> And also because some packets was missed in stats (probably dropped because
> of processing issues)
>
>> 
>> Seems to be with gcc and the test suite.  The clang builds don't seem to
>> be impacted.
>
> We're not running testsuite with clang in Travis, actually, regardless of
> 'TESTSUITE' variable. That is the reason of green clang builds.

Okay - I'll look at a cooking up a patch (I've already started on one to
enable 'DPDK=1 TESTSUITE=1').  It isn't clear currently why it should be
disabled still (I realize it was for the total build time, but the
travis build is currently over 2 hours... seems better to just
re-enable).

>> I haven't dived deep into the failures, but the tests
>> which are failing.
>> 
>>> The work contained in this patchset achieves the following;
>>>
>>> Patch 1:
>>> Refactor dpcls_lookup and the subtable for flexibility.
>>> In particular, add a function pointer to the subtable
>>> structure, which enables "plugging-in" a lookup function
>>> at runtime. This enables a number of optimizations in future.
>>>
>>> Patch 2 & 3:
>>> With the function pointer in place, we refactor the existing
>>> dpcls_lookup matching code into its own function, and later its
>>> own file. To split it to its own file requires making various
>>> dpcls data-structures available in the dpif-netdev.h header.
>>>
>>> Patch 4:
>>> Re-implement and optimize dpcls_rule_matches_key() by removing
>>> the "loopy-branch-ness" of the FOR_EACH() macros used. Instead
>>> a popcount() approach is used, which is much more CPU performance
>>> friendly, due to reduced branches/loads-stores and total work done.
>>>
>>> Performance:
>>> Patches 1, 2 and 3 are performance neutral in testing here. The
>>> fourth patch provides a significant performance improvement when
>>> dpcls or SMC are processing packets.
>>>
>>>
>>> Feedback, reviews, performance numbers weclomed! -Harry
>>>
>>>
>>> Harry van Haaren (4):
>>>   dpif-netdev: implement function pointers/subtable
>>>   dpif-netdev: move dpcls lookup structures to .h
>>>   dpif-netdev: split out generic lookup function
>>>   dpif-netdev: optimized dpcls_rule_matches_key()
>>>
>>>  lib/automake.mk                  |   1 +
>>>  lib/dpif-netdev-lookup-generic.c |  95 ++++++++++++++++++++++
>>>  lib/dpif-netdev.c                | 124 +++--------------------------
>>>  lib/dpif-netdev.h                | 130 +++++++++++++++++++++++++++++++
>>>  4 files changed, 238 insertions(+), 112 deletions(-)
>>>  create mode 100644 lib/dpif-netdev-lookup-generic.c
>> 
>>
Van Haaren, Harry Jan. 22, 2019, 5:11 p.m. UTC | #4
> -----Original Message-----
> From: Aaron Conole [mailto:aconole@redhat.com]
> Sent: Monday, January 21, 2019 4:55 PM
> To: Ilya Maximets <i.maximets@samsung.com>
> Cc: Van Haaren, Harry <harry.van.haaren@intel.com>; ovs-dev@openvswitch.org;
> Stokes, Ian <ian.stokes@intel.com>
> Subject: Re: [ovs-dev] [PATCH v4 0/4] dpcls subtable miniflow optimizations
> 
> Ilya Maximets <i.maximets@samsung.com> writes:
> 
> > On 21.01.2019 19:01, Aaron Conole wrote:
> >> Harry van Haaren <harry.van.haaren@intel.com> writes:
> >>
> >>> Hi Folks,
> >>>
> >>> This patchset is a v4, changes from v3 are a fix to the issue reported
> >>> by Ilya (see v3 patchset for details). Note that this patchset enables
> >>> the work as presented at OVS Conf last December, particularly this is
> >>> the function pointer part: https://www.youtube.com/watch?v=5-MDlpUIOBE
> >>
> >> Hi Harry,
> >>
> >> I'm seeing quite a few failures associated with this patch set (see
> >> below log for an example):
> >>
> >>    https://api.travis-ci.org/v3/job/482434899/log.txt
> >
> > The most of failures are because of ukey installation failures:
> >
> > 2019-01-21T15:06:03.609Z|00267|ofproto_dpif_upcall|WARN|upcall_cb
> > failure: ukey installation fails
> >
> > And also because some packets was missed in stats (probably dropped
> because
> > of processing issues)
> >
> >>
> >> Seems to be with gcc and the test suite.  The clang builds don't seem to
> >> be impacted.
> >
> > We're not running testsuite with clang in Travis, actually, regardless of
> > 'TESTSUITE' variable. That is the reason of green clang builds.
> 
> Okay - I'll look at a cooking up a patch (I've already started on one to
> enable 'DPDK=1 TESTSUITE=1').  It isn't clear currently why it should be
> disabled still (I realize it was for the total build time, but the
> travis build is currently over 2 hours... seems better to just
> re-enable).

Thanks for reporting - I've been testing with "real" traffic and haven't been
reproducing issues. I'll focus on these test cases, and get back.

Cheers, -H