Message ID | 20190717130033.25114-1-harry.van.haaren@intel.com |
---|---|
Headers | show |
Series | dpcls func ptrs & optimizations | expand |
On 17.07.2019 16:00, Harry van Haaren wrote: > Hey Folks, > > Here a v12 of the DPCLS Function Pointer patchset, as has been > presented at OVS Conf in Nov '18, and discussed on the ML since then. > I'm aware of the soft-freeze for 2.12, I feel this patchset has had > enough reviews/versions/testing to be merged in 2.12. > > Thanks Ilya and Ian for review comments on v11, they should all be addressed > in this v12. Patchset details below, summary of v11 changes as follows: > - Reworked various minor comments (Capitals, stops, and whitespace) > - Moved variable declarations to patch they're used in > - Improved function documentation on probe() > - Reduced log level from info to debug > - See per patch --- v12 notes for details. > > Given the nearing soft-freeze and branch deadlines, I'd like to see > this get merged - and any future minor comments / improvements can be > handled before release. > > Regards, -Harry > > > Patchset details; > 1) Refactor dpcls_lookup and the subtable for flexibility. > In particular, add a function pointer to the subtable > structure, which enables "plugging-in" a lookup function > at runtime. This enables a number of optimizations in future. > > 2) and 3) > With the function pointer in place, we refactor the existing > dpcls_lookup matching code into its own function, and later its > own file. To split it to its own file requires making various > dpcls data-structures available in dpif-netdev-private.h. > > 4) > Refactor the existing code, to favour compute of flat arrays of > miniflows, instead of the MACRO based iteration. This simplifies > the code itself, and makes future optimizations possible due to > simplified loop structures, and loop trip counts pass in via > function arguments. See commit message for more details. > > 5) > This patch implements a select few specialized functions, for handling > miniflows with 5-1, 4-1, and 4-0 miniflow unit bit patterns. More of > these types of functions can (and should) be added to accelerate other > patterns of subtable lookups! See commit message for more details. > > > Harry van Haaren (5): > dpif-netdev: Implement function pointers/subtable > dpif-netdev: Move dpcls lookup structures to .h > dpif-netdev: Split out generic lookup function > dpif-netdev: Refactor generic implementation > dpif-netdev: Add specialized generic scalar functions > > NEWS | 4 + > lib/automake.mk | 2 + > lib/dpif-netdev-lookup-generic.c | 290 +++++++++++++++++++++++++++++++ > lib/dpif-netdev-private.h | 129 ++++++++++++++ > lib/dpif-netdev.c | 197 ++++++++++----------- > 5 files changed, 525 insertions(+), 97 deletions(-) > create mode 100644 lib/dpif-netdev-lookup-generic.c > create mode 100644 lib/dpif-netdev-private.h > I performed a few tests with v11 of this patch-set on my usual setup to check performance of the new generic (non-optimized) implementation. The result is that new generic implementation is ~5% faster for me than current master (it was 12% for optimized lookup functions) which is good. The code looks OK for me in general. I still don't really like the fact that dpcls depends on the internal structure of a flowmap, but we, probably, could deal with that while we have a build time assertion. Hope, we'll have some better implementation with the same level of performance in the future. Best regards, Ilya Maximets.
> -----Original Message----- > From: Ilya Maximets [mailto:i.maximets@samsung.com] > Sent: Wednesday, July 17, 2019 4:03 PM > To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@openvswitch.org > Cc: echaudro@redhat.com; malvika.gupta@arm.com; Stokes, Ian > <ian.stokes@intel.com> > Subject: Re: [PATCH v12 0/5] dpcls func ptrs & optimizations > > On 17.07.2019 16:00, Harry van Haaren wrote: > > Hey Folks, > > > > Here a v12 of the DPCLS Function Pointer patchset, as has been > > presented at OVS Conf in Nov '18, and discussed on the ML since then. > > I'm aware of the soft-freeze for 2.12, I feel this patchset has had > > enough reviews/versions/testing to be merged in 2.12. <snip> > > Harry van Haaren (5): > > dpif-netdev: Implement function pointers/subtable > > dpif-netdev: Move dpcls lookup structures to .h > > dpif-netdev: Split out generic lookup function > > dpif-netdev: Refactor generic implementation > > dpif-netdev: Add specialized generic scalar functions > > > > NEWS | 4 + > > lib/automake.mk | 2 + > > lib/dpif-netdev-lookup-generic.c | 290 +++++++++++++++++++++++++++++++ > > lib/dpif-netdev-private.h | 129 ++++++++++++++ > > lib/dpif-netdev.c | 197 ++++++++++----------- > > 5 files changed, 525 insertions(+), 97 deletions(-) > > create mode 100644 lib/dpif-netdev-lookup-generic.c > > create mode 100644 lib/dpif-netdev-private.h > > > > > I performed a few tests with v11 of this patch-set on my usual setup to > check > performance of the new generic (non-optimized) implementation. The result is > that > new generic implementation is ~5% faster for me than current master (it was > 12% > for optimized lookup functions) which is good. > The code looks OK for me in general. I still don't really like the fact that > dpcls depends on the internal structure of a flowmap, but we, probably, > could > deal with that while we have a build time assertion. Hope, we'll have some > better > implementation with the same level of performance in the future. Thanks for the feedback on performance Ilya - good to see that we're going in the right direction performance wise anyway. I have one item I'd still like to improve in this patchset, and its regarding where the blocks scratch array is being stored. I'll rework and find a better solution than the current, and post that as the lucky patchset v13 :) Cheers, -Harry
On 17.07.2019 19:16, Van Haaren, Harry wrote: >> -----Original Message----- >> From: Ilya Maximets [mailto:i.maximets@samsung.com] >> Sent: Wednesday, July 17, 2019 4:03 PM >> To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@openvswitch.org >> Cc: echaudro@redhat.com; malvika.gupta@arm.com; Stokes, Ian >> <ian.stokes@intel.com> >> Subject: Re: [PATCH v12 0/5] dpcls func ptrs & optimizations >> >> On 17.07.2019 16:00, Harry van Haaren wrote: >>> Hey Folks, >>> >>> Here a v12 of the DPCLS Function Pointer patchset, as has been >>> presented at OVS Conf in Nov '18, and discussed on the ML since then. >>> I'm aware of the soft-freeze for 2.12, I feel this patchset has had >>> enough reviews/versions/testing to be merged in 2.12. > > <snip> > >>> Harry van Haaren (5): >>> dpif-netdev: Implement function pointers/subtable >>> dpif-netdev: Move dpcls lookup structures to .h >>> dpif-netdev: Split out generic lookup function >>> dpif-netdev: Refactor generic implementation >>> dpif-netdev: Add specialized generic scalar functions >>> >>> NEWS | 4 + >>> lib/automake.mk | 2 + >>> lib/dpif-netdev-lookup-generic.c | 290 +++++++++++++++++++++++++++++++ >>> lib/dpif-netdev-private.h | 129 ++++++++++++++ >>> lib/dpif-netdev.c | 197 ++++++++++----------- >>> 5 files changed, 525 insertions(+), 97 deletions(-) >>> create mode 100644 lib/dpif-netdev-lookup-generic.c >>> create mode 100644 lib/dpif-netdev-private.h >>> >> >> >> I performed a few tests with v11 of this patch-set on my usual setup to >> check >> performance of the new generic (non-optimized) implementation. The result is >> that >> new generic implementation is ~5% faster for me than current master (it was >> 12% >> for optimized lookup functions) which is good. >> The code looks OK for me in general. I still don't really like the fact that >> dpcls depends on the internal structure of a flowmap, but we, probably, >> could >> deal with that while we have a build time assertion. Hope, we'll have some >> better >> implementation with the same level of performance in the future. > > > Thanks for the feedback on performance Ilya - good to see that we're going > in the right direction performance wise anyway. > > I have one item I'd still like to improve in this patchset, and its regarding > where the blocks scratch array is being stored. I'll rework and find a better > solution than the current, and post that as the lucky patchset v13 :) As an idea, I could suggest DEFINE_STATIC_PER_THREAD_DATA inside the dpif-netdev-lookup-generic.c.
> -----Original Message----- > From: Ilya Maximets [mailto:i.maximets@samsung.com] > Sent: Wednesday, July 17, 2019 5:26 PM > To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@openvswitch.org > Cc: echaudro@redhat.com; malvika.gupta@arm.com; Stokes, Ian > <ian.stokes@intel.com> > Subject: Re: [PATCH v12 0/5] dpcls func ptrs & optimizations <snip> > >> I performed a few tests with v11 of this patch-set on my usual setup to > >> check > >> performance of the new generic (non-optimized) implementation. The result > is > >> that > >> new generic implementation is ~5% faster for me than current master (it > was > >> 12% > >> for optimized lookup functions) which is good. > >> The code looks OK for me in general. I still don't really like the fact > that > >> dpcls depends on the internal structure of a flowmap, but we, probably, > >> could > >> deal with that while we have a build time assertion. Hope, we'll have > some > >> better > >> implementation with the same level of performance in the future. > > > > > > Thanks for the feedback on performance Ilya - good to see that we're going > > in the right direction performance wise anyway. > > > > I have one item I'd still like to improve in this patchset, and its > regarding > > where the blocks scratch array is being stored. I'll rework and find a > better > > solution than the current, and post that as the lucky patchset v13 :) > > As an idea, I could suggest DEFINE_STATIC_PER_THREAD_DATA inside the dpif- > netdev-lookup-generic.c. Thanks for the suggestion - that would make the blocks scratch pointer available on a per-thread basis, and be initialized to NULL correct? As a result, we would must a (predictable) branch to check if the pointer is NULL, to allocate the required space on first usage of the subtable_lookup? Or is there a better way to initialize it for all threads that call dpcls_lookup()? My concern here is what if one thread creates a subtable (and allocs blocks_scratch for itself), but then the PMD is polled by another thread - which uses its own blocks_scratch which is NULL. Hence, I think the runtime check + alloc is probably the best/safest way, but I'd appreciate your input on the above threading logic Ilya :)
On 17.07.2019 19:33, Van Haaren, Harry wrote: >> -----Original Message----- >> From: Ilya Maximets [mailto:i.maximets@samsung.com] >> Sent: Wednesday, July 17, 2019 5:26 PM >> To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@openvswitch.org >> Cc: echaudro@redhat.com; malvika.gupta@arm.com; Stokes, Ian >> <ian.stokes@intel.com> >> Subject: Re: [PATCH v12 0/5] dpcls func ptrs & optimizations > > <snip> > >>>> I performed a few tests with v11 of this patch-set on my usual setup to >>>> check >>>> performance of the new generic (non-optimized) implementation. The result >> is >>>> that >>>> new generic implementation is ~5% faster for me than current master (it >> was >>>> 12% >>>> for optimized lookup functions) which is good. >>>> The code looks OK for me in general. I still don't really like the fact >> that >>>> dpcls depends on the internal structure of a flowmap, but we, probably, >>>> could >>>> deal with that while we have a build time assertion. Hope, we'll have >> some >>>> better >>>> implementation with the same level of performance in the future. >>> >>> >>> Thanks for the feedback on performance Ilya - good to see that we're going >>> in the right direction performance wise anyway. >>> >>> I have one item I'd still like to improve in this patchset, and its >> regarding >>> where the blocks scratch array is being stored. I'll rework and find a >> better >>> solution than the current, and post that as the lucky patchset v13 :) >> >> As an idea, I could suggest DEFINE_STATIC_PER_THREAD_DATA inside the dpif- >> netdev-lookup-generic.c. > > > Thanks for the suggestion - that would make the blocks scratch pointer available > on a per-thread basis, and be initialized to NULL correct? > > As a result, we would must a (predictable) branch to check if the pointer is > NULL, to allocate the required space on first usage of the subtable_lookup? > > Or is there a better way to initialize it for all threads that call dpcls_lookup()? > My concern here is what if one thread creates a subtable (and allocs blocks_scratch for itself), > but then the PMD is polled by another thread - which uses its own blocks_scratch which is NULL. > > Hence, I think the runtime check + alloc is probably the best/safest way, > but I'd appreciate your input on the above threading logic Ilya :) > Sorry, I was already out of office when this message arrived. I'll send my comments on your v13. Best regards, Ilya Maximets.