
[ovs-dev,RFC,v3,0/6] conntrack: Introduce buckets and reduce contention.

Message ID 164781559991.306037.12052326881283461224.stgit@fed.void

Paolo Valerio March 20, 2022, 10:33 p.m. UTC
This series aims to share the work done so far, and to start a
discussion of a slightly different approach in terms of the way we
handle the connections.
It's not considered ready as further work and extensive tests are
needed.

These are some performance numbers obtained in the following way:

./tests/ovstest test-conntrack benchmark $i 16777216 32 1

$i   baseline  ct-scale  ct-bucket
 2   16480 ms   3527 ms    3180 ms
 4   37375 ms   4997 ms    4477 ms
 8   63239 ms   9692 ms    8954 ms
16  131583 ms  19303 ms   18062 ms

Both ct-scale and ct-bucket introduce a noticeable improvement in
terms of performance. It's worth mentioning that ct-scale, which keeps
the exp_lists, is more efficient during sweeping.

There are two main logical changes that the series tries to introduce.
The first one is the reintroduction of buckets, the second one is the
removal of the expiration lists.

The first two patches are a prereq and were picked (untouched) from
this series (by Gaetan):

https://patchwork.ozlabs.org/project/openvswitch/list/?series=249027&state=*

The third patch is a follow-up to the following:

https://patchwork.ozlabs.org/project/openvswitch/patch/20201129033255.64647-2-hepeng.0320@bytedance.com/

Some logic has changed, but the overall idea remained the same.
Additional details in the patch description.

The fourth patch introduces the buckets and removes the expiration
lists.
The buckets got reintroduced as cmaps instead of hmaps, trying to
narrow down the critical sections as well.
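
As a rough illustration of the bucket idea (not the code from patch
#4), a hash of the connection key can select one of a fixed number of
buckets, each with its own lock, so that only connections hashing to
the same bucket contend. N_BUCKETS, struct bucket, bucket_index() and
bucket_insert() are hypothetical names, and a plain counter stands in
for the per-bucket cmap:

```c
#include <pthread.h>
#include <stdint.h>

#define N_BUCKETS 8   /* Hypothetical; the series picks its own value. */

struct bucket {
    pthread_mutex_t lock;   /* Serializes writers of this bucket only. */
    unsigned int n_conn;    /* Stand-in for the per-bucket connection map. */
};

static struct bucket buckets[N_BUCKETS];

static void
buckets_init(void)
{
    for (int i = 0; i < N_BUCKETS; i++) {
        pthread_mutex_init(&buckets[i].lock, NULL);
        buckets[i].n_conn = 0;
    }
}

/* Maps a key hash to a bucket index. */
static unsigned int
bucket_index(uint32_t hash)
{
    return hash % N_BUCKETS;
}

/* Accounts a new connection, taking only the lock of its bucket. */
static void
bucket_insert(uint32_t hash)
{
    struct bucket *b = &buckets[bucket_index(hash)];

    pthread_mutex_lock(&b->lock);
    b->n_conn++;
    pthread_mutex_unlock(&b->lock);
}
```

With this layout, two threads inserting connections whose hashes fall
into different buckets never wait on each other.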

An alternative approach may involve replacing cmaps with linked lists
(without increasing the number of locks) and increasing the number of
buckets (which could involve the ability to change the number of
buckets when the user changes the connection limit). This could
improve stale conn eviction during packet processing, but it is
currently out of scope for this series (it can be discussed, though).

The insertion, namely the creation of the connection, can probably be
improved (or at least an improvement can be evaluated), as, in the NAT
case, it acquires the bucket locks by calling
nat_get_unique_tuple_lock(). This is needed because we may need to
know the reverse tuple before locking in order to acquire the locks in
the right order and avoid ABBA deadlocks.
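
A common way to avoid ABBA deadlocks when two bucket locks are needed
is to always take them in a fixed order, for example lowest bucket
index first. The sketch below only illustrates that general technique;
it is not taken from the series, and bucket_locks,
bucket_pair_order(), buckets_lock_pair() and buckets_unlock_pair() are
hypothetical names:

```c
#include <pthread.h>

#define N_BUCKETS 8   /* Hypothetical bucket count. */

static pthread_mutex_t bucket_locks[N_BUCKETS];

static void
bucket_locks_init(void)
{
    for (int i = 0; i < N_BUCKETS; i++) {
        pthread_mutex_init(&bucket_locks[i], NULL);
    }
}

/* Sorts the forward and reverse bucket indexes into locking order. */
static void
bucket_pair_order(unsigned int fwd, unsigned int rev,
                  unsigned int *lo, unsigned int *hi)
{
    *lo = fwd < rev ? fwd : rev;
    *hi = fwd < rev ? rev : fwd;
}

/* Locks both buckets, lowest index first.  If both keys hash to the
 * same bucket, the lock is taken only once. */
static void
buckets_lock_pair(unsigned int fwd, unsigned int rev)
{
    unsigned int lo, hi;

    bucket_pair_order(fwd, rev, &lo, &hi);
    pthread_mutex_lock(&bucket_locks[lo]);
    if (hi != lo) {
        pthread_mutex_lock(&bucket_locks[hi]);
    }
}

/* Releases the locks in the reverse of the acquisition order. */
static void
buckets_unlock_pair(unsigned int fwd, unsigned int rev)
{
    unsigned int lo, hi;

    bucket_pair_order(fwd, rev, &lo, &hi);
    if (hi != lo) {
        pthread_mutex_unlock(&bucket_locks[hi]);
    }
    pthread_mutex_unlock(&bucket_locks[lo]);
}
```

Both bucket indexes have to be known before the first lock is taken,
which is why the reverse tuple has to be computed up front.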

Further details about locking may be found in patch #4 description.

There's a known limitation, mostly related to the zone limit.
Keeping the stale entries leads to a mismatch between the current
per zone connection count and the actual number of valid
connections. This means that with a small zone limit we may wait a
whole "next_wakeup" time before the entries get cleaned up (assuming
we have candidates for removal). To avoid this, a zone cleanup from
the packet path has been added, but alternative approaches, such as
triggering a per zone cleanup when a threshold of connections has
been reached (storing the connections per bucket and per zone during
the creation), may still be viable.
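
To make the mismatch concrete, here is a minimal sketch of a packet
path zone cleanup, assuming a flat array of entries instead of the
real per-bucket maps. struct conn_entry, zone_evict_expired() and
zone_admit() are hypothetical and do not mirror the actual code:

```c
#include <stdbool.h>
#include <stddef.h>

struct conn_entry {          /* Hypothetical, simplified connection. */
    int zone;
    long long expiration;    /* Absolute expiration time, in ms. */
    bool in_use;
};

/* Evicts the expired entries of 'zone' and returns how many were
 * removed. */
static size_t
zone_evict_expired(struct conn_entry *conns, size_t n, int zone,
                   long long now)
{
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (conns[i].in_use && conns[i].zone == zone
            && conns[i].expiration <= now) {
            conns[i].in_use = false;
            removed++;
        }
    }
    return removed;
}

/* Returns true if a new connection may be created in 'zone'.  Stale
 * entries still count toward '*zone_count', so when the limit is hit
 * they are evicted here instead of waiting for the next sweep. */
static bool
zone_admit(struct conn_entry *conns, size_t n, int zone,
           unsigned int *zone_count, unsigned int limit, long long now)
{
    if (*zone_count >= limit) {
        *zone_count -= (unsigned int) zone_evict_expired(conns, n, zone,
                                                         now);
    }
    return *zone_count < limit;
}
```

The full scan is only paid when the zone is at its limit, so the
common case stays a single comparison.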

Expiration lists have been removed for the sake of evaluation, but
they could be included and duplicated among buckets. In this way we
could distribute the contention (as a matter of fact reducing the
number of connections per list) that affects the write access to a
single data structure during every connection update.

Patch #6 reintegrates the command below with multiple buckets:

ovs-appctl dpctl/ct-bkts

A couple of minor patches have been kept out of the series, for the
time being, as they strongly depend on #4.

Gaetan Rivet (3):
      conntrack: Use a cmap to store zone limits
      conntrack-tp: Use a cmap to store timeout policies
      conntrack: Use an atomic conn expiration value

Paolo Valerio (2):
      conntrack: Split single cmap to multiple buckets.
      conntrack: Make ovs-appctl dpctl/ct-bkts work with multiple buckets

Peng He (1):
      conntrack: Replaces nat_conn introducing key directionality.


 lib/conntrack-private.h |   60 ++-
 lib/conntrack-tp.c      |  102 ++--
 lib/conntrack.c         | 1061 +++++++++++++++++++++++----------------
 lib/conntrack.h         |    6 +-
 lib/dpif-netdev.c       |    5 +-
 tests/system-traffic.at |    5 +-
 6 files changed, 706 insertions(+), 533 deletions(-)