[ovs-dev] conntrack: Do clean instead of force expire.

Message ID 20240305134655.641196-1-lic121@chinatelecom.cn
State Under Review
Delegated to: Aaron Conole
Headers show
Series [ovs-dev] conntrack: Do clean instead of force expire. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Cheng Li March 5, 2024, 1:46 p.m. UTC
Force expiring a connection and then creating a new connection with the
same tuple (and thus the same cmap hash) makes the ct->conns cmap
operations expensive (they run within ct->ct_lock).

This patch covers the scenario by doing the clean immediately instead
of setting the expiration.  It also fixes the ct_clean debug log.

Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
---
 lib/conntrack.c | 25 ++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)

Comments

Aaron Conole March 27, 2024, 5:37 p.m. UTC | #1
Cheng Li <lic121@chinatelecom.cn> writes:

> Force expiring a connection and then creating a new connection with the
> same tuple (and thus the same cmap hash) makes the ct->conns cmap
> operations expensive (they run within ct->ct_lock).
>
> This patch covers the scenario by doing the clean immediately instead
> of setting the expiration.  It also fixes the ct_clean debug log.
>
> Signed-off-by: Cheng Li <lic121@chinatelecom.cn>
> ---

I'm having trouble with this commit message.  You don't include any
details about the performance characteristics you measured, while
(rightly) pointing out that this has a larger performance impact.
Please include these details (how you measured the impact, what
changed, etc.).

>  lib/conntrack.c | 25 ++++++++++++-------------
>  1 file changed, 12 insertions(+), 13 deletions(-)
>
> diff --git a/lib/conntrack.c b/lib/conntrack.c
> index 8a7056bac..6e137daa6 100644
> --- a/lib/conntrack.c
> +++ b/lib/conntrack.c
> @@ -461,12 +461,6 @@ conn_clean(struct conntrack *ct, struct conn *conn)
>      atomic_count_dec(&ct->n_conn);
>  }
> 
> -static void
> -conn_force_expire(struct conn *conn)
> -{
> -    atomic_store_relaxed(&conn->expiration, 0);
> -}
> -
>  /* Destroys the connection tracker 'ct' and frees all the allocated memory.
>   * The caller of this function must already have shut down packet input
>   * and PMD threads (which would have been quiesced).  */
> @@ -1030,7 +1024,10 @@ conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
>          case CT_UPDATE_NEW:
>              if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key,
>                              now, NULL, NULL)) {
> -                conn_force_expire(conn);
> +                /* Instead of setting conn->expiration and let ct_clean thread
> +                 * clean the conn, doing the clean now to avoid duplicate hash
> +                 * in ct->conns for performance reason. */
> +                conn_clean(ct, conn);

I guess the idea is that the clean can happen from an unlocked context,
but doing it here adds a lock acquisition to the datapath (which is why
the expire-list update exists there in the first place).  The
conn_key_lookup() has to iterate over more list entries, but it doesn't
need to take an expensive lock.

In this case, we will have to acquire the big CT lock, and if two
lookups happen simultaneously we will stall processing of one packet
behind the other.  Ideally, the adaptive nature of the pthread mutex
will prevent us from having to call into the kernel, but that isn't a
guarantee - just a best effort.

Have you done any kind of measurement on this?  I'm quite worried about
the worst case that this introduces into the packet processing path.
The existing design avoids this worst-case lock contention precisely
because it only marks the connection as needing expiration, so that
future lookups will skip it.

>              }
>              create_new_conn = true;
>              break;
> @@ -1242,7 +1239,7 @@ process_one(struct conntrack *ct, struct dp_packet *pkt,
>      if (OVS_UNLIKELY(force && ctx->reply && conn)) {
>          if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key,
>                          now, NULL, NULL)) {
> -            conn_force_expire(conn);
> +            conn_clean(ct, conn);
>          }
>          conn = NULL;
>      }
> @@ -1429,7 +1426,8 @@ conntrack_get_sweep_interval(struct conntrack *ct)
>  }
> 
>  static size_t
> -ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
> +ct_sweep(struct conntrack *ct, struct rculist *list, long long now,
> +         size_t *cleaned_count)
>      OVS_NO_THREAD_SAFETY_ANALYSIS
>  {
>      struct conn *conn;
> @@ -1438,6 +1436,7 @@ ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
>      RCULIST_FOR_EACH (conn, node, list) {
>          if (conn_expired(conn, now)) {
>              conn_clean(ct, conn);
> +            (*cleaned_count) ++;

A fix somewhat like this may be needed anyway.  The log seems to have
become inaccurate with:

3d9c1b855a5f ("conntrack: Replace timeout based expiration lists with rculists.")

Please submit it as a separate patch so we can backport it.

>          }
> 
>          count++;
> @@ -1454,7 +1453,7 @@ conntrack_clean(struct conntrack *ct, long long now)
>  {
>      long long next_wakeup = now + conntrack_get_sweep_interval(ct);
>      unsigned int n_conn_limit, i;
> -    size_t clean_end, count = 0;
> +    size_t clean_end, count = 0, cleaned_count = 0;
> 
>      atomic_read_relaxed(&ct->n_conn_limit, &n_conn_limit);
>      clean_end = n_conn_limit / 64;
> @@ -1465,13 +1464,13 @@ conntrack_clean(struct conntrack *ct, long long now)
>              break;
>          }
> 
> -        count += ct_sweep(ct, &ct->exp_lists[i], now);
> +        count += ct_sweep(ct, &ct->exp_lists[i], now, &cleaned_count);
>      }
> 
>      ct->next_sweep = (i < N_EXP_LISTS) ? i : 0;
>  
> -    VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count,
> -             time_msec() - now);
> +    VLOG_DBG("conntrack cleanup %"PRIuSIZE"/%"PRIuSIZE" entries in %lld msec",
> +             cleaned_count, count, time_msec() - now);
>  
>      return next_wakeup;
>  }

Patch

diff --git a/lib/conntrack.c b/lib/conntrack.c
index 8a7056bac..6e137daa6 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -461,12 +461,6 @@  conn_clean(struct conntrack *ct, struct conn *conn)
     atomic_count_dec(&ct->n_conn);
 }
 
-static void
-conn_force_expire(struct conn *conn)
-{
-    atomic_store_relaxed(&conn->expiration, 0);
-}
-
 /* Destroys the connection tracker 'ct' and frees all the allocated memory.
  * The caller of this function must already have shut down packet input
  * and PMD threads (which would have been quiesced).  */
@@ -1030,7 +1024,10 @@  conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
         case CT_UPDATE_NEW:
             if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key,
                             now, NULL, NULL)) {
-                conn_force_expire(conn);
+                /* Instead of setting conn->expiration and let ct_clean thread
+                 * clean the conn, doing the clean now to avoid duplicate hash
+                 * in ct->conns for performance reason. */
+                conn_clean(ct, conn);
             }
             create_new_conn = true;
             break;
@@ -1242,7 +1239,7 @@  process_one(struct conntrack *ct, struct dp_packet *pkt,
     if (OVS_UNLIKELY(force && ctx->reply && conn)) {
         if (conn_lookup(ct, &conn->key_node[CT_DIR_FWD].key,
                         now, NULL, NULL)) {
-            conn_force_expire(conn);
+            conn_clean(ct, conn);
         }
         conn = NULL;
     }
@@ -1429,7 +1426,8 @@  conntrack_get_sweep_interval(struct conntrack *ct)
 }
 
 static size_t
-ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
+ct_sweep(struct conntrack *ct, struct rculist *list, long long now,
+         size_t *cleaned_count)
     OVS_NO_THREAD_SAFETY_ANALYSIS
 {
     struct conn *conn;
@@ -1438,6 +1436,7 @@  ct_sweep(struct conntrack *ct, struct rculist *list, long long now)
     RCULIST_FOR_EACH (conn, node, list) {
         if (conn_expired(conn, now)) {
             conn_clean(ct, conn);
+            (*cleaned_count) ++;
         }
 
         count++;
@@ -1454,7 +1453,7 @@  conntrack_clean(struct conntrack *ct, long long now)
 {
     long long next_wakeup = now + conntrack_get_sweep_interval(ct);
     unsigned int n_conn_limit, i;
-    size_t clean_end, count = 0;
+    size_t clean_end, count = 0, cleaned_count = 0;
 
     atomic_read_relaxed(&ct->n_conn_limit, &n_conn_limit);
     clean_end = n_conn_limit / 64;
@@ -1465,13 +1464,13 @@  conntrack_clean(struct conntrack *ct, long long now)
             break;
         }
 
-        count += ct_sweep(ct, &ct->exp_lists[i], now);
+        count += ct_sweep(ct, &ct->exp_lists[i], now, &cleaned_count);
     }
 
     ct->next_sweep = (i < N_EXP_LISTS) ? i : 0;
 
-    VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count,
-             time_msec() - now);
+    VLOG_DBG("conntrack cleanup %"PRIuSIZE"/%"PRIuSIZE" entries in %lld msec",
+             cleaned_count, count, time_msec() - now);
 
     return next_wakeup;
 }