
[ovs-dev,v5,2/2] handlers: Fix handlers mapping

Message ID 20220606185942.858101-2-msantana@redhat.com
State Superseded
Series [ovs-dev,v5,1/2] handlers: Create additional handler threads when using CPU isolation

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Michael Santana June 6, 2022, 6:59 p.m. UTC
The handler and CPU mapping in upcalls is incorrect, and this is
especially noticeable on systems with CPU isolation enabled.

Say we have a 12-core system where only even-numbered CPUs are enabled:
C0, C2, C4, C6, C8, C10

This means we will create an array of size 6, populated with sockets
[S0, S1, S2, S3, S4, S5], that will be sent to the kernel.

The problem is that when the kernel does an upcall it indexes the
socket array by CPU id, effectively adding extra load on some handler
sockets while leaving others with no work.

e.g.

C0  indexes to S0
C2  indexes to S2 (should be S1)
C4  indexes to S4 (should be S2)

Modulo 6 (the size of the socket array) is applied, so we wrap back to S0
C6  indexes to S0 (should be S3)
C8  indexes to S2 (should be S4)
C10 indexes to S4 (should be S5)

Effectively, sockets S0, S2, S4 get overloaded while sockets S1, S3, S5
get no work assigned to them.
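
For illustration, the kernel-side selection effectively reduces to the
indexing below; this is a simplified sketch, not the actual openvswitch
module code:

    #include <stdint.h>

    /* Simplified sketch: the upcall socket (identified by its netlink
     * PID) is chosen by indexing the PID array with the CPU id, modulo
     * the array length. */
    static uint32_t
    upcall_pid_for_cpu(const uint32_t *upcall_pids, int n_pids, int cpu_id)
    {
        return upcall_pids[cpu_id % n_pids];
    }

    /* With n_pids == 6 and only even CPUs online, CPU 2 picks index 2
     * (S2) instead of S1, and CPU 6 wraps back around to index 0 (S0). */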

This causes the kernel to log the following message:
"openvswitch: cpu_id mismatch with handler threads"

Instead, we send the kernel a corrected array of sockets whose size
matches the total number of CPUs in the system. In the above example we
would build the corrected array in a round-robin (assuming prime bias)
fashion as follows:
[S0, S1, S2, S3, S4, S5, S6, S0, S1, S2, S3, S4]
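
For reference, the expansion is a plain modulo round-robin over the
handler sockets; the listed array corresponds to seven handler sockets
(the "prime bias" assumption), while with six it would simply repeat
S0..S5 twice. A minimal sketch, mirroring the loop added below:

    /* Fill one slot per possible CPU index, cycling through the
     * existing handler sockets. */
    for (int i = 0; i < n_cores; i++) {
        corrected[i] = upcall_pids[i % n_upcall_pids];
    }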

Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")

Co-authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Michael Santana <msantana@redhat.com>
---
 lib/dpif-netlink.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Comments

Ilya Maximets July 4, 2022, 2:57 p.m. UTC | #1
On 6/6/22 20:59, Michael Santana wrote:
> The handler and CPU mapping in upcalls is incorrect, and this is
> especially noticeable on systems with CPU isolation enabled.
> 
> Say we have a 12-core system where only even-numbered CPUs are enabled:
> C0, C2, C4, C6, C8, C10
> 
> This means we will create an array of size 6, populated with sockets
> [S0, S1, S2, S3, S4, S5], that will be sent to the kernel.
> 
> The problem is that when the kernel does an upcall it indexes the
> socket array by CPU id, effectively adding extra load on some handler
> sockets while leaving others with no work.
> 
> e.g.
> 
> C0  indexes to S0
> C2  indexes to S2 (should be S1)
> C4  indexes to S4 (should be S2)
> 
> Modulo 6 (the size of the socket array) is applied, so we wrap back to S0
> C6  indexes to S0 (should be S3)
> C8  indexes to S2 (should be S4)
> C10 indexes to S4 (should be S5)
> 
> Effectively, sockets S0, S2, S4 get overloaded while sockets S1, S3, S5
> get no work assigned to them.

I suppose most of this commit message should be the commit message
of the first patch.  This one patch doesn't ensure fairness of
the distribution; its only purpose is to silence the kernel warning,
AFAICT.  Actual workload distribution with and without this patch
should be the same.

> 
> This causes the kernel to log the following message:
> "openvswitch: cpu_id mismatch with handler threads"
> 
> Instead, we send the kernel a corrected array of sockets whose size
> matches the total number of CPUs in the system. In the above example we
> would build the corrected array in a round-robin (assuming prime bias)
> fashion as follows:
> [S0, S1, S2, S3, S4, S5, S6, S0, S1, S2, S3, S4]
> 
> Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
> 
> Co-authored-by: Aaron Conole <aconole@redhat.com>
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> Signed-off-by: Michael Santana <msantana@redhat.com>
> ---
>  lib/dpif-netlink.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index e948a829b..dad4f684c 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -804,11 +804,11 @@ dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
>      struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
>      struct dpif_netlink_dp request, reply;
>      struct ofpbuf *bufp;
> -    int error;
> -    int n_cores;
>  
> -    n_cores = count_cpu_cores();
> -    ovs_assert(n_cores == n_upcall_pids);
> +    uint32_t *corrected;
> +    int error, i, n_cores;
> +
> +    n_cores = count_total_cores();
>      VLOG_DBG("Dispatch mode(per-cpu): Number of CPUs is %d", n_cores);
>  
>      dpif_netlink_dp_init(&request);
> @@ -818,7 +818,12 @@ dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
>      request.user_features = dpif->user_features |
>                              OVS_DP_F_DISPATCH_UPCALL_PER_CPU;
>  
> -    request.upcall_pids = upcall_pids;
> +    corrected = xcalloc(n_cores, sizeof(uint32_t));

sizeof *corrected
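
That is, the allocation would read:

    /* Take the element size from the pointer itself, so the call stays
     * correct if 'corrected' ever changes type. */
    corrected = xcalloc(n_cores, sizeof *corrected);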

> +
> +    for (i = 0; i < n_cores; i++) {
> +        corrected[i] = upcall_pids[i % n_upcall_pids];
> +    }

IIUC, core numbers are not necessarily contiguous.  So,
by using the number of cores we will create enough threads,
but may not cover all the actual core ids in the array.
Maybe it makes sense to add a new function to the ovs-numa
module that will iterate over all_cpu_cores and return the
maximum core id?  It will miss offline cores, so we'll need
to get the max between this value and the count_total_cores()
if we want to try to accommodate offline cores as well.
Not sure if we care much about offline cores though.
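
For example, such a helper could look roughly like the sketch below.
The function name is hypothetical and the 'struct cpu_core' member
names are assumptions about the ovs-numa internals; 'all_cpu_cores'
is the map mentioned above:

    /* Hypothetical helper for lib/ovs-numa.c: return the largest core
     * id discovered on the system, or -1 if none were discovered. */
    int
    ovs_numa_get_max_core_id(void)
    {
        struct cpu_core *core;
        int max_id = -1;

        HMAP_FOR_EACH (core, core_node, &all_cpu_cores) {
            if ((int) core->core_id > max_id) {
                max_id = (int) core->core_id;
            }
        }
        return max_id;
    }

The per-cpu dispatch code could then size its array as the larger of
this value plus one and count_total_cores(), as suggested above.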

> +    request.upcall_pids = corrected;
>      request.n_upcall_pids = n_cores;
>  
>      error = dpif_netlink_dp_transact(&request, &reply, &bufp);
> @@ -826,9 +831,10 @@ dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
>          dpif->user_features = reply.user_features;
>          ofpbuf_delete(bufp);
>          if (!dpif_netlink_upcall_per_cpu(dpif)) {
> -            return -EOPNOTSUPP;
> +            error = -EOPNOTSUPP;
>          }
>      }
> +    free(corrected);
>      return error;
>  }
>

Patch

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index e948a829b..dad4f684c 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -804,11 +804,11 @@  dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
     struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
     struct dpif_netlink_dp request, reply;
     struct ofpbuf *bufp;
-    int error;
-    int n_cores;
 
-    n_cores = count_cpu_cores();
-    ovs_assert(n_cores == n_upcall_pids);
+    uint32_t *corrected;
+    int error, i, n_cores;
+
+    n_cores = count_total_cores();
     VLOG_DBG("Dispatch mode(per-cpu): Number of CPUs is %d", n_cores);
 
     dpif_netlink_dp_init(&request);
@@ -818,7 +818,12 @@  dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
     request.user_features = dpif->user_features |
                             OVS_DP_F_DISPATCH_UPCALL_PER_CPU;
 
-    request.upcall_pids = upcall_pids;
+    corrected = xcalloc(n_cores, sizeof(uint32_t));
+
+    for (i = 0; i < n_cores; i++) {
+        corrected[i] = upcall_pids[i % n_upcall_pids];
+    }
+    request.upcall_pids = corrected;
     request.n_upcall_pids = n_cores;
 
     error = dpif_netlink_dp_transact(&request, &reply, &bufp);
@@ -826,9 +831,10 @@  dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
         dpif->user_features = reply.user_features;
         ofpbuf_delete(bufp);
         if (!dpif_netlink_upcall_per_cpu(dpif)) {
-            return -EOPNOTSUPP;
+            error = -EOPNOTSUPP;
         }
     }
+    free(corrected);
     return error;
 }