
[ovs-dev,v5,2/2] handlers: Fix handlers mapping

Message ID 20220606185942.858101-2-msantana@redhat.com
State Superseded
Series [ovs-dev,v5,1/2] handlers: Create additional handler threads when using CPU isolation

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Michael Santana June 6, 2022, 6:59 p.m. UTC
The handler and CPU mapping in upcalls is incorrect, and this is
especially noticeable on systems with CPU isolation enabled.

Say we have a 12-core system where only even-numbered CPUs are enabled:
C0, C2, C4, C6, C8, C10

This means we will create an array of size 6, populated with sockets
[S0, S1, S2, S3, S4, S5], that will be sent to the kernel.

The problem is that when the kernel does an upcall it indexes the
socket array by CPU id, effectively adding extra load on some handler
sockets while leaving others with no work.

e.g.

C0  indexes to S0
C2  indexes to S2 (should be S1)
C4  indexes to S4 (should be S2)

Modulo 6 (the size of the socket array) is applied, so we wrap back to S0
C6  indexes to S0 (should be S3)
C8  indexes to S2 (should be S4)
C10 indexes to S4 (should be S5)

Effectively, sockets S0, S2, S4 get overloaded while sockets S1, S3, S5
get no work assigned to them.
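
For illustration, the kernel-side selection effectively reduces to the
indexing below; this is a simplified sketch, not the actual openvswitch
module code:

    #include <stdint.h>

    /* Simplified sketch: the upcall socket (identified by its netlink
     * PID) is chosen by indexing the PID array with the CPU id, modulo
     * the array length. */
    static uint32_t
    upcall_pid_for_cpu(const uint32_t *upcall_pids, int n_pids, int cpu_id)
    {
        return upcall_pids[cpu_id % n_pids];
    }

    /* With n_pids == 6 and only even CPUs online, CPU 2 picks index 2
     * (S2) instead of S1, and CPU 6 wraps back around to index 0 (S0). */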

This causes the kernel to log the following message:
"openvswitch: cpu_id mismatch with handler threads"

Instead, we send the kernel a corrected array of sockets whose size
matches the total number of CPUs in the system. In the above example we
would build the corrected array in a round-robin (assuming prime bias)
fashion as follows:
[S0, S1, S2, S3, S4, S5, S6, S0, S1, S2, S3, S4]
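
For reference, the expansion is a plain modulo round-robin over the
handler sockets; the listed array corresponds to seven handler sockets
(the "prime bias" assumption), while with six it would simply repeat
S0..S5 twice. A minimal sketch, mirroring the loop added below:

    /* Fill one slot per possible CPU index, cycling through the
     * existing handler sockets. */
    for (int i = 0; i < n_cores; i++) {
        corrected[i] = upcall_pids[i % n_upcall_pids];
    }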

Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")

Co-authored-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: Michael Santana <msantana@redhat.com>
---
 lib/dpif-netlink.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

Comments

Ilya Maximets July 4, 2022, 2:57 p.m. UTC | #1
On 6/6/22 20:59, Michael Santana wrote:
> The handler and CPU mapping in upcalls is incorrect, and this is
> especially noticeable on systems with CPU isolation enabled.
> 
> Say we have a 12-core system where only even-numbered CPUs are enabled:
> C0, C2, C4, C6, C8, C10
> 
> This means we will create an array of size 6, populated with sockets
> [S0, S1, S2, S3, S4, S5], that will be sent to the kernel.
> 
> The problem is that when the kernel does an upcall it indexes the
> socket array by CPU id, effectively adding extra load on some handler
> sockets while leaving others with no work.
> 
> e.g.
> 
> C0  indexes to S0
> C2  indexes to S2 (should be S1)
> C4  indexes to S4 (should be S2)
> 
> Modulo 6 (the size of the socket array) is applied, so we wrap back to S0
> C6  indexes to S0 (should be S3)
> C8  indexes to S2 (should be S4)
> C10 indexes to S4 (should be S5)
> 
> Effectively, sockets S0, S2, S4 get overloaded while sockets S1, S3, S5
> get no work assigned to them.

I suppose most of this commit message should be the commit message
of the first patch.  This one patch doesn't ensure fairness of
the distribution; its only purpose is to silence the kernel warning,
AFAICT.  Actual workload distribution with and without this patch
should be the same.

> 
> This causes the kernel to log the following message:
> "openvswitch: cpu_id mismatch with handler threads"
> 
> Instead, we send the kernel a corrected array of sockets whose size
> matches the total number of CPUs in the system. In the above example we
> would build the corrected array in a round-robin (assuming prime bias)
> fashion as follows:
> [S0, S1, S2, S3, S4, S5, S6, S0, S1, S2, S3, S4]
> 
> Fixes: b1e517bd2f81 ("dpif-netlink: Introduce per-cpu upcall dispatch.")
> 
> Co-authored-by: Aaron Conole <aconole@redhat.com>
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> Signed-off-by: Michael Santana <msantana@redhat.com>
> ---
>  lib/dpif-netlink.c | 18 ++++++++++++------
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
> index e948a829b..dad4f684c 100644
> --- a/lib/dpif-netlink.c
> +++ b/lib/dpif-netlink.c
> @@ -804,11 +804,11 @@ dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
>      struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
>      struct dpif_netlink_dp request, reply;
>      struct ofpbuf *bufp;
> -    int error;
> -    int n_cores;
>  
> -    n_cores = count_cpu_cores();
> -    ovs_assert(n_cores == n_upcall_pids);
> +    uint32_t *corrected;
> +    int error, i, n_cores;
> +
> +    n_cores = count_total_cores();
>      VLOG_DBG("Dispatch mode(per-cpu): Number of CPUs is %d", n_cores);
>  
>      dpif_netlink_dp_init(&request);
> @@ -818,7 +818,12 @@ dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
>      request.user_features = dpif->user_features |
>                              OVS_DP_F_DISPATCH_UPCALL_PER_CPU;
>  
> -    request.upcall_pids = upcall_pids;
> +    corrected = xcalloc(n_cores, sizeof(uint32_t));

sizeof *corrected
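
That is, the allocation would read:

    /* Take the element size from the pointer itself, so the call stays
     * correct if 'corrected' ever changes type. */
    corrected = xcalloc(n_cores, sizeof *corrected);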

> +
> +    for (i = 0; i < n_cores; i++) {
> +        corrected[i] = upcall_pids[i % n_upcall_pids];
> +    }

IIUC, core numbers are not necessarily contiguous.  So,
by using the number of cores we will create enough threads,
but may not cover all the actual core ids in the array.
Maybe it makes sense to add a new function to the ovs-numa
module that will iterate over all_cpu_cores and return the
maximum core id?  It will miss offline cores, so we'll need
to get the max between this value and the count_total_cores()
if we want to try to accommodate offline cores as well.
Not sure if we care much about offline cores though.
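
For example, such a helper could look roughly like the sketch below.
The function name is hypothetical and the 'struct cpu_core' member
names are assumptions about the ovs-numa internals; 'all_cpu_cores'
is the map mentioned above:

    /* Hypothetical helper for lib/ovs-numa.c: return the largest core
     * id discovered on the system, or -1 if none were discovered. */
    int
    ovs_numa_get_max_core_id(void)
    {
        struct cpu_core *core;
        int max_id = -1;

        HMAP_FOR_EACH (core, core_node, &all_cpu_cores) {
            if ((int) core->core_id > max_id) {
                max_id = (int) core->core_id;
            }
        }
        return max_id;
    }

The per-cpu dispatch code could then size its array as the larger of
this value plus one and count_total_cores(), as suggested above.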

> +    request.upcall_pids = corrected;
>      request.n_upcall_pids = n_cores;
>  
>      error = dpif_netlink_dp_transact(&request, &reply, &bufp);
> @@ -826,9 +831,10 @@ dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
>          dpif->user_features = reply.user_features;
>          ofpbuf_delete(bufp);
>          if (!dpif_netlink_upcall_per_cpu(dpif)) {
> -            return -EOPNOTSUPP;
> +            error = -EOPNOTSUPP;
>          }
>      }
> +    free(corrected);
>      return error;
>  }
>

Patch

diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index e948a829b..dad4f684c 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -804,11 +804,11 @@  dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
     struct dpif_netlink *dpif = dpif_netlink_cast(dpif_);
     struct dpif_netlink_dp request, reply;
     struct ofpbuf *bufp;
-    int error;
-    int n_cores;
 
-    n_cores = count_cpu_cores();
-    ovs_assert(n_cores == n_upcall_pids);
+    uint32_t *corrected;
+    int error, i, n_cores;
+
+    n_cores = count_total_cores();
     VLOG_DBG("Dispatch mode(per-cpu): Number of CPUs is %d", n_cores);
 
     dpif_netlink_dp_init(&request);
@@ -818,7 +818,12 @@  dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
     request.user_features = dpif->user_features |
                             OVS_DP_F_DISPATCH_UPCALL_PER_CPU;
 
-    request.upcall_pids = upcall_pids;
+    corrected = xcalloc(n_cores, sizeof(uint32_t));
+
+    for (i = 0; i < n_cores; i++) {
+        corrected[i] = upcall_pids[i % n_upcall_pids];
+    }
+    request.upcall_pids = corrected;
     request.n_upcall_pids = n_cores;
 
     error = dpif_netlink_dp_transact(&request, &reply, &bufp);
@@ -826,9 +831,10 @@  dpif_netlink_set_handler_pids(struct dpif *dpif_, const uint32_t *upcall_pids,
         dpif->user_features = reply.user_features;
         ofpbuf_delete(bufp);
         if (!dpif_netlink_upcall_per_cpu(dpif)) {
-            return -EOPNOTSUPP;
+            error = -EOPNOTSUPP;
         }
     }
+    free(corrected);
     return error;
 }