Message ID | 91101df319805c55dc05597cb34d36cbd8bce867.1707226223.git.echaudro@redhat.com
---|---
State | Changes Requested
Delegated to | Eelco Chaudron
Series | [ovs-dev,1/2] dpif-netdev: Do not create handler threads.
Context | Check | Description
---|---|---
ovsrobot/apply-robot | success | apply and check: success
ovsrobot/intel-ovs-compilation | success | test: success
ovsrobot/github-robot-_Build_and_Test | success | github build: passed
On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>
> Avoid unnecessary thread creation as no upcalls are generated,
> resulting in idle threads waiting for process termination.
>
> This optimization significantly reduces memory usage, cutting it
> by half on a 128 CPU/thread system during testing, with the number
> of threads reduced from 95 to 0.
>
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

I find it weird that the dpif layer reports information on how the
ofproto-dpif layer behaves.
The handler threads are something ofproto-dpif is responsible for.
The upcall receiving loop is something ofproto-dpif owns.
Why should the dpif layer tell how many handlers are needed?

I would have seen a different change, where the dpif layer exports a
capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }.
ofproto-dpif would then deduce there is no handler to start at all.
On 6 Feb 2024, at 15:17, David Marchand wrote:

> On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>>
>> Avoid unnecessary thread creation as no upcalls are generated,
>> resulting in idle threads waiting for process termination.
>>
>> This optimization significantly reduces memory usage, cutting it
>> by half on a 128 CPU/thread system during testing, with the number
>> of threads reduced from 95 to 0.
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>
> I find it weird that the dpif layer reports information on how the
> ofproto-dpif layer behaves.
> The handler threads are something ofproto-dpif is responsible for.
> The upcall receiving loop is something ofproto-dpif owns.
> Why should the dpif layer tell how many handlers are needed?
>
> I would have seen a different change, where the dpif layer exports a
> capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }.
> ofproto-dpif would then deduce there is no handler to start at all.

That was my first idea also, but then I found there is already an API
call to the dpif layer where it can tell the user (ofproto in this
case) how many threads it needs to function correctly. Here is the API
definition:

369 /* Queries 'dpif' to see if a certain number of handlers are required by
370  * the implementation.
371  *
372  * If a certain number of handlers are required, returns 'true' and sets
373  * 'n_handlers' to that number of handler threads.
374  *
375  * If not, returns 'false'.
376  */
377 bool (*number_handlers_required)(struct dpif *dpif, uint32_t *n_handlers);

I guess the "If a certain number of handlers are required, returns
'true'" part fits here, as we need 0.

//Eelco
On Tue, Feb 6, 2024 at 3:47 PM Eelco Chaudron <echaudro@redhat.com> wrote:
> On 6 Feb 2024, at 15:17, David Marchand wrote:
>
>> On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>>>
>>> Avoid unnecessary thread creation as no upcalls are generated,
>>> resulting in idle threads waiting for process termination.
>>>
>>> This optimization significantly reduces memory usage, cutting it
>>> by half on a 128 CPU/thread system during testing, with the number
>>> of threads reduced from 95 to 0.
>>>
>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>
>> I find it weird that the dpif layer reports information on how the
>> ofproto-dpif layer behaves.
>> The handler threads are something ofproto-dpif is responsible for.
>> The upcall receiving loop is something ofproto-dpif owns.
>> Why should the dpif layer tell how many handlers are needed?
>>
>> I would have seen a different change, where the dpif layer exports a
>> capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }.
>> ofproto-dpif would then deduce there is no handler to start at all.
>
> That was my first idea also, but then I found there is already an API
> call to the dpif layer where it can tell the user (ofproto in this
> case) how many threads it needs to function correctly. Here is the API
> definition:
>
> 369 /* Queries 'dpif' to see if a certain number of handlers are required by
> 370  * the implementation.
> 371  *
> 372  * If a certain number of handlers are required, returns 'true' and sets
> 373  * 'n_handlers' to that number of handler threads.
> 374  *
> 375  * If not, returns 'false'.
> 376  */
> 377 bool (*number_handlers_required)(struct dpif *dpif, uint32_t *n_handlers);
>
> I guess the "If a certain number of handlers are required, returns
> 'true'" part fits here, as we need 0.

The fact that it exists does not convince me of its validity :-).
I must be missing something.
On 6 Feb 2024, at 15:50, David Marchand wrote:

> On Tue, Feb 6, 2024 at 3:47 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>> On 6 Feb 2024, at 15:17, David Marchand wrote:
>>
>>> On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>>>>
>>>> Avoid unnecessary thread creation as no upcalls are generated,
>>>> resulting in idle threads waiting for process termination.
>>>>
>>>> This optimization significantly reduces memory usage, cutting it
>>>> by half on a 128 CPU/thread system during testing, with the number
>>>> of threads reduced from 95 to 0.
>>>>
>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>
>>> I find it weird that the dpif layer reports information on how the
>>> ofproto-dpif layer behaves.
>>> The handler threads are something ofproto-dpif is responsible for.
>>> The upcall receiving loop is something ofproto-dpif owns.
>>> Why should the dpif layer tell how many handlers are needed?
>>>
>>> I would have seen a different change, where the dpif layer exports a
>>> capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }.
>>> ofproto-dpif would then deduce there is no handler to start at all.
>>
>> That was my first idea also, but then I found there is already an API
>> call to the dpif layer where it can tell the user (ofproto in this
>> case) how many threads it needs to function correctly. Here is the API
>> definition:
>>
>> 369 /* Queries 'dpif' to see if a certain number of handlers are required by
>> 370  * the implementation.
>> 371  *
>> 372  * If a certain number of handlers are required, returns 'true' and sets
>> 373  * 'n_handlers' to that number of handler threads.
>> 374  *
>> 375  * If not, returns 'false'.
>> 376  */
>> 377 bool (*number_handlers_required)(struct dpif *dpif, uint32_t *n_handlers);
>>
>> I guess the "If a certain number of handlers are required, returns
>> 'true'" part fits here, as we need 0.
>
> The fact that it exists does not convince me of its validity :-).
> I must be missing something.

Well, it makes sense that if your dpif requires a specific number of
handlers other than what ofproto suggests, you can override this. And
that is what happens here: ofproto will suggest 90+ threads if we do
not report that we want 0.

I think this is cleaner than doing another check for a missing callback
and using that to determine whether we need to start threads.

//Eelco
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c1981137f..1a29ab7ac 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -5250,6 +5250,14 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
     return 0;
 }
 
+static bool
+dpif_netdev_number_handlers_required(struct dpif *dpif_ OVS_UNUSED,
+                                     uint32_t *n_handlers)
+{
+    *n_handlers = 0;
+    return true;
+}
+
 /* Parses affinity list and returns result in 'core_ids'. */
 static int
 parse_affinity_list(const char *affinity_list, unsigned *core_ids, int n_rxq)
@@ -9985,7 +9993,7 @@ const struct dpif_class dpif_netdev_class = {
     dpif_netdev_offload_stats_get,
     NULL,                       /* recv_set */
     NULL,                       /* handlers_set */
-    NULL,                       /* number_handlers_required */
+    dpif_netdev_number_handlers_required,
     dpif_netdev_set_config,
     dpif_netdev_queue_to_priority,
     NULL,                       /* recv */
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index b5cbeed87..9a5c5c29c 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -584,7 +584,7 @@ static void
 udpif_start_threads(struct udpif *udpif, uint32_t n_handlers_,
                     uint32_t n_revalidators_)
 {
-    if (udpif && n_handlers_ && n_revalidators_) {
+    if (udpif && n_revalidators_) {
         /* Creating a thread can take a significant amount of time on some
          * systems, even hundred of milliseconds, so quiesce around it.
          */
         ovsrcu_quiesce_start();
@@ -592,14 +592,19 @@ udpif_start_threads(struct udpif *udpif, uint32_t n_handlers_,
         udpif->n_handlers = n_handlers_;
         udpif->n_revalidators = n_revalidators_;
 
-        udpif->handlers = xzalloc(udpif->n_handlers * sizeof *udpif->handlers);
-        for (size_t i = 0; i < udpif->n_handlers; i++) {
-            struct handler *handler = &udpif->handlers[i];
+        if (udpif->n_handlers) {
+            udpif->handlers = xzalloc(udpif->n_handlers
+                                      * sizeof *udpif->handlers);
+            for (size_t i = 0; i < udpif->n_handlers; i++) {
+                struct handler *handler = &udpif->handlers[i];
 
-            handler->udpif = udpif;
-            handler->handler_id = i;
-            handler->thread = ovs_thread_create(
-                "handler", udpif_upcall_handler, handler);
+                handler->udpif = udpif;
+                handler->handler_id = i;
+                handler->thread = ovs_thread_create(
+                    "handler", udpif_upcall_handler, handler);
+            }
+        } else {
+            udpif->handlers = NULL;
         }
 
         atomic_init(&udpif->enable_ufid, udpif->backer->rt_support.ufid);
@@ -662,7 +667,9 @@ udpif_set_threads(struct udpif *udpif, uint32_t n_handlers_,
     if (dpif_number_handlers_required(udpif->dpif, &n_handlers_requested)) {
         forced = true;
         if (!n_revalidators_) {
-            n_revalidators_requested = n_handlers_requested / 4 + 1;
+            n_revalidators_requested = (n_handlers_requested
+                                        ? n_handlers_requested
+                                        : MAX(count_cpu_cores(), 2)) / 4 + 1;
         } else {
             n_revalidators_requested = n_revalidators_;
         }
Avoid unnecessary thread creation as no upcalls are generated,
resulting in idle threads waiting for process termination.

This optimization significantly reduces memory usage, cutting it
by half on a 128 CPU/thread system during testing, with the number
of threads reduced from 95 to 0.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 lib/dpif-netdev.c             | 10 +++++++++-
 ofproto/ofproto-dpif-upcall.c | 25 ++++++++++++++++---------
 2 files changed, 25 insertions(+), 10 deletions(-)