diff mbox series

[ovs-dev,1/2] dpif-netdev: Do not create handler threads.

Message ID 91101df319805c55dc05597cb34d36cbd8bce867.1707226223.git.echaudro@redhat.com
State Changes Requested
Delegated to: Eelco Chaudron
Series [ovs-dev,1/2] dpif-netdev: Do not create handler threads.

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/intel-ovs-compilation success test: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Eelco Chaudron Feb. 6, 2024, 1:30 p.m. UTC
Avoid unnecessary thread creation as no upcalls are generated,
resulting in idle threads waiting for process termination.

This optimization significantly reduces memory usage, cutting it
by half on a 128 CPU/thread system during testing, with the number
of threads reduced from 95 to 0.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
---
 lib/dpif-netdev.c             | 10 +++++++++-
 ofproto/ofproto-dpif-upcall.c | 25 ++++++++++++++++---------
 2 files changed, 25 insertions(+), 10 deletions(-)

Comments

David Marchand Feb. 6, 2024, 2:17 p.m. UTC | #1
On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>
> Avoid unnecessary thread creation as no upcalls are generated,
> resulting in idle threads waiting for process termination.
>
> This optimization significantly reduces memory usage, cutting it
> by half on a 128 CPU/thread system during testing, with the number
> of threads reduced from 95 to 0.
>
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>

I find it weird that the dpif layer reports information on how the
ofproto-dpif layer behaves.
The handler threads are something ofproto-dpif is responsible for.
The upcall receiving loop is something ofproto-dpif owns.
Why should the dpif layer tell how many handlers are needed?


I would have seen a different change, where the dpif layer exports a
capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }.
ofproto-dpif would then deduce there is no handler to start at all.
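
David's suggested capability could be sketched roughly as below. This is a hedged illustration, not the real OVS code: the struct layouts are minimal stand-ins for what lives in lib/dpif-provider.h, and example_recv is a placeholder added purely for demonstration.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-ins for the OVS structures; the names follow the
 * thread, but this is a sketch, not the real dpif-provider.h layout. */
struct dpif;
struct dpif_class {
    /* NULL for datapaths (like dpif-netdev) that never deliver upcalls. */
    int (*recv)(struct dpif *dpif);
};
struct dpif {
    const struct dpif_class *dpif_class;
};

/* The capability David proposes: a datapath without a 'recv' callback
 * can never generate upcalls, so ofproto-dpif could skip starting
 * handler threads entirely. */
static bool
dpif_can_recv(const struct dpif *dpif)
{
    return !!dpif->dpif_class->recv;
}

/* Placeholder 'recv' implementation (hypothetical), only so the
 * example can model a datapath that does support upcalls. */
static int
example_recv(struct dpif *dpif)
{
    (void) dpif;
    return 0;
}
```

With this shape, ofproto-dpif would check `dpif_can_recv()` once and start zero handlers when it returns false, instead of the datapath reporting a required thread count.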
Eelco Chaudron Feb. 6, 2024, 2:47 p.m. UTC | #2
On 6 Feb 2024, at 15:17, David Marchand wrote:

> On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>>
>> Avoid unnecessary thread creation as no upcalls are generated,
>> resulting in idle threads waiting for process termination.
>>
>> This optimization significantly reduces memory usage, cutting it
>> by half on a 128 CPU/thread system during testing, with the number
>> of threads reduced from 95 to 0.
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>
> I find it weird that the dpif layer reports information on how the
> ofproto-dpif layer behaves.
> The handler threads are something ofproto-dpif is responsible for.
> The upcall receiving loop is something ofproto-dpif owns.
> Why should the dpif layer tell how many handlers are needed?
>
>
> I would have seen a different change, where the dpif layer exports a
> capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }.
> ofproto-dpif would then deduce there is no handler to start at all.

That was my first idea as well, but then I found there is already an API call through which the dpif layer can tell its user (ofproto in this case) how many threads it needs to function correctly. Here is the API definition:

369      /* Queries 'dpif' to see if a certain number of handlers are required by
370       * the implementation.
371       *
372       * If a certain number of handlers are required, returns 'true' and sets
373       * 'n_handlers' to that number of handler threads.
374       *
375       * If not, returns 'false'.
376       */
377      bool (*number_handlers_required)(struct dpif *dpif, uint32_t *n_handlers);

I guess the "If a certain number of handlers are required, returns 'true'" part fits here, as we need 0.

//Eelco
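
To make the quoted contract concrete, here is a hedged sketch of the two answers a datapath can give. The callback names are hypothetical and detached from the real struct dpif_class; only the return/out-parameter convention follows the comment quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the quoted contract: a datapath either pins the handler
 * count (returns true and sets *n_handlers, possibly to 0), or
 * declines (returns false) and leaves the choice to the caller. */

/* Userspace-style datapath: no upcalls, so 0 handlers are required. */
static bool
userspace_handlers_required(uint32_t *n_handlers)
{
    *n_handlers = 0;
    return true;
}

/* A datapath with no fixed requirement: the caller decides and
 * *n_handlers is left untouched. */
static bool
default_handlers_required(uint32_t *n_handlers)
{
    (void) n_handlers;
    return false;
}
```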
David Marchand Feb. 6, 2024, 2:50 p.m. UTC | #3
On Tue, Feb 6, 2024 at 3:47 PM Eelco Chaudron <echaudro@redhat.com> wrote:
> On 6 Feb 2024, at 15:17, David Marchand wrote:
>
> > On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron <echaudro@redhat.com> wrote:
> >>
> >> Avoid unnecessary thread creation as no upcalls are generated,
> >> resulting in idle threads waiting for process termination.
> >>
> >> This optimization significantly reduces memory usage, cutting it
> >> by half on a 128 CPU/thread system during testing, with the number
> >> of threads reduced from 95 to 0.
> >>
> >> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> >
> > I find it weird that the dpif layer reports information on how the
> > ofproto-dpif layer behaves.
> > The handler threads are something ofproto-dpif is responsible for.
> > The upcall receiving loop is something ofproto-dpif owns.
> > Why should the dpif layer tell how many handlers are needed?
> >
> >
> > I would have seen a different change, where the dpif layer exports a
> > capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }.
> > ofproto-dpif would then deduce there is no handler to start at all.
>
> That was my first idea as well, but then I found there is already an API call through which the dpif layer can tell its user (ofproto in this case) how many threads it needs to function correctly. Here is the API definition:
>
> 369      /* Queries 'dpif' to see if a certain number of handlers are required by
> 370       * the implementation.
> 371       *
> 372       * If a certain number of handlers are required, returns 'true' and sets
> 373       * 'n_handlers' to that number of handler threads.
> 374       *
> 375       * If not, returns 'false'.
> 376       */
> 377      bool (*number_handlers_required)(struct dpif *dpif, uint32_t *n_handlers);
>
> I guess the "If a certain number of handlers are required, returns 'true'" part fits here, as we need 0.

The fact that it exists does not convince me of its validity :-).
I must be missing something.
Eelco Chaudron Feb. 6, 2024, 3 p.m. UTC | #4
On 6 Feb 2024, at 15:50, David Marchand wrote:

> On Tue, Feb 6, 2024 at 3:47 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>> On 6 Feb 2024, at 15:17, David Marchand wrote:
>>
>>> On Tue, Feb 6, 2024 at 2:31 PM Eelco Chaudron <echaudro@redhat.com> wrote:
>>>>
>>>> Avoid unnecessary thread creation as no upcalls are generated,
>>>> resulting in idle threads waiting for process termination.
>>>>
>>>> This optimization significantly reduces memory usage, cutting it
>>>> by half on a 128 CPU/thread system during testing, with the number
>>>> of threads reduced from 95 to 0.
>>>>
>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>
>>> I find it weird that the dpif layer reports information on how the
>>> ofproto-dpif layer behaves.
>>> The handler threads are something ofproto-dpif is responsible for.
>>> The upcall receiving loop is something ofproto-dpif owns.
>>> Why should the dpif layer tell how many handlers are needed?
>>>
>>>
>>> I would have seen a different change, where the dpif layer exports a
>>> capability, like dpif_can_recv() { return !!dpif->dpif_class->recv; }.
>>> ofproto-dpif would then deduce there is no handler to start at all.
>>
>> That was my first idea as well, but then I found there is already an API call through which the dpif layer can tell its user (ofproto in this case) how many threads it needs to function correctly. Here is the API definition:
>>
>> 369      /* Queries 'dpif' to see if a certain number of handlers are required by
>> 370       * the implementation.
>> 371       *
>> 372       * If a certain number of handlers are required, returns 'true' and sets
>> 373       * 'n_handlers' to that number of handler threads.
>> 374       *
>> 375       * If not, returns 'false'.
>> 376       */
>> 377      bool (*number_handlers_required)(struct dpif *dpif, uint32_t *n_handlers);
>>
>> I guess the "If a certain number of handlers are required, returns 'true'" part fits here, as we need 0.
>
> The fact that it exists does not convince me of its validity :-).
> I must be missing something.

Well, it makes sense that if your dpif requires a specific number of handlers other than what ofproto suggests, you can override this.
And this is what happens here: ofproto will suggest 90+ threads if we do not report that we want 0.

I think this is cleaner than adding another check for a missing callback and using that to determine whether we need to start threads.

//Eelco
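
The override Eelco describes can be sketched like this. pick_n_handlers is a hypothetical helper standing in for the logic in udpif_set_threads(); the real code threads this through dpif_number_handlers_required().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* dpif-netdev's answer, as in the patch below: zero handlers. */
static bool
netdev_handlers_required(uint32_t *n_handlers)
{
    *n_handlers = 0;
    return true;
}

/* Hypothetical helper: a datapath-required handler count, even 0,
 * takes precedence over the caller's suggestion (e.g. the 90+ threads
 * ofproto would otherwise pick on a 128-CPU machine). */
static uint32_t
pick_n_handlers(bool (*required_cb)(uint32_t *), uint32_t suggested)
{
    uint32_t required;

    if (required_cb && required_cb(&required)) {
        return required;    /* Datapath requirement wins. */
    }
    return suggested;       /* Fall back to the suggested count. */
}
```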

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c1981137f..1a29ab7ac 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -5250,6 +5250,14 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
     return 0;
 }
 
+static bool
+dpif_netdev_number_handlers_required(struct dpif *dpif_ OVS_UNUSED,
+                                     uint32_t *n_handlers)
+{
+    *n_handlers = 0;
+    return true;
+}
+
 /* Parses affinity list and returns result in 'core_ids'. */
 static int
 parse_affinity_list(const char *affinity_list, unsigned *core_ids, int n_rxq)
@@ -9985,7 +9993,7 @@  const struct dpif_class dpif_netdev_class = {
     dpif_netdev_offload_stats_get,
     NULL,                       /* recv_set */
     NULL,                       /* handlers_set */
-    NULL,                       /* number_handlers_required */
+    dpif_netdev_number_handlers_required,
     dpif_netdev_set_config,
     dpif_netdev_queue_to_priority,
     NULL,                       /* recv */
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index b5cbeed87..9a5c5c29c 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -584,7 +584,7 @@  static void
 udpif_start_threads(struct udpif *udpif, uint32_t n_handlers_,
                     uint32_t n_revalidators_)
 {
-    if (udpif && n_handlers_ && n_revalidators_) {
+    if (udpif && n_revalidators_) {
         /* Creating a thread can take a significant amount of time on some
          * systems, even hundred of milliseconds, so quiesce around it. */
         ovsrcu_quiesce_start();
@@ -592,14 +592,19 @@  udpif_start_threads(struct udpif *udpif, uint32_t n_handlers_,
         udpif->n_handlers = n_handlers_;
         udpif->n_revalidators = n_revalidators_;
 
-        udpif->handlers = xzalloc(udpif->n_handlers * sizeof *udpif->handlers);
-        for (size_t i = 0; i < udpif->n_handlers; i++) {
-            struct handler *handler = &udpif->handlers[i];
+        if (udpif->n_handlers) {
+            udpif->handlers = xzalloc(udpif->n_handlers
+                                      * sizeof *udpif->handlers);
+            for (size_t i = 0; i < udpif->n_handlers; i++) {
+                struct handler *handler = &udpif->handlers[i];
 
-            handler->udpif = udpif;
-            handler->handler_id = i;
-            handler->thread = ovs_thread_create(
-                "handler", udpif_upcall_handler, handler);
+                handler->udpif = udpif;
+                handler->handler_id = i;
+                handler->thread = ovs_thread_create(
+                    "handler", udpif_upcall_handler, handler);
+            }
+        } else {
+            udpif->handlers = NULL;
         }
 
         atomic_init(&udpif->enable_ufid, udpif->backer->rt_support.ufid);
@@ -662,7 +667,9 @@  udpif_set_threads(struct udpif *udpif, uint32_t n_handlers_,
     if (dpif_number_handlers_required(udpif->dpif, &n_handlers_requested)) {
         forced = true;
         if (!n_revalidators_) {
-            n_revalidators_requested = n_handlers_requested / 4 + 1;
+            n_revalidators_requested = (n_handlers_requested
+                                        ? n_handlers_requested
+                                        : MAX(count_cpu_cores(), 2)) / 4 + 1;
         } else {
             n_revalidators_requested = n_revalidators_;
         }
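
The revalidator sizing in the last hunk can be sketched in isolation. This is a hedged illustration: revalidators_requested is a hypothetical wrapper, and n_cores stands in for count_cpu_cores().

```c
#include <assert.h>
#include <stdint.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

/* Sketch of the hunk above: when the datapath requires 0 handlers,
 * the old formula would compute 0 / 4 + 1 = 1 revalidator regardless
 * of machine size, so the patch falls back to a core-count based
 * estimate instead. */
static uint32_t
revalidators_requested(uint32_t n_handlers_requested, uint32_t n_cores)
{
    return (n_handlers_requested
            ? n_handlers_requested
            : MAX(n_cores, 2)) / 4 + 1;
}
```

On the 128 CPU/thread test system from the commit message this yields 33 revalidators when 0 handlers are required, instead of a fixed 1.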