diff mbox series

[ovs-dev,v2] dpif-netdev: add parameters to configure autolb

Message ID 20200813090713.48316-1-cfontain@redhat.com
State Superseded
Headers show
Series [ovs-dev,v2] dpif-netdev: add parameters to configure autolb | expand

Commit Message

Christophe Fontaine Aug. 13, 2020, 9:07 a.m. UTC
From: Christophe Fontaine <cfontain@redhat.com>

ALB_ACCEPTABLE_IMPROVEMENT and ALB_PMD_LOAD_THRESHOLD default values
can be overriden with "pmd-auto-lb-acc-improvement" and "pmd-auto-lb-threshold".

Default values may not be suitable for all use cases, and we may want to
experiment a more (or less) aggressive rebalance, either on the threshold
(ie CPU usage which triggers a rebalance) or on the acceptable improvement
(ie if the new queue assignation will be applied or discarded).

$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-acc-improvement=20
$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-threshold=70

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
---
 lib/dpif-netdev.c    | 12 ++++++++++--
 vswitchd/vswitch.xml | 28 +++++++++++++++++++++++++++-
 2 files changed, 37 insertions(+), 3 deletions(-)

Comments

Aaron Conole Aug. 25, 2020, 3:30 p.m. UTC | #1
cfontain@redhat.com writes:

> From: Christophe Fontaine <cfontain@redhat.com>
>
> ALB_ACCEPTABLE_IMPROVEMENT and ALB_PMD_LOAD_THRESHOLD default values
> can be overriden with "pmd-auto-lb-acc-improvement" and "pmd-auto-lb-threshold".
>
> Default values may not be suitable for all use cases, and we may want to
> experiment a more (or less) aggressive rebalance, either on the threshold
> (ie CPU usage which triggers a rebalance) or on the acceptable improvement
> (ie if the new queue assignation will be applied or discarded).
>
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-acc-improvement=20
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-threshold=70
>
> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
> ---

Hi Christophe,

Thanks for the contribution.  I think it should have a NEWS item.

Also, add an entry into .mailmap and AUTHORS.rst with your name and
email  :)

>  lib/dpif-netdev.c    | 12 ++++++++++--
>  vswitchd/vswitch.xml | 28 +++++++++++++++++++++++++++-
>  2 files changed, 37 insertions(+), 3 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 02df8f11e..b981706b5 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -300,6 +300,8 @@ struct pmd_auto_lb {
>      bool is_enabled;            /* Current status of Auto load balancing. */
>      uint64_t rebalance_intvl;
>      uint64_t rebalance_poll_timer;
> +    uint64_t rebalance_acc_improvement;
> +    uint64_t rebalance_threshold;
>  };
>  
>  /* Datapath based on the network device interface from netdev.h.
> @@ -4346,6 +4348,12 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>          pmd_alb->rebalance_intvl = rebalance_intvl;
>      }
>  
> +    pmd_alb->rebalance_acc_improvement = smap_get_int(other_config,
> +                "pmd-auto-lb-acc-improvement", ALB_ACCEPTABLE_IMPROVEMENT);
> +
> +    pmd_alb->rebalance_threshold = smap_get_int(other_config,
> +                "pmd-auto-lb-threshold", ALB_PMD_LOAD_THRESHOLD);
> +
>      set_pmd_auto_lb(dp);
>      return 0;
>  }
> @@ -5674,7 +5682,7 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
>              improvement =
>                  ((curr_variance - new_variance) * 100) / curr_variance;
>          }
> -        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
> +        if (improvement < dp->pmd_alb.rebalance_acc_improvement) {
>              ret = false;
>          }
>      }
> @@ -8724,7 +8732,7 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>                  pmd_load = ((tot_proc * 100) / (tot_idle + tot_proc));
>              }
>  
> -            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
> +            if (pmd_load >= pmd_alb->rebalance_threshold) {
>                  atomic_count_inc(&pmd->pmd_overloaded);
>              } else {
>                  atomic_count_set(&pmd->pmd_overloaded, 0);
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 81c84927f..53f9be591 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -654,7 +654,7 @@
>          <p>
>           Configures PMD Auto Load Balancing that allows automatic assignment of
>           RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
> -         > 95%).
> +         > other_config:pmd-auto-lb-threshold).
>          </p>
>          <p>
>           It uses current scheme of cycle based assignment of RX queues that
> @@ -690,6 +690,32 @@
>           once in few hours or a day or a week.
>          </p>
>        </column>
> +      <column name="other_config" key="pmd-auto-lb-threshold"
> +              type='{"type": "integer",
> +                     "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         Specifies the threshold defining when a PMD is overloaded.
> +         When this threshold is reached, it will trigger a request to
> +         rebalance the different queues between the non-pinned pmds.
> +        </p>
> +        <p>
> +         The default value is <code>95</code>.
> +        </p>
> +      </column>
> +      <column name="other_config" key="pmd-auto-lb-acc-improvement"
> +              type='{"type": "integer",
> +                     "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         When an auto load balance iteration is requested, this value
> +         defines if the new queue assignation will be applied: comparing
> +         the current queue assignation and the new one, the new assignation
> +         will be applied only if the improvement of CPU usage is above
> +         this value.
> +        </p>
> +        <p>
> +         The default value is <code>25</code>.
> +        </p>
> +      </column>
>        <column name="other_config" key="userspace-tso-enable"
>                type='{"type": "boolean"}'>
>          <p>
Kevin Traynor Aug. 26, 2020, 4:32 p.m. UTC | #2
Hi Christophe,

Thanks for sending the patch. Few comments below.

On 13/08/2020 10:07, cfontain@redhat.com wrote:
> From: Christophe Fontaine <cfontain@redhat.com>
> 
> ALB_ACCEPTABLE_IMPROVEMENT and ALB_PMD_LOAD_THRESHOLD default values
> can be overriden with "pmd-auto-lb-acc-improvement" and "pmd-auto-lb-threshold".
> 
> Default values may not be suitable for all use cases, and we may want to
> experiment a more (or less) aggressive rebalance, either on the threshold
> (ie CPU usage which triggers a rebalance) or on the acceptable improvement
> (ie if the new queue assignation will be applied or discarded).
> 
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-acc-improvement=20
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-threshold=70
> 

Naming is hard. Trying to think what is a good name for a user not
familiar with the internals...

How about something like,
pmd-auto-lb-variance/pmd-auto-lb-improvement/pmd-auto-lb-var-improvement
and pmd-auto-lb-cpu. Just suggestions, there could be better names.

> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
> ---
>  lib/dpif-netdev.c    | 12 ++++++++++--
>  vswitchd/vswitch.xml | 28 +++++++++++++++++++++++++++-
>  2 files changed, 37 insertions(+), 3 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 02df8f11e..b981706b5 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -300,6 +300,8 @@ struct pmd_auto_lb {
>      bool is_enabled;            /* Current status of Auto load balancing. */
>      uint64_t rebalance_intvl;
>      uint64_t rebalance_poll_timer;
> +    uint64_t rebalance_acc_improvement;

> +    uint64_t rebalance_threshold;

afaict, this one is being accessed in main and pmd threads and should be
type atomic.

>  };
>  
>  /* Datapath based on the network device interface from netdev.h.
> @@ -4346,6 +4348,12 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>          pmd_alb->rebalance_intvl = rebalance_intvl;
>      }
>  
> +    pmd_alb->rebalance_acc_improvement = smap_get_int(other_config,
> +                "pmd-auto-lb-acc-improvement", ALB_ACCEPTABLE_IMPROVEMENT);
> +
> +    pmd_alb->rebalance_threshold = smap_get_int(other_config,
> +                "pmd-auto-lb-threshold", ALB_PMD_LOAD_THRESHOLD);

You could check if these have increased (see other args above) and if
so, clear the pmd_overloaded for each pmd (like in
https://github.com/openvswitch/ovs/blob/master/lib/dpif-netdev.c#L5657).

Otherwise, overloads by the old threshold may have already been recorded
and contribute towards a rebalance happening. OTOH, you could argue that
they were legitimate when they were recorded and should stay.

> +
>      set_pmd_auto_lb(dp);

set_pmd_auto_lib() prints the rebalance interval when it is enabled, I
think the two new parameters should be printed too now.

>      return 0;
>  }
> @@ -5674,7 +5682,7 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
>              improvement =
>                  ((curr_variance - new_variance) * 100) / curr_variance;
>          }
> -        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
> +        if (improvement < dp->pmd_alb.rebalance_acc_improvement) {
>              ret = false;
>          }
>      }

Now that the user is setting the variance % improvement required it
makes sense to add some debug so they can see %'s rather than just the
variance values.

> @@ -8724,7 +8732,7 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>                  pmd_load = ((tot_proc * 100) / (tot_idle + tot_proc));
>              }
>  
> -            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
> +            if (pmd_load >= pmd_alb->rebalance_threshold) {
>                  atomic_count_inc(&pmd->pmd_overloaded);
>              } else {
>                  atomic_count_set(&pmd->pmd_overloaded, 0);
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 81c84927f..53f9be591 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -654,7 +654,7 @@
>          <p>
>           Configures PMD Auto Load Balancing that allows automatic assignment of
>           RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
> -         > 95%).
> +         > other_config:pmd-auto-lb-threshold).

We can't say overloaded as it may be a lowish value now. It's not worded
great in the first place as CPU is not the only consideration. Could be
something like "...if CPU thresholds are exceeded and there is an
imbalance between PMDs."

>          </p>
>          <p>
>           It uses current scheme of cycle based assignment of RX queues that
> @@ -690,6 +690,32 @@
>           once in few hours or a day or a week.
>          </p>
>        </column>
> +      <column name="other_config" key="pmd-auto-lb-threshold"
> +              type='{"type": "integer",
> +                     "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         Specifies the threshold defining when a PMD is overloaded.
> +         When this threshold is reached, it will trigger a request to
> +         rebalance the different queues between the non-pinned pmds.

Need to mention the units - % of used cycles on the PMD.

> +        </p>
> +        <p>
> +         The default value is <code>95</code>.
> +        </p>
> +      </column>
> +      <column name="other_config" key="pmd-auto-lb-acc-improvement"
> +              type='{"type": "integer",
> +                     "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         When an auto load balance iteration is requested, this value
> +         defines if the new queue assignation will be applied: comparing
> +         the current queue assignation and the new one, the new assignation
> +         will be applied only if the improvement of CPU usage is above
> +         this value.

Need to mention the units - % variance between the loads on the PMDs.

> +        </p>
> +        <p>
> +         The default value is <code>25</code>.
> +        </p>
> +      </column>
>        <column name="other_config" key="userspace-tso-enable"
>                type='{"type": "boolean"}'>
>          <p>
> 

You could add a few sentences to the auto-load balance section in
pmd.rst also, as it's a bit more humanly readable than here.

thanks,
Kevin.
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 02df8f11e..b981706b5 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -300,6 +300,8 @@  struct pmd_auto_lb {
     bool is_enabled;            /* Current status of Auto load balancing. */
     uint64_t rebalance_intvl;
     uint64_t rebalance_poll_timer;
+    uint64_t rebalance_acc_improvement;
+    uint64_t rebalance_threshold;
 };
 
 /* Datapath based on the network device interface from netdev.h.
@@ -4346,6 +4348,12 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
         pmd_alb->rebalance_intvl = rebalance_intvl;
     }
 
+    pmd_alb->rebalance_acc_improvement = smap_get_int(other_config,
+                "pmd-auto-lb-acc-improvement", ALB_ACCEPTABLE_IMPROVEMENT);
+
+    pmd_alb->rebalance_threshold = smap_get_int(other_config,
+                "pmd-auto-lb-threshold", ALB_PMD_LOAD_THRESHOLD);
+
     set_pmd_auto_lb(dp);
     return 0;
 }
@@ -5674,7 +5682,7 @@  pmd_rebalance_dry_run(struct dp_netdev *dp)
             improvement =
                 ((curr_variance - new_variance) * 100) / curr_variance;
         }
-        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
+        if (improvement < dp->pmd_alb.rebalance_acc_improvement) {
             ret = false;
         }
     }
@@ -8724,7 +8732,7 @@  dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
                 pmd_load = ((tot_proc * 100) / (tot_idle + tot_proc));
             }
 
-            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
+            if (pmd_load >= pmd_alb->rebalance_threshold) {
                 atomic_count_inc(&pmd->pmd_overloaded);
             } else {
                 atomic_count_set(&pmd->pmd_overloaded, 0);
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 81c84927f..53f9be591 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -654,7 +654,7 @@ 
         <p>
          Configures PMD Auto Load Balancing that allows automatic assignment of
          RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
-         > 95%).
+         > other_config:pmd-auto-lb-threshold).
         </p>
         <p>
          It uses current scheme of cycle based assignment of RX queues that
@@ -690,6 +690,32 @@ 
          once in few hours or a day or a week.
         </p>
       </column>
+      <column name="other_config" key="pmd-auto-lb-threshold"
+              type='{"type": "integer",
+                     "minInteger": 0, "maxInteger": 100}'>
+        <p>
+         Specifies the threshold defining when a PMD is overloaded.
+         When this threshold is reached, it will trigger a request to
+         rebalance the different queues between the non-pinned pmds.
+        </p>
+        <p>
+         The default value is <code>95</code>.
+        </p>
+      </column>
+      <column name="other_config" key="pmd-auto-lb-acc-improvement"
+              type='{"type": "integer",
+                     "minInteger": 0, "maxInteger": 100}'>
+        <p>
+         When an auto load balance iteration is requested, this value
+         defines if the new queue assignation will be applied: comparing
+         the current queue assignation and the new one, the new assignation
+         will be applied only if the improvement of CPU usage is above
+         this value.
+        </p>
+        <p>
+         The default value is <code>25</code>.
+        </p>
+      </column>
       <column name="other_config" key="userspace-tso-enable"
               type='{"type": "boolean"}'>
         <p>