diff mbox series

[ovs-dev,v6,2/4] dpif-netdev: Add parameters to configure PMD auto load balance.

Message ID 20201221131033.835007-3-ktraynor@redhat.com
State Accepted
Headers show
Series Add PMD auto load balance parameters. | expand

Commit Message

Kevin Traynor Dec. 21, 2020, 1:10 p.m. UTC
From: Christophe Fontaine <cfontain@redhat.com>

Two important parts of how PMD auto load balance operates are how
loaded a core needs to be and how much improvement is estimated
before a PMD auto load balance can trigger.

Previously they were hardcoded to 95% loaded and 25% variance
improvement.

These default values may not be suitable for all use cases and
we may want to use a more (or less) aggressive rebalance, either
on the pmd load threshold or on the minimum variance improvement
threshold.

The defaults are not changed, but "pmd-auto-lb-pmd-load" and
"pmd-auto-lb-improvement" parameters are added to override the
defaults.

$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-pmd-load="70"
$ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement="20"

Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
Co-Authored-by: Kevin Traynor <ktraynor@redhat.com>
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
Acked-by: David Marchand <david.marchand@redhat.com>
---
 NEWS                 |  1 +
 lib/dpif-netdev.c    | 54 +++++++++++++++++++++++++++++++++++++-------
 vswitchd/vswitch.xml | 29 ++++++++++++++++++++++--
 3 files changed, 74 insertions(+), 10 deletions(-)

Comments

Stokes, Ian Jan. 14, 2021, 7:49 p.m. UTC | #1
> From: Christophe Fontaine <cfontain@redhat.com>
> 
> Two important parts of how PMD auto load balance operates are how
> loaded a core needs to be and how much improvement is estimated
> before a PMD auto load balance can trigger.
> 
> Previously they were hardcoded to 95% loaded and 25% variance
> improvement.
> 
> These default values may not be suitable for all use cases and
> we may want to use a more (or less) aggressive rebalance, either
> on the pmd load threshold or on the minimum variance improvement
> threshold.
> 
> The defaults are not changed, but "pmd-auto-lb-pmd-load" and
> "pmd-auto-lb-improvement" parameters are added to override the
> defaults.
> 
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-pmd-load="70"
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement="20"
> 

Thanks for the patch Kevin,

Have tested over the last few days and it looks good to me. The only note I would have is
in addition to updating the vswitch.xml, I would think it was worth noting the two new parameters
in the vswitch documentation also (possibly giving an example of their usage).

https://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#automatic-assignment-of-port-rx-queue-to-pmd-threads-experimental

Right now the documentation just refers to the default 95% threshold and 25% improvement.

So even an additional note to specify how the threshold and improvements can be changed would suffice I would think.

This doesn't have to block merge, but would appreciate a follow up patch to improve the docs soon.

Minor comment for discussion below.

> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
> Co-Authored-by: Kevin Traynor <ktraynor@redhat.com>
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> Acked-by: David Marchand <david.marchand@redhat.com>
> ---
>  NEWS                 |  1 +
>  lib/dpif-netdev.c    | 54 +++++++++++++++++++++++++++++++++++++-----
> --
>  vswitchd/vswitch.xml | 29 ++++++++++++++++++++++--
>  3 files changed, 74 insertions(+), 10 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 7e291a180..7994e434d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -17,4 +17,5 @@ Post-v2.14.0
>       * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
>         that allows configuring the MAC address of a VF representor.
> +     * Add parameters to configure PMD auto load balance behaviour.
>     - The environment variable OVS_UNBOUND_CONF, if set, is now used
>       as the DNS resolver's (unbound) configuration file.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index d19d5bbff..bf2112815 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -86,7 +86,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> 
>  /* Auto Load Balancing Defaults */
> -#define ALB_ACCEPTABLE_IMPROVEMENT       25
> -#define ALB_PMD_LOAD_THRESHOLD           95
> -#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
> +#define ALB_IMPROVEMENT_THRESHOLD    25
> +#define ALB_LOAD_THRESHOLD           95
> +#define ALB_REBALANCE_INTERVAL       1 /* 1 Min */
>  #define MIN_TO_MSEC                  60000
> 
> @@ -301,4 +301,6 @@ struct pmd_auto_lb {
>      uint64_t rebalance_intvl;
>      uint64_t rebalance_poll_timer;
> +    uint8_t rebalance_improve_thresh;
> +    atomic_uint8_t rebalance_load_thresh;
>  };
> 
> @@ -4205,4 +4207,5 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>      struct dp_netdev_pmd_thread *pmd;
>      struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
> +    uint8_t rebalance_load_thresh;
> 
>      bool enable_alb = false;
> @@ -4234,7 +4237,14 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>          pmd_alb->is_enabled = enable_alb;
>          if (pmd_alb->is_enabled) {
> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> +                                &rebalance_load_thresh);
>              VLOG_INFO("PMD auto load balance is enabled "
> -                      "interval %"PRIu64" mins",
> -                       pmd_alb->rebalance_intvl / MIN_TO_MSEC);
> +                      "interval %"PRIu64" mins, "
> +                      "pmd load threshold %"PRIu8"%%, "
> +                      "improvement threshold %"PRIu8"%%",
> +                       pmd_alb->rebalance_intvl / MIN_TO_MSEC,
> +                       rebalance_load_thresh,
> +                       pmd_alb->rebalance_improve_thresh);
> +
>          } else {
>              pmd_alb->rebalance_poll_timer = 0;
> @@ -4260,4 +4270,6 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
>      uint32_t tx_flush_interval, cur_tx_flush_interval;
>      uint64_t rebalance_intvl;
> +    uint8_t rebalance_load, cur_rebalance_load;
> +    uint8_t rebalance_improve;
> 
>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
> @@ -4337,5 +4349,5 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
> 
>      rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebal-
> interval",
> -                              ALB_PMD_REBALANCE_POLL_INTERVAL);
> +                                   ALB_REBALANCE_INTERVAL);
> 
>      /* Input is in min, convert it to msec. */
> @@ -4349,4 +4361,27 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
>      }
> 
> +    rebalance_improve = smap_get_int(other_config, "pmd-auto-lb-
> improvement",
> +                                     ALB_IMPROVEMENT_THRESHOLD);
> +    if (rebalance_improve > 100) {
> +        rebalance_improve = ALB_IMPROVEMENT_THRESHOLD;
> +    }
> +    if (rebalance_improve != pmd_alb->rebalance_improve_thresh) {
> +        pmd_alb->rebalance_improve_thresh = rebalance_improve;
> +        VLOG_INFO("PMD auto load balance improvement threshold set to "
> +                  "%"PRIu8"%%\n", rebalance_improve);
> +    }
> +
> +    rebalance_load = smap_get_int(other_config, "pmd-auto-lb-pmd-load",
> +                             ALB_LOAD_THRESHOLD);
> +    if (rebalance_load > 100) {
> +        rebalance_load = ALB_LOAD_THRESHOLD;
> +    }
> +    atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> &cur_rebalance_load);
> +    if (rebalance_load != cur_rebalance_load) {
> +        atomic_store_relaxed(&pmd_alb->rebalance_load_thresh,
> +                             rebalance_load);
> +        VLOG_INFO("PMD auto load balance pmd load threshold set to "
> +                "%"PRIu8"%%\n", rebalance_load);
> +    }
>      set_pmd_auto_lb(dp);
>      return 0;
> @@ -5677,5 +5712,5 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
>                  ((curr_variance - new_variance) * 100) / curr_variance;
>          }
> -        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
> +        if (improvement < dp->pmd_alb.rebalance_improve_thresh) {
>              ret = false;
>          }
> @@ -8712,4 +8747,5 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
>      if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
>          uint64_t curr_tsc;
> +        uint8_t rebalance_load_trigger;
>          struct pmd_auto_lb *pmd_alb = &pmd->dp->pmd_alb;
>          if (pmd_alb->is_enabled && !pmd->isolated
> @@ -8728,5 +8764,7 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
>              }
> 
> -            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> +                                &rebalance_load_trigger);
> +            if (pmd_load >= rebalance_load_trigger) {
>                  atomic_count_inc(&pmd->pmd_overloaded);
>              } else {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 89a876796..76d528ea7 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -654,6 +654,6 @@
>          <p>
>           Configures PMD Auto Load Balancing that allows automatic assignment
> of
> -         RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
> -         > 95%).
> +         RX queues to PMDs if any of PMDs is overloaded (i.e. a processing
> +         cycles > <ref column="other_config" key="pmd-auto-lb-pmd-load"/>).
>          </p>
>          <p>
> @@ -691,4 +691,29 @@
>          </p>
>        </column>
> +      <column name="other_config" key="pmd-auto-lb-pmd-load"
> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         Specifies the minimum pmd load threshold (% of used cycled) of
> +         any non-isolated pmds when a PMD auto load balance may be
> triggered.
> +        </p>
> +        <p>
> +         The default value is <code>95%</code>.
> +        </p>
> +      </column>
> +      <column name="other_config" key="pmd-auto-lb-improvement"
> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         Specifies the minimum evaluated % improvement in load distribution
> +         across the non-isolated pmds that will allow a PMD auto load balance
> +         to occur.
> +        </p>
> +        <p>
> +         Note, setting this parameter to 0 will always allow an auto load
> +         balance to occur regardless of estimated improvement or not.

Is there any reason you would want the 0 option above? Surely it would just become disruptive to the deployment.

I know you could argue an enforced minimum of 1% would not make much of a difference, but at least it would not re-distribute regardless of a gain?
Just curious as to the thinking behind it.


Regards
Ian

> +        </p>
> +        <p>
> +         The default value is <code>25%</code>.
> +        </p>
> +      </column>
>        <column name="other_config" key="userspace-tso-enable"
>                type='{"type": "boolean"}'>
> --
> 2.26.2
Kevin Traynor Jan. 15, 2021, 10:09 a.m. UTC | #2
On 14/01/2021 19:49, Stokes, Ian wrote:
>> From: Christophe Fontaine <cfontain@redhat.com>
>>
>> Two important parts of how PMD auto load balance operates are how
>> loaded a core needs to be and how much improvement is estimated
>> before a PMD auto load balance can trigger.
>>
>> Previously they were hardcoded to 95% loaded and 25% variance
>> improvement.
>>
>> These default values may not be suitable for all use cases and
>> we may want to use a more (or less) aggressive rebalance, either
>> on the pmd load threshold or on the minimum variance improvement
>> threshold.
>>
>> The defaults are not changed, but "pmd-auto-lb-pmd-load" and
>> "pmd-auto-lb-improvement" parameters are added to override the
>> defaults.
>>
>> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-pmd-load="70"
>> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement="20"
>>
> 
> Thanks for the patch Kevin,
> 
> Have tested over the last few days and it looks good to me. The only note I would have is
> in addition to updating the vswitch.xml, I would think it was worth noting the two new parameters
> in the vswitch documentation also (possibly giving an example of their usage).
> 
> https://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#automatic-assignment-of-port-rx-queue-to-pmd-threads-experimental
> 
> Right now the documentation just refers to the default 95% threshold and 25% improvement.
> 

Thanks for testing and reviewing. Yes, this part of the doc should be
updated at least.

> So even an additional note to specify how the threshold and improvements can be changed would suffice I would think.
> 
> This doesn't have to block merge, but would appreciate a follow up patch to improve the docs soon.
> 
> Minor comment for discussion below.
> 
>> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
>> Co-Authored-by: Kevin Traynor <ktraynor@redhat.com>
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>> Acked-by: David Marchand <david.marchand@redhat.com>
>> ---
>>  NEWS                 |  1 +
>>  lib/dpif-netdev.c    | 54 +++++++++++++++++++++++++++++++++++++-----
>> --
>>  vswitchd/vswitch.xml | 29 ++++++++++++++++++++++--
>>  3 files changed, 74 insertions(+), 10 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 7e291a180..7994e434d 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -17,4 +17,5 @@ Post-v2.14.0
>>       * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
>>         that allows configuring the MAC address of a VF representor.
>> +     * Add parameters to configure PMD auto load balance behaviour.
>>     - The environment variable OVS_UNBOUND_CONF, if set, is now used
>>       as the DNS resolver's (unbound) configuration file.
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index d19d5bbff..bf2112815 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -86,7 +86,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>>
>>  /* Auto Load Balancing Defaults */
>> -#define ALB_ACCEPTABLE_IMPROVEMENT       25
>> -#define ALB_PMD_LOAD_THRESHOLD           95
>> -#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
>> +#define ALB_IMPROVEMENT_THRESHOLD    25
>> +#define ALB_LOAD_THRESHOLD           95
>> +#define ALB_REBALANCE_INTERVAL       1 /* 1 Min */
>>  #define MIN_TO_MSEC                  60000
>>
>> @@ -301,4 +301,6 @@ struct pmd_auto_lb {
>>      uint64_t rebalance_intvl;
>>      uint64_t rebalance_poll_timer;
>> +    uint8_t rebalance_improve_thresh;
>> +    atomic_uint8_t rebalance_load_thresh;
>>  };
>>
>> @@ -4205,4 +4207,5 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>>      struct dp_netdev_pmd_thread *pmd;
>>      struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
>> +    uint8_t rebalance_load_thresh;
>>
>>      bool enable_alb = false;
>> @@ -4234,7 +4237,14 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>>          pmd_alb->is_enabled = enable_alb;
>>          if (pmd_alb->is_enabled) {
>> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
>> +                                &rebalance_load_thresh);
>>              VLOG_INFO("PMD auto load balance is enabled "
>> -                      "interval %"PRIu64" mins",
>> -                       pmd_alb->rebalance_intvl / MIN_TO_MSEC);
>> +                      "interval %"PRIu64" mins, "
>> +                      "pmd load threshold %"PRIu8"%%, "
>> +                      "improvement threshold %"PRIu8"%%",
>> +                       pmd_alb->rebalance_intvl / MIN_TO_MSEC,
>> +                       rebalance_load_thresh,
>> +                       pmd_alb->rebalance_improve_thresh);
>> +
>>          } else {
>>              pmd_alb->rebalance_poll_timer = 0;
>> @@ -4260,4 +4270,6 @@ dpif_netdev_set_config(struct dpif *dpif, const
>> struct smap *other_config)
>>      uint32_t tx_flush_interval, cur_tx_flush_interval;
>>      uint64_t rebalance_intvl;
>> +    uint8_t rebalance_load, cur_rebalance_load;
>> +    uint8_t rebalance_improve;
>>
>>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
>> @@ -4337,5 +4349,5 @@ dpif_netdev_set_config(struct dpif *dpif, const
>> struct smap *other_config)
>>
>>      rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebal-
>> interval",
>> -                              ALB_PMD_REBALANCE_POLL_INTERVAL);
>> +                                   ALB_REBALANCE_INTERVAL);
>>
>>      /* Input is in min, convert it to msec. */
>> @@ -4349,4 +4361,27 @@ dpif_netdev_set_config(struct dpif *dpif, const
>> struct smap *other_config)
>>      }
>>
>> +    rebalance_improve = smap_get_int(other_config, "pmd-auto-lb-
>> improvement",
>> +                                     ALB_IMPROVEMENT_THRESHOLD);
>> +    if (rebalance_improve > 100) {
>> +        rebalance_improve = ALB_IMPROVEMENT_THRESHOLD;
>> +    }
>> +    if (rebalance_improve != pmd_alb->rebalance_improve_thresh) {
>> +        pmd_alb->rebalance_improve_thresh = rebalance_improve;
>> +        VLOG_INFO("PMD auto load balance improvement threshold set to "
>> +                  "%"PRIu8"%%\n", rebalance_improve);
>> +    }
>> +
>> +    rebalance_load = smap_get_int(other_config, "pmd-auto-lb-pmd-load",
>> +                             ALB_LOAD_THRESHOLD);
>> +    if (rebalance_load > 100) {
>> +        rebalance_load = ALB_LOAD_THRESHOLD;
>> +    }
>> +    atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
>> &cur_rebalance_load);
>> +    if (rebalance_load != cur_rebalance_load) {
>> +        atomic_store_relaxed(&pmd_alb->rebalance_load_thresh,
>> +                             rebalance_load);
>> +        VLOG_INFO("PMD auto load balance pmd load threshold set to "
>> +                "%"PRIu8"%%\n", rebalance_load);
>> +    }
>>      set_pmd_auto_lb(dp);
>>      return 0;
>> @@ -5677,5 +5712,5 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
>>                  ((curr_variance - new_variance) * 100) / curr_variance;
>>          }
>> -        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
>> +        if (improvement < dp->pmd_alb.rebalance_improve_thresh) {
>>              ret = false;
>>          }
>> @@ -8712,4 +8747,5 @@ dp_netdev_pmd_try_optimize(struct
>> dp_netdev_pmd_thread *pmd,
>>      if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
>>          uint64_t curr_tsc;
>> +        uint8_t rebalance_load_trigger;
>>          struct pmd_auto_lb *pmd_alb = &pmd->dp->pmd_alb;
>>          if (pmd_alb->is_enabled && !pmd->isolated
>> @@ -8728,5 +8764,7 @@ dp_netdev_pmd_try_optimize(struct
>> dp_netdev_pmd_thread *pmd,
>>              }
>>
>> -            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
>> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
>> +                                &rebalance_load_trigger);
>> +            if (pmd_load >= rebalance_load_trigger) {
>>                  atomic_count_inc(&pmd->pmd_overloaded);
>>              } else {
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 89a876796..76d528ea7 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -654,6 +654,6 @@
>>          <p>
>>           Configures PMD Auto Load Balancing that allows automatic assignment
>> of
>> -         RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
>> -         > 95%).
>> +         RX queues to PMDs if any of PMDs is overloaded (i.e. a processing
>> +         cycles > <ref column="other_config" key="pmd-auto-lb-pmd-load"/>).
>>          </p>
>>          <p>
>> @@ -691,4 +691,29 @@
>>          </p>
>>        </column>
>> +      <column name="other_config" key="pmd-auto-lb-pmd-load"
>> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
>> +        <p>
>> +         Specifies the minimum pmd load threshold (% of used cycled) of
>> +         any non-isolated pmds when a PMD auto load balance may be
>> triggered.
>> +        </p>
>> +        <p>
>> +         The default value is <code>95%</code>.
>> +        </p>
>> +      </column>
>> +      <column name="other_config" key="pmd-auto-lb-improvement"
>> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
>> +        <p>
>> +         Specifies the minimum evaluated % improvement in load distribution
>> +         across the non-isolated pmds that will allow a PMD auto load balance
>> +         to occur.
>> +        </p>
>> +        <p>
>> +         Note, setting this parameter to 0 will always allow an auto load
>> +         balance to occur regardless of estimated improvement or not.
> 
> Is there any reason you would want the 0 option above? Surely it would just become disruptive to the deployment.
> 
> I know you could argue an enforced minimum of 1% would not make much of a difference, but at least it would not re-distribute regardless of a gain?
> Just curious as to the thinking behind it.
> 

I've seen some users who talk about checking load only and calling
rebalance every x mins. I would say it's not useful in most cases but
the improvement is just an estimate and I didn't want to add special
handling to disallow a user from being able to rebalance on load only.

> 
> Regards
> Ian
> 
>> +        </p>
>> +        <p>
>> +         The default value is <code>25%</code>.
>> +        </p>
>> +      </column>
>>        <column name="other_config" key="userspace-tso-enable"
>>                type='{"type": "boolean"}'>
>> --
>> 2.26.2
>
Stokes, Ian Jan. 15, 2021, 10:30 a.m. UTC | #3
> On 14/01/2021 19:49, Stokes, Ian wrote:
> >> From: Christophe Fontaine <cfontain@redhat.com>
> >>
> >> Two important parts of how PMD auto load balance operates are how
> >> loaded a core needs to be and how much improvement is estimated
> >> before a PMD auto load balance can trigger.
> >>
> >> Previously they were hardcoded to 95% loaded and 25% variance
> >> improvement.
> >>
> >> These default values may not be suitable for all use cases and
> >> we may want to use a more (or less) aggressive rebalance, either
> >> on the pmd load threshold or on the minimum variance improvement
> >> threshold.
> >>
> >> The defaults are not changed, but "pmd-auto-lb-pmd-load" and
> >> "pmd-auto-lb-improvement" parameters are added to override the
> >> defaults.
> >>
> >> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-pmd-load="70"
> >> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement="20"
> >>
> >
> > Thanks for the patch Kevin,
> >
> > Have tested over the last few days and it looks good to me. The only note I
> would have is
> > in addition to updating the vswitch.xml, I would think it was worth noting the
> two new parameters
> > in the vswitch documentation also (possibly giving an example of their usage).
> >
> > https://docs.openvswitch.org/en/latest/topics/dpdk/pmd/#automatic-
> assignment-of-port-rx-queue-to-pmd-threads-experimental
> >
> > Right now the documentation just refers to the default 95% threshold and 25%
> improvement.
> >
> 
> Thanks for testing and reviewing. Yes, this part of the doc should be
> updated at least.
> 
> > So even an additional note to specify how the threshold and improvements
> can be changed would suffice I would think.
> >
> > This doesn't have to block merge, but would appreciate a follow up patch to
> improve the docs soon.
> >
> > Minor comment for discussion below.
> >
> >> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
> >> Co-Authored-by: Kevin Traynor <ktraynor@redhat.com>
> >> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> >> Acked-by: David Marchand <david.marchand@redhat.com>
> >> ---
> >>  NEWS                 |  1 +
> >>  lib/dpif-netdev.c    | 54 +++++++++++++++++++++++++++++++++++++-----
> >> --
> >>  vswitchd/vswitch.xml | 29 ++++++++++++++++++++++--
> >>  3 files changed, 74 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/NEWS b/NEWS
> >> index 7e291a180..7994e434d 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -17,4 +17,5 @@ Post-v2.14.0
> >>       * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
> >>         that allows configuring the MAC address of a VF representor.
> >> +     * Add parameters to configure PMD auto load balance behaviour.
> >>     - The environment variable OVS_UNBOUND_CONF, if set, is now used
> >>       as the DNS resolver's (unbound) configuration file.
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index d19d5bbff..bf2112815 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -86,7 +86,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> >>
> >>  /* Auto Load Balancing Defaults */
> >> -#define ALB_ACCEPTABLE_IMPROVEMENT       25
> >> -#define ALB_PMD_LOAD_THRESHOLD           95
> >> -#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
> >> +#define ALB_IMPROVEMENT_THRESHOLD    25
> >> +#define ALB_LOAD_THRESHOLD           95
> >> +#define ALB_REBALANCE_INTERVAL       1 /* 1 Min */
> >>  #define MIN_TO_MSEC                  60000
> >>
> >> @@ -301,4 +301,6 @@ struct pmd_auto_lb {
> >>      uint64_t rebalance_intvl;
> >>      uint64_t rebalance_poll_timer;
> >> +    uint8_t rebalance_improve_thresh;
> >> +    atomic_uint8_t rebalance_load_thresh;
> >>  };
> >>
> >> @@ -4205,4 +4207,5 @@ set_pmd_auto_lb(struct dp_netdev *dp)
> >>      struct dp_netdev_pmd_thread *pmd;
> >>      struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
> >> +    uint8_t rebalance_load_thresh;
> >>
> >>      bool enable_alb = false;
> >> @@ -4234,7 +4237,14 @@ set_pmd_auto_lb(struct dp_netdev *dp)
> >>          pmd_alb->is_enabled = enable_alb;
> >>          if (pmd_alb->is_enabled) {
> >> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> >> +                                &rebalance_load_thresh);
> >>              VLOG_INFO("PMD auto load balance is enabled "
> >> -                      "interval %"PRIu64" mins",
> >> -                       pmd_alb->rebalance_intvl / MIN_TO_MSEC);
> >> +                      "interval %"PRIu64" mins, "
> >> +                      "pmd load threshold %"PRIu8"%%, "
> >> +                      "improvement threshold %"PRIu8"%%",
> >> +                       pmd_alb->rebalance_intvl / MIN_TO_MSEC,
> >> +                       rebalance_load_thresh,
> >> +                       pmd_alb->rebalance_improve_thresh);
> >> +
> >>          } else {
> >>              pmd_alb->rebalance_poll_timer = 0;
> >> @@ -4260,4 +4270,6 @@ dpif_netdev_set_config(struct dpif *dpif, const
> >> struct smap *other_config)
> >>      uint32_t tx_flush_interval, cur_tx_flush_interval;
> >>      uint64_t rebalance_intvl;
> >> +    uint8_t rebalance_load, cur_rebalance_load;
> >> +    uint8_t rebalance_improve;
> >>
> >>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
> >> @@ -4337,5 +4349,5 @@ dpif_netdev_set_config(struct dpif *dpif, const
> >> struct smap *other_config)
> >>
> >>      rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebal-
> >> interval",
> >> -                              ALB_PMD_REBALANCE_POLL_INTERVAL);
> >> +                                   ALB_REBALANCE_INTERVAL);
> >>
> >>      /* Input is in min, convert it to msec. */
> >> @@ -4349,4 +4361,27 @@ dpif_netdev_set_config(struct dpif *dpif, const
> >> struct smap *other_config)
> >>      }
> >>
> >> +    rebalance_improve = smap_get_int(other_config, "pmd-auto-lb-
> >> improvement",
> >> +                                     ALB_IMPROVEMENT_THRESHOLD);
> >> +    if (rebalance_improve > 100) {
> >> +        rebalance_improve = ALB_IMPROVEMENT_THRESHOLD;
> >> +    }
> >> +    if (rebalance_improve != pmd_alb->rebalance_improve_thresh) {
> >> +        pmd_alb->rebalance_improve_thresh = rebalance_improve;
> >> +        VLOG_INFO("PMD auto load balance improvement threshold set to "
> >> +                  "%"PRIu8"%%\n", rebalance_improve);
> >> +    }
> >> +
> >> +    rebalance_load = smap_get_int(other_config, "pmd-auto-lb-pmd-load",
> >> +                             ALB_LOAD_THRESHOLD);
> >> +    if (rebalance_load > 100) {
> >> +        rebalance_load = ALB_LOAD_THRESHOLD;
> >> +    }
> >> +    atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> >> &cur_rebalance_load);
> >> +    if (rebalance_load != cur_rebalance_load) {
> >> +        atomic_store_relaxed(&pmd_alb->rebalance_load_thresh,
> >> +                             rebalance_load);
> >> +        VLOG_INFO("PMD auto load balance pmd load threshold set to "
> >> +                "%"PRIu8"%%\n", rebalance_load);
> >> +    }
> >>      set_pmd_auto_lb(dp);
> >>      return 0;
> >> @@ -5677,5 +5712,5 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
> >>                  ((curr_variance - new_variance) * 100) / curr_variance;
> >>          }
> >> -        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
> >> +        if (improvement < dp->pmd_alb.rebalance_improve_thresh) {
> >>              ret = false;
> >>          }
> >> @@ -8712,4 +8747,5 @@ dp_netdev_pmd_try_optimize(struct
> >> dp_netdev_pmd_thread *pmd,
> >>      if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
> >>          uint64_t curr_tsc;
> >> +        uint8_t rebalance_load_trigger;
> >>          struct pmd_auto_lb *pmd_alb = &pmd->dp->pmd_alb;
> >>          if (pmd_alb->is_enabled && !pmd->isolated
> >> @@ -8728,5 +8764,7 @@ dp_netdev_pmd_try_optimize(struct
> >> dp_netdev_pmd_thread *pmd,
> >>              }
> >>
> >> -            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
> >> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> >> +                                &rebalance_load_trigger);
> >> +            if (pmd_load >= rebalance_load_trigger) {
> >>                  atomic_count_inc(&pmd->pmd_overloaded);
> >>              } else {
> >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >> index 89a876796..76d528ea7 100644
> >> --- a/vswitchd/vswitch.xml
> >> +++ b/vswitchd/vswitch.xml
> >> @@ -654,6 +654,6 @@
> >>          <p>
> >>           Configures PMD Auto Load Balancing that allows automatic assignment
> >> of
> >> -         RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
> >> -         > 95%).
> >> +         RX queues to PMDs if any of PMDs is overloaded (i.e. a processing
> >> +         cycles > <ref column="other_config" key="pmd-auto-lb-pmd-load"/>).
> >>          </p>
> >>          <p>
> >> @@ -691,4 +691,29 @@
> >>          </p>
> >>        </column>
> >> +      <column name="other_config" key="pmd-auto-lb-pmd-load"
> >> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> >> +        <p>
> >> +         Specifies the minimum pmd load threshold (% of used cycled) of
> >> +         any non-isolated pmds when a PMD auto load balance may be
> >> triggered.
> >> +        </p>
> >> +        <p>
> >> +         The default value is <code>95%</code>.
> >> +        </p>
> >> +      </column>
> >> +      <column name="other_config" key="pmd-auto-lb-improvement"
> >> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> >> +        <p>
> >> +         Specifies the minimum evaluated % improvement in load distribution
> >> +         across the non-isolated pmds that will allow a PMD auto load balance
> >> +         to occur.
> >> +        </p>
> >> +        <p>
> >> +         Note, setting this parameter to 0 will always allow an auto load
> >> +         balance to occur regardless of estimated improvement or not.
> >
> > Is there any reason you would want the 0 option above? Surely it would just
> become disruptive to the deployment.
> >
> > I know you could argue an enforced minimum of 1% would not make much of
> a difference, but at least it would not re-distribute regardless of a gain?
> > Just curious as to the thinking behind it.
> >
> 
> I've seen some users who talk about checking load only and calling
> rebalance every x mins. I would say it's not useful in most cases but
> the improvement is just an estimate and I didn't want to add special
> handling to disallow a user from being able to rebalance on load only.

OK, if it's a usecase then fair enough.  I would have thought rebalancing on
load only and 0% improvement introduces the risk of a worse 
deployment depending on the load but again, maybe that’s load specific. 

With the agreement of the future doc improvements before the 2.15 release:

Acked-by: ian.stokes@intel.com

> 
> >
> > Regards
> > Ian
> >
> >> +        </p>
> >> +        <p>
> >> +         The default value is <code>25%</code>.
> >> +        </p>
> >> +      </column>
> >>        <column name="other_config" key="userspace-tso-enable"
> >>                type='{"type": "boolean"}'>
> >> --
> >> 2.26.2
> >
Ilya Maximets Jan. 15, 2021, 5:22 p.m. UTC | #4
On 12/21/20 2:10 PM, Kevin Traynor wrote:
> From: Christophe Fontaine <cfontain@redhat.com>
> 
> Two important parts of how PMD auto load balance operates are how
> loaded a core needs to be and how much improvement is estimated
> before a PMD auto load balance can trigger.
> 
> Previously they were hardcoded to 95% loaded and 25% variance
> improvement.
> 
> These default values may not be suitable for all use cases and
> we may want to use a more (or less) aggressive rebalance, either
> on the pmd load threshold or on the minimum variance improvement
> threshold.
> 
> The defaults are not changed, but "pmd-auto-lb-pmd-load" and
> "pmd-auto-lb-improvement" parameters are added to override the
> defaults.
> 
> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-pmd-load="70"

Sorry for the late review, but this name is bothering me.
Especially the repeated 'pmd'.  And it's also unclear what this parameter
means by only looking at the name.

Maybe it's better to rename it to 'pmd-auto-lb-load-threshold'?

> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement="20"

And following the logic, just to be consistent, this one could be renamed
to 'pmd-auto-lb-improvement-threshold'.

Kevin, Ian, what do you think?

I can replace on commit.

> 
> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
> Co-Authored-by: Kevin Traynor <ktraynor@redhat.com>
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> Acked-by: David Marchand <david.marchand@redhat.com>
> ---
>  NEWS                 |  1 +
>  lib/dpif-netdev.c    | 54 +++++++++++++++++++++++++++++++++++++-------
>  vswitchd/vswitch.xml | 29 ++++++++++++++++++++++--
>  3 files changed, 74 insertions(+), 10 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 7e291a180..7994e434d 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -17,4 +17,5 @@ Post-v2.14.0
>       * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
>         that allows configuring the MAC address of a VF representor.
> +     * Add parameters to configure PMD auto load balance behaviour.
>     - The environment variable OVS_UNBOUND_CONF, if set, is now used
>       as the DNS resolver's (unbound) configuration file.
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index d19d5bbff..bf2112815 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -86,7 +86,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>  
>  /* Auto Load Balancing Defaults */
> -#define ALB_ACCEPTABLE_IMPROVEMENT       25
> -#define ALB_PMD_LOAD_THRESHOLD           95
> -#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
> +#define ALB_IMPROVEMENT_THRESHOLD    25
> +#define ALB_LOAD_THRESHOLD           95
> +#define ALB_REBALANCE_INTERVAL       1 /* 1 Min */
>  #define MIN_TO_MSEC                  60000
>  
> @@ -301,4 +301,6 @@ struct pmd_auto_lb {
>      uint64_t rebalance_intvl;
>      uint64_t rebalance_poll_timer;
> +    uint8_t rebalance_improve_thresh;
> +    atomic_uint8_t rebalance_load_thresh;
>  };
>  
> @@ -4205,4 +4207,5 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>      struct dp_netdev_pmd_thread *pmd;
>      struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
> +    uint8_t rebalance_load_thresh;
>  
>      bool enable_alb = false;
> @@ -4234,7 +4237,14 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>          pmd_alb->is_enabled = enable_alb;
>          if (pmd_alb->is_enabled) {
> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> +                                &rebalance_load_thresh);
>              VLOG_INFO("PMD auto load balance is enabled "
> -                      "interval %"PRIu64" mins",
> -                       pmd_alb->rebalance_intvl / MIN_TO_MSEC);
> +                      "interval %"PRIu64" mins, "
> +                      "pmd load threshold %"PRIu8"%%, "
> +                      "improvement threshold %"PRIu8"%%",
> +                       pmd_alb->rebalance_intvl / MIN_TO_MSEC,
> +                       rebalance_load_thresh,
> +                       pmd_alb->rebalance_improve_thresh);
> +
>          } else {
>              pmd_alb->rebalance_poll_timer = 0;
> @@ -4260,4 +4270,6 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>      uint32_t tx_flush_interval, cur_tx_flush_interval;
>      uint64_t rebalance_intvl;
> +    uint8_t rebalance_load, cur_rebalance_load;
> +    uint8_t rebalance_improve;
>  
>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
> @@ -4337,5 +4349,5 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>  
>      rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebal-interval",
> -                              ALB_PMD_REBALANCE_POLL_INTERVAL);
> +                                   ALB_REBALANCE_INTERVAL);
>  
>      /* Input is in min, convert it to msec. */
> @@ -4349,4 +4361,27 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>      }
>  
> +    rebalance_improve = smap_get_int(other_config, "pmd-auto-lb-improvement",
> +                                     ALB_IMPROVEMENT_THRESHOLD);
> +    if (rebalance_improve > 100) {
> +        rebalance_improve = ALB_IMPROVEMENT_THRESHOLD;
> +    }
> +    if (rebalance_improve != pmd_alb->rebalance_improve_thresh) {
> +        pmd_alb->rebalance_improve_thresh = rebalance_improve;
> +        VLOG_INFO("PMD auto load balance improvement threshold set to "
> +                  "%"PRIu8"%%\n", rebalance_improve);
> +    }
> +
> +    rebalance_load = smap_get_int(other_config, "pmd-auto-lb-pmd-load",
> +                             ALB_LOAD_THRESHOLD);
> +    if (rebalance_load > 100) {
> +        rebalance_load = ALB_LOAD_THRESHOLD;
> +    }
> +    atomic_read_relaxed(&pmd_alb->rebalance_load_thresh, &cur_rebalance_load);
> +    if (rebalance_load != cur_rebalance_load) {
> +        atomic_store_relaxed(&pmd_alb->rebalance_load_thresh,
> +                             rebalance_load);
> +        VLOG_INFO("PMD auto load balance pmd load threshold set to "
> +                "%"PRIu8"%%\n", rebalance_load);
> +    }
>      set_pmd_auto_lb(dp);
>      return 0;
> @@ -5677,5 +5712,5 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
>                  ((curr_variance - new_variance) * 100) / curr_variance;
>          }
> -        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
> +        if (improvement < dp->pmd_alb.rebalance_improve_thresh) {
>              ret = false;
>          }
> @@ -8712,4 +8747,5 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>      if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
>          uint64_t curr_tsc;
> +        uint8_t rebalance_load_trigger;
>          struct pmd_auto_lb *pmd_alb = &pmd->dp->pmd_alb;
>          if (pmd_alb->is_enabled && !pmd->isolated
> @@ -8728,5 +8764,7 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>              }
>  
> -            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> +                                &rebalance_load_trigger);
> +            if (pmd_load >= rebalance_load_trigger) {
>                  atomic_count_inc(&pmd->pmd_overloaded);
>              } else {
> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> index 89a876796..76d528ea7 100644
> --- a/vswitchd/vswitch.xml
> +++ b/vswitchd/vswitch.xml
> @@ -654,6 +654,6 @@
>          <p>
>           Configures PMD Auto Load Balancing that allows automatic assignment of
> -         RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
> -         > 95%).
> +         RX queues to PMDs if any of PMDs is overloaded (i.e. a processing
> +         cycles > <ref column="other_config" key="pmd-auto-lb-pmd-load"/>).
>          </p>
>          <p>
> @@ -691,4 +691,29 @@
>          </p>
>        </column>
> +      <column name="other_config" key="pmd-auto-lb-pmd-load"
> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         Specifies the minimum pmd load threshold (% of used cycled) of
> +         any non-isolated pmds when a PMD auto load balance may be triggered.
> +        </p>
> +        <p>
> +         The default value is <code>95%</code>.
> +        </p>
> +      </column>
> +      <column name="other_config" key="pmd-auto-lb-improvement"
> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> +        <p>
> +         Specifies the minimum evaluated % improvement in load distribution
> +         across the non-isolated pmds that will allow a PMD auto load balance
> +         to occur.
> +        </p>
> +        <p>
> +         Note, setting this parameter to 0 will always allow an auto load
> +         balance to occur regardless of estimated improvement or not.
> +        </p>
> +        <p>
> +         The default value is <code>25%</code>.
> +        </p>
> +      </column>
>        <column name="other_config" key="userspace-tso-enable"
>                type='{"type": "boolean"}'>
>
Kevin Traynor Jan. 15, 2021, 5:35 p.m. UTC | #5
On 15/01/2021 17:22, Ilya Maximets wrote:
> On 12/21/20 2:10 PM, Kevin Traynor wrote:
>> From: Christophe Fontaine <cfontain@redhat.com>
>>
>> Two important parts of how PMD auto load balance operates are how
>> loaded a core needs to be and how much improvement is estimated
>> before a PMD auto load balance can trigger.
>>
>> Previously they were hardcoded to 95% loaded and 25% variance
>> improvement.
>>
>> These default values may not be suitable for all use cases and
>> we may want to use a more (or less) aggressive rebalance, either
>> on the pmd load threshold or on the minimum variance improvement
>> threshold.
>>
>> The defaults are not changed, but "pmd-auto-lb-pmd-load" and
>> "pmd-auto-lb-improvement" parameters are added to override the
>> defaults.
>>
>> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-pmd-load="70"
> 
> Sorry for the late review, but this name is bothering me.
> Especially the repeated 'pmd'.  And it's also unclear what this parameter
> means by only looking at the name.
> 
> Maybe it's better to rename it to 'pmd-auto-lb-load-threshold'?
> 

I think we had an additional "threshold" at some point but it was
getting long. Your suggestion reads well to me.

>> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement="20"
> 
> And following the logic, just to be consistent, this one could be renamed
> to 'pmd-auto-lb-improvement-threshold'.
> 

+1

> Kevin, Ian, what do you think?
> 
> I can replace on commit.
> 

Thanks, or I can re-spin, just let me know.

>>
>> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
>> Co-Authored-by: Kevin Traynor <ktraynor@redhat.com>
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>> Acked-by: David Marchand <david.marchand@redhat.com>
>> ---
>>  NEWS                 |  1 +
>>  lib/dpif-netdev.c    | 54 +++++++++++++++++++++++++++++++++++++-------
>>  vswitchd/vswitch.xml | 29 ++++++++++++++++++++++--
>>  3 files changed, 74 insertions(+), 10 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 7e291a180..7994e434d 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -17,4 +17,5 @@ Post-v2.14.0
>>       * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
>>         that allows configuring the MAC address of a VF representor.
>> +     * Add parameters to configure PMD auto load balance behaviour.
>>     - The environment variable OVS_UNBOUND_CONF, if set, is now used
>>       as the DNS resolver's (unbound) configuration file.
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index d19d5bbff..bf2112815 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -86,7 +86,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
>>  
>>  /* Auto Load Balancing Defaults */
>> -#define ALB_ACCEPTABLE_IMPROVEMENT       25
>> -#define ALB_PMD_LOAD_THRESHOLD           95
>> -#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
>> +#define ALB_IMPROVEMENT_THRESHOLD    25
>> +#define ALB_LOAD_THRESHOLD           95
>> +#define ALB_REBALANCE_INTERVAL       1 /* 1 Min */
>>  #define MIN_TO_MSEC                  60000
>>  
>> @@ -301,4 +301,6 @@ struct pmd_auto_lb {
>>      uint64_t rebalance_intvl;
>>      uint64_t rebalance_poll_timer;
>> +    uint8_t rebalance_improve_thresh;
>> +    atomic_uint8_t rebalance_load_thresh;
>>  };
>>  
>> @@ -4205,4 +4207,5 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>>      struct dp_netdev_pmd_thread *pmd;
>>      struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
>> +    uint8_t rebalance_load_thresh;
>>  
>>      bool enable_alb = false;
>> @@ -4234,7 +4237,14 @@ set_pmd_auto_lb(struct dp_netdev *dp)
>>          pmd_alb->is_enabled = enable_alb;
>>          if (pmd_alb->is_enabled) {
>> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
>> +                                &rebalance_load_thresh);
>>              VLOG_INFO("PMD auto load balance is enabled "
>> -                      "interval %"PRIu64" mins",
>> -                       pmd_alb->rebalance_intvl / MIN_TO_MSEC);
>> +                      "interval %"PRIu64" mins, "
>> +                      "pmd load threshold %"PRIu8"%%, "
>> +                      "improvement threshold %"PRIu8"%%",
>> +                       pmd_alb->rebalance_intvl / MIN_TO_MSEC,
>> +                       rebalance_load_thresh,
>> +                       pmd_alb->rebalance_improve_thresh);
>> +
>>          } else {
>>              pmd_alb->rebalance_poll_timer = 0;
>> @@ -4260,4 +4270,6 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>      uint32_t tx_flush_interval, cur_tx_flush_interval;
>>      uint64_t rebalance_intvl;
>> +    uint8_t rebalance_load, cur_rebalance_load;
>> +    uint8_t rebalance_improve;
>>  
>>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
>> @@ -4337,5 +4349,5 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>  
>>      rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebal-interval",
>> -                              ALB_PMD_REBALANCE_POLL_INTERVAL);
>> +                                   ALB_REBALANCE_INTERVAL);
>>  
>>      /* Input is in min, convert it to msec. */
>> @@ -4349,4 +4361,27 @@ dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
>>      }
>>  
>> +    rebalance_improve = smap_get_int(other_config, "pmd-auto-lb-improvement",
>> +                                     ALB_IMPROVEMENT_THRESHOLD);
>> +    if (rebalance_improve > 100) {
>> +        rebalance_improve = ALB_IMPROVEMENT_THRESHOLD;
>> +    }
>> +    if (rebalance_improve != pmd_alb->rebalance_improve_thresh) {
>> +        pmd_alb->rebalance_improve_thresh = rebalance_improve;
>> +        VLOG_INFO("PMD auto load balance improvement threshold set to "
>> +                  "%"PRIu8"%%\n", rebalance_improve);
>> +    }
>> +
>> +    rebalance_load = smap_get_int(other_config, "pmd-auto-lb-pmd-load",
>> +                             ALB_LOAD_THRESHOLD);
>> +    if (rebalance_load > 100) {
>> +        rebalance_load = ALB_LOAD_THRESHOLD;
>> +    }
>> +    atomic_read_relaxed(&pmd_alb->rebalance_load_thresh, &cur_rebalance_load);
>> +    if (rebalance_load != cur_rebalance_load) {
>> +        atomic_store_relaxed(&pmd_alb->rebalance_load_thresh,
>> +                             rebalance_load);
>> +        VLOG_INFO("PMD auto load balance pmd load threshold set to "
>> +                "%"PRIu8"%%\n", rebalance_load);
>> +    }
>>      set_pmd_auto_lb(dp);
>>      return 0;
>> @@ -5677,5 +5712,5 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
>>                  ((curr_variance - new_variance) * 100) / curr_variance;
>>          }
>> -        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
>> +        if (improvement < dp->pmd_alb.rebalance_improve_thresh) {
>>              ret = false;
>>          }
>> @@ -8712,4 +8747,5 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>>      if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
>>          uint64_t curr_tsc;
>> +        uint8_t rebalance_load_trigger;
>>          struct pmd_auto_lb *pmd_alb = &pmd->dp->pmd_alb;
>>          if (pmd_alb->is_enabled && !pmd->isolated
>> @@ -8728,5 +8764,7 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
>>              }
>>  
>> -            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
>> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
>> +                                &rebalance_load_trigger);
>> +            if (pmd_load >= rebalance_load_trigger) {
>>                  atomic_count_inc(&pmd->pmd_overloaded);
>>              } else {
>> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
>> index 89a876796..76d528ea7 100644
>> --- a/vswitchd/vswitch.xml
>> +++ b/vswitchd/vswitch.xml
>> @@ -654,6 +654,6 @@
>>          <p>
>>           Configures PMD Auto Load Balancing that allows automatic assignment of
>> -         RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
>> -         > 95%).
>> +         RX queues to PMDs if any of PMDs is overloaded (i.e. a processing
>> +         cycles > <ref column="other_config" key="pmd-auto-lb-pmd-load"/>).
>>          </p>
>>          <p>
>> @@ -691,4 +691,29 @@
>>          </p>
>>        </column>
>> +      <column name="other_config" key="pmd-auto-lb-pmd-load"
>> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
>> +        <p>
>> +         Specifies the minimum pmd load threshold (% of used cycled) of
>> +         any non-isolated pmds when a PMD auto load balance may be triggered.
>> +        </p>
>> +        <p>
>> +         The default value is <code>95%</code>.
>> +        </p>
>> +      </column>
>> +      <column name="other_config" key="pmd-auto-lb-improvement"
>> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
>> +        <p>
>> +         Specifies the minimum evaluated % improvement in load distribution
>> +         across the non-isolated pmds that will allow a PMD auto load balance
>> +         to occur.
>> +        </p>
>> +        <p>
>> +         Note, setting this parameter to 0 will always allow an auto load
>> +         balance to occur regardless of estimated improvement or not.
>> +        </p>
>> +        <p>
>> +         The default value is <code>25%</code>.
>> +        </p>
>> +      </column>
>>        <column name="other_config" key="userspace-tso-enable"
>>                type='{"type": "boolean"}'>
>>
>
Stokes, Ian Jan. 15, 2021, 5:56 p.m. UTC | #6
> On 15/01/2021 17:22, Ilya Maximets wrote:
> > On 12/21/20 2:10 PM, Kevin Traynor wrote:
> >> From: Christophe Fontaine <cfontain@redhat.com>
> >>
> >> Two important parts of how PMD auto load balance operates are how
> >> loaded a core needs to be and how much improvement is estimated
> >> before a PMD auto load balance can trigger.
> >>
> >> Previously they were hardcoded to 95% loaded and 25% variance
> >> improvement.
> >>
> >> These default values may not be suitable for all use cases and
> >> we may want to use a more (or less) aggressive rebalance, either
> >> on the pmd load threshold or on the minimum variance improvement
> >> threshold.
> >>
> >> The defaults are not changed, but "pmd-auto-lb-pmd-load" and
> >> "pmd-auto-lb-improvement" parameters are added to override the
> >> defaults.
> >>
> >> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-pmd-load="70"
> >
> > Sorry for the late review, but this name is bothering me.
> > Especially the repeated 'pmd'.  And it's also unclear what this parameter
> > means by only looking at the name.
> >
> > Maybe it's better to rename it to 'pmd-auto-lb-load-threshold'?
> >
> 
> I think we had an additional "threshold" at some point but it was
> getting long. Your suggestion reads well to me.
> 
> >> $ ovs-vsctl set open_vswitch . other_config:pmd-auto-lb-improvement="20"
> >
> > And following the logic, just to be consistent, this one could be renamed
> > to 'pmd-auto-lb-improvement-threshold'.
> >
> 
> +1
> 
++1

> > Kevin, Ian, what do you think?
> >
> > I can replace on commit.
> >

Sounds good to me.

BR
Ian
> 
> Thanks, or I can re-spin, just let me know.
> 
> >>
> >> Signed-off-by: Christophe Fontaine <cfontain@redhat.com>
> >> Co-Authored-by: Kevin Traynor <ktraynor@redhat.com>
> >> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> >> Acked-by: David Marchand <david.marchand@redhat.com>
> >> ---
> >>  NEWS                 |  1 +
> >>  lib/dpif-netdev.c    | 54 +++++++++++++++++++++++++++++++++++++-------
> >>  vswitchd/vswitch.xml | 29 ++++++++++++++++++++++--
> >>  3 files changed, 74 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/NEWS b/NEWS
> >> index 7e291a180..7994e434d 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -17,4 +17,5 @@ Post-v2.14.0
> >>       * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
> >>         that allows configuring the MAC address of a VF representor.
> >> +     * Add parameters to configure PMD auto load balance behaviour.
> >>     - The environment variable OVS_UNBOUND_CONF, if set, is now used
> >>       as the DNS resolver's (unbound) configuration file.
> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> >> index d19d5bbff..bf2112815 100644
> >> --- a/lib/dpif-netdev.c
> >> +++ b/lib/dpif-netdev.c
> >> @@ -86,7 +86,7 @@ VLOG_DEFINE_THIS_MODULE(dpif_netdev);
> >>
> >>  /* Auto Load Balancing Defaults */
> >> -#define ALB_ACCEPTABLE_IMPROVEMENT       25
> >> -#define ALB_PMD_LOAD_THRESHOLD           95
> >> -#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
> >> +#define ALB_IMPROVEMENT_THRESHOLD    25
> >> +#define ALB_LOAD_THRESHOLD           95
> >> +#define ALB_REBALANCE_INTERVAL       1 /* 1 Min */
> >>  #define MIN_TO_MSEC                  60000
> >>
> >> @@ -301,4 +301,6 @@ struct pmd_auto_lb {
> >>      uint64_t rebalance_intvl;
> >>      uint64_t rebalance_poll_timer;
> >> +    uint8_t rebalance_improve_thresh;
> >> +    atomic_uint8_t rebalance_load_thresh;
> >>  };
> >>
> >> @@ -4205,4 +4207,5 @@ set_pmd_auto_lb(struct dp_netdev *dp)
> >>      struct dp_netdev_pmd_thread *pmd;
> >>      struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
> >> +    uint8_t rebalance_load_thresh;
> >>
> >>      bool enable_alb = false;
> >> @@ -4234,7 +4237,14 @@ set_pmd_auto_lb(struct dp_netdev *dp)
> >>          pmd_alb->is_enabled = enable_alb;
> >>          if (pmd_alb->is_enabled) {
> >> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> >> +                                &rebalance_load_thresh);
> >>              VLOG_INFO("PMD auto load balance is enabled "
> >> -                      "interval %"PRIu64" mins",
> >> -                       pmd_alb->rebalance_intvl / MIN_TO_MSEC);
> >> +                      "interval %"PRIu64" mins, "
> >> +                      "pmd load threshold %"PRIu8"%%, "
> >> +                      "improvement threshold %"PRIu8"%%",
> >> +                       pmd_alb->rebalance_intvl / MIN_TO_MSEC,
> >> +                       rebalance_load_thresh,
> >> +                       pmd_alb->rebalance_improve_thresh);
> >> +
> >>          } else {
> >>              pmd_alb->rebalance_poll_timer = 0;
> >> @@ -4260,4 +4270,6 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
> >>      uint32_t tx_flush_interval, cur_tx_flush_interval;
> >>      uint64_t rebalance_intvl;
> >> +    uint8_t rebalance_load, cur_rebalance_load;
> >> +    uint8_t rebalance_improve;
> >>
> >>      tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
> >> @@ -4337,5 +4349,5 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
> >>
> >>      rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebal-
> interval",
> >> -                              ALB_PMD_REBALANCE_POLL_INTERVAL);
> >> +                                   ALB_REBALANCE_INTERVAL);
> >>
> >>      /* Input is in min, convert it to msec. */
> >> @@ -4349,4 +4361,27 @@ dpif_netdev_set_config(struct dpif *dpif, const
> struct smap *other_config)
> >>      }
> >>
> >> +    rebalance_improve = smap_get_int(other_config, "pmd-auto-lb-
> improvement",
> >> +                                     ALB_IMPROVEMENT_THRESHOLD);
> >> +    if (rebalance_improve > 100) {
> >> +        rebalance_improve = ALB_IMPROVEMENT_THRESHOLD;
> >> +    }
> >> +    if (rebalance_improve != pmd_alb->rebalance_improve_thresh) {
> >> +        pmd_alb->rebalance_improve_thresh = rebalance_improve;
> >> +        VLOG_INFO("PMD auto load balance improvement threshold set to "
> >> +                  "%"PRIu8"%%\n", rebalance_improve);
> >> +    }
> >> +
> >> +    rebalance_load = smap_get_int(other_config, "pmd-auto-lb-pmd-load",
> >> +                             ALB_LOAD_THRESHOLD);
> >> +    if (rebalance_load > 100) {
> >> +        rebalance_load = ALB_LOAD_THRESHOLD;
> >> +    }
> >> +    atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> &cur_rebalance_load);
> >> +    if (rebalance_load != cur_rebalance_load) {
> >> +        atomic_store_relaxed(&pmd_alb->rebalance_load_thresh,
> >> +                             rebalance_load);
> >> +        VLOG_INFO("PMD auto load balance pmd load threshold set to "
> >> +                "%"PRIu8"%%\n", rebalance_load);
> >> +    }
> >>      set_pmd_auto_lb(dp);
> >>      return 0;
> >> @@ -5677,5 +5712,5 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
> >>                  ((curr_variance - new_variance) * 100) / curr_variance;
> >>          }
> >> -        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
> >> +        if (improvement < dp->pmd_alb.rebalance_improve_thresh) {
> >>              ret = false;
> >>          }
> >> @@ -8712,4 +8747,5 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
> >>      if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
> >>          uint64_t curr_tsc;
> >> +        uint8_t rebalance_load_trigger;
> >>          struct pmd_auto_lb *pmd_alb = &pmd->dp->pmd_alb;
> >>          if (pmd_alb->is_enabled && !pmd->isolated
> >> @@ -8728,5 +8764,7 @@ dp_netdev_pmd_try_optimize(struct
> dp_netdev_pmd_thread *pmd,
> >>              }
> >>
> >> -            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
> >> +            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
> >> +                                &rebalance_load_trigger);
> >> +            if (pmd_load >= rebalance_load_trigger) {
> >>                  atomic_count_inc(&pmd->pmd_overloaded);
> >>              } else {
> >> diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
> >> index 89a876796..76d528ea7 100644
> >> --- a/vswitchd/vswitch.xml
> >> +++ b/vswitchd/vswitch.xml
> >> @@ -654,6 +654,6 @@
> >>          <p>
> >>           Configures PMD Auto Load Balancing that allows automatic assignment
> of
> >> -         RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
> >> -         > 95%).
> >> +         RX queues to PMDs if any of PMDs is overloaded (i.e. a processing
> >> +         cycles > <ref column="other_config" key="pmd-auto-lb-pmd-load"/>).
> >>          </p>
> >>          <p>
> >> @@ -691,4 +691,29 @@
> >>          </p>
> >>        </column>
> >> +      <column name="other_config" key="pmd-auto-lb-pmd-load"
> >> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> >> +        <p>
> >> +         Specifies the minimum pmd load threshold (% of used cycled) of
> >> +         any non-isolated pmds when a PMD auto load balance may be
> triggered.
> >> +        </p>
> >> +        <p>
> >> +         The default value is <code>95%</code>.
> >> +        </p>
> >> +      </column>
> >> +      <column name="other_config" key="pmd-auto-lb-improvement"
> >> +              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
> >> +        <p>
> >> +         Specifies the minimum evaluated % improvement in load distribution
> >> +         across the non-isolated pmds that will allow a PMD auto load balance
> >> +         to occur.
> >> +        </p>
> >> +        <p>
> >> +         Note, setting this parameter to 0 will always allow an auto load
> >> +         balance to occur regardless of estimated improvement or not.
> >> +        </p>
> >> +        <p>
> >> +         The default value is <code>25%</code>.
> >> +        </p>
> >> +      </column>
> >>        <column name="other_config" key="userspace-tso-enable"
> >>                type='{"type": "boolean"}'>
> >>
> >
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 7e291a180..7994e434d 100644
--- a/NEWS
+++ b/NEWS
@@ -17,4 +17,5 @@  Post-v2.14.0
      * New 'options:dpdk-vf-mac' field for DPDK interface of VF ports,
        that allows configuring the MAC address of a VF representor.
+     * Add parameters to configure PMD auto load balance behaviour.
    - The environment variable OVS_UNBOUND_CONF, if set, is now used
      as the DNS resolver's (unbound) configuration file.
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index d19d5bbff..bf2112815 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -86,7 +86,7 @@  VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
 /* Auto Load Balancing Defaults */
-#define ALB_ACCEPTABLE_IMPROVEMENT       25
-#define ALB_PMD_LOAD_THRESHOLD           95
-#define ALB_PMD_REBALANCE_POLL_INTERVAL  1 /* 1 Min */
+#define ALB_IMPROVEMENT_THRESHOLD    25
+#define ALB_LOAD_THRESHOLD           95
+#define ALB_REBALANCE_INTERVAL       1 /* 1 Min */
 #define MIN_TO_MSEC                  60000
 
@@ -301,4 +301,6 @@  struct pmd_auto_lb {
     uint64_t rebalance_intvl;
     uint64_t rebalance_poll_timer;
+    uint8_t rebalance_improve_thresh;
+    atomic_uint8_t rebalance_load_thresh;
 };
 
@@ -4205,4 +4207,5 @@  set_pmd_auto_lb(struct dp_netdev *dp)
     struct dp_netdev_pmd_thread *pmd;
     struct pmd_auto_lb *pmd_alb = &dp->pmd_alb;
+    uint8_t rebalance_load_thresh;
 
     bool enable_alb = false;
@@ -4234,7 +4237,14 @@  set_pmd_auto_lb(struct dp_netdev *dp)
         pmd_alb->is_enabled = enable_alb;
         if (pmd_alb->is_enabled) {
+            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
+                                &rebalance_load_thresh);
             VLOG_INFO("PMD auto load balance is enabled "
-                      "interval %"PRIu64" mins",
-                       pmd_alb->rebalance_intvl / MIN_TO_MSEC);
+                      "interval %"PRIu64" mins, "
+                      "pmd load threshold %"PRIu8"%%, "
+                      "improvement threshold %"PRIu8"%%",
+                       pmd_alb->rebalance_intvl / MIN_TO_MSEC,
+                       rebalance_load_thresh,
+                       pmd_alb->rebalance_improve_thresh);
+
         } else {
             pmd_alb->rebalance_poll_timer = 0;
@@ -4260,4 +4270,6 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
     uint32_t tx_flush_interval, cur_tx_flush_interval;
     uint64_t rebalance_intvl;
+    uint8_t rebalance_load, cur_rebalance_load;
+    uint8_t rebalance_improve;
 
     tx_flush_interval = smap_get_int(other_config, "tx-flush-interval",
@@ -4337,5 +4349,5 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
 
     rebalance_intvl = smap_get_int(other_config, "pmd-auto-lb-rebal-interval",
-                              ALB_PMD_REBALANCE_POLL_INTERVAL);
+                                   ALB_REBALANCE_INTERVAL);
 
     /* Input is in min, convert it to msec. */
@@ -4349,4 +4361,27 @@  dpif_netdev_set_config(struct dpif *dpif, const struct smap *other_config)
     }
 
+    rebalance_improve = smap_get_int(other_config, "pmd-auto-lb-improvement",
+                                     ALB_IMPROVEMENT_THRESHOLD);
+    if (rebalance_improve > 100) {
+        rebalance_improve = ALB_IMPROVEMENT_THRESHOLD;
+    }
+    if (rebalance_improve != pmd_alb->rebalance_improve_thresh) {
+        pmd_alb->rebalance_improve_thresh = rebalance_improve;
+        VLOG_INFO("PMD auto load balance improvement threshold set to "
+                  "%"PRIu8"%%\n", rebalance_improve);
+    }
+
+    rebalance_load = smap_get_int(other_config, "pmd-auto-lb-pmd-load",
+                             ALB_LOAD_THRESHOLD);
+    if (rebalance_load > 100) {
+        rebalance_load = ALB_LOAD_THRESHOLD;
+    }
+    atomic_read_relaxed(&pmd_alb->rebalance_load_thresh, &cur_rebalance_load);
+    if (rebalance_load != cur_rebalance_load) {
+        atomic_store_relaxed(&pmd_alb->rebalance_load_thresh,
+                             rebalance_load);
+        VLOG_INFO("PMD auto load balance pmd load threshold set to "
+                "%"PRIu8"%%\n", rebalance_load);
+    }
     set_pmd_auto_lb(dp);
     return 0;
@@ -5677,5 +5712,5 @@  pmd_rebalance_dry_run(struct dp_netdev *dp)
                 ((curr_variance - new_variance) * 100) / curr_variance;
         }
-        if (improvement < ALB_ACCEPTABLE_IMPROVEMENT) {
+        if (improvement < dp->pmd_alb.rebalance_improve_thresh) {
             ret = false;
         }
@@ -8712,4 +8747,5 @@  dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
     if (pmd->ctx.now > pmd->rxq_next_cycle_store) {
         uint64_t curr_tsc;
+        uint8_t rebalance_load_trigger;
         struct pmd_auto_lb *pmd_alb = &pmd->dp->pmd_alb;
         if (pmd_alb->is_enabled && !pmd->isolated
@@ -8728,5 +8764,7 @@  dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
             }
 
-            if (pmd_load >= ALB_PMD_LOAD_THRESHOLD) {
+            atomic_read_relaxed(&pmd_alb->rebalance_load_thresh,
+                                &rebalance_load_trigger);
+            if (pmd_load >= rebalance_load_trigger) {
                 atomic_count_inc(&pmd->pmd_overloaded);
             } else {
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 89a876796..76d528ea7 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -654,6 +654,6 @@ 
         <p>
          Configures PMD Auto Load Balancing that allows automatic assignment of
-         RX queues to PMDs if any of PMDs is overloaded (i.e. processing cycles
-         > 95%).
+         RX queues to PMDs if any of PMDs is overloaded (i.e. a processing
+         cycles > <ref column="other_config" key="pmd-auto-lb-pmd-load"/>).
         </p>
         <p>
@@ -691,4 +691,29 @@ 
         </p>
       </column>
+      <column name="other_config" key="pmd-auto-lb-pmd-load"
+              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
+        <p>
+         Specifies the minimum pmd load threshold (% of used cycled) of
+         any non-isolated pmds when a PMD auto load balance may be triggered.
+        </p>
+        <p>
+         The default value is <code>95%</code>.
+        </p>
+      </column>
+      <column name="other_config" key="pmd-auto-lb-improvement"
+              type='{"type": "integer", "minInteger": 0, "maxInteger": 100}'>
+        <p>
+         Specifies the minimum evaluated % improvement in load distribution
+         across the non-isolated pmds that will allow a PMD auto load balance
+         to occur.
+        </p>
+        <p>
+         Note, setting this parameter to 0 will always allow an auto load
+         balance to occur regardless of estimated improvement or not.
+        </p>
+        <p>
+         The default value is <code>25%</code>.
+        </p>
+      </column>
       <column name="other_config" key="userspace-tso-enable"
               type='{"type": "boolean"}'>