diff mbox series

[ovs-dev,v2] netdev-dpdk: Check rx/tx descriptor sizes for device.

Message ID 20230414154453.319376-1-ktraynor@redhat.com
State Superseded
Headers show
Series [ovs-dev,v2] netdev-dpdk: Check rx/tx descriptor sizes for device. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Kevin Traynor April 14, 2023, 3:44 p.m. UTC
By default OVS configures 2048 descriptors for tx and rx queues
on DPDK devices. It also allows the user to configure those values.

If the values used are not acceptable to the device then queue setup
would fail.

The device exposes it's max/min/alignment requirements and OVS
applies some limits also. Use these to ensure an acceptable value
is used for the number of descriptors on a device tx/rx.

If the default or user value is not acceptable, adjust to a suitable
value and log.

Reported-at: https://bugzilla.redhat.com/2119876
Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
---
 lib/netdev-dpdk.c | 76 +++++++++++++++++++++++++++++++++++++----------
 1 file changed, 61 insertions(+), 15 deletions(-)

Comments

Simon Horman April 20, 2023, 2 p.m. UTC | #1
On Fri, Apr 14, 2023 at 04:44:53PM +0100, Kevin Traynor wrote:
> By default OVS configures 2048 descriptors for tx and rx queues
> on DPDK devices. It also allows the user to configure those values.
> 
> If the values used are not acceptable to the device then queue setup
> would fail.
> 
> The device exposes it's max/min/alignment requirements and OVS
> applies some limits also. Use these to ensure an acceptable value
> is used for the number of descriptors on a device tx/rx.
> 
> If the default or user value is not acceptable, adjust to a suitable
> value and log.
> 
> Reported-at: https://bugzilla.redhat.com/2119876
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

Reviewed-by: Simon Horman <simon.horman@corigine.com>
David Marchand April 21, 2023, 10 a.m. UTC | #2
On Fri, Apr 14, 2023 at 5:45 PM Kevin Traynor <ktraynor@redhat.com> wrote:
>
> By default OVS configures 2048 descriptors for tx and rx queues
> on DPDK devices. It also allows the user to configure those values.
>
> If the values used are not acceptable to the device then queue setup
> would fail.
>
> The device exposes it's max/min/alignment requirements and OVS
> applies some limits also. Use these to ensure an acceptable value
> is used for the number of descriptors on a device tx/rx.
>
> If the default or user value is not acceptable, adjust to a suitable
> value and log.
>
> Reported-at: https://bugzilla.redhat.com/2119876
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>

Thanks for the added log.
Overall, this looks good to me.

I have a comment on how the limits are taken into account (see below).
But in any case this patch is ok as is for me:
Reviewed-by: David Marchand <david.marchand@redhat.com>


> ---
>  lib/netdev-dpdk.c | 76 +++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 61 insertions(+), 15 deletions(-)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index fb0dd43f7..cf91990e5 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -510,4 +510,9 @@ struct netdev_dpdk {
>          int requested_txq_size;
>
> +        /* Adjusted sizes are requested sizes after any adjustments
> +         * for OVS or device limits. */
> +        int adjusted_rxq_size;
> +        int adjusted_txq_size;
> +
>          /* Number of rx/tx descriptors for physical devices */
>          int rxq_size;
> @@ -1917,8 +1922,29 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
>  static void
>  dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
> -                        const char *flag, int default_size, int *new_size)
> +                        struct rte_eth_dev_info *info, bool is_rx)
>  {
> -    int queue_size = smap_get_int(args, flag, default_size);
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    struct rte_eth_desc_lim *lim;
> +    int default_size, queue_size;
> +    int *requested_size, *adjusted_size;
>
> +    if (is_rx) {
> +        default_size = NIC_PORT_DEFAULT_RXQ_SIZE;
> +        queue_size = smap_get_int(args, "n_rxq_desc", default_size);
> +        requested_size = &dev->requested_rxq_size;
> +        adjusted_size = &dev->adjusted_rxq_size;
> +        lim = info ? &info->rx_desc_lim : NULL;
> +
> +    } else {
> +        default_size = NIC_PORT_DEFAULT_TXQ_SIZE;
> +        queue_size = smap_get_int(args, "n_txq_desc", default_size);
> +        requested_size = &dev->requested_txq_size;
> +        adjusted_size = &dev->adjusted_txq_size;
> +        lim = info ? &info->tx_desc_lim : NULL;
> +    }
> +
> +    *requested_size = queue_size;
> +
> +    /* Check for OVS limits. */
>      if (queue_size <= 0 || queue_size > NIC_PORT_MAX_Q_SIZE
>              || !is_pow2(queue_size)) {
> @@ -1926,6 +1952,15 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>      }
>
> -    if (queue_size != *new_size) {
> -        *new_size = queue_size;
> +    if (lim) {
> +        /* Check for device limits. */
> +        if (lim->nb_align) {
> +            queue_size = ROUND_UP(queue_size, lim->nb_align);
> +        }
> +        queue_size = MIN(queue_size, lim->nb_max);
> +        queue_size = MAX(queue_size, lim->nb_min);

OVS put some constraints on user input:
- queue_size <= NIC_PORT_MAX_Q_SIZE (== 4096) which is probably linked
to the mempool sizing,
- queue_size is a power of two (here.. the reason eludes me),

On the other hand, with this patch, the OVS constraints are not
applied to what a DPDK driver requests.
So a DPDK driver may force a queue_size > NIC_PORT_MAX_Q_SIZE (to be
fair, I can't find any atm in the DPDK tree).
Configuring such a queue size may later result in some allocation
failures on the rx side if the associated mempool is too small.

I don't think OVS can do anything else but accept this constraint from
the DPDK driver, otherwise DPDK ethdev will refuse to initialize it.
And this patch adds a log on the adjustment, so I think this is good
enough as is.
Kevin Traynor April 21, 2023, 10:39 a.m. UTC | #3
On 21/04/2023 11:00, David Marchand wrote:
> On Fri, Apr 14, 2023 at 5:45 PM Kevin Traynor <ktraynor@redhat.com> wrote:
>>
>> By default OVS configures 2048 descriptors for tx and rx queues
>> on DPDK devices. It also allows the user to configure those values.
>>
>> If the values used are not acceptable to the device then queue setup
>> would fail.
>>
>> The device exposes it's max/min/alignment requirements and OVS
>> applies some limits also. Use these to ensure an acceptable value
>> is used for the number of descriptors on a device tx/rx.
>>
>> If the default or user value is not acceptable, adjust to a suitable
>> value and log.
>>
>> Reported-at: https://bugzilla.redhat.com/2119876
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> 
> Thanks for the added log.
> Overall, this looks good to me.
> 
> I have a comment on how the limits are taken into account (see below).
> But in any case this patch is ok as is for me:
> Reviewed-by: David Marchand <david.marchand@redhat.com>
> 

Thanks for reviewing.

> 
>> ---
>>   lib/netdev-dpdk.c | 76 +++++++++++++++++++++++++++++++++++++----------
>>   1 file changed, 61 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index fb0dd43f7..cf91990e5 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -510,4 +510,9 @@ struct netdev_dpdk {
>>           int requested_txq_size;
>>
>> +        /* Adjusted sizes are requested sizes after any adjustments
>> +         * for OVS or device limits. */
>> +        int adjusted_rxq_size;
>> +        int adjusted_txq_size;
>> +
>>           /* Number of rx/tx descriptors for physical devices */
>>           int rxq_size;
>> @@ -1917,8 +1922,29 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
>>   static void
>>   dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>> -                        const char *flag, int default_size, int *new_size)
>> +                        struct rte_eth_dev_info *info, bool is_rx)
>>   {
>> -    int queue_size = smap_get_int(args, flag, default_size);
>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +    struct rte_eth_desc_lim *lim;
>> +    int default_size, queue_size;
>> +    int *requested_size, *adjusted_size;
>>
>> +    if (is_rx) {
>> +        default_size = NIC_PORT_DEFAULT_RXQ_SIZE;
>> +        queue_size = smap_get_int(args, "n_rxq_desc", default_size);
>> +        requested_size = &dev->requested_rxq_size;
>> +        adjusted_size = &dev->adjusted_rxq_size;
>> +        lim = info ? &info->rx_desc_lim : NULL;
>> +
>> +    } else {
>> +        default_size = NIC_PORT_DEFAULT_TXQ_SIZE;
>> +        queue_size = smap_get_int(args, "n_txq_desc", default_size);
>> +        requested_size = &dev->requested_txq_size;
>> +        adjusted_size = &dev->adjusted_txq_size;
>> +        lim = info ? &info->tx_desc_lim : NULL;
>> +    }
>> +
>> +    *requested_size = queue_size;
>> +
>> +    /* Check for OVS limits. */
>>       if (queue_size <= 0 || queue_size > NIC_PORT_MAX_Q_SIZE
>>               || !is_pow2(queue_size)) {
>> @@ -1926,6 +1952,15 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>>       }
>>
>> -    if (queue_size != *new_size) {
>> -        *new_size = queue_size;
>> +    if (lim) {
>> +        /* Check for device limits. */
>> +        if (lim->nb_align) {
>> +            queue_size = ROUND_UP(queue_size, lim->nb_align);
>> +        }
>> +        queue_size = MIN(queue_size, lim->nb_max);
>> +        queue_size = MAX(queue_size, lim->nb_min);
> 
> OVS put some constraints on user input:
> - queue_size <= NIC_PORT_MAX_Q_SIZE (== 4096) which is probably linked
> to the mempool sizing,
> - queue_size is a power of two (here.. the reason eludes me),
> 

Can only assume it was related to mempools being allocated in power of 2 
sizes, or that it was an implicit requirement for some drivers or just a 
cautious approach on alignment.

> On the other hand, with this patch, the OVS constraints are not
> applied to what a DPDK driver requests.
> So a DPDK driver may force a queue_size > NIC_PORT_MAX_Q_SIZE (to be
> fair, I can't find any atm in the DPDK tree).
> Configuring such a queue size may later result in some allocation
> failures on the rx side if the associated mempool is too small.
> 
> I don't think OVS can do anything else but accept this constraint from
> the DPDK driver, otherwise DPDK ethdev will refuse to initialize it.
> And this patch adds a log on the adjustment, so I think this is good
> enough as is.
> 
> 

It's a good point. Unlikely, but the constraints could conflict. I agree 
with your analysis and conclusion - better to have lights on the NIC :-)
Ilya Maximets April 28, 2023, 6:16 p.m. UTC | #4
On 4/14/23 17:44, Kevin Traynor wrote:
> By default OVS configures 2048 descriptors for tx and rx queues
> on DPDK devices. It also allows the user to configure those values.
> 
> If the values used are not acceptable to the device then queue setup
> would fail.
> 
> The device exposes it's max/min/alignment requirements and OVS
> applies some limits also. Use these to ensure an acceptable value
> is used for the number of descriptors on a device tx/rx.
> 
> If the default or user value is not acceptable, adjust to a suitable
> value and log.
> 
> Reported-at: https://bugzilla.redhat.com/2119876
> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
> ---
>  lib/netdev-dpdk.c | 76 +++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 61 insertions(+), 15 deletions(-)
> 
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index fb0dd43f7..cf91990e5 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -510,4 +510,9 @@ struct netdev_dpdk {
>          int requested_txq_size;
>  
> +        /* Adjusted sizes are requested sizes after any adjustments
> +         * for OVS or device limits. */
> +        int adjusted_rxq_size;
> +        int adjusted_txq_size;
> +
>          /* Number of rx/tx descriptors for physical devices */
>          int rxq_size;
> @@ -1917,8 +1922,29 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
>  static void
>  dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
> -                        const char *flag, int default_size, int *new_size)
> +                        struct rte_eth_dev_info *info, bool is_rx)
>  {
> -    int queue_size = smap_get_int(args, flag, default_size);
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    struct rte_eth_desc_lim *lim;
> +    int default_size, queue_size;
> +    int *requested_size, *adjusted_size;
>  
> +    if (is_rx) {
> +        default_size = NIC_PORT_DEFAULT_RXQ_SIZE;
> +        queue_size = smap_get_int(args, "n_rxq_desc", default_size);
> +        requested_size = &dev->requested_rxq_size;
> +        adjusted_size = &dev->adjusted_rxq_size;
> +        lim = info ? &info->rx_desc_lim : NULL;
> +
> +    } else {
> +        default_size = NIC_PORT_DEFAULT_TXQ_SIZE;
> +        queue_size = smap_get_int(args, "n_txq_desc", default_size);
> +        requested_size = &dev->requested_txq_size;
> +        adjusted_size = &dev->adjusted_txq_size;
> +        lim = info ? &info->tx_desc_lim : NULL;
> +    }
> +
> +    *requested_size = queue_size;
> +
> +    /* Check for OVS limits. */
>      if (queue_size <= 0 || queue_size > NIC_PORT_MAX_Q_SIZE
>              || !is_pow2(queue_size)) {
> @@ -1926,6 +1952,15 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>      }
>  
> -    if (queue_size != *new_size) {
> -        *new_size = queue_size;
> +    if (lim) {
> +        /* Check for device limits. */
> +        if (lim->nb_align) {
> +            queue_size = ROUND_UP(queue_size, lim->nb_align);
> +        }
> +        queue_size = MIN(queue_size, lim->nb_max);
> +        queue_size = MAX(queue_size, lim->nb_min);
> +    }
> +
> +    if (queue_size != *adjusted_size) {
> +        *adjusted_size = queue_size;
>          netdev_request_reconfigure(netdev);
>      }
> @@ -1944,7 +1979,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>          {RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_FULL    }
>      };
> +    struct rte_eth_dev_info info;
>      const char *new_devargs;
>      const char *vf_mac;
>      int err = 0;
> +    int ret;
>  
>      ovs_mutex_lock(&dpdk_mutex);
> @@ -1953,11 +1990,4 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>      dpdk_set_rxq_config(dev, args);
>  
> -    dpdk_process_queue_size(netdev, args, "n_rxq_desc",
> -                            NIC_PORT_DEFAULT_RXQ_SIZE,
> -                            &dev->requested_rxq_size);
> -    dpdk_process_queue_size(netdev, args, "n_txq_desc",
> -                            NIC_PORT_DEFAULT_TXQ_SIZE,
> -                            &dev->requested_txq_size);
> -
>      new_devargs = smap_get(args, "dpdk-devargs");
>  
> @@ -2079,4 +2109,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>      }
>  
> +    ret = rte_eth_dev_info_get(dev->port_id, &info);
> +
> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, true);
> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, false);
> +
>  out:
>      ovs_mutex_unlock(&dev->mutex);
> @@ -5191,6 +5226,6 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>          && dev->mtu == dev->requested_mtu
>          && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
> -        && dev->rxq_size == dev->requested_rxq_size
> -        && dev->txq_size == dev->requested_txq_size
> +        && dev->rxq_size == dev->adjusted_rxq_size
> +        && dev->txq_size == dev->adjusted_txq_size
>          && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>          && dev->socket_id == dev->requested_socket_id
> @@ -5221,6 +5256,17 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>      netdev->n_rxq = dev->requested_n_rxq;
>  
> -    dev->rxq_size = dev->requested_rxq_size;
> -    dev->txq_size = dev->requested_txq_size;
> +    dev->rxq_size = dev->adjusted_rxq_size;
> +    if (dev->rxq_size != dev->requested_rxq_size) {
> +        VLOG_INFO("Interface %s cannot set rx descriptor size to %d. "
> +                  "Adjusted to %d.", dev->up.name, dev->requested_rxq_size,
> +                  dev->rxq_size);
> +    }
> +
> +    dev->txq_size = dev->adjusted_txq_size;
> +    if (dev->txq_size != dev->requested_txq_size) {
> +        VLOG_INFO("Interface %s cannot set tx descriptor size to %d. "
> +                  "Adjusted to %d.", dev->up.name, dev->requested_txq_size,
> +                  dev->txq_size);
> +    }

Hi, Kevin.

I'm not sure if there is a point in having 'requested' variables after this change.

Both 'adjusted' and 'requested' are determined after dpdk_process_queue_size() and
not going to change.  So, the logs above can likely be moved to that function
under if (queue_size != *adjusted_size).

The only side effect is that we will log adjustment even if device configuration
ultimately fails, but it doesn't sound like an issue.

The other place where 'requested' value is used is mempool size calculation.
And we should use the adjusted value there instead, because that is the number
of buffers we will ultimately use.

The last place where it is used is the get_config() callback.  Information there
is not something we should report anyway, so I don't see the reason why we can't
change the 'requested' to 'adjusted' there.  Maybe use a different word though,
so users can better understand what happened.  In general, get_config() should
just return the same 'n_rxq_desc' that was initially provided by the user, but
returning 'n_rxq_desc' equal to the 'adjusted' value is also acceptable, because
by setting that value user should be able to reproduce the configuration, and
that is the actual meaning of the get_config() callback.

All in all, the 'requested' value can probably be just a local variable inside
of the dpdk_process_queue_size().

What do you think?

Best regards, Ilya Maximets.
Kevin Traynor May 3, 2023, 9:49 a.m. UTC | #5
On 28/04/2023 19:16, Ilya Maximets wrote:
> On 4/14/23 17:44, Kevin Traynor wrote:
>> By default OVS configures 2048 descriptors for tx and rx queues
>> on DPDK devices. It also allows the user to configure those values.
>>
>> If the values used are not acceptable to the device then queue setup
>> would fail.
>>
>> The device exposes it's max/min/alignment requirements and OVS
>> applies some limits also. Use these to ensure an acceptable value
>> is used for the number of descriptors on a device tx/rx.
>>
>> If the default or user value is not acceptable, adjust to a suitable
>> value and log.
>>
>> Reported-at: https://bugzilla.redhat.com/2119876
>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>> ---
>>   lib/netdev-dpdk.c | 76 +++++++++++++++++++++++++++++++++++++----------
>>   1 file changed, 61 insertions(+), 15 deletions(-)
>>
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index fb0dd43f7..cf91990e5 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -510,4 +510,9 @@ struct netdev_dpdk {
>>           int requested_txq_size;
>>   
>> +        /* Adjusted sizes are requested sizes after any adjustments
>> +         * for OVS or device limits. */
>> +        int adjusted_rxq_size;
>> +        int adjusted_txq_size;
>> +
>>           /* Number of rx/tx descriptors for physical devices */
>>           int rxq_size;
>> @@ -1917,8 +1922,29 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
>>   static void
>>   dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>> -                        const char *flag, int default_size, int *new_size)
>> +                        struct rte_eth_dev_info *info, bool is_rx)
>>   {
>> -    int queue_size = smap_get_int(args, flag, default_size);
>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +    struct rte_eth_desc_lim *lim;
>> +    int default_size, queue_size;
>> +    int *requested_size, *adjusted_size;
>>   
>> +    if (is_rx) {
>> +        default_size = NIC_PORT_DEFAULT_RXQ_SIZE;
>> +        queue_size = smap_get_int(args, "n_rxq_desc", default_size);
>> +        requested_size = &dev->requested_rxq_size;
>> +        adjusted_size = &dev->adjusted_rxq_size;
>> +        lim = info ? &info->rx_desc_lim : NULL;
>> +
>> +    } else {
>> +        default_size = NIC_PORT_DEFAULT_TXQ_SIZE;
>> +        queue_size = smap_get_int(args, "n_txq_desc", default_size);
>> +        requested_size = &dev->requested_txq_size;
>> +        adjusted_size = &dev->adjusted_txq_size;
>> +        lim = info ? &info->tx_desc_lim : NULL;
>> +    }
>> +
>> +    *requested_size = queue_size;
>> +
>> +    /* Check for OVS limits. */
>>       if (queue_size <= 0 || queue_size > NIC_PORT_MAX_Q_SIZE
>>               || !is_pow2(queue_size)) {
>> @@ -1926,6 +1952,15 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>>       }
>>   
>> -    if (queue_size != *new_size) {
>> -        *new_size = queue_size;
>> +    if (lim) {
>> +        /* Check for device limits. */
>> +        if (lim->nb_align) {
>> +            queue_size = ROUND_UP(queue_size, lim->nb_align);
>> +        }
>> +        queue_size = MIN(queue_size, lim->nb_max);
>> +        queue_size = MAX(queue_size, lim->nb_min);
>> +    }
>> +
>> +    if (queue_size != *adjusted_size) {
>> +        *adjusted_size = queue_size;
>>           netdev_request_reconfigure(netdev);
>>       }
>> @@ -1944,7 +1979,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>           {RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_FULL    }
>>       };
>> +    struct rte_eth_dev_info info;
>>       const char *new_devargs;
>>       const char *vf_mac;
>>       int err = 0;
>> +    int ret;
>>   
>>       ovs_mutex_lock(&dpdk_mutex);
>> @@ -1953,11 +1990,4 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>       dpdk_set_rxq_config(dev, args);
>>   
>> -    dpdk_process_queue_size(netdev, args, "n_rxq_desc",
>> -                            NIC_PORT_DEFAULT_RXQ_SIZE,
>> -                            &dev->requested_rxq_size);
>> -    dpdk_process_queue_size(netdev, args, "n_txq_desc",
>> -                            NIC_PORT_DEFAULT_TXQ_SIZE,
>> -                            &dev->requested_txq_size);
>> -
>>       new_devargs = smap_get(args, "dpdk-devargs");
>>   
>> @@ -2079,4 +2109,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>       }
>>   
>> +    ret = rte_eth_dev_info_get(dev->port_id, &info);
>> +
>> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, true);
>> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, false);
>> +
>>   out:
>>       ovs_mutex_unlock(&dev->mutex);
>> @@ -5191,6 +5226,6 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>           && dev->mtu == dev->requested_mtu
>>           && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>> -        && dev->rxq_size == dev->requested_rxq_size
>> -        && dev->txq_size == dev->requested_txq_size
>> +        && dev->rxq_size == dev->adjusted_rxq_size
>> +        && dev->txq_size == dev->adjusted_txq_size
>>           && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>           && dev->socket_id == dev->requested_socket_id
>> @@ -5221,6 +5256,17 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>       netdev->n_rxq = dev->requested_n_rxq;
>>   
>> -    dev->rxq_size = dev->requested_rxq_size;
>> -    dev->txq_size = dev->requested_txq_size;
>> +    dev->rxq_size = dev->adjusted_rxq_size;
>> +    if (dev->rxq_size != dev->requested_rxq_size) {
>> +        VLOG_INFO("Interface %s cannot set rx descriptor size to %d. "
>> +                  "Adjusted to %d.", dev->up.name, dev->requested_rxq_size,
>> +                  dev->rxq_size);
>> +    }
>> +
>> +    dev->txq_size = dev->adjusted_txq_size;
>> +    if (dev->txq_size != dev->requested_txq_size) {
>> +        VLOG_INFO("Interface %s cannot set tx descriptor size to %d. "
>> +                  "Adjusted to %d.", dev->up.name, dev->requested_txq_size,
>> +                  dev->txq_size);
>> +    }
> 
> Hi, Kevin.
> 

Hi Ilya, thanks for reviewing.

> I'm not sure if there is a point in having 'requested' variables after this change.
> 

I kept requested for the log in get_config only, will explain more below.

> Both 'adjusted' and 'requested' are determined after dpdk_process_queue_size() and
> not going to change.  So, the logs above can likely be moved to that function
> under if (queue_size != *adjusted_size).
> 

yes, that's true. I can move it there.

> The only side effect is that we will log adjustment even if device configuration
> ultimately fails, but it doesn't sound like an issue.
> 

agreed

> The other place where 'requested' value is used is mempool size calculation.
> And we should use the adjusted value there instead, because that is the number
> of buffers we will ultimately use.
> 

ack

> The last place where it is used is the get_config() callback.  Information there
> is not something we should report anyway, so I don't see the reason why we can't
> change the 'requested' to 'adjusted' there.  Maybe use a different word though,
> so users can better understand what happened.  In general, get_config() should
> just return the same 'n_rxq_desc' that was initially provided by the user, but
> returning 'n_rxq_desc' equal to the 'adjusted' value is also acceptable, because
> by setting that value user should be able to reproduce the configuration, and
> that is the actual meaning of the get_config() callback.
> 
> All in all, the 'requested' value can probably be just a local variable inside
> of the dpdk_process_queue_size().
> 
> What do you think?
> 

ok, I think I understand where the difference in approach is coming 
from. I wasn't sure the reason that the previous 'requested' (post OVS 
adjustements) was reported as it would be the same as the configured values.

Seen as it is called requested and the user param is called requested, I 
thought it should show what the user requested value was (or OVS default 
in absence).

Now that the requested and adjusted will be logged if there are any 
changes needed to what the user has requested/default, it won't be 
silently adjusted anymore.

So if 'user requested' doesn't fit the context of get_config, then I can 
remove it from there. Though it can sometimes be handier to pull the 
info with a command than scroll through logs.

Not sure if there's much point to continue to have 'adjusted' in 
get_config, it will be the same as n_rx/tx_desc and it may be hard to 
find a good name. If it's kept I would definitely suggest we don't have 
'requested' in the name anymore as that is easily misinterpreted as 
being the user requested value.

Hope that is readable :-)

thanks,
Kevin.

> Best regards, Ilya Maximets.
>
Ilya Maximets May 3, 2023, 11:37 a.m. UTC | #6
On 5/3/23 11:49, Kevin Traynor wrote:
> On 28/04/2023 19:16, Ilya Maximets wrote:
>> On 4/14/23 17:44, Kevin Traynor wrote:
>>> By default OVS configures 2048 descriptors for tx and rx queues
>>> on DPDK devices. It also allows the user to configure those values.
>>>
>>> If the values used are not acceptable to the device then queue setup
>>> would fail.
>>>
>>> The device exposes it's max/min/alignment requirements and OVS
>>> applies some limits also. Use these to ensure an acceptable value
>>> is used for the number of descriptors on a device tx/rx.
>>>
>>> If the default or user value is not acceptable, adjust to a suitable
>>> value and log.
>>>
>>> Reported-at: https://bugzilla.redhat.com/2119876
>>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>>> ---
>>>   lib/netdev-dpdk.c | 76 +++++++++++++++++++++++++++++++++++++----------
>>>   1 file changed, 61 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index fb0dd43f7..cf91990e5 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -510,4 +510,9 @@ struct netdev_dpdk {
>>>           int requested_txq_size;
>>>   +        /* Adjusted sizes are requested sizes after any adjustments
>>> +         * for OVS or device limits. */
>>> +        int adjusted_rxq_size;
>>> +        int adjusted_txq_size;
>>> +
>>>           /* Number of rx/tx descriptors for physical devices */
>>>           int rxq_size;
>>> @@ -1917,8 +1922,29 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
>>>   static void
>>>   dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>>> -                        const char *flag, int default_size, int *new_size)
>>> +                        struct rte_eth_dev_info *info, bool is_rx)
>>>   {
>>> -    int queue_size = smap_get_int(args, flag, default_size);
>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>> +    struct rte_eth_desc_lim *lim;
>>> +    int default_size, queue_size;
>>> +    int *requested_size, *adjusted_size;
>>>   +    if (is_rx) {
>>> +        default_size = NIC_PORT_DEFAULT_RXQ_SIZE;
>>> +        queue_size = smap_get_int(args, "n_rxq_desc", default_size);
>>> +        requested_size = &dev->requested_rxq_size;
>>> +        adjusted_size = &dev->adjusted_rxq_size;
>>> +        lim = info ? &info->rx_desc_lim : NULL;
>>> +
>>> +    } else {
>>> +        default_size = NIC_PORT_DEFAULT_TXQ_SIZE;
>>> +        queue_size = smap_get_int(args, "n_txq_desc", default_size);
>>> +        requested_size = &dev->requested_txq_size;
>>> +        adjusted_size = &dev->adjusted_txq_size;
>>> +        lim = info ? &info->tx_desc_lim : NULL;
>>> +    }
>>> +
>>> +    *requested_size = queue_size;
>>> +
>>> +    /* Check for OVS limits. */
>>>       if (queue_size <= 0 || queue_size > NIC_PORT_MAX_Q_SIZE
>>>               || !is_pow2(queue_size)) {
>>> @@ -1926,6 +1952,15 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>>>       }
>>>   -    if (queue_size != *new_size) {
>>> -        *new_size = queue_size;
>>> +    if (lim) {
>>> +        /* Check for device limits. */
>>> +        if (lim->nb_align) {
>>> +            queue_size = ROUND_UP(queue_size, lim->nb_align);
>>> +        }
>>> +        queue_size = MIN(queue_size, lim->nb_max);
>>> +        queue_size = MAX(queue_size, lim->nb_min);
>>> +    }
>>> +
>>> +    if (queue_size != *adjusted_size) {
>>> +        *adjusted_size = queue_size;
>>>           netdev_request_reconfigure(netdev);
>>>       }
>>> @@ -1944,7 +1979,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>>           {RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_FULL    }
>>>       };
>>> +    struct rte_eth_dev_info info;
>>>       const char *new_devargs;
>>>       const char *vf_mac;
>>>       int err = 0;
>>> +    int ret;
>>>         ovs_mutex_lock(&dpdk_mutex);
>>> @@ -1953,11 +1990,4 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>>       dpdk_set_rxq_config(dev, args);
>>>   -    dpdk_process_queue_size(netdev, args, "n_rxq_desc",
>>> -                            NIC_PORT_DEFAULT_RXQ_SIZE,
>>> -                            &dev->requested_rxq_size);
>>> -    dpdk_process_queue_size(netdev, args, "n_txq_desc",
>>> -                            NIC_PORT_DEFAULT_TXQ_SIZE,
>>> -                            &dev->requested_txq_size);
>>> -
>>>       new_devargs = smap_get(args, "dpdk-devargs");
>>>   @@ -2079,4 +2109,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>>       }
>>>   +    ret = rte_eth_dev_info_get(dev->port_id, &info);
>>> +
>>> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, true);
>>> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, false);
>>> +
>>>   out:
>>>       ovs_mutex_unlock(&dev->mutex);
>>> @@ -5191,6 +5226,6 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>           && dev->mtu == dev->requested_mtu
>>>           && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>> -        && dev->rxq_size == dev->requested_rxq_size
>>> -        && dev->txq_size == dev->requested_txq_size
>>> +        && dev->rxq_size == dev->adjusted_rxq_size
>>> +        && dev->txq_size == dev->adjusted_txq_size
>>>           && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>>           && dev->socket_id == dev->requested_socket_id
>>> @@ -5221,6 +5256,17 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>       netdev->n_rxq = dev->requested_n_rxq;
>>>   -    dev->rxq_size = dev->requested_rxq_size;
>>> -    dev->txq_size = dev->requested_txq_size;
>>> +    dev->rxq_size = dev->adjusted_rxq_size;
>>> +    if (dev->rxq_size != dev->requested_rxq_size) {
>>> +        VLOG_INFO("Interface %s cannot set rx descriptor size to %d. "
>>> +                  "Adjusted to %d.", dev->up.name, dev->requested_rxq_size,
>>> +                  dev->rxq_size);
>>> +    }
>>> +
>>> +    dev->txq_size = dev->adjusted_txq_size;
>>> +    if (dev->txq_size != dev->requested_txq_size) {
>>> +        VLOG_INFO("Interface %s cannot set tx descriptor size to %d. "
>>> +                  "Adjusted to %d.", dev->up.name, dev->requested_txq_size,
>>> +                  dev->txq_size);
>>> +    }
>>
>> Hi, Kevin.
>>
> 
> Hi Ilya, thanks for reviewing.
> 
>> I'm not sure if there is a point in having 'requested' variables after this change.
>>
> 
> I kept requested for the log in get_config only, will explain more below.
> 
>> Both 'adjusted' and 'requested' are determined after dpdk_process_queue_size() and
>> not going to change.  So, the logs above can likely be moved to that function
>> under if (queue_size != *adjusted_size).
>>
> 
> yes, that's true. I can move it there.
> 
>> The only side effect is that we will log adjustment even if device configuration
>> ultimately fails, but it doesn't sound like an issue.
>>
> 
> agreed
> 
>> The other place where 'requested' value is used is mempool size calculation.
>> And we should use the adjusted value there instead, because that is the number
>> of buffers we will ultimately use.
>>
> 
> ack
> 
>> The last place where it is used is the get_config() callback.  Information there
>> is not something we should report anyway, so I don't see the reason why we can't
>> change the 'requested' to 'adjusted' there.  Maybe use a different word though,
>> so users can better understand what happened.  In general, get_config() should
>> just return the same 'n_rxq_desc' that was initially provided by the user, but
>> returning 'n_rxq_desc' equal to the 'adjusted' value is also acceptable, because
>> by setting that value user should be able to reproduce the configuration, and
>> that is the actual meaning of the get_config() callback.
>>
>> All in all, the 'requested' value can probably be just a local variable inside
>> of the dpdk_process_queue_size().
>>
>> What do you think?
>>
> 
> ok, I think I understand where the difference in approach is coming from. I wasn't sure the reason that the previous 'requested' (post OVS adjustements) was reported as it would be the same as the configured values.
> 
> Seen as it is called requested and the user param is called requested, I thought it should show what the user requested value was (or OVS default in absence).
> 
> Now that the requested and adjusted will be logged if there are any changes needed to what the user has requested/default, it won't be silently adjusted anymore.
> 
> So if 'user requested' doesn't fit the context of get_config, then I can remove it from there. Though it can sometimes be handier to pull the info with a command than scroll through logs.

You should still be able to pull the value.  The value that user have
requested can be obtained from the 'options' column in the database
record.  And if we report the actual value in use as 'n_rxq_desc' in
get_config(), it can be seen with dpif/show command.

> 
> Not sure if there's much point to continue to have 'adjusted' in get_config, it will be the same as n_rx/tx_desc and it may be hard to find a good name. If it's kept I would definitely suggest we don't have 'requested' in the name anymore as that is easily misinterpreted as being the user requested value.

I'd say we should just have 'n_r/txq_desc' in the output that will
represent what is actually configured, without any extra prefixes.

Also, see the implementation of the "set-if" command in dpctl.c,
which is currently broken for netdev-dpdk due to free format result
of the get_config().  TBH, I don't think anyone is using it, but
it does exist and it highlights the intended use of the netdev API.

> 
> Hope that is readable :-)
> 
> thanks,
> Kevin.
> 
>> Best regards, Ilya Maximets.
>>
>
Kevin Traynor May 3, 2023, 12:19 p.m. UTC | #7
On 03/05/2023 12:37, Ilya Maximets wrote:
> On 5/3/23 11:49, Kevin Traynor wrote:
>> On 28/04/2023 19:16, Ilya Maximets wrote:
>>> On 4/14/23 17:44, Kevin Traynor wrote:
>>>> By default OVS configures 2048 descriptors for tx and rx queues
>>>> on DPDK devices. It also allows the user to configure those values.
>>>>
>>>> If the values used are not acceptable to the device then queue setup
>>>> would fail.
>>>>
>>>> The device exposes it's max/min/alignment requirements and OVS
>>>> applies some limits also. Use these to ensure an acceptable value
>>>> is used for the number of descriptors on a device tx/rx.
>>>>
>>>> If the default or user value is not acceptable, adjust to a suitable
>>>> value and log.
>>>>
>>>> Reported-at: https://bugzilla.redhat.com/2119876
>>>> Signed-off-by: Kevin Traynor <ktraynor@redhat.com>
>>>> ---
>>>>    lib/netdev-dpdk.c | 76 +++++++++++++++++++++++++++++++++++++----------
>>>>    1 file changed, 61 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>> index fb0dd43f7..cf91990e5 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -510,4 +510,9 @@ struct netdev_dpdk {
>>>>            int requested_txq_size;
>>>>    +        /* Adjusted sizes are requested sizes after any adjustments
>>>> +         * for OVS or device limits. */
>>>> +        int adjusted_rxq_size;
>>>> +        int adjusted_txq_size;
>>>> +
>>>>            /* Number of rx/tx descriptors for physical devices */
>>>>            int rxq_size;
>>>> @@ -1917,8 +1922,29 @@ dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
>>>>    static void
>>>>    dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>>>> -                        const char *flag, int default_size, int *new_size)
>>>> +                        struct rte_eth_dev_info *info, bool is_rx)
>>>>    {
>>>> -    int queue_size = smap_get_int(args, flag, default_size);
>>>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>>> +    struct rte_eth_desc_lim *lim;
>>>> +    int default_size, queue_size;
>>>> +    int *requested_size, *adjusted_size;
>>>>    +    if (is_rx) {
>>>> +        default_size = NIC_PORT_DEFAULT_RXQ_SIZE;
>>>> +        queue_size = smap_get_int(args, "n_rxq_desc", default_size);
>>>> +        requested_size = &dev->requested_rxq_size;
>>>> +        adjusted_size = &dev->adjusted_rxq_size;
>>>> +        lim = info ? &info->rx_desc_lim : NULL;
>>>> +
>>>> +    } else {
>>>> +        default_size = NIC_PORT_DEFAULT_TXQ_SIZE;
>>>> +        queue_size = smap_get_int(args, "n_txq_desc", default_size);
>>>> +        requested_size = &dev->requested_txq_size;
>>>> +        adjusted_size = &dev->adjusted_txq_size;
>>>> +        lim = info ? &info->tx_desc_lim : NULL;
>>>> +    }
>>>> +
>>>> +    *requested_size = queue_size;
>>>> +
>>>> +    /* Check for OVS limits. */
>>>>        if (queue_size <= 0 || queue_size > NIC_PORT_MAX_Q_SIZE
>>>>                || !is_pow2(queue_size)) {
>>>> @@ -1926,6 +1952,15 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
>>>>        }
>>>>    -    if (queue_size != *new_size) {
>>>> -        *new_size = queue_size;
>>>> +    if (lim) {
>>>> +        /* Check for device limits. */
>>>> +        if (lim->nb_align) {
>>>> +            queue_size = ROUND_UP(queue_size, lim->nb_align);
>>>> +        }
>>>> +        queue_size = MIN(queue_size, lim->nb_max);
>>>> +        queue_size = MAX(queue_size, lim->nb_min);
>>>> +    }
>>>> +
>>>> +    if (queue_size != *adjusted_size) {
>>>> +        *adjusted_size = queue_size;
>>>>            netdev_request_reconfigure(netdev);
>>>>        }
>>>> @@ -1944,7 +1979,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>>>            {RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_FULL    }
>>>>        };
>>>> +    struct rte_eth_dev_info info;
>>>>        const char *new_devargs;
>>>>        const char *vf_mac;
>>>>        int err = 0;
>>>> +    int ret;
>>>>          ovs_mutex_lock(&dpdk_mutex);
>>>> @@ -1953,11 +1990,4 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>>>        dpdk_set_rxq_config(dev, args);
>>>>    -    dpdk_process_queue_size(netdev, args, "n_rxq_desc",
>>>> -                            NIC_PORT_DEFAULT_RXQ_SIZE,
>>>> -                            &dev->requested_rxq_size);
>>>> -    dpdk_process_queue_size(netdev, args, "n_txq_desc",
>>>> -                            NIC_PORT_DEFAULT_TXQ_SIZE,
>>>> -                            &dev->requested_txq_size);
>>>> -
>>>>        new_devargs = smap_get(args, "dpdk-devargs");
>>>>    @@ -2079,4 +2109,9 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
>>>>        }
>>>>    +    ret = rte_eth_dev_info_get(dev->port_id, &info);
>>>> +
>>>> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, true);
>>>> +    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, false);
>>>> +
>>>>    out:
>>>>        ovs_mutex_unlock(&dev->mutex);
>>>> @@ -5191,6 +5226,6 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>>            && dev->mtu == dev->requested_mtu
>>>>            && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>>> -        && dev->rxq_size == dev->requested_rxq_size
>>>> -        && dev->txq_size == dev->requested_txq_size
>>>> +        && dev->rxq_size == dev->adjusted_rxq_size
>>>> +        && dev->txq_size == dev->adjusted_txq_size
>>>>            && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>>>            && dev->socket_id == dev->requested_socket_id
>>>> @@ -5221,6 +5256,17 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
>>>>        netdev->n_rxq = dev->requested_n_rxq;
>>>>    -    dev->rxq_size = dev->requested_rxq_size;
>>>> -    dev->txq_size = dev->requested_txq_size;
>>>> +    dev->rxq_size = dev->adjusted_rxq_size;
>>>> +    if (dev->rxq_size != dev->requested_rxq_size) {
>>>> +        VLOG_INFO("Interface %s cannot set rx descriptor size to %d. "
>>>> +                  "Adjusted to %d.", dev->up.name, dev->requested_rxq_size,
>>>> +                  dev->rxq_size);
>>>> +    }
>>>> +
>>>> +    dev->txq_size = dev->adjusted_txq_size;
>>>> +    if (dev->txq_size != dev->requested_txq_size) {
>>>> +        VLOG_INFO("Interface %s cannot set tx descriptor size to %d. "
>>>> +                  "Adjusted to %d.", dev->up.name, dev->requested_txq_size,
>>>> +                  dev->txq_size);
>>>> +    }
>>>
>>> Hi, Kevin.
>>>
>>
>> Hi Ilya, thanks for reviewing.
>>
>>> I'm not sure if there is a point in having 'requested' variables after this change.
>>>
>>
>> I kept requested for the log in get_config only, will explain more below.
>>
>>> Both 'adjusted' and 'requested' are determined after dpdk_process_queue_size() and
>>> not going to change.  So, the logs above can likely be moved to that function
>>> under if (queue_size != *adjusted_size).
>>>
>>
>> yes, that's true. I can move it there.
>>
>>> The only side effect is that we will log adjustment even if device configuration
>>> ultimately fails, but it doesn't sound like an issue.
>>>
>>
>> agreed
>>
>>> The other place where 'requested' value is used is mempool size calculation.
>>> And we should use the adjusted value there instead, because that is the number
>>> of buffers we will ultimately use.
>>>
>>
>> ack
>>
>>> The last place where it is used is the get_config() callback.  Information there
>>> is not something we should report anyway, so I don't see the reason why we can't
>>> change the 'requested' to 'adjusted' there.  Maybe use a different word though,
>>> so users can better understand what happened.  In general, get_config() should
>>> just return the same 'n_rxq_desc' that was initially provided by the user, but
>>> returning 'n_rxq_desc' equal to the 'adjusted' value is also acceptable, because
>>> by setting that value user should be able to reproduce the configuration, and
>>> that is the actual meaning of the get_config() callback.
>>>
>>> All in all, the 'requested' value can probably be just a local variable inside
>>> of the dpdk_process_queue_size().
>>>
>>> What do you think?
>>>
>>
>> ok, I think I understand where the difference in approach is coming from. I wasn't sure the reason that the previous 'requested' (post OVS adjustements) was reported as it would be the same as the configured values.
>>
>> Seen as it is called requested and the user param is called requested, I thought it should show what the user requested value was (or OVS default in absence).
>>
>> Now that the requested and adjusted will be logged if there are any changes needed to what the user has requested/default, it won't be silently adjusted anymore.
>>
>> So if 'user requested' doesn't fit the context of get_config, then I can remove it from there. Though it can sometimes be handier to pull the info with a command than scroll through logs.
> 
> You should still be able to pull the value.  The value that user have
> requested can be obtained from the 'options' column in the database
> record.  And if we report the actual value in use as 'n_rxq_desc' in
> get_config(), it can be seen with dpif/show command.
> 

Ah yes, you're right - i'd just been thinking of the get_config. Those 
two together with the new logs should be plenty of info for the user.

>>
>> Not sure if there's much point to continue to have 'adjusted' in get_config, it will be the same as n_rx/tx_desc and it may be hard to find a good name. If it's kept I would definitely suggest we don't have 'requested' in the name anymore as that is easily misinterpreted as being the user requested value.
> 
> I'd say we should just have 'n_r/txq_desc' in the output that will
> represent what is actually configured, without any extra prefixes.
> 

Sounds good.

> Also, see the implementation of the "set-if" command in dpctl.c,
> which is currently broken for netdev-dpdk due to free format result
> of the get_config().  TBH, I don't think anyone is using it, but
> it does exist and it highlights the intended use of the netdev API.
> 

Thanks.

>>
>> Hope that is readable :-)
>>
>> thanks,
>> Kevin.
>>
>>> Best regards, Ilya Maximets.
>>>
>>
>
diff mbox series

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index fb0dd43f7..cf91990e5 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -510,4 +510,9 @@  struct netdev_dpdk {
         int requested_txq_size;
 
+        /* Adjusted sizes are requested sizes after any adjustments
+         * for OVS or device limits. */
+        int adjusted_rxq_size;
+        int adjusted_txq_size;
+
         /* Number of rx/tx descriptors for physical devices */
         int rxq_size;
@@ -1917,8 +1922,29 @@  dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args)
 static void
 dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
-                        const char *flag, int default_size, int *new_size)
+                        struct rte_eth_dev_info *info, bool is_rx)
 {
-    int queue_size = smap_get_int(args, flag, default_size);
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    struct rte_eth_desc_lim *lim;
+    int default_size, queue_size;
+    int *requested_size, *adjusted_size;
 
+    if (is_rx) {
+        default_size = NIC_PORT_DEFAULT_RXQ_SIZE;
+        queue_size = smap_get_int(args, "n_rxq_desc", default_size);
+        requested_size = &dev->requested_rxq_size;
+        adjusted_size = &dev->adjusted_rxq_size;
+        lim = info ? &info->rx_desc_lim : NULL;
+
+    } else {
+        default_size = NIC_PORT_DEFAULT_TXQ_SIZE;
+        queue_size = smap_get_int(args, "n_txq_desc", default_size);
+        requested_size = &dev->requested_txq_size;
+        adjusted_size = &dev->adjusted_txq_size;
+        lim = info ? &info->tx_desc_lim : NULL;
+    }
+
+    *requested_size = queue_size;
+
+    /* Check for OVS limits. */
     if (queue_size <= 0 || queue_size > NIC_PORT_MAX_Q_SIZE
             || !is_pow2(queue_size)) {
@@ -1926,6 +1952,15 @@  dpdk_process_queue_size(struct netdev *netdev, const struct smap *args,
     }
 
-    if (queue_size != *new_size) {
-        *new_size = queue_size;
+    if (lim) {
+        /* Check for device limits. */
+        if (lim->nb_align) {
+            queue_size = ROUND_UP(queue_size, lim->nb_align);
+        }
+        queue_size = MIN(queue_size, lim->nb_max);
+        queue_size = MAX(queue_size, lim->nb_min);
+    }
+
+    if (queue_size != *adjusted_size) {
+        *adjusted_size = queue_size;
         netdev_request_reconfigure(netdev);
     }
@@ -1944,7 +1979,9 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
         {RTE_ETH_FC_RX_PAUSE, RTE_ETH_FC_FULL    }
     };
+    struct rte_eth_dev_info info;
     const char *new_devargs;
     const char *vf_mac;
     int err = 0;
+    int ret;
 
     ovs_mutex_lock(&dpdk_mutex);
@@ -1953,11 +1990,4 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
     dpdk_set_rxq_config(dev, args);
 
-    dpdk_process_queue_size(netdev, args, "n_rxq_desc",
-                            NIC_PORT_DEFAULT_RXQ_SIZE,
-                            &dev->requested_rxq_size);
-    dpdk_process_queue_size(netdev, args, "n_txq_desc",
-                            NIC_PORT_DEFAULT_TXQ_SIZE,
-                            &dev->requested_txq_size);
-
     new_devargs = smap_get(args, "dpdk-devargs");
 
@@ -2079,4 +2109,9 @@  netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args,
     }
 
+    ret = rte_eth_dev_info_get(dev->port_id, &info);
+
+    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, true);
+    dpdk_process_queue_size(netdev, args, !ret ? &info : NULL, false);
+
 out:
     ovs_mutex_unlock(&dev->mutex);
@@ -5191,6 +5226,6 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
         && dev->mtu == dev->requested_mtu
         && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
-        && dev->rxq_size == dev->requested_rxq_size
-        && dev->txq_size == dev->requested_txq_size
+        && dev->rxq_size == dev->adjusted_rxq_size
+        && dev->txq_size == dev->adjusted_txq_size
         && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
         && dev->socket_id == dev->requested_socket_id
@@ -5221,6 +5256,17 @@  netdev_dpdk_reconfigure(struct netdev *netdev)
     netdev->n_rxq = dev->requested_n_rxq;
 
-    dev->rxq_size = dev->requested_rxq_size;
-    dev->txq_size = dev->requested_txq_size;
+    dev->rxq_size = dev->adjusted_rxq_size;
+    if (dev->rxq_size != dev->requested_rxq_size) {
+        VLOG_INFO("Interface %s cannot set rx descriptor size to %d. "
+                  "Adjusted to %d.", dev->up.name, dev->requested_rxq_size,
+                  dev->rxq_size);
+    }
+
+    dev->txq_size = dev->adjusted_txq_size;
+    if (dev->txq_size != dev->requested_txq_size) {
+        VLOG_INFO("Interface %s cannot set tx descriptor size to %d. "
+                  "Adjusted to %d.", dev->up.name, dev->requested_txq_size,
+                  dev->txq_size);
+    }
 
     rte_free(dev->tx_q);