diff mbox series

[ovs-dev] netdev-dpdk: config tx queue number to be the minimum between device and ovs request

Message ID 20230124114023.16860-1-cleviatan@marvell.com
State Changes Requested
Headers show
Series [ovs-dev] netdev-dpdk: config tx queue number to be the minimum between device and ovs request | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Chava Leviatan Jan. 24, 2023, 11:40 a.m. UTC
From: Chava Leviatan <cleviatan@marvell.com>

Tx queue are set each time port is added/removed or the cmask changes by
reconfigure_datapath. The amount of TX queues is set according to PMD
thread and does not take into consideration the device capabilities .
As a result , when transmitting packet from OVS to device driver by
rte_eth_tx_burst , the device driver is called with
dev->data->tx_queues[queue_id] which is not a legal address , hence
can cause a crash . This patch suggests to set the tx queue as a minimum
between the device capability and the PMD thread number as requested
by reconfigure_datapath.
The patch was tested on Marvell setup include DPU running OVS-dpdk ,
and Marvell DPDK Octeontx2 driver. DPU is bind to a host through PCI .
Traffic is sent from associated host (VF) and remote host .

Fixes: 050c60bfb5b4 ("netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.")

Reviewed-by: Liron Himi lironh@marvell.com
Signed-off-by: Chava Leviatan <cleviatan@marvell.com>
---
 lib/netdev-dpdk.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

Comments

Ilya Maximets Jan. 24, 2023, 12:24 p.m. UTC | #1
On 1/24/23 12:40, cleviatan@marvell.com wrote:
> From: Chava Leviatan <cleviatan@marvell.com>
> 
> Tx queue are set each time port is added/removed or the cmask changes by
> reconfigure_datapath. The amount of TX queues is set according to PMD
> thread and does not take into consideration the device capabilities .
> As a result , when transmitting packet from OVS to device driver by
> rte_eth_tx_burst , the device driver is called with
> dev->data->tx_queues[queue_id] which is not a legal address , hence
> can cause a crash . This patch suggests to set the tx queue as a minimum
> between the device capability and the PMD thread number as requested
> by reconfigure_datapath.
> The patch was tested on Marvell setup include DPU running OVS-dpdk ,
> and Marvell DPDK Octeontx2 driver. DPU is bind to a host through PCI .
> Traffic is sent from associated host (VF) and remote host .
> 
> Fixes: 050c60bfb5b4 ("netdev-dpdk: Use ->reconfigure() call to change rx/tx queues.")
> 
> Reviewed-by: Liron Himi lironh@marvell.com
> Signed-off-by: Chava Leviatan <cleviatan@marvell.com>
> ---
>  lib/netdev-dpdk.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)

This seems identical to the previous submission.  I'm not sure
why it was sent twice.

Please, reply to comments for the previous submission:
  https://mail.openvswitch.org/pipermail/ovs-dev/2023-January/401418.html

Marking this one as 'Changes Requested' for now.

Best regards, Ilya Maximets.
Chava Leviatan Jan. 30, 2023, 8:31 a.m. UTC | #2
Hello Ilya, 

I apologize for the late reply .

When removed our patch , I was able to re-create the crash .

The scenario that led us for the patch comprises of dynamically setting up and down ports on exist bridge ( many repeats of that action ) 
The device that is added and removed , vf_rep1 , negotiates for tx_ queue num =1 .

What we see is the following : 

There is a call for delete interface. Delete interface calls to reconfigure_datapath which in turn calls to netdev_dpdk_set_tx_multiq.
netdev_dpdk_set_tx_multiq yield a call to netdev_dpdk_reconfigure.
netdev_dpdk_reconfigure   dpdk_eth_dev_init
dpdk_eth_dev_init   dpdk_eth_dev_port_config
dpdk_eth_dev_port_config   rte_eth_dev_configure

rte_eth_dev_configure fails as we can see below ( probably because the interface is down ) and thus there is no call for rte_eth_tx_queue_setup to configure the device with the minimum .
However , this line remains :

netdev->n_rxq = dev->requested_n_rxq; ( at netdev_dpdk_reconfigure) , and from here , there will be no more tries to configure the interface tx queues as in netdev_dpdk_set_tx_multiq:

    if (dev->requested_n_txq == n_txq) {
        goto out;
   }

By the way , when I entered another debug line in reconfigure_datapath :

    HMAP_FOR_EACH (port, node, &dp->ports) {
        if (netdev_is_reconf_required(port->netdev)
            || (port->dynamic_txqs
                != (netdev_n_txq(port->netdev) < wanted_txqs))) {
            port->need_reconfigure = true;
        }
    //    VLOG_INFO("PATCH ==  reconfigure_datapath dev AFTER-2 dynamic %d dev tx_n %d  wanted_txqs %d name %s",
   //                        port->dynamic_txqs,port->netdev->n_txq,wanted_txqs, port->netdev->name);
    }
The crash was not reproduced , so I do believe , there is also a matter of race somehow 

The below is taken from the OVS logs , with some debug print I added , and some that are already exist
( my debug prints marked with PATCH) 

2023-01-29T09:06:32.786Z|00809|bridge|INFO|bridge br0: deleted interface vf_rep1 on port 19
2023-01-29T09:06:32.875Z|00863|netdev_dpdk|INFO|PATCH ==  OLD netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1
2023-01-29T09:06:32.876Z|00869|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1
2023-01-29T09:06:32.876Z|00870|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1
2023-01-29T09:06:32.876Z|00873|netdev_dpdk|WARN|Interface vf_rep1 eth_dev setup error Operation not supported
2023-01-29T09:06:32.876Z|00874|netdev_dpdk|ERR|Interface vf_rep1(rxq:1 txq:1 lsc interrupt mode:false) configure error: Operation not supported
2023-01-29T09:06:32.877Z|00880|dpif_netdev|INFO|Core 21 on numa node 0 assigned port 'vf_rep1' rx queue 0 (measured processing cycles 0).
2023-01-29T09:06:32.877Z|00881|netdev_offload|INFO|vf_rep1: Assigned flow API 'dpdk_flow_api'.
2023-01-29T09:06:32.877Z|00882|bridge|INFO|bridge br0: added interface vf_rep1 on port 22
2023-01-29T09:06:32.894Z|00883|netdev_dpdk|INFO|PATCH ==  OLD netdev_dpdk_set_tx_multiq dev 2  n_txq 2 requested 2 name vf_rep1

Normal operation of that device , before add and remove start : 

2023-01-29T09:06:25.971Z|00277|netdev_dpdk|INFO|PATCH ==  OLD netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1
2023-01-29T09:06:25.972Z|00283|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1
2023-01-29T09:06:25.972Z|00284|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1
2023-01-29T09:06:25.973Z|00287|netdev_dpdk|INFO|PATCH ==  dpdk_eth_dev_port_config name vf_rep1
2023-01-29T09:06:25.973Z|00289|netdev_dpdk|INFO|PATCH ==  dpdk_eth_dev_port_config i 1 n_txq 1 name vf_rep1 
2023-01-29T09:06:25.973Z|00290|netdev_dpdk|INFO|PATCH ==  dpdk_eth_dev_port_config dev 2 n_txq 1 name vf_rep1


Here are the debug lines I added :

At netdev_dpdk_reconfigure
    ovs_mutex_lock(&dev->mutex);
    VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev %d  requested %d name %s",netdev->n_txq,
                               dev->requested_n_txq, netdev->name);

    if (netdev->n_txq == dev->requested_n_txq
        && netdev->n_rxq == dev->requested_n_rxq
        && dev->mtu == dev->requested_mtu
        && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
        && dev->rxq_size == dev->requested_rxq_size
        && dev->txq_size == dev->requested_txq_size
        && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
        && dev->socket_id == dev->requested_socket_id
        && dev->started && !dev->reset_needed) {
        /* Reconfiguration is unnecessary */

        goto out;
    }
    VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev AFTER++ %d  requested %d name %s",netdev->n_txq,
                               dev->requested_n_txq, netdev->name); 

/=================================================


At netdev_dpdk_set_tx_multiq:


    ovs_mutex_lock(&dev->mutex);
    VLOG_INFO("PATCH ==  OLD netdev_dpdk_set_tx_multiq dev %d  n_txq %d requested %d name %s",netdev->n_txq, n_txq,
                               dev->requested_n_txq, netdev->name);
    if (dev->requested_n_txq == n_txq) {
        goto out;
    }





Thank you , 
Chava

-----Original Message-----
From: Ilya Maximets <i.maximets@ovn.org> 
Sent: Tuesday, January 24, 2023 2:25 PM
To: Chava Leviatan <cleviatan@marvell.com>; dev@openvswitch.org
Cc: i.maximets@ovn.org; Liron Himi <lironh@marvell.com>
Subject: [EXT] Re: [ovs-dev] [PATCH] netdev-dpdk: config tx queue number to be the minimum between device and ovs request

External Email

----------------------------------------------------------------------
On 1/24/23 12:40, cleviatan@marvell.com wrote:
> From: Chava Leviatan <cleviatan@marvell.com>
> 
> Tx queue are set each time port is added/removed or the cmask changes 
> by reconfigure_datapath. The amount of TX queues is set according to 
> PMD thread and does not take into consideration the device capabilities .
> As a result , when transmitting packet from OVS to device driver by 
> rte_eth_tx_burst , the device driver is called with
> dev->data->tx_queues[queue_id] which is not a legal address , hence
> can cause a crash . This patch suggests to set the tx queue as a 
> minimum between the device capability and the PMD thread number as 
> requested by reconfigure_datapath.
> The patch was tested on Marvell setup include DPU running OVS-dpdk , 
> and Marvell DPDK Octeontx2 driver. DPU is bind to a host through PCI .
> Traffic is sent from associated host (VF) and remote host .
> 
> Fixes: 050c60bfb5b4 ("netdev-dpdk: Use ->reconfigure() call to change 
> rx/tx queues.")
> 
> Reviewed-by: Liron Himi lironh@marvell.com
> Signed-off-by: Chava Leviatan <cleviatan@marvell.com>
> ---
>  lib/netdev-dpdk.c | 13 ++++++++++---
>  1 file changed, 10 insertions(+), 3 deletions(-)

This seems identical to the previous submission.  I'm not sure why it was sent twice.

Please, reply to comments for the previous submission:
  https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2023-2DJanuary_401418.html&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=hS7DdLpcToMvZe7BcgG7dRfLfRfOgpMqVV9gMRVT1GE&m=QU5t2CXvVESHVvXPHq9mBDYZlIPLbe1P12O1pgv9cer_faNh-WBbp6J6vovEHuRt&s=7MzuLjmM7jElXNUzeYhPlU2_FgdMD-vhlDZr7bN0cRE&e= 

Marking this one as 'Changes Requested' for now.

Best regards, Ilya Maximets.
Ilya Maximets Jan. 30, 2023, 12:19 p.m. UTC | #3
On 1/30/23 09:31, Chava Leviatan wrote:
> Hello Ilya, 
> 
> I apologize for the late reply .
> 
> When removed our patch , I was able to re-create the crash .
> 
> The scenario that led us for the patch comprises of dynamically setting up and down ports on exist bridge ( many repeats of that action ) 
> The device that is added and removed , vf_rep1 , negotiates for tx_ queue num =1 .
> 
> What we see is the following : 
> 
> There is a call for delete interface. Delete interface calls to reconfigure_datapath which in turn calls to netdev_dpdk_set_tx_multiq.
> netdev_dpdk_set_tx_multiq yield a call to netdev_dpdk_reconfigure.
> netdev_dpdk_reconfigure   dpdk_eth_dev_init
> dpdk_eth_dev_init   dpdk_eth_dev_port_config
> dpdk_eth_dev_port_config   rte_eth_dev_configure
> 
> rte_eth_dev_configure fails as we can see below ( probably because the interface is down )

Hmm.  In the log below, the rte_eth_dev_configure() call fails
with 'Operation not supported'.  That seems to be a root cause
of the issue.

OVS doesn't expect ENOSUP/EOPNOTSUPP from this API call, because
that means that device doesn't support any configuration in general.
OVS can not use such a device.  From a DPDK perspective, I think,
every driver should support rte_eth_dev_configure(), hence the
ENOSUP/EOPNOTSUPP error should never be returned.

If the arguments for the rte_eth_dev_configure() are not usable,
EINVAL should be returned.  Interface being down should not stop
the configuration process, because it's normal that device is
stopped before reconfiguration.

So, I'd consider this as a driver bug.  Drivers should not return
ENOSUP/EOPNOTSUPP as a result of rte_eth_dev_configure().


Why does this cause a crash: dpif-netdev treats EOPNOTSUPP as a
sign that reconfigure() callback is not implemented for a netdev().
If it's not implemented, it's considered to be not needed, hence
successful.

We could workaround the probelm on the OVS side, by changing the
result of rte_eth_dev_configure() from ENOSUP/EOPNOTSUPP to
something like EPROTO before returning.  In this case, the device
will be correctly destroyed and removed from the datapath.
We can't simply ignore the error, because device is not really
configured at this point, so can not be used.

Best regards, Ilya Maximets.

> and thus there is no call for rte_eth_tx_queue_setup to configure the device with the minimum .
> However , this line remains :
> 
> netdev->n_rxq = dev->requested_n_rxq; ( at netdev_dpdk_reconfigure) , and from here , there will be no more tries to configure the interface tx queues as in netdev_dpdk_set_tx_multiq:
> 
>     if (dev->requested_n_txq == n_txq) {
>         goto out;
>    }
> 
> By the way , when I entered another debug line in reconfigure_datapath :
> 
>     HMAP_FOR_EACH (port, node, &dp->ports) {
>         if (netdev_is_reconf_required(port->netdev)
>             || (port->dynamic_txqs
>                 != (netdev_n_txq(port->netdev) < wanted_txqs))) {
>             port->need_reconfigure = true;
>         }
>     //    VLOG_INFO("PATCH ==  reconfigure_datapath dev AFTER-2 dynamic %d dev tx_n %d  wanted_txqs %d name %s",
>    //                        port->dynamic_txqs,port->netdev->n_txq,wanted_txqs, port->netdev->name);
>     }
> The crash was not reproduced , so I do believe , there is also a matter of race somehow 
> 
> The below is taken from the OVS logs , with some debug print I added , and some that are already exist
> ( my debug prints marked with PATCH) 
> 
> 2023-01-29T09:06:32.786Z|00809|bridge|INFO|bridge br0: deleted interface vf_rep1 on port 19
> 2023-01-29T09:06:32.875Z|00863|netdev_dpdk|INFO|PATCH ==  OLD netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1
> 2023-01-29T09:06:32.876Z|00869|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1
> 2023-01-29T09:06:32.876Z|00870|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1
> 2023-01-29T09:06:32.876Z|00873|netdev_dpdk|WARN|Interface vf_rep1 eth_dev setup error Operation not supported
> 2023-01-29T09:06:32.876Z|00874|netdev_dpdk|ERR|Interface vf_rep1(rxq:1 txq:1 lsc interrupt mode:false) configure error: Operation not supported
> 2023-01-29T09:06:32.877Z|00880|dpif_netdev|INFO|Core 21 on numa node 0 assigned port 'vf_rep1' rx queue 0 (measured processing cycles 0).
> 2023-01-29T09:06:32.877Z|00881|netdev_offload|INFO|vf_rep1: Assigned flow API 'dpdk_flow_api'.
> 2023-01-29T09:06:32.877Z|00882|bridge|INFO|bridge br0: added interface vf_rep1 on port 22
> 2023-01-29T09:06:32.894Z|00883|netdev_dpdk|INFO|PATCH ==  OLD netdev_dpdk_set_tx_multiq dev 2  n_txq 2 requested 2 name vf_rep1
> 
> Normal operation of that device , before add and remove start : 
> 
> 2023-01-29T09:06:25.971Z|00277|netdev_dpdk|INFO|PATCH ==  OLD netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1
> 2023-01-29T09:06:25.972Z|00283|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1
> 2023-01-29T09:06:25.972Z|00284|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1
> 2023-01-29T09:06:25.973Z|00287|netdev_dpdk|INFO|PATCH ==  dpdk_eth_dev_port_config name vf_rep1
> 2023-01-29T09:06:25.973Z|00289|netdev_dpdk|INFO|PATCH ==  dpdk_eth_dev_port_config i 1 n_txq 1 name vf_rep1 
> 2023-01-29T09:06:25.973Z|00290|netdev_dpdk|INFO|PATCH ==  dpdk_eth_dev_port_config dev 2 n_txq 1 name vf_rep1
> 
> 
> Here are the debug lines I added :
> 
> At netdev_dpdk_reconfigure
>     ovs_mutex_lock(&dev->mutex);
>     VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev %d  requested %d name %s",netdev->n_txq,
>                                dev->requested_n_txq, netdev->name);
> 
>     if (netdev->n_txq == dev->requested_n_txq
>         && netdev->n_rxq == dev->requested_n_rxq
>         && dev->mtu == dev->requested_mtu
>         && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>         && dev->rxq_size == dev->requested_rxq_size
>         && dev->txq_size == dev->requested_txq_size
>         && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>         && dev->socket_id == dev->requested_socket_id
>         && dev->started && !dev->reset_needed) {
>         /* Reconfiguration is unnecessary */
> 
>         goto out;
>     }
>     VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev AFTER++ %d  requested %d name %s",netdev->n_txq,
>                                dev->requested_n_txq, netdev->name); 
> 
> /=================================================
> 
> 
> At netdev_dpdk_set_tx_multiq:
> 
> 
>     ovs_mutex_lock(&dev->mutex);
>     VLOG_INFO("PATCH ==  OLD netdev_dpdk_set_tx_multiq dev %d  n_txq %d requested %d name %s",netdev->n_txq, n_txq,
>                                dev->requested_n_txq, netdev->name);
>     if (dev->requested_n_txq == n_txq) {
>         goto out;
>     }
> 
> 
> 
> 
> 
> Thank you , 
> Chava
> 
> -----Original Message-----
> From: Ilya Maximets <i.maximets@ovn.org> 
> Sent: Tuesday, January 24, 2023 2:25 PM
> To: Chava Leviatan <cleviatan@marvell.com>; dev@openvswitch.org
> Cc: i.maximets@ovn.org; Liron Himi <lironh@marvell.com>
> Subject: [EXT] Re: [ovs-dev] [PATCH] netdev-dpdk: config tx queue number to be the minimum between device and ovs request
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 1/24/23 12:40, cleviatan@marvell.com wrote:
>> From: Chava Leviatan <cleviatan@marvell.com>
>>
>> Tx queue are set each time port is added/removed or the cmask changes 
>> by reconfigure_datapath. The amount of TX queues is set according to 
>> PMD thread and does not take into consideration the device capabilities .
>> As a result , when transmitting packet from OVS to device driver by 
>> rte_eth_tx_burst , the device driver is called with
>> dev->data->tx_queues[queue_id] which is not a legal address , hence
>> can cause a crash . This patch suggests to set the tx queue as a 
>> minimum between the device capability and the PMD thread number as 
>> requested by reconfigure_datapath.
>> The patch was tested on Marvell setup include DPU running OVS-dpdk , 
>> and Marvell DPDK Octeontx2 driver. DPU is bind to a host through PCI .
>> Traffic is sent from associated host (VF) and remote host .
>>
>> Fixes: 050c60bfb5b4 ("netdev-dpdk: Use ->reconfigure() call to change 
>> rx/tx queues.")
>>
>> Reviewed-by: Liron Himi lironh@marvell.com
>> Signed-off-by: Chava Leviatan <cleviatan@marvell.com>
>> ---
>>  lib/netdev-dpdk.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> This seems identical to the previous submission.  I'm not sure why it was sent twice.
> 
> Please, reply to comments for the previous submission:
>   https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.org_pipermail_ovs-2Ddev_2023-2DJanuary_401418.html&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=hS7DdLpcToMvZe7BcgG7dRfLfRfOgpMqVV9gMRVT1GE&m=QU5t2CXvVESHVvXPHq9mBDYZlIPLbe1P12O1pgv9cer_faNh-WBbp6J6vovEHuRt&s=7MzuLjmM7jElXNUzeYhPlU2_FgdMD-vhlDZr7bN0cRE&e= 
> 
> Marking this one as 'Changes Requested' for now.
> 
> Best regards, Ilya Maximets.
Chava Leviatan Jan. 30, 2023, 12:55 p.m. UTC | #4
Inline 

-----Original Message-----
From: Ilya Maximets <i.maximets@ovn.org> 
Sent: Monday, January 30, 2023 2:20 PM
To: Chava Leviatan <cleviatan@marvell.com>; dev@openvswitch.org
Cc: i.maximets@ovn.org; Liron Himi <lironh@marvell.com>; David Marchand <dmarchan@redhat.com>
Subject: Re: [EXT] Re: [ovs-dev] [PATCH] netdev-dpdk: config tx queue number to be the minimum between device and ovs request

On 1/30/23 09:31, Chava Leviatan wrote:
> Hello Ilya,
> 
> I apologize for the late reply .
> 
> When removed our patch , I was able to re-create the crash .
> 
> The scenario that led us for the patch comprises of dynamically 
> setting up and down ports on exist bridge ( many repeats of that action ) The device that is added and removed , vf_rep1 , negotiates for tx_ queue num =1 .
> 
> What we see is the following : 
> 
> There is a call for delete interface. Delete interface calls to reconfigure_datapath which in turn calls to netdev_dpdk_set_tx_multiq.
> netdev_dpdk_set_tx_multiq yield a call to netdev_dpdk_reconfigure.
> netdev_dpdk_reconfigure   dpdk_eth_dev_init dpdk_eth_dev_init   
> dpdk_eth_dev_port_config dpdk_eth_dev_port_config   
> rte_eth_dev_configure
> 
> rte_eth_dev_configure fails as we can see below ( probably because the 
> interface is down )

Hmm.  In the log below, the rte_eth_dev_configure() call fails with 'Operation not supported'.  That seems to be a root cause of the issue.

OVS doesn't expect ENOSUP/EOPNOTSUPP from this API call, because that means that device doesn't support any configuration in general.
OVS can not use such a device.  From a DPDK perspective, I think, every driver should support rte_eth_dev_configure(), hence the ENOSUP/EOPNOTSUPP error should never be returned.
[Chava Leviatan] It does not matter what error is coming back from rte_eth_dev_configure, the flow will be the same :
dpdk_eth_dev_port_config is returned with error to dpdk_eth_dev_init, which is called by netdev_dpdk_reconfigure AFTER it has set the device tx_q num by:

netdev->n_rxq = dev->requested_n_rxq;
And from that moment on , it will NOT try to re-configure again the device with another tx_q number . 
This is what this patch is resolving 

If the arguments for the rte_eth_dev_configure() are not usable, EINVAL should be returned.  Interface being down should not stop the configuration process, because it's normal that device is stopped before reconfiguration.

So, I'd consider this as a driver bug.  Drivers should not return ENOSUP/EOPNOTSUPP as a result of rte_eth_dev_configure().
[Chava Leviatan] The driver return correct values for normal operation . This is an abnormal scenario , where the port was removed , and a new configuration was sent to the device .


Why does this cause a crash: dpif-netdev treats EOPNOTSUPP as a sign that reconfigure() callback is not implemented for a netdev().
If it's not implemented, it's considered to be not needed, hence successful.
[Chava Leviatan] I have to disagree on that . The crash happens because OVS failed to correctly configure the device tx queue num , and it kept using an incorrect value . When the device was up again ( the same port was brought up again ) , the OVS used tx_q #1 , when it should have used 0 . ( That device published that it supports only tx queue num =1 ) .

We could workaround the probelm on the OVS side, by changing the result of rte_eth_dev_configure() from ENOSUP/EOPNOTSUPP to something like EPROTO before returning.  In this case, the device will be correctly destroyed and removed from the datapath.

[Chava Leviatan] Note that in netdev_dpdk_reconfigure, the err that is returned from dpdk_eth_dev_init can be override by 
    if (!dev->tx_q) {
        err = ENOMEM;
    }
By the time you go back , back to port_reconfigure that calls netdev_reconfigure , the err can be deprecated .

We can't simply ignore the error, because device is not really configured at this point, so can not be used.

Best regards, Ilya Maximets.

> and thus there is no call for rte_eth_tx_queue_setup to configure the device with the minimum .
> However , this line remains :
> 
> netdev->n_rxq = dev->requested_n_rxq; ( at netdev_dpdk_reconfigure) , and from here , there will be no more tries to configure the interface tx queues as in netdev_dpdk_set_tx_multiq:
> 
>     if (dev->requested_n_txq == n_txq) {
>         goto out;
>    }
> 
> By the way , when I entered another debug line in reconfigure_datapath :
> 
>     HMAP_FOR_EACH (port, node, &dp->ports) {
>         if (netdev_is_reconf_required(port->netdev)
>             || (port->dynamic_txqs
>                 != (netdev_n_txq(port->netdev) < wanted_txqs))) {
>             port->need_reconfigure = true;
>         }
>     //    VLOG_INFO("PATCH ==  reconfigure_datapath dev AFTER-2 dynamic %d dev tx_n %d  wanted_txqs %d name %s",
>    //                        port->dynamic_txqs,port->netdev->n_txq,wanted_txqs, port->netdev->name);
>     }
> The crash was not reproduced , so I do believe , there is also a 
> matter of race somehow
> 
> The below is taken from the OVS logs , with some debug print I added , 
> and some that are already exist ( my debug prints marked with PATCH)
> 
> 2023-01-29T09:06:32.786Z|00809|bridge|INFO|bridge br0: deleted 
> interface vf_rep1 on port 19 
> 2023-01-29T09:06:32.875Z|00863|netdev_dpdk|INFO|PATCH ==  OLD 
> netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1 
> 2023-01-29T09:06:32.876Z|00869|netdev_dpdk|INFO|PATCH ==  
> netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1 
> 2023-01-29T09:06:32.876Z|00870|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1 2023-01-29T09:06:32.876Z|00873|netdev_dpdk|WARN|Interface vf_rep1 eth_dev setup error Operation not supported 2023-01-29T09:06:32.876Z|00874|netdev_dpdk|ERR|Interface vf_rep1(rxq:1 txq:1 lsc interrupt mode:false) configure error: Operation not supported 2023-01-29T09:06:32.877Z|00880|dpif_netdev|INFO|Core 21 on numa node 0 assigned port 'vf_rep1' rx queue 0 (measured processing cycles 0).
> 2023-01-29T09:06:32.877Z|00881|netdev_offload|INFO|vf_rep1: Assigned flow API 'dpdk_flow_api'.
> 2023-01-29T09:06:32.877Z|00882|bridge|INFO|bridge br0: added interface 
> vf_rep1 on port 22 
> 2023-01-29T09:06:32.894Z|00883|netdev_dpdk|INFO|PATCH ==  OLD 
> netdev_dpdk_set_tx_multiq dev 2  n_txq 2 requested 2 name vf_rep1
> 
> Normal operation of that device , before add and remove start : 
> 
> 2023-01-29T09:06:25.971Z|00277|netdev_dpdk|INFO|PATCH ==  OLD 
> netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1 
> 2023-01-29T09:06:25.972Z|00283|netdev_dpdk|INFO|PATCH ==  
> netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1 
> 2023-01-29T09:06:25.972Z|00284|netdev_dpdk|INFO|PATCH ==  
> netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1 
> 2023-01-29T09:06:25.973Z|00287|netdev_dpdk|INFO|PATCH ==  
> dpdk_eth_dev_port_config name vf_rep1 
> 2023-01-29T09:06:25.973Z|00289|netdev_dpdk|INFO|PATCH ==  
> dpdk_eth_dev_port_config i 1 n_txq 1 name vf_rep1 
> 2023-01-29T09:06:25.973Z|00290|netdev_dpdk|INFO|PATCH ==  
> dpdk_eth_dev_port_config dev 2 n_txq 1 name vf_rep1
> 
> 
> Here are the debug lines I added :
> 
> At netdev_dpdk_reconfigure
>     ovs_mutex_lock(&dev->mutex);
>     VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev %d  requested %d name %s",netdev->n_txq,
>                                dev->requested_n_txq, netdev->name);
> 
>     if (netdev->n_txq == dev->requested_n_txq
>         && netdev->n_rxq == dev->requested_n_rxq
>         && dev->mtu == dev->requested_mtu
>         && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>         && dev->rxq_size == dev->requested_rxq_size
>         && dev->txq_size == dev->requested_txq_size
>         && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>         && dev->socket_id == dev->requested_socket_id
>         && dev->started && !dev->reset_needed) {
>         /* Reconfiguration is unnecessary */
> 
>         goto out;
>     }
>     VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev AFTER++ %d  requested %d name %s",netdev->n_txq,
>                                dev->requested_n_txq, netdev->name);
> 
> /=================================================
> 
> 
> At netdev_dpdk_set_tx_multiq:
> 
> 
>     ovs_mutex_lock(&dev->mutex);
>     VLOG_INFO("PATCH ==  OLD netdev_dpdk_set_tx_multiq dev %d  n_txq %d requested %d name %s",netdev->n_txq, n_txq,
>                                dev->requested_n_txq, netdev->name);
>     if (dev->requested_n_txq == n_txq) {
>         goto out;
>     }
> 
> 
> 
> 
> 
> Thank you ,
> Chava
> 
> -----Original Message-----
> From: Ilya Maximets <i.maximets@ovn.org>
> Sent: Tuesday, January 24, 2023 2:25 PM
> To: Chava Leviatan <cleviatan@marvell.com>; dev@openvswitch.org
> Cc: i.maximets@ovn.org; Liron Himi <lironh@marvell.com>
> Subject: [EXT] Re: [ovs-dev] [PATCH] netdev-dpdk: config tx queue 
> number to be the minimum between device and ovs request
> 
> External Email
> 
> ----------------------------------------------------------------------
> On 1/24/23 12:40, cleviatan@marvell.com wrote:
>> From: Chava Leviatan <cleviatan@marvell.com>
>>
>> Tx queue are set each time port is added/removed or the cmask changes 
>> by reconfigure_datapath. The amount of TX queues is set according to 
>> PMD thread and does not take into consideration the device capabilities .
>> As a result , when transmitting packet from OVS to device driver by 
>> rte_eth_tx_burst , the device driver is called with
>> dev->data->tx_queues[queue_id] which is not a legal address , hence
>> can cause a crash . This patch suggests to set the tx queue as a 
>> minimum between the device capability and the PMD thread number as 
>> requested by reconfigure_datapath.
>> The patch was tested on Marvell setup include DPU running OVS-dpdk , 
>> and Marvell DPDK Octeontx2 driver. DPU is bind to a host through PCI .
>> Traffic is sent from associated host (VF) and remote host .
>>
>> Fixes: 050c60bfb5b4 ("netdev-dpdk: Use ->reconfigure() call to change 
>> rx/tx queues.")
>>
>> Reviewed-by: Liron Himi lironh@marvell.com
>> Signed-off-by: Chava Leviatan <cleviatan@marvell.com>
>> ---
>>  lib/netdev-dpdk.c | 13 ++++++++++---
>>  1 file changed, 10 insertions(+), 3 deletions(-)
> 
> This seems identical to the previous submission.  I'm not sure why it was sent twice.
> 
> Please, reply to comments for the previous submission:
>   
> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.
> org_pipermail_ovs-2Ddev_2023-2DJanuary_401418.html&d=DwICaQ&c=nKjWec2b
> 6R0mOyPaz7xtfQ&r=hS7DdLpcToMvZe7BcgG7dRfLfRfOgpMqVV9gMRVT1GE&m=QU5t2CX
> vVESHVvXPHq9mBDYZlIPLbe1P12O1pgv9cer_faNh-WBbp6J6vovEHuRt&s=7MzuLjmM7j
> ElXNUzeYhPlU2_FgdMD-vhlDZr7bN0cRE&e=
> 
> Marking this one as 'Changes Requested' for now.
> 
> Best regards, Ilya Maximets.
Ilya Maximets Jan. 30, 2023, 1:02 p.m. UTC | #5
On 1/30/23 13:55, Chava Leviatan wrote:
> Inline 
> 
> -----Original Message-----
> From: Ilya Maximets <i.maximets@ovn.org> 
> Sent: Monday, January 30, 2023 2:20 PM
> To: Chava Leviatan <cleviatan@marvell.com>; dev@openvswitch.org
> Cc: i.maximets@ovn.org; Liron Himi <lironh@marvell.com>; David Marchand <dmarchan@redhat.com>
> Subject: Re: [EXT] Re: [ovs-dev] [PATCH] netdev-dpdk: config tx queue number to be the minimum between device and ovs request
> 
> On 1/30/23 09:31, Chava Leviatan wrote:
>> Hello Ilya,
>>
>> I apologize for the late reply .
>>
>> When removed our patch , I was able to re-create the crash .
>>
>> The scenario that led us for the patch comprises of dynamically 
>> setting up and down ports on exist bridge ( many repeats of that action ) The device that is added and removed , vf_rep1 , negotiates for tx_ queue num =1 .
>>
>> What we see is the following : 
>>
>> There is a call for delete interface. Delete interface calls to reconfigure_datapath which in turn calls to netdev_dpdk_set_tx_multiq.
>> netdev_dpdk_set_tx_multiq yield a call to netdev_dpdk_reconfigure.
>> netdev_dpdk_reconfigure   dpdk_eth_dev_init dpdk_eth_dev_init   
>> dpdk_eth_dev_port_config dpdk_eth_dev_port_config   
>> rte_eth_dev_configure
>>
>> rte_eth_dev_configure fails as we can see below ( probably because the 
>> interface is down )
> 
> Hmm.  In the log below, the rte_eth_dev_configure() call fails with 'Operation not supported'.  That seems to be a root cause of the issue.
> 
> OVS doesn't expect ENOSUP/EOPNOTSUPP from this API call, because that means that device doesn't support any configuration in general.
> OVS can not use such a device.  From a DPDK perspective, I think, every driver should support rte_eth_dev_configure(), hence the ENOSUP/EOPNOTSUPP error should never be returned.
> [Chava Leviatan] It does not matter what error is coming back from rte_eth_dev_configure, the flow will be the same :
> dpdk_eth_dev_port_config is returned with error to dpdk_eth_dev_init, which is called by netdev_dpdk_reconfigure AFTER it has set the device tx_q num by:
> 
> netdev->n_rxq = dev->requested_n_rxq;
> And from that moment on , it will NOT try to re-configure again the device with another tx_q number . 
> This is what this patch is resolving 
> 
> If the arguments for the rte_eth_dev_configure() are not usable, EINVAL should be returned.  Interface being down should not stop the configuration process, because it's normal that device is stopped before reconfiguration.
> 
> So, I'd consider this as a driver bug.  Drivers should not return ENOSUP/EOPNOTSUPP as a result of rte_eth_dev_configure().
> [Chava Leviatan] The driver return correct values for normal operation . This is an abnormal scenario , where the port was removed , and a new configuration was sent to the device .
> 
> 
> Why does this cause a crash: dpif-netdev treats EOPNOTSUPP as a sign that reconfigure() callback is not implemented for a netdev().
> If it's not implemented, it's considered to be not needed, hence successful.
> [Chava Leviatan] I have to disagree on that . The crash happens because OVS failed to correctly configure the device tx queue num , and it kept using an incorrect value . When the device was up again ( the same port was brought up again ) , the OVS used tx_q #1 , when it should have used 0 . ( That device published that it supports only tx queue num =1 ) .
> 
> We could workaround the probelm on the OVS side, by changing the result of rte_eth_dev_configure() from ENOSUP/EOPNOTSUPP to something like EPROTO before returning.  In this case, the device will be correctly destroyed and removed from the datapath.
> 
> [Chava Leviatan] Note that in netdev_dpdk_reconfigure, the err that is returned from dpdk_eth_dev_init can be override by 
>     if (!dev->tx_q) {
>         err = ENOMEM;
>     }
> By the time you go back , back to port_reconfigure that calls netdev_reconfigure , the err can be deprecated .

That's OK, because device will be destroed and removed from the
datapath in any case.  And if we can't even allocate memory for
tx queue structures, we're in much more trouble than one faulty
device driver.

> 
> We can't simply ignore the error, because device is not really configured at this point, so can not be used.
> 
> Best regards, Ilya Maximets.
> 
>> and thus there is no call for rte_eth_tx_queue_setup to configure the device with the minimum .
>> However , this line remains :
>>
>> netdev->n_rxq = dev->requested_n_rxq; ( at netdev_dpdk_reconfigure) , and from here , there will be no more tries to configure the interface tx queues as in netdev_dpdk_set_tx_multiq:
>>
>>     if (dev->requested_n_txq == n_txq) {
>>         goto out;
>>    }
>>
>> By the way , when I entered another debug line in reconfigure_datapath :
>>
>>     HMAP_FOR_EACH (port, node, &dp->ports) {
>>         if (netdev_is_reconf_required(port->netdev)
>>             || (port->dynamic_txqs
>>                 != (netdev_n_txq(port->netdev) < wanted_txqs))) {
>>             port->need_reconfigure = true;
>>         }
>>     //    VLOG_INFO("PATCH ==  reconfigure_datapath dev AFTER-2 dynamic %d dev tx_n %d  wanted_txqs %d name %s",
>>    //                        port->dynamic_txqs,port->netdev->n_txq,wanted_txqs, port->netdev->name);
>>     }
>> The crash was not reproduced , so I do believe , there is also a 
>> matter of race somehow
>>
>> The below is taken from the OVS logs , with some debug print I added , 
>> and some that are already exist ( my debug prints marked with PATCH)
>>
>> 2023-01-29T09:06:32.786Z|00809|bridge|INFO|bridge br0: deleted 
>> interface vf_rep1 on port 19 
>> 2023-01-29T09:06:32.875Z|00863|netdev_dpdk|INFO|PATCH ==  OLD 
>> netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1 
>> 2023-01-29T09:06:32.876Z|00869|netdev_dpdk|INFO|PATCH ==  
>> netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1 
>> 2023-01-29T09:06:32.876Z|00870|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1 2023-01-29T09:06:32.876Z|00873|netdev_dpdk|WARN|Interface vf_rep1 eth_dev setup error Operation not supported 2023-01-29T09:06:32.876Z|00874|netdev_dpdk|ERR|Interface vf_rep1(rxq:1 txq:1 lsc interrupt mode:false) configure error: Operation not supported 2023-01-29T09:06:32.877Z|00880|dpif_netdev|INFO|Core 21 on numa node 0 assigned port 'vf_rep1' rx queue 0 (measured processing cycles 0).
>> 2023-01-29T09:06:32.877Z|00881|netdev_offload|INFO|vf_rep1: Assigned flow API 'dpdk_flow_api'.
>> 2023-01-29T09:06:32.877Z|00882|bridge|INFO|bridge br0: added interface 
>> vf_rep1 on port 22 
>> 2023-01-29T09:06:32.894Z|00883|netdev_dpdk|INFO|PATCH ==  OLD 
>> netdev_dpdk_set_tx_multiq dev 2  n_txq 2 requested 2 name vf_rep1
>>
>> Normal operation of that device , before add and remove start : 
>>
>> 2023-01-29T09:06:25.971Z|00277|netdev_dpdk|INFO|PATCH ==  OLD 
>> netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1 
>> 2023-01-29T09:06:25.972Z|00283|netdev_dpdk|INFO|PATCH ==  
>> netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1 
>> 2023-01-29T09:06:25.972Z|00284|netdev_dpdk|INFO|PATCH ==  
>> netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1 
>> 2023-01-29T09:06:25.973Z|00287|netdev_dpdk|INFO|PATCH ==  
>> dpdk_eth_dev_port_config name vf_rep1 
>> 2023-01-29T09:06:25.973Z|00289|netdev_dpdk|INFO|PATCH ==  
>> dpdk_eth_dev_port_config i 1 n_txq 1 name vf_rep1 
>> 2023-01-29T09:06:25.973Z|00290|netdev_dpdk|INFO|PATCH ==  
>> dpdk_eth_dev_port_config dev 2 n_txq 1 name vf_rep1
>>
>>
>> Here are the debug lines I added :
>>
>> At netdev_dpdk_reconfigure
>>     ovs_mutex_lock(&dev->mutex);
>>     VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev %d  requested %d name %s",netdev->n_txq,
>>                                dev->requested_n_txq, netdev->name);
>>
>>     if (netdev->n_txq == dev->requested_n_txq
>>         && netdev->n_rxq == dev->requested_n_rxq
>>         && dev->mtu == dev->requested_mtu
>>         && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>         && dev->rxq_size == dev->requested_rxq_size
>>         && dev->txq_size == dev->requested_txq_size
>>         && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>         && dev->socket_id == dev->requested_socket_id
>>         && dev->started && !dev->reset_needed) {
>>         /* Reconfiguration is unnecessary */
>>
>>         goto out;
>>     }
>>     VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev AFTER++ %d  requested %d name %s",netdev->n_txq,
>>                                dev->requested_n_txq, netdev->name);
>>
>> /=================================================
>>
>>
>> At netdev_dpdk_set_tx_multiq:
>>
>>
>>     ovs_mutex_lock(&dev->mutex);
>>     VLOG_INFO("PATCH ==  OLD netdev_dpdk_set_tx_multiq dev %d  n_txq %d requested %d name %s",netdev->n_txq, n_txq,
>>                                dev->requested_n_txq, netdev->name);
>>     if (dev->requested_n_txq == n_txq) {
>>         goto out;
>>     }
>>
>>
>>
>>
>>
>> Thank you ,
>> Chava
>>
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@ovn.org>
>> Sent: Tuesday, January 24, 2023 2:25 PM
>> To: Chava Leviatan <cleviatan@marvell.com>; dev@openvswitch.org
>> Cc: i.maximets@ovn.org; Liron Himi <lironh@marvell.com>
>> Subject: [EXT] Re: [ovs-dev] [PATCH] netdev-dpdk: config tx queue 
>> number to be the minimum between device and ovs request
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> On 1/24/23 12:40, cleviatan@marvell.com wrote:
>>> From: Chava Leviatan <cleviatan@marvell.com>
>>>
>>> Tx queue are set each time port is added/removed or the cmask changes 
>>> by reconfigure_datapath. The amount of TX queues is set according to 
>>> PMD thread and does not take into consideration the device capabilities .
>>> As a result , when transmitting packet from OVS to device driver by 
>>> rte_eth_tx_burst , the device driver is called with
>>> dev->data->tx_queues[queue_id] which is not a legal address , hence
>>> can cause a crash . This patch suggests to set the tx queue as a 
>>> minimum between the device capability and the PMD thread number as 
>>> requested by reconfigure_datapath.
>>> The patch was tested on Marvell setup include DPU running OVS-dpdk , 
>>> and Marvell DPDK Octeontx2 driver. DPU is bind to a host through PCI .
>>> Traffic is sent from associated host (VF) and remote host .
>>>
>>> Fixes: 050c60bfb5b4 ("netdev-dpdk: Use ->reconfigure() call to change 
>>> rx/tx queues.")
>>>
>>> Reviewed-by: Liron Himi lironh@marvell.com
>>> Signed-off-by: Chava Leviatan <cleviatan@marvell.com>
>>> ---
>>>  lib/netdev-dpdk.c | 13 ++++++++++---
>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> This seems identical to the previous submission.  I'm not sure why it was sent twice.
>>
>> Please, reply to comments for the previous submission:
>>   
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.
>> org_pipermail_ovs-2Ddev_2023-2DJanuary_401418.html&d=DwICaQ&c=nKjWec2b
>> 6R0mOyPaz7xtfQ&r=hS7DdLpcToMvZe7BcgG7dRfLfRfOgpMqVV9gMRVT1GE&m=QU5t2CX
>> vVESHVvXPHq9mBDYZlIPLbe1P12O1pgv9cer_faNh-WBbp6J6vovEHuRt&s=7MzuLjmM7j
>> ElXNUzeYhPlU2_FgdMD-vhlDZr7bN0cRE&e=
>>
>> Marking this one as 'Changes Requested' for now.
>>
>> Best regards, Ilya Maximets.
>
Ilya Maximets Jan. 30, 2023, 1:10 p.m. UTC | #6
On 1/30/23 13:55, Chava Leviatan wrote:
> Inline 
> 
> -----Original Message-----
> From: Ilya Maximets <i.maximets@ovn.org> 
> Sent: Monday, January 30, 2023 2:20 PM
> To: Chava Leviatan <cleviatan@marvell.com>; dev@openvswitch.org
> Cc: i.maximets@ovn.org; Liron Himi <lironh@marvell.com>; David Marchand <dmarchan@redhat.com>
> Subject: Re: [EXT] Re: [ovs-dev] [PATCH] netdev-dpdk: config tx queue number to be the minimum between device and ovs request
> 
> On 1/30/23 09:31, Chava Leviatan wrote:
>> Hello Ilya,
>>
>> I apologize for the late reply .
>>
>> When removed our patch , I was able to re-create the crash .
>>
>> The scenario that led us for the patch comprises of dynamically 
>> setting up and down ports on exist bridge ( many repeats of that action ) The device that is added and removed , vf_rep1 , negotiates for tx_ queue num =1 .
>>
>> What we see is the following : 
>>
>> There is a call for delete interface. Delete interface calls to reconfigure_datapath which in turn calls to netdev_dpdk_set_tx_multiq.
>> netdev_dpdk_set_tx_multiq yield a call to netdev_dpdk_reconfigure.
>> netdev_dpdk_reconfigure   dpdk_eth_dev_init dpdk_eth_dev_init   
>> dpdk_eth_dev_port_config dpdk_eth_dev_port_config   
>> rte_eth_dev_configure
>>
>> rte_eth_dev_configure fails as we can see below ( probably because the 
>> interface is down )
> 
> Hmm.  In the log below, the rte_eth_dev_configure() call fails with 'Operation not supported'.  That seems to be a root cause of the issue.
> 
> OVS doesn't expect ENOSUP/EOPNOTSUPP from this API call, because that means that device doesn't support any configuration in general.
> OVS can not use such a device.  From a DPDK perspective, I think, every driver should support rte_eth_dev_configure(), hence the ENOSUP/EOPNOTSUPP error should never be returned.
> [Chava Leviatan] It does not matter what error is coming back from rte_eth_dev_configure, the flow will be the same :
> dpdk_eth_dev_port_config is returned with error to dpdk_eth_dev_init, which is called by netdev_dpdk_reconfigure AFTER it has set the device tx_q num by:

If rte_eth_dev_configure() fails, dpdk_eth_dev_port_config() will fail,
after that dpdk_eth_dev_init() will fail and netdev_dpdk_reconfigure()
will fail and the netdev_reconfigure() will fail as a result.

port_reconfigure() doesn't fail right now because it ignores EOPNOTSUPP
that should not be returned if the call back is implemented.

If any other error will be returned, port_reconfigure() will fail and
the port will be destroyed and not used, hence there will be no crash.

> 
> netdev->n_rxq = dev->requested_n_rxq;
> And from that moment on , it will NOT try to re-configure again the device with another tx_q number . 
> This is what this patch is resolving 
> 
> If the arguments for the rte_eth_dev_configure() are not usable, EINVAL should be returned.  Interface being down should not stop the configuration process, because it's normal that device is stopped before reconfiguration.
> 
> So, I'd consider this as a driver bug.  Drivers should not return ENOSUP/EOPNOTSUPP as a result of rte_eth_dev_configure().
> [Chava Leviatan] The driver return correct values for normal operation . This is an abnormal scenario , where the port was removed , and a new configuration was sent to the device .

I dont think ENOSUP/EOPNOTSUPP is a normal reslt value for rte_eth_dev_configure().

> 
> 
> Why does this cause a crash: dpif-netdev treats EOPNOTSUPP as a sign that reconfigure() callback is not implemented for a netdev().
> If it's not implemented, it's considered to be not needed, hence successful.
> [Chava Leviatan] I have to disagree on that . The crash happens because OVS failed to correctly configure the device tx queue num , and it kept using an incorrect value . When the device was up again ( the same port was brought up again ) , the OVS used tx_q #1 , when it should have used 0 . ( That device published that it supports only tx queue num =1 ) .

OVS will not use the port at all if the error was different from ENOSUP/EOPNOTSUP.

> 
> We could workaround the probelm on the OVS side, by changing the result of rte_eth_dev_configure() from ENOSUP/EOPNOTSUPP to something like EPROTO before returning.  In this case, the device will be correctly destroyed and removed from the datapath.
> 
> [Chava Leviatan] Note that in netdev_dpdk_reconfigure, the err that is returned from dpdk_eth_dev_init can be override by 
>     if (!dev->tx_q) {
>         err = ENOMEM;
>     }
> By the time you go back , back to port_reconfigure that calls netdev_reconfigure , the err can be deprecated .
> 
> We can't simply ignore the error, because device is not really configured at this point, so can not be used.
> 
> Best regards, Ilya Maximets.
> 
>> and thus there is no call for rte_eth_tx_queue_setup to configure the device with the minimum .
>> However , this line remains :
>>
>> netdev->n_rxq = dev->requested_n_rxq; ( at netdev_dpdk_reconfigure) , and from here , there will be no more tries to configure the interface tx queues as in netdev_dpdk_set_tx_multiq:
>>
>>     if (dev->requested_n_txq == n_txq) {
>>         goto out;
>>    }
>>
>> By the way , when I entered another debug line in reconfigure_datapath :
>>
>>     HMAP_FOR_EACH (port, node, &dp->ports) {
>>         if (netdev_is_reconf_required(port->netdev)
>>             || (port->dynamic_txqs
>>                 != (netdev_n_txq(port->netdev) < wanted_txqs))) {
>>             port->need_reconfigure = true;
>>         }
>>     //    VLOG_INFO("PATCH ==  reconfigure_datapath dev AFTER-2 dynamic %d dev tx_n %d  wanted_txqs %d name %s",
>>    //                        port->dynamic_txqs,port->netdev->n_txq,wanted_txqs, port->netdev->name);
>>     }
>> The crash was not reproduced , so I do believe , there is also a 
>> matter of race somehow
>>
>> The below is taken from the OVS logs , with some debug print I added , 
>> and some that are already exist ( my debug prints marked with PATCH)
>>
>> 2023-01-29T09:06:32.786Z|00809|bridge|INFO|bridge br0: deleted 
>> interface vf_rep1 on port 19 
>> 2023-01-29T09:06:32.875Z|00863|netdev_dpdk|INFO|PATCH ==  OLD 
>> netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1 
>> 2023-01-29T09:06:32.876Z|00869|netdev_dpdk|INFO|PATCH ==  
>> netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1 
>> 2023-01-29T09:06:32.876Z|00870|netdev_dpdk|INFO|PATCH ==  netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1 2023-01-29T09:06:32.876Z|00873|netdev_dpdk|WARN|Interface vf_rep1 eth_dev setup error Operation not supported 2023-01-29T09:06:32.876Z|00874|netdev_dpdk|ERR|Interface vf_rep1(rxq:1 txq:1 lsc interrupt mode:false) configure error: Operation not supported 2023-01-29T09:06:32.877Z|00880|dpif_netdev|INFO|Core 21 on numa node 0 assigned port 'vf_rep1' rx queue 0 (measured processing cycles 0).
>> 2023-01-29T09:06:32.877Z|00881|netdev_offload|INFO|vf_rep1: Assigned flow API 'dpdk_flow_api'.
>> 2023-01-29T09:06:32.877Z|00882|bridge|INFO|bridge br0: added interface 
>> vf_rep1 on port 22 
>> 2023-01-29T09:06:32.894Z|00883|netdev_dpdk|INFO|PATCH ==  OLD 
>> netdev_dpdk_set_tx_multiq dev 2  n_txq 2 requested 2 name vf_rep1
>>
>> Normal operation of that device , before add and remove start : 
>>
>> 2023-01-29T09:06:25.971Z|00277|netdev_dpdk|INFO|PATCH ==  OLD 
>> netdev_dpdk_set_tx_multiq dev 0  n_txq 2 requested 1 name vf_rep1 
>> 2023-01-29T09:06:25.972Z|00283|netdev_dpdk|INFO|PATCH ==  
>> netdev_dpdk_reconfigure dev 0  requested 2 name vf_rep1 
>> 2023-01-29T09:06:25.972Z|00284|netdev_dpdk|INFO|PATCH ==  
>> netdev_dpdk_reconfigure dev AFTER++ 0  requested 2 name vf_rep1 
>> 2023-01-29T09:06:25.973Z|00287|netdev_dpdk|INFO|PATCH ==  
>> dpdk_eth_dev_port_config name vf_rep1 
>> 2023-01-29T09:06:25.973Z|00289|netdev_dpdk|INFO|PATCH ==  
>> dpdk_eth_dev_port_config i 1 n_txq 1 name vf_rep1 
>> 2023-01-29T09:06:25.973Z|00290|netdev_dpdk|INFO|PATCH ==  
>> dpdk_eth_dev_port_config dev 2 n_txq 1 name vf_rep1
>>
>>
>> Here are the debug lines I added :
>>
>> At netdev_dpdk_reconfigure
>>     ovs_mutex_lock(&dev->mutex);
>>     VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev %d  requested %d name %s",netdev->n_txq,
>>                                dev->requested_n_txq, netdev->name);
>>
>>     if (netdev->n_txq == dev->requested_n_txq
>>         && netdev->n_rxq == dev->requested_n_rxq
>>         && dev->mtu == dev->requested_mtu
>>         && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode
>>         && dev->rxq_size == dev->requested_rxq_size
>>         && dev->txq_size == dev->requested_txq_size
>>         && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)
>>         && dev->socket_id == dev->requested_socket_id
>>         && dev->started && !dev->reset_needed) {
>>         /* Reconfiguration is unnecessary */
>>
>>         goto out;
>>     }
>>     VLOG_INFO("PATCH ==  netdev_dpdk_reconfigure dev AFTER++ %d  requested %d name %s",netdev->n_txq,
>>                                dev->requested_n_txq, netdev->name);
>>
>> /=================================================
>>
>>
>> At netdev_dpdk_set_tx_multiq:
>>
>>
>>     ovs_mutex_lock(&dev->mutex);
>>     VLOG_INFO("PATCH ==  OLD netdev_dpdk_set_tx_multiq dev %d  n_txq %d requested %d name %s",netdev->n_txq, n_txq,
>>                                dev->requested_n_txq, netdev->name);
>>     if (dev->requested_n_txq == n_txq) {
>>         goto out;
>>     }
>>
>>
>>
>>
>>
>> Thank you ,
>> Chava
>>
>> -----Original Message-----
>> From: Ilya Maximets <i.maximets@ovn.org>
>> Sent: Tuesday, January 24, 2023 2:25 PM
>> To: Chava Leviatan <cleviatan@marvell.com>; dev@openvswitch.org
>> Cc: i.maximets@ovn.org; Liron Himi <lironh@marvell.com>
>> Subject: [EXT] Re: [ovs-dev] [PATCH] netdev-dpdk: config tx queue 
>> number to be the minimum between device and ovs request
>>
>> External Email
>>
>> ----------------------------------------------------------------------
>> On 1/24/23 12:40, cleviatan@marvell.com wrote:
>>> From: Chava Leviatan <cleviatan@marvell.com>
>>>
>>> Tx queue are set each time port is added/removed or the cmask changes 
>>> by reconfigure_datapath. The amount of TX queues is set according to 
>>> PMD thread and does not take into consideration the device capabilities .
>>> As a result , when transmitting packet from OVS to device driver by 
>>> rte_eth_tx_burst , the device driver is called with
>>> dev->data->tx_queues[queue_id] which is not a legal address , hence
>>> can cause a crash . This patch suggests to set the tx queue as a 
>>> minimum between the device capability and the PMD thread number as 
>>> requested by reconfigure_datapath.
>>> The patch was tested on Marvell setup include DPU running OVS-dpdk , 
>>> and Marvell DPDK Octeontx2 driver. DPU is bind to a host through PCI .
>>> Traffic is sent from associated host (VF) and remote host .
>>>
>>> Fixes: 050c60bfb5b4 ("netdev-dpdk: Use ->reconfigure() call to change 
>>> rx/tx queues.")
>>>
>>> Reviewed-by: Liron Himi lironh@marvell.com
>>> Signed-off-by: Chava Leviatan <cleviatan@marvell.com>
>>> ---
>>>  lib/netdev-dpdk.c | 13 ++++++++++---
>>>  1 file changed, 10 insertions(+), 3 deletions(-)
>>
>> This seems identical to the previous submission.  I'm not sure why it was sent twice.
>>
>> Please, reply to comments for the previous submission:
>>   
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__mail.openvswitch.
>> org_pipermail_ovs-2Ddev_2023-2DJanuary_401418.html&d=DwICaQ&c=nKjWec2b
>> 6R0mOyPaz7xtfQ&r=hS7DdLpcToMvZe7BcgG7dRfLfRfOgpMqVV9gMRVT1GE&m=QU5t2CX
>> vVESHVvXPHq9mBDYZlIPLbe1P12O1pgv9cer_faNh-WBbp6J6vovEHuRt&s=7MzuLjmM7j
>> ElXNUzeYhPlU2_FgdMD-vhlDZr7bN0cRE&e=
>>
>> Marking this one as 'Changes Requested' for now.
>>
>> Best regards, Ilya Maximets.
>
diff mbox series

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index ab5b8223e..278da12d4 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -508,6 +508,7 @@  struct netdev_dpdk {
         int requested_n_rxq;
         int requested_rxq_size;
         int requested_txq_size;
+        int device_n_txq;
 
         /* Number of rx/tx descriptors for physical devices */
         int rxq_size;
@@ -1124,6 +1125,7 @@  dpdk_eth_dev_port_config(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 
         dev->up.n_rxq = n_rxq;
         dev->up.n_txq = n_txq;
+        dev->device_n_txq = n_txq;
 
         return 0;
     }
@@ -2133,15 +2135,20 @@  netdev_dpdk_get_numa_id(const struct netdev *netdev)
 static int
 netdev_dpdk_set_tx_multiq(struct netdev *netdev, unsigned int n_txq)
 {
+    unsigned int req_n_txq;
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
 
     ovs_mutex_lock(&dev->mutex);
 
-    if (dev->requested_n_txq == n_txq) {
+    req_n_txq = MIN(n_txq, dev->device_n_txq);
+    if (dev->requested_n_txq == req_n_txq) {
         goto out;
     }
-
-    dev->requested_n_txq = n_txq;
+    if (!dev->device_n_txq) {
+       dev->device_n_txq = 1;
+       req_n_txq = 1;
+    }
+    dev->requested_n_txq = req_n_txq;
     netdev_request_reconfigure(netdev);
 
 out: