Message ID | 20200324235045.704399-1-i.maximets@ovn.org |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev] dpif-netdev: Force port reconfiguration to change dynamic_txqs. | expand |
On Wed, Mar 25, 2020 at 12:50:45AM +0100, Ilya Maximets wrote: > In case number of polling threads goes from exact number of Tx queues > in port to higher value while set_tx_multiq() not implemented or not > requesting reconfiguration, port will not be reconfigured and datapath > will continue using static Tx queue ids leading to crash. I always wonder why those static analysis tools such as Coverity can't detect crash like this. > > Ex.: > Assuming that port p0 supports up to 4 Tx queues and doesn't support > set_tx_multiq() method. For example, netdev-afxdp could be the case, > because it could have multiple Tx queues, but doesn't have > set_tx_multiq() implementation because number of Tx queues always > equals to number of Rx queues. > > 1. Configuring pmd-cpu-mask to have 3 pmd threads. > > 2. Adding port p0 to OVS. > At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd). > Port reconfigured to have 4 Tx queues successfully. > dynamic_txqs = (4 < 4) = false; > > 3. Configuring pmd-cpu-mask to have 10 pmd threads. > At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd). > Since set_tx_multiq() is not implemented, netdev doesn't request > reconfiguration and 'dynamic_txqs' remains in 'false' state. > > 4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue > ids that are in range [0, 10] while device only supports 4 leading > to unwanted behavior and crashes. > > Fix that by marking for reconfiguration all the ports that will likely > change their 'dynamic_txqs' value. > > It looks like the issue could be reproduced only with afxdp ports, > because all other non-dpdk ports ignores Tx queue ids and dpdk ports > requests for reconfiguration on set_tx_multiq(). > > Reported-by: William Tu <u9012063@gmail.com> > Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html > Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.") > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- Thanks a lot! I tested it and it works without crash. Acked-by: William Tu <u9012063@gmail.com>
On 24/03/2020 23:50, Ilya Maximets wrote: > In case number of polling threads goes from exact number of Tx queues > in port to higher value while set_tx_multiq() not implemented or not > requesting reconfiguration, port will not be reconfigured and datapath > will continue using static Tx queue ids leading to crash. > > Ex.: > Assuming that port p0 supports up to 4 Tx queues and doesn't support > set_tx_multiq() method. For example, netdev-afxdp could be the case, > because it could have multiple Tx queues, but doesn't have > set_tx_multiq() implementation because number of Tx queues always > equals to number of Rx queues. > > 1. Configuring pmd-cpu-mask to have 3 pmd threads. > > 2. Adding port p0 to OVS. > At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd). > Port reconfigured to have 4 Tx queues successfully. > dynamic_txqs = (4 < 4) = false; > > 3. Configuring pmd-cpu-mask to have 10 pmd threads. > At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd). > Since set_tx_multiq() is not implemented, netdev doesn't request > reconfiguration and 'dynamic_txqs' remains in 'false' state. > > 4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue > ids that are in range [0, 10] while device only supports 4 leading > to unwanted behavior and crashes. > > Fix that by marking for reconfiguration all the ports that will likely > change their 'dynamic_txqs' value. > > It looks like the issue could be reproduced only with afxdp ports, > because all other non-dpdk ports ignores Tx queue ids and dpdk ports > requests for reconfiguration on set_tx_multiq(). > > Reported-by: William Tu <u9012063@gmail.com> > Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html > Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.") > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- > > Since the issue could be reproduced only with afxdp ports, this should > be backported only down to 2.12. > > lib/dpif-netdev.c | 12 ++++++++++-- > 1 file changed, 10 insertions(+), 2 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index a798db45d..e456cc9be 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -4941,9 +4941,17 @@ reconfigure_datapath(struct dp_netdev *dp) > > /* Check for all the ports that need reconfiguration. We cache this in > * 'port->need_reconfigure', because netdev_is_reconf_required() can > - * change at any time. */ > + * change at any time. > + * Also mark for reconfiguration all ports which will likely change their > + * 'dynamic_txqs' parameter. It's required to stop using them before > + * changing this setting and it's simpler to mark ports here and allow > + * 'pmd_remove_stale_ports' to remove them from threads. There will be > + * no actual reconfiguration in 'port_reconfigure' because it's > + * unnecessary. */ > HMAP_FOR_EACH (port, node, &dp->ports) { > - if (netdev_is_reconf_required(port->netdev)) { > + if (netdev_is_reconf_required(port->netdev) > + || (port->dynamic_txqs > + != (netdev_n_txq(port->netdev) < wanted_txqs))) { > port->need_reconfigure = true; > } > } > LGTM Acked-by: Kevin Traynor <ktraynor@redhat.com>
On Wed, Mar 25, 2020 at 06:17:14PM +0000, Kevin Traynor wrote: > On 24/03/2020 23:50, Ilya Maximets wrote: > > In case number of polling threads goes from exact number of Tx queues > > in port to higher value while set_tx_multiq() not implemented or not > > requesting reconfiguration, port will not be reconfigured and datapath > > will continue using static Tx queue ids leading to crash. > > > > Ex.: > > Assuming that port p0 supports up to 4 Tx queues and doesn't support > > set_tx_multiq() method. For example, netdev-afxdp could be the case, > > because it could have multiple Tx queues, but doesn't have > > set_tx_multiq() implementation because number of Tx queues always > > equals to number of Rx queues. > > > > 1. Configuring pmd-cpu-mask to have 3 pmd threads. > > > > 2. Adding port p0 to OVS. > > At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd). > > Port reconfigured to have 4 Tx queues successfully. > > dynamic_txqs = (4 < 4) = false; > > > > 3. Configuring pmd-cpu-mask to have 10 pmd threads. > > At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd). > > Since set_tx_multiq() is not implemented, netdev doesn't request > > reconfiguration and 'dynamic_txqs' remains in 'false' state. > > > > 4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue > > ids that are in range [0, 10] while device only supports 4 leading > > to unwanted behavior and crashes. > > > > Fix that by marking for reconfiguration all the ports that will likely > > change their 'dynamic_txqs' value. > > > > It looks like the issue could be reproduced only with afxdp ports, > > because all other non-dpdk ports ignores Tx queue ids and dpdk ports > > requests for reconfiguration on set_tx_multiq(). > > > > Reported-by: William Tu <u9012063@gmail.com> > > Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html > > Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.") > > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > > --- > > > > Since the issue could be reproduced only with afxdp ports, this should > > be backported only down to 2.12. > > > > lib/dpif-netdev.c | 12 ++++++++++-- > > 1 file changed, 10 insertions(+), 2 deletions(-) > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > index a798db45d..e456cc9be 100644 > > --- a/lib/dpif-netdev.c > > +++ b/lib/dpif-netdev.c > > @@ -4941,9 +4941,17 @@ reconfigure_datapath(struct dp_netdev *dp) > > > > /* Check for all the ports that need reconfiguration. We cache this in > > * 'port->need_reconfigure', because netdev_is_reconf_required() can > > - * change at any time. */ > > + * change at any time. > > + * Also mark for reconfiguration all ports which will likely change their > > + * 'dynamic_txqs' parameter. It's required to stop using them before > > + * changing this setting and it's simpler to mark ports here and allow > > + * 'pmd_remove_stale_ports' to remove them from threads. There will be > > + * no actual reconfiguration in 'port_reconfigure' because it's > > + * unnecessary. */ > > HMAP_FOR_EACH (port, node, &dp->ports) { > > - if (netdev_is_reconf_required(port->netdev)) { > > + if (netdev_is_reconf_required(port->netdev) > > + || (port->dynamic_txqs > > + != (netdev_n_txq(port->netdev) < wanted_txqs))) { > > port->need_reconfigure = true; > > } > > } > > > > LGTM > Acked-by: Kevin Traynor <ktraynor@redhat.com> > Thanks! Applied to master, 2.12, 2.13. William
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index a798db45d..e456cc9be 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -4941,9 +4941,17 @@ reconfigure_datapath(struct dp_netdev *dp) /* Check for all the ports that need reconfiguration. We cache this in * 'port->need_reconfigure', because netdev_is_reconf_required() can - * change at any time. */ + * change at any time. + * Also mark for reconfiguration all ports which will likely change their + * 'dynamic_txqs' parameter. It's required to stop using them before + * changing this setting and it's simpler to mark ports here and allow + * 'pmd_remove_stale_ports' to remove them from threads. There will be + * no actual reconfiguration in 'port_reconfigure' because it's + * unnecessary. */ HMAP_FOR_EACH (port, node, &dp->ports) { - if (netdev_is_reconf_required(port->netdev)) { + if (netdev_is_reconf_required(port->netdev) + || (port->dynamic_txqs + != (netdev_n_txq(port->netdev) < wanted_txqs))) { port->need_reconfigure = true; } }
In case number of polling threads goes from exact number of Tx queues in port to higher value while set_tx_multiq() not implemented or not requesting reconfiguration, port will not be reconfigured and datapath will continue using static Tx queue ids leading to crash. Ex.: Assuming that port p0 supports up to 4 Tx queues and doesn't support set_tx_multiq() method. For example, netdev-afxdp could be the case, because it could have multiple Tx queues, but doesn't have set_tx_multiq() implementation because number of Tx queues always equals to number of Rx queues. 1. Configuring pmd-cpu-mask to have 3 pmd threads. 2. Adding port p0 to OVS. At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd). Port reconfigured to have 4 Tx queues successfully. dynamic_txqs = (4 < 4) = false; 3. Configuring pmd-cpu-mask to have 10 pmd threads. At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd). Since set_tx_multiq() is not implemented, netdev doesn't request reconfiguration and 'dynamic_txqs' remains in 'false' state. 4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue ids that are in range [0, 10] while device only supports 4 leading to unwanted behavior and crashes. Fix that by marking for reconfiguration all the ports that will likely change their 'dynamic_txqs' value. It looks like the issue could be reproduced only with afxdp ports, because all other non-dpdk ports ignores Tx queue ids and dpdk ports requests for reconfiguration on set_tx_multiq(). Reported-by: William Tu <u9012063@gmail.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- Since the issue could be reproduced only with afxdp ports, this should be backported only down to 2.12. lib/dpif-netdev.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)