diff mbox series

[ovs-dev] dpif-netdev: Force port reconfiguration to change dynamic_txqs.

Message ID 20200324235045.704399-1-i.maximets@ovn.org
State Accepted
Headers show
Series [ovs-dev] dpif-netdev: Force port reconfiguration to change dynamic_txqs. | expand

Commit Message

Ilya Maximets March 24, 2020, 11:50 p.m. UTC
In case number of polling threads goes from exact number of Tx queues
in port to higher value while set_tx_multiq() not implemented or not
requesting reconfiguration, port will not be reconfigured and datapath
will continue using static Tx queue ids leading to crash.

Ex.:
 Assuming that port p0 supports up to 4 Tx queues and doesn't support
 set_tx_multiq() method.  For example, netdev-afxdp could be the case,
 because it could have multiple Tx queues, but doesn't have
 set_tx_multiq() implementation because number of Tx queues always
 equals to number of Rx queues.

 1. Configuring pmd-cpu-mask to have 3 pmd threads.

 2. Adding port p0 to OVS.
    At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd).
    Port reconfigured to have 4 Tx queues successfully.
    dynamic_txqs = (4 < 4) = false;

 3. Configuring pmd-cpu-mask to have 10 pmd threads.
    At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd).
    Since set_tx_multiq() is not implemented, netdev doesn't request
    reconfiguration and 'dynamic_txqs' remains in 'false' state.

 4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue
    ids that are in range [0, 10] while device only supports 4 leading
    to unwanted behavior and crashes.

Fix that by marking for reconfiguration all the ports that will likely
change their 'dynamic_txqs' value.

It looks like the issue could be reproduced only with afxdp ports,
because all other non-dpdk ports ignores Tx queue ids and dpdk ports
requests for reconfiguration on set_tx_multiq().

Reported-by: William Tu <u9012063@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html
Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---

Since the issue could be reproduced only with afxdp ports, this should
be backported only down to 2.12.

 lib/dpif-netdev.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

William Tu March 25, 2020, 3:11 p.m. UTC | #1
On Wed, Mar 25, 2020 at 12:50:45AM +0100, Ilya Maximets wrote:
> In case number of polling threads goes from exact number of Tx queues
> in port to higher value while set_tx_multiq() not implemented or not
> requesting reconfiguration, port will not be reconfigured and datapath
> will continue using static Tx queue ids leading to crash.

I always wonder why those static analysis tools such as Coverity can't
detect crash like this.

> 
> Ex.:
>  Assuming that port p0 supports up to 4 Tx queues and doesn't support
>  set_tx_multiq() method.  For example, netdev-afxdp could be the case,
>  because it could have multiple Tx queues, but doesn't have
>  set_tx_multiq() implementation because number of Tx queues always
>  equals to number of Rx queues.
> 
>  1. Configuring pmd-cpu-mask to have 3 pmd threads.
> 
>  2. Adding port p0 to OVS.
>     At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd).
>     Port reconfigured to have 4 Tx queues successfully.
>     dynamic_txqs = (4 < 4) = false;
> 
>  3. Configuring pmd-cpu-mask to have 10 pmd threads.
>     At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd).
>     Since set_tx_multiq() is not implemented, netdev doesn't request
>     reconfiguration and 'dynamic_txqs' remains in 'false' state.
> 
>  4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue
>     ids that are in range [0, 10] while device only supports 4 leading
>     to unwanted behavior and crashes.
> 
> Fix that by marking for reconfiguration all the ports that will likely
> change their 'dynamic_txqs' value.
> 
> It looks like the issue could be reproduced only with afxdp ports,
> because all other non-dpdk ports ignores Tx queue ids and dpdk ports
> requests for reconfiguration on set_tx_multiq().
> 
> Reported-by: William Tu <u9012063@gmail.com>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html
> Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
Thanks a lot! I tested it and it works without crash.

Acked-by: William Tu <u9012063@gmail.com>
Kevin Traynor March 25, 2020, 6:17 p.m. UTC | #2
On 24/03/2020 23:50, Ilya Maximets wrote:
> In case number of polling threads goes from exact number of Tx queues
> in port to higher value while set_tx_multiq() not implemented or not
> requesting reconfiguration, port will not be reconfigured and datapath
> will continue using static Tx queue ids leading to crash.
> 
> Ex.:
>  Assuming that port p0 supports up to 4 Tx queues and doesn't support
>  set_tx_multiq() method.  For example, netdev-afxdp could be the case,
>  because it could have multiple Tx queues, but doesn't have
>  set_tx_multiq() implementation because number of Tx queues always
>  equals to number of Rx queues.
> 
>  1. Configuring pmd-cpu-mask to have 3 pmd threads.
> 
>  2. Adding port p0 to OVS.
>     At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd).
>     Port reconfigured to have 4 Tx queues successfully.
>     dynamic_txqs = (4 < 4) = false;
> 
>  3. Configuring pmd-cpu-mask to have 10 pmd threads.
>     At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd).
>     Since set_tx_multiq() is not implemented, netdev doesn't request
>     reconfiguration and 'dynamic_txqs' remains in 'false' state.
> 
>  4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue
>     ids that are in range [0, 10] while device only supports 4 leading
>     to unwanted behavior and crashes.
> 
> Fix that by marking for reconfiguration all the ports that will likely
> change their 'dynamic_txqs' value.
> 
> It looks like the issue could be reproduced only with afxdp ports,
> because all other non-dpdk ports ignores Tx queue ids and dpdk ports
> requests for reconfiguration on set_tx_multiq().
> 
> Reported-by: William Tu <u9012063@gmail.com>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html
> Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---
> 
> Since the issue could be reproduced only with afxdp ports, this should
> be backported only down to 2.12.
> 
>  lib/dpif-netdev.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index a798db45d..e456cc9be 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -4941,9 +4941,17 @@ reconfigure_datapath(struct dp_netdev *dp)
>  
>      /* Check for all the ports that need reconfiguration.  We cache this in
>       * 'port->need_reconfigure', because netdev_is_reconf_required() can
> -     * change at any time. */
> +     * change at any time.
> +     * Also mark for reconfiguration all ports which will likely change their
> +     * 'dynamic_txqs' parameter.  It's required to stop using them before
> +     * changing this setting and it's simpler to mark ports here and allow
> +     * 'pmd_remove_stale_ports' to remove them from threads.  There will be
> +     * no actual reconfiguration in 'port_reconfigure' because it's
> +     * unnecessary.  */
>      HMAP_FOR_EACH (port, node, &dp->ports) {
> -        if (netdev_is_reconf_required(port->netdev)) {
> +        if (netdev_is_reconf_required(port->netdev)
> +            || (port->dynamic_txqs
> +                != (netdev_n_txq(port->netdev) < wanted_txqs))) {
>              port->need_reconfigure = true;
>          }
>      }
> 

LGTM
Acked-by: Kevin Traynor <ktraynor@redhat.com>
William Tu March 25, 2020, 8:39 p.m. UTC | #3
On Wed, Mar 25, 2020 at 06:17:14PM +0000, Kevin Traynor wrote:
> On 24/03/2020 23:50, Ilya Maximets wrote:
> > In case number of polling threads goes from exact number of Tx queues
> > in port to higher value while set_tx_multiq() not implemented or not
> > requesting reconfiguration, port will not be reconfigured and datapath
> > will continue using static Tx queue ids leading to crash.
> > 
> > Ex.:
> >  Assuming that port p0 supports up to 4 Tx queues and doesn't support
> >  set_tx_multiq() method.  For example, netdev-afxdp could be the case,
> >  because it could have multiple Tx queues, but doesn't have
> >  set_tx_multiq() implementation because number of Tx queues always
> >  equals to number of Rx queues.
> > 
> >  1. Configuring pmd-cpu-mask to have 3 pmd threads.
> > 
> >  2. Adding port p0 to OVS.
> >     At this point wanted_txqs = 4 (3 for pmd threads + 1 for non-pmd).
> >     Port reconfigured to have 4 Tx queues successfully.
> >     dynamic_txqs = (4 < 4) = false;
> > 
> >  3. Configuring pmd-cpu-mask to have 10 pmd threads.
> >     At this point wanted_txqs = 11 (10 for pmd threads + 1 for non-pmd).
> >     Since set_tx_multiq() is not implemented, netdev doesn't request
> >     reconfiguration and 'dynamic_txqs' remains in 'false' state.
> > 
> >  4. Since 'dynamic_txqs == false', dpif-netdev uses static Tx queue
> >     ids that are in range [0, 10] while device only supports 4 leading
> >     to unwanted behavior and crashes.
> > 
> > Fix that by marking for reconfiguration all the ports that will likely
> > change their 'dynamic_txqs' value.
> > 
> > It looks like the issue could be reproduced only with afxdp ports,
> > because all other non-dpdk ports ignores Tx queue ids and dpdk ports
> > requests for reconfiguration on set_tx_multiq().
> > 
> > Reported-by: William Tu <u9012063@gmail.com>
> > Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2020-March/368364.html
> > Fixes: e32971b8ddb4 ("dpif-netdev: Centralized threads and queues handling code.")
> > Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> > ---
> > 
> > Since the issue could be reproduced only with afxdp ports, this should
> > be backported only down to 2.12.
> > 
> >  lib/dpif-netdev.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> > 
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index a798db45d..e456cc9be 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -4941,9 +4941,17 @@ reconfigure_datapath(struct dp_netdev *dp)
> >  
> >      /* Check for all the ports that need reconfiguration.  We cache this in
> >       * 'port->need_reconfigure', because netdev_is_reconf_required() can
> > -     * change at any time. */
> > +     * change at any time.
> > +     * Also mark for reconfiguration all ports which will likely change their
> > +     * 'dynamic_txqs' parameter.  It's required to stop using them before
> > +     * changing this setting and it's simpler to mark ports here and allow
> > +     * 'pmd_remove_stale_ports' to remove them from threads.  There will be
> > +     * no actual reconfiguration in 'port_reconfigure' because it's
> > +     * unnecessary.  */
> >      HMAP_FOR_EACH (port, node, &dp->ports) {
> > -        if (netdev_is_reconf_required(port->netdev)) {
> > +        if (netdev_is_reconf_required(port->netdev)
> > +            || (port->dynamic_txqs
> > +                != (netdev_n_txq(port->netdev) < wanted_txqs))) {
> >              port->need_reconfigure = true;
> >          }
> >      }
> > 
> 
> LGTM
> Acked-by: Kevin Traynor <ktraynor@redhat.com>
> 
Thanks!
Applied to master, 2.12, 2.13.
William
diff mbox series

Patch

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a798db45d..e456cc9be 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -4941,9 +4941,17 @@  reconfigure_datapath(struct dp_netdev *dp)
 
     /* Check for all the ports that need reconfiguration.  We cache this in
      * 'port->need_reconfigure', because netdev_is_reconf_required() can
-     * change at any time. */
+     * change at any time.
+     * Also mark for reconfiguration all ports which will likely change their
+     * 'dynamic_txqs' parameter.  It's required to stop using them before
+     * changing this setting and it's simpler to mark ports here and allow
+     * 'pmd_remove_stale_ports' to remove them from threads.  There will be
+     * no actual reconfiguration in 'port_reconfigure' because it's
+     * unnecessary.  */
     HMAP_FOR_EACH (port, node, &dp->ports) {
-        if (netdev_is_reconf_required(port->netdev)) {
+        if (netdev_is_reconf_required(port->netdev)
+            || (port->dynamic_txqs
+                != (netdev_n_txq(port->netdev) < wanted_txqs))) {
             port->need_reconfigure = true;
         }
     }