Message ID: 1496153546-20532-3-git-send-email-i.maximets@samsung.com
State: Superseded
> Reconfiguration of HW NICs may lead to packet drops.
> In current model all physical ports will be reconfigured each time number
> of PMD threads changed. Since we not stopping threads on pmd-cpu-mask
> changes, this patch will help to further decrease port's downtime by
> setting the maximum possible number of wanted tx queues to avoid
> unnecessary reconfigurations.

Hi Ilya,

Thanks for the patch, in testing I can confirm it resolves the issue. A few
observations and comments below.

> Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
> ---
>  lib/dpif-netdev.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 596d133..79db770 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>  {
>      struct dp_netdev_pmd_thread *pmd;
>      struct dp_netdev_port *port;
> -    int wanted_txqs;
> +    int needed_txqs, wanted_txqs;
>
>      dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
>
> @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
>       * on the system and the user configuration. */
>      reconfigure_pmd_threads(dp);
>
> -    wanted_txqs = cmap_count(&dp->poll_threads);
> +    /* We need 1 Tx queue for each thread to avoid locking, but we will try
> +     * to allocate the maximum possible value to minimize the number of port
> +     * reconfigurations. */
> +    needed_txqs = cmap_count(&dp->poll_threads);
> +    /* (n_cores + 1) is the maximum that we might need to have.
> +     * Additional queue is for non-PMD threads. */
> +    wanted_txqs = ovs_numa_get_n_cores();
> +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
> +    wanted_txqs++;

So just after this line there is a call

    HMAP_FOR_EACH (port, node, &dp->ports) {
        netdev_set_tx_multiq(port->netdev, wanted_txqs);
    }

My initial concern with this patch was twofold:

(i) What would happen if the number of cores was greater than the number of
supported queues.

That's not an issue from what I can see, as the requested txqs will be compared
to what's supported by the device and the smaller of the two will be chosen in
the eventual call to dpdk_eth_dev_init():

    n_txq = MIN(info.max_tx_queues, dev->up.n_txq);

(ii) What happens when a device reports the incorrect number of max tx queues?
(This can occur depending on the mode of the device.)

I think this case is ok as well, as it's handled by the existing txq retry
logic in dpdk_eth_dev_queue_setup().

>      /* The number of pmd threads might have changed, or a port can be new:
>       * adjust the txqs. */
> @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
>
>      /* Check for all the ports that need reconfiguration. We cache this in
>       * 'port->reconfigure', because netdev_is_reconf_required() can change at
> -     * any time. */
> +     * any time.
> +     * Also mark for reconfiguration all ports which will likely change their
> +     * 'dynamic_txqs' parameter. It's required to stop using them before
> +     * changing this setting and it's simpler to mark ports here and allow
> +     * 'pmd_remove_stale_ports' to remove them from threads. There will be
> +     * no actual reconfiguration in 'port_reconfigure' because it's
> +     * unnecessary. */
>      HMAP_FOR_EACH (port, node, &dp->ports) {
> -        if (netdev_is_reconf_required(port->netdev)) {
> +        if (netdev_is_reconf_required(port->netdev)
> +            || (port->dynamic_txqs
> +                != netdev_n_txq(port->netdev) < needed_txqs)) {

GCC caught the following

    lib/dpif-netdev.c:3448:48: error: suggest parentheses around comparison in operand of '!=' [-Werror=parentheses]
                 != netdev_n_txq(port->netdev) < needed_txqs)) {

>              port->need_reconfigure = true;
>          }
>      }
> @@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>              seq_change(dp->port_seq);
>              port_destroy(port);
>          } else {
> -            port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
> +            port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
>          }
>      }
>
> --
> 2.7.4
On 15.06.2017 11:02, Stokes, Ian wrote:
> Hi Ilya,
>
> Thanks for the patch, in testing I can confirm it resolves the issue. A few
> observations and comments below.

Hi Ian. Thanks for testing.

> My initial concern with this patch was twofold:
>
> (i) What would happen if the number of cores was greater than the number of
> supported queues.
>
> That's not an issue from what I can see, as the requested txqs will be
> compared to what's supported by the device and the smaller of the two will
> be chosen in the eventual call to dpdk_eth_dev_init():
>
>     n_txq = MIN(info.max_tx_queues, dev->up.n_txq);
>
> (ii) What happens when a device reports the incorrect number of max tx
> queues? (This can occur depending on the mode of the device.)
>
> I think this case is ok as well, as it's handled by the existing txq retry
> logic in dpdk_eth_dev_queue_setup().

Actually these concerns are not about this patch: such situations are possible
on current master. But you're right, both of them are handled properly by the
existing code, according to the conventions between the dpif-netdev and netdev
layers.

> GCC caught the following
>
>     lib/dpif-netdev.c:3448:48: error: suggest parentheses around comparison in operand of '!=' [-Werror=parentheses]
>                  != netdev_n_txq(port->netdev) < needed_txqs)) {

The parentheses are not strictly necessary, because the relational operators
have higher precedence than the equality operators, so the condition already
parses as intended; but I can add parentheses if the compiler complains.

What version of GCC are you using? I didn't see such warnings on my systems.
> -----Original Message-----
> From: Ilya Maximets [mailto:i.maximets@samsung.com]
> Sent: Thursday, June 15, 2017 9:45 AM
> To: Stokes, Ian <ian.stokes@intel.com>; dev@openvswitch.org; Darrell Ball
> <dball@vmware.com>
> Cc: Heetae Ahn <heetae82.ahn@samsung.com>; Daniele Di Proietto
> <diproiettod@ovn.org>; Ben Pfaff <blp@ovn.org>; Pravin Shelar
> <pshelar@ovn.org>; Loftus, Ciara <ciara.loftus@intel.com>; Kevin Traynor
> <ktraynor@redhat.com>
> Subject: Re: [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on
> pmd-cpu-mask changes.
>
> Actually these concerns are not about this patch: such situations are
> possible on current master. But you're right, both of them are handled
> properly by the existing code, according to the conventions between the
> dpif-netdev and netdev layers.

Sure, they are not direct concerns, as they could be reproduced on master as
is. But moving to the core count can cause the number of requested txqs to be
larger than what's supported: for example, with hyper-threading a system can
appear with 64 cores and will therefore request 65 txqs. Depending on the DPDK
device and its mode, this increases the chance of requesting more txqs than the
device supports (compared with the current model of requesting one txq per
polling thread).

Either way, the code is in place to handle such a situation, so I think it's
fine.

> What version of GCC are you using? I didn't see such warnings on my systems.

gcc version 6.3.1 20161221 (Red Hat 6.3.1-1) (GCC)
On 15.06.2017 12:03, Stokes, Ian wrote:
> Sure, they are not direct concerns, as they could be reproduced on master as
> is. But moving to the core count can cause the number of requested txqs to be
> larger than what's supported: for example, with hyper-threading a system can
> appear with 64 cores and will therefore request 65 txqs. Depending on the
> DPDK device and its mode, this increases the chance of requesting more txqs
> than the device supports.
>
> Either way, the code is in place to handle such a situation, so I think it's
> fine.
>
>> What version of GCC are you using? I didn't see such warnings on my systems.
>
> gcc version 6.3.1 20161221 (Red Hat 6.3.1-1) (GCC)

Ok. I'm going to make v3 with additional parentheses and maybe enhanced
comments in the first patch.

If you don't mind, I'll add you in a Tested-by tag in v3, since there will
only be a minor code change.
> Ok. I'm going to make v3 with additional parentheses and maybe enhanced
> comments in the first patch.
>
> If you don't mind, I'll add you in a Tested-by tag in v3, since there will
> only be a minor code change.

Sure, sounds good.

Thanks
Ian
On 5/30/17, 7:12 AM, "Ilya Maximets" <i.maximets@samsung.com> wrote:
Reconfiguration of HW NICs may lead to packet drops.
In current model all physical ports will be reconfigured each
time number of PMD threads changed. Since we not stopping
threads on pmd-cpu-mask changes, this patch will help to further
decrease port's downtime by setting the maximum possible number
of wanted tx queues to avoid unnecessary reconfigurations.
Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
lib/dpif-netdev.c | 26 +++++++++++++++++++++-----
1 file changed, 21 insertions(+), 5 deletions(-)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 596d133..79db770 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)
{
struct dp_netdev_pmd_thread *pmd;
struct dp_netdev_port *port;
- int wanted_txqs;
+ int needed_txqs, wanted_txqs;
dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
@@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
* on the system and the user configuration. */
reconfigure_pmd_threads(dp);
- wanted_txqs = cmap_count(&dp->poll_threads);
+ /* We need 1 Tx queue for each thread to avoid locking, but we will try
+ * to allocate the maximum possible value to minimize the number of port
+ * reconfigurations. */
+ needed_txqs = cmap_count(&dp->poll_threads);
+ /* (n_cores + 1) is the maximum that we might need to have.
+ * Additional queue is for non-PMD threads. */
+ wanted_txqs = ovs_numa_get_n_cores();
+ ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
+ wanted_txqs++;
I don’t think PMD mask changes are common, so this patch is trying to optimize a
rare disruptive event that can/will be scheduled by the administrator.
Based on the actual number of queues supported and the number of cores present,
this optimization may or may not work. It is unpredictable whether there will be benefit
in a particular case from the user POV.
If I were the administrator, I would try to err on the conservative side anyway if I could
not predict the result.
Did I miss something ?
/* The number of pmd threads might have changed, or a port can be new:
* adjust the txqs. */
@@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
/* Check for all the ports that need reconfiguration. We cache this in
* 'port->reconfigure', because netdev_is_reconf_required() can change at
- * any time. */
+ * any time.
+ * Also mark for reconfiguration all ports which will likely change their
+ * 'dynamic_txqs' parameter. It's required to stop using them before
+ * changing this setting and it's simpler to mark ports here and allow
+ * 'pmd_remove_stale_ports' to remove them from threads. There will be
+ * no actual reconfiguration in 'port_reconfigure' because it's
+ * unnecessary. */
HMAP_FOR_EACH (port, node, &dp->ports) {
- if (netdev_is_reconf_required(port->netdev)) {
+ if (netdev_is_reconf_required(port->netdev)
+ || (port->dynamic_txqs
+ != netdev_n_txq(port->netdev) < needed_txqs)) {
port->need_reconfigure = true;
}
}
@@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp)
seq_change(dp->port_seq);
port_destroy(port);
} else {
- port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
+ port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
}
}
--
2.7.4
On 07.07.2017 08:08, Darrell Ball wrote:
> On 5/30/17, 7:12 AM, "Ilya Maximets" <i.maximets@samsung.com> wrote:
>
>     Reconfiguration of HW NICs may lead to packet drops.
>     In current model all physical ports will be reconfigured each
>     time number of PMD threads changed. Since we not stopping
>     threads on pmd-cpu-mask changes, this patch will help to further
>     decrease port's downtime by setting the maximum possible number
>     of wanted tx queues to avoid unnecessary reconfigurations.
>
>     Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
>     ---
>      lib/dpif-netdev.c | 26 +++++++++++++++++++++-----
>      1 file changed, 21 insertions(+), 5 deletions(-)
>
>     diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>     index 596d133..79db770 100644
>     --- a/lib/dpif-netdev.c
>     +++ b/lib/dpif-netdev.c
>     @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>      {
>          struct dp_netdev_pmd_thread *pmd;
>          struct dp_netdev_port *port;
>     -    int wanted_txqs;
>     +    int needed_txqs, wanted_txqs;
>
>          dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
>
>     @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
>           * on the system and the user configuration. */
>          reconfigure_pmd_threads(dp);
>
>     -    wanted_txqs = cmap_count(&dp->poll_threads);
>     +    /* We need 1 Tx queue for each thread to avoid locking, but we will try
>     +     * to allocate the maximum possible value to minimize the number of port
>     +     * reconfigurations. */
>     +    needed_txqs = cmap_count(&dp->poll_threads);
>     +    /* (n_cores + 1) is the maximum that we might need to have.
>     +     * Additional queue is for non-PMD threads. */
>     +    wanted_txqs = ovs_numa_get_n_cores();
>     +    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
>     +    wanted_txqs++;
>
> I don’t think PMD mask changes are common, so this patch is trying to
> optimize a rare disruptive event that can/will be scheduled by the
> administrator.
>
> Based on the actual number of queues supported and the number of cores
> present, this optimization may or may not work. It is unpredictable whether
> there will be benefit in a particular case from the user POV.
> If I were the administrator, I would try to err on the conservative side
> anyway if I could not predict the result.
>
> Did I miss something ?

In an NFV environment, if you want to add one more VM to your hosts you will
have to choose between:

  * not creating a new PMD thread
      -> performance degradation of networking for other working VMs

  * adding a new PMD thread
      -> disruption of networking for the whole host for the duration of
         the OVS reconfiguration.

This patch removes the cost of the second option.

I don't understand what you mean by 'unpredictable benefit'. The benefit is
clear, and this optimization will work except for HW NICs with a very low
number of HW queues. The situation will never be worse than without this
patch anyway.

About PMD mask changes: even one change a week can be disruptive in a highly
loaded NFV environment. Also, these changes can be performed automatically by
monitoring tools during VM migrations, or while rebalancing the load between
PMD threads to free CPU cores for other applications/VMs.

>          /* The number of pmd threads might have changed, or a port can be new:
>           * adjust the txqs. */
>     @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
>
>          /* Check for all the ports that need reconfiguration.  We cache this in
>           * 'port->reconfigure', because netdev_is_reconf_required() can change at
>     -     * any time. */
>     +     * any time.
>     +     * Also mark for reconfiguration all ports which will likely change their
>     +     * 'dynamic_txqs' parameter.  It's required to stop using them before
>     +     * changing this setting and it's simpler to mark ports here and allow
>     +     * 'pmd_remove_stale_ports' to remove them from threads.  There will be
>     +     * no actual reconfiguration in 'port_reconfigure' because it's
>     +     * unnecessary. */
>          HMAP_FOR_EACH (port, node, &dp->ports) {
>     -        if (netdev_is_reconf_required(port->netdev)) {
>     +        if (netdev_is_reconf_required(port->netdev)
>     +            || (port->dynamic_txqs
>     +                != netdev_n_txq(port->netdev) < needed_txqs)) {
>                  port->need_reconfigure = true;
>              }
>          }
>     @@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp)
>              seq_change(dp->port_seq);
>              port_destroy(port);
>          } else {
>     -        port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
>     +        port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
>          }
>      }
>
>     --
>     2.7.4
On 7/6/17, 11:11 PM, "Ilya Maximets" <i.maximets@samsung.com> wrote:

    On 07.07.2017 08:08, Darrell Ball wrote:
    > [... quoted patch and first round of comments trimmed ...]
    >
    > Did I miss something ?

    In an NFV environment, if you want to add one more VM to your hosts you
    will have to choose between:

      * not creating a new PMD thread
          -> performance degradation of networking for other working VMs

      * adding a new PMD thread
          -> disruption of networking for the whole host for the duration
             of the OVS reconfiguration.

    This patch removes the cost of the second option.

In general, adding a PMD thread per VM may not always (or even usually) make
sense. There are use cases for sure, but using a PMD core per VM is often
viewed as dpdk using too many CPU resources, limiting the adoption of dpdk.

Furthermore, for dpdk gateways it is irrelevant.

    I don't understand what you mean by 'unpredictable benefit'. The benefit
    is clear, and this optimization will work except for HW NICs with a very
    low number of HW queues.

HW NIC interfaces carry aggregated traffic for multiple VMs etc., so these
cases are the most important. It is the ratio of the number of cores to HW
queues that matters. If there are fewer queues than cores, the benefit may
not occur.

This patch makes assumptions about the relative number of cores vs the number
of queues. These two parameters should be treated as independent – this part
really bothers me.

I think we could solve this in a coordinated, generic fashion with a user
management command that would also give feedback to the user as to the
possible success and maybe provide options. I wonder if we will eventually go
in that direction, in the future, anyways ?

    [... rest of quoted reply trimmed ...]
On 07.07.2017 21:09, Darrell Ball wrote:
> On 7/6/17, 11:11 PM, "Ilya Maximets" <i.maximets@samsung.com> wrote:
>
>     [... quoted patch trimmed ...]
>
>     In an NFV environment, if you want to add one more VM to your hosts
>     you will have to choose between:
>
>       * not creating a new PMD thread
>           -> performance degradation of networking for other working VMs
>
>       * adding a new PMD thread
>           -> disruption of networking for the whole host for the duration
>              of the OVS reconfiguration.
>
>     This patch removes the cost of the second option.
>
> In general, adding a PMD thread per VM may not always (or even usually)
> make sense.

Not per VM. Let's assume that all existing threads are already overloaded.

> There are use cases for sure, but using a PMD core per VM is often viewed
> as dpdk using too many CPU resources, limiting the adoption of dpdk.
>
> Furthermore, for dpdk gateways it is irrelevant.

Disagree with that statement.

>     I don't understand what you mean by 'unpredictable benefit'. The
>     benefit is clear, and this optimization will work except for HW NICs
>     with a very low number of HW queues.
>
> HW NIC interfaces carry aggregated traffic for multiple VMs etc., so these
> cases are the most important.

And that is exactly why this patch was implemented. It tries to minimize
packet drops on the HW NICs "shared" between VMs. It allows adding new
threads without breaking networking for the others, by not reconfiguring a
device that is handling traffic for other VMs at that moment.

> It is the ratio of the number of cores to HW queues that matters. If there
> are fewer queues than cores, the benefit may not occur.

That's right. But are you saying that we should not apply an optimization if
it covers only 50% of cases? (The percentage is actually higher, because
modern NICs support a fairly large number of Tx queues.)

> This patch makes assumptions about the relative number of cores vs the
> number of queues. These two parameters should be treated as independent –
> this part really bothers me.

It doesn't make any assumptions, it just optimizes one of the two possible
situations. If the number of queues is smaller, the behaviour will be
exactly the same as in current master.

> I think we could solve this in a coordinated, generic fashion with a user
> management command that would also give feedback to the user as to the
> possible success and maybe provide options. I wonder if we will eventually
> go in that direction, in the future, anyways ?

I don't think so. We don't need a user management command, because the only
sane values for the number of queues right now are the number of threads or
the maximum number of queues available in hardware. We're using the first
one only because the HW is not able to say how many queues it can
successfully configure. This patch allows us to use the second value in all
cases where possible.

> [... rest of quoted reply trimmed ...]
On 7/10/17, 12:41 AM, "Ilya Maximets" <i.maximets@samsung.com> wrote:

    On 07.07.2017 21:09, Darrell Ball wrote:
    > [... quoted patch and earlier discussion trimmed ...]
    >
    > In general, adding a PMD thread per VM may not always (or even
    > usually) make sense.

    Not per VM. Let's assume that all existing threads are already
    overloaded.

That would be less common; it is not the usual case.

    > Furthermore, for dpdk gateways it is irrelevant.

    Disagree with that statement.

dpdk gateways should not usually require changing pmd-cpu-mask when a VM is
added to a hypervisor somewhere. Sometimes you need to deploy more resources
for sure, but this should be much less frequent than adding or removing VMs.
The point is that the user should rarely be changing the pmd-cpu-mask, and
this can be done during a maintenance window when needed.

    And that is exactly why this patch was implemented. It tries to minimize
    packet drops on the HW NICs "shared" between VMs. It allows adding new
    threads without breaking networking for the others, by not reconfiguring
    a device that is handling traffic for other VMs at that moment.

I understand what this patch wants to do, and I understand what the last
designs wanted to do as well.

    > It is the ratio of the number of cores to HW queues that matters. If
    > there are fewer queues than cores, the benefit may not occur.

    That's right. But are you saying that we should not apply an
    optimization if it covers only 50% of cases? (The percentage is
    actually higher, because modern NICs support a fairly large number of
    Tx queues.)

Nobody knows what the number is, but I know that the number of CPUs is
increasing, and making assumptions about the relative numbers of cores vs
queues can only work by luck. It is inherently unpredictable, and
unpredictability for a user-initiated/controlled change is ugly. This is
even worse when the user-initiated change is rare, because the user does not
remember the full context of the command.

    It doesn't make any assumptions, it just optimizes one of the two
    possible situations. If the number of queues is smaller, the behaviour
    will be exactly the same as in current master.

We are repeating the same thing.

    I don't think so. We don't need a user management command, because the
    only sane values for the number of queues right now are the number of
    threads or the maximum number of queues available in hardware. We're
    using the first one only because the HW is not able to say how many
    queues it can successfully configure.

The fact that hardware cannot always report the correct number of queues is
an issue that will get solved over time, and should be part of what a user
command reports (e.g. "unknown queue number, therefore it is unknown whether
a pmd-cpu-mask change will be hitless; issue the following command 'X' when
you would like to make the change"). A wrapper user command(s) would provide
visibility and predictability to the user, i.e. we can do better.

    This patch allows us to use the second value in all cases where
    possible.

    [... rest of quoted reply trimmed ...]
On 11.07.2017 05:10, Darrell Ball wrote:
> On 7/10/17, 12:41 AM, "Ilya Maximets" <i.maximets@samsung.com> wrote:
>
>     [... quoted patch and earlier discussion trimmed ...]
>
>     Not per VM. Let's assume that all existing threads are already
>     overloaded.
>
> That would be less common; it is not the usual case.

We have different points of view. We need more opinions on that.

>     Disagree with that statement.
>
> dpdk gateways should not usually require changing pmd-cpu-mask when a VM
> is added to a hypervisor somewhere. Sometimes you need to deploy more
> resources for sure, but this should be much less frequent than adding or
> removing VMs. The point is that the user should rarely be changing the
> pmd-cpu-mask, and this can be done during a maintenance window when
> needed.
>
>     [...]
>
> I understand what this patch wants to do, and I understand what the last
> designs wanted to do as well.
>
>     [...]
>
> Nobody knows what the number is, but I know that the number of CPUs is
> increasing, and making assumptions about the relative numbers of cores vs
> queues can only work by luck.

As you mentioned previously, you think that creating a large number of PMD
threads is not a common case (this follows from "adding a PMD thread per VM
may not always (or even usually) make sense"). This means that the
optimization from this patch will usually work, because the number of PMD
threads is much smaller than the number of cores and will usually be
comparable to or less than the number of queues. And this works now, not in
some ephemeral future where DPDK returns real numbers for available queues.
It may keep working in the future too, because it is not only the number of
CPU cores that is increasing.

> It is inherently unpredictable, and unpredictability for a
> user-initiated/controlled change is ugly. This is even worse when the
> user-initiated change is rare, because the user does not remember the
> full context of the command.
>
>     [...]
>
> The fact that hardware cannot always report the correct number of queues
> is an issue that will get solved over time, and should be part of what a
> user command reports (e.g. "unknown queue number, therefore it is unknown
> whether a pmd-cpu-mask change will be hitless; issue the following
> command 'X' when you would like to make the change").

Reconfiguration will never be hitless if we don't even try to make it more
lightweight.

> A wrapper user command(s) would provide visibility and predictability to
> the user, i.e. we can do better.

*Why do we need a configuration knob that can only decrease performance?*

This will end up in a situation where users are simply forced to configure
the number of Tx queues for each port equal to the number of cores in the
CPU mask. Or they will just set it to 1000 for all ports to be sure that the
maximum possible value is configured. And this option will have to configure
the maximum possible value less than or equal to the passed one, because we
have no way to find the maximum value until DPDK is fixed (which is actually
a hard problem). Otherwise, users will have to bisect it, and that is ugly.

> [... rest of quoted reply trimmed ...]
Comments inline but this is mostly repetitive.

-----Original Message-----
From: Ilya Maximets <i.maximets@samsung.com>
Date: Monday, July 10, 2017 at 11:55 PM
To: Darrell Ball <dball@vmware.com>, "dev@openvswitch.org" <dev@openvswitch.org>
Cc: Heetae Ahn <heetae82.ahn@samsung.com>, Daniele Di Proietto <diproiettod@ovn.org>, Ben Pfaff <blp@ovn.org>, Pravin Shelar <pshelar@ovn.org>, Ciara Loftus <ciara.loftus@intel.com>, Ian Stokes <ian.stokes@intel.com>, Kevin Traynor <ktraynor@redhat.com>
Subject: Re: [PATCH v2 2/3] dpif-netdev: Avoid port's reconfiguration on pmd-cpu-mask changes.

On 11.07.2017 05:10, Darrell Ball wrote: > > > On 7/10/17, 12:41 AM, "Ilya Maximets" <i.maximets@samsung.com> wrote: > > On 07.07.2017 21:09, Darrell Ball wrote: > > > > > > On 7/6/17, 11:11 PM, "Ilya Maximets" <i.maximets@samsung.com> wrote: > > > > On 07.07.2017 08:08, Darrell Ball wrote: > > > > > > > > > On 5/30/17, 7:12 AM, "Ilya Maximets" <i.maximets@samsung.com> wrote: > > > > > > Reconfiguration of HW NICs may lead to packet drops. > > > In current model all physical ports will be reconfigured each > > > time number of PMD threads changed. Since we not stopping > > > threads on pmd-cpu-mask changes, this patch will help to further > > > decrease port's downtime by setting the maximum possible number > > > of wanted tx queues to avoid unnecessary reconfigurations.
> > > > > > Signed-off-by: Ilya Maximets <i.maximets@samsung.com> > > > --- > > > lib/dpif-netdev.c | 26 +++++++++++++++++++++----- > > > 1 file changed, 21 insertions(+), 5 deletions(-) > > > > > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > > > index 596d133..79db770 100644 > > > --- a/lib/dpif-netdev.c > > > +++ b/lib/dpif-netdev.c > > > @@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp) > > > { > > > struct dp_netdev_pmd_thread *pmd; > > > struct dp_netdev_port *port; > > > - int wanted_txqs; > > > + int needed_txqs, wanted_txqs; > > > > > > dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq); > > > > > > @@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp) > > > * on the system and the user configuration. */ > > > reconfigure_pmd_threads(dp); > > > > > > - wanted_txqs = cmap_count(&dp->poll_threads); > > > + /* We need 1 Tx queue for each thread to avoid locking, but we will try > > > + * to allocate the maximum possible value to minimize the number of port > > > + * reconfigurations. */ > > > + needed_txqs = cmap_count(&dp->poll_threads); > > > + /* (n_cores + 1) is the maximum that we might need to have. > > > + * Additional queue is for non-PMD threads. */ > > > + wanted_txqs = ovs_numa_get_n_cores(); > > > + ovs_assert(wanted_txqs != OVS_CORE_UNSPEC); > > > + wanted_txqs++; > > > > > > I don’t think PMD mask changes are common, so this patch is trying to optimize a > > > rare disruptive event that can/will be scheduled by the administrator. > > > > > > Based on the actual number of queues supported and the number of cores present, > > > this optimization may or may not work. It is unpredictable whether there will be benefit > > > in a particular case from the user POV. > > > If I were the administrator, I would try to error on the conservative side anyways if I could > > > not predict the result. > > > > > > Did I miss something ? 
> > > > In an NFV environment, if you want to add one more VM to your hosts you will have to > > choose between: > > > > * not creating a new PMD thread > > -> performance degradation of networking for other working VMs > > > > > > * adding a new PMD thread > > -> disruption of the networking for the whole host for the time > > of OVS reconfiguration. > > > > This patch removes the cost of the second option. > > > > In general, adding a PMD thread per VM may not always (or even usually) make sense. > > Not per VM. Let's assume that all existing threads are already overloaded. > > That would be less common; it is not the usual case. We have different points of view. We need more opinions on that. > > > There are use cases for sure, but using a PMD core per VM is often viewed as dpdk using too much > > cpu resources and limiting the adoption of dpdk. > > > > Furthermore, for dpdk gateways it is irrelevant. > > I disagree with that statement. > > dpdk gateways should not usually require changing pmd-cpu-mask when a VM is added to a hypervisor > somewhere. Sometimes you need to deploy more resources for sure, but this should be much less > frequent than adding or removing VMs. > The point is the user should rarely be changing the pmd-cpu-mask, and this can be done during > a maintenance window when needed. > > > > > > I don't understand what you mean by saying 'unpredictable benefit'. The benefit is clear > > and this optimization will work, except for HW NICs with a very low number of HW queues. > > > > HW nic interfaces carry aggregated traffic for multiple VMs etc, so these cases are most important. > > And that is the reason why this patch was implemented. This patch tries to > minimize packet drops on HW NICs "shared" between VMs. It > allows adding new threads without breaking networking for others, by not > reconfiguring a device that handles traffic for other VMs at that moment.
> > I understand what this patch wants to do and I understand what the last designs wanted to > do as well. > > > > It is the ratio of the number of cores to hw queues that matters. If queues are fewer than > > cores, then the benefit may not occur. > > That's right. But are you saying that we should not apply an optimization if > it covers only 50% of cases? (The percentage is bigger, actually, because > modern NICs support quite a lot of tx queues.) > > Nobody knows what the number is, but I know that the number of cpus is increasing > and making assumptions about the relative numbers of cores vs queues can only work by luck. As you mentioned previously, you think that creating a large number of PMD threads is not a common case (this follows from the "adding a PMD thread per VM may not always (or even usually) make sense."). This means that the optimization from this patch will usually work, because the number of PMD threads is much less than the number of cores and will usually be comparable to or less than the number of queues. Again, what I previously said is that I don’t expect each VM to have a dedicated PMD thread in the general case. The reasoning comes from the cost based on utilization. In general, the high reservation of cores for PMD threads is persistently mentioned as a major impediment to DPDK adoption. For DPDK gateways, I would expect anything to be possible, including most cores having PMD threads running. Again, in general, I am not making any assumptions on the ratio of cores to queues, because I think these assumptions will break. The two parameters should be treated as independent parameters, which they are. This patch makes assumptions about the ratio of these two independent parameters. And this will work now, not in some ephemeral future where DPDK will return real numbers for available queues. In the future this may work too, because it is not only the number of CPU cores that is increasing. Again, the issue here is not whether it will work in some cases.
The issue here is to give the user feedback on when it might work. Changing the pmd-cpu-mask is an infrequent administrative command used on a ‘non-experimental product’, so the user should have feedback on what would be the result of the command. I might be ok with this patch if OVS-DPDK were still labelled as experimental, but I would not want to go back to the experimental tag if possible. > It is inherently unpredictable, and unpredictability for a user-initiated/controlled change is ugly. > This is even worse when the user-initiated change is rare, because the user does not remember > the full context of the command. > > > > This patch makes assumptions about the relative number of cores vs the number of queues. > > These two parameters should be treated as independent – this part really bothers me. > > It doesn't make any assumptions, it just optimizes one of 2 possible situations. > If the number of queues is lower, the behaviour will be exactly the same as in > current master. > > We are repeating the same thing. > > > > I think we could solve this in a coordinated generic fashion with a user management command > > that would also give feedback to the user as to the possible success and maybe provide options. > > I wonder if we will eventually go in that direction, in the future, anyways ? > > I don't think so. We don't need a user management command, because the > only sane values for the number of queues right now are the number of threads > or the maximum number of available queues in hardware. We're using the > first one only because HW is not able to say how many queues it is able to > successfully configure. > > The fact that hardware cannot always report the correct number of queues is an issue > that will get solved over time and should be part of what a user command reports > (eg) “unknown queue number, therefore it is unknown whether a pmd-cpu-mask change > will be hitless”; issue the following command ‘X’ when you would like to make the change”).
Reconfiguration will never be hitless; we are only trying to make it more lightweight. > > A wrapper user command(s) would provide visibility and predictability to the user. > i.e. we can do better. *Why do we need a configuration knob which can only decrease performance?* This will end up in a situation where users will just be forced to configure the number of tx queues for each port equal to the number of cores in the cpu mask. Or they will just set it to 1000 for all the ports to be sure that the maximum possible value was configured. The key aspect here is about giving feedback to the user about the result of a potential pmd-cpu-mask change. I think it can be done with the results being one of three possibilities.

1/ Command will be hitless.
2/ Command will not be hitless.
3/ Command hitless impact is unknown; assume traffic will be affected.

This should be supported by documentation. With this patch as is, the result is unknown in all cases. And this option will have to configure the maximum possible value less than or equal to the passed one, because we have no way to find the maximum value until DPDK is fixed (which is actually the hard problem). Otherwise, users will have to bisect it, and that is the ugly thing. ‘IF’ there are issues with rte_eth_dev_info_get(), ‘in some modes’, with returning the real number of hardware queues, then this can be characterized (meaning in which cases) and fixed. In such a case, the feedback to the user should be conservative and fall into ‘3’ above. > This patch allows us to use the second value in all cases where possible. > > > > > But the situation will never be worse than without this patch anyway. > > > > About PMD mask changes: > > Even one change in a week can be disruptive in a highly loaded NFV environment. > > Also, these changes can be performed automatically by monitoring tools during > > VM migrations or while rebalancing the load between PMD threads to free CPU > > cores for other applications/VMs.
> > > > > > > > /* The number of pmd threads might have changed, or a port can be new: > > > * adjust the txqs. */ > > > @@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp) > > > > > > /* Check for all the ports that need reconfiguration. We cache this in > > > * 'port->reconfigure', because netdev_is_reconf_required() can change at > > > - * any time. */ > > > + * any time. > > > + * Also mark for reconfiguration all ports which will likely change their > > > + * 'dynamic_txqs' parameter. It's required to stop using them before > > > + * changing this setting and it's simpler to mark ports here and allow > > > + * 'pmd_remove_stale_ports' to remove them from threads. There will be > > > + * no actual reconfiguration in 'port_reconfigure' because it's > > > + * unnecessary. */ > > > HMAP_FOR_EACH (port, node, &dp->ports) { > > > - if (netdev_is_reconf_required(port->netdev)) { > > > + if (netdev_is_reconf_required(port->netdev) > > > + || (port->dynamic_txqs > > > + != netdev_n_txq(port->netdev) < needed_txqs)) { > > > port->need_reconfigure = true; > > > } > > > } > > > @@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp) > > > seq_change(dp->port_seq); > > > port_destroy(port); > > > } else { > > > - port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs; > > > + port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs; > > > } > > > } > > > > > > -- > > > 2.7.4
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 596d133..79db770 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3453,7 +3453,7 @@ reconfigure_datapath(struct dp_netdev *dp)
 {
     struct dp_netdev_pmd_thread *pmd;
     struct dp_netdev_port *port;
-    int wanted_txqs;
+    int needed_txqs, wanted_txqs;
 
     dp->last_reconfigure_seq = seq_read(dp->reconfigure_seq);
 
@@ -3461,7 +3461,15 @@ reconfigure_datapath(struct dp_netdev *dp)
      * on the system and the user configuration. */
     reconfigure_pmd_threads(dp);
 
-    wanted_txqs = cmap_count(&dp->poll_threads);
+    /* We need 1 Tx queue for each thread to avoid locking, but we will try
+     * to allocate the maximum possible value to minimize the number of port
+     * reconfigurations. */
+    needed_txqs = cmap_count(&dp->poll_threads);
+    /* (n_cores + 1) is the maximum that we might need to have.
+     * Additional queue is for non-PMD threads. */
+    wanted_txqs = ovs_numa_get_n_cores();
+    ovs_assert(wanted_txqs != OVS_CORE_UNSPEC);
+    wanted_txqs++;
 
     /* The number of pmd threads might have changed, or a port can be new:
      * adjust the txqs. */
@@ -3474,9 +3482,17 @@ reconfigure_datapath(struct dp_netdev *dp)
 
     /* Check for all the ports that need reconfiguration. We cache this in
      * 'port->reconfigure', because netdev_is_reconf_required() can change at
-     * any time. */
+     * any time.
+     * Also mark for reconfiguration all ports which will likely change their
+     * 'dynamic_txqs' parameter. It's required to stop using them before
+     * changing this setting and it's simpler to mark ports here and allow
+     * 'pmd_remove_stale_ports' to remove them from threads. There will be
+     * no actual reconfiguration in 'port_reconfigure' because it's
+     * unnecessary. */
     HMAP_FOR_EACH (port, node, &dp->ports) {
-        if (netdev_is_reconf_required(port->netdev)) {
+        if (netdev_is_reconf_required(port->netdev)
+            || (port->dynamic_txqs
+                != netdev_n_txq(port->netdev) < needed_txqs)) {
             port->need_reconfigure = true;
         }
     }
@@ -3510,7 +3526,7 @@ reconfigure_datapath(struct dp_netdev *dp)
             seq_change(dp->port_seq);
             port_destroy(port);
         } else {
-            port->dynamic_txqs = netdev_n_txq(port->netdev) < wanted_txqs;
+            port->dynamic_txqs = netdev_n_txq(port->netdev) < needed_txqs;
         }
     }
Reconfiguration of HW NICs may lead to packet drops. In the current model, all
physical ports are reconfigured each time the number of PMD threads changes.
Since we are not stopping threads on pmd-cpu-mask changes, this patch helps to
further decrease a port's downtime by setting the maximum possible number of
wanted tx queues to avoid unnecessary reconfigurations.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---
 lib/dpif-netdev.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)