Message ID:   1498428406-29712-1-git-send-email-bhanuprakash.bodireddy@intel.com
State:        Rejected
Delegated to: Darrell Ball
With this change and CFS in effect, it effectively means that the dpdk control
threads need to be on different cores than the PMD threads, or the response
latency for their control work may be too long? Have we tested having the
control threads on the same cpu with -20 nice for the pmd thread?

I see the comment is added below:

  + It is recommended that the OVS control thread and pmd thread shouldn't be
  + pinned to the same core i.e 'dpdk-lcore-mask' and 'pmd-cpu-mask' cpu mask
  + settings should be non-overlapping.

I understand that other heavy threads would be a problem for PMD threads, and
we want to effectively encourage these to be on different cores when a
pmd-cpu-mask is in use. However, here we are by default almost shutting down
other threads on the same core as the PMD threads using -20 nice, even those
with little cpu load that just need a reasonable latency. Will this aggravate
the argument from some quarters that using dpdk requires too much cpu
reservation?

On 6/25/17, 3:06 PM, "Bhanuprakash Bodireddy"
<bhanuprakash.bodireddy@intel.com> wrote:

Increase the DPDK pmd thread scheduling priority by lowering the nice value.
This advises the kernel scheduler to prioritize the pmd thread over other
processes and helps PMD threads provide deterministic performance in
out-of-the-box deployments. This patch sets the nice value of PMD threads
to '-20'.

  $ ps -eLo comm,policy,psr,nice | grep pmd
  COMMAND  POLICY  PROCESSOR  NICE
  pmd62    TS      3          -20
  pmd63    TS      0          -20
  pmd64    TS      1          -20
  pmd65    TS      2          -20

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
Tested-by: Billy O'Mahony <billy.o.mahony@intel.com>
Acked-by: Billy O'Mahony <billy.o.mahony@intel.com>
---
v9->v10:
* Return an error code if setpriority() fails.

v8->v9:
* Rebase.

v7->v8:
* Rebase.
* Update the documentation file @Documentation/intro/install/dpdk-advanced.rst.

v6->v7:
* Remove the realtime scheduling policy logic.
* Increase pmd thread scheduling priority by lowering the nice value to -20.
* Update the documentation accordingly.

v5->v6:
* Prohibit spawning a pmd thread on the lowest core in dpdk-lcore-mask if the
  lcore-mask and pmd-mask affinities are identical.
* Updated the Note section in the INSTALL.DPDK-ADVANCED doc.
* Tested below cases to verify system stability with the pmd priority patch.

v4->v5:
* Reword the Note section in DPDK-ADVANCED.md.

v3->v4:
* Documentation update.
* Use ovs_strerror for reporting errors in lib-numa.c.

v2->v3:
* Move the set_priority() function to lib/ovs-numa.c.
* Apply the realtime scheduling policy and priority to the pmd thread only if
  pmd-cpu-mask is passed.
* Update INSTALL.DPDK-ADVANCED.

v1->v2:
* Removed the #ifdef and introduced a dummy function
  "pmd_thread_setpriority" in netdev-dpdk.h.
* Rebase.

 Documentation/intro/install/dpdk.rst |  8 +++++++-
 lib/dpif-netdev.c                    |  4 ++++
 lib/ovs-numa.c                       | 22 ++++++++++++++++++++++
 lib/ovs-numa.h                       |  1 +
 4 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index e83f852..b5c26ba 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -453,7 +453,8 @@ affinitized accordingly.
   to be affinitized to isolated cores for optimum performance.

   By setting a bit in the mask, a pmd thread is created and pinned to the
-  corresponding CPU core. e.g. to run a pmd thread on core 2::
+  corresponding CPU core with nice value set to -20.
+  e.g. to run a pmd thread on core 2::

     $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x4
@@ -493,6 +494,11 @@ improvements as there will be more total CPU occupancy available::

     NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1

+.. note::
+  It is recommended that the OVS control thread and pmd thread shouldn't be
+  pinned to the same core i.e 'dpdk-lcore-mask' and 'pmd-cpu-mask' cpu mask
+  settings should be non-overlapping.
+
 DPDK Physical Port Rx Queues
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 4e29085..e952cf9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3712,6 +3712,10 @@ pmd_thread_main(void *f_)
     ovs_numa_thread_setaffinity_core(pmd->core_id);
     dpdk_set_lcore_id(pmd->core_id);
     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
+
+    /* Set pmd thread's nice value to -20 */
+#define MIN_NICE -20
+    ovs_numa_thread_setpriority(MIN_NICE);

 reload:
     emc_cache_init(&pmd->flow_cache);
diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index 98e97cb..9cf6bd4 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -23,6 +23,7 @@
 #include <dirent.h>
 #include <stddef.h>
 #include <string.h>
+#include <sys/resource.h>
 #include <sys/types.h>
 #include <unistd.h>
 #endif /* __linux__ */
@@ -570,3 +571,24 @@ int ovs_numa_thread_setaffinity_core(unsigned core_id OVS_UNUSED)
     return EOPNOTSUPP;
 #endif /* __linux__ */
 }
+
+int
+ovs_numa_thread_setpriority(int nice OVS_UNUSED)
+{
+    if (dummy_numa) {
+        return 0;
+    }
+
+#ifndef _WIN32
+    int err;
+    err = setpriority(PRIO_PROCESS, 0, nice);
+    if (err) {
+        VLOG_ERR("Thread priority error %s", ovs_strerror(err));
+        return err;
+    }
+
+    return 0;
+#else
+    return EOPNOTSUPP;
+#endif
+}
diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h
index 6946cdc..e132483 100644
--- a/lib/ovs-numa.h
+++ b/lib/ovs-numa.h
@@ -62,6 +62,7 @@ bool ovs_numa_dump_contains_core(const struct ovs_numa_dump *,
 size_t ovs_numa_dump_count(const struct ovs_numa_dump *);
 void ovs_numa_dump_destroy(struct ovs_numa_dump *);
 int ovs_numa_thread_setaffinity_core(unsigned core_id);
+int ovs_numa_thread_setpriority(int nice);

 #define FOR_EACH_CORE_ON_DUMP(ITER, DUMP)                    \
     HMAP_FOR_EACH((ITER), hmap_node, &(DUMP)->cores)
--
2.4.11
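As background on the system call the patch wraps: below is a minimal
standalone sketch of lowering the calling thread's nice value (illustrative
only, not OVS code; the helper name is hypothetical). setpriority(2) signals
failure by returning -1 and setting errno, and setting a negative nice value
normally requires CAP_SYS_NICE or a suitable RLIMIT_NICE.

  /* Minimal sketch, not OVS code: lower the calling thread's nice value.
   * On Linux, PRIO_PROCESS with who == 0 acts on the calling thread.
   * setpriority(2) returns -1 on error and reports the cause in errno. */
  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/resource.h>

  static int
  set_thread_nice(int nice_val)
  {
      if (setpriority(PRIO_PROCESS, 0, nice_val) == -1) {
          fprintf(stderr, "setpriority(%d): %s\n", nice_val,
                  strerror(errno));
          return errno;
      }
      return 0;
  }

  int
  main(void)
  {
      return set_thread_nice(-20);    /* Mirrors MIN_NICE in the patch. */
  }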
>With this change and CFS in effect, it effectively means that the dpdk control
>threads need to be on different cores than the PMD threads or the response
>latency may be too long for their control work?
>Have we tested having the control threads on the same cpu with -20 nice for
>the pmd thread?

Yes, I did some testing, and that is why I added the comment recommending that
the dpdk-lcore-mask and pmd-cpu-mask settings be non-overlapping. The testing
was done with a simple script that adds and deletes 750 vHost User ports
(script copied below), capturing time statistics for each case:

  dpdk-lcore-mask | PMD thread | PMD nice | Time statistics
  ----------------+------------+----------+-----------------------------------------------
  unspecified     | core 3     | -20      | real 1m5.610s  / user 0m0.706s / sys 0m0.023s  [with patch]
  core 3          | core 3     | -20      | real 2m14.089s / user 0m0.717s / sys 0m0.017s  [with patch]
  unspecified     | core 3     | 0        | real 1m5.209s  / user 0m0.711s / sys 0m0.020s  [master]
  core 3          | core 3     | 0        | real 1m7.209s  / user 0m0.711s / sys 0m0.020s  [master]

In all cases, if dpdk-lcore-mask is unspecified, the main thread floats
between the available cores (0-27 in my case). With this patch (PMD nice value
of -20) and with the main and pmd threads both pinned to core 3, port addition
and deletion took twice as long. The most important thing to notice, however,
is that with active traffic and port addition/deletion in progress, throughput
drops instantly *without* the patch: the vswitchd thread consumes 7% of the
CPU time at one stage, thereby impacting forwarding performance. With the
patch the throughput is still affected, but it degrades gradually; the
vswitchd thread consumed no more than 2% of the CPU time, which is why port
addition and deletion took longer.

>I see the comment is added below
>+ It is recommended that the OVS control thread and pmd thread shouldn't be
>+ pinned to the same core i.e 'dpdk-lcore-mask' and 'pmd-cpu-mask' cpu mask
>+ settings should be non-overlapping.
>
>I understand that other heavy threads would be a problem for PMD threads
>and we want to effectively encourage these to be on different cores in the
>situation where we are using a pmd-cpu-mask.
>However, here we are almost shutting down other threads by default on the
>same core as PMD threads using -20 nice, even those with little cpu load but
>just needing a reasonable latency.

I had the logic of completely shutting down other threads in the early
versions of this patch, by assigning real-time priority to the PMD thread, but
that seemed too dangerous; changing the nice value is a safer bet. I agree
that latency can go up for non-pmd threads with this patch, but the same
situation already exists with the kernel threads that run at a -20 nice value,
some of them even with 'rt' priority.

>Will this aggravate the argument from some quarters that using dpdk requires
>too much cpu reservation?

At least for the PMD threads, which are the heart of packet processing in
OvS-DPDK, the reservation is warranted.

More information on the commands. The script used to test port addition and
deletion:

  $ cat port_test.sh
  cmds=; for i in {1..750}; do cmds+=" -- add-port br0 dpdkvhostuser$i -- set Interface dpdkvhostuser$i type=dpdkvhostuser"; done
  ovs-vsctl $cmds
  sleep 1
  cmds=; for i in {1..750}; do cmds+=" -- del-port br0 dpdkvhostuser$i"; done
  ovs-vsctl $cmds

  $ time ./port_test.sh

dpdk-lcore-mask and pmd-cpu-mask explicitly set to core 3:
----------------------------------------------------------
  $ ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=8
  $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=8
  $ ps -eLo tid,psr,comm | grep -e revalidator -e handler -e ovs -e pmd -e urc -e eal
  110881  20  ovsdb-server
  110892   3  ovs-vswitchd
  110976   3  pmd61
  110898   3  eal-intr-thread
  110903   3  urcu3
  110947   3  handler60

dpdk-lcore-mask unspecified, pmd-cpu-mask explicitly set to core 3:
-------------------------------------------------------------------
  $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=8
  $ ps -eLo tid,psr,comm | grep -e revalidator -e handler -e ovs -e pmd -e urc -e eal
  111474  14  ovsdb-server
  111483   6  ovs-vswitchd
  111566   3  pmd61
  111564  10  revalidator60
  111489   0  eal-intr-thread
  111493   8  urcu3

Regards,
Bhanuprakash.
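To put rough numbers behind the 7% vs. ~2% observation in the reply above:
under CFS, runnable threads sharing a core receive CPU time in proportion to
their scheduler weights, which come from the kernel's sched_prio_to_weight
table (1024 at nice 0, 88761 at nice -20). A back-of-envelope sketch of the
expected split (illustrative C, not part of the original thread):

  /* Back-of-envelope CFS share calculation, assuming the kernel's
   * sched_prio_to_weight values: nice 0 -> 1024, nice -20 -> 88761. */
  #include <stdio.h>

  int
  main(void)
  {
      const double w_pmd  = 88761.0;  /* PMD thread at nice -20. */
      const double w_main = 1024.0;   /* Main (vswitchd) thread at nice 0. */

      /* With both threads runnable on the same core, CFS grants each a
       * share proportional to its weight. */
      printf("pmd share:  %.1f%%\n", 100.0 * w_pmd / (w_pmd + w_main));
      printf("main share: %.1f%%\n", 100.0 * w_main / (w_pmd + w_main));
      return 0;
  }

This predicts roughly a 98.9% / 1.1% split, which lines up with the vswitchd
thread dropping to about 2% of CPU time under the patch and with the doubled
port add/delete time.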
-----Original Message-----
From: "Bodireddy, Bhanuprakash" <bhanuprakash.bodireddy@intel.com>
Date: Monday, June 26, 2017 at 5:56 AM
To: Darrell Ball <dball@vmware.com>, "dev@openvswitch.org" <dev@openvswitch.org>
Subject: RE: [ovs-dev] [PATCH v10] netdev-dpdk: Increase pmd thread priority.

Yes, I did some testing, and that is why I added the comment recommending that
the dpdk-lcore-mask and pmd-cpu-mask settings be non-overlapping. The testing
was done with a simple script that adds and deletes 750 vHost User ports,
capturing time statistics for each case:

  dpdk-lcore-mask | PMD thread | PMD nice | Time statistics
  ----------------+------------+----------+-----------------------------------------------
  unspecified     | core 3     | -20      | real 1m5.610s  / user 0m0.706s / sys 0m0.023s  [with patch]
  core 3          | core 3     | -20      | real 2m14.089s / user 0m0.717s / sys 0m0.017s  [with patch]
  unspecified     | core 3     | 0        | real 1m5.209s  / user 0m0.711s / sys 0m0.020s  [master]
  core 3          | core 3     | 0        | real 1m7.209s  / user 0m0.711s / sys 0m0.020s  [master]

[Darrell] So if the lcore mask is either unspecified or set to be
non-conflicting, the advantage is basically nil. We should usually be able to
do this, and when we cannot, I am not sure that favoring throughput over
management tasks such as port addition is good, as the potential relative
impact of the management task is high while its share of total CPU time is
low.

[...]
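Since the recommendation above hinges on 'dpdk-lcore-mask' and 'pmd-cpu-mask'
not sharing a core, here is a small illustrative check (a hypothetical helper,
not OVS code): bit N of a mask selects core N, so 0x8 means core 3, and two
masks overlap exactly when their bitwise AND is non-zero.

  /* Hypothetical helper, not OVS code: verify that dpdk-lcore-mask and
   * pmd-cpu-mask do not share a core. Bit N set in a mask means core N,
   * so 0x8 selects core 3 and 0x30 selects cores 4 and 5. */
  #include <stdio.h>

  int
  main(void)
  {
      unsigned long lcore_mask = 0x8;  /* dpdk-lcore-mask: core 3. */
      unsigned long pmd_mask   = 0x30; /* pmd-cpu-mask: cores 4 and 5. */
      unsigned long overlap = lcore_mask & pmd_mask;

      if (overlap) {
          /* List the shared cores so the operator can adjust the masks. */
          for (unsigned int core = 0; core < 8 * sizeof overlap; core++) {
              if (overlap & (1UL << core)) {
                  printf("core %u is in both masks\n", core);
              }
          }
          return 1;
      }
      printf("masks are non-overlapping\n");
      return 0;
  }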