From patchwork Tue Jul 26 10:47:36 2016
From: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
To: dev@openvswitch.org
Date: Tue, 26 Jul 2016 11:47:36 +0100
Message-Id: <1469530056-18999-1-git-send-email-bhanuprakash.bodireddy@intel.com>
X-Mailer: git-send-email 2.4.11
Subject: [ovs-dev] [PATCH v4] netdev-dpdk: Set pmd thread priority

Set the DPDK pmd thread scheduling policy to SCHED_RR and its static
priority to the highest priority value of the policy.  This deals with
the pmd thread starvation case, where another CPU-hogging process can
get scheduled/affinitized onto the same core the pmd thread is running
on, thereby significantly impacting datapath performance.

Setting a realtime scheduling policy on the pmd threads is one step
towards Fastpath Service Assurance in OVS DPDK.  The realtime
scheduling policy is applied only when a CPU mask is passed via
'pmd-cpu-mask'.  For example:

* In the absence of pmd-cpu-mask, one pmd thread is created and the
  default scheduling policy and priority are applied.

* If pmd-cpu-mask is specified, one or more pmd threads are spawned on
  the corresponding core(s) in the mask, and the realtime scheduling
  policy SCHED_RR with the highest priority of the policy is applied
  to the pmd thread(s).

To reproduce the pmd thread starvation case:

  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
  taskset 0x2 cat /dev/zero > /dev/null &

With this commit, OVS control threads and pmd threads can't have the
same affinity ('dpdk-lcore-mask' and 'pmd-cpu-mask' should be
non-overlapping).  Also, other processes with the same affinity as a
pmd thread will be unresponsive.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodireddy@intel.com>
---
v3->v4:
* Documentation update.
* Use ovs_strerror for reporting errors in lib/ovs-numa.c.

v2->v3:
* Move set_priority() function to lib/ovs-numa.c.
* Apply realtime scheduling policy and priority to pmd threads only if
  pmd-cpu-mask is passed.
* Update INSTALL.DPDK-ADVANCED.

v1->v2:
* Removed #ifdef and introduced dummy function "pmd_thread_setpriority"
  in netdev-dpdk.h.
* Rebase.

 INSTALL.DPDK-ADVANCED.md | 17 +++++++++++++----
 lib/dpif-netdev.c        |  9 +++++++++
 lib/ovs-numa.c           | 18 ++++++++++++++++++
 lib/ovs-numa.h           |  1 +
 4 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
index 9ae536d..cc27b5f 100644
--- a/INSTALL.DPDK-ADVANCED.md
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -205,8 +205,10 @@ needs to be affinitized accordingly.
   pmd thread is CPU bound, and needs to be affinitized to isolated
   cores for optimum performance.
 
-  By setting a bit in the mask, a pmd thread is created and pinned
-  to the corresponding CPU core. e.g. to run a pmd thread on core 2
+  By setting a bit in the mask, a pmd thread is created and pinned to
+  the corresponding CPU core, and the scheduling policy SCHED_RR with
+  the maximum priority of the policy is applied to the pmd thread.
+  e.g. to pin a pmd thread on core 2
 
   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`
 
@@ -234,8 +236,10 @@ needs to be affinitized accordingly.
   responsible for different ports/rxq's. Assignment of ports/rxq's to
   pmd threads is done automatically.
 
-  A set bit in the mask means a pmd thread is created and pinned
-  to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
+  A set bit in the mask means a pmd thread is created and pinned to the
+  corresponding CPU core, and the scheduling policy SCHED_RR with the
+  highest priority of the policy is applied to the pmd thread.
+  e.g. to run pmd threads on core 1 and 2
 
   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
 
@@ -246,6 +250,11 @@ needs to be affinitized accordingly.
 
   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
 
+  Note: 'dpdk-lcore-mask' and 'pmd-cpu-mask' cpu mask settings should
+  be non-overlapping, i.e. OVS control threads and pmd threads can't
+  have the same affinity.  Also, other processes with the same affinity
+  as the pmd threads will be unresponsive.
+
 ### 4.3 DPDK physical port Rx Queues
 
   `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`
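For context before the code changes below: the priority change comes
down to a single pthread_setschedparam() call on the running thread.
Here is a minimal standalone sketch of that same pattern, for trying it
outside of OVS.  It is illustrative only, not part of the patch; the
file name rr_demo.c and build line are hypothetical, and the call needs
root (or CAP_SYS_NICE) to succeed:

    /* Illustrative standalone program, not OVS code: put the calling
     * thread on SCHED_RR at the policy's maximum static priority, the
     * same pattern ovs_numa_thread_setpriority() uses in this patch.
     * Build (hypothetical): gcc -o rr_demo rr_demo.c -lpthread
     * Without root/CAP_SYS_NICE, pthread_setschedparam() returns EPERM. */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        struct sched_param param;
        int err;

        memset(&param, 0, sizeof param);
        /* Highest static priority the policy supports (99 for SCHED_RR
         * on Linux). */
        param.sched_priority = sched_get_priority_max(SCHED_RR);

        err = pthread_setschedparam(pthread_self(), SCHED_RR, &param);
        if (err) {
            fprintf(stderr, "pthread_setschedparam: %s\n", strerror(err));
            return 1;
        }
        printf("running with SCHED_RR, priority %d\n", param.sched_priority);
        return 0;
    }

Note that pthread_setschedparam() returns the error number directly
rather than setting errno, which is why the sketch (like the patch)
passes the return value to the strerror-style helper.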
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index f05ca4e..b85600b 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -2841,6 +2841,15 @@ pmd_thread_main(void *f_)
     ovs_numa_thread_setaffinity_core(pmd->core_id);
     dpdk_set_lcore_id(pmd->core_id);
     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
+
+    /* When the cpu affinity mask is explicitly set using pmd-cpu-mask,
+     * the pmd thread's scheduling policy is set to SCHED_RR and its
+     * priority to the highest priority of the SCHED_RR policy.  In the
+     * absence of pmd-cpu-mask, the default policy and priority apply.
+     */
+    if (pmd->dp->pmd_cmask) {
+        ovs_numa_thread_setpriority(SCHED_RR);
+    }
 reload:
     emc_cache_init(&pmd->flow_cache);
 
diff --git a/lib/ovs-numa.c b/lib/ovs-numa.c
index c8173e0..428f274 100644
--- a/lib/ovs-numa.c
+++ b/lib/ovs-numa.c
@@ -613,3 +613,21 @@ int ovs_numa_thread_setaffinity_core(unsigned core_id OVS_UNUSED)
     return EOPNOTSUPP;
 #endif /* __linux__ */
 }
+
+void
+ovs_numa_thread_setpriority(int policy)
+{
+    if (dummy_numa) {
+        return;
+    }
+
+    struct sched_param threadparam;
+    int err;
+
+    memset(&threadparam, 0, sizeof(threadparam));
+    threadparam.sched_priority = sched_get_priority_max(policy);
+    err = pthread_setschedparam(pthread_self(), policy, &threadparam);
+    if (err) {
+        VLOG_ERR("Thread priority error %s", ovs_strerror(err));
+    }
+}
diff --git a/lib/ovs-numa.h b/lib/ovs-numa.h
index be836b2..94f0884 100644
--- a/lib/ovs-numa.h
+++ b/lib/ovs-numa.h
@@ -56,6 +56,7 @@ void ovs_numa_unpin_core(unsigned core_id);
 struct ovs_numa_dump *ovs_numa_dump_cores_on_numa(int numa_id);
 void ovs_numa_dump_destroy(struct ovs_numa_dump *);
 int ovs_numa_thread_setaffinity_core(unsigned core_id);
+void ovs_numa_thread_setpriority(int policy);
 
 #define FOR_EACH_CORE_ON_NUMA(ITER, DUMP)                    \
     LIST_FOR_EACH((ITER), list_node, &(DUMP)->dump)
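
To confirm that the policy actually took effect on a running pmd
thread, 'chrt -p <tid>' from util-linux reports its scheduling class
and priority.  The same check can be done from inside a thread with
pthread_getschedparam(); a standalone illustrative sketch (not part of
the patch; the file name rr_check.c is hypothetical):

    /* Illustrative standalone sketch, not OVS code: report the calling
     * thread's current scheduling policy and static priority, e.g. to
     * verify that a SCHED_RR request stuck.
     * Build (hypothetical): gcc -o rr_check rr_check.c -lpthread */
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        struct sched_param param;
        int policy;
        int err;

        err = pthread_getschedparam(pthread_self(), &policy, &param);
        if (err) {
            fprintf(stderr, "pthread_getschedparam: %s\n", strerror(err));
            return 1;
        }
        printf("policy=%s priority=%d\n",
               policy == SCHED_RR ? "SCHED_RR"
               : policy == SCHED_FIFO ? "SCHED_FIFO"
               : "SCHED_OTHER",
               param.sched_priority);
        return 0;
    }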