From patchwork Wed May 10 15:59:09 2017
X-Patchwork-Submitter: Billy O'Mahony
X-Patchwork-Id: 760707
X-Patchwork-Delegate: dlu998@gmail.com
From: Billy O'Mahony
To: dev@openvswitch.org
Date: Wed, 10 May 2017 16:59:09 +0100
Message-Id: <1494431949-13326-1-git-send-email-billy.o.mahony@intel.com>
Subject: [ovs-dev] [PATCH v6] dpif-netdev: Assign ports to pmds on non-local numa node.

From: billyom

Previously, if there was no available (non-isolated) pmd on the NUMA node
for a port, the port was not polled at all. This could leave the system
non-operational until the NICs were physically repositioned. It is
preferable to poll the port with a pmd on the 'wrong' NUMA node, albeit at
lower performance. Local pmds are still chosen when available.

Signed-off-by: Billy O'Mahony
Acked-by: Ian Stokes
---
v6: Change 'port' to 'queue' in a warning msg
v5: Fix warning msg; update same in docs
v4: Fix a checkpatch error
v3: Fix warning messages not appearing when using multiqueue
v2: Add details of warning messages into docs

 Documentation/intro/install/dpdk.rst | 10 +++++++++
 lib/dpif-netdev.c                    | 43 +++++++++++++++++++++++++++++-----
 2 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index d1c0e65..7a66bff 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -460,6 +460,16 @@ affinitized accordingly.
 pmd thread on a NUMA node is only created if there is at least one DPDK
 interface from that NUMA node added to OVS.
 
+  .. note::
+   On NUMA systems, PCI devices are also local to a NUMA node. Rx queues
+   for a PCI device will be assigned to a pmd on its local NUMA node if
+   pmd-cpu-mask has created a pmd thread on that NUMA node. If not, the
+   queue will be assigned to a pmd on a remote NUMA node, which will reduce
+   maximum throughput on that device. When such a queue assignment is made,
+   a warning message is logged: "There's no available (non-isolated) pmd
+   thread on numa node N. Queue Q on port P will be assigned to the pmd on
+   core C (numa node N'). Expect reduced performance."
+
 - QEMU vCPU thread Affinity
 
 A VM performing simple packet forwarding or running complex packet pipelines
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index b3a0806..34f1963 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3149,10 +3149,13 @@ rr_numa_list_lookup(struct rr_numa_list *rr, int numa_id)
 }
 
 static void
-rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
+rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr,
+                      int *all_numa_ids, unsigned all_numa_ids_sz,
+                      int *num_ids_written)
 {
     struct dp_netdev_pmd_thread *pmd;
     struct rr_numa *numa;
+    unsigned idx = 0;
 
     hmap_init(&rr->numas);
 
@@ -3170,7 +3173,11 @@ rr_numa_list_populate(struct dp_netdev *dp, struct rr_numa_list *rr)
         numa->n_pmds++;
         numa->pmds = xrealloc(numa->pmds, numa->n_pmds * sizeof *numa->pmds);
         numa->pmds[numa->n_pmds - 1] = pmd;
+
+        all_numa_ids[idx % all_numa_ids_sz] = pmd->numa_id;
+        idx++;
     }
+    *num_ids_written = idx;
 }
 
 static struct dp_netdev_pmd_thread *
@@ -3202,8 +3209,15 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
 {
     struct dp_netdev_port *port;
     struct rr_numa_list rr;
+    int all_numa_ids[64];
+    int all_numa_ids_sz = sizeof all_numa_ids / sizeof all_numa_ids[0];
+    unsigned all_numa_ids_idx = 0;
+    int all_numa_ids_max_idx = 0;
+    int num_numa_ids = 0;
 
-    rr_numa_list_populate(dp, &rr);
+    rr_numa_list_populate(dp, &rr, all_numa_ids, all_numa_ids_sz,
+                          &num_numa_ids);
+    all_numa_ids_max_idx = MIN(num_numa_ids - 1, all_numa_ids_sz - 1);
 
     HMAP_FOR_EACH (port, node, &dp->ports) {
         struct rr_numa *numa;
@@ -3234,10 +3248,29 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
             }
         } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
             if (!numa) {
-                VLOG_WARN("There's no available (non isolated) pmd thread "
+                if (all_numa_ids_max_idx < 0) {
+                    VLOG_ERR("There is no available (non-isolated) pmd "
+                             "thread for port \'%s\' queue %d. This queue "
+                             "will not be polled. Is pmd-cpu-mask set to "
+                             "zero? Or are all PMDs isolated to other "
+                             "queues?", netdev_get_name(port->netdev),
+                             qid);
+                    continue;
+                }
+                int alt_numa_id = all_numa_ids[all_numa_ids_idx];
+                struct rr_numa *alt_numa;
+                alt_numa = rr_numa_list_lookup(&rr, alt_numa_id);
+                q->pmd = rr_numa_get_pmd(alt_numa);
+                VLOG_WARN("There's no available (non-isolated) pmd thread "
                           "on numa node %d. Queue %d on port \'%s\' will "
-                          "not be polled.",
-                          numa_id, qid, netdev_get_name(port->netdev));
+                          "be assigned to the pmd on core %d "
+                          "(numa node %d). Expect reduced performance.",
+                          numa_id, qid, netdev_get_name(port->netdev),
+                          q->pmd->core_id, q->pmd->numa_id);
+                all_numa_ids_idx++;
+                if (all_numa_ids_idx > all_numa_ids_max_idx) {
+                    all_numa_ids_idx = 0;
+                }
             } else {
                 q->pmd = rr_numa_get_pmd(numa);
             }
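
For readers following the scheduling change, here is a minimal, self-contained
C sketch of the fallback idea outside of OVS. It is not the dpif-netdev API:
struct pmd, pick_pmd() and the hard-coded core layout are hypothetical, and
for brevity it rotates directly over pmds rather than over NUMA node ids via
all_numa_ids[] as the patch does. A queue whose local node has a pmd is served
locally and silently; otherwise it is handed round-robin to a remote pmd with
a warning.

    /* Minimal sketch (not OVS code) of the cross-NUMA fallback. */
    #include <stdio.h>

    struct pmd {
        int core_id;
        int numa_id;
    };

    /* Hypothetical layout: NUMA node 1 has two pmds, node 0 has none. */
    static const struct pmd pmds[] = {
        { .core_id = 8, .numa_id = 1 },
        { .core_id = 9, .numa_id = 1 },
    };
    #define N_PMDS (sizeof pmds / sizeof pmds[0])

    /* Pick a pmd for an rx queue local to 'queue_numa'.  Prefer a local
     * pmd; otherwise fall back round-robin across the remaining pmds. */
    static const struct pmd *
    pick_pmd(int queue_numa, unsigned *rr_pos)
    {
        for (size_t i = 0; i < N_PMDS; i++) {
            if (pmds[i].numa_id == queue_numa) {
                return &pmds[i];            /* Local pmds still win. */
            }
        }
        const struct pmd *p = &pmds[(*rr_pos)++ % N_PMDS];
        printf("warning: no pmd on numa node %d; queue goes to core %d "
               "(numa node %d), expect reduced performance\n",
               queue_numa, p->core_id, p->numa_id);
        return p;
    }

    int
    main(void)
    {
        unsigned rr_pos = 0;

        /* Two queues local to node 0, which has no pmd: both fall back
         * and land on different remote pmds thanks to the rotation. */
        pick_pmd(0, &rr_pos);
        pick_pmd(0, &rr_pos);
        /* A queue on node 1 is assigned locally, with no warning. */
        pick_pmd(1, &rr_pos);
        return 0;
    }

Compiled standalone, the first two calls print the reduced-performance
warning (once per queue, on alternating cores) while the third assigns
locally, mirroring the before/after behaviour the patch describes.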
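
To observe the new behaviour on a running switch, the fallback can be
provoked with the pmd-cpu-mask setting documented above. The core-to-node
numbering below is an assumption; check your machine's topology with lscpu
first:

    # Suppose cores 0-7 sit on NUMA node 0 and the DPDK NIC is local to
    # node 1.  Restricting pmds to core 1 leaves node 1 without a pmd:
    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x2

With that mask the NIC's queues are still polled, and ovs-vswitchd's log
should carry the warning quoted in the documentation note ("There's no
available (non-isolated) pmd thread on numa node N. Queue Q on port P will
be assigned to the pmd on core C (numa node N'). Expect reduced
performance.") rather than the old "will not be polled" message.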