From patchwork Wed Aug 9 15:45:28 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kevin Traynor
X-Patchwork-Id: 799850
X-Patchwork-Delegate: dlu998@gmail.com
From: Kevin Traynor
To: dev@openvswitch.org, ian.stokes@intel.com, jan.scheurich@ericsson.com,
    bhanuprakash.bodireddy@intel.com, mark.b.kavanagh@intel.com,
    gvrose8192@gmail.com
Date: Wed, 9 Aug 2017 16:45:28 +0100
Message-Id: <1502293530-10783-5-git-send-email-ktraynor@redhat.com>
In-Reply-To: <1502293530-10783-1-git-send-email-ktraynor@redhat.com>
References: <1501603092-6287-1-git-send-email-ktraynor@redhat.com>
 <1502293530-10783-1-git-send-email-ktraynor@redhat.com>
Subject: [ovs-dev] [PATCH v4 4/6] dpif-netdev: Change rxq_scheduling to use
 rxq processing cycles.

Previously, rxqs were assigned to pmds by round robin in port/queue order.
Now that the processing cycles used by existing rxqs are known, use that
information to try to produce a better-balanced distribution of rxqs across
pmds, i.e. given multiple pmds, the rxqs which have consumed the largest
amount of processing cycles will be placed on different pmds.

The rxqs are sorted by their processing cycles and then assigned (in sorted
order) round robin across pmds.
Signed-off-by: Kevin Traynor
Tested-by: Greg Rose
Reviewed-by: Greg Rose
---
 Documentation/howto/dpdk.rst |   7 +++
 lib/dpif-netdev.c            | 105 ++++++++++++++++++++++++++++++-------------
 2 files changed, 81 insertions(+), 31 deletions(-)

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index d7f6610..44737e4 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -119,4 +119,11 @@ After that PMD threads on cores where RX queues was pinned will become
 thread.
 
+If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
+automatically. The processing cycles that have been required for each rxq
+will be used where known to assign rxqs with the highest consumption of
+processing cycles to different pmds.
+
+Rxq to pmds assignment takes place whenever there are configuration changes.
+
 QoS
 ---

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index e344063..b4663ab 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3328,8 +3328,29 @@ rr_numa_list_destroy(struct rr_numa_list *rr)
 }
 
+/* Sort Rx Queues by the processing cycles they are consuming. */
+static int
+rxq_cycle_sort(const void *a, const void *b)
+{
+    struct dp_netdev_rxq * qa;
+    struct dp_netdev_rxq * qb;
+
+    qa = *(struct dp_netdev_rxq **) a;
+    qb = *(struct dp_netdev_rxq **) b;
+
+    if (dp_netdev_rxq_get_cycles(qa, RXQ_CYCLES_PROC_LAST) >=
+            dp_netdev_rxq_get_cycles(qb, RXQ_CYCLES_PROC_LAST)) {
+        return -1;
+    }
+
+    return 1;
+}
+
 /* Assign pmds to queues.  If 'pinned' is true, assign pmds to pinned
  * queues and marks the pmds as isolated.  Otherwise, assign non isolated
  * pmds to unpinned queues.
  *
+ * If 'pinned' is false queues will be sorted by processing cycles they are
+ * consuming and then assigned to pmds in round robin order.
+ *
  * The function doesn't touch the pmd threads, it just stores the assignment
  * in the 'pmd' member of each rxq.
 */
@@ -3340,18 +3361,14 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
     struct rr_numa_list rr;
     struct rr_numa *non_local_numa = NULL;
-
-    rr_numa_list_populate(dp, &rr);
+    struct dp_netdev_rxq ** rxqs = NULL;
+    int i, n_rxqs = 0;
+    struct rr_numa *numa = NULL;
+    int numa_id;
 
     HMAP_FOR_EACH (port, node, &dp->ports) {
-        struct rr_numa *numa;
-        int numa_id;
-
         if (!netdev_is_pmd(port->netdev)) {
             continue;
         }
 
-        numa_id = netdev_get_numa_id(port->netdev);
-        numa = rr_numa_list_lookup(&rr, numa_id);
-
         for (int qid = 0; qid < port->n_rxq; qid++) {
             struct dp_netdev_rxq *q = &port->rxqs[qid];
 
@@ -3371,34 +3388,60 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex)
                 }
             } else if (!pinned && q->core_id == OVS_CORE_UNSPEC) {
-                if (!numa) {
-                    /* There are no pmds on the queue's local NUMA node.
-                       Round-robin on the NUMA nodes that do have pmds. */
-                    non_local_numa = rr_numa_list_next(&rr, non_local_numa);
-                    if (!non_local_numa) {
-                        VLOG_ERR("There is no available (non-isolated) pmd "
-                                 "thread for port \'%s\' queue %d. This queue "
-                                 "will not be polled. Is pmd-cpu-mask set to "
-                                 "zero? Or are all PMDs isolated to other "
-                                 "queues?", netdev_get_name(port->netdev),
-                                 qid);
-                        continue;
-                    }
-                    q->pmd = rr_numa_get_pmd(non_local_numa);
-                    VLOG_WARN("There's no available (non-isolated) pmd thread "
-                              "on numa node %d. Queue %d on port \'%s\' will "
-                              "be assigned to the pmd on core %d "
-                              "(numa node %d). Expect reduced performance.",
-                              numa_id, qid, netdev_get_name(port->netdev),
-                              q->pmd->core_id, q->pmd->numa_id);
+                if (n_rxqs == 0) {
+                    rxqs = xmalloc(sizeof *rxqs);
                 } else {
-                    /* Assign queue to the next (round-robin) PMD on it's local
-                       NUMA node. */
-                    q->pmd = rr_numa_get_pmd(numa);
+                    rxqs = xrealloc(rxqs, sizeof *rxqs * (n_rxqs + 1));
                 }
+                /* Store the queue. */
+                rxqs[n_rxqs++] = q;
             }
         }
     }
 
+    if (n_rxqs > 1) {
+        /* Sort the queues in order of the processing cycles
+         * they consumed during their last pmd interval.
+         */
+        qsort(rxqs, n_rxqs, sizeof *rxqs, rxq_cycle_sort);
+    }
+
+    rr_numa_list_populate(dp, &rr);
+    /* Assign the sorted queues to pmds in round robin. */
+    for (i = 0; i < n_rxqs; i++) {
+        numa_id = netdev_get_numa_id(rxqs[i]->port->netdev);
+        numa = rr_numa_list_lookup(&rr, numa_id);
+        if (!numa) {
+            /* There are no pmds on the queue's local NUMA node.
+               Round-robin on the NUMA nodes that do have pmds. */
+            non_local_numa = rr_numa_list_next(&rr, non_local_numa);
+            if (!non_local_numa) {
+                VLOG_ERR("There is no available (non-isolated) pmd "
+                         "thread for port \'%s\' queue %d. This queue "
+                         "will not be polled. Is pmd-cpu-mask set to "
+                         "zero? Or are all PMDs isolated to other "
+                         "queues?", netdev_rxq_get_name(rxqs[i]->rx),
+                         netdev_rxq_get_queue_id(rxqs[i]->rx));
+                continue;
+            }
+            rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa);
+            VLOG_WARN("There's no available (non-isolated) pmd thread "
+                      "on numa node %d. Queue %d on port \'%s\' will "
+                      "be assigned to the pmd on core %d "
+                      "(numa node %d). Expect reduced performance.",
+                      numa_id, netdev_rxq_get_queue_id(rxqs[i]->rx),
+                      netdev_rxq_get_name(rxqs[i]->rx),
+                      rxqs[i]->pmd->core_id, rxqs[i]->pmd->numa_id);
+        } else {
+            rxqs[i]->pmd = rr_numa_get_pmd(numa);
+            VLOG_INFO("Core %d on numa node %d assigned port \'%s\' "
+                      "rx queue %d (measured processing cycles %"PRIu64").",
+                      rxqs[i]->pmd->core_id, numa_id,
+                      netdev_rxq_get_name(rxqs[i]->rx),
+                      netdev_rxq_get_queue_id(rxqs[i]->rx),
+                      dp_netdev_rxq_get_cycles(rxqs[i], RXQ_CYCLES_PROC_LAST));
+        }
+    }
+
+    rr_numa_list_destroy(&rr);
+    free(rxqs);
 }