From patchwork Mon Apr 16 14:30:19 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Stephen Finucane <stephen@that.guru>
X-Patchwork-Id: 898654
X-Patchwork-Delegate: ian.stokes@intel.com
From: Stephen Finucane <stephen@that.guru>
To: dev@openvswitch.org
Date: Mon, 16 Apr 2018 15:30:19 +0100
Message-Id: <20180416143026.24561-3-stephen@that.guru>
X-Mailer: git-send-email 2.14.3
In-Reply-To: <20180416143026.24561-1-stephen@that.guru>
References: <20180416143026.24561-1-stephen@that.guru>
Subject: [ovs-dev] [PATCH v2 2/9] doc: Add "PMD" topic document

This continues the breakup of the huge DPDK "howto" into smaller
components. There are a couple of related changes included, such as
using "Rx queue" instead of "rxq" and noting how Tx queues cannot be
configured.

Signed-off-by: Stephen Finucane <stephen@that.guru>
---
v2:
- Add cross-references from 'pmd' doc to 'vhost-user' and 'phy' docs
- Add 'versionchanged' warning about automatic assignment of Rx queues
- Add a 'todo' to describe Tx queue behavior
---
 Documentation/howto/dpdk.rst             |  86 -----------------
 Documentation/topics/dpdk/index.rst      |   1 +
 Documentation/topics/dpdk/phy.rst        |  12 +++
 Documentation/topics/dpdk/pmd.rst        | 156 +++++++++++++++++++++++++++
 Documentation/topics/dpdk/vhost-user.rst |  17 ++--
 5 files changed, 177 insertions(+), 95 deletions(-)
 create mode 100644 Documentation/topics/dpdk/pmd.rst

diff --git a/Documentation/howto/dpdk.rst b/Documentation/howto/dpdk.rst
index 79b626c76..388728363 100644
--- a/Documentation/howto/dpdk.rst
+++ b/Documentation/howto/dpdk.rst
@@ -81,92 +81,6 @@ To stop ovs-vswitchd & delete bridge, run::
 
     $ ovs-appctl -t ovsdb-server exit
     $ ovs-vsctl del-br br0
 
-PMD Thread Statistics
----------------------
-
-To show current stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-show
-
-To clear previous stats::
-
-    $ ovs-appctl dpif-netdev/pmd-stats-clear
-
-Port/RXQ Assigment to PMD Threads
----------------------------------
-
-To show port/rxq assignment::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-To change default rxq assignment to pmd threads, rxqs may be manually pinned to
-desired cores using::
-
-    $ ovs-vsctl set Interface <iface> \
-        other_config:pmd-rxq-affinity=<rxq-affinity-list>
-
-where:
-
-- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
-
-For example::
-
-    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
-        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
-
-This will ensure:
-
-- Queue #0 pinned to core 3
-- Queue #1 pinned to core 7
-- Queue #2 not pinned
-- Queue #3 pinned to core 8
-
-After that PMD threads on cores where RX queues was pinned will become
-``isolated``. This means that this thread will poll only pinned RX queues.
-
-.. warning::
-  If there are no ``non-isolated`` PMD threads, ``non-pinned`` RX queues will
-  not be polled. Also, if provided ``core_id`` is not available (ex. this
-  ``core_id`` not in ``pmd-cpu-mask``), RX queue will not be polled by any PMD
-  thread.
-
-If pmd-rxq-affinity is not set for rxqs, they will be assigned to pmds (cores)
-automatically. The processing cycles that have been stored for each rxq
-will be used where known to assign rxqs to pmd based on a round robin of the
-sorted rxqs.
-
-For example, in the case where here there are 5 rxqs and 3 cores (e.g. 3,7,8)
-available, and the measured usage of core cycles per rxq over the last
-interval is seen to be:
-
-- Queue #0: 30%
-- Queue #1: 80%
-- Queue #3: 60%
-- Queue #4: 70%
-- Queue #5: 10%
-
-The rxqs will be assigned to cores 3,7,8 in the following order:
-
-Core 3: Q1 (80%) |
-Core 7: Q4 (70%) | Q5 (10%)
-core 8: Q3 (60%) | Q0 (30%)
-
-To see the current measured usage history of pmd core cycles for each rxq::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-show
-
-.. note::
-
-  A history of one minute is recorded and shown for each rxq to allow for
-  traffic pattern spikes. An rxq's pmd core cycles usage changes due to traffic
-  pattern or reconfig changes will take one minute before they are fully
-  reflected in the stats.
-
-Rxq to pmds assignment takes place whenever there are configuration changes
-or can be triggered by using::
-
-    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
-
 QoS
 ---
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index 5f836a6e9..dfde88377 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -31,3 +31,4 @@ The DPDK Datapath
    phy
    vhost-user
    ring
+   pmd
diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
index a3f8b475c..ad191dad0 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -113,3 +113,15 @@ tool::
 For more information, refer to the `DPDK documentation <dpdk-drivers_>`__.
 
 .. _dpdk-drivers: http://dpdk.org/doc/guides/linux_gsg/linux_drivers.html
+
+.. _dpdk-phy-multiqueue:
+
+Multiqueue
+----------
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath. Correct configuration of PMD threads and the Rx queues they
+utilize is a prerequisite for delivering the high performance possible with
+DPDK acceleration. It is possible to configure multiple Rx queues for ``dpdk``
+ports, ensuring these do not become a performance bottleneck. For information
+on configuring PMD threads, refer to :doc:`pmd`.
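+
+For example, assuming a port named ``dpdk-p0`` (an illustrative name), four
+Rx queues could be requested as follows::
+
+    $ ovs-vsctl set Interface dpdk-p0 options:n_rxq=4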
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
new file mode 100644
index 000000000..1be25ade0
--- /dev/null
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -0,0 +1,156 @@
+..
+      Licensed under the Apache License, Version 2.0 (the "License"); you may
+      not use this file except in compliance with the License. You may obtain
+      a copy of the License at
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===========
+PMD Threads
+===========
+
+Poll Mode Driver (PMD) threads are the threads that do the heavy lifting for
+the DPDK datapath, performing tasks such as continuous polling of input ports
+for packets, classifying packets once received, and executing actions on
+packets once they are classified.
+
+PMD threads utilize Receive (Rx) and Transmit (Tx) queues, commonly known as
+*rxq*\s and *txq*\s. While Tx queue configuration happens automatically, Rx
+queues can be configured by the user. This can happen in one of two ways:
+
+- For physical interfaces, configuration is done using the
+  :program:`ovs-vsctl` utility.
+
+- For virtual interfaces, configuration is done using the
+  :program:`ovs-vsctl` utility, but this configuration must be reflected in
+  the guest configuration (e.g. QEMU command line arguments).
+
+The :program:`ovs-appctl` utility provides a number of commands for querying
+PMD threads and their respective queues. This, and all of the above, is
+discussed below.
+
+.. todo::
+
+   Add an overview of Tx queues, including how many are created and how they
+   relate to PMD threads.
+
+PMD Thread Statistics
+---------------------
+
+To show current stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-show
+
+To clear previous stats::
+
+    $ ovs-appctl dpif-netdev/pmd-stats-clear
+
+Port/Rx Queue Assignment to PMD Threads
+---------------------------------------
+
+.. todo::
+
+   This needs a more detailed overview of *why* this should be done, along
+   with the impact on things like NUMA affinity.
+
+Correct configuration of PMD threads and the Rx queues they utilize is a
+prerequisite for achieving maximum performance. This is particularly true for
+enabling features like multiqueue for :ref:`physical <dpdk-phy-multiqueue>`
+and :ref:`vhost-user <dpdk-vhost-user-multiqueue>` interfaces.
+
+To show port/Rx queue assignment::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+Rx queues may be manually pinned to cores. This will change the default Rx
+queue assignment to PMD threads::
+
+    $ ovs-vsctl set Interface <iface> \
+        other_config:pmd-rxq-affinity=<rxq-affinity-list>
+
+where:
+
+- ``<rxq-affinity-list>`` is a CSV list of ``<queue-id>:<core-id>`` values
+
+For example::
+
+    $ ovs-vsctl set interface dpdk-p0 options:n_rxq=4 \
+        other_config:pmd-rxq-affinity="0:3,1:7,3:8"
+
+This will ensure there are *4* Rx queues and that these queues are configured
+like so:
+
+- Queue #0 pinned to core 3
+- Queue #1 pinned to core 7
+- Queue #2 not pinned
+- Queue #3 pinned to core 8
+
+PMD threads on cores where Rx queues are *pinned* will become *isolated*,
+meaning that such a thread will only poll the *pinned* Rx queues.
+
+.. warning::
+
+   If there are no *non-isolated* PMD threads, *non-pinned* Rx queues will
+   not be polled. Also, if the provided ``<core-id>`` is not available (e.g.
+   the ``<core-id>`` is not in ``pmd-cpu-mask``), the Rx queue will not be
+   polled by any PMD thread.
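+
+The set of cores available to PMD threads is itself controlled by the
+``pmd-cpu-mask`` value. As an illustration (assuming the host actually has
+these cores), cores 3, 7, and 8 from the example above could be made
+available to PMD threads by setting mask bits 3, 7, and 8 (``0x188``)::
+
+    $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x188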
+If ``pmd-rxq-affinity`` is not set for Rx queues, they will be assigned to
+PMDs (cores) automatically. Where known, the processing cycles that have been
+stored for each Rx queue will be used to assign Rx queues to PMDs based on a
+round robin of the sorted Rx queues. For example, consider a case where there
+are five Rx queues and three cores (3, 7, and 8) available, and where the
+measured usage of core cycles per Rx queue over the last interval is seen to
+be:
+
+- Queue #0: 30%
+- Queue #1: 80%
+- Queue #3: 60%
+- Queue #4: 70%
+- Queue #5: 10%
+
+The Rx queues will be assigned to the cores in the following order::
+
+    Core 3: Q1 (80%) |
+    Core 7: Q4 (70%) | Q5 (10%)
+    Core 8: Q3 (60%) | Q0 (30%)
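+
+As a toy model of this assignment policy (an illustration consistent with the
+example above, not the actual OVS implementation), one can sort the Rx queues
+by measured cycles and greedily hand each queue to the currently least-loaded
+core::
+
+    # Toy model: sort Rx queues by measured cycles (descending), then give
+    # each queue to whichever core currently has the least assigned load.
+    def assign_rxqs(rxq_cycles, cores):
+        load = {core: 0 for core in cores}
+        assignment = {core: [] for core in cores}
+        for rxq, cycles in sorted(rxq_cycles.items(),
+                                  key=lambda item: item[1], reverse=True):
+            core = min(cores, key=load.get)
+            assignment[core].append(rxq)
+            load[core] += cycles
+        return assignment
+
+    # Reproduces the ordering above:
+    # {3: ['Q1'], 7: ['Q4', 'Q5'], 8: ['Q3', 'Q0']}
+    print(assign_rxqs({'Q0': 30, 'Q1': 80, 'Q3': 60, 'Q4': 70, 'Q5': 10},
+                      [3, 7, 8]))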
+To see the current measured usage history of PMD core cycles for each Rx
+queue::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-show
+
+.. note::
+
+   A history of one minute is recorded and shown for each Rx queue to allow
+   for traffic pattern spikes. Any changes in an Rx queue's PMD core cycles
+   usage, due to traffic pattern or reconfiguration changes, will take one
+   minute to be fully reflected in the stats.
+
+Rx queue to PMD assignment takes place whenever there are configuration
+changes, or it can be triggered manually::
+
+    $ ovs-appctl dpif-netdev/pmd-rxq-rebalance
+
+.. versionchanged:: 2.8.0
+
+   Automatic assignment of Rx queues to PMDs and the two related commands,
+   ``pmd-rxq-show`` and ``pmd-rxq-rebalance``, were added in OVS 2.8.0. Prior
+   to this, behavior was round-robin and processing cycles were not taken
+   into consideration. Tracking for stats was not available.
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index ca8a3289f..6f794f296 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -130,11 +130,10 @@ an additional set of parameters::
     -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
     -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
 
-In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user
-ports access a virtio-net device's virtual rings and packet buffers mapping the
-VM's physical memory on hugetlbfs. To enable vhost-user ports to map the VM's
-memory into their process address space, pass the following parameters to
-QEMU::
+In addition, QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
+access a virtio-net device's virtual rings and packet buffers mapping the VM's
+physical memory on hugetlbfs. To enable vhost-user ports to map the VM's memory
+into their process address space, pass the following parameters to QEMU::
 
     -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on
     -numa node,memdev=mem -mem-prealloc
@@ -154,18 +153,18 @@ where:
   The number of vectors, which is ``$q`` * 2 + 2
 
 The vhost-user interface will be automatically reconfigured with required
-number of rx and tx queues after connection of virtio device. Manual
+number of Rx and Tx queues after connection of virtio device. Manual
 configuration of ``n_rxq`` is not supported because OVS will work properly
 only if ``n_rxq`` will match number of queues configured in QEMU.
 
-A least 2 PMDs should be configured for the vswitch when using multiqueue.
+At least two PMDs should be configured for the vswitch when using multiqueue.
 Using a single PMD will cause traffic to be enqueued to the same vhost queue
 rather than being distributed among different vhost queues for a vhost-user
 interface.
 
 If traffic destined for a VM configured with multiqueue arrives to the vswitch
-via a physical DPDK port, then the number of rxqs should also be set to at
-least 2 for that physical DPDK port. This is required to increase the
+via a physical DPDK port, then the number of Rx queues should also be set to
+at least two for that physical DPDK port. This is required to increase the
 probability that a different PMD will handle the multiqueue transmission to
 the guest using a different vhost queue.
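+
+Assuming the physical DPDK port is named ``dpdk-p0`` (an illustrative name),
+this could be done via the port's ``n_rxq`` option::
+
+    $ ovs-vsctl set Interface dpdk-p0 options:n_rxq=2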