From patchwork Wed Mar 16 16:18:23 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Bodireddy, Bhanuprakash"
X-Patchwork-Id: 598515
From: Bhanuprakash Bodireddy
To: dev@openvswitch.org
Date: Wed, 16 Mar 2016 16:18:23 +0000
Message-Id: <1458145103-20708-3-git-send-email-bhanuprakash.bodireddy@intel.com>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <1458145103-20708-1-git-send-email-bhanuprakash.bodireddy@intel.com>
References: <1458145103-20708-1-git-send-email-bhanuprakash.bodireddy@intel.com>
Subject: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add ADVANCED doc
Errors-To: dev-bounces@openvswitch.org
Sender: "dev"

Add an INSTALL.DPDK-ADVANCED document forked off from the original
INSTALL.DPDK guide. This document is targeted at users looking for
optimum performance with OVS using the DPDK datapath.

Signed-off-by: Bhanuprakash Bodireddy
---
 INSTALL.DPDK-ADVANCED.md | 650 ++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 650 insertions(+), 0 deletions(-)
 create mode 100644 INSTALL.DPDK-ADVANCED.md

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
new file mode 100644
index 0000000..ef01224
--- /dev/null
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -0,0 +1,650 @@
+OVS DPDK ADVANCED INSTALL GUIDE
+=================================
+
+## Contents
+
+1. [Overview](#overview)
+2. [Building Shared Library](#build)
+3. [System configuration](#sysconf)
+4. [Performance Tuning](#perftune)
+5. [OVS Testcases](#ovstc)
+6. [Vhost Walkthrough](#vhost)
+7. [QOS](#qos)
+8. [Static Code Analysis](#staticanalyzer)
+
+## 1. Overview
+
+The Advanced Install Guide explains how to improve OVS performance when using
+the DPDK datapath. It also provides information on tuning, system configuration,
+troubleshooting, static code analysis and testcases.
+
+## 2. Building Shared Library
+
+DPDK can be built as a static or a shared library and is linked by applications
+that use the DPDK datapath. This section lists the steps to build DPDK as a
+shared library and link it dynamically against OVS.
+
+Note: A minor performance loss is seen with OVS when using the shared DPDK
+library as compared to the static library.
+
+See sections 2.2 and 2.3 of INSTALL.DPDK for DPDK and OVS download instructions.
+
+  * Configure the DPDK library
+
+    Set `CONFIG_RTE_BUILD_SHARED_LIB=y, CONFIG_RTE_BUILD_COMBINE_LIBS=y` in
+    `config/common_linuxapp` to generate a shared DPDK library.
+
+  * Build and install DPDK
+
+    - For the default install (without IVSHMEM), set `export DPDK_TARGET=x86_64-native-linuxapp-gcc`
+    - For the IVSHMEM case, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`
+
+    ```
+    export DPDK_DIR=/usr/src/dpdk-2.2.0
+    export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
+    make install T=$DPDK_TARGET DESTDIR=$DPDK_BUILD/install
+    ```
+
+  * Build, install and set up OVS
+
+    Export the DPDK shared library location and set up OVS as listed in
+    section 3.3 of INSTALL.DPDK.
+
+    `export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib`
+
+## 3. System Configuration
+
+Achieving optimal OVS performance requires system-level configuration, including
+BIOS tweaks, GRUB cmdline additions, an understanding of the NUMA topology and
+careful selection of PCIe slots for NIC placement.
+
+### 3.1 Recommended BIOS settings
+
+  ```
+  | Settings                  | Values      | Comments
+  |---------------------------|-------------|----------
+  | C3 power state            | Disabled    | -
+  | C6 power state            | Disabled    | -
+  | MLC Streamer              | Enabled     | -
+  | MLC Spatial Prefetcher    | Enabled     | -
+  | DCU Data Prefetcher       | Enabled     | -
+  | DCA                       | Enabled     | -
+  | CPU power and performance | Performance | -
+  | Memory RAS and perf       |             | -
+  |   config-> NUMA optimized | Enabled     | -
+  ```
+
+### 3.2 PCIe Slot Selection
+
+Fastpath performance also depends on factors such as NIC placement, the channel
+speed between the PCIe slot and the CPU, and the proximity of the PCIe slot to
+the CPU cores running the DPDK application.
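+
+As a quick cross-check (a sketch only; the PCI address `0000:05:00.0` below is
+just an example), the NUMA node a NIC is attached to can be read from sysfs and
+the CPU cores local to each node listed with `lscpu`:
+
+  ```
+  # NUMA node the NIC's PCIe slot is attached to (-1 means the platform did not report it)
+  cat /sys/bus/pci/devices/0000:05:00.0/numa_node
+
+  # CPU cores belonging to each NUMA node
+  lscpu | grep "NUMA node"
+  ```
+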
+Listed below are the steps to identify the right PCIe slot:
+
+- Retrieve the host details using the command `dmidecode -t baseboard | grep "Product Name"`
+- Download the technical specification for the product listed, e.g. S2600WT2.
+- Check the Product Architecture Overview for the riser slot placement,
+  CPU sharing info and the PCIe channel speeds.
+
+  Example: on S2600WT, CPU1 and CPU2 share Riser Slot 1, with a channel speed of
+  32GB/s between CPU1 and Riser Slot1 and 16GB/s between CPU2 and Riser Slot1.
+  Running the DPDK app on CPU1 cores with the NIC inserted into the Riser card
+  slots will optimize OVS performance in this case.
+
+- Check the Riser Card #1 - Root Port mapping information for the available slots
+  and individual bus speeds. On S2600WT, slots 1 and 2 have high bus speeds and
+  are potential slots for NIC placement.
+
+### 3.3 Setup Hugepages
+
+  1. Allocate huge pages
+
+     For persistent allocation of huge pages, add the following options to the
+     kernel bootline:
+
+     - 2MB huge pages:
+
+       Add `hugepages=N`
+
+     - 1G huge pages:
+
+       Add `default_hugepagesz=1GB hugepagesz=1G hugepages=N`
+
+     For platforms supporting multiple huge page sizes, add the options
+
+     `default_hugepagesz=<size> hugepagesz=<size> hugepages=N`
+     where 'N' = number of huge pages requested, 'size' = huge page size with an
+     optional suffix [kKmMgG]
+
+     For run-time allocation of huge pages:
+
+     - 2MB huge pages:
+
+       `echo N > /proc/sys/vm/nr_hugepages`
+
+     - 1G huge pages:
+
+       `echo N > /sys/devices/system/node/nodeX/hugepages/hugepages-1048576kB/nr_hugepages`
+       where 'N' = number of huge pages requested, 'X' = NUMA node
+
+     Note: For run-time allocation of 1G huge pages, the Contiguous Memory
+     Allocator (CONFIG_CMA) has to be supported by the kernel; check with your
+     Linux distro.
+
+  2. Mount huge pages
+
+     - 2MB huge pages:
+
+       `mount -t hugetlbfs none /dev/hugepages`
+
+     - 1G huge pages:
+
+       `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`
+
+### 3.4 Enable Hyperthreading
+
+  Requires BIOS changes.
+
+  With HT/SMT enabled, a physical core appears as two logical cores. SMT can be
+  utilized to spawn worker threads on the logical cores of the same physical
+  core, thereby saving additional cores.
+
+  With DPDK, when pinning pmd threads to logical cores, care must be taken to
+  set the correct bits in the pmd-cpu-mask to ensure that the pmd threads are
+  pinned to SMT siblings.
+
+  Example system configuration:
+  dual-socket machine, 2x 10-core processors, HT enabled, 40 logical cores
+
+  To use two logical cores which share the same physical core for pmd threads,
+  the following command can be used to identify a pair of logical cores:
+
+  `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`, where N is
+  the logical core number.
+
+  In this example, it would show that cores 1 and 21 share the same physical
+  core. The pmd-cpu-mask to enable two pmd threads running on these two logical
+  cores (one physical core) is:
+
+  `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=100002`
+
+### 3.5 Isolate cores
+
+  The 'isolcpus' option can be used to isolate cores from the Linux scheduler.
+  The isolated cores can then be dedicated to running HPC applications/threads.
+  This improves application performance due to zero context switching and
+  minimal cache thrashing. To run platform logic on core 0 and isolate cores
+  1 to 19 from the scheduler, add `isolcpus=1-19` to the GRUB cmdline, as shown
+  in the sketch below.
+
+  Note: In some circumstances core isolation has been seen to bring only minimal
+  benefit, owing to the maturity of the Linux scheduler.
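+
+  As an illustration (a sketch only; the file paths, hugepage count and core
+  range are assumptions for a typical Fedora or Ubuntu system), the hugepage and
+  isolcpus options from sections 3.3 and 3.5 are usually applied by editing the
+  GRUB defaults and regenerating the GRUB configuration:
+
+  ```
+  # /etc/default/grub -- example kernel cmdline additions
+  GRUB_CMDLINE_LINUX_DEFAULT="default_hugepagesz=1G hugepagesz=1G hugepages=16 isolcpus=1-19"
+
+  # Regenerate the GRUB config, then reboot for the changes to take effect
+  grub2-mkconfig -o /boot/grub2/grub.cfg    # Fedora/RHEL
+  update-grub                               # Debian/Ubuntu
+  ```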
+
+### 3.6 NUMA/Cluster on Die
+
+  Ideally inter-NUMA datapaths should be avoided where possible, as packets will
+  go across QPI and there may be a slight performance penalty when compared with
+  intra-NUMA datapaths. On the Intel Xeon Processor E5 v3, Cluster On Die is
+  introduced on models that have 10 cores or more. This makes it possible to
+  logically split a socket into two NUMA regions, and again it is preferred
+  where possible to keep critical datapaths within one cluster.
+
+  It is good practice to ensure that threads that are in the datapath are pinned
+  to cores in the same NUMA area, e.g. pmd threads and the QEMU vCPUs
+  responsible for forwarding.
+
+### 3.7 Compiler Optimizations
+
+  The default compiler optimization level is '-O2'. Changing this to more
+  aggressive compiler optimizations such as '-O3' or '-Ofast -march=native'
+  with gcc (verified on 5.3.1) can produce performance gains, though not
+  significant ones. '-march=native' produces code optimized for the local
+  machine and should only be used when the software is compiled on the testbed
+  itself.
+
+## 4. Performance Tuning
+
+### 4.1 Affinity
+
+For superior performance, DPDK pmd threads and QEMU vCPU threads need to be
+affinitized accordingly.
+
+  * PMD thread Affinity
+
+    A poll mode driver (pmd) thread handles the I/O of all DPDK interfaces
+    assigned to it. A pmd thread polls the ports for incoming packets, switches
+    the packets and sends them to the tx port. pmd threads are CPU bound and
+    need to be affinitized to isolated cores for optimum performance.
+
+    Setting a bit in the mask creates a pmd thread pinned to the corresponding
+    CPU core, e.g. to run a pmd thread on core 2:
+
+    `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4`
+
+    Note: A pmd thread on a NUMA node is only created if at least one DPDK
+    interface from that NUMA node has been added to OVS.
+
+  * QEMU vCPU thread Affinity
+
+    A VM performing simple packet forwarding or running complex packet
+    pipelines has to ensure that the vCPU threads performing the work have as
+    much CPU occupancy as possible.
+
+    Example: on a multicore VM, multiple QEMU vCPU threads will be spawned.
+    When the DPDK 'testpmd' application that does the packet forwarding is
+    invoked, the 'taskset' command should be used to affinitize the vCPU
+    threads to the dedicated isolated cores on the host system.
+
+### 4.2 Multiple poll mode driver threads
+
+  With pmd multi-threading support, OVS creates one pmd thread for each NUMA
+  node by default. However, in cases where there are multiple ports/rxq's
+  producing traffic, performance can be improved by creating multiple pmd
+  threads running on separate cores. These pmd threads can then share the
+  workload by each being responsible for different ports/rxq's. Assignment of
+  ports/rxq's to pmd threads is done automatically.
+
+  A set bit in the mask means a pmd thread is created and pinned to the
+  corresponding CPU core, e.g. to run pmd threads on cores 1 and 2:
+
+  `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
+
+  For example, when using dpdk and dpdkvhostuser ports in a bi-directional VM
+  loopback as shown below, spreading the workload over 2 or 4 pmd threads shows
+  significant improvements, as there will be more total CPU occupancy available.
+
+  NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port1
+
+### 4.3 DPDK port Rx Queues
+
+  `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`
+
+  The command above sets the number of rx queues for the specified DPDK
+  interface.
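+
+  For instance (a sketch; the port name dpdk0 and the queue count are
+  assumptions), a DPDK port can be given four rx queues together with a
+  four-core pmd-cpu-mask, so that each queue can be polled by its own pmd
+  thread:
+
+  ```
+  # 4 rx queues on port dpdk0
+  ovs-vsctl set Interface dpdk0 options:n_rxq=4
+  # 4 pmd threads on cores 2-5 (mask 0x3C); rxq-to-pmd assignment is automatic
+  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=3C
+  ```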
+
+  The rx queues are assigned to pmd threads on the same NUMA node in a
+  round-robin fashion. For more information, please refer to the Open_vSwitch
+  TABLE section in
+
+  `man ovs-vswitchd.conf.db`
+
+### 4.4 Exact Match Cache
+
+  Each pmd thread contains one EMC. After initial flow setup in the datapath,
+  the EMC contains a single table and provides the lowest level (fastest)
+  switching for DPDK ports. If there is a miss in the EMC, the next level where
+  switching occurs is the datapath classifier. Missing in the EMC and looking
+  up in the datapath classifier incurs a significant performance penalty. If
+  lookup misses occur in the EMC because it is too small to handle the number
+  of flows, its size can be increased. The EMC size can be modified by editing
+  the define EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c.
+
+  As mentioned above, an EMC is per pmd thread. So an alternative way of
+  increasing the aggregate number of possible flow entries in the EMC, and of
+  avoiding datapath classifier lookups, is to have multiple pmd threads
+  running. This can be done as described in section 4.2.
+
+### 4.5 Rx Mergeable buffers
+
+  Rx mergeable buffers is a virtio feature that allows chaining of multiple
+  virtio descriptors to handle large packet sizes. As such, large packets are
+  handled by reserving and chaining multiple free descriptors together.
+  Mergeable buffer support is negotiated between the virtio driver and virtio
+  device and is supported by the DPDK vhost library. This behavior is typically
+  supported and enabled by default; however, in the case where the user knows
+  that rx mergeable buffers are not needed, i.e. jumbo frames are not needed,
+  it can be forced off by adding mrg_rxbuf=off to the QEMU command line
+  options. By not reserving multiple chains of descriptors, more individual
+  virtio descriptors are available for rx to the guest using dpdkvhost ports,
+  and this can improve performance.
+
+## 5. OVS Testcases
+
+### 5.1 PHY-VM-PHY [VHOST LOOPBACK]
+
+Section 5.2 of the INSTALL.DPDK guide lists the steps for the PVP loopback
+testcase and for packet forwarding using the DPDK testpmd application in the
+guest VM. For users wanting to do packet forwarding using the kernel stack,
+the steps are listed below.
+
+  ```
+  ifconfig eth1 1.1.1.2/24
+  ifconfig eth2 1.1.2.2/24
+  systemctl stop firewalld.service
+  systemctl stop iptables.service
+  sysctl -w net.ipv4.ip_forward=1
+  sysctl -w net.ipv4.conf.all.rp_filter=0
+  sysctl -w net.ipv4.conf.eth0.rp_filter=0
+  sysctl -w net.ipv4.conf.eth1.rp_filter=0
+  route add -net 1.1.2.0/24 eth2
+  route add -net 1.1.1.0/24 eth1
+  arp -s 1.1.2.99 DE:AD:BE:EF:CA:FE
+  arp -s 1.1.1.99 DE:AD:BE:EF:CA:EE
+  ```
+
+### 5.2 PHY-VM-PHY [IVSHMEM]
+
+IVSHMEM works only with 1GB huge pages.
+
+  Steps 1-5 in section 3.3 of the INSTALL.DPDK guide will create and initialize
+  the DB, start vswitchd and add DPDK devices to bridge br0.
+
+  1. Add a DPDK ring port to the bridge
+
+     ```
+     ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr
+     ```
+
+  2. Copy the runtime configuration to the VM. To achieve this, copy the files
+     to a temporary directory, say /tmp/rte_config, and export the directory
+     to the VM
+
+     ```
+     mkdir /tmp/rte_config
+     chmod 644 /tmp/rte_config
+     cp -a /run/.rte_config /run/.rte_hugepage_info /tmp/rte_config
+     ```
+
+  3. Build the modified QEMU
+
+     ```
+     cd /usr/src/
+     wget https://github.com/01org/dpdk-ovs/archive/development.zip
+     unzip development.zip
+     cd dpdk-ovs-development/qemu
+     ./configure --target-list=x86_64-softmmu --enable-debug --extra-cflags='-g'
+     make -j 4
+     ```
+
+  4. Start the guest VM
+
+     ```
+     export VM_NAME=ivshmem-vm
+     export QCOW2_IMAGE=CentOS7_x86_64.qcow2
+     export QEMU_BIN=/usr/src/dpdk-ovs-development/qemu/x86_64-softmmu/qemu-system-x86_64
+
+     taskset 0x20 $QEMU_BIN -cpu host -smp 2,cores=2 -hda $QCOW2_IMAGE -drive file=fat:rw:/tmp/rte_config,snapshot=off -m 4096M --enable-kvm -name $VM_NAME -nographic -vnc :2 -pidfile /tmp/vm1.pid -mem-path /dev/hugepages -mem-prealloc -device ivshmem,size=1024M,shm=fd:/dev/hugepages/rtemap_0:0x0:0x40000000
+     ```
+
+  5. Run the sample "dpdk ring" app in the VM
+
+     ```
+     umount /dev/hugepages
+     mount -t hugetlbfs hugetlbfs /mnt/hugepages
+     ln -s /sys/devices/pci0000:00/0000:00:04.0/resource2 /dev/hugepages/rtemap_0
+     mount -o iocharset=utf8 /dev/sdb1 /mnt/ovs
+     cp /mnt/ovs/.rte_config /run/.
+     cp /mnt/ovs/.rte_hugepage_info /run/.
+
+     # Build the DPDK ring application in the VM
+     export RTE_SDK=/root/dpdk2.2.0
+     export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
+     make
+
+     # Run the dpdkring application; "-n 0" refers to ring '0', i.e. dpdkr0
+     ./build/dpdkr -c 1 -n 4 --proc-type=secondary -- -n 0
+     ```
+
+## 6. Vhost Walkthrough
+
+DPDK 2.2 supports two types of vhost:
+
+1. vhost-user - enabled by default
+2. vhost-cuse - legacy, disabled by default
+
+### 6.1 vhost-user
+
+  - Prerequisites:
+
+    QEMU version >= 2.2
+
+  - Adding vhost-user ports to the switch
+
+    Unlike DPDK ring ports, DPDK vhost-user ports can have arbitrary names,
+    except that forward and backward slashes are prohibited in the names.
+
+    For vhost-user, the name of the port type is `dpdkvhostuser`
+
+    ```
+    ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
+    type=dpdkvhostuser
+    ```
+
+    This action creates a socket located at
+    `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
+    to your VM on the QEMU command line. More instructions on this can be
+    found in the next section, "Adding vhost-user ports to the VM".
+
+    Note: If you wish for the vhost-user sockets to be created in a
+    directory other than `/usr/local/var/run/openvswitch`, you may specify
+    another location on the ovs-vswitchd command line like so:
+
+    `./vswitchd/ovs-vswitchd --dpdk -vhost_sock_dir /my-dir -c 0x1 ...`
+
+  - Adding vhost-user ports to the VM
+
+    1. Configure sockets
+
+       Pass the following parameters to QEMU to attach a vhost-user device:
+
+       ```
+       -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1
+       -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+       -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+       ```
+
+       where vhost-user-1 is the name of the vhost-user port added to the
+       switch.
+
+       Repeat the above parameters for multiple devices, changing the chardev
+       path and id as necessary. Note that a separate and different chardev
+       path needs to be specified for each vhost-user device. For example, if
+       you have a second vhost-user port named 'vhost-user-2', you append your
+       QEMU command line with an additional set of parameters:
+
+       ```
+       -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
+       -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
+       -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
+       ```
+
+    2. Configure huge pages
+
+       QEMU must allocate the VM's memory on hugetlbfs.
+       vhost-user ports access a virtio-net device's virtual rings and packet
+       buffers by mapping the VM's physical memory on hugetlbfs. To enable
+       vhost-user ports to map the VM's memory into their process address
+       space, pass the following parameters to QEMU:
+
+       ```
+       -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
+       share=on -numa node,memdev=mem -mem-prealloc
+       ```
+
+    3. Enable multiqueue support (OPTIONAL)
+
+       The vhost-user interface must be configured in Open vSwitch with the
+       desired number of queues with:
+
+       ```
+       ovs-vsctl set Interface vhost-user-2 options:n_rxq=<requested queues>
+       ```
+
+       QEMU needs to be configured as well. The $q below should match the
+       number of queues requested in OVS (if $q is more, packets will not be
+       received). The $v is the number of vectors, which is '$q x 2 + 2'.
+
+       ```
+       -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2
+       -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q
+       -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
+       ```
+
+       If one wishes to use multiple queues for an interface in the guest, the
+       driver in the guest operating system must be configured to do so. It is
+       recommended that the number of queues configured be equal to '$q'.
+
+       For example, this can be done for the Linux kernel virtio-net driver
+       with:
+
+       ```
+       ethtool -L <DEV> combined <$q>
+       ```
+
+       where `-L` changes the number of channels of the specified network
+       device and `combined` changes the number of multi-purpose channels.
+
+### 6.2 vhost-cuse
+
+  - Prerequisites:
+
+    QEMU version >= 2.2
+
+  - Enable vhost-cuse support
+
+    1. Enable vhost-cuse support in DPDK
+
+       Set `CONFIG_RTE_LIBRTE_VHOST_USER=n` in config/common_linuxapp and
+       follow the steps in section 2.2 of the INSTALL.DPDK guide to build DPDK
+       with cuse support. OVS will detect that DPDK has the vhost-cuse
+       libraries compiled and in turn will enable support for it in the switch
+       and disable vhost-user support.
+
+    2. Insert the cuse module
+
+       `modprobe cuse`
+
+    3. Build and insert the `eventfd_link` module
+
+       ```
+       cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
+       make
+       insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
+       ```
+
+  - Adding vhost-cuse ports to the switch
+
+    Unlike DPDK ring ports, DPDK vhost-cuse ports can have arbitrary names.
+    For vhost-cuse, the name of the port type is `dpdkvhostcuse`
+
+    ```
+    ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
+    type=dpdkvhostcuse
+    ```
+
+    When attaching vhost-cuse ports to QEMU, the name provided during the
+    add-port operation must match the ifname parameter on the QEMU command
+    line.
+
+  - Adding vhost-cuse ports to the VM
+
+    vhost-cuse ports use a Linux* character device to communicate with QEMU.
+    By default it is set to `/dev/vhost-net`. It is possible to reuse this
+    standard device for DPDK vhost, which makes setup a little simpler, but it
+    is better practice to specify an alternative character device in order to
+    avoid any conflicts if kernel vhost is to be used in parallel.
+
+    1. This step is only needed if using an alternative character device.
+
+       `./vswitchd/ovs-vswitchd --dpdk --cuse_dev_name my-vhost-net -c 0x1 ...`
+
+       where my-vhost-net is the name of the new character device,
+       i.e. /dev/my-vhost-net. The `--cuse_dev_name` argument must follow
+       --dpdk and should precede the EAL args.
+
+    2. In case the kernel vhost character device is reused, there would be a
+       conflict, so the user should remove it first.
+
+       `rm -rf /dev/vhost-net`
+
+    3. Configure virtio-net adapters
+
+       The following parameters must be passed to the QEMU binary; repeat the
+       below parameters for multiple devices.
+
+       ```
+       -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
+       -device virtio-net-pci,netdev=net1,mac=<mac>
+       ```
+
+       The DPDK vhost library will negotiate its own features, so they need
+       not be passed in as command line params. Note that as offloads are
+       disabled, this is the equivalent of setting
+
+       `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`
+
+       When using an alternative character device, it must be explicitly
+       passed to QEMU using the `vhostfd` argument
+
+       ```
+       -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
+       vhostfd=<open_fd> -device virtio-net-pci,netdev=net1,mac=<mac>
+       ```
+
+       The open file descriptor must be passed to QEMU running as a child
+       process. This could be done with a simple python script:
+
+       ```
+       #!/usr/bin/python
+       import os
+       import subprocess
+
+       # Open the alternative vhost character device and hand its fd to QEMU
+       fd = os.open("/dev/usvhost", os.O_RDWR)
+       subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,"
+                       "vhost=on,vhostfd=" + str(fd) + "...", shell=True)
+       ```
+
+    4. Configure huge pages
+
+       QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
+       virtio-net device's virtual rings and packet buffers by mapping the
+       VM's physical memory on hugetlbfs. To enable vhost ports to map the
+       VM's memory into their process address space, pass the following
+       parameters to QEMU
+
+       `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
+       share=on -numa node,memdev=mem -mem-prealloc`
+
+## 7. QOS
+
+Here is an example of QoS usage. Assuming you have a vhost-user port
+transmitting traffic consisting of packets of size 64 bytes, the following
+command would limit the egress transmission rate of the port to ~1,000,000
+packets per second:
+
+`ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create qos
+type=egress-policer other-config:cir=46000000 other-config:cbs=2048`
+
+To examine the QoS configuration of the port:
+
+`ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
+
+To clear the QoS configuration from the port and the ovsdb, use the following:
+
+`ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos`
+
+For more details regarding the egress-policer parameters, please refer to
+vswitch.xml.
+
+## 8. Static Code Analysis
+
+Static analysis is a method of debugging software by examining the code rather
+than actually executing it. Many third-party tools are available for static
+analysis, some open source and the rest commercial. Below are the steps to run
+the clang static analyzer on the OVS codebase.
+
+  ```
+  apt-get install clang                  # On Ubuntu
+  dnf install clang clang-analyzer -y    # On Fedora
+
+  cd $OVS_DIR
+  ./boot.sh
+  ./configure --with-dpdk
+  make clean
+  scan-build make CFLAGS="-std=gnu99"
+  scan-view --host=<ip address> --port 8183 /tmp/scan-build-yyyy-mm-dd-114251-1027-1 --allow-all-hosts
+  ```
+
+  The results can be viewed in a browser using the IP address and port number:
+
+  `http://<ip address>:8183/`
+
+Bug Reporting:
+--------------
+
+Please report problems to bugs@openvswitch.org.
+
+
+[INSTALL.userspace.md]:INSTALL.userspace.md
+[INSTALL.md]:INSTALL.md
+[DPDK Linux GSG]: http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
+[DPDK Docs]: http://dpdk.org/doc