From patchwork Sat Jul 14 11:38:53 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Tu X-Patchwork-Id: 943914 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="FFoVUlGV"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41SSQt1V85z9ryt for ; Sat, 14 Jul 2018 21:40:22 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id D0915BDB; Sat, 14 Jul 2018 11:39:51 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id DCDB5BBD for ; Sat, 14 Jul 2018 11:39:50 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-pf0-f194.google.com (mail-pf0-f194.google.com [209.85.192.194]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id E9D1867E for ; Sat, 14 Jul 2018 11:39:49 +0000 (UTC) Received: by mail-pf0-f194.google.com with SMTP id y8-v6so24020779pfm.10 for ; Sat, 14 Jul 2018 04:39:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; 
bh=lKryCle7xFYHA15SQ9YNBqdXc0n8ZaihIrbjltVFTCc=; b=FFoVUlGVKa3TcMZL4doLB4UtouFi9O5alNNXN2c/Xhi/yoZmn7oR+XppuyIBhKd7li +2NfuwiF14nzDe4Anm453xCiQZ8r6FBrGhnD/QxE9lNUbl7wxG4lNPz8DyVXcGKMegvT KSlLahP6VWKtmaMSFxGi8K4wbhFkkl9uA5I7Iz1/OG3dqI0f2AnyDpZ/uKWb3Iri/7XG S6CbBPm9EU9VcVKS9nIlayUUAyiaOnS5pPjz+3XYdg8sNfcPJFbtL5yD5m+aMl7Fjcev p6rP4t5jUVE1YVpBg2Ykzw6YbEEgF7BwAjopFXARKLmYx/OVWF56JqHw9BarSvt7rNrB gmqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=lKryCle7xFYHA15SQ9YNBqdXc0n8ZaihIrbjltVFTCc=; b=PPQBprRMZ0N4xT1+nhVFNK1mzRxS/ZmquCccsbw+vQN+nGRjAYabBkD4qU4DOebWjI ofvdgrubv3EpCewnFYayW3z9VHaq3oGXIBuNKYBaERYqXdRCeuW8TuRdjdsgIVJEB/pz XHQfwxCStFkuSZRvQGSV7WNT4Jb6/IRsBk+gJjNLXfJaQqUqdUy5IDL6niz6ktfwuxgB fTii/LQCjWtFAPGaUWzhjhjTg2tQmqM/Ady7crIxzuLETT4p0TCPQ8zvyB1b/BDn17SR q9LENWvUfIdVSHHcyc8zQi+Rh+Ga8Q4tG0WbbF++tfG+7gDQy9YHVFnzIvjffQkaZs15 eHaQ== X-Gm-Message-State: AOUpUlE+XMBiDblwCjFlM4ceZuszapjEn7hf5dvsquLK+5baEjSUtJIy QRs2VdWAm1zfSvKQSPAZnA+PlWIB X-Google-Smtp-Source: AAOMgpd2B/utsRPBcAyrTg3xBmnEkzCpfRqUoBRymd9UV4iwJ7xfriDc1SklDUhSZv0mptAo8VkVqA== X-Received: by 2002:a65:594b:: with SMTP id g11-v6mr9440240pgu.260.1531568389232; Sat, 14 Jul 2018 04:39:49 -0700 (PDT) Received: from sc9-mailhost3.vmware.com (c-73-231-16-221.hsd1.ca.comcast.net. 
[73.231.16.221]) by smtp.gmail.com with ESMTPSA id m21-v6sm35825267pgv.27.2018.07.14.04.39.44 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sat, 14 Jul 2018 04:39:48 -0700 (PDT) From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Sat, 14 Jul 2018 04:38:53 -0700 Message-Id: <1531568345-80246-2-git-send-email-u9012063@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com> References: <1531568345-80246-1-git-send-email-u9012063@gmail.com> X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCHv2 01/13] ovs-bpf: add documentation and configuration. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Joe Stringer This patch adds a BPF installation guide and the configuration needed to link against the libbpf library. 
Co-authored-by: William Tu Co-authored-by: Yifeng Sun --- Documentation/automake.mk | 1 + Documentation/index.rst | 2 +- Documentation/intro/install/bpf.rst | 142 ++++++++++++++++++++++++++++++++++ Documentation/intro/install/index.rst | 1 + Makefile.am | 11 ++- acinclude.m4 | 39 ++++++++++ bpf/.gitignore | 4 + configure.ac | 1 + 8 files changed, 196 insertions(+), 5 deletions(-) create mode 100644 Documentation/intro/install/bpf.rst create mode 100644 bpf/.gitignore diff --git a/Documentation/automake.mk b/Documentation/automake.mk index 2b202cb2a836..18fad0608174 100644 --- a/Documentation/automake.mk +++ b/Documentation/automake.mk @@ -10,6 +10,7 @@ DOC_SOURCE = \ Documentation/intro/why-ovs.rst \ Documentation/intro/install/index.rst \ Documentation/intro/install/bash-completion.rst \ + Documentation/intro/install/bpf.rst \ Documentation/intro/install/debian.rst \ Documentation/intro/install/documentation.rst \ Documentation/intro/install/distributions.rst \ diff --git a/Documentation/index.rst b/Documentation/index.rst index ddffa3a62d4e..05199108e05a 100644 --- a/Documentation/index.rst +++ b/Documentation/index.rst @@ -59,7 +59,7 @@ vSwitch? Start here. :doc:`intro/install/windows` | :doc:`intro/install/xenserver` | :doc:`intro/install/dpdk` | - :doc:`Installation FAQs ` + :doc:`intro/install/bpf` - **Tutorials:** :doc:`tutorials/faucet` | :doc:`tutorials/ovs-advanced` | diff --git a/Documentation/intro/install/bpf.rst b/Documentation/intro/install/bpf.rst new file mode 100644 index 000000000000..a8610c9bcd31 --- /dev/null +++ b/Documentation/intro/install/bpf.rst @@ -0,0 +1,142 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. 
You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +====================== +Open vSwitch with BPF +====================== + +This document describes how to build and install Open vSwitch using a BPF +datapath. + +.. warning:: + The BPF support of Open vSwitch is considered 'experimental'. + +Build requirements +------------------ + +In addition to the requirements described in :doc:`general`, building Open +vSwitch with BPF requires the following: + +- LLVM 3.7.1 or later + +- Clang 3.7.1 or later + +- iproute-dev 4.6 or later + +- Linux kernel 4.10 or later + + The following Kconfig options must be enabled to run the BPF datapath: + +``_CONFIG_BPF=y`` +``_CONFIG_BPF_SYSCALL=y`` +``_CONFIG_NET_CLS_BPF=m`` +``_CONFIG_NET_ACT_BPF=m`` + + The following optional Kconfig options are also recommended: + +``_CONFIG_BPF_JIT=y`` +``_CONFIG_HAVE_BPF_JIT=y`` + +- Linux-tools from a recent Linux kernel + +Installing +---------- + +OVS can be installed using different methods. For OVS to use the BPF datapath, +it has to be configured with BPF support (``--with-bpf``). + +#. Clone a recent version of the Linux net-next tree:: + + $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git + +#. Go into the Linux source directory and build libbpf in the tools directory:: + + $ cd net-next/ + $ make -C tools/lib/bpf/ + +#. 
Ensure the standard OVS requirements, described in + :ref:`general-build-reqs`, are installed + +#. Bootstrap, if required, as described in :ref:`general-bootstrapping` + +#. Configure the package using the ``--with-bpf`` flag:: + + $ ./configure --with-bpf=$LINUX_TOOLS + + where ``LINUX_TOOLS`` is the path to the Linux tools/ directory that was + compiled in step 2. + + .. note:: + While ``--with-bpf`` is required, you can pass any other configuration + option described in :ref:`general-configuring`. + +#. Build and install OVS, as described in :ref:`general-building` + +Additional information can be found in :doc:`general`. + +Setup +----- + +Before running OVS, you must ensure that the BPF filesystem is available:: + + # mount -t bpf none /sys/fs/bpf + # mkdir -p /sys/fs/bpf/ovs + + .. note:: + We should get rid of this requirement on users, and just robustly ensure + that the filesystem is available and prepared correctly (or do so if it + is not). + +Open vSwitch should be started as described in :doc:`general`. + + .. note:: + Depending on how OVS was installed, the BPF datapath binary may or may + not be available. Check the logs when running OVS. If it complains about + not finding bpf/datapath.o, look for this file in your OVS build tree and + copy or symlink it into place. It is probably expected to live in + /usr/share/openvswitch/bpf/datapath.o. + +If the linux-tools package is not installed with libbpf.so, then ensure +that this library is available via your library path:: + + $ export LD_LIBRARY_PATH=${LINUX_TOOLS}/lib/bpf:$LD_LIBRARY_PATH + +When adding a bridge to Open vSwitch, specify the datapath type as bpf:: + + $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=bpf + +To validate that the bridge has been instantiated successfully, you can use +the ovs-bpfctl utility:: + + # ovs-bpfctl show + +Limitations +----------- + +- The BPF datapath is a work in progress and supports only a limited set of + matches and actions. 
+ +Bug Reporting +------------- + +Please report problems to bugs@openvswitch.org. diff --git a/Documentation/intro/install/index.rst b/Documentation/intro/install/index.rst index 3193c736cf17..e063548f2bb3 100644 --- a/Documentation/intro/install/index.rst +++ b/Documentation/intro/install/index.rst @@ -45,6 +45,7 @@ Installation from Source xenserver userspace dpdk + bpf Installation from Packages -------------------------- diff --git a/Makefile.am b/Makefile.am index 6d39d96cb47a..21e27fa32965 100644 --- a/Makefile.am +++ b/Makefile.am @@ -97,6 +97,7 @@ dist_pkgdata_SCRIPTS = dist_sbin_SCRIPTS = dist_scripts_SCRIPTS = dist_scripts_DATA = +dist_bpf_DATA = INSTALL_DATA_LOCAL = UNINSTALL_LOCAL = man_MANS = @@ -115,6 +116,7 @@ sbin_SCRIPTS = scripts_SCRIPTS = completion_SCRIPTS = scripts_DATA = +bpf_DATA = SUFFIXES = check_DATA = check_SCRIPTS = @@ -128,6 +130,7 @@ endif scriptsdir = $(pkgdatadir)/scripts completiondir = $(sysconfdir)/bash_completion.d pkgconfigdir = $(libdir)/pkgconfig +bpfdir = $(pkgdatadir)/bpf # This ensures that files added to EXTRA_DIST are always distributed, # even if they are inside an Automake if...endif conditional block that is @@ -226,7 +229,7 @@ config-h-check: @cd $(srcdir); \ if test -e .git && (git --version) >/dev/null 2>&1 && \ git --no-pager grep -L '#include ' `git ls-files | grep '\.c$$' | \ - grep -vE '^datapath|^lib/sflow|^third-party|^datapath-windows|^python'`; \ + grep -vE '^bpf|^datapath|^lib/sflow|^third-party|^datapath-windows|^python'`; \ then \ echo "See above for list of violations of the rule that"; \ echo "every C source file must #include ."; \ @@ -247,7 +250,7 @@ printf-check: @cd $(srcdir); \ if test -e .git && (git --version) >/dev/null 2>&1 && \ git --no-pager grep -n -E -e '%[-+ #0-9.*]*([ztj]|hh)' --and --not -e 'ovs_scan' `git ls-files | grep '\.[ch]$$' | \ - grep -vE '^datapath|^lib/sflow|^third-party'`; \ + grep -vE '^bpf|^datapath|^lib/sflow|^third-party'`; \ then \ echo "See above for list of violations 
of the rule that"; \ echo "'z', 't', 'j', 'hh' printf() type modifiers are"; \ @@ -290,7 +293,7 @@ check-endian: @if test -e $(srcdir)/.git && (git --version) >/dev/null 2>&1 && \ (cd $(srcdir) && git --no-pager grep -l -E \ -e 'BIG_ENDIAN|LITTLE_ENDIAN' --and --not -e 'BYTE_ORDER' | \ - $(EGREP) -v '^datapath/'); \ + $(EGREP) -v '^bpf/|^datapath/'); \ then \ echo "See above for list of files that misuse LITTLE""_ENDIAN"; \ echo "or BIG""_ENDIAN. Please use WORDS_BIGENDIAN instead."; \ @@ -315,7 +318,7 @@ thread-safety-check: if test -e .git && (git --version) >/dev/null 2>&1 && \ grep -n -f build-aux/thread-safety-blacklist \ `git ls-files | grep '\.[ch]$$' \ - | $(EGREP) -v '^datapath|^lib/sflow|^third-party'` /dev/null \ + | $(EGREP) -v '^bpf|^datapath|^lib/sflow|^third-party'` /dev/null \ | $(EGREP) -v ':[ ]*/?\*'; \ then \ echo "See above for list of calls to functions that are"; \ diff --git a/acinclude.m4 b/acinclude.m4 index bf790fe72d87..257de4e178a8 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -341,6 +341,45 @@ AC_DEFUN([OVS_CHECK_DPDK], [ AM_CONDITIONAL([DPDK_NETDEV], test "$DPDKLIB_FOUND" = true) ]) +AC_DEFUN([OVS_CHECK_BPF], [ + AC_ARG_WITH([bpf], + [AC_HELP_STRING([--with-bpf=/path/to/linux/tools/], + [Specify the linux tools directory])], + [have_bpf=yes]) + + AC_MSG_CHECKING([whether bpf datapath is enabled]) + if test "$have_bpf" != yes || test "$with_bpf" = no; then + AC_MSG_RESULT([no]) + have_bpf=no + else + AC_MSG_RESULT([yes]) + CFLAGS="$CFLAGS -I${with_bpf}/lib -I${with_bpf}/include/uapi" + LDFLAGS="$LDFLAGS -L${with_bpf}/lib/bpf" + AC_SEARCH_LIBS([elf_begin],[elf],[], + [AC_MSG_ERROR([unable to find libelf, install the dependency package])]) + + have_bpf=no + AC_COMPILE_IFELSE( + [AC_LANG_PROGRAM([#include ], + [struct bpf_map; + struct bpf_map_def; + struct bpf_prog_prep_result;])], + [AC_COMPILE_IFELSE( + [AC_LANG_PROGRAM([#include ], [])], + [have_bpf=yes], + [AC_MSG_ERROR([unable to find iproute2 >= 4.6.0])])], + [AC_MSG_ERROR([unable to find libbpf])]) + fi + + AM_CONDITIONAL([HAVE_BPF], [test "$have_bpf" = yes]) + if test "$have_bpf" = yes; then + AC_DEFINE([HAVE_BPF], [1], + [Define to 1 if BPF is available.]) + BPF_LDADD="-lbpf -lelf" + AC_SUBST([BPF_LDADD]) + fi +]) + dnl OVS_GREP_IFELSE(FILE, REGEX, [IF-MATCH], [IF-NO-MATCH]) dnl dnl Greps FILE for REGEX. If it matches, runs IF-MATCH, otherwise IF-NO-MATCH. diff --git a/bpf/.gitignore b/bpf/.gitignore new file mode 100644 index 000000000000..1a5ee8e7bc33 --- /dev/null +++ b/bpf/.gitignore @@ -0,0 +1,4 @@ +/Makefile +/Makefile.in +*.o +/distfiles diff --git a/configure.ac b/configure.ac index 4d7bd8d754d0..0c2bca29969f 100644 --- a/configure.ac +++ b/configure.ac @@ -80,6 +80,7 @@ AC_SEARCH_LIBS([timer_create], [rt]) AC_SEARCH_LIBS([pthread_create], [pthread]) AC_FUNC_STRERROR_R +OVS_CHECK_BPF OVS_CHECK_ESX OVS_CHECK_WIN64 OVS_CHECK_WIN32 
From patchwork Sat Jul 14 11:38:54 2018 X-Patchwork-Submitter: William Tu X-Patchwork-Id: 943915 From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Sat, 14 Jul 2018 04:38:54 -0700 Message-Id: <1531568345-80246-3-git-send-email-u9012063@gmail.com> In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com> References: <1531568345-80246-1-git-send-email-u9012063@gmail.com> Subject: [ovs-dev] [RFC PATCHv2 02/13] netdev: add ebpf support for netdev provider. From: Joe Stringer To receive packets, an eBPF program has to be attached to a netdev through tc ingress/egress, and an XDP program has to be attached to a netdev's XDP hook point. This patch introduces two new netdev_class functions, set_filter and set_xdp, for this purpose. Currently two netdev types, netdev-linux and netdev-vport, provide actual implementations. 
Signed-off-by: William Tu Co-authored-by: William Tu Co-authored-by: Yifeng Sun --- include/linux/pkt_cls.h | 21 +++ lib/dpif-netdev.c | 29 ++-- lib/netdev-bsd.c | 2 + lib/netdev-dpdk.c | 2 + lib/netdev-dummy.c | 2 + lib/netdev-linux.c | 436 +++++++++++++++++++++++++++++++++++++++++++++++- lib/netdev-linux.h | 2 + lib/netdev-provider.h | 11 ++ lib/netdev-vport.c | 145 +++++++++++++++- lib/netdev.c | 25 +++ lib/netdev.h | 4 + 11 files changed, 655 insertions(+), 24 deletions(-) diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h index f7bc7ea708d7..770af90a5c64 100644 --- a/include/linux/pkt_cls.h +++ b/include/linux/pkt_cls.h @@ -104,6 +104,27 @@ enum { __TCA_BASIC_MAX }; +/* BPF classifier */ + +#define TCA_BPF_FLAG_ACT_DIRECT (1 << 0) + +enum { + TCA_BPF_UNSPEC, + TCA_BPF_ACT, + TCA_BPF_POLICE, + TCA_BPF_CLASSID, + TCA_BPF_OPS_LEN, + TCA_BPF_OPS, + TCA_BPF_FD, + TCA_BPF_NAME, + TCA_BPF_FLAGS, + TCA_BPF_FLAGS_GEN, + TCA_BPF_TAG, + __TCA_BPF_MAX, +}; + +#define TCA_BPF_MAX (__TCA_BPF_MAX - 1) + /* Flower classifier */ enum { diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index ba62128c758c..baff020fe3d0 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1505,12 +1505,6 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd) ovs_mutex_unlock(&pmd->cond_mutex); } -static uint32_t -hash_port_no(odp_port_t port_no) -{ - return hash_int(odp_to_u32(port_no), 0); -} - static int port_create(const char *devname, const char *type, odp_port_t port_no, struct dp_netdev_port **portp) @@ -1525,6 +1519,7 @@ port_create(const char *devname, const char *type, /* Open and validate network device. 
*/ error = netdev_open(devname, type, &netdev); + VLOG_INFO("%s %s error %d", __func__, devname, error); if (error) { return error; } @@ -1578,7 +1573,7 @@ do_add_port(struct dp_netdev *dp, const char *devname, const char *type, return error; } - hmap_insert(&dp->ports, &port->node, hash_port_no(port_no)); + hmap_insert(&dp->ports, &port->node, netdev_hash_port_no(port_no)); seq_change(dp->port_seq); reconfigure_datapath(dp); @@ -1596,6 +1591,8 @@ dpif_netdev_port_add(struct dpif *dpif, struct netdev *netdev, odp_port_t port_no; int error; + VLOG_INFO("%s", __func__); + ovs_mutex_lock(&dp->port_mutex); dpif_port = netdev_vport_get_dpif_port(netdev, namebuf, sizeof namebuf); if (*port_nop != ODPP_NONE) { @@ -1648,7 +1645,8 @@ dp_netdev_lookup_port(const struct dp_netdev *dp, odp_port_t port_no) { struct dp_netdev_port *port; - HMAP_FOR_EACH_WITH_HASH (port, node, hash_port_no(port_no), &dp->ports) { + HMAP_FOR_EACH_WITH_HASH (port, node, netdev_hash_port_no(port_no), + &dp->ports) { if (port->port_no == port_no) { return port; } @@ -1808,7 +1806,7 @@ dp_netdev_pmd_lookup_dpcls(struct dp_netdev_pmd_thread *pmd, odp_port_t in_port) { struct dpcls *cls; - uint32_t hash = hash_port_no(in_port); + uint32_t hash = netdev_hash_port_no(in_port); CMAP_FOR_EACH_WITH_HASH (cls, node, hash, &pmd->classifiers) { if (cls->in_port == in_port) { /* Port classifier exists already */ @@ -1824,7 +1822,7 @@ dp_netdev_pmd_find_dpcls(struct dp_netdev_pmd_thread *pmd, OVS_REQUIRES(pmd->flow_mutex) { struct dpcls *cls = dp_netdev_pmd_lookup_dpcls(pmd, in_port); - uint32_t hash = hash_port_no(in_port); + uint32_t hash = netdev_hash_port_no(in_port); if (!cls) { /* Create new classifier for in_port */ @@ -3311,7 +3309,7 @@ tx_port_lookup(const struct hmap *hmap, odp_port_t port_no) { struct tx_port *tx; - HMAP_FOR_EACH_IN_BUCKET (tx, node, hash_port_no(port_no), hmap) { + HMAP_FOR_EACH_IN_BUCKET (tx, node, netdev_hash_port_no(port_no), hmap) { if (tx->port->port_no == port_no) { return tx; } 
@@ -4034,13 +4032,13 @@ pmd_load_cached_ports(struct dp_netdev_pmd_thread *pmd) if (netdev_has_tunnel_push_pop(tx_port->port->netdev)) { tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached); hmap_insert(&pmd->tnl_port_cache, &tx_port_cached->node, - hash_port_no(tx_port_cached->port->port_no)); + netdev_hash_port_no(tx_port_cached->port->port_no)); } if (netdev_n_txq(tx_port->port->netdev)) { tx_port_cached = xmemdup(tx_port, sizeof *tx_port_cached); hmap_insert(&pmd->send_port_cache, &tx_port_cached->node, - hash_port_no(tx_port_cached->port->port_no)); + netdev_hash_port_no(tx_port_cached->port->port_no)); } } } @@ -4793,7 +4791,8 @@ dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd, tx->flush_time = 0LL; dp_packet_batch_init(&tx->output_pkts); - hmap_insert(&pmd->tx_ports, &tx->node, hash_port_no(tx->port->port_no)); + hmap_insert(&pmd->tx_ports, &tx->node, + netdev_hash_port_no(tx->port->port_no)); pmd->need_reload = true; } @@ -5965,7 +5964,7 @@ dpif_dummy_change_port_number(struct unixctl_conn *conn, int argc OVS_UNUSED, /* Reinsert with new port number. 
*/ port->port_no = port_no; - hmap_insert(&dp->ports, &port->node, hash_port_no(port_no)); + hmap_insert(&dp->ports, &port->node, netdev_hash_port_no(port_no)); reconfigure_datapath(dp); seq_change(dp->port_seq); diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c index 05974c100895..1460ae2504c5 100644 --- a/lib/netdev-bsd.c +++ b/lib/netdev-bsd.c @@ -1516,6 +1516,8 @@ netdev_bsd_update_flags(struct netdev *netdev_, enum netdev_flags off, NULL, /* set_advertisement */ \ NULL, /* get_pt_mode */ \ NULL, /* set_policing */ \ + NULL, /* set_filter */ \ + NULL, /* set_xdp */ \ NULL, /* get_qos_type */ \ NULL, /* get_qos_capabilities */ \ NULL, /* get_qos */ \ diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 52d8fe6b7ac2..20116c22137e 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -3854,6 +3854,8 @@ unlock: NULL, /* get_pt_mode */ \ \ netdev_dpdk_set_policing, \ + NULL, /* set_filter */ \ + NULL, /* set_xdp */ \ netdev_dpdk_get_qos_types, \ NULL, /* get_qos_capabilities */ \ netdev_dpdk_get_qos, \ diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c index 4246af3b9c86..44c9458a9a22 100644 --- a/lib/netdev-dummy.c +++ b/lib/netdev-dummy.c @@ -1427,6 +1427,8 @@ netdev_dummy_update_flags(struct netdev *netdev_, NULL, /* get_pt_mode */ \ \ NULL, /* set_policing */ \ + NULL, /* set_filter */ \ + NULL, /* set_xdp */ \ NULL, /* get_qos_types */ \ NULL, /* get_qos_capabilities */ \ NULL, /* get_qos */ \ diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 4e0473cf331f..121dd3bc738e 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -46,6 +46,9 @@ #include #include +#include /* linux/tools/bpf/libbpf.h */ + +#include "bpf.h" #include "coverage.h" #include "dp-packet.h" #include "dpif-netlink.h" @@ -227,6 +230,9 @@ enum { VALID_VPORT_STAT_ERROR = 1 << 5, VALID_DRVINFO = 1 << 6, VALID_FEATURES = 1 << 7, + VALID_INGRESS_FILTER = 1 << 8, + VALID_EGRESS_FILTER = 1 << 9, + VALID_XDP_FILTER = 1 << 10, }; /* Traffic control. 
*/ @@ -421,6 +427,7 @@ static const struct tc_ops tc_ops_sfq; static const struct tc_ops tc_ops_default; static const struct tc_ops tc_ops_noop; static const struct tc_ops tc_ops_other; +static const struct tc_ops tc_ops_clsact; static const struct tc_ops *const tcs[] = { &tc_ops_htb, /* Hierarchy token bucket (see tc-htb(8)). */ @@ -431,6 +438,7 @@ static const struct tc_ops *const tcs[] = { &tc_ops_noop, /* Non operating qos type. */ &tc_ops_default, /* Default qdisc (see tc-pfifo_fast(8)). */ &tc_ops_other, /* Some other qdisc. */ + &tc_ops_clsact, /* Classifier with nested action. */ NULL }; @@ -442,8 +450,12 @@ static struct tcmsg *netdev_linux_tc_make_request(const struct netdev *, int type, unsigned int flags, struct ofpbuf *); +static int clsact_install__(struct netdev *netdev_); static int tc_add_policer(struct netdev *, uint32_t kbits_rate, uint32_t kbits_burst); +static int tc_add_filter(struct netdev *, int fd, uint32_t parent, + const char *name); +static bool tc_is_clsact(const struct tc *tc); static int tc_parse_qdisc(const struct ofpbuf *, const char **kind, struct nlattr **options); @@ -485,13 +497,19 @@ struct netdev_linux { long long int carrier_resets; uint32_t kbits_rate; /* Policing data. */ uint32_t kbits_burst; + uint32_t ingress_filter; /* BPF ingress filter fd. */ + uint32_t egress_filter; /* BPF egress filter fd. */ + uint32_t ingress_xdp_filter;/* XDP ingress filter fd. */ int vport_stats_error; /* Cached error code from vport_get_stats(). 0 or an errno value. */ int netdev_mtu_error; /* Cached error code from SIOCGIFMTU or SIOCSIFMTU. */ int ether_addr_error; /* Cached error code from set/get etheraddr. */ int netdev_policing_error; /* Cached error code from set policing. */ + int ingress_filter_error; /* Cached error code from set filter. */ + int egress_filter_error; /* Cached error code from set filter. */ int get_features_error; /* Cached error code from ETHTOOL_GSET. */ int get_ifindex_error; /* Cached error code from SIOCGIFINDEX. 
*/ + int ingress_xdp_error; /* Cached error code from set xdp. */ enum netdev_features current; /* Cached from ETHTOOL_GSET. */ enum netdev_features advertised; /* Cached from ETHTOOL_GSET. */ @@ -2159,8 +2177,14 @@ netdev_linux_set_policing(struct netdev *netdev_, if (kbits_rate) { error = tc_add_del_ingress_qdisc(ifindex, true); if (error) { - VLOG_WARN_RL(&rl, "%s: adding policing qdisc failed: %s", - netdev_name, ovs_strerror(error)); + const char *bpf_conflict = ""; + + if (error == EEXIST && (netdev->ingress_filter + || netdev->egress_filter)) { + bpf_conflict = " (conflicts with BPF)"; + } + VLOG_WARN_RL(&rl, "%s: adding policing qdisc failed: %s%s", + netdev_name, ovs_strerror(error), bpf_conflict); goto out; } @@ -2184,6 +2208,268 @@ out: return error; } +/* Attempts to set a BPF filter on the device. Returns 0 if successful, + * otherwise a positive errno value. */ +static int +netdev_linux_set_filter__(struct netdev *netdev_, const struct bpf_prog *prog, + unsigned int valid_bit, int *filter_error, + uint32_t *netdev_filter) +{ + struct netdev_linux *netdev = netdev_linux_cast(netdev_); + const char *netdev_name = netdev_get_name(netdev_); + int error = 0; + + VLOG_DBG("Setting %s filter %d on %s (handle %08"PRIx32")", prog ? prog->name : "(none)", + prog ? prog->fd : -1, netdev_name, prog ? prog->handle : 0); + + if (netdev->cache_valid & valid_bit) { + error = *filter_error; + if (error || (prog && prog->fd == *netdev_filter)) { + /* Assume that settings haven't changed since we last set them. */ + goto out; + } + netdev->cache_valid &= ~valid_bit; + } + + /* Remove non-clsact qdiscs. 
*/ + if (netdev->tc && !tc_is_clsact(netdev->tc)) { + error = tc_del_qdisc(netdev_); + if (error) { + VLOG_WARN_RL(&rl, "%s: removing qdisc failed: %s", + netdev_name, ovs_strerror(error)); + goto out; + } + } + + if (prog) { + if (!netdev->tc || !tc_is_clsact(netdev->tc)) { + error = clsact_install__(netdev_); + if (error && error != EEXIST) { + VLOG_WARN_RL(&rl, "%s: clsact qdisc setup failed: %s", + netdev_name, ovs_strerror(error)); + goto out; + } + } + + error = tc_add_filter(netdev_, prog->fd, prog->handle, prog->name); + if (error){ + VLOG_WARN_RL(&rl, "%s: adding filter %s failed: %s", + netdev_name, prog->name, ovs_strerror(error)); + goto out; + } + } + + *netdev_filter = prog ? prog->fd : 0; + +out: + if (!error || error == ENODEV) { + *filter_error = error; + netdev->cache_valid |= valid_bit; + } + return error; +} + +static int +netdev_linux_set_filter(struct netdev *netdev_, const struct bpf_prog *prog) +{ + struct netdev_linux *netdev = netdev_linux_cast(netdev_); + int error; + + ovs_mutex_lock(&netdev->mutex); + if (!prog || prog->handle == INGRESS_HANDLE) { + error = netdev_linux_set_filter__(netdev_, prog, VALID_INGRESS_FILTER, + &netdev->ingress_filter_error, + &netdev->ingress_filter); + } else { + error = netdev_linux_set_filter__(netdev_, prog, VALID_EGRESS_FILTER, + &netdev->egress_filter_error, + &netdev->egress_filter); + } + ovs_mutex_unlock(&netdev->mutex); + + return error; +} + +#ifndef SOL_NETLINK +#define SOL_NETLINK 270 +#endif + +/* Extract from libbpf */ +int +bpf_set_link_xdp_fd(int ifindex, int fd, uint32_t flags) +{ + + struct sockaddr_nl sa; + int sock, seq = 0, len, ret = -1; + char buf[4096]; + struct nlattr *nla, *nla_xdp; + struct { + struct nlmsghdr nh; + struct ifinfomsg ifinfo; + char attrbuf[64]; + } req; + struct nlmsghdr *nh; + struct nlmsgerr *err; + socklen_t addrlen; + int one = 1; + + memset(&sa, 0, sizeof(sa)); + sa.nl_family = AF_NETLINK; + + sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE); + if (sock < 0) 
{ + return -errno; + } + + if (setsockopt(sock, SOL_NETLINK, NETLINK_EXT_ACK, + &one, sizeof(one)) < 0) { + VLOG_WARN_RL(&rl, "Netlink error reporting not supported"); + } + + if (bind(sock, (struct sockaddr *)&sa, sizeof(sa)) < 0) { + ret = -errno; + goto cleanup; + } + + addrlen = sizeof(sa); + if (getsockname(sock, (struct sockaddr *)&sa, &addrlen) < 0) { + ret = -errno; + goto cleanup; + } + + if (addrlen != sizeof(sa)) { + goto cleanup; + } + + memset(&req, 0, sizeof(req)); + req.nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)); + req.nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK; + req.nh.nlmsg_type = RTM_SETLINK; + req.nh.nlmsg_pid = 0; + req.nh.nlmsg_seq = ++seq; + req.ifinfo.ifi_family = AF_UNSPEC; + req.ifinfo.ifi_index = ifindex; + + /* started nested attribute for XDP */ + nla = (struct nlattr *)(((char *)&req) + + NLMSG_ALIGN(req.nh.nlmsg_len)); + nla->nla_type = NLA_F_NESTED | IFLA_XDP; + nla->nla_len = NLA_HDRLEN; + + /* add XDP fd */ + nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len); + nla_xdp->nla_type = IFLA_XDP_FD; + nla_xdp->nla_len = NLA_HDRLEN + sizeof(int); + memcpy((char *)nla_xdp + NLA_HDRLEN, &fd, sizeof(fd)); + nla->nla_len += nla_xdp->nla_len; + + /* if user passed in any flags, add those too */ + if (flags) { + nla_xdp = (struct nlattr *)((char *)nla + nla->nla_len); + nla_xdp->nla_type = IFLA_XDP_FLAGS; + nla_xdp->nla_len = NLA_HDRLEN + sizeof(flags); + memcpy((char *)nla_xdp + NLA_HDRLEN, &flags, sizeof(flags)); + nla->nla_len += nla_xdp->nla_len; + } + + req.nh.nlmsg_len += NLA_ALIGN(nla->nla_len); + + /* send */ + if (send(sock, &req, req.nh.nlmsg_len, 0) < 0) { + ret = -errno; + goto cleanup; + } + + /* recv */ + len = recv(sock, buf, sizeof(buf), 0); + if (len < 0) { + ret = -errno; + goto cleanup; + } + + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); + nh = NLMSG_NEXT(nh, len)) { + if (nh->nlmsg_pid != sa.nl_pid) { + ret = -1; + goto cleanup; + } + if (nh->nlmsg_seq != seq) { + ret = -1; + goto cleanup; + } + 
switch (nh->nlmsg_type) { + case NLMSG_ERROR: + err = (struct nlmsgerr *)NLMSG_DATA(nh); + if (!err->error) + continue; + ret = err->error; + /* nla_dump_errormsg(nh); */ + goto cleanup; + case NLMSG_DONE: + break; + default: + break; + } + } + + ret = 0; + +cleanup: + close(sock); + return ret; +} + +static int +netdev_linux_set_xdp__(struct netdev *netdev_, const struct bpf_prog *prog, + unsigned int valid_bit, int *filter_error, + uint32_t *netdev_filter) +{ + struct netdev_linux *netdev = netdev_linux_cast(netdev_); + const char *netdev_name = netdev_get_name(netdev_); + int ifindex = netdev->ifindex; + int error; + + VLOG_DBG("Setting %s XDP filter %d on %s (ifindex %d)", prog->name, + prog->fd, netdev_name, ifindex); + + if (netdev->cache_valid & valid_bit) { + error = *filter_error; + if (error || (prog && prog->fd == *netdev_filter)) { + /* Assume that settings haven't changed since we last set them. */ + goto out; + } + netdev->cache_valid &= ~valid_bit; + } + error = bpf_set_link_xdp_fd(ifindex, prog->fd, XDP_FLAGS_SKB_MODE); + if (error < 0) { + VLOG_WARN_RL(&rl, "%s: adding XDP filter %s failed: %s", + netdev_name, prog->name, ovs_strerror(error)); + goto out; + } + +out: + if (!error || error == ENODEV) { + *filter_error = error; + netdev->cache_valid |= valid_bit; + } + return error; +} + +static int +netdev_linux_set_xdp(struct netdev *netdev_, const struct bpf_prog *prog) +{ + struct netdev_linux *netdev = netdev_linux_cast(netdev_); + int error; + + ovs_mutex_lock(&netdev->mutex); + error = netdev_linux_set_xdp__(netdev_, prog, VALID_XDP_FILTER, + &netdev->ingress_xdp_error, + &netdev->ingress_xdp_filter); + ovs_mutex_unlock(&netdev->mutex); + + return error; +} + static int netdev_linux_get_qos_types(const struct netdev *netdev OVS_UNUSED, struct sset *types) @@ -2879,6 +3165,8 @@ netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off, NULL, /* get_pt_mode */ \ \ netdev_linux_set_policing, \ + netdev_linux_set_filter, \ + 
netdev_linux_set_xdp, \ netdev_linux_get_qos_types, \ netdev_linux_get_qos_capabilities, \ netdev_linux_get_qos, \ @@ -4671,6 +4959,74 @@ static const struct tc_ops tc_ops_other = { NULL /* class_dump_stats */ }; +/* "linux-clsact" traffic control class. */ +static int +clsact_setup_qdisc(struct netdev *netdev) +{ + struct ofpbuf request; + struct tcmsg *tcmsg; + + tcmsg = netdev_linux_tc_make_request(netdev, RTM_NEWQDISC, + NLM_F_EXCL | NLM_F_CREATE, &request); + if (!tcmsg) { + return ENODEV; + } + tcmsg->tcm_handle = tc_make_handle(0xFFFF, 0); + tcmsg->tcm_parent = TC_H_INGRESS; + nl_msg_put_string(&request, TCA_KIND, "clsact"); + nl_msg_put_unspec(&request, TCA_OPTIONS, NULL, 0); + + return tc_transact(&request, NULL); +} + +static int +clsact_install__(struct netdev *netdev_) +{ + static const struct tc tc = TC_INITIALIZER(&tc, &tc_ops_clsact); + struct netdev_linux *netdev = netdev_linux_cast(netdev_); + int error; + + error = clsact_setup_qdisc(netdev_); + if (error) { + return error; + } + + /* Nothing but a tc class implementation is allowed to write to a tc. This + * class never does that, so we can legitimately use a const tc object. */ + netdev->tc = CONST_CAST(struct tc *, &tc); + + return 0; +} + +static int +clsact_tc_install(struct netdev *netdev, + const struct smap *details OVS_UNUSED) +{ + return clsact_install__(netdev); +} + +static int +clsact_tc_load(struct netdev *netdev, struct ofpbuf *nlmsg OVS_UNUSED) +{ + return clsact_install__(netdev); +} + +static const struct tc_ops tc_ops_clsact = { + "clsact", /* linux_name */ + "linux-clsact", /* ovs_name */ + 0, /* n_queues */ + clsact_tc_install, + clsact_tc_load, + NULL, /* tc_destroy */ + NULL, /* qdisc_get */ + NULL, /* qdisc_set */ + NULL, /* class_get */ + NULL, /* class_set */ + NULL, /* class_delete */ + NULL, /* class_get_stats */ + NULL /* class_dump_stats */ +}; + /* Traffic control. */ /* Number of kernel "tc" ticks per second. 
*/ @@ -4775,6 +5131,49 @@ tc_add_policer(struct netdev *netdev, return 0; } +/* Adds a filter to 'netdev' corresponding to BPF program associated with 'fd'. + * + * This function is equivalent to running: + * /sbin/tc filter add dev bpf da object-pinned + * + * The configuration and stats may be seen with the following command: + * /sbin/tc -s filter show dev + * + * Returns 0 if successful, otherwise a positive errno value. + */ +static int +tc_add_filter(struct netdev *netdev, int fd, uint32_t parent, const char *name) +{ + struct ofpbuf request; + struct tcmsg *tcmsg; + size_t opts_offset; + int error; + + tcmsg = netdev_linux_tc_make_request(netdev, RTM_NEWTFILTER, + NLM_F_EXCL | NLM_F_CREATE, &request); + if (!tcmsg) { + return ENODEV; + } + tcmsg->tcm_handle = tc_make_handle(0, 0x1); + tcmsg->tcm_parent = parent; + tcmsg->tcm_info = tc_make_handle(0, /* preference */ + (OVS_FORCE uint16_t) htons(ETH_P_ALL)); + + nl_msg_put_string(&request, TCA_KIND, "bpf"); + opts_offset = nl_msg_start_nested(&request, TCA_OPTIONS); + nl_msg_put_u32(&request, TCA_BPF_FLAGS, TCA_BPF_FLAG_ACT_DIRECT); + nl_msg_put_u32(&request, TCA_BPF_FD, fd); + nl_msg_put_string(&request, TCA_BPF_NAME, name); + nl_msg_end_nested(&request, opts_offset); + + error = tc_transact(&request, NULL); + if (error) { + return error; + } + + return 0; +} + static void read_psched(void) { @@ -5060,21 +5459,21 @@ tc_delete_class(const struct netdev *netdev, unsigned int handle) return error; } -/* Equivalent to "tc qdisc del dev root". */ +/* Equivalent to "tc qdisc del dev handle ". 
*/ static int -tc_del_qdisc(struct netdev *netdev_) +tc_del_qdisc__(struct netdev_linux *netdev, uint32_t parent, uint32_t handle) { - struct netdev_linux *netdev = netdev_linux_cast(netdev_); struct ofpbuf request; struct tcmsg *tcmsg; int error; - tcmsg = netdev_linux_tc_make_request(netdev_, RTM_DELQDISC, 0, &request); + tcmsg = netdev_linux_tc_make_request(&netdev->up, RTM_DELQDISC, 0, + &request); if (!tcmsg) { return ENODEV; } - tcmsg->tcm_handle = tc_make_handle(1, 0); - tcmsg->tcm_parent = TC_H_ROOT; + tcmsg->tcm_handle = handle; + tcmsg->tcm_parent = parent; error = tc_transact(&request, NULL); if (error == EINVAL) { @@ -5092,6 +5491,27 @@ tc_del_qdisc(struct netdev *netdev_) } static bool +tc_is_clsact(const struct tc *tc) +{ + if (!tc || !tc->ops->linux_name) { + return false; + } + return !strcmp(tc->ops->linux_name, "clsact"); +} + +static int +tc_del_qdisc(struct netdev *netdev_) +{ + struct netdev_linux *netdev = netdev_linux_cast(netdev_); + + if (netdev->tc && tc_is_clsact(netdev->tc)) { + return tc_del_qdisc__(netdev, TC_H_INGRESS, + tc_make_handle(TC_H_INGRESS, 0)); + } + return tc_del_qdisc__(netdev, TC_H_ROOT, tc_make_handle(1, 0)); +} + +static bool getqdisc_is_safe(void) { static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; diff --git a/lib/netdev-linux.h b/lib/netdev-linux.h index 880f86402a1e..8257d4c695f9 100644 --- a/lib/netdev-linux.h +++ b/lib/netdev-linux.h @@ -29,6 +29,8 @@ int netdev_linux_ethtool_set_flag(struct netdev *netdev, uint32_t flag, const char *flag_name, bool enable); int linux_get_ifindex(const char *netdev_name); +int bpf_set_link_xdp_fd(int ifindex, int fd, uint32_t flags); + #define LINUX_FLOW_OFFLOAD_API \ netdev_tc_flow_flush, \ netdev_tc_flow_dump_create, \ diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index 25bd671c1382..3e53a5b76272 100644 --- a/lib/netdev-provider.h +++ b/lib/netdev-provider.h @@ -32,6 +32,7 @@ extern "C" { #endif +struct bpf_prog; struct netdev_tnl_build_header_params; 
#define NETDEV_NUMA_UNSPEC OVS_NUMA_UNSPEC @@ -505,6 +506,16 @@ struct netdev_class { int (*set_policing)(struct netdev *netdev, unsigned int kbits_rate, unsigned int kbits_burst); + /* Attempts to attach a traffic filter in the form of an (e)BPF program. + * + * This function may be set to null if filters are not supported. */ + int (*set_filter)(struct netdev *netdev, const struct bpf_prog *); + + /* Attempts to attach a XDP eBPF program. + * + * This function may be set to null if filters are not supported. */ + int (*set_xdp)(struct netdev *netdev, const struct bpf_prog *); + /* Adds to 'types' all of the forms of QoS supported by 'netdev', or leaves * it empty if 'netdev' does not support QoS. Any names added to 'types' * should be documented as valid for the "type" column in the "QoS" table diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c index 52aa12d79933..4341c89894a3 100644 --- a/lib/netdev-vport.c +++ b/lib/netdev-vport.c @@ -22,12 +22,14 @@ #include #include #include +#include #include #include #include #include #include +#include "bpf.h" #include "byte-order.h" #include "daemon.h" #include "dirs.h" @@ -43,6 +45,7 @@ #include "route-table.h" #include "smap.h" #include "socket-util.h" +#include "tc.h" #include "unaligned.h" #include "unixctl.h" #include "openvswitch/vlog.h" @@ -72,6 +75,10 @@ struct vport_class { struct netdev_class netdev_class; }; +/* This is set pretty low because we probably won't learn anything from the + * additional log messages. */ +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + bool netdev_vport_is_vport_class(const struct netdev_class *class) { @@ -866,6 +873,140 @@ netdev_vport_get_ifindex(const struct netdev *netdev_) return linux_get_ifindex(name); } +/* "linux-clsact" traffic control class. 
*/ +static int +clsact_setup_qdisc(struct netdev *netdev) +{ + struct ofpbuf request; + struct tcmsg *tcmsg; + int ifindex; + + ifindex = netdev_vport_get_ifindex(netdev); + + tcmsg = tc_make_request(ifindex, RTM_NEWQDISC, NLM_F_EXCL | NLM_F_CREATE, + &request); + if (!tcmsg) { + return ENODEV; + } + tcmsg->tcm_handle = tc_make_handle(0xFFFF, 0); + tcmsg->tcm_parent = TC_H_INGRESS; + nl_msg_put_string(&request, TCA_KIND, "clsact"); + nl_msg_put_unspec(&request, TCA_OPTIONS, NULL, 0); + + return tc_transact(&request, NULL); +} + +static int +tc_add_filter(struct netdev *netdev, int fd, uint32_t parent, const char *name) +{ + struct ofpbuf request; + struct tcmsg *tcmsg; + size_t opts_offset; + int ifindex; + int error; + + ifindex = netdev_vport_get_ifindex(netdev); + + tcmsg = tc_make_request(ifindex, RTM_NEWTFILTER, NLM_F_EXCL | NLM_F_CREATE, + &request); + if (!tcmsg) { + return ENODEV; + } + tcmsg->tcm_handle = tc_make_handle(0, 0x1); + tcmsg->tcm_parent = parent; +#define ETH_P_ALL 0x0003 + tcmsg->tcm_info = tc_make_handle(0, /* preference */ + (OVS_FORCE uint16_t) htons(ETH_P_ALL)); + + nl_msg_put_string(&request, TCA_KIND, "bpf"); + opts_offset = nl_msg_start_nested(&request, TCA_OPTIONS); + nl_msg_put_u32(&request, TCA_BPF_FLAGS, TCA_BPF_FLAG_ACT_DIRECT); + nl_msg_put_u32(&request, TCA_BPF_FD, fd); + nl_msg_put_string(&request, TCA_BPF_NAME, name); + nl_msg_end_nested(&request, opts_offset); + + error = tc_transact(&request, NULL); + if (error) { + return error; + } + + return 0; +} + +/* Attempts to set a BPF filter on the device. Returns 0 if successful, + * otherwise a positive errno value. 
*/ +static int +netdev_vport_set_filter__(struct netdev *netdev_, const struct bpf_prog *prog, + unsigned int OVS_UNUSED valid_bit, int OVS_UNUSED *filter_error, + uint32_t OVS_UNUSED *netdev_filter) +{ + struct netdev_vport OVS_UNUSED *netdev = netdev_vport_cast(netdev_); + const char *netdev_name = netdev_get_name(netdev_); + int error; + + if (!prog) { + return 0; + } + + VLOG_DBG("Setting %s filter %d on %s (handle %08"PRIx32")", prog->name, + prog->fd, netdev_name, prog->handle); + + error = clsact_setup_qdisc(netdev_); + if (error && error != EEXIST) { + VLOG_WARN("%s: clsact qdisc setup failed: %s", + netdev_name, ovs_strerror(error)); + goto out; + } + + error = tc_add_filter(netdev_, prog->fd, prog->handle, prog->name); + if (error){ + VLOG_WARN_RL(&rl, "%s: adding filter %s failed: %s", + netdev_name, prog->name, ovs_strerror(error)); + goto out; + } + +out: + VLOG_INFO("%s %d", __func__, error); + return error; +} + +static int +netdev_vport_set_filter(struct netdev *netdev_, const struct bpf_prog *prog) +{ + struct netdev_vport *netdev = netdev_vport_cast(netdev_); + int error = 0; + + ovs_mutex_lock(&netdev->mutex); + if (!prog || prog->handle == INGRESS_HANDLE) { + error = netdev_vport_set_filter__(netdev_, prog, 0, NULL, NULL); + } + ovs_mutex_unlock(&netdev->mutex); + + VLOG_INFO("%s %d", __func__, error); + + return error; +} + +int bpf_set_link_xdp_fd(int ifindex, int fd, uint32_t flags); + +static int +netdev_vport_set_xdp(struct netdev *netdev_, const struct bpf_prog *prog) +{ + struct netdev_vport *netdev = netdev_vport_cast(netdev_); + int error = 0; + int ifindex; + + ovs_mutex_lock(&netdev->mutex); + ifindex = netdev_vport_get_ifindex(netdev_); + error = bpf_set_link_xdp_fd(ifindex, prog->fd, + XDP_FLAGS_SKB_MODE); + ovs_mutex_unlock(&netdev->mutex); + + VLOG_INFO("%s %d", __func__, error); + + return error; +} + #define NETDEV_VPORT_GET_IFINDEX netdev_vport_get_ifindex #define NETDEV_FLOW_OFFLOAD_API LINUX_FLOW_OFFLOAD_API #else /* 
!__linux__ */ @@ -914,6 +1055,8 @@ netdev_vport_get_ifindex(const struct netdev *netdev_) get_pt_mode, \ \ NULL, /* set_policing */ \ + netdev_vport_set_filter, /* set_filter */ \ + netdev_vport_set_xdp, /* set_xdp */ \ NULL, /* get_qos_types */ \ NULL, /* get_qos_capabilities */ \ NULL, /* get_qos */ \ @@ -972,7 +1115,7 @@ netdev_vport_tunnel_register(void) TUNNEL_CLASS("gre", "gre_sys", netdev_gre_build_header, netdev_gre_push_header, netdev_gre_pop_header, - NULL), + NETDEV_VPORT_GET_IFINDEX), TUNNEL_CLASS("vxlan", "vxlan_sys", netdev_vxlan_build_header, netdev_tnl_push_udp_header, netdev_vxlan_pop_header, diff --git a/lib/netdev.c b/lib/netdev.c index be05dc64024a..c44a1a683b92 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -759,6 +759,13 @@ netdev_get_pt_mode(const struct netdev *netdev) : NETDEV_PT_LEGACY_L2); } +/* Returns a 32-bit hash of the given port number. */ +uint32_t +netdev_hash_port_no(odp_port_t port_no) +{ + return hash_int(odp_to_u32(port_no), 0); +} + /* Sends 'batch' on 'netdev'. Returns 0 if successful (for every packet), * otherwise a positive errno value. Returns EAGAIN without blocking if * at least one the packets cannot be queued immediately. Returns EMSGSIZE @@ -1449,6 +1456,24 @@ netdev_set_policing(struct netdev *netdev, uint32_t kbits_rate, : EOPNOTSUPP); } +/* Attempts to apply (e)BPF filter 'prog' to the netdev. */ +int +netdev_set_filter(struct netdev *netdev, struct bpf_prog *prog) +{ + return (netdev->netdev_class->set_filter + ? netdev->netdev_class->set_filter(netdev, prog) + : EOPNOTSUPP); +} + +/* Attempts to attach XDP (e)BPF program 'prog' to the netdev. */ +int +netdev_set_xdp(struct netdev *netdev, struct bpf_prog *prog) +{ + return (netdev->netdev_class->set_xdp + ? netdev->netdev_class->set_xdp(netdev, prog) + : EOPNOTSUPP); +} + /* Adds to 'types' all of the forms of QoS supported by 'netdev', or leaves it * empty if 'netdev' does not support QoS.
Any names added to 'types' should * be documented as valid for the "type" column in the "QoS" table in diff --git a/lib/netdev.h b/lib/netdev.h index ff1b604b24e2..3388504d85c9 100644 --- a/lib/netdev.h +++ b/lib/netdev.h @@ -59,6 +59,7 @@ extern "C" { * netdev and access each of those from a different thread.) */ +struct bpf_prog; struct dp_packet_batch; struct dp_packet; struct netdev_class; @@ -167,6 +168,7 @@ bool netdev_mtu_is_user_config(struct netdev *); int netdev_get_ifindex(const struct netdev *); int netdev_set_tx_multiq(struct netdev *, unsigned int n_txq); enum netdev_pt_mode netdev_get_pt_mode(const struct netdev *); +uint32_t netdev_hash_port_no(odp_port_t port_no); /* Packet reception. */ int netdev_rxq_open(struct netdev *, struct netdev_rxq **, int id); @@ -316,6 +318,8 @@ struct netdev_queue_stats { int netdev_set_policing(struct netdev *, uint32_t kbits_rate, uint32_t kbits_burst); +int netdev_set_filter(struct netdev *netdev, struct bpf_prog *prog); +int netdev_set_xdp(struct netdev *netdev, struct bpf_prog *prog); int netdev_get_qos_types(const struct netdev *, struct sset *types); int netdev_get_qos_capabilities(const struct netdev *, From patchwork Sat Jul 14 11:38:55 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Tu X-Patchwork-Id: 943916 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="W/mhcTmM"; dkim-atps=neutral Received: from mail.linuxfoundation.org 
(mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41SSSJ0VXFz9ryt for ; Sat, 14 Jul 2018 21:41:36 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 91CCDC87; Sat, 14 Jul 2018 11:39:55 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 40F93C03 for ; Sat, 14 Jul 2018 11:39:53 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-pl0-f67.google.com (mail-pl0-f67.google.com [209.85.160.67]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id 70326793 for ; Sat, 14 Jul 2018 11:39:52 +0000 (UTC) Received: by mail-pl0-f67.google.com with SMTP id e11-v6so339265plb.3 for ; Sat, 14 Jul 2018 04:39:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=Sqb5pzG+9/Gq1raPzu/mdHufqu3SSZKwg6Uug/O8wlM=; b=W/mhcTmMG+d8z8GZDrLyMkpfQC92RkGwBZ0tUSlcwrY+DVbsHWF/dUsh9O8tQWohLP d7EN1bDrTkcjtcRhs/dk6h+DriuCDMVuD8MOF0ySBqZQeB8wkmEXaGFxOdt/dkBcJxuu bms2zvld1EIfRefdJj6Jp8IKzivj1y8n3ZWJ0nu2uLFCWqJY1AJ3OvOWMjHaBIyXevvk QZKyoN8Vsqa+n4AMtv5HpMJv6dKIkASbmz0AHg1RLB3BSRXtgkwELsBn8l5G+irgdqFG TM44lqVbS8hZqmWdpo/Xy6QG+dzhbFXK5T7se+owS5MhYx+hAS13JVNzP5mUhAfUw+wG PnAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=Sqb5pzG+9/Gq1raPzu/mdHufqu3SSZKwg6Uug/O8wlM=; b=hiSAVU/1PyBtgemz2fOeg0/pF7rXjCYWQ2vDuwTUWda1mE7xjrq3pVDWy5/z0sZ+hr TBY1zz1PjpTXQGSVGE+x3qF93A4NHM/8ECWKtXW9XclzxQ7zuotQA293OYffyq7OUvBp 6UGpLDkFwNMA/NYudVni5wbyBE2YNbwseuN8ZAa/NauL4n97utH4R8LO1M0QNHw/shUB 
id1yCo5EIzemkh42uFzEUJ6UXMzTThlTYJc6oRnlOQnkJDlJVfncIL2fU3QOTu4I8qZk HJV9mTYRaFklsfNjzzJOjfjVhwhG5ccg14SMFgFHnmWhsQlHPk1PufmNflubRAkIG9YO XIsg== X-Gm-Message-State: AOUpUlGxRlYhgK5MZz/zeOTsLfMCtYqITOSvZ2sQGDcy5ntzONtuHaCT 7QVDufbxo6ac8OISAC4QGXWhmRo/ X-Google-Smtp-Source: AAOMgpdk7l0hZq8C1iH7pp1JavZfVlACGn+flaVKDZyUQcik7dDhlVzo2x+wCWueFH/evjjpdzEn2g== X-Received: by 2002:a17:902:102b:: with SMTP id b40-v6mr9877252pla.125.1531568391851; Sat, 14 Jul 2018 04:39:51 -0700 (PDT) Received: from sc9-mailhost3.vmware.com (c-73-231-16-221.hsd1.ca.comcast.net. [73.231.16.221]) by smtp.gmail.com with ESMTPSA id m21-v6sm35825267pgv.27.2018.07.14.04.39.50 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sat, 14 Jul 2018 04:39:50 -0700 (PDT) From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Sat, 14 Jul 2018 04:38:55 -0700 Message-Id: <1531568345-80246-4-git-send-email-u9012063@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com> References: <1531568345-80246-1-git-send-email-u9012063@gmail.com> X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCHv2 03/13] lib: implement perf event ringbuffer for upcall. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Joe Stringer A flow missed by the match-action table in eBPF triggers an upcall, which forwards the information to ovs-vswitchd using the skb_perf_event_output helper function. This patch implements the userspace receiving logic.
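For readers unfamiliar with the receive side, the core of it is copying one event out of a circular data area in which an event may wrap past the end of the buffer. The following is only a minimal sketch of that idea, not the code in this patch: 'struct ring' and 'ring_copy' are hypothetical names, the real code works on 'struct perf_event_mmap_page' and 'struct ofpbuf', and the memory fences needed against the kernel producer are left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the data area of a perf mmap ring.  'head' is
 * advanced by the producer and 'tail' by the consumer; both only ever grow
 * and are reduced modulo 'size' to index into 'data'. */
struct ring {
    uint8_t *data;
    uint64_t size;
    uint64_t head;
    uint64_t tail;
};

/* Copies the next 'len' bytes starting at 'tail' into 'out' and advances
 * 'tail'.  Returns 'len' on success, or 0 if fewer than 'len' bytes are
 * available. */
static size_t
ring_copy(struct ring *rb, uint8_t *out, size_t len)
{
    uint64_t off, seg1;

    assert(rb->size > 0);
    if (len > rb->size || rb->head - rb->tail < len) {
        return 0;
    }

    off = rb->tail % rb->size;
    seg1 = rb->size - off;          /* Bytes left before the wrap point. */
    if (seg1 > len) {
        seg1 = len;
    }

    memcpy(out, rb->data + off, seg1);          /* Up to the end. */
    memcpy(out + seg1, rb->data, len - seg1);   /* Remainder from the start. */

    rb->tail += len;
    return len;
}
```

In the patch itself the same two-segment copy is done in perf_event_copy(), with the event length taken from the 'size' field of 'struct perf_event_header'.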
Signed-off-by: Joe Stringer Signed-off-by: William Tu --- lib/perf-event.c | 288 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ lib/perf-event.h | 43 +++++++++ 2 files changed, 331 insertions(+) create mode 100644 lib/perf-event.c create mode 100644 lib/perf-event.h diff --git a/lib/perf-event.c b/lib/perf-event.c new file mode 100644 index 000000000000..c51c936033db --- /dev/null +++ b/lib/perf-event.c @@ -0,0 +1,288 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include +#include "perf-event.h" + +#include +#include +#include +#include +#include +#include +#include +#include + +#include "coverage.h" +#include "openvswitch/util.h" +#include "ovs-atomic.h" + +VLOG_DEFINE_THIS_MODULE(perf_event); + +COVERAGE_DEFINE(perf_lost); +COVERAGE_DEFINE(perf_sample); +COVERAGE_DEFINE(perf_unknown); + +struct perf_event_lost { + struct perf_event_header header; + uint64_t id; + uint64_t lost; +}; + +struct rb_cursor { + struct perf_event_mmap_page *page; + uint64_t head, tail; +}; + +static int +perf_event_open_fd(int *fd_out, int cpu) +{ + struct perf_event_attr attr = { + .type = PERF_TYPE_SOFTWARE, + .size = sizeof(struct perf_event_attr), + .config = PERF_COUNT_SW_BPF_OUTPUT, + .sample_type = PERF_SAMPLE_RAW, + .watermark = 0, + .wakeup_events = 1, + }; + int fd, error; + + fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0); + if (fd < 0) { + error = errno; + VLOG_ERR("failed to open perf events (%s)", ovs_strerror(error)); + return error; + } + + if (ioctl(fd, PERF_EVENT_IOC_RESET, 1) == -1) { + error = errno; + VLOG_ERR("failed to reset perf events (%s)", ovs_strerror(error)); + return error; + } + + *fd_out = fd; + return 0; +} + +int +perf_channel_open(struct perf_channel *channel, int cpu, size_t page_len) +{ + int fd = 0, error; + void *page; + + error = perf_event_open_fd(&fd, cpu); + if (error) { + VLOG_WARN("failed to open perf channel (cpu %d): %s", + cpu, ovs_strerror(error)); + return error; + } + + page = mmap(NULL, page_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (page == MAP_FAILED) { + error = errno; + VLOG_ERR("failed to mmap perf event fd (cpu %d): %s", + cpu, ovs_strerror(error)); + close(fd); + return error; + } + channel->page = page; + channel->cpu = cpu; + channel->fd = fd; + channel->length = page_len; + + return 0; +} + +int +perf_channel_set(struct perf_channel *channel, bool enable) +{ + int request = enable ? 
PERF_EVENT_IOC_ENABLE : PERF_EVENT_IOC_DISABLE; + + if (ioctl(channel->fd, request, 0) == -1) { + return errno; + } + return 0; +} + +void +perf_channel_close(struct perf_channel *channel) +{ + if (ioctl(channel->fd, PERF_EVENT_IOC_DISABLE, 0) == -1) { + int error = errno; + VLOG_ERR("failed to disable perf events (%s)", + ovs_strerror(error)); + } + + if (munmap((void *)channel->page, channel->length)) { + VLOG_WARN("Failed to unmap page for cpu %d: %s", + channel->cpu, ovs_strerror(errno)); + } + if (close(channel->fd)) { + VLOG_WARN("Failed to close page for cpu %d: %s", + channel->cpu, ovs_strerror(errno)); + } + channel->page = NULL; + channel->fd = 0; + channel->length = 0; +} + +static uint8_t * +rb_base(struct rb_cursor *cursor) +{ + return ((uint8_t *)cursor->page) + cursor->page->data_offset; +} + +static uint8_t * +rb_end(struct rb_cursor *cursor) +{ + return rb_base(cursor) + cursor->page->data_size; +} + +static uint64_t +cursor_event_offset(struct rb_cursor *cursor) +{ + return cursor->tail % cursor->page->data_size; +} + +static uint64_t +cursor_end_offset(struct rb_cursor *cursor) +{ + return cursor->head % cursor->page->data_size; +} + +static void * +cursor_peek(struct rb_cursor *cursor) +{ + void *next = rb_base(cursor) + cursor_event_offset(cursor); + void *end = rb_base(cursor) + cursor_end_offset(cursor); + + return (next != end) ? 
next : NULL; +} + +static uint8_t * +event_end(struct perf_event_header *header) +{ + return (uint8_t *)header + header->size; +} + +static bool +init_cursor(struct rb_cursor *cursor, + struct perf_event_mmap_page *page) +{ + uint64_t head = *((volatile uint64_t *)&page->data_head); + uint64_t tail = page->data_tail; + + /* Separate the read of 'data_head' from the read of the ringbuffer data.*/ + atomic_thread_fence(memory_order_consume); + + cursor->page = page; + cursor->head = head; + cursor->tail = tail; + + return head != tail; +} + +static void +perf_event_pull(struct perf_event_mmap_page *page, uint64_t tail) +{ + /* Separate reads in the ringbuffer from the writing of the tail. */ + atomic_thread_fence(memory_order_release); + page->data_tail = tail; +} + +static bool +perf_event_copy(struct rb_cursor *cursor, struct ofpbuf *buffer) +{ + struct perf_event_header *header = cursor_peek(cursor); + + if (!header) { + return false; + } + + ofpbuf_clear(buffer); + if (event_end(header) <= rb_end(cursor)) { + ofpbuf_push(buffer, header, header->size); + } else { + uint64_t seg1_len = rb_end(cursor) - (uint8_t *)header; + uint64_t seg2_len = header->size - seg1_len; + + ofpbuf_put(buffer, header, seg1_len); + ofpbuf_put(buffer, rb_base(cursor), seg2_len); + } + + buffer->header = buffer->data; + cursor->tail += header->size; + + return true; +} + +/* Reads the next full perf event from 'channel' into 'buffer'. + * + * 'buffer' may be reallocated, so the caller must subsequently uninitialize + * it. 'buf->header' will be updated to point to the beginning of the event, + * which starts with a 'struct perf_event_header'. + * + * Returns 0 if there is a new OVS event, otherwise a positive errno value. + * Returns EAGAIN if there are no new events. 
+ */ +int +perf_channel_read(struct perf_channel *channel, struct ofpbuf *buffer) +{ + struct rb_cursor cursor; + int error = EAGAIN; + + if (!init_cursor(&cursor, channel->page)) { + return error; + } + + if (perf_event_copy(&cursor, buffer)) { + struct perf_event_header *header = buffer->header; + + switch (header->type) { + case PERF_RECORD_SAMPLE: + /* Success! */ + COVERAGE_INC(perf_sample); + error = 0; + break; + case PERF_RECORD_LOST: { + struct perf_event_lost *e = buffer->header; + COVERAGE_ADD(perf_lost, e->lost); + error = ENOBUFS; + break; + } + default: + COVERAGE_INC(perf_unknown); + error = EPROTO; + break; + } + + perf_event_pull(channel->page, cursor.tail); + } + + return error; +} + +void +perf_channel_flush(struct perf_channel *channel) +{ + struct perf_event_mmap_page *page = channel->page; + uint64_t head = *((volatile uint64_t *)&page->data_head); + + /* The memory_order_consume fence is unnecessary when we don't read any + * of the data from the ringbuffer - see perf_output_put_handle(). + * However, we still need to order the above read wrt to the tail write. */ + perf_event_pull(page, head); +} diff --git a/lib/perf-event.h b/lib/perf-event.h new file mode 100644 index 000000000000..74bc8e961dbc --- /dev/null +++ b/lib/perf-event.h @@ -0,0 +1,43 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#ifndef PERF_EVENT_H +#define PERF_EVENT_H 1 + +#include +#include "openvswitch/ofpbuf.h" +#include "openvswitch/types.h" + +struct perf_event_raw { + struct perf_event_header header; + uint32_t size; + /* Followed by uint8_t data[size]; */ +}; + +struct perf_channel { + struct perf_event_mmap_page *page; + int cpu; + int fd; + size_t length; +}; + +int perf_channel_open(struct perf_channel *, int cpu, size_t page_len); +int perf_channel_set(struct perf_channel *channel, bool enable); +int perf_channel_read(struct perf_channel *, struct ofpbuf *); +void perf_channel_flush(struct perf_channel *); +void perf_channel_close(struct perf_channel *); + +#endif /* PERF_EVENT_H */ From patchwork Sat Jul 14 11:38:56 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Tu X-Patchwork-Id: 943917 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="H7NVONnB"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41SST52NrSz9ryt for ; Sat, 14 Jul 2018 21:42:17 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 883C4C97; Sat, 14 Jul 2018 11:39:57 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: 
ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id B8726C75 for ; Sat, 14 Jul 2018 11:39:54 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-pl0-f68.google.com (mail-pl0-f68.google.com [209.85.160.68]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id A857B67E for ; Sat, 14 Jul 2018 11:39:53 +0000 (UTC) Received: by mail-pl0-f68.google.com with SMTP id 31-v6so13290086plc.4 for ; Sat, 14 Jul 2018 04:39:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=6l1WI/wpybQ2MjKeZ2Ed2PLmwSId9oeH45iJYKTCrTs=; b=H7NVONnBPotnKTvmyCAfq4rWC8e/3CvCyM4qwGNJg/gx815fsPjl5B3Ct1x2ZVc9S1 oqeFGO5IlzYy+O69bQodXm/kH+YQ0HKJxVHAP4o3qR8oxYSftGYZQa68gwdkaXYtSyez 5fbbPPkqB8mPCwu/sgp4xKUXBUng4heD7iyP4Jpf6bEEn64xp3yceXRxTrd3lrCmng7a WYqMKVM/jeR/MDI/8Sp3nM+3WjpUTMLpDfBkFuXvNDUDxTaVkoXxkrxFgWybT6hklatk 71vAyknvDuhLqgJ+VREY/epObLsQk2iApRGxU16r1LUVcKI9rh3SQIieuVdQUbpTYYL9 Ekag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=6l1WI/wpybQ2MjKeZ2Ed2PLmwSId9oeH45iJYKTCrTs=; b=EWuVSDyb8JLBUqEDBSG4tY/MG75cOCF4kyNXnft1RqJiewA6otrYGy2RT+uey271qN 06nnOaYYS06HaNGLfCZ2B3vdyZOKcoYZVQLnmNpY3kxnfM1bvLhIurEckzwERe7qfS9o 7S2RUoqf11n/GUK6iL/3SDX6pJd/vZY5jHKwo0Rbyc1NZvdHbS6FJAXH69IzuuXW85mP LPlS/D4c0KGyqj9bufXiRZQin1C9ggYfV2lInd9V9iKL4RLI21+sHOWJ4KJv1IYRBQEA 4iwoU9+Jg5mqppsTu5M/HRa3KmOwxwguZf7n9/CZl3XgoTNHBJlLbVwhLoVKKwvi5uv8 winA== X-Gm-Message-State: AOUpUlHVY9uY2hNX9NHL9mKiB0qdPWVoqaUKHTs4HNecHoEDUeT459fY P/JrX86L1MdqTxW2CF7gy+cT5rga X-Google-Smtp-Source: AAOMgpdITYCFloN8UYUgO7iSDO3d0F1dYGwYbMWmOQldp6shsc/VPSX4lsAqhsD2Ulp94yLycFmhsw== X-Received: by 2002:a17:902:292a:: with SMTP id g39-v6mr4630264plb.174.1531568392847; Sat, 14 Jul 2018 
04:39:52 -0700 (PDT) Received: from sc9-mailhost3.vmware.com (c-73-231-16-221.hsd1.ca.comcast.net. [73.231.16.221]) by smtp.gmail.com with ESMTPSA id m21-v6sm35825267pgv.27.2018.07.14.04.39.51 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sat, 14 Jul 2018 04:39:52 -0700 (PDT) From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Sat, 14 Jul 2018 04:38:56 -0700 Message-Id: <1531568345-80246-5-git-send-email-u9012063@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com> References: <1531568345-80246-1-git-send-email-u9012063@gmail.com> X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCHv2 04/13] lib/bpf: add support for managing bpf program/map. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Joe Stringer Through libbpf, the patch adds support for loading BPF programs and maps, pinning them to /sys/fs/bpf/ovs/, managing the file descriptor of each loaded map, and printing their state. Signed-off-by: Joe Stringer Co-authored-by: William Tu Co-authored-by: Yifeng Sun --- lib/bpf.c | 524 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ lib/bpf.h | 69 +++++++++ 2 files changed, 593 insertions(+) create mode 100644 lib/bpf.c create mode 100644 lib/bpf.h diff --git a/lib/bpf.c b/lib/bpf.c new file mode 100644 index 000000000000..48c677e54659 --- /dev/null +++ b/lib/bpf.c @@ -0,0 +1,524 @@ +/* + * Copyright (c) 2016 Nicira, Inc. 
+ * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "bpf.h" +#include "bpf/odp-bpf.h" +#include "util.h" +#include "openvswitch/dynamic-string.h" +#include "openvswitch/vlog.h" + +#define BPF_FS_PATH "/sys/fs/bpf/ovs/" +static const char *ovs_bpf_path = BPF_FS_PATH; + +#define MAX_BPF_PROG_ARRAY 64 //FIXME +VLOG_DEFINE_THIS_MODULE(bpf); + +static void +bpf_format_prog(struct ds *ds, const struct bpf_prog *prog) +{ + ds_put_format(ds, " %s:\n", prog->name); + ds_put_format(ds, " handle: %08"PRIx32"\n", prog->handle); +} + +typedef void map_element_writer_t(struct ds *, uint64_t, void *); + +static void +format_dp_stats(struct ds *ds, uint64_t key, void *value_) +{ + uint64_t value = *(uint64_t *)value_; + + switch (key) { + case OVS_DP_STATS_UNSPEC: + while (ds_chomp(ds, ' ')) { + /* nom nom nom */ + } + break; + case OVS_DP_STATS_HIT: + ds_put_cstr(ds, "hit"); + break; + case OVS_DP_STATS_MISSED: + ds_put_cstr(ds, "missed"); + break; + case OVS_DP_STATS_LOST: + ds_put_cstr(ds, "lost"); + break; + case OVS_DP_STATS_FLOWS: + ds_put_cstr(ds, "flows"); + break; + case OVS_DP_STATS_MASK_HIT: + ds_put_cstr(ds, "masks_hit"); + break; + case OVS_DP_STATS_MASKS: + ds_put_cstr(ds, "masks"); + break; + case OVS_DP_STATS_ERRORS: + ds_put_cstr(ds, "errors"); + break; + default: + ds_put_format(ds, 
"unknown-%"PRIu64, key); + break; + } + if (key) { + ds_put_format(ds, ": %"PRIu64"\n", value); + } +} + +static void +format_upcalls(struct ds *ds, uint64_t key, void *value OVS_UNUSED) +{ + ds_put_format(ds, "cpu-%"PRIu64"\n", key); +} + +static void +format_tailcalls(struct ds *ds, uint64_t key, void *value_) +{ + uint32_t value = *(uint32_t *)value_; + ds_put_format(ds, "index-%"PRIu64"prog_fd-%d\n", key, value); +} + +static int +lookup_elem(int fd, void *key, size_t key_len, void *value) +{ + int err = bpf_map_lookup_elem(fd, (uint64_t *)key, (uint64_t *)value); + if (err) { + struct ds ds = DS_EMPTY_INITIALIZER; + + ds_put_cstr(&ds, "error occurred looking up elem "); + ds_put_hex(&ds, key, key_len); + ds_put_format(&ds, ": %s", ovs_strerror(errno)); + VLOG_DBG("%s", ds_cstr(&ds)); + ds_destroy(&ds); + } + + return err; +} + +#define MAP_FORMAT_FUNC(NAME, KTYPE, VTYPE, PRINT_COUNT) \ + static void NAME(struct ds *ds, const struct bpf_map *map, \ + map_element_writer_t fmt) \ + { \ + KTYPE key = 0; \ + VTYPE value; \ + int count = 0; \ + \ + VLOG_DBG("reading map %s", map->name); \ + ds_put_format(ds, " %s:\n", map->name); \ + if (!lookup_elem(map->fd, &key, sizeof key, &value)) { \ + count++; \ + if (fmt) { \ + ds_put_cstr(ds, " "); \ + fmt(ds, key, &value); \ + } \ + } \ + while (!bpf_map_get_next_key(map->fd, &key, &key)) { \ + count++; \ + if (fmt) { \ + if (!lookup_elem(map->fd, &key, sizeof key, &value)) { \ + ds_put_cstr(ds, " "); \ + fmt(ds, key, &value); \ + } \ + } \ + }; \ + if (PRINT_COUNT) { \ + ds_put_format(ds, " count: %d\n", count); \ + } \ + } + +MAP_FORMAT_FUNC(bpf_format_map_stats, uint64_t, uint64_t, false); +MAP_FORMAT_FUNC(bpf_format_map_flows, uint64_t, struct bpf_flow, true); +MAP_FORMAT_FUNC(bpf_format_map_upcalls, uint32_t, uint32_t, true); +MAP_FORMAT_FUNC(bpf_format_map_tailcalls, uint32_t, uint32_t, true);//FIXME +//MAP_FORMAT_FUNC(bpf_format_map_dp_flow_stats, + +void +bpf_format_state(struct ds *ds, struct bpf_state *state) +{ 
+ ds_put_format(ds, "path: %s\n", ovs_bpf_path); + ds_put_cstr(ds, "maps:\n"); + bpf_format_map_stats(ds, &state->datapath_stats, format_dp_stats); + bpf_format_map_flows(ds, &state->flow_table, NULL); + bpf_format_map_upcalls(ds, &state->upcalls, format_upcalls); + bpf_format_map_tailcalls(ds, &state->tailcalls, format_tailcalls); + //bpf_format_map_dp_flow_stats(ds, &state->dp_flow_stats, NULL); + ds_put_cstr(ds, "programs:\n"); + bpf_format_prog(ds, &state->downcall); + bpf_format_prog(ds, &state->egress); + bpf_format_prog(ds, &state->ingress); + bpf_format_prog(ds, &state->xdp); +} + +/* Populates 'state' with the standard set of programs and maps for openvswitch + * datapath as sourced from pinned programs at ovs_bpf_path. + * + * Returns 0 on success, or positive errno on error. If successful, the caller + * is responsible for releasing the resources in 'state' via bpf_put(). + */ +int +bpf_get(struct bpf_state *state, bool verbose) +{ + const struct { + int *fd; + const char *path; + } objs[] = { + /* BPF Programs */ + {&state->ingress.fd, "ingress/0"}, + {&state->egress.fd, "egress/0"}, + {&state->downcall.fd, "downcall/0"}, + {&state->xdp.fd, "xdp/0"}, + /* BPF Maps */ + {&state->upcalls.fd, "upcalls"}, + {&state->flow_table.fd, "flow_table"}, + {&state->datapath_stats.fd, "datapath_stats"}, + {&state->tailcalls.fd, "tailcalls"}, + {&state->execute_actions.fd, "execute_actions"}, + {&state->dp_flow_stats.fd, "dp_flow_stats"}, + }; + int i, k, error = 0; + char buf[BUFSIZ]; + int prog_array_fd; + + for (i = 0; i < ARRAY_SIZE(objs); i++) { + struct stat s; + + //Failed to load /sys/fs/bpf/ovs/progs/ingress_0: + snprintf(buf, ARRAY_SIZE(buf), "%s/%s", ovs_bpf_path, objs[i].path); + if (stat(buf, &s)) { + error = errno; + break; + } + error = bpf_obj_get(buf); + if (error > 0) { + VLOG_DBG("Loaded BPF object at %s fd %d", buf, error); + *objs[i].fd = error; + error = 0; + continue; + } else { + error = errno; + break; + } + } + + prog_array_fd = 
state->tailcalls.fd; + + VLOG_DBG("start loading/pinning program array\n"); + for (k = 0; k < BPF_MAX_PROG_ARRAY; k++) { + struct stat s; + int prog_fd; + + state->tailarray[k].fd = 0; + + snprintf(buf, ARRAY_SIZE(buf), "%s/tail-%d/0", ovs_bpf_path, k); + if (stat(buf, &s)) { + continue; + } + + prog_fd = bpf_obj_get(buf); + if (prog_fd > 0) { + VLOG_DBG("Loaded BPF object at %s", buf); + state->tailarray[k].fd = prog_fd; + error = bpf_map_update_elem(prog_array_fd, &k, &prog_fd, BPF_ANY); + if (error < 0) { + VLOG_ERR("Can not add %s into BPF_MAP_PROG_ARRAY\n", buf); + break; + } + } else { + error = errno; + break; + } + } + + if (error) { + VLOG(verbose ? VLL_WARN : VLL_DBG, "Failed to load %s: %s", + buf, ovs_strerror(error)); + + for (int j = 0; j < i; j++) { + close(*objs[j].fd); + *objs[j].fd = 0; + } + + for (int j = 0; j < BPF_MAX_PROG_ARRAY; j++) { + if (state->tailarray[j].fd) + close(state->tailarray[j].fd); + } + } + + if (!error) { + state->ingress.handle = INGRESS_HANDLE; + state->ingress.name = xstrdup("ovs_cls_ingress"); + state->egress.handle = EGRESS_HANDLE; + state->egress.name = xstrdup("ovs_cls_egress"); + state->downcall.handle = INGRESS_HANDLE; + state->downcall.name = xstrdup("ovs_cls_downcall"); + state->upcalls.name = xstrdup("upcalls"); + state->xdp.name = xstrdup("xdp"); + state->flow_table.name = xstrdup("flow_table"); + state->datapath_stats.name = xstrdup("datapath_stats"); + state->dp_flow_stats.name = xstrdup("dp_flow_stats"); + // add parser, lookup, action, deparser + state->tailcalls.name = xstrdup("tailcalls"); + + } + + return error; +} + +static void +xclose(int fd, const char *name) +{ + int error = close(fd); + if (error) { + VLOG_WARN("Failed to close BPF fd %s: %s", name, ovs_strerror(errno)); + } +} + +/* Frees resources allocated by bpf_get(). 
*/ +void +bpf_put(struct bpf_state *state) +{ + xclose(state->ingress.fd, state->ingress.name); + xclose(state->egress.fd, state->egress.name); + xclose(state->downcall.fd, state->downcall.name); + xclose(state->upcalls.fd, state->upcalls.name); + xclose(state->xdp.fd, state->xdp.name); + xclose(state->flow_table.fd, "ovs_map_flow_table"); + xclose(state->datapath_stats.fd, "ovs_datapath_stats"); + xclose(state->dp_flow_stats.fd, state->dp_flow_stats.name); + free((void *)state->ingress.name); + free((void *)state->egress.name); + free((void *)state->downcall.name); + free((void *)state->upcalls.name); + free((void *)state->xdp.name); + free((void *)state->flow_table.name); + free((void *)state->datapath_stats.name); + free((void *)state->dp_flow_stats.name); +} + +static void +process(struct bpf_object *obj) +{ + struct bpf_program *prog; + struct bpf_map *map; + + VLOG_DBG("Opened object '%s'\n", bpf_object__name(obj)); + VLOG_DBG("Programs:\n"); + bpf_object__for_each_program(prog, obj) { + const char *title = bpf_program__title(prog, false); + int error; + + VLOG_DBG(" - %s\n", title); + if (strstr(title, "xdp")) { + error = bpf_program__set_xdp(prog); + } else { + error = bpf_program__set_sched_cls(prog); // or sched_act? + } + if (error) { + VLOG_WARN("Failed to set '%s' prog type: %s\n", title, + ovs_strerror(error)); + } + + } + + if (VLOG_IS_DBG_ENABLED()) { + VLOG_DBG("Maps:\n"); + bpf_map__for_each(map, obj) { + const char *name = bpf_map__name(map); + VLOG_DBG(" - %s\n", name); + } + } +} + +/* Attempts to load the BPF datapath in the form of an ELF compiled for the BPF + * ISA in 'path', install it into the kernel, and pin it to the filesystem + * under ovs_bpf_path/{maps,progs}/foo. + * + * Returns 0 on success, or positive errno on error. 
+ */ +int +bpf_load(const char *path) +{ + const char *stage = NULL; + struct bpf_state state; + struct bpf_object *obj; + long error; + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; + + if ((error = setrlimit(RLIMIT_MEMLOCK, &r))) { + VLOG_ERR("Failed to set rlimit %s", ovs_strerror(error)); + return error; + } + + if (!bpf_get(&state, false)) { + /* XXX: Restart; Upgrade */ + VLOG_INFO("Re-using preloaded BPF datapath"); + bpf_put(&state); + return 0; + } + + obj = bpf_object__open(path); + error = libbpf_get_error(obj); + if (error) { + stage = "open"; + goto out; + } + process(obj); + error = bpf_object__load(obj); + if (error) { + stage = "load"; + goto close; + } + error = bpf_object__pin(obj, ovs_bpf_path); + if (error) { + stage = "pin"; + goto close; + } + + error = bpf_object__unload(obj); + if (error) { + stage = "unload"; + goto close; + } + +close: + bpf_object__close(obj); +out: + if (error < 0) { + error = -error; + } else if (!error) { + VLOG_DBG("Loaded BPF datapath from %s", path); + } + if (error > __LIBBPF_ERRNO__START && error < __LIBBPF_ERRNO__END) { + char buf[BUFSIZ]; + + libbpf_strerror(error, buf, ARRAY_SIZE(buf)); + VLOG_WARN("Failed to %s BPF datapath: %s\n", stage ? stage : "", buf); + error = EINVAL; + } + return error; +} + +#define PRINT_FN(NAME) \ +static int \ +print_##NAME(const char *fmt, ...) 
\ +{ \ + va_list args; \ + \ + va_start(args, fmt); \ + vlog_valist(&this_module, VLL_##NAME, fmt, args); \ + va_end(args); \ + return 0; \ +} + +PRINT_FN(WARN); +PRINT_FN(INFO); +PRINT_FN(DBG); + +#define stringize(x) #x + +static int OVS_UNUSED +mount_bpf(void) +{ + struct statfs st_fs; + char path[PATH_MAX]; + char type[NAME_MAX]; + int err = 0; + FILE *fp; + int idx; + + fp = fopen("/proc/mounts", "r"); + if (fp) { + const char *fmt; + int match; + + fmt = "%*s %"stringize(PATH_MAX)"s %#"stringize(NAME_MAX)"s %*s\n"; + for (match = 0; match != EOF; match = fscanf(fp, fmt, path, type)) { + if (match == 2 && !strcmp(type, "bpf")) + break; + } + if (fclose(fp)) { + err = errno; + VLOG_INFO("Failed to close /proc/mounts: %s", ovs_strerror(err)); + } + if (strcmp(type, "bpf")) { + err = errno; + VLOG_DBG("Couldn't find bpf mountpoint in /proc/mounts"); + } + } else { + err = errno; + VLOG_INFO("Cannot open /proc/mounts: %s", ovs_strerror(err)); + } + if (err || strlen(path) == 0) { + VLOG_DBG("Using %s for BPF filesystem mountpoint", BPF_FS_PATH); + strcpy(path, BPF_FS_PATH); + } + + if (!statfs(path, &st_fs) && st_fs.f_type == BPF_FS_MAGIC) { + VLOG_INFO("BPF filesystem already mounted to %s", path); + return 0; + } + + if (mkdir(path, 0755) && errno != EEXIST) { + VLOG_WARN("Failed to create %s: %s", path, ovs_strerror(errno)); + return errno; + } + + if (mount("bpf", path, "bpf", 0, NULL)) { + VLOG_WARN("Failed to mount BPF filesystem: %s", ovs_strerror(errno)); + return errno; + } + + idx = strlen(path); + if (idx >= PATH_MAX - strlen("/ovs")) { + VLOG_WARN("BPF filesystem path \"%s\" is too long.", path); + return ENAMETOOLONG; + } else { + strncpy(&path[idx], "/ovs", strlen("/ovs")); + } + + if (mkdir(path, 0755) && errno != EEXIST) { + VLOG_WARN("Failed to create %s: %s", path, ovs_strerror(errno)); + return errno; + } + + if (ovs_bpf_path) { + free(CONST_CAST(char *, ovs_bpf_path)); + } + ovs_bpf_path = xstrdup(path); + return 0; +} + +int +bpf_init(void) +{ 
+ libbpf_set_print(print_WARN, print_INFO, print_DBG); + /* skip using mount_bpf */ + return 0; +} diff --git a/lib/bpf.h b/lib/bpf.h new file mode 100644 index 000000000000..4b5afaf4f77f --- /dev/null +++ b/lib/bpf.h @@ -0,0 +1,69 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef LIB_BPF_H +#define LIB_BPF_H 1 + +#include +#include "openvswitch/compiler.h" + +#define INGRESS_HANDLE 0xFFFFFFF2 +#define EGRESS_HANDLE 0xFFFFFFF3 + +struct bpf_prog { + const char *name; + uint32_t handle; /* tc handle */ + int fd; +}; + +struct bpf_map { + const char *name; + int fd; +}; + +#if HAVE_BPF +struct bpf_state; +struct ds; + +#define BPF_MAX_PROG_ARRAY 64 +struct bpf_state { + /* File descriptors for programs. 
*/ + struct bpf_prog ingress; /* BPF_PROG_TYPE_SCHED_CLS */ + struct bpf_prog egress; /* BPF_PROG_TYPE_SCHED_CLS */ + struct bpf_prog downcall; /* BPF_PROG_TYPE_SCHED_CLS */ + struct bpf_prog tailarray[BPF_MAX_PROG_ARRAY]; + struct bpf_prog xdp; /* BPF_PROG_TYPE_XDP */ + // william: struct bpf_prog parser, deparser, action, + + struct bpf_map upcalls; /* BPF_MAP_TYPE_PERF_EVENT_ARRAY */ + struct bpf_map flow_table; /* BPF_MAP_TYPE_HASH */ + struct bpf_map datapath_stats; /* BPF_MAP_TYPE_ARRAY */ + struct bpf_map tailcalls; /* BPF_MAP_TYPE_PROG_ARRAY */ + struct bpf_map execute_actions; /* BPF_MAP_TYPE_ARRAY */ + struct bpf_map dp_flow_stats; /* BPF_MAP_TYPE_HASH */ +}; + +int bpf_get(struct bpf_state *state, bool verbose); +void bpf_put(struct bpf_state *state); +int bpf_load(const char *path); +int bpf_init(void); +void bpf_format_state(struct ds *ds, struct bpf_state *state); +#else /* !HAVE_BPF */ +static inline int bpf_load(const char *path OVS_UNUSED) { return EOPNOTSUPP; } +static inline int bpf_init(void) { return 0; } +#endif /* HAVE_BPF */ + +#endif /* LIB_BPF_H */ From patchwork Sat Jul 14 11:38:57 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Tu X-Patchwork-Id: 943918 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="B13tYcYE"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher 
ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41SSTy0RBvz9ryt for ; Sat, 14 Jul 2018 21:43:02 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 74CAFCB2; Sat, 14 Jul 2018 11:40:01 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id D3D68BC8 for ; Sat, 14 Jul 2018 11:39:59 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-pl0-f68.google.com (mail-pl0-f68.google.com [209.85.160.68]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id E3A0A795 for ; Sat, 14 Jul 2018 11:39:56 +0000 (UTC) Received: by mail-pl0-f68.google.com with SMTP id o7-v6so2919178plk.10 for ; Sat, 14 Jul 2018 04:39:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=bhkdJbGX+jEMQiBzVj/maTi47CTwwpDxJwIoeW4quHw=; b=B13tYcYEEro1s6NWq3DS0yVQUlVxPcWpNi9C6GMwq4PA3UJHFHgaD88u6zO4/sbhkM R6qZKIHUv7wNIR2WjkwxHDbfDJJtMUGznAyxIysWdHDTNzLPnGKBODvs/haLUTk+EvxS sWUbdOMq93aJZ+u1tiSk74B20wzgM19EeE6zU58cBNyhFjdyB4qKHFp4eDbfjTnAETV2 VXxdnQIT5cTAVSYGpv+gRv61d/YdPlLeaSUO0+7aSe7Ko7ilNo0jyPgCpcOCktPMxJCN pQQ3RzFi3YvjdYKo+vQOKeaKQ2nhm/f6Xw/YttsPI/FVW5Finht2IqEFlA/CQ33dthaf xvgg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=bhkdJbGX+jEMQiBzVj/maTi47CTwwpDxJwIoeW4quHw=; b=AYHbAUFKVUsE81vez3LVR71gBh0SupAiiK0HXB4MEUymp+lTgAsS81ybXqbdQZ1OoN Gv1lAx1YAdBfnFihmbXrDQQSba6lPgCopaTaFBJxouozYZ8HIHc0T0syM56306t3RE5b HXoDYGEBRuF0KnQhLGmH0WbQCYDjbzFIwtcAaua9hm7ZFMJRhM3gesDNt3IagPIv9ZhO LuPYjMSpmvRifTQ2sDtLgaJZJa+U3wU0w6UbZzm7W+0/OfRo+Rv/PcXC0dRD0ct0ErRK 
RM8mKqTbNcRxPBUdd02jsvvZk1Eapwdvo7KFz6Ut5CbFNgZHSIhsNcS2ZY+zSylxPLMh w8Gw== X-Gm-Message-State: AOUpUlEJOO54oSlYp9IGY5Xy7oTu2b/dx+uu8UkPeTthTHE29mU0kTqY zUelYjhFJj6dVWfbEp37bigfHC4S X-Google-Smtp-Source: AAOMgpcwDOMzmQntgUxGeNlxQQUfGs6k6pCFbuVJkeoDVj09Uo4FbaCqGzWyhO4quA+NyEmVi5iT3w== X-Received: by 2002:a17:902:e3:: with SMTP id a90-v6mr10017501pla.227.1531568395659; Sat, 14 Jul 2018 04:39:55 -0700 (PDT) Received: from sc9-mailhost3.vmware.com (c-73-231-16-221.hsd1.ca.comcast.net. [73.231.16.221]) by smtp.gmail.com with ESMTPSA id m21-v6sm35825267pgv.27.2018.07.14.04.39.52 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sat, 14 Jul 2018 04:39:54 -0700 (PDT) From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Sat, 14 Jul 2018 04:38:57 -0700 Message-Id: <1531568345-80246-6-git-send-email-u9012063@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com> References: <1531568345-80246-1-git-send-email-u9012063@gmail.com> X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE, T_FILL_THIS_FORM_SHORT autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCHv2 05/13] dpif: add 'dpif-bpf' provider. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Joe Stringer Implement a new datapath interface for use with BPF datapaths. Like dpif-netlink, dpif-bpf is backed by an implementation which resides within the kernel. It uses the BPF functionality available in recent versions of Linux to create the datapath. 
Unlike dpif-netlink, there is no datapath notion of a bridge with ports
attached; dpif-bpf is implemented by attaching BPF programs directly to
individual devices using TC. Upcalls are implemented using a perf event
ringbuffer, which is polled by handler threads. Flow execution is
implemented by sending the packet plus metadata on a dedicated tap
device, where there is a BPF program that understands the format of the
packet coming from userspace. When this device receives a message, it
strips the metadata, uses it to determine how to execute the packet,
then forwards the packet onwards.

This initial implementation has a number of limitations which are
expected to go away over time:

* The set of matches and actions supported by the datapath is not as
  wide as the full set known by OVS, so if a flow cannot be expressed in
  the current eBPF API, OVS will log errors and return errors during
  flow put.
* Only the input port and packet length are passed as metadata from the
  datapath to userspace during upcall. Key extraction is done purely
  from the packet provided from the datapath.
* Conversely, only the output port is sent down during execution. No
  other actions are supported currently, and only one output is
  supported.
* Ingress policing cannot be configured on BPF datapath devices.
* On startup, if the OVS BPF datapath is already loaded into the kernel
  and pinned to the filesystem, OVS reuses that datapath, even if it is
  out of date.

Documentation/intro/install/bpf.rst contains further information on how
to build and use the bpf datapath.
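
As a rough illustration of the flow-execution path described above
(userspace prepends metadata to the packet, the program on the tap device
strips it again), the following C sketch frames and unframes a packet.
Everything in it is an assumption made for illustration: struct
downcall_md, downcall_encode() and downcall_decode() are hypothetical
names, the metadata carries only a single output port (matching the
limitation listed above), and the real metadata layout used by the
datapath is not reproduced here and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata header placed in front of each packet that
 * userspace sends down for execution.  Only one output port is carried. */
struct downcall_md {
    uint32_t output_port;
};

/* Userspace side: writes 'md' followed by the packet into 'out'.
 * Returns the total frame length, or 0 if 'out' is too small. */
static size_t
downcall_encode(uint32_t output_port, const uint8_t *pkt, size_t pkt_len,
                uint8_t *out, size_t out_size)
{
    struct downcall_md md = { .output_port = output_port };

    if (out_size < sizeof md || pkt_len > out_size - sizeof md) {
        return 0;
    }
    memcpy(out, &md, sizeof md);
    memcpy(out + sizeof md, pkt, pkt_len);
    return sizeof md + pkt_len;
}

/* Receiving side: copies the metadata out of 'frame' and returns a pointer
 * to the packet that follows it, storing its length in '*pkt_len'.
 * Returns NULL if the frame is too short to hold the metadata. */
static const uint8_t *
downcall_decode(const uint8_t *frame, size_t frame_len,
                struct downcall_md *md, size_t *pkt_len)
{
    assert(frame != NULL && md != NULL && pkt_len != NULL);

    if (frame_len < sizeof *md) {
        return NULL;
    }
    memcpy(md, frame, sizeof *md);
    *pkt_len = frame_len - sizeof *md;
    return frame + sizeof *md;
}
```

In this sketch the receiver would look at md.output_port to decide where
to forward the stripped packet; a real BPF program would perform the
equivalent steps on the skb rather than on a flat buffer.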
For more details on the design and implementation, see our OSR paper: [1] https://dl.acm.org/citation.cfm?id=3139657 [2] http://openvswitch.org/support/ovscon2016/7/1120-tu.pdf Signed-off-by: Joe Stringer Signed-off-by: William Tu Signed-off-by: Yifeng Sun Co-authored-by: William Tu Co-authored-by: Yifeng Sun --- lib/dpif-bpf.c | 1996 +++++++++++++++++++++++++++++++++++++++++++++++++++ lib/dpif-provider.h | 1 + lib/dpif.c | 3 + 3 files changed, 2000 insertions(+) create mode 100644 lib/dpif-bpf.c diff --git a/lib/dpif-bpf.c b/lib/dpif-bpf.c new file mode 100644 index 000000000000..34f98465ba4f --- /dev/null +++ b/lib/dpif-bpf.c @@ -0,0 +1,1996 @@ +/* + * Copyright (c) 2016, 2017, 2018 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include + +#include +#include +#include +#include +#include +#include + +#include "bpf.h" +#include "bpf/odp-bpf.h" +#include "dirs.h" +#include "dpif.h" +#include "dpif-provider.h" +#include "dpif-bpf-odp.h" +#include "dpif-netlink-rtnl.h" +#include "fat-rwlock.h" +#include "netdev.h" +#include "netdev-provider.h" +#include "netdev-vport.h" +#include "odp-util.h" +#include "ovs-numa.h" +#include "perf-event.h" +#include "sset.h" +#include "openvswitch/poll-loop.h" + +VLOG_DEFINE_THIS_MODULE(dpif_bpf); +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(60, 60); + +/* Protects against changes to 'bpf_datapaths'. 
*/ +static struct ovs_mutex bpf_datapath_mutex = OVS_MUTEX_INITIALIZER; + +/* Contains all 'struct dpif_bpf_dp's. */ +static struct shash bpf_datapaths OVS_GUARDED_BY(bpf_datapath_mutex) + = SHASH_INITIALIZER(&bpf_datapaths); + +struct bpf_handler { + /* Into owning dpif_bpf_dp->channels */ + int offset; + int count; + int index; /* next channel to use */ +}; + +struct dpif_bpf_dp { + struct dpif *dpif; + const char *const name; + struct ovs_refcount ref_cnt; + atomic_flag destroyed; + + /* Ports. + * + * Any lookup into 'ports' requires taking 'port_mutex'. */ + struct ovs_mutex port_mutex; + struct hmap ports_by_odp OVS_GUARDED; + struct hmap ports_by_ifindex OVS_GUARDED; + struct seq *port_seq; /* Incremented whenever a port changes. */ + uint64_t last_seq; + + /* Handlers */ + struct fat_rwlock upcall_lock; + uint32_t n_handlers; + struct bpf_handler *handlers; + + /* Upcall channels. */ + size_t page_size; + int n_pages; + int n_channels; + struct perf_channel channels[]; +}; + +struct dpif_bpf { + struct dpif dpif; + struct dpif_bpf_dp *dp; +}; + +struct dpif_bpf_port { + struct hmap_node odp_node; /* Node in dpif_bpf_dp 'ports_by_odp'. */ + struct hmap_node if_node; /* Node in dpif_bpf_dp 'ports_by_ifindex'. */ + struct netdev *netdev; + odp_port_t port_no; + int ifindex; + char *type; /* Port type as requested by user. 
*/ + struct netdev_saved_flags *sf; + + unsigned n_rxq; + struct netdev_rxq **rxqs; +}; + +static void vlog_hex_dump(const u8 *buf, size_t count) +{ + struct ds ds = DS_EMPTY_INITIALIZER; + ds_put_hex_dump(&ds, buf, count, 0, false); + VLOG_INFO("\n%s", ds_cstr(&ds)); + ds_destroy(&ds); +} + +int create_dp_bpf(const char *name, struct dpif_bpf_dp **dp); +static void dpif_bpf_close(struct dpif *dpif); +static int do_add_port(struct dpif_bpf_dp *dp, const char *devname, + const char *type, odp_port_t port_no) + OVS_REQUIRES(dp->port_mutex); +static void do_del_port(struct dpif_bpf_dp *dp, struct dpif_bpf_port *port) + OVS_REQUIRES(dp->port_mutex); +static int dpif_bpf_delete_all_flow(void); + +static struct dpif_bpf * +dpif_bpf_cast(const struct dpif *dpif) +{ + ovs_assert(dpif->dpif_class == &dpif_bpf_class); + return CONTAINER_OF(dpif, struct dpif_bpf, dpif); +} + +static struct dpif_bpf_dp * +get_dpif_bpf_dp(const struct dpif *dpif) +{ + return dpif_bpf_cast(dpif)->dp; +} + +static struct dp_bpf { + struct bpf_state bpf; + struct netdev *outport; /* Used for downcall. */ +} datapath; + +static int +configure_outport(struct netdev *outport) +{ + int error; + + error = netdev_set_filter(outport, &datapath.bpf.downcall); + if (error) { + return error; + } + + error = netdev_set_flags(outport, NETDEV_UP, NULL); + if (error) { + return error; + } + + return 0; +} + +static int +dpif_bpf_init(void) +{ + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; + static int error = 0; + + if (ovsthread_once_start(&once)) { + struct netdev *outport; + + error = bpf_get(&datapath.bpf, true); + if (!error) { + /* FIXME: should we named ovs-system? 
*/ + error = netdev_open("ovs-system", "tap", &outport); + if (!error) { + VLOG_INFO("%s: created BPF tap downcall device %s", + __func__, outport->name); + + error = configure_outport(outport); + if (error) { + VLOG_ERR("%s: configure downcall device failed", __func__); + netdev_close(outport); + } else { + datapath.outport = outport; + } + } + } + + if (!error) { + dpif_bpf_delete_all_flow(); + } + ovsthread_once_done(&once); + } + return error; +} + +static int +dpif_bpf_enumerate(struct sset *all_dps, + const struct dpif_class *dpif_class OVS_UNUSED) +{ + struct shash_node *node; + + ovs_mutex_lock(&bpf_datapath_mutex); + SHASH_FOR_EACH(node, &bpf_datapaths) { + sset_add(all_dps, node->name); + } + ovs_mutex_unlock(&bpf_datapath_mutex); + + return 0; +} + +static const char +*dpif_bpf_port_open_type(const struct dpif_class *dpif_class OVS_UNUSED, + const char *type) +{ + return strcmp(type, "internal") ? type : "tap"; +} + +static struct dpif * +create_dpif_bpf(struct dpif_bpf_dp *dp) + OVS_REQUIRES(bpf_datapath_mutex) +{ + uint16_t netflow_id = hash_string(dp->name, 0); + struct dpif_bpf *dpif; + + ovs_refcount_ref(&dp->ref_cnt); + + dpif = xmalloc(sizeof *dpif); + dpif_init(&dpif->dpif, &dpif_bpf_class, dp->name, netflow_id >> 8, netflow_id); + dpif->dp = dp; + + return &dpif->dpif; +} + +static int +dpif_bpf_open(const struct dpif_class *dpif_class OVS_UNUSED, + const char *name, bool create OVS_UNUSED, struct dpif **dpifp) +{ + struct dpif_bpf_dp *dp; + int error; + + error = dpif_bpf_init(); + if (error) { + VLOG_ERR("dpif_bpf_init failed"); + return error; + } + + ovs_mutex_lock(&bpf_datapath_mutex); + dp = shash_find_data(&bpf_datapaths, name); + if (!dp) { + error = create ? create_dp_bpf(name, &dp) : ENODEV; + } else { + ovs_assert(dpif_class == &dpif_bpf_class); + error = create ? 
EEXIST : 0; + } + if (!error) { + *dpifp = create_dpif_bpf(dp); + if (create) { /* XXX */ + dp->dpif = *dpifp; + } + } + ovs_mutex_unlock(&bpf_datapath_mutex); + + return error; +} + +static int +perf_event_channels_init(struct dpif_bpf_dp *dp) +{ + size_t length = dp->page_size * (dp->n_pages + 1); + int error = 0; + int i, cpu; + + for (cpu = 0; cpu < dp->n_channels; cpu++) { + struct perf_channel *channel = &dp->channels[cpu]; + + error = perf_channel_open(channel, cpu, length); + if (error) { + goto error; + } + } + +error: + if (error) { + for (i = 0; i < cpu; i++) { + perf_channel_close(&dp->channels[i]); + } + } + + return error; +} + +static void +dpif_bpf_free(struct dpif_bpf_dp *dp) + OVS_REQUIRES(bpf_datapath_mutex) +{ + shash_find_and_delete(&bpf_datapaths, dp->name); + + if (ovs_refcount_read(&dp->ref_cnt) == 0) { + ovs_mutex_destroy(&dp->port_mutex); + seq_destroy(dp->port_seq); + fat_rwlock_destroy(&dp->upcall_lock); + hmap_destroy(&dp->ports_by_ifindex); + hmap_destroy(&dp->ports_by_odp); + if (dp->n_handlers) { + free(dp->handlers); + } + free(dp); + } +} + +int +create_dp_bpf(const char *name, struct dpif_bpf_dp **dp_) + OVS_REQUIRES(bpf_datapath_mutex) +{ + int max_cpu; + struct dpif_bpf_dp *dp; + int i, error; + + max_cpu = ovs_numa_get_n_cores(); + + dp = xzalloc(sizeof *dp + max_cpu * sizeof(struct perf_channel)); + ovs_refcount_init(&dp->ref_cnt); + atomic_flag_clear(&dp->destroyed); + hmap_init(&dp->ports_by_odp); + hmap_init(&dp->ports_by_ifindex); + fat_rwlock_init(&dp->upcall_lock); + dp->port_seq = seq_create(); + ovs_mutex_init(&dp->port_mutex); + dp->n_pages = 8; + dp->page_size = sysconf(_SC_PAGESIZE); + dp->n_channels = max_cpu; + dp->last_seq = seq_read(dp->port_seq); + + *CONST_CAST(const char **, &dp->name) = xstrdup(name); + shash_add(&bpf_datapaths, name, dp); /* XXX */ + + error = perf_event_channels_init(dp); + if (error) { + dpif_bpf_free(dp); + return error; + } + + ovs_assert(datapath.bpf.upcalls.fd != -1); + + for (i = 
0; i < dp->n_channels; i++) { + error = bpf_map_update_elem(datapath.bpf.upcalls.fd, &i, + &dp->channels[i].fd, 0); + if (error) { + VLOG_WARN("failed to insert channel fd on cpu=%d: %s", + i, ovs_strerror(error)); + goto out; + } + } + +out: + if (error) { + dpif_bpf_free(dp); + } + if (!error) { + *dp_ = dp; + } + return error; +} + +static void +dpif_bpf_close(struct dpif *dpif_) +{ + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif_); + + ovs_mutex_lock(&bpf_datapath_mutex); + if (ovs_refcount_unref_relaxed(&dp->ref_cnt) == 1) { + struct dpif_bpf_port *port, *next; + int i; + + fat_rwlock_wrlock(&dp->upcall_lock); + for (i = 0; i < dp->n_channels; i++) { + struct perf_channel *channel = &dp->channels[i]; + + perf_channel_close(channel); + } + fat_rwlock_unlock(&dp->upcall_lock); + + ovs_mutex_lock(&dp->port_mutex); + HMAP_FOR_EACH_SAFE (port, next, odp_node, &dp->ports_by_odp) { + do_del_port(dp, port); + } + ovs_mutex_unlock(&dp->port_mutex); + dpif_bpf_free(dp); + } + ovs_mutex_unlock(&bpf_datapath_mutex); + + free(dpif_bpf_cast(dpif_)); +} + +static int +dpif_bpf_destroy(struct dpif *dpif_) +{ + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif_); + + if (!atomic_flag_test_and_set(&dp->destroyed)) { + if (ovs_refcount_unref_relaxed(&dp->ref_cnt) == 1) { + /* Can't happen: 'dpif' still owns a reference to 'dp'. + * The workflow is first call dpif_class->destroy() then + * dpif->close(). 
*/ + OVS_NOT_REACHED(); + } + } +#if 0 + if (datapath.outport) { + netdev_close(datapath.outport); + } +#endif + + return 0; +} + +static int +dpif_bpf_get_stats(const struct dpif *dpif OVS_UNUSED, + struct dpif_dp_stats *stats) +{ + uint32_t key, n_flows = 0; + struct bpf_flow_key flow_key; + int err = 0; + + memset(stats, 0, sizeof(*stats)); + key = OVS_DP_STATS_HIT; + if (bpf_map_lookup_elem(datapath.bpf.datapath_stats.fd, &key, + &stats->n_hit)) { + VLOG_INFO("datapath_stats lookup failed (%d): %s", key, + ovs_strerror(errno)); + } + key = OVS_DP_STATS_MISSED; + if (bpf_map_lookup_elem(datapath.bpf.datapath_stats.fd, &key, + &stats->n_missed)) { + VLOG_INFO("datapath_stats lookup failed (%d): %s", key, + ovs_strerror(errno)); + } + + /* Count the number of datapath flow entries */ + memset(&flow_key, 0, sizeof flow_key); + do { + err = bpf_map_get_next_key(datapath.bpf.flow_table.fd, + &flow_key, &flow_key); + if (!err) { + n_flows++; + } + } while (!err); + + stats->n_flows = n_flows; + + /* XXX: Other missing stats */ + return 0; +} + +static struct dpif_bpf_port * +bpf_lookup_port(const struct dpif_bpf_dp *dp, odp_port_t port_no) + OVS_REQUIRES(dp->port_mutex) +{ + struct dpif_bpf_port *port; + + HMAP_FOR_EACH_WITH_HASH (port, odp_node, netdev_hash_port_no(port_no), + &dp->ports_by_odp) { + if (port->port_no == port_no) { + return port; + } + } + return NULL; +} + +static odp_port_t +choose_port(struct dpif_bpf_dp *dp) + OVS_REQUIRES(dp->port_mutex) +{ + uint32_t port_no; + + for (port_no = 1; port_no <= UINT16_MAX; port_no++) { + if (!bpf_lookup_port(dp, u32_to_odp(port_no))) { + return u32_to_odp(port_no); + } + } + + return ODPP_NONE; +} + +static int +get_port_by_name(struct dpif_bpf_dp *dp, const char *devname, + struct dpif_bpf_port **portp) + OVS_REQUIRES(dp->port_mutex) +{ + struct dpif_bpf_port *port; + + HMAP_FOR_EACH (port, odp_node, &dp->ports_by_odp) { + if (!strcmp(netdev_get_name(port->netdev), devname)) { + *portp = port; + return 0; + } + } 
+ + *portp = NULL; + return ENOENT; +} + +static uint32_t +hash_ifindex(int ifindex) +{ + return hash_int(ifindex, 0); +} + +static int +get_port_by_ifindex(struct dpif_bpf_dp *dp, int ifindex, + struct dpif_bpf_port **portp) + OVS_REQUIRES(dp->port_mutex) +{ + struct dpif_bpf_port *port; + + HMAP_FOR_EACH_WITH_HASH (port, if_node, hash_ifindex(ifindex), + &dp->ports_by_ifindex) { + if (port->ifindex == ifindex) { + *portp = port; + return 0; + } + } + + *portp = NULL; + return ENOENT; +} + +static odp_port_t +ifindex_to_odp(struct dpif_bpf_dp *dp, int ifindex) + OVS_REQUIRES(dp->port_mutex) +{ + struct dpif_bpf_port *port; + + if (get_port_by_ifindex(dp, ifindex, &port)) { + return ODPP_NONE; + } + + return port->port_no; +} + +static bool output_to_local_stack(struct netdev *netdev) +{ + return !strcmp(netdev_get_type(netdev), "tap"); +} + +static bool netdev_support_xdp(struct netdev *netdev OVS_UNUSED) +{ + return true; +} + +static uint32_t +get_port_flags(struct netdev *netdev) +{ + return output_to_local_stack(netdev) ? OVS_BPF_FLAGS_TX_STACK : 0; +} + +static uint16_t +odp_port_to_ifindex(struct dpif_bpf_dp *dp, odp_port_t port_no, uint32_t *flags) + OVS_REQUIRES(dp->port_mutex) +{ + struct dpif_bpf_port *port = bpf_lookup_port(dp, port_no); + + if (port) { + if (flags) { + *flags = get_port_flags(port->netdev); + } + return port->ifindex; + } + return 0; +} + +/* Modelled after dpif-netdev 'port_create', minus pmd and txq logic, plus bpf + * filter set. */ +static int +port_create(const char *devname, const char *type, + odp_port_t port_no, struct dpif_bpf_port **portp) +{ + struct netdev_saved_flags *sf; + struct dpif_bpf_port *port; + enum netdev_flags flags; + struct netdev *netdev; + int n_open_rxqs = 0; + int i, error; + int ifindex; + + *portp = NULL; + + /* Open and validate network device. 
*/ + error = netdev_open(devname, type, &netdev); + + VLOG_DBG("%s %s type %s error %d", __func__, devname, type, error); + if (error) { + return error; + } + /* XXX reject non-Ethernet devices */ + + netdev_get_flags(netdev, &flags); + if (flags & NETDEV_LOOPBACK) { + VLOG_ERR_RL(&rl, "%s: cannot add a loopback device", devname); + error = EINVAL; + goto out; + } + + if (netdev_is_reconf_required(netdev)) { + error = netdev_reconfigure(netdev); + if (error) { + goto out; + } + } + + ifindex = netdev_get_ifindex(netdev); + if (ifindex < 0) { + VLOG_WARN_RL(&rl, "%s: Failed to get ifindex", devname); + error = -ifindex; + goto out; + } + + VLOG_DBG("%s ifindex = %d", devname, ifindex); + + /* For all internal port, ex: br0, br-underlay, br-int, + we set bpf program only to its egress queue. (due to the + natural of tap device). For other types, ex: eth0, vxlan_sys, + we set bpf program to its ingress queue. + + A tap device's egress queue is tied to a socket for userspace + to receive the packet by open(/dev/tun0). On the other hand, + a send to the socket will show up in the tap device's ingress queue. 
+ */ + if (output_to_local_stack(netdev)) { + error = netdev_set_filter(netdev, &datapath.bpf.egress); + } else { + error = netdev_set_filter(netdev, &datapath.bpf.ingress); + } + if (error) { + goto out; + } + + if (netdev_support_xdp(netdev)) { + error = netdev_set_xdp(netdev, &datapath.bpf.xdp); + if (error) { + VLOG_WARN("%s XDP set failed", __func__); + goto out; + } + VLOG_DBG("%s %s XDP set done", __func__, netdev->name); + } + + port = xzalloc(sizeof *port); + port->port_no = port_no; + port->ifindex = ifindex; + port->netdev = netdev; + port->n_rxq = netdev_n_rxq(netdev); + port->rxqs = xcalloc(port->n_rxq, sizeof *port->rxqs); + port->type = xstrdup(type); + + for (i = 0; i < port->n_rxq; i++) { + error = netdev_rxq_open(netdev, &port->rxqs[i], i); + if (error) { + VLOG_ERR("%s: cannot receive packets on this network device (queue %d) (%s)", + devname, i, ovs_strerror(errno)); + goto out_rxq_close; + } + n_open_rxqs++; + } + + error = netdev_turn_flags_on(netdev, NETDEV_PROMISC, &sf); + if (error) { + goto out_rxq_close; + } + port->sf = sf; + + *portp = port; + return 0; + +out_rxq_close: + for (i = 0; i < n_open_rxqs; i++) { + netdev_rxq_close(port->rxqs[i]); + } + free(port->type); + free(port->rxqs); + free(port); + +out: + netdev_close(netdev); + return error; +} + +static int +do_add_port(struct dpif_bpf_dp *dp, const char *devname, + const char *type, odp_port_t port_no) + OVS_REQUIRES(dp->port_mutex) +{ + struct dpif_bpf_port *port; + int error; + + if (!get_port_by_name(dp, devname, &port)) { + return EEXIST; + } + + error = port_create(devname, type, port_no, &port); + if (error) { + VLOG_ERR("port_create return %d", error); + return error; + } + + hmap_insert(&dp->ports_by_odp, &port->odp_node, + netdev_hash_port_no(port->port_no)); + hmap_insert(&dp->ports_by_ifindex, &port->if_node, + hash_ifindex(port->ifindex)); + seq_change(dp->port_seq); + + return 0; +} + +static int +dpif_bpf_port_add(struct dpif *dpif, struct netdev *netdev, + 
odp_port_t *port_nop) +{ + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif); + char namebuf[NETDEV_VPORT_NAME_BUFSIZE]; + const char *dpif_port; + odp_port_t port_no; + int error; + + if (!strcmp(netdev_get_type(netdev), "vxlan") || + !strcmp(netdev_get_type(netdev), "gre") || + !strcmp(netdev_get_type(netdev), "geneve")) { + + VLOG_INFO("Creating %s device", netdev_get_type(netdev)); + error = dpif_netlink_rtnl_port_create(netdev); + if (error) { + if (error != EOPNOTSUPP) { + VLOG_WARN_RL(&rl, "Failed to create %s with rtnetlink: %s", + netdev_get_name(netdev), ovs_strerror(error)); + } + return error; + } + } + + ovs_mutex_lock(&dp->port_mutex); + dpif_port = netdev_vport_get_dpif_port(netdev, namebuf, sizeof namebuf); + if (*port_nop != ODPP_NONE) { + port_no = *port_nop; + error = bpf_lookup_port(dp, *port_nop) ? EBUSY : 0; + } else { + port_no = choose_port(dp); + error = port_no == ODPP_NONE ? EFBIG : 0; + } + if (error) { + goto unlock; + } + + *port_nop = port_no; + error = do_add_port(dp, dpif_port, netdev_get_type(netdev), port_no); + if (error) { + goto unlock; + } + +unlock: + ovs_mutex_unlock(&dp->port_mutex); + return error; +} + +static void +do_del_port(struct dpif_bpf_dp *dp, struct dpif_bpf_port *port) + OVS_REQUIRES(dp->port_mutex) +{ + int i, error; + + seq_change(dp->port_seq); + hmap_remove(&dp->ports_by_odp, &port->odp_node); + hmap_remove(&dp->ports_by_ifindex, &port->if_node); + + error = netdev_set_filter(port->netdev, NULL); + if (error) { + VLOG_WARN("%s: Failed to clear filter from netdev", + netdev_get_name(port->netdev)); + } + + if (netdev_support_xdp(port->netdev)) { + error = netdev_set_xdp(port->netdev, NULL); + if (error) { + VLOG_WARN("%s: Failed to clear XDP from netdev", + netdev_get_name(port->netdev)); + } + } + + netdev_close(port->netdev); + netdev_restore_flags(port->sf); + for (i = 0; i < port->n_rxq; i++) { + netdev_rxq_close(port->rxqs[i]); + } + + free(port->type); + free(port->rxqs); + free(port); +} + +static int 
+dpif_bpf_port_del(struct dpif *dpif, odp_port_t port_no) +{ + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif); + struct dpif_bpf_port *port; + int error = 0; + + ovs_mutex_lock(&dp->port_mutex); + port = bpf_lookup_port(dp, port_no); + if (!port) { + VLOG_WARN("deleting port %d, but it doesn't exist", port_no); + error = EINVAL; + } + ovs_mutex_unlock(&dp->port_mutex); + + return error; +} + +static void +answer_port_query(const struct dpif_bpf_port *port, + struct dpif_port *dpif_port) +{ + dpif_port->name = xstrdup(netdev_get_name(port->netdev)); + dpif_port->type = xstrdup(port->type); + dpif_port->port_no = port->port_no; +} + +static int +dpif_bpf_port_query_by_number(const struct dpif *dpif_, odp_port_t port_no, + struct dpif_port *port_) +{ + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif_); + struct dpif_bpf_port *port; + int error = 0; + + ovs_mutex_lock(&dp->port_mutex); + port = bpf_lookup_port(dp, port_no); + if (!port) { + error = ENOENT; + goto out; + } + answer_port_query(port, port_); + +out: + ovs_mutex_unlock(&dp->port_mutex); + return error; +} + +static int +dpif_bpf_port_query_by_name(const struct dpif *dpif_, const char *devname, + struct dpif_port *dpif_port) +{ + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif_); + struct dpif_bpf_port *port; + int error; + + ovs_mutex_lock(&dp->port_mutex); + error = get_port_by_name(dp, devname, &port); + if (!error && dpif_port) { + answer_port_query(port, dpif_port); + } + ovs_mutex_unlock(&dp->port_mutex); + + return error; +} + +struct dpif_bpf_port_state { + struct hmap_position position; + char *name; +}; + +static int +dpif_bpf_port_dump_start(const struct dpif *dpif OVS_UNUSED, void **statep) +{ + *statep = xzalloc(sizeof(struct dpif_bpf_port_state)); + return 0; +} + +static int +dpif_bpf_port_dump_next(const struct dpif *dpif_, void *state_, + struct dpif_port *dpif_port) +{ + struct dpif_bpf_port_state *state = state_; + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif_); + struct hmap_node *node; + 
int retval; + + ovs_mutex_lock(&dp->port_mutex); + node = hmap_at_position(&dp->ports_by_odp, &state->position); + if (node) { + struct dpif_bpf_port *port; + + port = CONTAINER_OF(node, struct dpif_bpf_port, odp_node); + + free(state->name); + state->name = xstrdup(netdev_get_name(port->netdev)); + dpif_port->name = state->name; + dpif_port->type = port->type; + dpif_port->port_no = port->port_no; + + retval = 0; + } else { + retval = EOF; + } + ovs_mutex_unlock(&dp->port_mutex); + + return retval; +} + +static int +dpif_bpf_port_dump_done(const struct dpif *dpif OVS_UNUSED, + void *state_) +{ + struct dpif_bpf_port_state *state = state_; + + free(state->name); + free(state); + return 0; +} + +static int +dpif_bpf_port_poll(const struct dpif *dpif_, char **devnamep OVS_UNUSED) +{ + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif_); + uint64_t new_port_seq; + + new_port_seq = seq_read(dp->port_seq); + if (dp->last_seq != new_port_seq) { + dp->last_seq = new_port_seq; + return ENOBUFS; + } + + return EAGAIN; +} + +static void +dpif_bpf_port_poll_wait(const struct dpif *dpif_) +{ + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif_); + + seq_wait(dp->port_seq, dp->last_seq); +} + +static int +dpif_bpf_flow_flush(struct dpif *dpif OVS_UNUSED) +{ + struct bpf_flow_key key; + int err = 0; + + /* Flow Entry Table */ + memset(&key, 0, sizeof key); + do { + err = bpf_map_get_next_key(datapath.bpf.flow_table.fd, &key, &key); + if (!err) { + bpf_map_delete_elem(datapath.bpf.flow_table.fd, &key); + } + } while (!err); + + /* Flow Stats Table */ + memset(&key, 0, sizeof key); + do { + err = bpf_map_get_next_key(datapath.bpf.dp_flow_stats.fd, &key, &key); + if (!err) { + bpf_map_delete_elem(datapath.bpf.dp_flow_stats.fd, &key); + } + } while (!err); + + + return errno == ENOENT ? 
0 : errno; +} + +struct dpif_bpf_flow_dump { + struct dpif_flow_dump up; + int status; + struct bpf_flow_key pos; + struct ovs_mutex mutex; +}; + +static struct dpif_bpf_flow_dump * +dpif_bpf_flow_dump_cast(struct dpif_flow_dump *dump) +{ + return CONTAINER_OF(dump, struct dpif_bpf_flow_dump, up); +} + +static struct dpif_flow_dump * +dpif_bpf_flow_dump_create(const struct dpif *dpif_, bool terse, + char *type OVS_UNUSED) +{ + struct dpif_bpf_flow_dump *dump; + + dump = xzalloc(sizeof *dump); + dpif_flow_dump_init(&dump->up, dpif_); + dump->up.terse = terse; + ovs_mutex_init(&dump->mutex); + + return &dump->up; +} + +static int +dpif_bpf_flow_dump_destroy(struct dpif_flow_dump *dump_) +{ + struct dpif_bpf_flow_dump *dump = dpif_bpf_flow_dump_cast(dump_); + int status = dump->status; + + ovs_mutex_destroy(&dump->mutex); + free(dump); + + return status == ENOENT ? 0 : status; +} + +struct dpif_bpf_flow_dump_thread { + struct dpif_flow_dump_thread up; + struct dpif_bpf_flow_dump *dump; + struct ofpbuf buf; /* Stores key,mask,acts for a particular dump. 
*/ +}; + +static struct dpif_bpf_flow_dump_thread * +dpif_bpf_flow_dump_thread_cast(struct dpif_flow_dump_thread *thread) +{ + return CONTAINER_OF(thread, struct dpif_bpf_flow_dump_thread, up); +} + +static struct dpif_flow_dump_thread * +dpif_bpf_flow_dump_thread_create(struct dpif_flow_dump *dump_) +{ + struct dpif_bpf_flow_dump *dump = dpif_bpf_flow_dump_cast(dump_); + struct dpif_bpf_flow_dump_thread *thread; + + thread = xmalloc(sizeof *thread); + dpif_flow_dump_thread_init(&thread->up, &dump->up); + thread->dump = dump; + ofpbuf_init(&thread->buf, 1024); + return &thread->up; +} + +static void +dpif_bpf_flow_dump_thread_destroy(struct dpif_flow_dump_thread *thread_) +{ + struct dpif_bpf_flow_dump_thread *thread = + dpif_bpf_flow_dump_thread_cast(thread_); + ofpbuf_uninit(&thread->buf); + free(thread); +} + +static int +fetch_flow(struct dpif_bpf_dp *dp, struct dpif_flow *flow, + struct ofpbuf *out, const struct bpf_flow_key *key) +{ + struct flow f; + struct odp_flow_key_parms parms = { + .flow = &f, + }; + struct bpf_action_batch action; + struct bpf_flow_stats stats; + int err; + + memset(flow, 0, sizeof *flow); + + err = bpf_map_lookup_elem(datapath.bpf.flow_table.fd, key, &action); + if (err) { + return errno; + } + + /* XXX: Extract 'dp_flow' into 'flow'. */ + if (bpf_flow_key_to_flow(key, &f) == ODP_FIT_ERROR) { + VLOG_WARN("%s: bpf flow key parsing error", __func__); + return EINVAL; + } + f.in_port.odp_port = ifindex_to_odp(dp, + odp_to_u32(f.in_port.odp_port)); + + /* Translate BPF flow into netlink format. 
*/ + ofpbuf_clear(out); + + /* Use 'out->header' to point to the flow key, 'out->msg' for actions */ + out->header = out->data; + odp_flow_key_from_flow(&parms, out); + out->msg = ofpbuf_tail(out); + err = bpf_actions_to_odp_actions(&action, out); + if (err) { + VLOG_ERR("%s: bpf_actions to odp actions fails", __func__); + return err; + } + + flow->key = out->header; + flow->key_len = ofpbuf_headersize(out); + flow->actions = out->msg; + flow->actions_len = ofpbuf_msgsize(out); + + dpif_flow_hash(dp->dpif, flow->key, flow->key_len, &flow->ufid); + flow->ufid_present = false; /* XXX */ + + /* Fetch datapath flow stats */ + err = bpf_map_lookup_elem(datapath.bpf.dp_flow_stats.fd, key, &stats); + if (err) { + VLOG_DBG("flow stats lookup fails, fd %d err = %d %s", + datapath.bpf.dp_flow_stats.fd, err, ovs_strerror(errno)); + return errno; + } else { + VLOG_DBG("flow stats lookup OK"); + memcpy(&flow->stats, &stats, 3 * sizeof(uint64_t)); + } + + return 0; +} + +static int +dpif_bpf_insert_flow(struct bpf_flow_key *flow_key, + struct bpf_action_batch *actions) +{ + int err; + + VLOG_DBG("Insert bof_flow_key:"); + vlog_hex_dump((unsigned char *)flow_key, sizeof *flow_key); + + VLOG_DBG("Insert action:"); + vlog_hex_dump((unsigned char *)actions, sizeof actions[0]); + + ovs_assert(datapath.bpf.flow_table.fd != -1); + err = bpf_map_update_elem(datapath.bpf.flow_table.fd, + flow_key, + actions, BPF_ANY); + if (err) { + VLOG_ERR("Failed to add flow into flow table, map fd %d, error %s", + datapath.bpf.flow_table.fd, ovs_strerror(errno)); + return errno; + } + + return 0; +} + +static int +dpif_bpf_delete_flow(struct bpf_flow_key *flow_key, + struct dpif_flow_stats *stats) +{ + int err; + struct bpf_action_batch actions; + + ovs_assert(datapath.bpf.flow_table.fd != -1); + + err = bpf_map_lookup_elem(datapath.bpf.flow_table.fd, flow_key, &actions); + if (err != 0) { + VLOG_ERR("Failed to find flow into flow table, map fd %d: %s", + datapath.bpf.flow_table.fd, 
ovs_strerror(errno)); + VLOG_WARN("bpf_flow_key not found\n"); + vlog_hex_dump((unsigned char *)flow_key, sizeof *flow_key); + + goto delete_stats; + } + + err = bpf_map_delete_elem(datapath.bpf.flow_table.fd, flow_key); + if (err) { + VLOG_ERR("Failed to delete flow from flow table, map fd %d: %s", + datapath.bpf.flow_table.fd, ovs_strerror(errno)); + return errno; + } + + if (stats) { + /* XXX: Stats */ + memset(stats, 0, sizeof *stats); + +delete_stats: + err = bpf_map_delete_elem(datapath.bpf.dp_flow_stats.fd, flow_key); + if (err) { + VLOG_ERR("Failed to delete flow from flow stats table, map fd %d: %s", + datapath.bpf.dp_flow_stats.fd, ovs_strerror(errno)); + /* Skip when element is not found */ + return 0; + } + } + return 0; +} + +static int +dpif_bpf_delete_all_flow(void) +{ + int err; + struct bpf_flow_key key; + + do { + err = bpf_map_get_next_key(datapath.bpf.flow_table.fd, NULL, &key); + if (err) { + return err; + } + + err = bpf_map_delete_elem(datapath.bpf.flow_table.fd, &key); + } while (!err); + + return err; +} + +static int +dpif_bpf_flow_dump_next(struct dpif_flow_dump_thread *thread_, + struct dpif_flow *flows, int max_flows) +{ + struct dpif_bpf_flow_dump_thread *thread = + dpif_bpf_flow_dump_thread_cast(thread_); + struct dpif_bpf_flow_dump *dump = thread->dump; + int n = 0; + int err; + + ovs_mutex_lock(&dump->mutex); + err = dump->status; + if (err) { + goto unlock; + } + + while (n < max_flows) { + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dump->up.dpif); + + err = bpf_map_get_next_key(datapath.bpf.flow_table.fd, + &dump->pos, &dump->pos); + if (err) { + err = errno; + break; + } + err = fetch_flow(dp, &flows[n], &thread->buf, &dump->pos); + if (err == ENOENT) { + /* Flow disappeared. Oh well, we tried. 
*/ + continue; + } else if (err) { + break; + } + n++; + } + dump->status = err; +unlock: + ovs_mutex_unlock(&dump->mutex); + return n; +} + +struct dpif_bpf_downcall_parms { + uint32_t type; + odp_port_t port_no; + struct bpf_action_batch *action_batch; +}; + +static int +dpif_bpf_downcall(struct dpif *dpif_, struct dp_packet *packet, + const struct flow *flow, + struct dpif_bpf_downcall_parms *parms) +{ + struct dp_packet_batch batch; + struct bpf_downcall md = { + .type = parms->type, + .debug = 0xC0FFEEEE, + }; + uint32_t ifindex; + uint32_t flags; + int error; + int queue = 0; + struct dp_packet *clone_pkt; + + ovs_assert(datapath.bpf.execute_actions.fd != -1); + + bpf_metadata_from_flow(flow, &md.md); + + ifindex = odp_port_to_ifindex(get_dpif_bpf_dp(dpif_), + flow->in_port.odp_port, &flags); +#if 0 + /* this is ok at check_support time */ + if (!ifindex) { + VLOG_WARN("%s: in_port.odp_port %d found", + __func__, flow->in_port.odp_port); + return ENODEV; + } +#endif + + md.md.md.in_port = ifindex; + md.ifindex = ifindex; + + if (parms->action_batch) { + int zero_index = 0; + error = bpf_map_update_elem(datapath.bpf.execute_actions.fd, + &zero_index, parms->action_batch, 0); + if (error) { + VLOG_ERR("%s: map update failed", __func__); + return error; + } + } + + /* XXX: Check that ovs-system device MTU is large enough to include md. 
*/ + dp_packet_put(packet, &md, sizeof md); + clone_pkt = dp_packet_clone(packet); + dp_packet_batch_init_packet(&batch, clone_pkt); + + VLOG_INFO("send downcall (%d)", parms->type); + error = netdev_send(datapath.outport, queue, &batch, false); + dp_packet_set_size(packet, dp_packet_size(packet) - sizeof md); + + return error; +} + +static int OVS_UNUSED +dpif_bpf_output(struct dpif *dpif_, struct dp_packet *packet, + const struct flow *flow, odp_port_t port_no, + uint32_t flags OVS_UNUSED) +{ + struct dpif_bpf_downcall_parms parms = { + .port_no = port_no, + .type = OVS_BPF_DOWNCALL_OUTPUT, + .action_batch = NULL + }; + return dpif_bpf_downcall(dpif_, packet, flow, &parms); +} + +static int +dpif_bpf_execute_(struct dpif *dpif_, struct dp_packet *packet, + const struct flow *flow, + struct bpf_action_batch *action_batch) +{ + struct dpif_bpf_downcall_parms parms = { + .type = OVS_BPF_DOWNCALL_EXECUTE, + .action_batch = action_batch, + }; + return dpif_bpf_downcall(dpif_, packet, flow, &parms); +} + +static int +dpif_bpf_serialize_actions(struct dpif_bpf_dp *dp, + struct bpf_action_batch *action_batch, + const struct nlattr *nlactions, + size_t actions_len) +{ + + const struct nlattr *a; + unsigned int left, count = 0, skipped = 0; + struct bpf_action *actions; + + memset(action_batch, 0, sizeof(*action_batch)); + actions = action_batch->actions; + + NL_ATTR_FOR_EACH_UNSAFE (a, left, nlactions, actions_len) { + enum ovs_action_attr type = nl_attr_type(a); + actions[count].type = type; + + if (type == OVS_ACTION_ATTR_OUTPUT) { + struct dpif_bpf_port *port; + odp_port_t port_no = nl_attr_get_odp_port(a); + + ovs_mutex_lock(&dp->port_mutex); + port = bpf_lookup_port(dp, port_no); + if (port) { + VLOG_INFO("output action to port %d ifindex %d", port_no, + port->ifindex); + actions[count].u.out.port = port->ifindex; + actions[count].u.out.flags = get_port_flags(port->netdev); + } + ovs_mutex_unlock(&dp->port_mutex); + } else { + if (odp_action_to_bpf_action(a, 
&actions[count])) { + skipped++; + } + } + count++; + } + + VLOG_INFO("Processing flow actions (%d/%d skipped)", skipped, count); + if (skipped) { + /* XXX: VLOG actions that couldn't be processed */ + } + return 0; +} + +static int +dpif_bpf_execute(struct dpif *dpif_, struct dpif_execute *execute) +{ + struct bpf_action_batch batch; + int error = 0; + + error = dpif_bpf_serialize_actions(get_dpif_bpf_dp(dpif_), &batch, execute->actions, + execute->actions_len); + if (error) { + return error; + } + + error = dpif_bpf_execute_(dpif_, execute->packet, + execute->flow, &batch); + return error; +} + +/* Translates 'port' into an ifindex and sets it inside 'key'. + * + * Returns 0 on success, or a positive errno otherwise. */ +static int +set_in_port(struct dpif_bpf_dp *dp, struct bpf_flow_key *key, odp_port_t port) +{ + uint16_t ifindex; + + ifindex = odp_port_to_ifindex(dp, port, NULL); + if (!ifindex && port) { + VLOG_WARN("Could not find ifindex corresponding to port %"PRIu32, + port); + return ENODEV; + } + + key->mds.md.in_port = ifindex; + return 0; +} + +/* Converts 'key' (of size 'key_len') into a bpf flow key in 'key_out', and + * optionally 'actions' (of size 'actions_len') into 'batch'. 'mask' (of size + * 'mask_len') may optionally be used for logging, of which the verbosity is + * controlled by 'verbose'. + * + * Returns 0 on success, or a positive errno otherwise. + */ +static int +prepare_bpf_flow__(struct dpif_bpf_dp *dp, + const struct nlattr *key, size_t key_len, + const struct nlattr *mask, size_t mask_len, + const struct nlattr *actions, size_t actions_len, + struct bpf_flow_key *key_out, struct bpf_action_batch *batch, + bool verbose) +{ + odp_port_t in_port; + int err = EINVAL; + + if (1) { + struct ds ds = DS_EMPTY_INITIALIZER; + + /* XXX: Use dpif_format_flow()? 
*/ + odp_flow_format(key, key_len, mask, mask_len, NULL, &ds, true); + ds_put_cstr(&ds, ", actions="); + format_odp_actions(&ds, actions, actions_len, NULL); + VLOG_WARN("Translating odp key to bpf key:\n%s", ds_cstr(&ds)); + ds_destroy(&ds); + } + + memset(key_out, 0, sizeof *key_out); + if (odp_key_to_bpf_flow_key(key, key_len, key_out, + &in_port, false, verbose)) { + if (verbose) { + struct ds ds = DS_EMPTY_INITIALIZER; + + /* XXX: Use dpif_format_flow()? */ + odp_flow_format(key, key_len, mask, mask_len, NULL, &ds, + true); + VLOG_WARN("Failed to translate odp key to bpf key:\n%s", + ds_cstr(&ds)); + ds_destroy(&ds); + } + return err; + } + + err = set_in_port(dp, key_out, in_port); + if (err) { + return err; + } + if (batch) { + err = dpif_bpf_serialize_actions(dp, batch, actions, actions_len); + if (err) { + return err; + } + } + + /* Transfer back to flow to check if everything is good */ + if (1) { + struct flow flow; + enum odp_key_fitness res; + + res = bpf_flow_key_to_flow(key_out, &flow); + if (res != ODP_FIT_PERFECT) { + VLOG_ERR("transfer bpf key back to flow failed"); + } else { + struct ds ds = DS_EMPTY_INITIALIZER; + + flow_format(&ds, &flow, NULL); + ds_put_cstr(&ds, ", actions="); + format_odp_actions(&ds, actions, actions_len, NULL); + VLOG_WARN("Translating back:\n%s", ds_cstr(&ds)); + ds_destroy(&ds); + } + } + + return 0; +} + +static int +prepare_bpf_flow(struct dpif_bpf_dp *dp, const struct nlattr *key, + size_t key_len, struct bpf_flow_key *key_out, bool verbose) +{ + return prepare_bpf_flow__(dp, key, key_len, NULL, 0, NULL, 0, key_out, + NULL, verbose); +} + +static void +dpif_bpf_operate(struct dpif *dpif_, struct dpif_op **ops, size_t n_ops) +{ + struct dpif_bpf_dp *dp = get_dpif_bpf_dp(dpif_); + + for (int i = 0; i < n_ops; i++) { + struct dpif_op *op = ops[i]; + struct dpif_flow_del *del OVS_UNUSED; + struct dpif_flow_get *get OVS_UNUSED; + + switch (op->type) { + case DPIF_OP_EXECUTE: + op->error = dpif_bpf_execute(dpif_, 
&op->u.execute); + break; + case DPIF_OP_FLOW_PUT: { + struct dpif_flow_put *put = &op->u.flow_put; + bool verbose = !(put->flags & DPIF_FP_PROBE); + struct bpf_action_batch action_batch; + struct bpf_flow_key key; + int err; + + err = prepare_bpf_flow__(dp, put->key, put->key_len, + put->mask, put->mask_len, + put->actions, put->actions_len, + &key, &action_batch, verbose); + if (!err) { + err = dpif_bpf_insert_flow(&key, &action_batch); + } + op->error = err; + break; + } + case DPIF_OP_FLOW_GET: { + struct dpif_flow_get *get = &op->u.flow_get; + struct bpf_flow_key key; + int err; + + err = prepare_bpf_flow(dp, get->key, get->key_len, &key, true); + if (!err) { + err = fetch_flow(dp, get->flow, get->buffer, &key); + } + op->error = err; + break; + } + case DPIF_OP_FLOW_DEL: { + struct dpif_flow_del *del = &op->u.flow_del; + struct bpf_flow_key key; + int err; + + err = prepare_bpf_flow(dp, del->key, del->key_len, &key, true); + if (!err) { + err = dpif_bpf_delete_flow(&key, del->stats); + } + op->error = err; + break; + } + default: + OVS_NOT_REACHED(); + } + } +} + +static int +dpif_bpf_recv_set(struct dpif *dpif_, bool enable) +{ + struct dpif_bpf_dp *dpif = get_dpif_bpf_dp(dpif_); + int stored_error = 0; + + for (int i = 0; i < dpif->n_channels; i++) { + int error = perf_channel_set(&dpif->channels[i], enable); + if (error) { + VLOG_ERR("failed to set recv_set %s (%s)", + enable ? 
"true": "false", ovs_strerror(error)); + stored_error = error; + } + } + + return stored_error; +} + +static int +dpif_bpf_handlers_set__(struct dpif_bpf_dp *dp, uint32_t n_handlers) + OVS_REQUIRES(&dp->upcall_lock) +{ + struct bpf_handler prev; + int i, extra; + + memset(&prev, 0, sizeof prev); + if (dp->n_handlers) { + free(dp->handlers); + dp->handlers = NULL; + dp->n_handlers = 0; + } + + if (!n_handlers) { + return 0; + } + + dp->handlers = xzalloc(sizeof *dp->handlers * n_handlers); + for (i = 0; i < n_handlers; i++) { + struct bpf_handler *curr = dp->handlers + i; + + if (i > dp->n_channels) { + VLOG_INFO("Ignoring extraneous handlers (%d for %d channels)", + n_handlers, dp->n_channels); + break; + } + + curr->offset = prev.offset + prev.count; + curr->count = dp->n_channels / n_handlers; + prev = *curr; + } + extra = dp->n_channels % n_handlers; + if (extra) { + VLOG_INFO("Extra %d channels; distributing across handlers", extra); + for (i = 0; i < extra; i++) { + struct bpf_handler *curr = dp->handlers + n_handlers - i - 1; + + curr->offset = curr->offset + extra - i - 1; + curr->count++; + } + } + + dp->n_handlers = n_handlers; + return 0; +} + +static int +dpif_bpf_handlers_set(struct dpif *dpif_, uint32_t n_handlers) +{ + struct dpif_bpf_dp *dpif = get_dpif_bpf_dp(dpif_); + int error; + + fat_rwlock_wrlock(&dpif->upcall_lock); + error = dpif_bpf_handlers_set__(dpif, n_handlers); + fat_rwlock_unlock(&dpif->upcall_lock); + + return error; +} + +/* XXX: duplicate with check_support */ +static struct odp_support dp_bpf_support = { + .max_vlan_headers = 2, + .max_mpls_depth = 2, + .recirc = true, + .ct_state = true, + .ct_zone = true, + .ct_mark = true, + .ct_label = true, + .ct_state_nat = true, + .ct_orig_tuple = true, + .ct_orig_tuple6 = true, +}; + +static int +extract_key(struct dpif_bpf_dp *dpif, const struct bpf_flow_key *key, + struct dp_packet *packet, struct ofpbuf *buf) +{ + struct flow flow; + struct odp_flow_key_parms parms = { + .flow = &flow, + 
.mask = NULL, + .support = dp_bpf_support, /* used at odp_flow_key_from_flow */ + }; + + { + struct ds ds = DS_EMPTY_INITIALIZER; + + bpf_flow_key_format(&ds, key); + VLOG_INFO("bpf_flow_key_format\n%s", ds_cstr(&ds)); + ds_destroy(&ds); + } + + /* This function goes first because it zeros out flow. */ + flow_extract(packet, &flow); + + bpf_flow_key_extract_metadata(key, &flow); + + VLOG_INFO("packet.md.port = %d", packet->md.in_port.odp_port); + + if (flow.in_port.odp_port != 0) { + flow.in_port.odp_port = ifindex_to_odp(dpif, + odp_to_u32(flow.in_port.odp_port)); + } else { + flow.in_port.odp_port = packet->md.in_port.odp_port; + } + VLOG_INFO("flow.in_port.odp_port %d", flow.in_port.odp_port); + + if (1) { + struct ds ds = DS_EMPTY_INITIALIZER; + + flow_format(&ds, &flow, NULL); + VLOG_WARN("Upcall flow:\n%s", + ds_cstr(&ds)); + ds_destroy(&ds); + + } + + odp_flow_key_from_flow(&parms, buf); + + return 0; +} + +struct ovs_ebpf_event { + struct perf_event_raw sample; + struct bpf_upcall header; + uint8_t data[]; +}; + +static void OVS_UNUSED +dpif_bpf_flow_dump_all(struct dpif_bpf_dp *dp OVS_UNUSED) +{ + struct dpif_bpf_flow_dump dump; + int err; + + memset(&dump, 0, sizeof dump); + while (1) { + err = bpf_map_get_next_key(datapath.bpf.flow_table.fd, + &dump.pos, &dump.pos); + if (err) { + VLOG_INFO("err is %d", err); + break; + } + vlog_hex_dump((unsigned char *)&dump.pos, sizeof dump.pos); + } +} + +/* perf_channel_read() fills the first part of 'buffer' with the full event. + * Here, the key will be extracted immediately following it, and 'upcall' + * will be initialized to point within 'buffer'. 
+ */ +static int +perf_sample_to_upcall__(struct dpif_bpf_dp *dp, struct ovs_ebpf_event *e, + struct dpif_upcall *upcall, struct ofpbuf *buffer) +{ + size_t sample_len = e->sample.size - sizeof e->header; + size_t pkt_len = e->header.skb_len; + size_t pre_key_len; + odp_port_t port_no; + int err; + + if (pkt_len < ETH_HEADER_LEN) { + VLOG_WARN_RL(&rl, "Unexpectedly short packet (%"PRIuSIZE")", pkt_len); + return EINVAL; + } + if (e->sample.size - sizeof e->header < pkt_len) { + VLOG_WARN_RL(&rl, + "Packet longer than sample (pkt=%"PRIuSIZE", sample=%"PRIuSIZE")", + pkt_len, sample_len); + return EINVAL; + } + + port_no = ifindex_to_odp(dp, e->header.ifindex); + VLOG_INFO("ifindex %d odp %d", e->header.ifindex, port_no); + if (port_no == ODPP_NONE) { + VLOG_WARN_RL(&rl, "failed to map upcall ifindex=%d to odp", + e->header.ifindex); + return EINVAL; + } + + memset(upcall, 0, sizeof *upcall); + + /* Use buffer->header to point to the packet, and buffer->msg to point to + * the extracted flow key. Therefore, when extract_key() reallocates + * 'buffer', we can easily get pointers back to the packet and start of + * extracted key. 
*/ + buffer->header = e->data; + buffer->msg = ofpbuf_tail(buffer); + pre_key_len = buffer->size; + + VLOG_INFO("upcall key hex\n"); + vlog_hex_dump((unsigned char *)&e->header.key, sizeof e->header.key); + //VLOG_INFO("list of bpf keys\n"); + //dpif_bpf_flow_dump_all(dp); + VLOG_INFO("raw packet data in e->data"); + vlog_hex_dump(e->data, MIN(pkt_len, 100)); + + dp_packet_use_stub(&upcall->packet, e->data, pkt_len); + dp_packet_set_size(&upcall->packet, pkt_len); + pkt_metadata_init(&upcall->packet.md, port_no); + + err = extract_key(dp, &e->header.key, &upcall->packet, buffer); + if (err) { + return err; + } + + upcall->key = buffer->msg; + upcall->key_len = buffer->size - pre_key_len; + dpif_flow_hash(dp->dpif, upcall->key, upcall->key_len, &upcall->ufid); + + return 0; +} + +/* perf_channel_read() fills the first part of 'buffer' with the full event. + * Here, the key will be extracted immediately following it, and 'upcall' + * will be initialized to point within 'buffer'. + */ +static int +perf_sample_to_upcall_miss(struct dpif_bpf_dp *dp, struct ovs_ebpf_event *e, + struct dpif_upcall *upcall, struct ofpbuf *buffer) +{ + int err; + + err = perf_sample_to_upcall__(dp, e, upcall, buffer); + if (err) { + return err; + } + + ofpbuf_prealloc_tailroom(buffer, sizeof(struct bpf_downcall)); + upcall->type = DPIF_UC_MISS; + + return 0; +} + +/* Modified from perf_sample_to_upcall. 
+ */ +static int +perf_sample_to_upcall_userspace(struct dpif_bpf_dp *dp, struct ovs_ebpf_event *e, + struct dpif_upcall *upcall, + struct ofpbuf *buffer) +{ + const struct nlattr *actions = (struct nlattr *)e->header.uactions; + const struct nlattr *a; + unsigned int left; + int err; + + err = perf_sample_to_upcall__(dp, e, upcall, buffer); + if (err) { + return err; + } + + NL_ATTR_FOR_EACH_UNSAFE (a, left, actions, e->header.uactions_len) { + switch (nl_attr_type(a)) { + case OVS_USERSPACE_ATTR_PID: + //nl_attr_get_u32(a); + break; + case OVS_USERSPACE_ATTR_USERDATA: + upcall->userdata = CONST_CAST(struct nlattr *, a); + break; + default: + VLOG_INFO("%s unsupported userspace action. %d", + __func__, nl_attr_type(a)); + return EOPNOTSUPP; + } + } + + upcall->type = DPIF_UC_ACTION; + return 0; +} + +static void +bpf_debug_print(int subtype, int error) +{ + int level = error ? VLL_WARN : VLL_DBG; + struct ds ds = DS_EMPTY_INITIALIZER; + + if (subtype >= 0 && subtype < ARRAY_SIZE(bpf_upcall_subtypes)) { + ds_put_cstr(&ds, bpf_upcall_subtypes[subtype]); + } else { + ds_put_format(&ds, "Unknown subtype %d", subtype); + } + ds_put_format(&ds, " reports: %s", ovs_strerror(error)); + + VLOG_RL(&rl, level, "%s", ds_cstr(&ds)); + ds_destroy(&ds); +} + +static int +recv_perf_sample(struct dpif_bpf_dp *dpif, struct ovs_ebpf_event *e, + struct dpif_upcall *upcall, struct ofpbuf *buffer) +{ + if (e->sample.header.size < sizeof *e + || e->sample.size < sizeof e->header) { + VLOG_WARN_RL(&rl, "Unexpectedly short sample (%"PRIu32")", + e->sample.size); + return EINVAL; + } + + VLOG_INFO("\nreceived upcall %d", e->header.type); + + switch (e->header.type) { + case OVS_UPCALL_MISS: + return perf_sample_to_upcall_miss(dpif, e, upcall, buffer); + break; + case OVS_UPCALL_DEBUG: + bpf_debug_print(e->header.subtype, e->header.error); + return EAGAIN; + case OVS_UPCALL_ACTION: + return perf_sample_to_upcall_userspace(dpif, e, upcall, buffer); + break; + default: + break; + } + + 
VLOG_WARN_RL(&rl, "Unfamiliar upcall type %d", e->header.type); + return EINVAL; +} + +static int +dpif_bpf_recv(struct dpif *dpif_, uint32_t handler_id, + struct dpif_upcall *upcall, struct ofpbuf *buffer) +{ + struct dpif_bpf_dp *dpif = get_dpif_bpf_dp(dpif_); + struct bpf_handler *handler; + int error = EAGAIN; + int i; + + fat_rwlock_rdlock(&dpif->upcall_lock); + handler = dpif->handlers + handler_id; + for (i = 0; i < handler->count; i++) { + int channel_idx = (handler->index + i) % handler->count; + struct perf_channel *channel; + + channel = &dpif->channels[handler->offset + channel_idx]; + error = perf_channel_read(channel, buffer); + if (!error) { + error = recv_perf_sample(dpif, buffer->header, upcall, buffer); + } + if (error != EAGAIN) { + break; + } + } + handler->index = (handler->index + 1) % handler->count; + fat_rwlock_unlock(&dpif->upcall_lock); + + return error; +} + +static char * +dpif_bpf_get_datapath_version(void) +{ + return xstrdup(""); +} + +static void +dpif_bpf_recv_wait(struct dpif *dpif_, uint32_t handler_id) +{ + struct dpif_bpf_dp *dpif = get_dpif_bpf_dp(dpif_); + struct bpf_handler *handler; + int i; + + fat_rwlock_rdlock(&dpif->upcall_lock); + handler = dpif->handlers + handler_id; + for (i = 0; i < handler->count; i++) { + poll_fd_wait(dpif->channels[handler->offset + i].fd, POLLIN); + } + fat_rwlock_unlock(&dpif->upcall_lock); +} + +static void +dpif_bpf_recv_purge(struct dpif *dpif_) +{ + struct dpif_bpf_dp *dpif = get_dpif_bpf_dp(dpif_); + int i; + + fat_rwlock_rdlock(&dpif->upcall_lock); + for (i = 0; i < dpif->n_channels; i++) { + struct perf_channel *channel = &dpif->channels[i]; + + perf_channel_flush(channel); + } + fat_rwlock_unlock(&dpif->upcall_lock); +} + +const struct dpif_class dpif_bpf_class = { + "bpf", + dpif_bpf_init, + dpif_bpf_enumerate, + dpif_bpf_port_open_type, + dpif_bpf_open, + dpif_bpf_close, + dpif_bpf_destroy, + NULL, /* run */ + NULL, /* wait */ + dpif_bpf_get_stats, + dpif_bpf_port_add, + 
dpif_bpf_port_del, + NULL, /* port_set_config */ + dpif_bpf_port_query_by_number, + dpif_bpf_port_query_by_name, + NULL, /* port_get_pid */ + dpif_bpf_port_dump_start, + dpif_bpf_port_dump_next, + dpif_bpf_port_dump_done, + dpif_bpf_port_poll, + dpif_bpf_port_poll_wait, + dpif_bpf_flow_flush, + dpif_bpf_flow_dump_create, + dpif_bpf_flow_dump_destroy, + dpif_bpf_flow_dump_thread_create, + dpif_bpf_flow_dump_thread_destroy, + dpif_bpf_flow_dump_next, + dpif_bpf_operate, + dpif_bpf_recv_set, + dpif_bpf_handlers_set, + NULL, /* set_config */ + NULL, /* queue_to_priority */ + dpif_bpf_recv, + dpif_bpf_recv_wait, + dpif_bpf_recv_purge, + NULL, /* register_dp_purge_cb */ + NULL, /* register_upcall_cb */ + NULL, /* enable_upcall */ + NULL, /* disable_upcall */ + dpif_bpf_get_datapath_version, + NULL, /* ct_dump_start */ + NULL, /* ct_dump_next */ + NULL, /* ct_dump_done */ + NULL, /* ct_flush */ + NULL, /* ct_set_maxconns */ + NULL, /* ct_get_maxconns */ + NULL, /* ct_get_nconns */ + NULL, /* meter_get_features */ + NULL, /* meter_set */ + NULL, /* meter_get */ + NULL, /* meter_del */ +}; diff --git a/lib/dpif-provider.h b/lib/dpif-provider.h index 62b3598acfc5..ae21593ab1b2 100644 --- a/lib/dpif-provider.h +++ b/lib/dpif-provider.h @@ -476,6 +476,7 @@ struct dpif_class { extern const struct dpif_class dpif_netlink_class; extern const struct dpif_class dpif_netdev_class; +extern const struct dpif_class dpif_bpf_class; #ifdef __cplusplus } diff --git a/lib/dpif.c b/lib/dpif.c index f03763ec55b4..43d97ec1582a 100644 --- a/lib/dpif.c +++ b/lib/dpif.c @@ -71,6 +71,9 @@ static const struct dpif_class *base_dpif_classes[] = { #if defined(__linux__) || defined(_WIN32) &dpif_netlink_class, #endif +#if HAVE_BPF /* XXX: Linux 4.9+ */ + &dpif_bpf_class, +#endif &dpif_netdev_class, }; From patchwork Sat Jul 14 11:38:58 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Tu X-Patchwork-Id: 943919 Return-Path: 
X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="qvPmtD3R"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 41SSVm6jlZz9ryt for ; Sat, 14 Jul 2018 21:43:44 +1000 (AEST) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 6B047CBF; Sat, 14 Jul 2018 11:40:04 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 737D5CB6 for ; Sat, 14 Jul 2018 11:40:02 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-pl0-f65.google.com (mail-pl0-f65.google.com [209.85.160.65]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id A42AA67E for ; Sat, 14 Jul 2018 11:40:00 +0000 (UTC) Received: by mail-pl0-f65.google.com with SMTP id b1-v6so13293340pls.5 for ; Sat, 14 Jul 2018 04:40:00 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=TRgBe7cuUWaSEkGjtyLQ/Tm4+/cPbdrqNneV9wvkowk=; b=qvPmtD3Rwpi4qgdwJqgZOsoG9lf5iDufSc9g2Wd65mee5ap+i8G9AaaRVwLCFS9Xe4 HdfuivOnX1oJ1/rVXeCqESSv0p4v0PGZbFAPydG58e5ooOchGvRtRTd1j1Rhixx32BSA 
JXqoacTLuApZqDkOImELY2ft827MzswcdKj2p9HNMKwH8Rcms+sGDJFWXtGXZ/d6ltsv e0CbALyjvo2cR3tLqA2uWYSKl6egVp0Qm9DZ024ABrFyDJHEDFVxzNJN8mX3WmmMYDel CHHbMYKjR4aeUDRRoBIzwsCk9bQeFE40u2dXilKR26DZ/vALnp2LQpUnEdbeadftrsmU WlpQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=TRgBe7cuUWaSEkGjtyLQ/Tm4+/cPbdrqNneV9wvkowk=; b=atpZPxBGPo+3jps0t1buTgqYUOXBBFgDGXp7SjqUqj8QQOeJRf5bER1KAshZc65QEB TtpeVZcpPCzaEe8tsH5+kjq/AHoF8gdPmM5bpifyTdIw+NyB0U51XK2XZN5m3dA/md0v yfa2hGxDiwpjRw2On48BD1et43+ve5ztia+HAwWGu1jl4Ff7BdmlPCZWBMDdqgfFW9G4 vcAytDlKYr1SkGgodb7+hBBBticOX8aKxfRMMNaMfhK7yB2yX0a35seB50C28a9ZmTEZ lZQNejvMWjPYxi1LVvWogLUaqWirKavKcpHeyCS7hGdGrkuzcvW4cuW1yUDUWoc7AwuJ 0Gkg== X-Gm-Message-State: AOUpUlEBbPr7ghv3QqKhn3WcFPx4nZuTsf3aIaBSKqbUEqZ8W52seaM8 WPbQnM3uAFiqtHHOYZJdxg6NEPUi X-Google-Smtp-Source: AAOMgpej/0681QSE2sBXg2nWPy580J45hbSKGz5vfO/Mbk4G/BmSpNn+W8F/7hwszNiyMrxW1ZdENg== X-Received: by 2002:a17:902:7688:: with SMTP id m8-v6mr9800367pll.338.1531568399657; Sat, 14 Jul 2018 04:39:59 -0700 (PDT) Received: from sc9-mailhost3.vmware.com (c-73-231-16-221.hsd1.ca.comcast.net. 
[73.231.16.221]) by smtp.gmail.com with ESMTPSA id m21-v6sm35825267pgv.27.2018.07.14.04.39.56 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Sat, 14 Jul 2018 04:39:59 -0700 (PDT) From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Sat, 14 Jul 2018 04:38:58 -0700 Message-Id: <1531568345-80246-7-git-send-email-u9012063@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com> References: <1531568345-80246-1-git-send-email-u9012063@gmail.com> X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [RFC PATCHv2 06/13] dpif-bpf-odp: Add bpf datapath interface and impl. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org From: Joe Stringer Add an implementation of the API between the userspace "Open vSwitch Datapath Protocol" and the BPF datapath. 
Signed-off-by: Joe Stringer Signed-off-by: William Tu Signed-off-by: Yifeng Sun Co-authored-by: William Tu Co-authored-by: Yifeng Sun --- lib/automake.mk | 12 + lib/dpif-bpf-odp.c | 945 +++++++++++++++++++++++++++++++++++++++++++++++++++++ lib/dpif-bpf-odp.h | 47 +++ 3 files changed, 1004 insertions(+) create mode 100644 lib/dpif-bpf-odp.c create mode 100644 lib/dpif-bpf-odp.h diff --git a/lib/automake.mk b/lib/automake.mk index 8ecad12415a3..61fef23152d3 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -9,6 +9,7 @@ lib_LTLIBRARIES += lib/libopenvswitch.la lib_libopenvswitch_la_LIBADD = $(SSL_LIBS) lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD) +lib_libopenvswitch_la_LIBADD += $(BPF_LDADD) if WIN32 lib_libopenvswitch_la_LIBADD += ${PTHREAD_LIBS} @@ -358,6 +359,7 @@ endif if LINUX lib_libopenvswitch_la_SOURCES += \ + lib/bpf.h \ lib/dpif-netlink.c \ lib/dpif-netlink.h \ lib/dpif-netlink-rtnl.c \ @@ -383,6 +385,16 @@ lib_libopenvswitch_la_SOURCES += \ lib/tc.h endif +if HAVE_BPF +lib_libopenvswitch_la_SOURCES += \ + lib/bpf.c \ + lib/dpif-bpf.c \ + lib/dpif-bpf-odp.c \ + lib/dpif-bpf-odp.h \ + lib/perf-event.c \ + lib/perf-event.h +endif + if DPDK_NETDEV lib_libopenvswitch_la_SOURCES += \ lib/dpdk.c \ diff --git a/lib/dpif-bpf-odp.c b/lib/dpif-bpf-odp.c new file mode 100644 index 000000000000..cc2b2f32224b --- /dev/null +++ b/lib/dpif-bpf-odp.c @@ -0,0 +1,945 @@ +/* + * Copyright (c) 2017 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include +#include +#include "dpif-bpf-odp.h" + +#include + +#include "bpf/odp-bpf.h" +#include "openvswitch/flow.h" +#include "openvswitch/vlog.h" +#include "netlink.h" +#include "util.h" + +VLOG_DEFINE_THIS_MODULE(dpif_bpf_odp); + +static void +ct_action_to_bpf(const struct nlattr *ct, struct bpf_action *dst) +{ + const struct nlattr *nla; + int left; + + NL_ATTR_FOR_EACH_UNSAFE(nla, left, ct, ct->nla_len) { + switch ((enum ovs_ct_attr)nla->nla_type) { + case OVS_CT_ATTR_COMMIT: + dst->u.ct.commit = true; + break; + case OVS_CT_ATTR_ZONE: + case OVS_CT_ATTR_MARK: + case OVS_CT_ATTR_LABELS: + case OVS_CT_ATTR_HELPER: + case OVS_CT_ATTR_NAT: + case OVS_CT_ATTR_FORCE_COMMIT: + case OVS_CT_ATTR_EVENTMASK: + default: + VLOG_INFO("Ignoring CT attribute %d", nla->nla_type); + break; + case OVS_CT_ATTR_UNSPEC: + case __OVS_CT_ATTR_MAX: + OVS_NOT_REACHED(); + } + } +} + +enum odp_key_fitness +odp_tun_to_bpf_tun(const struct nlattr *nla, size_t nla_len, + struct flow_tnl_t *tun) +{ + const struct nlattr *a; + size_t left; + + NL_ATTR_FOR_EACH(a, left, nla, nla_len) { + enum ovs_tunnel_key_attr type = nl_attr_type(a); + + switch (type) { + case OVS_TUNNEL_KEY_ATTR_ID: + tun->tun_id = ntohl(be64_to_be32(nl_attr_get_be64(a))); + break; + case OVS_TUNNEL_KEY_ATTR_IPV4_SRC: + tun->ip4.ip_src = ntohl(nl_attr_get_be32(a)); + tun->use_ipv6 = 0; + break; + case OVS_TUNNEL_KEY_ATTR_IPV4_DST: + tun->ip4.ip_dst = ntohl(nl_attr_get_be32(a)); + tun->use_ipv6 = 0; + break; + case OVS_TUNNEL_KEY_ATTR_TOS: + tun->ip_tos = nl_attr_get_u8(a); + break; + case OVS_TUNNEL_KEY_ATTR_TTL: + tun->ip_ttl = nl_attr_get_u8(a); + break; + case OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT: + //tun->flags |= FLOW_TNL_F_DONT_FRAGMENT; + // in bpf helper, there is no tun_flags extracted + break; + case OVS_TUNNEL_KEY_ATTR_TP_DST: + tun->tp_dst = nl_attr_get_be16(a); + break; + case OVS_TUNNEL_KEY_ATTR_TP_SRC: + tun->tp_src = nl_attr_get_be16(a); + break; + case OVS_TUNNEL_KEY_ATTR_IPV6_SRC: +#ifdef 
BPF_ENABLE_IPV6 + memcpy(&tun->ip6.ipv6_src, nl_attr_get(a), 16); + tun->use_ipv6 = 1; +#endif + break; + case OVS_TUNNEL_KEY_ATTR_IPV6_DST: +#ifdef BPF_ENABLE_IPV6 + memcpy(&tun->ip6.ipv6_dst, nl_attr_get(a), 16); + tun->use_ipv6 = 1; +#endif + break; + case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS: /* Array of Geneve options. */ + if (nl_attr_get_size(a) != sizeof tun->gnvopt) { + VLOG_ERR("%s: geneve opts size is %"PRIuSIZE", expected %"PRIuSIZE, + __func__, nl_attr_get_size(a), sizeof tun->gnvopt); + } else { + memcpy(&tun->gnvopt, nl_attr_get(a), sizeof tun->gnvopt); + tun->gnvopt_valid = 1; + } + break; + case OVS_TUNNEL_KEY_ATTR_CSUM: /* No argument. CSUM packet. */ + case OVS_TUNNEL_KEY_ATTR_OAM: /* No argument. OAM frame. */ + case OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS: /* Nested OVS_VXLAN_EXT_* */ + case OVS_TUNNEL_KEY_ATTR_PAD: + case __OVS_TUNNEL_KEY_ATTR_MAX: + VLOG_INFO("%s: unsupported type %d", __func__, type); + break; + default: + VLOG_INFO("%s: unknown type %d", __func__, type); + OVS_NOT_REACHED(); + } + } + + return ODP_FIT_PERFECT; +} + +/* Converts the OVS netlink-formatted action 'src' into a BPF action in 'dst'. + * + * Returns 0 on success, or a positive errno value on failure. 
+ */ +int +odp_action_to_bpf_action(const struct nlattr *src, struct bpf_action *dst) +{ + enum ovs_action_attr type = nl_attr_type(src); + + switch (type) { + case OVS_ACTION_ATTR_PUSH_VLAN: { + const struct ovs_action_push_vlan *vlan = nl_attr_get(src); + dst->u.push_vlan = *vlan; + VLOG_DBG("push vlan tpid %x tci %x", vlan->vlan_tpid, vlan->vlan_tci); + break; + } + case OVS_ACTION_ATTR_CT: + ct_action_to_bpf(nl_attr_get(src), dst); + break; + case OVS_ACTION_ATTR_RECIRC: + dst->u.recirc_id = nl_attr_get_u32(src); + break; + case OVS_ACTION_ATTR_SAMPLE: + // XXX: ignore + return 1; + case OVS_ACTION_ATTR_USERSPACE: + if (nl_attr_get_size(src) <= sizeof dst->u.userspace.nlattr_data) { + size_t len = nl_attr_get_size(src); + memcpy(dst->u.userspace.nlattr_data, nl_attr_get(src), len); + dst->u.userspace.nlattr_len = len; + VLOG_INFO("size of userspace action is %"PRIuSIZE, len); + } else { + VLOG_WARN("Size of userspace action too large: " + "%"PRIuSIZE" > %"PRIuSIZE, + nl_attr_get_size(src), + sizeof dst->u.userspace.nlattr_data); + return EOPNOTSUPP; + } + break; + case OVS_ACTION_ATTR_HASH: { + const struct ovs_action_hash *hash_act = nl_attr_get(src); + dst->u.hash = *hash_act; + break; + } + case OVS_ACTION_ATTR_SET: + case OVS_ACTION_ATTR_SET_MASKED: { + const struct nlattr *a; + + dst->is_set_tunnel = 0; + a = nl_attr_get(src); + dst->u.mset.key_type = nl_attr_type(a); + + switch (nl_attr_type(a)) { + case OVS_KEY_ATTR_TUNNEL: { + enum odp_key_fitness ret; + struct flow_tnl_t tunnel; + + dst->is_set_tunnel = 1; + /* Zero every field; odp_tun_to_bpf_tun() only sets those present. */ + memset(&tunnel, 0, sizeof tunnel); + ret = odp_tun_to_bpf_tun(nl_attr_get(a), nl_attr_get_size(a), + &tunnel); + if (ret != ODP_FIT_PERFECT) { + return EOPNOTSUPP; + } + + dst->u.tunnel.tunnel_id = tunnel.tun_id; + if (!tunnel.use_ipv6) + dst->u.tunnel.remote_ipv4 = tunnel.ip4.ip_dst; +#ifdef BPF_ENABLE_IPV6 + else + memcpy(dst->u.tunnel.remote_ipv6, tunnel.ip6.ipv6_dst, 16); +#endif + dst->u.tunnel.tunnel_tos = tunnel.ip_tos; + dst->u.tunnel.tunnel_ttl = tunnel.ip_ttl; + 
dst->u.tunnel.use_ipv6 = tunnel.use_ipv6; + + if (tunnel.gnvopt_valid) { + dst->u.tunnel.gnvopt = tunnel.gnvopt; + dst->u.tunnel.gnvopt_valid = 1; + } + break; + } + case OVS_KEY_ATTR_ETHERNET: { + struct ovs_key_ethernet *ether; + + //ovs_assert(nl_attr_get_size(a) == 2 * sizeof *ether); + + ether = &dst->u.mset.key.ether; + memcpy(ether, nl_attr_get(a), sizeof *ether); + break; + } + case OVS_KEY_ATTR_IPV4: { + struct ovs_key_ipv4 *ip; + + //ovs_assert(nl_attr_get_size(a) == 2 * sizeof *ip); + + ip = &dst->u.mset.key.ipv4; + memcpy(ip, nl_attr_get(a), sizeof *ip); + break; + } + default: + VLOG_INFO("%s: set/set_mask %d is not supported", __func__, + nl_attr_type(a)); + return EOPNOTSUPP; + } + break; + } + case OVS_ACTION_ATTR_TRUNC: { + const struct ovs_action_trunc *trunc = nl_attr_get(src); + dst->u.trunc = *trunc; + VLOG_INFO("truncate to %d byte", trunc->max_len); + break; + } + case OVS_ACTION_ATTR_POP_VLAN: + case OVS_ACTION_ATTR_PUSH_MPLS: + case OVS_ACTION_ATTR_POP_MPLS: + case OVS_ACTION_ATTR_PUSH_ETH: + case OVS_ACTION_ATTR_POP_ETH: + case OVS_ACTION_ATTR_TUNNEL_PUSH: + case OVS_ACTION_ATTR_TUNNEL_POP: + case OVS_ACTION_ATTR_CLONE: + case OVS_ACTION_ATTR_METER: + case OVS_ACTION_ATTR_CT_CLEAR: + case OVS_ACTION_ATTR_PUSH_NSH: + case OVS_ACTION_ATTR_POP_NSH: + VLOG_WARN("Unsupported action type %d", nl_attr_type(src)); + return EOPNOTSUPP; + case OVS_ACTION_ATTR_UNSPEC: + case OVS_ACTION_ATTR_OUTPUT: + case __OVS_ACTION_ATTR_MAX: + OVS_NOT_REACHED(); + } + + return 0; +} + +int +bpf_actions_to_odp_actions(struct bpf_action_batch *batch, struct ofpbuf *out) +{ + int i; + + for (i = 0; i < BPF_DP_MAX_ACTION; i++) { + struct bpf_action *act = &batch->actions[i]; + enum ovs_action_attr type = act->type; + + switch (type) { + case OVS_ACTION_ATTR_UNSPEC: + /* End of actions list. 
*/ + return 0; + + case OVS_ACTION_ATTR_OUTPUT: { + /* XXX: ifindex to odp translation */ + nl_msg_put_u32(out, type, act->u.out.port); + break; + } + case OVS_ACTION_ATTR_PUSH_VLAN: { + nl_msg_put_unspec(out, type, &act->u.push_vlan, + sizeof act->u.push_vlan); + break; + } + case OVS_ACTION_ATTR_RECIRC: + nl_msg_put_u32(out, type, act->u.recirc_id); + break; + case OVS_ACTION_ATTR_TRUNC: + nl_msg_put_unspec(out, type, &act->u.trunc, sizeof act->u.trunc); + break; + case OVS_ACTION_ATTR_HASH: + nl_msg_put_unspec(out, type, &act->u.hash, sizeof act->u.hash); + break; + case OVS_ACTION_ATTR_PUSH_MPLS: + nl_msg_put_unspec(out, type, &act->u.mpls, sizeof act->u.mpls); + break; + case OVS_ACTION_ATTR_POP_MPLS: + nl_msg_put_be16(out, type, act->u.ethertype); + break; + case OVS_ACTION_ATTR_SAMPLE: { + VLOG_WARN("XXX FIXME attr sample"); + break; + } + case OVS_ACTION_ATTR_SET: { + // see parse_tc_flower_to_match + size_t start_ofs; + size_t tun_key_ofs; + struct ovs_action_set_tunnel *tun; + + tun = &act->u.tunnel; + start_ofs = nl_msg_start_nested(out, OVS_ACTION_ATTR_SET); + tun_key_ofs = nl_msg_start_nested(out, OVS_KEY_ATTR_TUNNEL); + + nl_msg_put_be64(out, OVS_TUNNEL_KEY_ATTR_ID, + be32_to_be64(htonl(tun->tunnel_id))); + + if (!tun->use_ipv6) { + if (tun->remote_ipv4) { + nl_msg_put_be32(out, OVS_TUNNEL_KEY_ATTR_IPV4_DST, + htonl(tun->remote_ipv4)); + } +#ifdef BPF_ENABLE_IPV6 + } else { + if (ipv6_addr_is_set((const struct in6_addr *)&tun->remote_ipv6)) { + nl_msg_put_in6_addr(out, OVS_TUNNEL_KEY_ATTR_IPV6_DST, + (const struct in6_addr *)&tun->remote_ipv6); + } +#endif + } + +#if 0 + if (!tnl_type || !strcmp(tnl_type, "geneve")) { + tun_metadata_to_geneve_nlattr(tun_key, tun_flow_key, key_buf, a); + } +#endif + nl_msg_end_nested(out, tun_key_ofs); + nl_msg_end_nested(out, start_ofs); + break; + } + case OVS_ACTION_ATTR_SET_MASKED: { + VLOG_WARN("XXX FIXME attr set masked"); + size_t offset = nl_msg_start_nested(out, OVS_ACTION_ATTR_SET_MASKED); + + 
nl_msg_end_nested(out, offset); + break; + } + + case OVS_ACTION_ATTR_USERSPACE: { + VLOG_WARN("XXX FIXME attr userspace"); +#if 0 + size_t offset; + struct ovs_action_userspace *au; + + au = &act->u.userspace; + + offset = nl_msg_start_nested(out, OVS_ACTION_ATTR_USERSPACE); + nl_msg_put_u32(out, OVS_USERSPACE_ATTR_PID, 123); + if (nlattr_len != 0) { + memcpy(nl_msg_put_unspec_zero(odp_actions, OVS_USERSPACE_ATTR_USERDATA, + MAX(8, userdata_size)), + userdata, userdata_size); + } + nl_msg_end_nested(out, offset); +#endif + break; + } + case OVS_ACTION_ATTR_CT: + case OVS_ACTION_ATTR_POP_VLAN: + case OVS_ACTION_ATTR_PUSH_ETH: + case OVS_ACTION_ATTR_POP_ETH: + case OVS_ACTION_ATTR_TUNNEL_PUSH: + case OVS_ACTION_ATTR_TUNNEL_POP: + case OVS_ACTION_ATTR_CLONE: + case OVS_ACTION_ATTR_METER: + case OVS_ACTION_ATTR_CT_CLEAR: + case OVS_ACTION_ATTR_PUSH_NSH: + case OVS_ACTION_ATTR_POP_NSH: + VLOG_WARN("Unexpected action type %d", type); + return EOPNOTSUPP; + case __OVS_ACTION_ATTR_MAX: + default: + OVS_NOT_REACHED(); + break; + } + } + return 0; +} + +/* Extracts packet metadata from the BPF-formatted flow key in 'key' into a + * flow structure in 'flow'. + * + * Note that flow->in_port will still contain an ifindex after this call; the + * caller is responsible for converting it to an odp_port number. 
+ */ +void +bpf_flow_key_extract_metadata(const struct bpf_flow_key *key, + struct flow *flow) +{ + const struct pkt_metadata_t *md = &key->mds.md; + + /* metadata parsing */ + flow->packet_type = htonl(PT_ETH); + flow->in_port.odp_port = u32_to_odp(md->in_port); + flow->recirc_id = md->recirc_id; + flow->dp_hash = md->dp_hash; + flow->skb_priority = md->skb_priority; + flow->pkt_mark = md->pkt_mark; + flow->ct_state = md->ct_state; + flow->ct_zone = md->ct_zone; + flow->ct_mark = md->ct_mark; + if (flow->recirc_id != 0) { + VLOG_INFO("recirc_id = %d", flow->recirc_id); + } + + const struct flow_tnl_t *tun = &key->mds.tnl_md; + if (!tun->use_ipv6) { + flow->tunnel.ip_src = htonl(tun->ip4.ip_src); + flow->tunnel.ip_dst = htonl(tun->ip4.ip_dst); +#ifdef BPF_ENABLE_IPV6 + } else { + memcpy(&flow->tunnel.ipv6_src, tun->ip6.ipv6_src, 16); + memcpy(&flow->tunnel.ipv6_dst, tun->ip6.ipv6_dst, 16); +#endif + } + flow->tunnel.ip_tos = tun->ip_tos; + flow->tunnel.ip_ttl = tun->ip_ttl; + flow->tunnel.tun_id = htonll(tun->tun_id); + //flow->tunnel.flags = FLOW_TNL_F_DONT_FRAGMENT; // this causes key differs + flow->tunnel.flags = 0; + + if (tun->gnvopt_valid) { + memcpy(flow->tunnel.metadata.opts.gnv, &tun->gnvopt, + sizeof tun->gnvopt); + flow->tunnel.metadata.present.len = sizeof tun->gnvopt; + flow->tunnel.flags |= FLOW_TNL_F_UDPIF; + } + +//#define IP_DF 0x4000 /* Flag: "Don't Fragment" */ +// flow->tunnel.flags = 0x40; //htons(IP_DF); + /* TODO */ + /* + flow->ct_label = md.ct_label; + ct_nw_proto + ct_{nw,tp}_{src,dst} + flow_tnl_copy__() + */ +} + +/* XXX The caller must perform in_port translation. 
*/ +void +bpf_metadata_from_flow(const struct flow *flow, struct ebpf_metadata_t *md) +{ + if (flow->packet_type != htonl(PT_ETH)) { + VLOG_WARN("Cannot convert flow to bpf metadata: non-ethernet"); + } + md->md.in_port = odp_to_u32(flow->in_port.odp_port); /* XXX */ + md->md.recirc_id = flow->recirc_id; + md->md.dp_hash = flow->dp_hash; + md->md.skb_priority = flow->skb_priority; + md->md.pkt_mark = flow->pkt_mark; + md->md.ct_state = flow->ct_state; + md->md.ct_zone = flow->ct_zone; + md->md.ct_mark = flow->ct_mark; + + /* TODO */ + /* + md->md.ct_label = flow.ct_label; + flow_tnl_copy__() + */ +} + +enum odp_key_fitness +bpf_flow_key_to_flow(const struct bpf_flow_key *key, struct flow *flow) +{ + const struct ebpf_headers_t *hdrs = &key->headers; + + memset(flow, 0, sizeof *flow); + bpf_flow_key_extract_metadata(key, flow); + + /* L2 */ + if (hdrs->valid & ETHER_VALID) { + memcpy(&flow->dl_dst, &hdrs->ethernet.dstAddr, sizeof(struct eth_addr)); + memcpy(&flow->dl_src, &hdrs->ethernet.srcAddr, sizeof(struct eth_addr)); + flow->dl_type = hdrs->ethernet.etherType; + } + if (hdrs->valid & VLAN_VALID) { + flow->vlans[0].tpid = hdrs->vlan.etherType; + flow->vlans[0].tci = htons(hdrs->vlan.tci) | htons(VLAN_CFI); + // extract_ + flow->dl_type = hdrs->vlan.etherType; + } + + /* L3 */ + if (hdrs->valid & IPV4_VALID) { + flow->nw_src = hdrs->ipv4.srcAddr; + flow->nw_dst = hdrs->ipv4.dstAddr; + flow->nw_ttl = hdrs->ipv4.ttl; + flow->nw_proto = hdrs->ipv4.protocol; +#ifdef BPF_ENABLE_IPV6 + } else if (hdrs->valid & IPV6_VALID) { + memcpy(&flow->ipv6_src, &hdrs->ipv6.srcAddr, sizeof flow->ipv6_src); + memcpy(&flow->ipv6_dst, &hdrs->ipv6.dstAddr, sizeof flow->ipv6_dst); + flow->ipv6_label = htonl(hdrs->ipv6.flowLabel); + /* XXX: flow->nw_frag */ + flow->nw_tos = hdrs->ipv6.trafficClass; + flow->nw_ttl = hdrs->ipv6.hopLimit; + flow->nw_proto = hdrs->ipv6.nextHdr; +#endif + } else if (hdrs->valid & ARP_VALID) { + memcpy(&flow->arp_sha, key->headers.arp.ar_sha, 6); + 
memcpy(&flow->arp_tha, key->headers.arp.ar_tha, 6); + memcpy(&flow->nw_src, key->headers.arp.ar_sip, 4); /* be32 */ + memcpy(&flow->nw_dst, key->headers.arp.ar_tip, 4); + + if (ntohs(key->headers.arp.ar_op) < 0xff) { + flow->nw_proto = ntohs(key->headers.arp.ar_op); + } else { + flow->nw_proto = 0; + } + } + + /* L4 */ + if (hdrs->valid & TCP_VALID) { + flow->tcp_flags = hdrs->tcp.flags; + flow->tp_src = hdrs->tcp.srcPort; + flow->tp_dst = hdrs->tcp.dstPort; + } else if (hdrs->valid & UDP_VALID) { + flow->tp_src = htons(hdrs->udp.srcPort); + flow->tp_dst = htons(hdrs->udp.dstPort); + } else if (hdrs->valid & ICMP_VALID) { + /* XXX: validate */ + flow->tp_src = htons(hdrs->icmp.type); // u8 to be16 + flow->tp_dst = htons(hdrs->icmp.code); + } else if (hdrs->valid & ICMPV6_VALID) { + flow->tp_src = htons(hdrs->icmpv6.type); // u8 to be16 + flow->tp_dst = htons(hdrs->icmpv6.code); + } /* XXX: IGMP */ + + return ODP_FIT_PERFECT; +} + +/* Converts the 'nla_len' bytes of OVS netlink-formatted flow key in 'nla' into + * the bpf flow structure in 'key'. Returns an ODP_FIT_* value that indicates + * how well 'nla' fits into the BPF flow key format. On success, 'in_port' will + * be populated with the in_port specified by 'nla', which the caller must + * convert from an ODP port number into an ifindex and place into 'key'. + */ +enum odp_key_fitness +odp_key_to_bpf_flow_key(const struct nlattr *nla, size_t nla_len, + struct bpf_flow_key *key, odp_port_t *in_port, + bool inner, bool verbose) +{ + bool found_in_port = false; + const struct nlattr *a; + size_t left; + + NL_ATTR_FOR_EACH(a, left, nla, nla_len) { + enum ovs_key_attr type = nl_attr_type(a); + + switch (type) { + case OVS_KEY_ATTR_PRIORITY: + key->mds.md.skb_priority = nl_attr_get_u32(a); + break; + case OVS_KEY_ATTR_IN_PORT: { + /* The caller must convert the ODP port number into ifindex. 
*/ + *in_port = nl_attr_get_odp_port(a); + found_in_port = true; + break; + } + case OVS_KEY_ATTR_ETHERNET: { + const struct ovs_key_ethernet *eth = nl_attr_get(a); + + for (int i = 0; i < ARRAY_SIZE(eth->eth_dst.ea); i++) { + key->headers.ethernet.dstAddr[i] = eth->eth_dst.ea[i]; + key->headers.ethernet.srcAddr[i] = eth->eth_src.ea[i]; + } + key->headers.valid |= ETHER_VALID; + break; + } + case OVS_KEY_ATTR_VLAN: { + ovs_be16 tci = nl_attr_get_be16(a); + struct vlan_tag_t *vlan = inner ? &key->headers.cvlan + : &key->headers.vlan; + vlan->tci = ntohs(tci); + key->headers.vlan.tci = ntohs(tci); + /* etherType is set below in OVS_KEY_ATTR_ETHERTYPE. */ + key->headers.valid |= VLAN_VALID; + break; + } + case OVS_KEY_ATTR_ETHERTYPE: { + ovs_be16 dl_type; + + dl_type = nl_attr_get_be16(a); + key->headers.ethernet.etherType = dl_type; + key->headers.valid |= ETHER_VALID; + + if (dl_type == htons(ETH_P_IP)) { + key->headers.valid |= IPV4_VALID; + } else if (dl_type == htons(ETH_P_IPV6)) { + key->headers.valid |= IPV6_VALID; + } else if (dl_type == htons(ETH_P_ARP)) { + key->headers.valid |= ARP_VALID; + } else if (dl_type == htons(ETH_P_8021Q)) { + key->headers.vlan.etherType = htons(ETH_P_8021Q); + key->headers.valid |= VLAN_VALID; + } else if (dl_type == htons(ETH_P_8021AD)) { + key->headers.cvlan.etherType = htons(ETH_P_8021AD); + key->headers.valid |= CVLAN_VALID; + } else if (dl_type == htons(ETH_P_MPLS_UC) || + dl_type == htons(ETH_P_MPLS_MC)) { + key->headers.valid |= MPLS_VALID; + } else { + VLOG_WARN("%s dl_type %x not supported", + __func__, ntohs(dl_type)); + } + break; + } + case OVS_KEY_ATTR_IPV4: { + const struct ovs_key_ipv4 *ipv4 = nl_attr_get(a); + + key->headers.ipv4.srcAddr = ipv4->ipv4_src; + key->headers.ipv4.dstAddr = ipv4->ipv4_dst; + key->headers.ipv4.protocol = ipv4->ipv4_proto; + key->headers.ipv4.ttl = ipv4->ipv4_ttl; + /* XXX: ipv4->ipv4_frag; One of OVS_FRAG_TYPE_*. 
*/ + key->headers.valid |= IPV4_VALID; + break; + } + case OVS_KEY_ATTR_IPV6: { +#ifdef BPF_ENABLE_IPV6 + const struct ovs_key_ipv6 *ipv6 = nl_attr_get(a); + + memcpy(&key->headers.ipv6.srcAddr, &ipv6->ipv6_src, + ARRAY_SIZE(key->headers.ipv6.srcAddr)); + memcpy(&key->headers.ipv6.dstAddr, &ipv6->ipv6_dst, + ARRAY_SIZE(key->headers.ipv6.dstAddr)); + key->headers.ipv6.flowLabel = ntohl(ipv6->ipv6_label); + key->headers.ipv6.nextHdr = ipv6->ipv6_proto; + key->headers.ipv6.trafficClass = ipv6->ipv6_tclass; + key->headers.ipv6.hopLimit = ipv6->ipv6_hlimit; + /* XXX: ipv6_frag; One of OVS_FRAG_TYPE_*. */ + key->headers.valid |= IPV6_VALID; +#endif + break; + } + case OVS_KEY_ATTR_TCP: { + const struct ovs_key_tcp *tcp = nl_attr_get(a); + + key->headers.tcp.srcPort = tcp->tcp_src; + key->headers.tcp.dstPort = tcp->tcp_dst; + key->headers.valid |= TCP_VALID; + break; + } + case OVS_KEY_ATTR_UDP: { + const struct ovs_key_udp *udp = nl_attr_get(a); + + key->headers.udp.srcPort = ntohs(udp->udp_src); + key->headers.udp.dstPort = ntohs(udp->udp_dst); + key->headers.valid |= UDP_VALID; + break; + } + case OVS_KEY_ATTR_ICMP: { + const struct ovs_key_icmp *icmp = nl_attr_get(a); + /* XXX: Double-check */ + key->headers.icmp.type = icmp->icmp_type; + key->headers.icmp.code = icmp->icmp_code; + key->headers.valid |= ICMP_VALID; + break; + } + case OVS_KEY_ATTR_ARP: { + const struct ovs_key_arp *arp = nl_attr_get(a); + + key->headers.arp.ar_op = arp->arp_op; + memcpy(key->headers.arp.ar_sip, &arp->arp_sip, 4); + memcpy(key->headers.arp.ar_tip, &arp->arp_tip, 4); /* be32 */ + memcpy(key->headers.arp.ar_sha, &arp->arp_sha, 6); + memcpy(key->headers.arp.ar_tha, &arp->arp_tha, 6); + key->headers.valid |= ARP_VALID; + break; + } + case OVS_KEY_ATTR_SKB_MARK: + key->mds.md.pkt_mark = nl_attr_get_u32(a); + break; + case OVS_KEY_ATTR_TCP_FLAGS: { + ovs_be16 flags_be = nl_attr_get_be16(a); + + key->headers.tcp.flags = flags_be; + key->headers.valid |= TCP_VALID; + break; + } + case 
OVS_KEY_ATTR_DP_HASH: + key->mds.md.dp_hash = nl_attr_get_u32(a); + break; + case OVS_KEY_ATTR_RECIRC_ID: + key->mds.md.recirc_id = nl_attr_get_u32(a); + break; + case OVS_KEY_ATTR_CT_STATE: + key->mds.md.ct_state = nl_attr_get_u32(a); + break; + case OVS_KEY_ATTR_CT_ZONE: + key->mds.md.ct_zone = nl_attr_get_u16(a); + break; + case OVS_KEY_ATTR_CT_MARK: + key->mds.md.ct_mark = nl_attr_get_u32(a); + break; + case OVS_KEY_ATTR_CT_LABELS: + memcpy(&key->mds.md.ct_label, nl_attr_get(a), + sizeof(key->mds.md.ct_label)); + break; + case OVS_KEY_ATTR_PACKET_TYPE: { + ovs_be32 pt = nl_attr_get_be32(a); + if (pt != htonl(PT_ETH)) { + return ODP_FIT_ERROR; + } + break; + } + case OVS_KEY_ATTR_MPLS: { + const struct ovs_key_mpls *mpls = nl_attr_get(a); + key->headers.mpls.top_lse = mpls->mpls_lse; + break; + } + case OVS_KEY_ATTR_ENCAP: { + enum odp_key_fitness ret; + ret = odp_key_to_bpf_flow_key(nl_attr_get(a), nl_attr_get_size(a), + key, in_port, true, verbose); + if (ret != ODP_FIT_PERFECT) { + return ret; + } + break; + } + case OVS_KEY_ATTR_TUNNEL: { + enum odp_key_fitness ret; + ret = odp_tun_to_bpf_tun(nl_attr_get(a), nl_attr_get_size(a), + &key->mds.tnl_md); + if (ret != ODP_FIT_PERFECT) { + VLOG_ERR("%s odp key to bpf tunnel key error", __func__); + return ret; + } + break; + } + case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4: + case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6: + case OVS_KEY_ATTR_ICMPV6: { + const struct ovs_key_icmpv6 *icmpv6 = nl_attr_get(a); + + key->headers.icmpv6.type = icmpv6->icmpv6_type; + key->headers.icmpv6.code = icmpv6->icmpv6_code; + key->headers.valid |= ICMPV6_VALID; + break; + } + case OVS_KEY_ATTR_ND: { + // XXX skip + break; + } + case OVS_KEY_ATTR_SCTP: + case OVS_KEY_ATTR_NSH: + { + static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 20); + struct ds ds = DS_EMPTY_INITIALIZER; + // compile error, remove it + //odp_format_key_attr(a, NULL, NULL, &ds, verbose); + VLOG_INFO_RL(&rl, "Cannot convert \'%s\'", ds_cstr(&ds)); + ds_destroy(&ds); + 
return ODP_FIT_ERROR;
+        }
+        case OVS_KEY_ATTR_UNSPEC:
+        case __OVS_KEY_ATTR_MAX:
+        default:
+            OVS_NOT_REACHED();
+        }
+    }
+
+    if (!inner && !found_in_port) {
+        VLOG_ERR("not found in_port");
+        return ODP_FIT_ERROR;
+    }
+
+    if (!inner && verbose) {
+        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
+        struct ds ds = DS_EMPTY_INITIALIZER;
+
+        ds_put_format(&ds, "%s\nODP:\n", __func__);
+        odp_flow_key_format(nla, nla_len, &ds);
+        ds_put_cstr(&ds, "\nBPF:\n");
+        bpf_flow_key_format(&ds, key);
+        VLOG_INFO_RL(&rl, "%s", ds_cstr(&ds));
+        ds_destroy(&ds);
+    }
+
+    return ODP_FIT_PERFECT;
+}
+
+#define TABSPACE " "
+
+static void
+indent(struct ds *ds, struct ds *tab, const char *string)
+{
+    ds_put_format(ds, "%s%s", ds_cstr(tab), string);
+    ds_put_cstr(tab, TABSPACE);
+}
+
+static void
+trim(struct ds *ds, struct ds *tab)
+{
+    ds_chomp(ds, '\n');
+    ds_put_char(ds, '\n');
+    ds_truncate(tab, tab->length ? tab->length - strlen(TABSPACE) : 0);
+}
+
+#define PUT_FIELD(STRUCT, NAME, FORMAT) \
+    if (STRUCT->NAME) \
+        ds_put_format(ds, #NAME"=%"FORMAT",", STRUCT->NAME)
+
+void
+bpf_flow_key_format(struct ds *ds, const struct bpf_flow_key *key)
+{
+    struct ds tab = DS_EMPTY_INITIALIZER;
+
+    indent(ds, &tab, "headers:\n");
+    {
+        if (key->headers.valid & ETHER_VALID) {
+            const struct ethernet_t *eth = &key->headers.ethernet;
+            const struct eth_addr *src = (struct eth_addr *)&eth->srcAddr;
+            const struct eth_addr *dst = (struct eth_addr *)&eth->dstAddr;
+
+            ds_put_format(ds, "%sethernet(", ds_cstr(&tab));
+            PUT_FIELD(eth, etherType, "#"PRIx16);
+            ds_put_format(ds, "dst="ETH_ADDR_FMT",", ETH_ADDR_ARGS(*dst));
+            ds_put_format(ds, "src="ETH_ADDR_FMT",", ETH_ADDR_ARGS(*src));
+            ds_chomp(ds, ',');
+            ds_put_format(ds, ")\n");
+        }
+        if (key->headers.valid & IPV4_VALID) {
+            const struct ipv4_t *ipv4 = &key->headers.ipv4;
+
+            ds_put_format(ds, "%sipv4(", ds_cstr(&tab));
+            PUT_FIELD(ipv4, ttl, "#"PRIx8);
+            PUT_FIELD(ipv4, tos, "#"PRIx8);
+            PUT_FIELD(ipv4, protocol, "#"PRIx8);
+
ds_put_format(ds, "srcAddr="IP_FMT",", IP_ARGS(ipv4->srcAddr)); + ds_put_format(ds, "dstAddr="IP_FMT",", IP_ARGS(ipv4->dstAddr)); + ds_chomp(ds, ','); + ds_put_format(ds, ")\n"); + } +#ifdef BPF_ENABLE_IPV6 + if (key->headers.valid & IPV6_VALID) { + const struct ipv6_t *ipv6 = &key->headers.ipv6; + + ds_put_format(ds, "%sipv6(", ds_cstr(&tab)); + PUT_FIELD(ipv6, version, "#"PRIx8); + PUT_FIELD(ipv6, trafficClass, "#"PRIx8); + PUT_FIELD(ipv6, flowLabel, "#"PRIx32); + PUT_FIELD(ipv6, payloadLen, "#"PRIx16); + PUT_FIELD(ipv6, nextHdr, "#"PRIx8); + PUT_FIELD(ipv6, hopLimit, "#"PRIx8); + ds_put_cstr(ds, "src="); + ipv6_format_addr((struct in6_addr *)&ipv6->srcAddr, ds); + ds_put_cstr(ds, ",dst="); + ipv6_format_addr((struct in6_addr *)&ipv6->dstAddr, ds); + ds_chomp(ds, ','); + ds_put_format(ds, ")\n"); + } +#endif + if (key->headers.valid & ARP_VALID) { + const struct arp_rarp_t *arp = &key->headers.arp; + + ds_put_format(ds, "%sarp(", ds_cstr(&tab)); + PUT_FIELD(arp, ar_hrd, "#"PRIx16); + PUT_FIELD(arp, ar_pro, "#"PRIx16); + PUT_FIELD(arp, ar_hln, "#"PRIx8); + PUT_FIELD(arp, ar_pln, "#"PRIx8); + PUT_FIELD(arp, ar_op, "#"PRIx16); + ds_chomp(ds, ','); + ds_put_format(ds, ")\n"); + } + if (key->headers.valid & TCP_VALID) { + const struct tcp_t *tcp = &key->headers.tcp; + + ds_put_format(ds, "%stcp(", ds_cstr(&tab)); + PUT_FIELD(tcp, srcPort, PRIu16); + PUT_FIELD(tcp, dstPort, PRIu16); + PUT_FIELD(tcp, flags, "#"PRIx16); + ds_chomp(ds, ','); + ds_put_format(ds, ")\n"); + } + if (key->headers.valid & UDP_VALID) { + const struct udp_t *udp = &key->headers.udp; + + ds_put_format(ds, "%sudp(", ds_cstr(&tab)); + PUT_FIELD(udp, srcPort, PRIu16); + PUT_FIELD(udp, dstPort, PRIu16); + ds_chomp(ds, ','); + ds_put_format(ds, ")\n"); + } + if (key->headers.valid & ICMP_VALID) { + const struct icmp_t *icmp = &key->headers.icmp; + + ds_put_format(ds, "%sicmp(", ds_cstr(&tab)); + PUT_FIELD(icmp, type, "#"PRIx8); + PUT_FIELD(icmp, code, "#"PRIx8); + ds_chomp(ds, ','); + ds_put_format(ds, 
")\n"); + } + if (key->headers.valid & VLAN_VALID) { + const struct vlan_tag_t *vlan = &key->headers.vlan; + + ds_put_format(ds, "%svlan(", ds_cstr(&tab)); + PUT_FIELD(vlan, pcp, "#"PRIx8); + PUT_FIELD(vlan, cfi, "#"PRIx8); + PUT_FIELD(vlan, vid, "#"PRIx16); + PUT_FIELD(vlan, tci, "#"PRIx16); + PUT_FIELD(vlan, etherType, "#"PRIx16); + ds_chomp(ds, ','); + ds_put_format(ds, ")\n"); + } + } + trim(ds, &tab); + indent(ds, &tab, "metadata:\n"); + { + indent(ds, &tab, "md:\n"); + { + ds_put_hex_dump(ds, &key->mds.md, sizeof key->mds.md, 0, false); + } + trim(ds, &tab); + indent(ds, &tab, "tnl_md:\n"); + { + ds_put_hex_dump(ds, &key->mds.tnl_md, sizeof key->mds.tnl_md, 0, + false); + } + trim(ds, &tab); + } + trim(ds, &tab); + ds_chomp(ds, '\n'); + + ds_destroy(&tab); +} diff --git a/lib/dpif-bpf-odp.h b/lib/dpif-bpf-odp.h new file mode 100644 index 000000000000..ddf9b5fec6af --- /dev/null +++ b/lib/dpif-bpf-odp.h @@ -0,0 +1,47 @@ +/* + * Copyright (c) 2017 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */
+
+#ifndef DPIF_BPF_ODP_H
+#define DPIF_BPF_ODP_H 1
+
+#include "odp-util.h"
+
+struct flow;
+struct flow_tnl_t;
+struct nlattr;
+struct bpf_flow_key;
+struct bpf_action;
+struct ebpf_metadata_t;
+struct bpf_action_batch;
+
+int odp_action_to_bpf_action(const struct nlattr *, struct bpf_action *);
+int bpf_actions_to_odp_actions(struct bpf_action_batch *, struct ofpbuf *out);
+enum odp_key_fitness bpf_flow_key_to_flow(const struct bpf_flow_key *,
+                                          struct flow *);
+void bpf_flow_key_extract_metadata(const struct bpf_flow_key *,
+                                   struct flow *flow);
+void bpf_metadata_from_flow(const struct flow *flow,
+                            struct ebpf_metadata_t *md);
+enum odp_key_fitness odp_key_to_bpf_flow_key(const struct nlattr *, size_t,
+                                             struct bpf_flow_key *,
+                                             odp_port_t *in_port,
+                                             bool inner, bool verbose);
+enum odp_key_fitness odp_tun_to_bpf_tun(const struct nlattr *nla,
+                                        size_t nla_len,
+                                        struct flow_tnl_t *tun);
+void bpf_flow_key_format(struct ds *ds, const struct bpf_flow_key *key);
+
+#endif /* dpif-bpf-odp.h */

From patchwork Sat Jul 14 11:38:59 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 943922
From: William Tu
To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org
Date: Sat, 14 Jul 2018 04:38:59 -0700
Message-Id: <1531568345-80246-8-git-send-email-u9012063@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com>
References: <1531568345-80246-1-git-send-email-u9012063@gmail.com>
Subject: [ovs-dev] [RFC PATCHv2 07/13] bpf: implement OVS BPF datapath.

This patch adds the OVS-eBPF datapath implementation for dpif-bpf.
Three stages are added: parse, lookup, and actions. Each stage tail-calls
the next stage. When executing multiple actions, the current action also
tail-calls the subsequent action, based on the result of the flow table
lookup. The protocol headers are auto-generated and defined in
generated_headers.h.
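The action chaining described above can be illustrated with a small userspace sketch. It is not the BPF code from this patch: the names `struct pkt`, `run_next` and `execute` are hypothetical, ordinary function calls stand in for bpf_tail_call() (which, unlike a call, does not return on success), and a plain array replaces the BPF maps. It only shows how a per-packet action index can walk a batch of actions until an entry of type 0 ends the list.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ACTIONS 8   /* assumption: stands in for BPF_DP_MAX_ACTION */

enum act_type { ACT_END = 0, ACT_OUTPUT, ACT_PUSH_VLAN, ACT_TYPE_MAX };
enum verdict { VERDICT_DROP, VERDICT_DONE };

struct action { enum act_type type; uint32_t arg; };
struct batch { struct action actions[MAX_ACTIONS]; };

/* Hypothetical per-packet state; act_idx plays the role of the
 * action index that the patch keeps in skb->cb[]. */
struct pkt {
    uint32_t act_idx;
    uint32_t out_port;
    uint16_t vlan_tci;
    const struct batch *batch;
};

typedef enum verdict (*handler_fn)(struct pkt *);
static enum verdict run_next(struct pkt *p);

/* Each handler reads its own action, applies it, then chains onward. */
static enum verdict do_output(struct pkt *p)
{
    p->out_port = p->batch->actions[p->act_idx].arg;
    return run_next(p);
}

static enum verdict do_push_vlan(struct pkt *p)
{
    p->vlan_tci = (uint16_t) p->batch->actions[p->act_idx].arg;
    return run_next(p);
}

/* Dispatch table indexed by action type, like a tail-call program array. */
static const handler_fn handlers[ACT_TYPE_MAX] = {
    NULL, do_output, do_push_vlan
};

/* Advance the index and jump to the next handler, or finish when the
 * list ends (type 0) or the maximum number of actions is reached. */
static enum verdict run_next(struct pkt *p)
{
    uint32_t idx = ++p->act_idx;

    if (idx >= MAX_ACTIONS) {
        return VERDICT_DONE;
    }
    enum act_type t = p->batch->actions[idx].type;
    if (t == ACT_END) {
        return VERDICT_DONE;
    }
    if (t >= ACT_TYPE_MAX || !handlers[t]) {
        return VERDICT_DROP;
    }
    return handlers[t](p);
}

/* Entry point after a successful flow lookup: start at action 0.
 * A batch whose first action is ACT_END is treated as a drop. */
enum verdict execute(struct pkt *p, const struct batch *b)
{
    p->batch = b;
    p->act_idx = 0;

    enum act_type t = b->actions[0].type;
    if (t >= ACT_TYPE_MAX || !handlers[t]) {
        return VERDICT_DROP;
    }
    return handlers[t](p);
}
```

With a batch of {push_vlan, output, end}, execute() runs both handlers in order and leaves the index on the terminating entry. An empty batch is reported as a drop, which mirrors how tail_action_unspec in this patch handles actions=drop.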
The bpf_flow_key is extracted using the P4-to-eBPF compiler from the bcc
project. A couple of manual tweaks are required, see parser.h.

Signed-off-by: William Tu
Signed-off-by: Yifeng Sun
Signed-off-by: Joe Stringer
Co-authored-by: Joe Stringer
Co-authored-by: Yifeng Sun
---
 Makefile.am             |   1 +
 bpf/action.h            | 715 ++++++++++++++++++++++++++++++++++++++++++++++++
 bpf/api.h               | 279 +++++++++++++++++++
 bpf/automake.mk         |  60 ++++
 bpf/datapath.c          | 192 +++++++++++++
 bpf/datapath.h          |  71 +++++
 bpf/generated_headers.h | 182 ++++++++++++
 bpf/helpers.h           | 248 +++++++++++++++++
 bpf/lookup.h            | 228 +++++++++++++++
 bpf/maps.h              | 170 ++++++++++++
 bpf/odp-bpf.h           | 255 +++++++++++++++++
 bpf/openvswitch.h       |  49 ++++
 bpf/ovs-p4.h            |  90 ++++++
 bpf/ovs-proto.p4        | 329 ++++++++++++++++++++++
 bpf/parser.h            | 344 +++++++++++++++++++++++
 bpf/xdp.h               |  35 +++
 16 files changed, 3248 insertions(+)
 create mode 100644 bpf/action.h
 create mode 100644 bpf/api.h
 create mode 100644 bpf/automake.mk
 create mode 100644 bpf/datapath.c
 create mode 100644 bpf/datapath.h
 create mode 100644 bpf/generated_headers.h
 create mode 100644 bpf/helpers.h
 create mode 100644 bpf/lookup.h
 create mode 100644 bpf/maps.h
 create mode 100644 bpf/odp-bpf.h
 create mode 100644 bpf/openvswitch.h
 create mode 100644 bpf/ovs-p4.h
 create mode 100644 bpf/ovs-proto.p4
 create mode 100644 bpf/parser.h
 create mode 100644 bpf/xdp.h

diff --git a/Makefile.am b/Makefile.am
index 21e27fa32965..ec1fc53b1060 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -440,6 +440,7 @@ dist-docs:
 include Documentation/automake.mk
 include m4/automake.mk
+include bpf/automake.mk
 include lib/automake.mk
 include ofproto/automake.mk
 include utilities/automake.mk
diff --git a/bpf/action.h b/bpf/action.h
new file mode 100644
index 000000000000..79558ca13780
--- /dev/null
+++ b/bpf/action.h
@@ -0,0 +1,715 @@
+/*
+ * Copyright (c) 2016, 2017, 2018 Nicira, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301, USA
+ */
+
+/* OVS Datapath Execution
+ * ======================
+ *
+ * When a lookup is successful, the eBPF program gets a list of actions to
+ * execute, such as outputting the packet to a certain port or pushing a
+ * VLAN tag. The list of actions is configured in ovs-vswitchd and may be
+ * of variable length, depending on the desired network processing
+ * behaviour. For example, an L2 switch flooding an unknown-destination
+ * packet sends it to all its current ports. The OVS datapath's actions
+ * are derived from the OpenFlow action specification and the OVSDB schema
+ * for ovs-vswitchd.
+ * + */ +#include +#include +#include +#include + +#include "api.h" +#include "maps.h" +#include "helpers.h" + +#define ENABLE_POINTER_LOOKUP 1 + +#define ALIGNED_CAST(TYPE, ATTR) ((TYPE) (void *) (ATTR)) + +#define IP_CSUM_OFF (ETH_HLEN + offsetof(struct iphdr, check)) +#define TOS_OFF (ETH_HLEN + offsetof(struct iphdr, tos)) +#define TTL_OFF (ETH_HLEN + offsetof(struct iphdr, ttl)) +#define DST_OFF (ETH_HLEN + offsetof(struct iphdr, daddr)) +#define SRC_OFF (ETH_HLEN + offsetof(struct iphdr, saddr)) + +static inline void set_ip_tos(struct __sk_buff *skb, __u8 new_tos) +{ + __u8 old_tos; + + bpf_skb_load_bytes(skb, TOS_OFF, &old_tos, 1); + + if (old_tos == new_tos) { + printt("tos not change %d\n", old_tos); + return; + } + + bpf_l3_csum_replace(skb, IP_CSUM_OFF, old_tos, new_tos, 2); + + /* Use helper here because using direct packet + * access causes verifier error + */ + bpf_skb_store_bytes(skb, TOS_OFF, &new_tos, sizeof(new_tos), 0); +} + +static inline void set_ip_ttl(struct __sk_buff *skb, __u8 new_ttl) +{ + __u8 old_ttl; + + bpf_skb_load_bytes(skb, TTL_OFF, &old_ttl, 1); + + if (old_ttl == new_ttl) { + printt("ttl not change %d\n", old_ttl); + return; + } + + printt("old ttl %d -> new ttl %d\n", old_ttl, new_ttl); + + bpf_l3_csum_replace(skb, IP_CSUM_OFF, old_ttl, new_ttl, 2); + bpf_skb_store_bytes(skb, TTL_OFF, &new_ttl, sizeof(new_ttl), 0); +} + +static inline void set_ip_dst(struct __sk_buff *skb, ovs_be32 new_dst) +{ + ovs_be32 old_dst; + + bpf_skb_load_bytes(skb, DST_OFF, &old_dst, 4); + + if (old_dst == new_dst) { + printt("dst ip not change %x\n", old_dst); + return; + } + printt("old dst %x -> new dst %x\n", old_dst, new_dst); + + l3_csum_replace4(skb, IP_CSUM_OFF, old_dst, new_dst); + bpf_skb_store_bytes(skb, DST_OFF, &new_dst, sizeof(new_dst), 0); +} + +static inline void set_ip_src(struct __sk_buff *skb, ovs_be32 new_src) +{ + ovs_be32 old_src; + + bpf_skb_load_bytes(skb, SRC_OFF, &old_src, 4); + + if (old_src == new_src) { + printt("src ip not 
change %x\n", old_src); + return; + } + printt("old src %x -> new src %x\n", old_src, new_src); + + l3_csum_replace4(skb, IP_CSUM_OFF, old_src, new_src); + bpf_skb_store_bytes(skb, SRC_OFF, &new_src, sizeof(new_src), 0); +} + +/* + * Every OVS action need to lookup the action list and + * with index, find out the next action to process + */ +static inline struct bpf_action *pre_tail_action(struct __sk_buff *skb, + struct bpf_action_batch **__batch) +{ + uint32_t index = ovs_cb_get_action_index(skb); + struct bpf_action *action = NULL; + struct bpf_action_batch *batch; + int zero_index = 0; + + if (index >= BPF_DP_MAX_ACTION) { + printt("ERR max ebpf action hit\n"); + return NULL; + } + + if (skb->cb[OVS_CB_DOWNCALL_EXE]) { + /* Downcall packet has a dedicated action list */ + batch = bpf_map_lookup_elem(&execute_actions, &zero_index); + } else { + struct bpf_flow_key *exe_flow_key; + + exe_flow_key = bpf_map_lookup_elem(&percpu_executing_key, + &zero_index); + if (!exe_flow_key) { + printt("empty percpu_executing_key\n"); + return NULL; + } + +#if ENABLE_POINTER_LOOKUP + /* + * kernel 4.18-rc1, commit: + * bpf: allow map helpers access to map values directly + */ + batch = bpf_map_lookup_elem(&flow_table, exe_flow_key); +#else + struct bpf_flow_key flow_key = *exe_flow_key; + batch = bpf_map_lookup_elem(&flow_table, &flow_key); +#endif + } + if (!batch) { + printt("no batch action found\n"); + return NULL; + } + + *__batch = batch; + action = &((batch)->actions[index]); + return action; +} + +/* + * After processing the action, tail call the next. 
+ */ +static inline int post_tail_action(struct __sk_buff *skb, + struct bpf_action_batch *batch) +{ + struct bpf_action *next_action; + uint32_t index; + + if (!batch) + return TC_ACT_SHOT; + + index = skb->cb[OVS_CB_ACT_IDX] + 1; + skb->cb[OVS_CB_ACT_IDX] = index; + + if (index >= BPF_DP_MAX_ACTION) + goto finish; + + next_action = &batch->actions[index]; + if (next_action->type == 0) + goto finish; + + printt("next action type = %d\n", next_action->type); + bpf_tail_call(skb, &tailcalls, next_action->type); + + printt("[BUG] tail call missing\n"); + return TC_ACT_SHOT; + +finish: + if (skb->cb[OVS_CB_DOWNCALL_EXE]) { + int index = 0; + bpf_map_delete_elem(&execute_actions, &index); + } + return TC_ACT_STOLEN; +} + +/* + * Use this action to indicate end of action list + * BPF program: tail-0 + */ +__section_tail(OVS_ACTION_ATTR_UNSPEC) +static int tail_action_unspec(struct __sk_buff *skb) +{ + int index OVS_UNUSED = ovs_cb_get_action_index(skb); + + printt("action index = %d, end of processing\n", index); + + /* Handle actions=drop, we return SHOT so the device's dropped stats + will be incremented (see sch_handle_ingress). + + If there are more actions, ex: actions=a1,a2,drop, this is + handled in post_tail_actions and return STOLEN + */ + return TC_ACT_SHOT; +} + +/* + * BPF program: tail-1 + */ +__section_tail(OVS_ACTION_ATTR_OUTPUT) +static int tail_action_output(struct __sk_buff *skb) +{ + int ret __attribute__((__unused__)); + struct bpf_action *action; + struct bpf_action_batch *batch; + int flags; + + action = pre_tail_action(skb, &batch); + if (!action) + return TC_ACT_SHOT; + + /* Internal dev is tap type and hooked only to bpf egress filter. + When output to an internal device, a packet is clone-redirected to + this device's ingress so that this packet is processed by kernel stack. + Why? Since if the packet is sent to its egress, it is delivered to the + tap device's socket, not kernel. + */ + flags = action->u.out.flags & OVS_BPF_FLAGS_TX_STACK ? 
BPF_F_INGRESS : 0; + printt("output action port = %d ingress? %d\n", + action->u.out.port, (flags)); + + bpf_clone_redirect(skb, action->u.out.port, flags); + + return post_tail_action(skb, batch); +} + +/* + * This action implements OVS userspace + * BPF program: tail-2 + */ +__section_tail(OVS_ACTION_ATTR_USERSPACE) +static int tail_action_userspace(struct __sk_buff *skb) +{ + struct bpf_action *action; + struct bpf_action_batch *batch; + + action = pre_tail_action(skb, &batch); + if (!action) + return TC_ACT_SHOT; + + /* XXX If move this declaration to top, the stack will overflow. */ + struct bpf_upcall md = { + .type = OVS_UPCALL_ACTION, + .skb_len = skb->len, + .ifindex = skb->ifindex, + }; + + if (action->u.userspace.nlattr_len > sizeof(md.uactions)) { + printt("userspace action is too large\n"); + return TC_ACT_SHOT; + } + + memcpy(md.uactions, action->u.userspace.nlattr_data, sizeof(md.uactions)); + md.uactions_len = action->u.userspace.nlattr_len; + + struct ebpf_headers_t *hdrs = bpf_get_headers(); + if (!hdrs) { + printt("headers is NULL\n"); + return TC_ACT_SHOT; + } + + memcpy(&md.key.headers, hdrs, sizeof(*hdrs)); + + uint64_t flags = skb->len; + flags <<= 32; + flags |= BPF_F_CURRENT_CPU; + int err = skb_event_output(skb, &upcalls, flags, &md, sizeof md); + + if (err) { + printt("skb_event_output of userspace action: %d", err); + return TC_ACT_SHOT; + } + + return post_tail_action(skb, batch); +} + +/* + * This action implements BPF tunnel + * BPF program: tail-3 + */ +__section_tail(OVS_ACTION_ATTR_SET) +static int tail_action_tunnel_set(struct __sk_buff *skb) +{ + struct bpf_tunnel_key key; + int ret; + uint64_t flags; + + struct bpf_action *action; + struct bpf_action_batch *batch; + struct ovs_action_set_tunnel *tunnel; + int key_attr; + + action = pre_tail_action(skb, &batch); + if (!action) + return TC_ACT_SHOT; + + /* SET for tunnel */ + if (action->is_set_tunnel) { + tunnel = &action->u.tunnel; + + /* hard-coded now, should fetch it from 
action->u */ + __builtin_memset(&key, 0x0, sizeof(key)); + key.tunnel_id = tunnel->tunnel_id; + key.tunnel_tos = tunnel->tunnel_tos; + key.tunnel_ttl = tunnel->tunnel_ttl; + + printt("tunnel_id = %x\n", key.tunnel_id); + + /* TODO: handle BPF_F_DONT_FRAGMENT and BPF_F_SEQ_NUMBER */ + flags = BPF_F_ZERO_CSUM_TX; + if (!tunnel->use_ipv6) { + key.remote_ipv4 = tunnel->remote_ipv4; + flags &= ~BPF_F_TUNINFO_IPV6; + } else { + memcpy(&key.remote_ipv4, &tunnel->remote_ipv4, 16); + flags |= BPF_F_TUNINFO_IPV6; + } + + ret = bpf_skb_set_tunnel_key(skb, &key, sizeof(key), flags); + if (ret < 0) + printt("ERR setting tunnel key\n"); + + if (tunnel->gnvopt_valid) { + ret = bpf_skb_set_tunnel_opt(skb, &tunnel->gnvopt, + sizeof tunnel->gnvopt); + if (ret < 0) + printt("ERR setting tunnel opt\n"); + } + + return post_tail_action(skb, batch); + } + + /* SET for packet fields */ + key_attr = action->u.mset.key_type; + + switch (key_attr) { + case OVS_KEY_ATTR_ETHERNET: { + u8 *data = (u8 *)(long)skb->data; + u8 *data_end = (u8 *)(long)skb->data_end; + struct ethhdr *eth; + struct ovs_key_ethernet *ether; + int i; + + /* packet data */ + eth = (struct ethhdr *)data; + if (data + sizeof(*eth) > data_end) + return TC_ACT_SHOT; + + /* value from map */ + ether = &action->u.mset.key.ether; + for (i = 0; i < 6; i++) { + printt("mac dest[%d]: %x -> %x\n", + i, eth->h_dest[i], ether->eth_dst.ea[i]); + eth->h_dest[i] = ether->eth_dst.ea[i]; + } + for (i = 0; i < 6; i++) { + printt("mac src[%d]: %x -> %x\n", + i, eth->h_dest[i], ether->eth_dst.ea[i]); + eth->h_source[i] = ether->eth_src.ea[i]; + } + break; + } + case OVS_KEY_ATTR_UNSPEC: + case OVS_KEY_ATTR_TUNNEL: + default: + printt("ERR: Un-implemented key attr %d in set action\n", key_attr); + return TC_ACT_SHOT; + } + + return post_tail_action(skb, batch); +} + +/* + * This action implements VLAN push + * BPF program: tail-4 + */ +__section_tail(OVS_ACTION_ATTR_PUSH_VLAN) +static int tail_action_push_vlan(struct __sk_buff *skb) +{ + 
struct bpf_action *action;
+    struct bpf_action_batch *batch;
+
+    printt("push vlan\n");
+    action = pre_tail_action(skb, &batch);
+    if (!action)
+        return TC_ACT_SHOT;
+
+    printt("vlan push tci %d\n", bpf_ntohs(action->u.push_vlan.vlan_tci));
+    printt("vlan push tpid %d\n", bpf_ntohs(action->u.push_vlan.vlan_tpid));
+
+    vlan_push(skb, action->u.push_vlan.vlan_tpid,
+              bpf_ntohs(action->u.push_vlan.vlan_tci) & VLAN_VID_MASK);
+              //bpf_ntohs(action->u.push_vlan.vlan_tci) & (u16)~VLAN_TAG_PRESENT);
+
+    return post_tail_action(skb, batch);
+}
+
+/*
+ * This action implements VLAN pop
+ * BPF program: tail-5
+ */
+__section_tail(OVS_ACTION_ATTR_POP_VLAN)
+static int tail_action_pop_vlan(struct __sk_buff *skb)
+{
+    struct bpf_action *action;
+    struct bpf_action_batch *batch;
+
+    action = pre_tail_action(skb, &batch);
+    if (!action)
+        return TC_ACT_SHOT;
+
+    printt("vlan pop\n");
+    bpf_skb_vlan_pop(skb);
+
+    /* FIXME: invalidate_flow_key()? */
+    return post_tail_action(skb, batch);
+}
+
+/*
+ * This action implements sample
+ * BPF program: tail-6
+ */
+__section_tail(OVS_ACTION_ATTR_SAMPLE)
+static int tail_action_sample(struct __sk_buff *skb OVS_UNUSED)
+{
+    printt("ERR: Sample action not implemented,\
+            do you want to do it? \n");
+
+    return TC_ACT_SHOT;
+}
+
+/*
+ * This action implements recirculation
+ * BPF program: tail-7
+ */
+__section_tail(OVS_ACTION_ATTR_RECIRC)
+static int tail_action_recirc(struct __sk_buff *skb)
+{
+    u32 recirc_id = 0;
+    struct bpf_action *action;
+    struct bpf_action_batch *batch;
+    struct ebpf_metadata_t *ebpf_md;
+
+    action = pre_tail_action(skb, &batch);
+    if (!action)
+        return TC_ACT_SHOT;
+
+    /* recirc should be the last action.
+     * level does not handle */
+
+    /* don't check the is_flow_key_valid(),
+     * now always re-parsing the header.
+ */ + recirc_id = action->u.recirc_id; + printt("recirc id = %d\n", recirc_id); + + /* update metadata */ + ebpf_md = bpf_get_mds(); + if (!ebpf_md) { + printt("lookup metadata failed\n"); + return TC_ACT_SHOT; + } + ebpf_md->md.recirc_id = recirc_id; + + skb->cb[OVS_CB_ACT_IDX] = 0; + skb->cb[OVS_CB_DOWNCALL_EXE] = 0; + + /* FIXME: recirc should not call this. */ + bpf_tail_call(skb, &tailcalls, MATCH_ACTION_CALL); + return TC_ACT_SHOT; +} + +/* + * This action implement hash + * BPF program: tail-8 + */ +__section_tail(OVS_ACTION_ATTR_HASH) +static int tail_action_hash(struct __sk_buff *skb) +{ + u32 hash = 0; + int index = 0; + struct ebpf_metadata_t *ebpf_md; + struct bpf_action *action; + struct bpf_action_batch *batch; + + action = pre_tail_action(skb, &batch); + if (!action) + return TC_ACT_SHOT; + + printt("skb->hash before = %x\n", skb->hash); + hash = bpf_get_hash_recalc(skb); + printt("skb->hash = %x hash \n", skb->hash); + if (!hash) + hash = 0x1; + + ebpf_md = bpf_map_lookup_elem(&percpu_metadata, &index); + if (!ebpf_md) { + printt("LOOKUP metadata failed\n"); + return TC_ACT_SHOT; + } + printt("save hash to ebpf_md->md.dp_hash\n"); + ebpf_md->md.dp_hash = hash; /* or create a ovs_flow_hash?*/ + + return post_tail_action(skb, batch); +} + +/* + * This action implements MPLS push + * BPF program: tail-9 + */ +__section_tail(OVS_ACTION_ATTR_PUSH_MPLS) +static int tail_action_mpls_push(struct __sk_buff *skb OVS_UNUSED) +{ + printt("ERR: Push MPLS action not implemented,\ + do you want to do it? \n"); + + return TC_ACT_SHOT; +} + +/* + * This action implements MPLS pop + * BPF program: tail-10 + */ +__section_tail(OVS_ACTION_ATTR_POP_MPLS) +static int tail_action_mpls_pop(struct __sk_buff *skb OVS_UNUSED) +{ + printt("ERR: Pop MPLS action not implemented,\ + do you want to do it? \n"); + + return TC_ACT_SHOT; +} + +/* + * This action implements set packet's fields, mask not supported. + * Many other fields not implemented yet. 
+ * BPF program: tail-11 + * TODO: hit verifier limit here, maybe create more programs and + * more tail calls. + */ +__section_tail(OVS_ACTION_ATTR_SET_MASKED) +static int tail_action_set_masked(struct __sk_buff *skb) +{ + struct bpf_action *action; + struct bpf_action_batch *batch; + int key_attr; + + action = pre_tail_action(skb, &batch); + if (!action) + return TC_ACT_SHOT; + + key_attr = action->u.mset.key_type; + + switch (key_attr) { + case OVS_KEY_ATTR_ETHERNET: { + u8 *data = (u8 *)(long)skb->data; + u8 *data_end = (u8 *)(long)skb->data_end; + struct ethhdr *eth; + struct ovs_key_ethernet *ether; + int i; + + /* packet data */ + eth = (struct ethhdr *)data; + if (data + sizeof(*eth) > data_end) + return TC_ACT_SHOT; + + /* value from map */ + ether = &action->u.mset.key.ether; + for (i = 0; i < 6; i++) { + printt("mac dest[%d]: %x -> %x\n", + i, eth->h_dest[i], ether->eth_dst.ea[i]); + eth->h_dest[i] = ether->eth_dst.ea[i]; + } + for (i = 0; i < 6; i++) { + printt("mac src[%d]: %x -> %x\n", + i, eth->h_source[i], ether->eth_src.ea[i]); + eth->h_source[i] = ether->eth_src.ea[i]; + } + break; + } + case OVS_KEY_ATTR_IPV4: { + u8 *data = (u8 *)(long)skb->data; + u8 *data_end = (u8 *)(long)skb->data_end; + struct iphdr *nh; + struct ovs_key_ipv4 *ipv4; + + /* packet data */ + nh = ALIGNED_CAST(struct iphdr *, data + sizeof(struct ethhdr)); + if ((u8 *)nh + sizeof(struct iphdr) + 12 > data_end) { + return TC_ACT_SHOT; + } + + /* value from map */ + ipv4 = &action->u.mset.key.ipv4; + /* set ipv4_proto is not supported, see + * datapath/actions.c + */ + set_ip_tos(skb, ipv4->ipv4_tos); + set_ip_ttl(skb, ipv4->ipv4_ttl); + +#if ENABLE_POINTER_LOOKUP + set_ip_src(skb, ipv4->ipv4_src); + set_ip_dst(skb, ipv4->ipv4_dst); +#endif + + printt("set_masked ipv4 done\n"); + /* XXX ignore frag */ + + break; + } + case OVS_KEY_ATTR_UNSPEC: + case OVS_KEY_ATTR_ENCAP: + case OVS_KEY_ATTR_PRIORITY: /* u32 skb->priority */ + case OVS_KEY_ATTR_IN_PORT: /* u32 OVS dp port number */ + 
case OVS_KEY_ATTR_VLAN: /* be16 VLAN TCI */ + case OVS_KEY_ATTR_ETHERTYPE: /* be16 Ethernet type */ + case OVS_KEY_ATTR_IPV6: /* struct ovs_key_ipv6 */ + case OVS_KEY_ATTR_TCP: /* struct ovs_key_tcp */ + case OVS_KEY_ATTR_UDP: /* struct ovs_key_udp */ + case OVS_KEY_ATTR_ICMP: /* struct ovs_key_icmp */ + case OVS_KEY_ATTR_ICMPV6: /* struct ovs_key_icmpv6 */ + case OVS_KEY_ATTR_ARP: /* struct ovs_key_arp */ + case OVS_KEY_ATTR_ND: /* struct ovs_key_nd */ + case OVS_KEY_ATTR_SKB_MARK: /* u32 skb mark */ + case OVS_KEY_ATTR_TUNNEL: /* Nested set of ovs_tunnel attributes */ + case OVS_KEY_ATTR_SCTP: /* struct ovs_key_sctp */ + case OVS_KEY_ATTR_TCP_FLAGS: /* be16 TCP flags. */ + case OVS_KEY_ATTR_DP_HASH: /* u32 hash value. Value 0 indicates the hash */ + case OVS_KEY_ATTR_RECIRC_ID: /* u32 recirc id */ + case OVS_KEY_ATTR_MPLS: /* array of struct ovs_key_mpls. */ + case OVS_KEY_ATTR_CT_STATE: /* u32 bitmask of OVS_CS_F_* */ + case OVS_KEY_ATTR_CT_ZONE: /* u16 connection tracking zone. */ + case OVS_KEY_ATTR_CT_MARK: /* u32 connection tracking mark */ + case OVS_KEY_ATTR_CT_LABELS: /* 16-octet connection tracking labels */ + case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4: /* struct ovs_key_ct_tuple_ipv4 */ + case OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6: /* struct ovs_key_ct_tuple_ipv6 */ + case OVS_KEY_ATTR_NSH: /* Nested set of ovs_nsh_key_* */ +#ifdef __KERNEL__ + case OVS_KEY_ATTR_TUNNEL_INFO: /* struct ovs_tunnel_info */ +#endif +#ifndef __KERNEL__ + case OVS_KEY_ATTR_PACKET_TYPE: /* be32 packet type */ +#endif + case __OVS_KEY_ATTR_MAX: + default: + printt("ERR Un-implemented key attr %d in set_masked\n", key_attr); + return TC_ACT_SHOT; + } + + return post_tail_action(skb, batch); +} + +/* + * This action implements connection tracking + * BPF program: tail-12 + */ +__section_tail(OVS_ACTION_ATTR_CT) +static int tail_action_ct(struct __sk_buff *skb OVS_UNUSED) +{ + printt("ERR: CT (connection tracking) not implemented,\ + do you want to do it? 
\n"); + return TC_ACT_SHOT; +} + +/* + * This action implements packet truncate + * BPF program: tail-13 + */ +__section_tail(OVS_ACTION_ATTR_TRUNC) +static int tail_action_trunc(struct __sk_buff *skb) +{ + struct bpf_action *action; + struct bpf_action_batch *batch; + + action = pre_tail_action(skb, &batch); + if (!action) + return TC_ACT_SHOT; + + printt("len before: %d\n", skb->len); + printt("truncate to %d\n", action->u.trunc.max_len); + + /* The helper will resize the skb to the given new size */ + bpf_skb_change_tail(skb, action->u.trunc.max_len, 0); + + printt("len after: %d\n", skb->len); + return post_tail_action(skb, batch); +} diff --git a/bpf/api.h b/bpf/api.h new file mode 100644 index 000000000000..f2db1f729157 --- /dev/null +++ b/bpf/api.h @@ -0,0 +1,279 @@ +#ifndef __BPF_API__ +#define __BPF_API__ + +/* Note: + * + * This file can be included into eBPF kernel programs. It contains + * a couple of useful helper functions, map/section ABI (bpf_elf.h), + * misc macros and some eBPF specific LLVM built-ins. + */ + +#include +#include + +#define UNSPEC_CALL 0 +#define OUTPUT_CALL 1 +#define PARSER_CALL 32 +#define MATCH_ACTION_CALL 33 +#define DEPARSER_CALL 34 +#define UPCALL_CALL 35 + +#ifndef TC_ACT_OK +#define TC_ACT_OK 0 +#define TC_ACT_RECLASSIFY 1 +#define TC_ACT_SHOT 2 +#define TC_ACT_PIPE 3 +#define TC_ACT_STOLEN 4 +#define TC_ACT_QUEUED 5 +#define TC_ACT_REPEAT 6 +#define TC_ACT_REDIRECT 7 +#endif + +/** Misc macros. 
*/ + +#ifndef __stringify +# define __stringify(X) #X +#endif + +#ifndef __maybe_unused +# define __maybe_unused __attribute__((__unused__)) +#endif + +#ifndef htons +# define htons(X) __constant_htons((X)) +#endif + +#ifndef ntohs +# define ntohs(X) __constant_ntohs((X)) +#endif + +#ifndef htonl +# define htonl(X) __constant_htonl((X)) +#endif + +#ifndef ntohl +# define ntohl(X) __constant_ntohl((X)) +#endif + +#ifndef __inline__ +# define __inline__ __attribute__((always_inline)) +#endif + +#ifndef __section +# define __section(NAME) \ + __attribute__((section(NAME), used)) +#endif + +#ifndef __section_tail +# define __section_tail(KEY) \ + __section("tail-" __stringify(KEY)) +#endif + +#ifndef __section_license +# define __section_license \ + __section(ELF_SECTION_LICENSE) +#endif + +#ifndef __section_maps +# define __section_maps \ + __section(ELF_SECTION_MAPS) +#endif + +#ifndef BPF_LICENSE +# define BPF_LICENSE(NAME) \ + char ____license[] __section_license = NAME +#endif + +#ifndef __BPF_MAP +# define __BPF_MAP(NAME, TYPE, ID, SIZE_KEY, SIZE_VALUE, PIN, MAX_ELEM) \ + struct bpf_map_def __section_maps NAME = { \ + .type = (TYPE), \ + .key_size = (SIZE_KEY), \ + .value_size = (SIZE_VALUE), \ + .max_entries = (MAX_ELEM), \ + .map_flags = 0, \ + } +#endif + +#ifndef BPF_HASH +# define BPF_HASH(NAME, ID, SIZE_KEY, SIZE_VALUE, PIN, MAX_ELEM) \ + __BPF_MAP(NAME, BPF_MAP_TYPE_HASH, ID, SIZE_KEY, SIZE_VALUE, \ + PIN, MAX_ELEM) +#endif + +#ifndef BPF_PERCPU_HASH +# define BPF_PERCPU_HASH(NAME, ID, SIZE_KEY, SIZE_VALUE, PIN, MAX_ELEM) \ + __BPF_MAP(NAME, BPF_MAP_TYPE_PERCPU_HASH, ID, SIZE_KEY, SIZE_VALUE, \ + PIN, MAX_ELEM) +#endif + +#ifndef BPF_ARRAY +# define BPF_ARRAY(NAME, ID, SIZE_VALUE, PIN, MAX_ELEM) \ + __BPF_MAP(NAME, BPF_MAP_TYPE_ARRAY, ID, sizeof(uint32_t), \ + SIZE_VALUE, PIN, MAX_ELEM) +#endif + +#ifndef BPF_PERCPU_ARRAY +# define BPF_PERCPU_ARRAY(NAME, ID, SIZE_VALUE, PIN, MAX_ELEM) \ + __BPF_MAP(NAME, BPF_MAP_TYPE_PERCPU_ARRAY, ID, sizeof(uint32_t), \ + 
SIZE_VALUE, PIN, MAX_ELEM) +#endif + +#ifndef BPF_PROG_ARRAY +# define BPF_PROG_ARRAY(NAME, ID, PIN, MAX_ELEM) \ + __BPF_MAP(NAME, BPF_MAP_TYPE_PROG_ARRAY, ID, sizeof(uint32_t), \ + sizeof(uint32_t), PIN, MAX_ELEM) +#endif + +#ifndef BPF_PERF_OUTPUT +# define BPF_PERF_OUTPUT(name, pin) \ + __BPF_MAP(name, BPF_MAP_TYPE_PERF_EVENT_ARRAY, 0, sizeof(uint32_t), \ + sizeof(uint32_t), pin, __NR_CPUS__) +#endif + +/** Classifier helper */ + +#ifndef BPF_H_DEFAULT +# define BPF_H_DEFAULT -1 +#endif + +/** BPF helper functions for tc. Individual flags are in linux/bpf.h */ + +#ifndef BPF_FUNC +# define BPF_FUNC(NAME, ...) \ + (* NAME)(__VA_ARGS__) __maybe_unused = (void *) BPF_FUNC_##NAME +#endif + +#ifndef BPF_FUNC2 +# define BPF_FUNC2(NAME, ...) \ + (* NAME)(__VA_ARGS__) __maybe_unused +#endif + +/* Map access/manipulation */ +static void *BPF_FUNC(map_lookup_elem, void *map, const void *key); +static int BPF_FUNC(map_update_elem, void *map, const void *key, + const void *value, uint32_t flags); +static int BPF_FUNC(map_delete_elem, void *map, const void *key); + +/* Time access */ +static uint64_t BPF_FUNC(ktime_get_ns, void); + +/* Debugging */ + +/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless + * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved. + * It would require ____fmt to be made const, which generates a reloc + * entry (non-map). + */ +static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...); + +#ifndef printt +# ifdef DEBUG_BPF_OFF +# define printt(fmt, ...) +# else +# define printt(fmt, ...) 
\ + ({ \ + char ____fmt[] = fmt; \ + trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__); \ + }) +# endif +#endif + +/* Random numbers */ +static uint32_t BPF_FUNC(get_prandom_u32, void); + +/* Tail calls */ +static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map, + uint32_t index); + +/* System helpers */ +static uint32_t BPF_FUNC(get_smp_processor_id, void); + +/* Packet misc meta data */ +static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb); + +static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index); + +/* Packet redirection */ +static int BPF_FUNC(redirect, int ifindex, uint32_t flags); +static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex, + uint32_t flags); + +/* Packet manipulation */ +static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off, + void *to, uint32_t len); +static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off, + const void *from, uint32_t len, uint32_t flags); + +static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off, + uint32_t from, uint32_t to, uint32_t flags); +static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off, + uint32_t from, uint32_t to, uint32_t flags); +static int BPF_FUNC(csum_diff, void *from, uint32_t from_size, void *to, + uint32_t to_size, uint32_t seed); + +static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type); +static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto, + uint32_t flags); +static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen, + uint32_t flags); + +/* Packet vlan encap/decap */ +static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto, + uint16_t vlan_tci); +static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb); + +/* Packet tunnel encap/decap */ +static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb, + struct bpf_tunnel_key *to, uint32_t size, uint32_t flags); +static int BPF_FUNC(skb_set_tunnel_key, 
struct __sk_buff *skb, + const struct bpf_tunnel_key *from, uint32_t size, + uint32_t flags); + +static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb, + void *to, uint32_t size); +static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb, + const void *from, uint32_t size); + +/* Events for user space */ +static int BPF_FUNC2(skb_event_output, struct __sk_buff *skb, void *map, uint64_t index, + const void *data, uint32_t size) = (void *)BPF_FUNC_perf_event_output; + +/** LLVM built-ins, mem*() routines work for constant size */ + +#ifndef lock_xadd +# define lock_xadd(ptr, val) ((void) __sync_fetch_and_add(ptr, val)) +#endif + +#ifndef memset +# define memset(s, c, n) __builtin_memset((s), (c), (n)) +#endif + +#ifndef memcpy +# define memcpy(d, s, n) __builtin_memcpy((d), (s), (n)) +#endif + +#ifndef memmove +# define memmove(d, s, n) __builtin_memmove((d), (s), (n)) +#endif + +/* FIXME: __builtin_memcmp() is not yet fully useable unless llvm bug + * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also + * this one would generate a reloc entry (non-map), otherwise. 
+ */ +#if 0 +#ifndef memcmp +# define memcmp(a, b, n) __builtin_memcmp((a), (b), (n)) +#endif +#endif + +unsigned long long load_byte(void *skb, unsigned long long off) + asm ("llvm.bpf.load.byte"); + +unsigned long long load_half(void *skb, unsigned long long off) + asm ("llvm.bpf.load.half"); + +unsigned long long load_word(void *skb, unsigned long long off) + asm ("llvm.bpf.load.word"); + +#endif /* __BPF_API__ */ diff --git a/bpf/automake.mk b/bpf/automake.mk new file mode 100644 index 000000000000..3028c585b6cc --- /dev/null +++ b/bpf/automake.mk @@ -0,0 +1,60 @@ +bpf_sources = bpf/datapath.c +bpf_headers = \ + bpf/api.h \ + bpf/datapath.h \ + bpf/odp-bpf.h \ + bpf/ovs-p4.h \ + bpf/helpers.h \ + bpf/openvswitch.h \ + bpf/maps.h \ + bpf/parser.h \ + bpf/lookup.h \ + bpf/action.h \ + bpf/generated_headers.h \ + bpf/xdp.h +bpf_extra = \ + bpf/ovs-proto.p4 + +# Regardless of configuration with GCC, we must compile the BPF with clang +# since GCC doesn't have a BPF backend. Clang doesn't support these flags, +# so we filter them out. 
+ +bpf_FILTER_FLAGS := $(filter-out -Wbool-compare, $(AM_CFLAGS)) +bpf_FILTER_FLAGS2 := $(filter-out -Wduplicated-cond, $(bpf_FILTER_FLAGS)) +bpf_FILTER_FLAGS3 := $(filter-out --coverage, $(bpf_FILTER_FLAGS2)) +bpf_CFLAGS := $(bpf_FILTER_FLAGS3) +bpf_CFLAGS += -D__NR_CPUS__=$(shell nproc) -O2 -Wall -Werror -emit-llvm +bpf_CFLAGS += -I$(top_builddir)/include -I$(top_srcdir)/include +bpf_CFLAGS += -Wno-error=pointer-arith # Allow skb->data arithmetic +bpf_CFLAGS += -I${IPROUTE2_SRC_PATH}/include/uapi/ +# FIXME: +#bpf_CFLAGS += -D__KERNEL__ + +dist_sources = $(bpf_sources) +dist_headers = $(bpf_headers) +build_sources = $(dist_sources) +build_headers = $(dist_headers) +build_objects = $(patsubst %.c,%.o,$(build_sources)) + +LLC ?= llc-3.8 +CLANG ?= clang-3.8 + +bpf: $(build_objects) +bpf/datapath.o: $(bpf_sources) $(bpf_headers) + $(MKDIR_P) $(dir $@) + @which $(CLANG) >/dev/null 2>&1 || \ + (echo "Unable to find clang, Install clang (>=3.7) package"; exit 1) + $(AM_V_CC) $(CLANG) $(bpf_CFLAGS) -c $< -o - | \ + $(LLC) -march=bpf -filetype=obj -o $@ + +bpf/datapath_dbg.o: $(bpf_sources) $(bpf_headers) + @which clang-4.0 > /dev/null 2>&1 || \ + (echo "Unable to find clang-4.0 for debugging"; exit 1) + clang-4.0 $(bpf_CFLAGS) -g -c $< -o -| llc-4.0 -march=bpf -filetype=obj -o $@_dbg + llvm-objdump-4.0 -S -no-show-raw-insn $@_dbg > $@_dbg.objdump + +EXTRA_DIST += $(dist_sources) $(dist_headers) $(bpf_extra) +if HAVE_BPF +dist_bpf_DATA += $(build_objects) +endif + diff --git a/bpf/datapath.c b/bpf/datapath.c new file mode 100644 index 000000000000..644a1c1ac46d --- /dev/null +++ b/bpf/datapath.c @@ -0,0 +1,192 @@ +/* + * Copyright (c) 2016, 2017, 2018 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#include +#include +#include + +#include "api.h" +#include "odp-bpf.h" +#include "datapath.h" + +/* + * Instead of having multiple BPF object files, + * include all headers and generate single datapath.o + */ +#include "maps.h" +#include "parser.h" +#include "lookup.h" +#include "action.h" +#include "xdp.h" + +/* We don't rely on specific versions of the kernel; however libbpf requires + * this to be both specified and non-zero. */ +static const __maybe_unused __section("version") uint32_t version = 0x1; + +static inline void __maybe_unused +bpf_debug(struct __sk_buff *skb, enum ovs_dbg_subtype subtype, int error) +{ + uint64_t cpu = get_smp_processor_id(); + uint64_t flags = skb->len; + struct bpf_upcall md = { + .type = OVS_UPCALL_DEBUG, + .subtype = subtype, + .ifindex = skb->ingress_ifindex, + .cpu = cpu, + .skb_len = skb->len, + .error = error + }; + + flags <<= 32; + flags |= BPF_F_CURRENT_CPU; + + skb_event_output(skb, &upcalls, flags, &md, sizeof(md)); +} + +/* + * This program forwards the packet to userspace, using the + * perf_event_output helper function. 
+ * BPF program: tail-35 + */ +__section_tail(UPCALL_CALL) +static inline int process_upcall(struct __sk_buff *skb) +{ + struct bpf_upcall md = { + .type = OVS_UPCALL_MISS, + .skb_len = skb->len, + //.ifindex = ovs_cb_get_ifindex(skb), + }; + int stat, err; + struct ebpf_headers_t *hdrs = bpf_get_headers(); + struct ebpf_metadata_t *mds = bpf_get_mds(); + + if (!hdrs || !mds) { + printt("headers/mds is NULL\n"); + return TC_ACT_OK; + } + + md.ifindex = mds->md.in_port; + + memcpy(&md.key.headers, hdrs, sizeof(struct ebpf_headers_t)); + memcpy(&md.key.mds, mds, sizeof(struct ebpf_metadata_t)); + + if (hdrs->valid & VLAN_VALID) { + printt("upcall skb->len(%d) with vlan %x %x\n", + skb->len, hdrs->vlan.etherType, hdrs->vlan.tci); + + /* Here we push the vlan to the packet data so + * the upcall function 'extract_key' can get vlan info. + * Is this the same as kernel dp? + */ + skb_vlan_push(skb, hdrs->vlan.etherType, + hdrs->vlan.tci & ~VLAN_TAG_PRESENT); + md.skb_len = skb->len; + } + + uint64_t flags = skb->len; + flags <<= 32; + flags |= BPF_F_CURRENT_CPU; + + err = skb_event_output(skb, &upcalls, flags, &md, sizeof(md)); + stat = !err ? OVS_DP_STATS_MISSED + : err == -ENOSPC ? 
OVS_DP_STATS_LOST + : OVS_DP_STATS_ERRORS; + stats_account(stat); + return TC_ACT_OK; +} + +/* + * This is the ENTRY POINT for packet seen at ingress queue + */ +__section("ingress") +static int to_stack(struct __sk_buff *skb) +{ + printt("\n\ningress from %d (%d)\n", skb->ingress_ifindex, skb->ifindex); + + ovs_cb_init(skb, true); + bpf_tail_call(skb, &tailcalls, PARSER_CALL); + + printt("ERR: tail call fail in ingress\n"); + return TC_ACT_SHOT; +} + +/* + * This is the ENTRY POINT for packet seen at egress queue + */ +__section("egress") +static int from_stack(struct __sk_buff *skb) +{ + printt("\n\negress from %d (%d)\n", skb->ingress_ifindex, skb->ifindex); + + ovs_cb_init(skb, false); + bpf_tail_call(skb, &tailcalls, PARSER_CALL); + + printt("ERR: tail call fail in egress\n"); + return TC_ACT_SHOT; +} + +/* + * This is the ENTRY POINT for downcall packet + */ +__section("downcall") +static int execute(struct __sk_buff *skb) +{ + struct bpf_downcall md; + u32 ebpf_zero = 0; + int flags, ofs; + + ofs = skb->len - sizeof(md); + skb_load_bytes(skb, ofs, &md, sizeof(md)); + flags = md.flags & OVS_BPF_FLAGS_TX_STACK ? BPF_F_INGRESS : 0; + + printt("downcall (%d) from %d flags %d\n", md.type, + md.ifindex, flags); + + bpf_map_update_elem(&percpu_metadata, &ebpf_zero, &md.md, BPF_ANY); + + skb_change_tail(skb, ofs, 0); + + switch (md.type) { + case OVS_BPF_DOWNCALL_EXECUTE: { + struct bpf_action_batch *action_batch; + + action_batch = bpf_map_lookup_elem(&execute_actions, &ebpf_zero); + if (action_batch) { + printt("get valid action_batch\n"); + skb->cb[OVS_CB_DOWNCALL_EXE] = 1; + bpf_tail_call(skb, &tailcalls, action_batch->actions[0].type); + } else { + printt("get null action_batch\n"); + } + break; + } + case OVS_BPF_DOWNCALL_OUTPUT: { + /* Skip writing the BPF metadata in parser */ + skb->cb[OVS_CB_ACT_IDX] = -1; + /* Redirect to the device this packet came from, so it's as though the + * packet was freshly received. This should execute PARSER_CALL. 
*/ + return redirect(md.ifindex, flags); + } + default: + printt("Unknown downcall type %d\n", md.type); + break; + } + return 0; +} + +BPF_LICENSE("GPL"); diff --git a/bpf/datapath.h b/bpf/datapath.h new file mode 100644 index 000000000000..d9f48461cc79 --- /dev/null +++ b/bpf/datapath.h @@ -0,0 +1,71 @@ +/* + * Copyright (c) 2017, 2018 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#include "odp-bpf.h" + +#define SKB_CB_U32S 5 /* According to linux/bpf.h. */ + +enum ovs_cb_idx { + OVS_CB_ACT_IDX, /* Next action to process in action batch. */ + OVS_CB_INGRESS, /* 0 = egress; nonzero = ingress. */ + OVS_CB_DOWNCALL_EXE, /* 0 = match/execute, 1 = downcall/execute. 
+ */ +}; + +static void +ovs_cb_init(struct __sk_buff *skb, bool ingress) +{ + for (int i = 0; i < SKB_CB_U32S; i++) + skb->cb[i] = 0; + skb->cb[OVS_CB_INGRESS] = ingress; +} + +static bool +ovs_cb_is_initial_parse(struct __sk_buff *skb) { + int index = skb->cb[OVS_CB_ACT_IDX]; + + if (index != 0) { + printt("recirc, don't update metadata, index %d\n", index); + } + return index == 0; +} + +static uint32_t +ovs_cb_get_action_index(struct __sk_buff *skb) +{ + return skb->cb[OVS_CB_ACT_IDX]; +} + +static uint32_t OVS_UNUSED +ovs_cb_get_ifindex(struct __sk_buff *skb) +{ + uint32_t ifindex; + + if (!skb) + return 0; + + /* This works around a compiler optimization issue */ + if (skb->cb[OVS_CB_INGRESS]) { + __asm__ __volatile__("": : :"memory"); + return skb->ingress_ifindex; + } + + ifindex = skb->ifindex; + __asm__ __volatile__("": : :"memory"); + + return ifindex; +} diff --git a/bpf/generated_headers.h b/bpf/generated_headers.h new file mode 100644 index 000000000000..3571d744fede --- /dev/null +++ b/bpf/generated_headers.h @@ -0,0 +1,182 @@ +#ifndef P4_GENERATED_HEADERS +#define P4_GENERATED_HEADERS + +/* We sometimes disable IPV6 to work + * around the 512-byte BPF stack limit + */ +#define BPF_ENABLE_IPV6 + +#ifndef BPF_TYPES +#define BPF_TYPES +typedef signed char s8; +typedef unsigned char u8; +typedef signed short s16; +typedef unsigned short u16; +typedef signed int s32; +typedef unsigned int u32; +typedef signed long long s64; +typedef unsigned long long u64; +#endif + +/* TODO: OVS only needs addr and label */ +struct ipv6_t { + u8 version; /* 4 bits */ + u8 trafficClass; /* 8 bits */ + u32 flowLabel; /* 20 bits */ + u16 payloadLen; /* 16 bits */ + u8 nextHdr; /* 8 bits */ + u8 hopLimit; /* 8 bits */ + char srcAddr[16]; /* 128 bits */ + char dstAddr[16]; /* 128 bits */ +}; +struct pkt_metadata_t { + u32 recirc_id; /* 32 bits */ + u32 dp_hash; /* 32 bits */ + u32 skb_priority; /* 32 bits */ + u32 pkt_mark; /* 32 bits */ + u16 ct_state; /* 16 bits */ + u16 ct_zone; 
/* 16 bits */ + u32 ct_mark; /* 32 bits */ + char ct_label[16]; /* 128 bits */ + u32 in_port; /* 32 bits ifindex */ +}; +struct udp_t { + u16 srcPort; /* 16 bits */ + u16 dstPort; /* 16 bits */ +}; +struct arp_rarp_t { + ovs_be16 ar_hrd; /* format of hardware address */ + ovs_be16 ar_pro; /* format of protocol address */ + unsigned char ar_hln; /* length of hardware address */ + unsigned char ar_pln; /* length of protocol address */ + ovs_be16 ar_op; /* ARP opcode (command) */ + + /* Ethernet+IPv4 specific members. */ + unsigned char ar_sha[6]; /* sender hardware address */ + unsigned char ar_sip[4]; /* sender IP address: be32 */ + unsigned char ar_tha[6]; /* target hardware address */ + unsigned char ar_tip[4]; /* target IP address: be32 */ +} __attribute__((packed)); +struct icmp_t { + u8 type; + u8 code; +}; +struct icmpv6_t { + u8 type; + u8 code; + u16 csum; + union { + uint32_t data32[1]; /* type-specific field */ + uint16_t data16[2]; /* type-specific field */ + uint8_t data8[4]; /* type-specific field */ + } dataun; +}; +struct ipv4_t { + u8 ttl; /* 8 bits */ + u8 protocol; /* 8 bits */ + u8 tos; /* 8 bits */ + ovs_be32 srcAddr; /* 32 bits */ + ovs_be32 dstAddr; /* 32 bits */ +}; +struct gnv_opt { + ovs_be16 opt_class; + uint8_t type; + uint8_t length:5; + uint8_t r3:1; + uint8_t r2:1; + uint8_t r1:1; + uint8_t opt_data[4]; /* hard-coded to 4 byte */ +}; +struct flow_tnl_t { + union { + struct { + u32 ip_dst; /* 32 bits */ // BPF uses host byte-order + u32 ip_src; /* 32 bits */ + } ip4; +#ifdef BPF_ENABLE_IPV6 + struct { + char ipv6_dst[16]; /* 128 bits */ + char ipv6_src[16]; /* 128 bits */ + } ip6; +#endif + }; + u32 tun_id; /* 32 bits */ + u16 flags; /* 16 bits */ + u8 ip_tos; /* 8 bits */ + u8 ip_ttl; /* 8 bits */ + ovs_be16 tp_src; /* 16 bits */ + ovs_be16 tp_dst; /* 16 bits */ + u16 gbp_id; /* 16 bits */ + u8 gbp_flags; /* 8 bits */ + u8 use_ipv6: 4, + gnvopt_valid: 4; + struct gnv_opt gnvopt; + char pad1[0]; /* 40 bits */ +}; + +/* ovs key only needs 
ports and flags */ +struct tcp_t { + ovs_be16 srcPort; /* 16 bits */ + ovs_be16 dstPort; /* 16 bits */ + ovs_be16 flags; /* 8 bits */ +}; + +struct ethernet_t { + char dstAddr[6]; /* 48 bits */ + char srcAddr[6]; /* 48 bits */ + ovs_be16 etherType; /* 16 bits */ +}; + +struct vlan_tag_t { + union { + u16 pcp:3, + cfi:1, + vid:12; + u16 tci; /* host byte order */ + }; + ovs_be16 etherType; /* network byte order */ +}; + +struct mpls_t { + ovs_be32 top_lse; /* top label stack entry */ +}; + +enum proto_valid { + ETHER_VALID = 1 << 0, + MPLS_VALID = 1 << 1, + IPV4_VALID = 1 << 2, + IPV6_VALID = 1 << 3, + ARP_VALID = 1 << 4, + TCP_VALID = 1 << 5, + UDP_VALID = 1 << 6, + ICMP_VALID = 1 << 7, + VLAN_VALID = 1 << 8, + CVLAN_VALID = 1 << 9, + ICMPV6_VALID = 1 << 10, +}; + +struct ebpf_headers_t { + u32 valid; + struct ethernet_t ethernet; + struct mpls_t mpls; + union { + struct ipv4_t ipv4; +#ifdef BPF_ENABLE_IPV6 + struct ipv6_t ipv6; +#endif + struct arp_rarp_t arp; + }; + union { + struct tcp_t tcp; + struct udp_t udp; + struct icmp_t icmp; + struct icmpv6_t icmpv6; + }; + struct vlan_tag_t vlan; + struct vlan_tag_t cvlan; +}; +struct ebpf_metadata_t { + struct pkt_metadata_t md; + struct flow_tnl_t tnl_md; +}; +#endif diff --git a/bpf/helpers.h b/bpf/helpers.h new file mode 100644 index 000000000000..910127f33749 --- /dev/null +++ b/bpf/helpers.h @@ -0,0 +1,248 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#ifndef __OVSBPF_HELPERS_H +#define __OVSBPF_HELPERS_H +#include +#include +#include + +/* Additional headers */ +# define printk(fmt, ...) \ +({ \ + char ____fmt[] = fmt; \ + bpf_trace_printk(____fmt, sizeof(____fmt), \ + ##__VA_ARGS__); \ +}) + +#define ERR_EXIT() \ + ({printk("[ERROR] \n"); return TC_ACT_OK;}) + +#define NOT_HERE() \ + ({printk("[ERROR] Program should not reach here\n");}) + +#ifndef BPF_TYPES +#define BPF_TYPES +typedef signed char s8; +typedef unsigned char u8; +typedef signed short s16; +typedef unsigned short u16; +typedef signed int s32; +typedef unsigned int u32; +typedef signed long long s64; +typedef unsigned long long u64; +#endif + +#define ___constant_swab16(x) ((__u16)( \ + (((__u16)(x) & (__u16)0x00ffU) << 8) | \ + (((__u16)(x) & (__u16)0xff00U) >> 8))) + +#define ___constant_swab32(x) ((__u32)( \ + (((__u32)(x) & (__u32)0x000000ffUL) << 24) | \ + (((__u32)(x) & (__u32)0x0000ff00UL) << 8) | \ + (((__u32)(x) & (__u32)0x00ff0000UL) >> 8) | \ + (((__u32)(x) & (__u32)0xff000000UL) >> 24))) + +#define ___constant_swab64(x) ((__u64)( \ + (((__u64)(x) & (__u64)0x00000000000000ffULL) << 56) | \ + (((__u64)(x) & (__u64)0x000000000000ff00ULL) << 40) | \ + (((__u64)(x) & (__u64)0x0000000000ff0000ULL) << 24) | \ + (((__u64)(x) & (__u64)0x00000000ff000000ULL) << 8) | \ + (((__u64)(x) & (__u64)0x000000ff00000000ULL) >> 8) | \ + (((__u64)(x) & (__u64)0x0000ff0000000000ULL) >> 24) | \ + (((__u64)(x) & (__u64)0x00ff000000000000ULL) >> 40) | \ + (((__u64)(x) & (__u64)0xff00000000000000ULL) >> 56))) + +#define __constant_htonl(x) (___constant_swab32((x))) +#define __constant_ntohl(x) (___constant_swab32(x)) +#define __constant_htons(x) (___constant_swab16((x))) +#define __constant_ntohs(x) ___constant_swab16((x)) + +static 
u16 OVS_UNUSED bpf_ntohs(ovs_be16 x) { + return __constant_ntohs((OVS_FORCE u16)x); +} + +static ovs_be16 bpf_htons(u16 x) { + return (OVS_FORCE ovs_be16)__constant_htons(x); +} + +static u32 OVS_UNUSED bpf_ntohl(ovs_be32 x) { + return __constant_ntohl((OVS_FORCE u32)x); +} + +static ovs_be32 bpf_htonl(u32 x) { + return (OVS_FORCE ovs_be32)__constant_htonl(x); +} + +static u64 OVS_UNUSED bpf_ntohll(ovs_be64 x) { + return ___constant_swab64((OVS_FORCE u64)x); +} + +static ovs_be64 bpf_htonll(u64 x) { + return (OVS_FORCE ovs_be64)___constant_swab64(x); +} + +/* helper macro to place programs, maps, license in + * different sections in elf_bpf file. Section names + * are interpreted by elf_bpf loader + */ +#define SEC(NAME) __attribute__((section(NAME), used)) + +/* helper functions called from eBPF programs written in C */ +static void *(*bpf_map_lookup_elem)(void *map, void *key) = + (void *) BPF_FUNC_map_lookup_elem; +static int (*bpf_map_update_elem)(void *map, void *key, void *value, + unsigned long long flags) = + (void *) BPF_FUNC_map_update_elem; +static int (*bpf_map_delete_elem)(void *map, void *key) = + (void *) BPF_FUNC_map_delete_elem; +static int (*bpf_probe_read)(void *dst, int size, void *unsafe_ptr) = + (void *) BPF_FUNC_probe_read; +static unsigned long long (*bpf_ktime_get_ns)(void) = + (void *) BPF_FUNC_ktime_get_ns; +static int (*bpf_trace_printk)(const char *fmt, int fmt_size, ...) 
= + (void *) BPF_FUNC_trace_printk; +static void (*bpf_tail_call)(void *ctx, void *map, int index) = + (void *) BPF_FUNC_tail_call; +static unsigned long long (*bpf_get_smp_processor_id)(void) = + (void *) BPF_FUNC_get_smp_processor_id; +static unsigned long long (*bpf_get_current_pid_tgid)(void) = + (void *) BPF_FUNC_get_current_pid_tgid; +static unsigned long long (*bpf_get_current_uid_gid)(void) = + (void *) BPF_FUNC_get_current_uid_gid; +static int (*bpf_get_current_comm)(void *buf, int buf_size) = + (void *) BPF_FUNC_get_current_comm; +static int (*bpf_perf_event_read)(void *map, int index) = + (void *) BPF_FUNC_perf_event_read; +static int (*bpf_clone_redirect)(void *ctx, int ifindex, int flags) = + (void *) BPF_FUNC_clone_redirect; +static int (*bpf_redirect)(int ifindex, int flags) = + (void *) BPF_FUNC_redirect; +static int (*bpf_perf_event_output)(void *ctx, void *map, + unsigned long long flags, void *data, + int size) = + (void *) BPF_FUNC_perf_event_output; +static int (*bpf_get_stackid)(void *ctx, void *map, int flags) = + (void *) BPF_FUNC_get_stackid; +static int (*bpf_probe_write_user)(void *dst, void *src, int size) = + (void *) BPF_FUNC_probe_write_user; +static int (*bpf_current_task_under_cgroup)(void *map, int index) = + (void *) BPF_FUNC_current_task_under_cgroup; +static int (*bpf_skb_get_tunnel_key)(void *ctx, void *key, int size, int flags) = + (void *) BPF_FUNC_skb_get_tunnel_key; +static int (*bpf_skb_set_tunnel_key)(void *ctx, void *key, int size, int flags) = + (void *) BPF_FUNC_skb_set_tunnel_key; +static int (*bpf_skb_get_tunnel_opt)(void *ctx, void *md, int size) = + (void *) BPF_FUNC_skb_get_tunnel_opt; +static int (*bpf_skb_set_tunnel_opt)(void *ctx, void *md, int size) = + (void *) BPF_FUNC_skb_set_tunnel_opt; +static unsigned long long (*bpf_get_prandom_u32)(void) = + (void *) BPF_FUNC_get_prandom_u32; +static int (*bpf_xdp_adjust_head)(void *ctx, int offset) = + (void *) BPF_FUNC_xdp_adjust_head; +static int 
(*bpf_skb_vlan_push)(void *ctx, int vlan_proto, int vlan_tci) = + (void *) BPF_FUNC_skb_vlan_push; +static int (*bpf_skb_vlan_pop)(void *ctx) = + (void *) BPF_FUNC_skb_vlan_pop; +static int (*bpf_skb_change_tail)(void *ctx, int len, int flags) = + (void *) BPF_FUNC_skb_change_tail; +static int (*bpf_get_hash_recalc)(void *ctx) = + (void *) BPF_FUNC_get_hash_recalc; + +static int OVS_UNUSED vlan_push(void *ctx, ovs_be16 proto, u16 tci) +{ + return bpf_skb_vlan_push(ctx, (OVS_FORCE int)proto, tci); +} + +/* llvm builtin functions that eBPF C program may use to + * emit BPF_LD_ABS and BPF_LD_IND instructions + */ +struct sk_buff; +unsigned long long load_byte(void *skb, + unsigned long long off) asm("llvm.bpf.load.byte"); +unsigned long long load_half(void *skb, + unsigned long long off) asm("llvm.bpf.load.half"); +unsigned long long load_word(void *skb, + unsigned long long off) asm("llvm.bpf.load.word"); + +/* a helper structure used by eBPF C program + * to describe map attributes to elf_bpf loader + */ +struct bpf_map_def { + unsigned int type; + unsigned int key_size; + unsigned int value_size; + unsigned int max_entries; + unsigned int map_flags; + unsigned int id; + unsigned int pinning; +}; + +/* used in TC */ +/* +struct bpf_elf_map { + __u32 type; + __u32 key_size; + __u32 value_size; + __u32 max_entries; + __u32 map_flags; + __u32 id; + __u32 pinning; +}; +*/ +static int (*bpf_skb_load_bytes)(void *ctx, int off, void *to, int len) = + (void *) BPF_FUNC_skb_load_bytes; +static int (*bpf_skb_store_bytes)(void *ctx, int off, void *from, int len, int flags) = + (void *) BPF_FUNC_skb_store_bytes; +static int (*bpf_l3_csum_replace)(void *ctx, int off, int from, int to, int flags) = + (void *) BPF_FUNC_l3_csum_replace; +static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flags) = + (void *) BPF_FUNC_l4_csum_replace; +static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) = + (void *) BPF_FUNC_skb_under_cgroup; +static int 
(*bpf_skb_change_head)(void *, int len, int flags) = + (void *) BPF_FUNC_skb_change_head; + +static int l3_csum_replace4(void *ctx, int off, ovs_be32 from, ovs_be32 to) +{ + return bpf_l3_csum_replace(ctx, off, (OVS_FORCE int)from, (OVS_FORCE int)to, 4); +} + +static int OVS_UNUSED l3_csum_replace2(void *ctx, int off, ovs_be16 from, ovs_be16 to) +{ + return bpf_l3_csum_replace(ctx, off, (OVS_FORCE int)from, (OVS_FORCE int)to, 2); +} + +#if defined(__x86_64__) +#define PT_REGS_PARM1(x) ((x)->di) +#define PT_REGS_PARM2(x) ((x)->si) +#define PT_REGS_PARM3(x) ((x)->dx) +#define PT_REGS_PARM4(x) ((x)->cx) +#define PT_REGS_PARM5(x) ((x)->r8) +#define PT_REGS_RET(x) ((x)->sp) +#define PT_REGS_FP(x) ((x)->bp) +#define PT_REGS_RC(x) ((x)->ax) +#define PT_REGS_SP(x) ((x)->sp) +#define PT_REGS_IP(x) ((x)->ip) +#endif +#define BPF_KPROBE_READ_RET_IP(ip, ctx) ({ \ + bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); }) +#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) ({ \ + bpf_probe_read(&(ip), sizeof(ip), \ + (void *)(PT_REGS_FP(ctx) + sizeof(ip))); }) +#endif diff --git a/bpf/lookup.h b/bpf/lookup.h new file mode 100644 index 000000000000..746730a8240d --- /dev/null +++ b/bpf/lookup.h @@ -0,0 +1,228 @@ +/* + * Copyright (c) 2016, 2017, 2018 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ +#include +#include "ovs-p4.h" +#include "api.h" +#include "helpers.h" +#include "maps.h" + +/* eBPF executes actions by tail call because eBPF doesn't support for loops + * and unrolling produces oversized code. + * + * Each action handler uses the current packet's key to look up the next + * action. However, the key can be changed by some actions, such as hash, so + * a stable key is kept in an eBPF map named percpu_executing_key. An action + * handler first fetches the stable key from percpu_executing_key, then uses + * it to look up the actions being executed. skb->cb[OVS_CB_ACT_IDX] points + * to the next action. + */ +static inline void ovs_execute_actions(struct __sk_buff *skb, + struct bpf_action *action) +{ + enum ovs_action_attr type; + type = action->type; + + printt("action type %d\n", type); + + /* note: this isn't a for loop, tail call won't return. 
*/ + switch (type) { + case OVS_ACTION_ATTR_UNSPEC: + printt("end of action processing\n"); + break; + case OVS_ACTION_ATTR_OUTPUT: + printt("output action port = %d\n", action->u.out.port); + break; + case OVS_ACTION_ATTR_USERSPACE: + printt("userspace action, len = %d, ifindex = %d upcall back\n", + action->u.userspace.nlattr_len, ovs_cb_get_ifindex(skb)); + break; + case OVS_ACTION_ATTR_SET: + printt("set action, is_set_tunnel = %d\n", + action->is_set_tunnel); + break; + case OVS_ACTION_ATTR_PUSH_VLAN: + printt("vlan push tci %d\n", action->u.push_vlan.vlan_tci); + break; + case OVS_ACTION_ATTR_POP_VLAN: + printt("vlan pop\n"); + break; + case OVS_ACTION_ATTR_RECIRC: + printt("recirc\n"); + break; + case OVS_ACTION_ATTR_HASH: + printt("hash\n"); + break; + case OVS_ACTION_ATTR_SET_MASKED: + printt("set masked\n"); + break; + case OVS_ACTION_ATTR_CT: + printt("ct\n"); + break; + case OVS_ACTION_ATTR_TRUNC: + printt("truncate\n"); + break; + case OVS_ACTION_ATTR_SAMPLE: /* Nested case OVS_SAMPLE_ATTR_*. */ + case OVS_ACTION_ATTR_PUSH_MPLS: /* struct ovs_action_push_mpls. */ + case OVS_ACTION_ATTR_POP_MPLS: /* __be16 ethertype. */ + case OVS_ACTION_ATTR_PUSH_ETH: /* struct ovs_action_push_eth. */ + case OVS_ACTION_ATTR_POP_ETH: /* No argument. */ + case OVS_ACTION_ATTR_CT_CLEAR: /* No argument. */ + case OVS_ACTION_ATTR_PUSH_NSH: /* Nested case OVS_NSH_KEY_ATTR_*. */ + case OVS_ACTION_ATTR_POP_NSH: /* No argument. */ +#ifndef __KERNEL__ + case OVS_ACTION_ATTR_TUNNEL_PUSH: /* struct ovs_action_push_tnl*/ + case OVS_ACTION_ATTR_TUNNEL_POP: /* u32 port number. */ + case OVS_ACTION_ATTR_CLONE: /* Nested case OVS_CLONE_ATTR_*. */ + case OVS_ACTION_ATTR_METER: /* u32 meter number. */ +#endif + case __OVS_ACTION_ATTR_MAX: +#ifdef __KERNEL__ + case OVS_ACTION_ATTR_SET_TO_MASKED: /* Kernel module internal masked + * set action converted from + * case OVS_ACTION_ATTR_SET. 
*/ +#endif + default: + printt("ERR: action type %d not supported\n", type); + break; + } + + bpf_tail_call(skb, &tailcalls, type); + + /* OVS_NOT_REACHED */ + return; +} + +static inline void +stats_account(enum ovs_bpf_dp_stats index) +{ + uint32_t stat = 1; + uint64_t *value; + + value = map_lookup_elem(&datapath_stats, &index); + if (value) { + __sync_fetch_and_add(value, stat); + } +} + +/* OVS revalidator thread reads each entry in the eBPF maps + * (flow_table and dp_flow_stats), reports them to OpenFlow + * table statistics, and decides whether to remove or keep + * the entry by comparing its timestamp. + */ +static inline void +flow_stats_account(struct ebpf_headers_t *headers, + struct ebpf_metadata_t *mds, + size_t bytes) +{ + struct bpf_flow_key flow_key; + struct bpf_flow_stats *flow_stats; + + flow_key.headers = *headers; + flow_key.mds = *mds; + + flow_stats = bpf_map_lookup_elem(&dp_flow_stats, &flow_key); + if (!flow_stats) { + struct bpf_flow_stats s = {0, 0, 0}; + int err; + + printt("flow not found in flow stats, first install\n"); + s.packet_count = 1; + s.byte_count = bytes; + s.used = bpf_ktime_get_ns() / (1000*1000); /* msec */ + err = bpf_map_update_elem(&dp_flow_stats, &flow_key, &s, BPF_ANY); + if (err) { + return; + } + } else { + flow_stats->packet_count += 1; + flow_stats->byte_count += bytes; + flow_stats->used = bpf_ktime_get_ns() / (1000*1000); /* msec */ + printt("current: packets %d count %d ts %d\n", + flow_stats->packet_count, flow_stats->byte_count, flow_stats->used); + } + + return; +} + +static inline struct bpf_action_batch * +ovs_lookup_flow(struct ebpf_headers_t *headers, + struct ebpf_metadata_t *mds) +{ + struct bpf_flow_key flow_key; + + flow_key.headers = *headers; + flow_key.mds = *mds; + + return bpf_map_lookup_elem(&flow_table, &flow_key); +} + +__section_tail(MATCH_ACTION_CALL) +static int lookup(struct __sk_buff* skb OVS_UNUSED) +{ + struct bpf_action_batch *action_batch; + struct ebpf_headers_t *headers; + struct ebpf_metadata_t *mds; 
+ + headers = bpf_get_headers(); + if (!headers) { + printt("no packet header found\n"); + ERR_EXIT(); + } + + mds = bpf_get_mds(); + if (!mds) { + printt("no packet metadata found\n"); + ERR_EXIT(); + } + + /* LOOKUP */ + action_batch = ovs_lookup_flow(headers, mds); + if (!action_batch) { + printt("no action found, upcall to userspace\n"); + bpf_tail_call(skb, &tailcalls, UPCALL_CALL); + + /* OVS_NOT_REACHED */ + return TC_ACT_OK; + } else { + printt("action found! stay in BPF\n"); + /* DP Stats Update */ + stats_account(OVS_DP_STATS_HIT); + /* Flow Stats Update */ + flow_stats_account(headers, mds, skb->len); + } + + /* Hit verifier limit when moving declaration up. */ + struct bpf_flow_key flow_key; + flow_key.headers = *headers; + flow_key.mds = *mds; + int index = 0; + int error = bpf_map_update_elem(&percpu_executing_key, &index, + &flow_key, BPF_ANY); + if (error) { + printt("update percpu_executing_key failed: %d\n", error); + return TC_ACT_OK; + } + + /* the subsequent actions will be tail called. */ + ovs_execute_actions(skb, &action_batch->actions[0]); + + printt("ERROR: tail call fails\n"); + + /* OVS_NOT_REACHED */ + return TC_ACT_OK; +} diff --git a/bpf/maps.h b/bpf/maps.h new file mode 100644 index 000000000000..aa1c15864975 --- /dev/null +++ b/bpf/maps.h @@ -0,0 +1,170 @@ +/* + * Copyright (c) 2016, 2017, 2018 Nicira, Inc. + * + * This file is offered under your choice of two licenses: Apache 2.0 or GNU + * GPL 2.0 or later. The permission statements for each of these licenses is + * given below. You may license your modifications to this file under either + * of these licenses or both. If you wish to license your modifications under + * only one of these licenses, delete the permission text for the other + * license. + * + * ---------------------------------------------------------------------- + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * ---------------------------------------------------------------------- + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + * ---------------------------------------------------------------------- + */ + +#ifndef BPFMAP_OPENVSWITCH_H +#define BPFMAP_OPENVSWITCH_H 1 + +#include "api.h" +#include "openvswitch.h" +#include "ovs-p4.h" + +/* ovs-vswitchd as a writer will update these maps. 
+ * The bpf datapath, as a reader, looks them up and processes packets. */ + +/* FIXME: copy from iproute2 */ +enum { + BPF_MAP_ID_PROTO, + BPF_MAP_ID_QUEUE, + BPF_MAP_ID_DROPS, + BPF_MAP_ID_ACTION, + BPF_MAP_ID_INGRESS, + __BPF_MAP_ID_MAX, +#define BPF_MAP_ID_MAX __BPF_MAP_ID_MAX +}; + +/* A bpf flow key is extracted by the + * parser in parser.h and saved in + * 1) percpu_headers, and + * 2) percpu_metadata + * Access: BPF is the only writer/reader + */ +BPF_PERCPU_ARRAY(percpu_headers, + 0, + sizeof(struct ebpf_headers_t), + 0, + 1 +); +BPF_PERCPU_ARRAY(percpu_metadata, + 0, + sizeof(struct ebpf_metadata_t), + 0, + 1 +); + +/* BPF flow table + * Access: BPF is the reader for lookup, + * ovs-vswitchd is the writer + */ +BPF_HASH(flow_table, + 0, + sizeof(struct bpf_flow_key), + sizeof(struct bpf_action_batch), + 0, + 256 +); + +/* BPF flow stats table + * Access: BPF is the writer for updating, + * ovs-vswitchd/revalidator is the reader + */ +BPF_HASH(dp_flow_stats, + 0, + sizeof(struct bpf_flow_key), + sizeof(struct bpf_flow_stats), + 0, + 256 +); + +/* + * Map for implementing the upcall, which forwards the + * first packet (lookup misses) to ovs-vswitchd + */ +BPF_PERF_OUTPUT(upcalls, 0); + + +/* BPF datapath stats + * Access: BPF is the writer, + * ovs-vswitchd is the reader + * XXX: switch to percpu to improve performance + */ +BPF_ARRAY(datapath_stats, + 0, + sizeof(uint64_t), + 0, + __OVS_DP_STATS_MAX +); + +/* Global tail call map: + * index 0-31 for actions (OVS_ACTION_ATTR_*) + * index 32-63 for others + */ +BPF_PROG_ARRAY(tailcalls, + 0, + 0, + 64 +); + +/* A dedicated action list for downcall packet. + * Access: ovs-vswitchd is the writer, + * BPF is the reader + */ +BPF_ARRAY(execute_actions, + 0, + sizeof(struct bpf_action_batch), + 0, + 1 +); + +/* A dedicated key for downcall packet. 
+ * Access: ovs-vswitchd is the writer, + * BPF is the reader + */ +BPF_PERCPU_ARRAY(percpu_executing_key, + 0, + sizeof(struct bpf_flow_key), + 0, + 1 +); + +struct ebpf_headers_t; +struct ebpf_metadata_t; + +static inline struct ebpf_headers_t *bpf_get_headers() +{ + int ebpf_zero = 0; + return bpf_map_lookup_elem(&percpu_headers, &ebpf_zero); +} + +static inline struct ebpf_metadata_t *bpf_get_mds() +{ + int ebpf_zero = 0; + return bpf_map_lookup_elem(&percpu_metadata, &ebpf_zero); +} + +#endif /* BPFMAP_OPENVSWITCH_H */ diff --git a/bpf/odp-bpf.h b/bpf/odp-bpf.h new file mode 100644 index 000000000000..6bef021f24ee --- /dev/null +++ b/bpf/odp-bpf.h @@ -0,0 +1,255 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * This file is offered under your choice of two licenses: Apache 2.0 or GNU + * GPL 2.0 or later. The permission statements for each of these licenses is + * given below. You may license your modifications to this file under either + * of these licenses or both. If you wish to license your modifications under + * only one of these licenses, delete the permission text for the other + * license. + * + * ---------------------------------------------------------------------- + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ * ---------------------------------------------------------------------- + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + * ---------------------------------------------------------------------- + */ + +#ifndef BPF_OPENVSWITCH_H +#define BPF_OPENVSWITCH_H 1 + +#include "odp-netlink.h" +#include "generated_headers.h" + +enum ovs_upcall_cmd { + OVS_UPCALL_UNSPEC = OVS_PACKET_CMD_UNSPEC, + + /* Kernel-to-user notifications. */ + OVS_UPCALL_MISS = OVS_PACKET_CMD_MISS, + OVS_UPCALL_ACTION = OVS_PACKET_CMD_ACTION, + + /* Userspace commands. */ + OVS_UPCALL_EXECUTE = OVS_PACKET_CMD_EXECUTE, + + OVS_UPCALL_DEBUG, +}; + +enum ovs_dbg_subtype { + OVS_DBG_ST_UNSPEC, + OVS_DBG_ST_REDIRECT, + __OVS_DBG_ST_MAX, +}; +#define OVS_DBG_ST_MAX (__OVS_DBG_ST_MAX - 1) + +static const char *bpf_upcall_subtypes[] OVS_UNUSED = { + [OVS_DBG_ST_UNSPEC] = "Unspecified", + [OVS_DBG_ST_REDIRECT] = "Downcall redirect", +}; + +/* Used with 'datapath_stats' map. */ +enum ovs_bpf_dp_stats { + OVS_DP_STATS_UNSPEC, + OVS_DP_STATS_HIT, + OVS_DP_STATS_MISSED, + OVS_DP_STATS_LOST, + OVS_DP_STATS_FLOWS, + OVS_DP_STATS_MASK_HIT, + OVS_DP_STATS_MASKS, + OVS_DP_STATS_ERRORS, + __OVS_DP_STATS_MAX, +}; +#define OVS_DP_STATS_MAX (__OVS_DP_STATS_MAX - 1) + +struct bpf_flow { + uint64_t value; /* XXX */ +}; + +struct bpf_flow_stats { + uint64_t packet_count; /* Number of packets matched. 
*/ + uint64_t byte_count; /* Number of bytes matched. */ + uint64_t used; /* Last used time (in msec). */ + //spinlock_t lock; /* Lock for atomic stats update. */ + //__be16 tcp_flags; /* Union of seen TCP flags. */ +}; + +struct bpf_flow_key { + struct ebpf_headers_t headers; + struct ebpf_metadata_t mds; +}; + +struct bpf_upcall { + uint8_t type; + uint8_t subtype; + uint32_t ifindex; /* Incoming device */ + uint32_t cpu; + uint32_t error; + uint32_t skb_len; +#ifdef BPF_ENABLE_IPV6 + uint8_t uactions[24]; /* Contains 'struct nlattr' */ +#else + uint8_t uactions[64]; +#endif + uint32_t uactions_len; + struct bpf_flow_key key; + /* Followed by 'skb_len' bytes of packet data. */ +}; + +#define OVS_BPF_FLAGS_TX_STACK (1 << 0) + +#define OVS_BPF_DOWNCALL_UNSPEC 0 +#define OVS_BPF_DOWNCALL_OUTPUT 1 +#define OVS_BPF_DOWNCALL_EXECUTE 2 + +struct bpf_downcall { + uint32_t type; + uint32_t ifindex; + uint32_t debug; + uint32_t flags; + struct ebpf_metadata_t md; + /* Followed by packet data. */ +}; + +#define ETH_ALEN 6 + +#define OVS_ACTION_ATTR_UNSPEC 0 +#define OVS_ACTION_ATTR_OUTPUT 1 +#define OVS_ACTION_ATTR_USERSPACE 2 +#define OVS_ACTION_ATTR_SET 3 +#define OVS_ACTION_ATTR_PUSH_VLAN 4 +#define OVS_ACTION_ATTR_POP_VLAN 5 +#define OVS_ACTION_ATTR_SAMPLE 6 +#define OVS_ACTION_ATTR_RECIRC 7 +#define OVS_ACTION_ATTR_HASH 8 +#define OVS_ACTION_ATTR_PUSH_MPLS 9 +#define OVS_ACTION_ATTR_POP_MPLS 10 +#define OVS_ACTION_ATTR_SET_MASKED 11 +#define OVS_ACTION_ATTR_CT 12 +#define OVS_ACTION_ATTR_TRUNC 13 +#define OVS_ACTION_ATTR_PUSH_ETH 14 +#define OVS_ACTION_ATTR_POP_ETH 15 + +#define VLAN_CFI_MASK 0x1000 /* Canonical Format Indicator */ +#define VLAN_VID_MASK 0x0fff /* VLAN Identifier */ +#define VLAN_TAG_PRESENT VLAN_CFI_MASK + +struct flow_key { + __be32 src; + __be32 dst; + union { + __be32 ports; + __be16 port16[2]; + }; + __u32 ip_proto; +}; + +struct ovs_action_set_tunnel { + /* light weight tunnel key */ + __u32 tunnel_id; /* tunnel id is in host byte order */ + union { + 
__u32 remote_ipv4; /* host byte order */ + __u32 remote_ipv6[4]; + }; + __u8 tunnel_tos; + __u8 tunnel_ttl; + __u16 tunnel_ext; + __u32 tunnel_label; + struct gnv_opt gnvopt; + __u8 gnvopt_valid; + __u8 use_ipv6; +}; + +struct ovs_action_set_masked { + enum ovs_key_attr key_type; + union { + struct ovs_key_ethernet ether; + struct ovs_key_mpls mpls; + struct ovs_key_ipv4 ipv4; + struct ovs_key_ipv6 ipv6; + struct ovs_key_tcp tcp; + struct ovs_key_udp udp; + struct ovs_key_sctp sctp; + struct ovs_key_icmp icmp; + struct ovs_key_icmpv6 icmpv6; + struct ovs_key_arp arp; + } key; +#if 0 + /* BPF datapath does not support mask */ + union { + struct ovs_key_ethernet ether; + struct ovs_key_mpls mpls; + struct ovs_key_ipv4 ipv4; + struct ovs_key_ipv6 ipv6; + struct ovs_key_tcp tcp; + struct ovs_key_udp udp; + struct ovs_key_sctp sctp; + struct ovs_key_icmp icmp; + struct ovs_key_icmpv6 icmpv6; + struct ovs_key_arp arp; + } mask; +#endif +}; + +struct ovs_action_output { + uint32_t port; + uint32_t flags; +}; + +struct ovs_action_ct { + int commit; + /* XXX: Include everything in enum ovs_ct_attr. 
*/ +}; + +struct ovs_action_userspace { + __u16 nlattr_len; + __u8 nlattr_data[64]; +}; + +struct bpf_action { + enum ovs_action_attr type; /* action type */ + uint32_t is_set_tunnel; /* to distinguish between SET (tunnel) and SET_MASKED (fields) */ + union { + struct ovs_action_output out; /* OVS_ACTION_ATTR_OUTPUT: 8B */ + struct ovs_action_trunc trunc; /* OVS_ACTION_ATTR_TRUNC: 4B */ + struct ovs_action_hash hash; /* OVS_ACTION_ATTR_HASH: 8B */ + struct ovs_action_push_mpls mpls; /* OVS_ACTION_ATTR_PUSH_MPLS: 6B */ + ovs_be16 ethertype; /* OVS_ACTION_ATTR_POP_MPLS: 2B */ + struct ovs_action_push_vlan push_vlan; /* OVS_ACTION_ATTR_PUSH_VLAN: 4B */ + /* OVS_ACTION_ATTR_POP_VLAN: 0B */ + uint32_t recirc_id; /* OVS_ACTION_ATTR_RECIRC: 4B */ + struct ovs_action_set_tunnel tunnel; + struct ovs_action_set_masked mset; /* OVS_ACTION_ATTR_SET_MASKED: */ + struct ovs_action_ct ct; /* OVS_ACTION_ATTR_CT: */ + struct ovs_action_userspace userspace; /* OVS_ACTION_ATTR_USERSPACE: */ + + uint64_t aligned[16]; /* pad the union to 128 bytes */ + } u; +}; + +#define BPF_DP_MAX_ACTION 32 +struct bpf_action_batch { + struct bpf_action actions[BPF_DP_MAX_ACTION]; +}; + +#endif /* BPF_OPENVSWITCH_H */ diff --git a/bpf/openvswitch.h b/bpf/openvswitch.h new file mode 100644 index 000000000000..602e223bd280 --- /dev/null +++ b/bpf/openvswitch.h @@ -0,0 +1,49 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * This file is offered under your choice of two licenses: Apache 2.0 or GNU + * GPL 2.0 or later. The permission statements for each of these licenses is + * given below. You may license your modifications to this file under either + * of these licenses or both. If you wish to license your modifications under + * only one of these licenses, delete the permission text for the other + * license. 
+ * + * ---------------------------------------------------------------------- + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * ---------------------------------------------------------------------- + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + * ---------------------------------------------------------------------- + */ + +#ifndef __BPF_OPENVSWITCH__ +#define __BPF_OPENVSWITCH__ +#include +#include "odp-netlink.h" + +#ifndef BPFNL_OPENVSWITCH_H +#define BPFNL_OPENVSWITCH_H 1 +#endif /* BPFNL_OPENVSWITCH_H */ + +#endif /* __BPF_OPENVSWITCH__ */ diff --git a/bpf/ovs-p4.h b/bpf/ovs-p4.h new file mode 100644 index 000000000000..f1152ce79bb8 --- /dev/null +++ b/bpf/ovs-p4.h @@ -0,0 +1,90 @@ +/* + * Copyright (c) 2016 Nicira, Inc. 
+ * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#ifndef BPFP4_OPENVSWITCH_H +#define BPFP4_OPENVSWITCH_H 1 + +#include "helpers.h" +#include "generated_headers.h" +/* + * From BCC src/cc/export/helpers.h + */ +#define MASK(_n) ((_n) < 64 ? (1ull << (_n)) - 1 : ((u64)-1LL)) +#define MASK128(_n) ((_n) < 128 ? ((unsigned __int128)1 << (_n)) - 1 : ((unsigned __int128)-1)) + +static inline u64 load_dword(void *skb, u64 off) { + return ((u64)load_word(skb, off) << 32) | load_word(skb, off + 4); +} +static inline __attribute__((always_inline)) +void bpf_dins_pkt(void *pkt, u64 off, u64 bofs, u64 bsz, u64 val) { + // The load_xxx function does a bswap before returning the short/word/dword, + // so the value in register will always be host endian. However, the bytes + // written back need to be in network order. 
+ if (bofs == 0 && bsz == 8) { + bpf_skb_store_bytes(pkt, off, &val, 1, 0); + } else if (bofs + bsz <= 8) { + u8 v = load_byte(pkt, off); + v &= ~(MASK(bsz) << (8 - (bofs + bsz))); + v |= ((val & MASK(bsz)) << (8 - (bofs + bsz))); + bpf_skb_store_bytes(pkt, off, &v, 1, 0); + } else if (bofs == 0 && bsz == 16) { + u16 v = bpf_htons(val); + bpf_skb_store_bytes(pkt, off, &v, 2, 0); + } else if (bofs + bsz <= 16) { + u16 v = load_half(pkt, off); + v &= ~(MASK(bsz) << (16 - (bofs + bsz))); + v |= ((val & MASK(bsz)) << (16 - (bofs + bsz))); + v = bpf_htons(v); + bpf_skb_store_bytes(pkt, off, &v, 2, 0); + } else if (bofs == 0 && bsz == 32) { + u32 v = bpf_htonl(val); + bpf_skb_store_bytes(pkt, off, &v, 4, 0); + } else if (bofs + bsz <= 32) { + u32 v = load_word(pkt, off); + v &= ~(MASK(bsz) << (32 - (bofs + bsz))); + v |= ((val & MASK(bsz)) << (32 - (bofs + bsz))); + v = bpf_htonl(v); + bpf_skb_store_bytes(pkt, off, &v, 4, 0); + } else if (bofs == 0 && bsz == 64) { + u64 v = bpf_htonll(val); + bpf_skb_store_bytes(pkt, off, &v, 8, 0); + } else if (bofs + bsz <= 64) { + u64 v = load_dword(pkt, off); + v &= ~(MASK(bsz) << (64 - (bofs + bsz))); + v |= ((val & MASK(bsz)) << (64 - (bofs + bsz))); + v = bpf_htonll(v); + bpf_skb_store_bytes(pkt, off, &v, 8, 0); + } +} + +enum ErrorCode { + p4_pe_no_error, + p4_pe_index_out_of_bounds, + p4_pe_out_of_packet, + p4_pe_header_too_long, + p4_pe_header_too_short, + p4_pe_unhandled_select, + p4_pe_checksum, + p4_pe_too_many_encap, + p4_pe_ipv6_disabled, +}; + +#define EBPF_MASK(t, w) ((((t)(1)) << (w)) - (t)1) +#define BYTES(w) ((w + 7) / 8) + +#endif diff --git a/bpf/ovs-proto.p4 b/bpf/ovs-proto.p4 new file mode 100644 index 000000000000..c6ebdb510b75 --- /dev/null +++ b/bpf/ovs-proto.p4 @@ -0,0 +1,329 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * This file is offered under your choice of two licenses: Apache 2.0 or GNU + * GPL 2.0 or later. The permission statements for each of these licenses is + * given below. 
You may license your modifications to this file under either + * of these licenses or both. If you wish to license your modifications under + * only one of these licenses, delete the permission text for the other + * license. + * + * ---------------------------------------------------------------------- + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * ---------------------------------------------------------------------- + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + * ---------------------------------------------------------------------- + */ + +/* OVS P4 1.0 protocol file + * use bcc to generate eBPF C file + * see bcc project: https://github.com/iovisor/bcc.git + * under ~/bcc/src/cc/frontends/p4/test/ + */ +#define ETH_P_8021Q 0x8100 /* 802.1Q VLAN Extended Header */ +#define ETH_P_8021AD 0x88A8 /* 802.1ad Service VLAN */ +#define ETH_P_ARP 0x0806 +#define ETH_P_IPV4 0x0800 +#define ETH_P_IPV6 0x86DD + +#define IPPROTO_ICMP 1 +#define IPPROTO_IGMP 2 +#define IPPROTO_TCP 6 +#define IPPROTO_UDP 17 +#define IPPROTO_GRE 47 +#define IPPROTO_SCTP 132 + +header_type ethernet_t { + fields { + dstAddr : 48; + srcAddr : 48; + etherType : 16; + } +} + +header_type vlan_tag_t { + fields { + pcp : 3; + cfi : 1; + vid : 12; + etherType : 16; + } +} + +header_type mpls_t { + fields { + label : 20; + exp : 3; + bos : 1; + ttl : 8; + } +} + +header_type arp_rarp_t { + fields { + hwType : 16; + protoType : 16; + hwAddrLen : 8; + protoAddrLen : 8; + opcode : 16; + } +} + +header_type arp_rarp_ipv4_t { + fields { + srcHwAddr : 48; + srcProtoAddr : 32; + dstHwAddr : 48; + dstProtoAddr : 32; + } +} + +header_type ipv4_t { + fields { + version : 4; + ihl : 4; + diffserv : 8; + totalLen : 16; + identification : 16; + flags : 3; + fragOffset : 13; + ttl : 8; + protocol : 8; + hdrChecksum : 16; + srcAddr : 32; + dstAddr: 32; + } +} + +header_type ipv6_t { + fields { + version : 4; + trafficClass : 8; + flowLabel : 20; + payloadLen : 16; + nextHdr : 8; + hopLimit : 8; + srcAddr : 128; + dstAddr : 128; + } +} + +header_type icmp_t { + fields { + typeCode : 16; + hdrChecksum : 16; + } +} + +header_type tcp_t { + fields { + srcPort : 16; + dstPort : 16; + seqNo : 32; + ackNo : 32; + dataOffset : 4; + res : 4; + flags : 8; + window 
: 16; + checksum : 16; + urgentPtr : 16; + } +} + +header_type udp_t { + fields { + srcPort : 16; + dstPort : 16; + length_ : 16; + checksum : 16; + } +} + +header_type sctp_t { + fields { + srcPort : 16; + dstPort : 16; + verifTag : 32; + checksum : 32; + } +} + +header_type gre_t { + fields { + C : 1; + R : 1; + K : 1; + S : 1; + s : 1; + recurse : 3; + flags : 5; + ver : 3; + proto : 16; + } +} + +/* ----------------- metadata ---------------- */ +header_type pkt_metadata_t { + fields { + recirc_id : 32; /* Recirculation id carried with the + recirculating packets. 0 for packets + received from the wire. */ + dp_hash : 32; /* hash value computed by the recirculation + action. */ + skb_priority : 32; /* Packet priority for QoS. */ + pkt_mark : 32; /* Packet mark. */ + ct_state : 16; /* Connection state. */ + ct_zone : 16; /* Connection zone. */ + ct_mark : 32; /* Connection mark. */ + ct_label : 128; /* Connection label. */ + in_port : 32; /* Input port. */ + } +} + +header_type flow_tnl_t { + fields { + /* struct flow_tnl: + * Tunnel information used in flow key and metadata. + */ + ip_dst : 32; + ipv6_dst : 64; + ip_src: 32; + ipv6_src : 64; + tun_id : 64; + flags : 16; + ip_tos : 8; + ip_ttl : 8; + tp_src : 16; + tp_dst : 16; + gbp_id : 16; + gbp_flags : 8; + pad1: 40; /* Pad to 64 bits. 
*/ + /* struct tun_metadata metadata; */ + } +} + +header ethernet_t ethernet; +header ipv4_t ipv4; +header ipv6_t ipv6; +header arp_rarp_t arp; +header tcp_t tcp; +header udp_t udp; +header icmp_t icmp; +header vlan_tag_t vlan; +metadata pkt_metadata_t md; +metadata flow_tnl_t tnl_md; + +parser start { + return parse_ethernet; +} + +parser parse_ethernet{ + extract(ethernet); + return select(latest.etherType) { + ETH_P_8021Q: parse_vlan; + ETH_P_8021AD: parse_vlan; + ETH_P_ARP: parse_arp; + ETH_P_IPV4: parse_ipv4; + ETH_P_IPV6: parse_ipv6; + default: ingress; + } +} + +parser parse_vlan { + extract(vlan); + return select(latest.etherType) { + ETH_P_ARP: parse_arp; + ETH_P_IPV4: parse_ipv4; + ETH_P_IPV6: parse_ipv6; + default: ingress; + } +} + +parser parse_arp { + extract(arp); + return ingress; +} + +parser parse_ipv4 { + extract(ipv4); + return select(latest.protocol) { + IPPROTO_TCP: parse_tcp; + IPPROTO_UDP: parse_udp; + IPPROTO_ICMP: parse_icmp; + default: ingress; + } +} + +parser parse_ipv6 { + extract(ipv6); + return select(latest.nextHdr) { + IPPROTO_TCP: parse_tcp; + IPPROTO_UDP: parse_udp; + IPPROTO_ICMP: parse_icmp; + default: ingress; + } +} + +parser parse_tcp { + extract(tcp); + return ingress; +} + +parser parse_udp { + extract(udp); + return ingress; +} + +parser parse_icmp { + extract(icmp); + return ingress; +} +/* ------------------------------------------------------------------------- */ +action nop() {} + +table ovs_tbl { + reads { + /* Keep the compiler from optimizing these fields out, although + we are not using them at all */ + ethernet.dstAddr: exact; + vlan.etherType: exact; + ipv4.dstAddr: exact; + ipv6.dstAddr: exact; + icmp.typeCode: exact; + tcp.dstPort: exact; + udp.dstPort: exact; + md.in_port: exact; + tnl_md.tun_id: exact; + } + actions { + nop; + } +} + +control ingress +{ + apply(ovs_tbl); +} + diff --git a/bpf/parser.h b/bpf/parser.h new file mode 100644 index 000000000000..0dc12b6cd1a7 --- /dev/null +++ b/bpf/parser.h @@ -0,0 +1,344 @@ +/* + * 
Copyright (c) 2016, 2017, 2018 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ + +#include "ovs-p4.h" +#include "api.h" +#include "helpers.h" +#include "maps.h" +#include +#include +#include +#include +#include +#include +#include +#include + +#define TCP_FLAGS_BE16(tp) (*(__be16 *)&tcp_flag_word(tp) & bpf_htons(0x0FFF)) + +static bool ipv6_has_ext(u8 nw_proto) { + if ((nw_proto == IPPROTO_HOPOPTS) || + (nw_proto == IPPROTO_ROUTING) || + (nw_proto == IPPROTO_DSTOPTS) || + (nw_proto == IPPROTO_AH) || + (nw_proto == IPPROTO_FRAGMENT)) { + return true; + } + return false; +} + +__section_tail(PARSER_CALL) +static int ovs_parser(struct __sk_buff* skb) { + void *data = (void *)(long)skb->data; + struct ebpf_headers_t hdrs = {}; + struct ebpf_metadata_t metadata = {}; + struct bpf_tunnel_key key; + struct ethhdr *eth; + ovs_be16 eth_proto; + u32 ebpf_zero = 0; + int offset = 0; + u8 nw_proto = 0; + int err = 0, ret = 0; + + /* Verifier Check. 
*/ + if ((char *)data + sizeof(*eth) > (char *)(long)skb->data_end) { + printt("ERR parsing ethernet\n"); + return TC_ACT_SHOT; + } + + eth = data; + if (eth->h_proto == 0) { + printt("eth_proto == 0, return TC_ACT_OK\n"); + return TC_ACT_OK; + } + + printt("eth_proto = 0x%x len = %d\n", bpf_ntohs(eth->h_proto), skb->len); + printt("skb->protocol = 0x%x\n", skb->protocol); + printt("skb->ingress_ifindex %d skb->ifindex %d\n", + skb->ingress_ifindex, skb->ifindex); + + /* Link Layer. */ + if (skb_load_bytes(skb, offset, &hdrs.ethernet, sizeof(hdrs.ethernet)) < 0) { + err = p4_pe_header_too_short; + printt("ERR: load byte %d\n", __LINE__); + goto end; + } + offset += sizeof(hdrs.ethernet); + hdrs.valid |= ETHER_VALID; + + /* VLAN 8021Q (0x8100) or 8021AD (0x88a8) in metadata + * note: vlan in metadata is always the outer vlan + */ + if (skb->vlan_tci) { + hdrs.vlan.tci = skb->vlan_tci | VLAN_TAG_PRESENT; /* host byte order */ + hdrs.vlan.etherType = skb->vlan_proto; + hdrs.valid |= VLAN_VALID; + + printt("skb metadata: vlan proto 0x%x tci %x\n", bpf_ntohs(skb->vlan_proto), skb->vlan_tci); + } + + eth_proto = eth->h_proto; + + if (eth->h_proto == bpf_htons(ETH_P_8021Q)){ + + /* The inner, if exists, is VLAN 8021Q (0x8100) */ + struct vlan_hdr { /* wire format */ + ovs_be16 tci; + ovs_be16 ethertype; + } cvlan; + + /* parse cvlan */ + if (skb_load_bytes(skb, offset - 2, &cvlan, sizeof(cvlan)) < 0) { + err = p4_pe_header_too_short; + printt("ERR: load byte %d\n", __LINE__); + goto end; + } + offset += sizeof(hdrs.cvlan); + hdrs.valid |= CVLAN_VALID; + + hdrs.cvlan.tci = bpf_ntohs(cvlan.tci); + hdrs.cvlan.etherType = cvlan.ethertype; + + printt("vlan tci 0x%x ethertype 0x%x\n", + hdrs.cvlan.tci, bpf_ntohs(hdrs.cvlan.etherType)); + + skb_load_bytes(skb, offset - 2, &eth_proto, 2); + printt("eth_proto = 0x%x\n", bpf_ntohs(eth_proto)); + } + + /* Network Layer. 
+ * see key_extract() in net/openvswitch/flow.c */ + if (eth_proto == bpf_htons(ETH_P_IP)) { + struct iphdr nh; + + printt("parse ipv4\n"); + if (skb_load_bytes(skb, offset, &nh, sizeof(nh)) < 0) { + err = p4_pe_header_too_short; + printt("ERR: load byte %d\n", __LINE__); + goto end; + } + offset += nh.ihl * 4; + hdrs.valid |= IPV4_VALID; + + hdrs.ipv4.ttl = nh.ttl; /* u8 */ + hdrs.ipv4.tos = nh.tos; /* u8 */ + hdrs.ipv4.protocol = nh.protocol; /* u8*/ + hdrs.ipv4.srcAddr = nh.saddr; /* be32 */ + hdrs.ipv4.dstAddr = nh.daddr; /* be32 */ + + nw_proto = hdrs.ipv4.protocol; + printt("next proto 0x%x\n", nw_proto); + + } else if (eth_proto == bpf_htons(ETH_P_ARP) || + eth_proto == bpf_htons(ETH_P_RARP)) { + struct arp_rarp_t *arp; + + printt("parse arp/rarp\n"); + + /* the struct arp_rarp_t is wired format */ + arp = &hdrs.arp; + if (skb_load_bytes(skb, offset, arp, sizeof(hdrs.arp)) < 0) { + err = p4_pe_header_too_short; + printt("ERR: load byte %d\n", __LINE__); + goto end; + } + offset += sizeof(hdrs.arp); + hdrs.valid |= ARP_VALID; + + if (arp->ar_hrd == bpf_htons(ARPHRD_ETHER) && + arp->ar_pro == bpf_htons(ETH_P_IP) && + arp->ar_hln == ETH_ALEN && + arp->ar_pln == 4) { + printt("valid arp\n"); + } else { + printt("ERR: invalid arp\n"); + } + goto parse_metadata; + + } else if (eth_proto == bpf_htons(ETH_P_IPV6)) { + + struct ipv6hdr ip6hdr; /* wired format */ + + if (skb_load_bytes(skb, offset, &ip6hdr, sizeof(ip6hdr)) < 0) { + err = p4_pe_header_too_short; + printt("ERR: load byte %d\n", __LINE__); + goto end; + } + offset += sizeof(struct ipv6hdr); /* wired format */ + hdrs.valid |= IPV6_VALID; + + printt("parse ipv6\n"); + + memcpy(&hdrs.ipv6.flowLabel, &ip6hdr.flow_lbl, 4); //FIXME + memcpy(&hdrs.ipv6.srcAddr, &ip6hdr.saddr, 16); + memcpy(&hdrs.ipv6.dstAddr, &ip6hdr.daddr, 16); + + nw_proto = ip6hdr.nexthdr; + + if (ipv6_has_ext(nw_proto)) { + printt("WARN: ipv6 nexthdr %x does not supported\n", nw_proto); + // need to update offset + } + + printt("next proto 
= %x\n", nw_proto); + + } else { + printt("ERR: eth_proto %x not supported\n", bpf_ntohs(eth_proto)); + return TC_ACT_OK; + } + + /* Transport Layer. + * Handle: TCP, UDP, ICMP + */ + if (nw_proto == IPPROTO_TCP) { + struct tcphdr tcp; + + if (skb_load_bytes(skb, offset, &tcp, sizeof(tcp)) < 0) { + err = p4_pe_header_too_short; + printt("ERR: load byte %d\n", __LINE__); + goto end; + } + hdrs.valid |= TCP_VALID; + + hdrs.tcp.srcPort = tcp.source; + hdrs.tcp.dstPort = tcp.dest; + hdrs.tcp.flags = TCP_FLAGS_BE16(&tcp); + + printt("parse tcp src %d dst %d\n", bpf_ntohs(tcp.source), bpf_ntohs(tcp.dest)); + + } else if (nw_proto == IPPROTO_UDP) { + struct udphdr udp; + + if (skb_load_bytes(skb, offset, &udp, sizeof(udp)) < 0) { + err = p4_pe_header_too_short; + printt("ERR: load byte %d\n", __LINE__); + goto end; + } + hdrs.valid |= UDP_VALID; + + hdrs.udp.srcPort = udp.source; + hdrs.udp.dstPort = udp.dest; + + printt("parse udp src %d dst %d\n", bpf_ntohs(udp.source), bpf_ntohs(udp.dest)); + + } else if (nw_proto == IPPROTO_ICMP) { /* ICMP v4 */ + struct icmphdr icmp; + + if (skb_load_bytes(skb, offset, &icmp, sizeof(icmp)) < 0) { + err = p4_pe_header_too_short; + printt("ERR: load byte %d\n", __LINE__); + goto end; + } + hdrs.valid |= ICMP_VALID; + + hdrs.icmp.type = icmp.type; + hdrs.icmp.code = icmp.code; + + printt("parse icmp type %d code %d\n", icmp.type, icmp.code); + + } else if (nw_proto == 0x3a /*EXTHDR_ICMP*/) { /* ICMP v6 */ + struct icmphdr icmp; + + if (skb_load_bytes(skb, offset, &icmp, sizeof(icmp)) < 0) { + err = p4_pe_header_too_short; + printt("ERR: load byte %d\n", __LINE__); + goto end; + } + hdrs.valid |= ICMPV6_VALID; + + hdrs.icmpv6.type = icmp.type; + hdrs.icmpv6.code = icmp.code; + + printt("parse icmp v6 type %d code %d\n", icmp.type, icmp.code); + } else if (nw_proto == IPPROTO_GRE) { + printt("receive gre packet\n"); + } else { + printt("WARN: nw_proto 0x%x not parsed\n", nw_proto); + /* Continue */ + } + +parse_metadata: + 
metadata.md.skb_priority = skb->priority; + + /* Don't use ovs_cb_get_ifindex(), that gets optimized into something + * that can't be verified. >:( */ + if (skb->cb[OVS_CB_INGRESS]) { + metadata.md.in_port = skb->ingress_ifindex; + } + if (!skb->cb[OVS_CB_INGRESS]) { + metadata.md.in_port = skb->ifindex; + } + metadata.md.pkt_mark = skb->mark; + + ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0); + if (!ret) { + printt("bpf_skb_get_tunnel_key id = %d ipv4\n", key.tunnel_id); + metadata.tnl_md.tun_id = key.tunnel_id; + metadata.tnl_md.ip4.ip_src = key.remote_ipv4; + metadata.tnl_md.ip_tos = key.tunnel_tos; + metadata.tnl_md.ip_ttl = key.tunnel_ttl; + metadata.tnl_md.use_ipv6 = 0; + metadata.tnl_md.flags = 0; +#ifdef BPF_ENABLE_IPV6 + } else if (ret == -EPROTO) { + ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), + BPF_F_TUNINFO_IPV6); + if (!ret) { + printt("bpf_skb_get_tunnel_key id = %d ipv6\n", key.tunnel_id); + metadata.tnl_md.tun_id = key.tunnel_id; + memcpy(&metadata.tnl_md.ip6.ipv6_src, &key.remote_ipv4, 16); + metadata.tnl_md.ip_tos = key.tunnel_tos; + metadata.tnl_md.ip_ttl = key.tunnel_ttl; + metadata.tnl_md.use_ipv6 = 1; + metadata.tnl_md.flags = 0; + } +#endif + } + + if (!ret) { + ret = bpf_skb_get_tunnel_opt(skb, &metadata.tnl_md.gnvopt, + sizeof metadata.tnl_md.gnvopt); + if (ret > 0) + metadata.tnl_md.gnvopt_valid = 1; + printt("bpf_skb_get_tunnel_opt ret = %d\n", ret); + } + +end: + if (err != p4_pe_no_error) { + printt("parse error: %d, drop\n", err); + return TC_ACT_SHOT; + } + + /* write flow key and md to key map */ + printt("Parser: updating flow key\n"); + bpf_map_update_elem(&percpu_headers, + &ebpf_zero, &hdrs, BPF_ANY); + + if (ovs_cb_is_initial_parse(skb)) { + bpf_map_update_elem(&percpu_metadata, + &ebpf_zero, &metadata, BPF_ANY); + } + skb->cb[OVS_CB_ACT_IDX] = 0; + + /* tail call next stage */ + printt("tail call match + lookup stage\n"); + bpf_tail_call(skb, &tailcalls, MATCH_ACTION_CALL); + + printt("[ERROR] missing tail 
call\n"); + return TC_ACT_OK; +} diff --git a/bpf/xdp.h b/bpf/xdp.h new file mode 100644 index 000000000000..2d2102a6ba28 --- /dev/null +++ b/bpf/xdp.h @@ -0,0 +1,35 @@ +/* + * Copyright (c) 2018 Nicira, Inc. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301, USA + */ +#include "ovs-p4.h" +#include "api.h" +#include "helpers.h" + +__section("xdp") +static int xdp_ingress(struct xdp_md *ctx OVS_UNUSED) +{ + /* TODO: see p4c-xdp project */ + printt("return XDP_PASS\n"); + return XDP_PASS; +} + +__section("af_xdp") +static int af_xdp_ingress(struct xdp_md *ctx OVS_UNUSED) +{ + /* TODO: see xdpsock_kern.c and xdpsock_user.c */ + return XDP_PASS; +}
From patchwork Sat Jul 14 11:39:00 2018
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 943920
From: William Tu
To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org
Date: Sat, 14 Jul 2018 04:39:00 -0700
Message-Id: <1531568345-80246-9-git-send-email-u9012063@gmail.com>
In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com>
References: <1531568345-80246-1-git-send-email-u9012063@gmail.com>
Subject: [ovs-dev] [RFC PATCHv2 08/13] vswitch/bridge.c: add bpf datapath initialization.

This patch initializes the bpf datapath when the bridge starts.
The check_support() probing could be avoided, since we know which features each datapath bpf program supports.

Signed-off-by: Joe Stringer
Signed-off-by: William Tu
Signed-off-by: Yifeng Sun
Co-authored-by: William Tu
Co-authored-by: Yifeng Sun
---
 lib/packets.h          |  6 ++++-
 ofproto/ofproto-dpif.c | 69 ++++++++++++++++++++++++++++++++++----------------
 vswitchd/bridge.c      | 21 +++++++++++++++
 3 files changed, 73 insertions(+), 23 deletions(-)

diff --git a/lib/packets.h b/lib/packets.h index 9a71aa3abbdb..2379c8f6d19d 100644 --- a/lib/packets.h +++ b/lib/packets.h @@ -47,7 +47,8 @@ static inline bool ipv6_addr_is_set(const struct in6_addr *addr); static inline bool flow_tnl_dst_is_set(const struct flow_tnl *tnl) { - return tnl->ip_dst || ipv6_addr_is_set(&tnl->ipv6_dst); + return tnl->ip_dst || ipv6_addr_is_set(&tnl->ipv6_dst) || + tnl->ip_src || ipv6_addr_is_set(&tnl->ipv6_src); } struct in6_addr flow_tnl_dst(const struct flow_tnl *tnl); @@ -154,7 +155,10 @@ pkt_metadata_init(struct pkt_metadata *md, odp_port_t port) * we can just zero out ip_dst and the rest of the data will never be * looked at. */ md->tunnel.ip_dst = 0; + md->tunnel.ip_src = 0; md->tunnel.ipv6_dst = in6addr_any; + md->tunnel.ipv6_src = in6addr_any; + md->in_port.odp_port = port; } diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 3365d4185926..115c138505ac 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -1338,28 +1338,53 @@ CHECK_FEATURE__(ct_orig_tuple6, ct_orig_tuple6, ct_nw_proto, 1, ETH_TYPE_IPV6) static void check_support(struct dpif_backer *backer) { - /* Actions. 
*/ - backer->rt_support.odp.recirc = check_recirc(backer); - backer->rt_support.odp.max_vlan_headers = check_max_vlan_headers(backer); - backer->rt_support.odp.max_mpls_depth = check_max_mpls_depth(backer); - backer->rt_support.masked_set_action = check_masked_set_action(backer); - backer->rt_support.trunc = check_trunc_action(backer); - backer->rt_support.ufid = check_ufid(backer); - backer->rt_support.tnl_push_pop = dpif_supports_tnl_push_pop(backer->dpif); - backer->rt_support.clone = check_clone(backer); - backer->rt_support.sample_nesting = check_max_sample_nesting(backer); - backer->rt_support.ct_eventmask = check_ct_eventmask(backer); - backer->rt_support.ct_clear = check_ct_clear(backer); - - /* Flow fields. */ - backer->rt_support.odp.ct_state = check_ct_state(backer); - backer->rt_support.odp.ct_zone = check_ct_zone(backer); - backer->rt_support.odp.ct_mark = check_ct_mark(backer); - backer->rt_support.odp.ct_label = check_ct_label(backer); - - backer->rt_support.odp.ct_state_nat = check_ct_state_nat(backer); - backer->rt_support.odp.ct_orig_tuple = check_ct_orig_tuple(backer); - backer->rt_support.odp.ct_orig_tuple6 = check_ct_orig_tuple6(backer); + if (!strcmp(backer->type, "bpf")) { + /* Actions. */ + backer->rt_support.odp.recirc = check_recirc(backer); + backer->rt_support.odp.max_vlan_headers = check_max_vlan_headers(backer); + backer->rt_support.odp.max_mpls_depth = check_max_mpls_depth(backer); + backer->rt_support.masked_set_action = check_masked_set_action(backer); + backer->rt_support.trunc = check_trunc_action(backer); + backer->rt_support.ufid = check_ufid(backer); + backer->rt_support.tnl_push_pop = dpif_supports_tnl_push_pop(backer->dpif); + backer->rt_support.clone = check_clone(backer); + backer->rt_support.sample_nesting = check_max_sample_nesting(backer); + backer->rt_support.ct_eventmask = false; + backer->rt_support.ct_clear = false; + + /* Flow fields. 
*/ + backer->rt_support.odp.ct_state = false; + backer->rt_support.odp.ct_zone = false; + backer->rt_support.odp.ct_mark = false; + backer->rt_support.odp.ct_label = false; + + backer->rt_support.odp.ct_state_nat = false; + backer->rt_support.odp.ct_orig_tuple = false; + backer->rt_support.odp.ct_orig_tuple6 = false; + } else { + /* Actions. */ + backer->rt_support.odp.recirc = check_recirc(backer); + backer->rt_support.odp.max_vlan_headers = check_max_vlan_headers(backer); + backer->rt_support.odp.max_mpls_depth = check_max_mpls_depth(backer); + backer->rt_support.masked_set_action = check_masked_set_action(backer); + backer->rt_support.trunc = check_trunc_action(backer); + backer->rt_support.ufid = check_ufid(backer); + backer->rt_support.tnl_push_pop = dpif_supports_tnl_push_pop(backer->dpif); + backer->rt_support.clone = check_clone(backer); + backer->rt_support.sample_nesting = check_max_sample_nesting(backer); + backer->rt_support.ct_eventmask = check_ct_eventmask(backer); + backer->rt_support.ct_clear = check_ct_clear(backer); + + /* Flow fields. 
*/ + backer->rt_support.odp.ct_state = check_ct_state(backer); + backer->rt_support.odp.ct_zone = check_ct_zone(backer); + backer->rt_support.odp.ct_mark = check_ct_mark(backer); + backer->rt_support.odp.ct_label = check_ct_label(backer); + + backer->rt_support.odp.ct_state_nat = check_ct_state_nat(backer); + backer->rt_support.odp.ct_orig_tuple = check_ct_orig_tuple(backer); + backer->rt_support.odp.ct_orig_tuple6 = check_ct_orig_tuple6(backer); + } } static int diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index f44f950a4fce..ca6d73810420 100644 --- a/vswitchd/bridge.c +++ b/vswitchd/bridge.c @@ -20,6 +20,7 @@ #include #include "async-append.h" +#include "bpf.h" #include "bfd.h" #include "bitmap.h" #include "cfm.h" @@ -508,6 +509,25 @@ bridge_exit(bool delete_datapath) ovsdb_idl_destroy(idl); } +static int +init_ebpf(const struct ovsrec_open_vswitch *ovs_cfg OVS_UNUSED) +{ + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; + static int error = 0; + + if (ovsthread_once_start(&once)) { + char *bpf_elf = xasprintf("%s/bpf/datapath.o", ovs_pkgdatadir()); + + error = bpf_init(); + if (!error) { + error = bpf_load(bpf_elf); + } + free(bpf_elf); + ovsthread_once_done(&once); + } + return error; +} + /* Looks at the list of managers in 'ovs_cfg' and extracts their remote IP * addresses and ports into '*managersp' and '*n_managersp'. The caller is * responsible for freeing '*managersp' (with free()). @@ -2979,6 +2999,7 @@ bridge_run(void) if (cfg) { netdev_set_flow_api_enabled(&cfg->other_config); dpdk_init(&cfg->other_config); + init_ebpf(cfg); } /* Initialize the ofproto library. 
This only needs to run once, but
From patchwork Sat Jul 14 11:39:01 2018
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 943921
From: William Tu
To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org
Date: Sat, 14 Jul 2018 04:39:01 -0700
Message-Id: <1531568345-80246-10-git-send-email-u9012063@gmail.com>
In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com>
References: <1531568345-80246-1-git-send-email-u9012063@gmail.com>
Subject: [ovs-dev] [RFC PATCHv2 09/13] utilities: Add ovs-bpfctl utility.

From: Joe Stringer

This new utility is used for standalone probing of BPF datapath state.
Signed-off-by: Joe Stringer Signed-off-by: William Tu Signed-off-by: Yifeng Sun Co-authored-by: William Tu Co-authored-by: Yifeng Sun --- utilities/automake.mk | 9 ++ utilities/ovs-bpfctl.8.xml | 45 ++++++++ utilities/ovs-bpfctl.c | 248 +++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 302 insertions(+) create mode 100644 utilities/ovs-bpfctl.8.xml create mode 100644 utilities/ovs-bpfctl.c diff --git a/utilities/automake.mk b/utilities/automake.mk index 1636cb93e677..9de28eb1eb7d 100644 --- a/utilities/automake.mk +++ b/utilities/automake.mk @@ -39,6 +39,7 @@ utilities/ovs-lib: $(top_builddir)/config.status EXTRA_DIST += \ utilities/ovs-appctl-bashcomp.bash \ + utilities/ovs-bpfctl.8.xml \ utilities/ovs-check-dead-ifs.in \ utilities/ovs-ctl.in \ utilities/ovs-dev.py \ @@ -103,6 +104,7 @@ CLEANFILES += \ man_MANS += \ utilities/ovs-appctl.8 \ + utilities/ovs-bpfctl.8 \ utilities/ovs-ctl.8 \ utilities/ovs-testcontroller.8 \ utilities/ovs-dpctl.8 \ @@ -148,4 +150,11 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \ utilities/checkpatch.py utilities/ovs-dev.py \ utilities/ovs-tcpdump.in +if HAVE_BPF +bin_PROGRAMS += \ + utilities/ovs-bpfctl +utilities_ovs_bpfctl_SOURCES = utilities/ovs-bpfctl.c +utilities_ovs_bpfctl_LDADD = lib/libopenvswitch.la +endif + include utilities/bugtool/automake.mk diff --git a/utilities/ovs-bpfctl.8.xml b/utilities/ovs-bpfctl.8.xml new file mode 100644 index 000000000000..6160d5eb06aa --- /dev/null +++ b/utilities/ovs-bpfctl.8.xml @@ -0,0 +1,45 @@ + + +

+ Name
+     ovs-bpfctl -- administer Open vSwitch BPF state
+
+ Synopsis
+     ovs-bpfctl [options] command [arg...]
+
+ Description
+     This utility can be used to probe and manage OVS BPF state.
+

+ Commands
+     show
+         Prints a brief overview of the current BPF configuration state.
+
+     load-dp filename
+         Loads a BPF datapath implementation from filename into the
+         kernel, and pins it to the filesystem.
+
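Reviewer note: the two commands documented above lend themselves to a table-driven dispatcher. The sketch below is only an assumption about how such a table could be consulted; it is loosely modelled on the fields of struct bpfctl_command in utilities/ovs-bpfctl.c (name, argument bounds, handler, read-only or read-write mode). The handler bodies, the DISPATCH_* codes, the names command, commands and dispatch, and the choice to treat load-dp as a read-write command are all hypothetical and not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the patch's command descriptor. */
typedef int command_handler(int argc, const char *argv[]);

enum command_mode { CMD_RO, CMD_RW };

struct command {
    const char *name;
    int min_args;               /* Arguments after the command name. */
    int max_args;
    command_handler *handler;
    enum command_mode mode;
};

/* Hypothetical dispatcher result codes (not from the patch). */
enum {
    DISPATCH_UNKNOWN = -1,
    DISPATCH_BAD_ARGS = -2,
    DISPATCH_READ_ONLY = -3
};

/* Placeholder handlers: the real ones would inspect or load BPF state. */
static int
cmd_show(int argc, const char *argv[])
{
    (void) argc;
    (void) argv;
    return 0;
}

static int
cmd_load_dp(int argc, const char *argv[])
{
    (void) argc;
    (void) argv;
    return 0;
}

static const struct command commands[] = {
    { "show",    0, 0, cmd_show,    CMD_RO },
    { "load-dp", 1, 1, cmd_load_dp, CMD_RW },   /* RW is an assumption. */
};

/* Looks up argv[0] in the table, validates the argument count and the
 * read-only restriction, then runs the handler. */
static int
dispatch(int argc, const char *argv[], bool read_only)
{
    size_t i;

    if (argc < 1) {
        return DISPATCH_UNKNOWN;
    }
    for (i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        const struct command *c = &commands[i];
        int n_args = argc - 1;

        if (strcmp(c->name, argv[0]) != 0) {
            continue;
        }
        if (n_args < c->min_args || n_args > c->max_args) {
            return DISPATCH_BAD_ARGS;
        }
        if (read_only && c->mode == CMD_RW) {
            return DISPATCH_READ_ONLY;
        }
        return c->handler(argc, argv);
    }
    return DISPATCH_UNKNOWN;
}
```

Keeping the argument bounds and the mode in the table means the --read-only check lives in one place rather than in every handler.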

+ Options
+
+ Exit Status
+     0
+         Successful program execution.
+     1
+         Usage or syntax error.
+
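Reviewer note: the exit codes above correspond to how main() in utilities/ovs-bpfctl.c ends, with `return error ? EXIT_FAILURE : EXIT_SUCCESS;`, so any nonzero error from the command handler is collapsed into a single failure status, not only usage or syntax errors. A minimal sketch of that mapping follows; the helper name exit_status_from_error is hypothetical, and EXIT_FAILURE being 1 is typical on Linux rather than guaranteed by the C standard.

```c
#include <stdlib.h>

/* Collapses any nonzero error code (for example an errno value returned
 * by a command handler) into the process exit status.
 * Hypothetical helper, named here for illustration only. */
static int
exit_status_from_error(int error)
{
    return error ? EXIT_FAILURE : EXIT_SUCCESS;
}
```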

+ See also
+     tc(8), tc-bpf(8)
+
+ Authors
+     Manpage written by Joe Stringer.
+
+     Please report corrections or improvements to
+     <bugs@openvswitch.org>
+
diff --git a/utilities/ovs-bpfctl.c b/utilities/ovs-bpfctl.c new file mode 100644 index 000000000000..10b238a3d79e --- /dev/null +++ b/utilities/ovs-bpfctl.c @@ -0,0 +1,248 @@ +/* + * Copyright (c) 2016 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "bpf.h" +#include "command-line.h" +#include "fatal-signal.h" +#include "util.h" +#include "openvswitch/dynamic-string.h" +#include "openvswitch/vlog.h" + +static int verbosity = 0; +static bool read_only = false; + +typedef int bpfctl_command_handler(int argc, const char *argv[]); +struct bpfctl_command { + const char *name; + const char *usage; + int min_args; + int max_args; + bpfctl_command_handler *handler; + enum { DP_RO, DP_RW} mode; +}; + +OVS_NO_RETURN static void usage(void *userdata OVS_UNUSED); +static void parse_options(int argc, char *argv[]); +static int bpfctl_run_command(int argc, const char *argv[]); + +static void +bpfctl_print(void *userdata OVS_UNUSED, bool error, const char *msg) +{ + FILE *outfile = error ? stderr : stdout; + fputs(msg, outfile); +} + +static void +bpfctl_error(int err_no, const char *fmt, ...) 
+{ + const char *subprogram_name = get_subprogram_name(); + struct ds ds = DS_EMPTY_INITIALIZER; + int save_errno = errno; + va_list args; + + if (subprogram_name[0]) { + ds_put_format(&ds, "%s(%s): ", program_name,subprogram_name); + } else { + ds_put_format(&ds, "%s: ", program_name); + } + + va_start(args, fmt); + ds_put_format_valist(&ds, fmt, args); + va_end(args); + + if (err_no != 0) { + ds_put_format(&ds, " (%s)", ovs_retval_to_string(err_no)); + } + ds_put_cstr(&ds, "\n"); + + bpfctl_print(NULL, true, ds_cstr(&ds)); + + ds_destroy(&ds); + + errno = save_errno; +} + +int +main(int argc, char *argv[]) +{ + int error; + set_program_name(argv[0]); + parse_options(argc, argv); + fatal_ignore_sigpipe(); + + error = bpfctl_run_command(argc - optind, (const char **) argv + optind); + return error ? EXIT_FAILURE : EXIT_SUCCESS; +} + +static void +parse_options(int argc, char *argv[]) +{ + enum { + OPT_CLEAR = UCHAR_MAX + 1, + OPT_MAY_CREATE, + OPT_READ_ONLY, + VLOG_OPTION_ENUMS + }; + static const struct option long_options[] = { + {"read-only", no_argument, NULL, OPT_READ_ONLY}, + {"help", no_argument, NULL, 'h'}, + {"option", no_argument, NULL, 'o'}, + {"version", no_argument, NULL, 'V'}, + VLOG_LONG_OPTIONS, + {NULL, 0, NULL, 0}, + }; + char *short_options = ovs_cmdl_long_options_to_short_options(long_options); + + for (;;) { + int c; + + c = getopt_long(argc, argv, short_options, long_options, NULL); + if (c == -1) { + break; + } + + switch (c) { + case OPT_READ_ONLY: + read_only = true; + break; + + case 'm': + verbosity++; + break; + + case 'h': + usage(NULL); + + case 'o': + ovs_cmdl_print_options(long_options); + exit(EXIT_SUCCESS); + + case 'V': + ovs_print_version(0, 0); + exit(EXIT_SUCCESS); + + VLOG_OPTION_HANDLERS + + case '?': + exit(EXIT_FAILURE); + + default: + abort(); + } + } + free(short_options); +} + +static void +usage(void *userdata OVS_UNUSED) +{ + printf("%s: Open vSwitch bpf management utility\n" + "usage: %s [OPTIONS] COMMAND [ARG...]\n" 
+ " show show basic info on bpf datapaths\n" + " load-dp FILENAME load datapath from FILENAME\n", + program_name, program_name); + vlog_usage(); + printf(" -m, --more increase verbosity of output\n" + " -h, --help display this help message\n" + " -V, --version display version information\n"); + exit(EXIT_SUCCESS); +} + +static int +bpfctl_show(int argc OVS_UNUSED, const char *argv[] OVS_UNUSED) +{ + struct bpf_state bpf; + + if (!bpf_get(&bpf, verbosity)) { + struct ds ds = DS_EMPTY_INITIALIZER; + + bpf_format_state(&ds, &bpf); + printf("%s", ds_cstr(&ds)); + ds_destroy(&ds); + bpf_put(&bpf); + } + return 0; +} + +static int +bpfctl_load_dp(int argc OVS_UNUSED, const char *argv[]) +{ + int error; + + error = bpf_init(); + if (error) { + return error; + } + return bpf_load(argv[1]); +} + +static const struct bpfctl_command all_commands[] = { + { "load-dp", "[file]", 1, 1, bpfctl_load_dp, DP_RW }, + { "show", "", 0, 0, bpfctl_show, DP_RO }, + { NULL, NULL, 0, 0, NULL, DP_RO }, +}; + +/* Runs the command designated by argv[0] within the command table specified by + * 'commands', which must be terminated by a command whose 'name' member is a + * null pointer. 
*/ +static int +bpfctl_run_command(int argc, const char *argv[]) +{ + const struct bpfctl_command *p; + + if (argc < 1) { + bpfctl_error(0, "missing command name; use --help for help"); + return EINVAL; + } + + for (p = all_commands; p->name != NULL; p++) { + if (!strcmp(p->name, argv[0])) { + int n_arg = argc - 1; + if (n_arg < p->min_args) { + bpfctl_error(0, "'%s' command requires at least %d arguments", + p->name, p->min_args); + return EINVAL; + } else if (n_arg > p->max_args) { + bpfctl_error(0, "'%s' command takes at most %d arguments", + p->name, p->max_args); + return EINVAL; + } else { + if (p->mode == DP_RW && read_only) { + bpfctl_error(0, + "'%s' command does not work in read only mode", + p->name); + return EINVAL; + } + return p->handler(argc, argv); + } + } + } + + bpfctl_error(0, "unknown command '%s'; use --help for help", + argv[0]); + return EINVAL; +}
From patchwork Sat Jul 14 11:39:02 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 943926
From: William Tu
To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org
Date: Sat, 14 Jul 2018 04:39:02 -0700
Message-Id: <1531568345-80246-11-git-send-email-u9012063@gmail.com>
In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com>
References: <1531568345-80246-1-git-send-email-u9012063@gmail.com>
Subject: [ovs-dev] [RFC PATCHv2 10/13] tests: Add "make check-bpf" traffic target.

Add a separate test file tests/system-bpf-traffic.at for bpf testing. The test cases are a subset of the existing system-traffic.at, and with additional bpf-specific tests.
When test passes, the log file is saved under: tests/system-bpf-testsuite.dir// Signed-off-by: William Tu Signed-off-by: Yifeng Sun Signed-off-by: Joe Stringer Signed-off-by: Yi-Hung Wei Co-authored-by: Joe Stringer Co-authored-by: Yifeng Sun Co-authored-by: Yi-Hung Wei --- tests/.gitignore | 1 + tests/automake.mk | 31 +- tests/ofproto-macros.at | 7 + tests/system-bpf-macros.at | 112 ++++++ tests/system-bpf-testsuite.at | 25 ++ tests/system-bpf-testsuite.patch | 10 + tests/system-bpf-traffic.at | 851 +++++++++++++++++++++++++++++++++++++++ 7 files changed, 1036 insertions(+), 1 deletion(-) create mode 100644 tests/system-bpf-macros.at create mode 100644 tests/system-bpf-testsuite.at create mode 100644 tests/system-bpf-testsuite.patch create mode 100644 tests/system-bpf-traffic.at diff --git a/tests/.gitignore b/tests/.gitignore index 3e2ddf2e9e5d..98890e011afc 100644 --- a/tests/.gitignore +++ b/tests/.gitignore @@ -11,6 +11,7 @@ /ovs-pki.log /pki/ /system-kmod-testsuite +/system-bpf-testsuite /system-userspace-testsuite /system-offloads-testsuite /test-aes128 diff --git a/tests/automake.mk b/tests/automake.mk index 52ed53fd16d4..732dc4ab9bdc 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -4,15 +4,18 @@ EXTRA_DIST += \ $(SYSTEM_TESTSUITE_AT) \ $(SYSTEM_KMOD_TESTSUITE_AT) \ $(SYSTEM_USERSPACE_TESTSUITE_AT) \ + $(SYSTEM_BPF_TESTSUITE_AT) \ $(SYSTEM_OFFLOADS_TESTSUITE_AT) \ $(TESTSUITE) \ $(SYSTEM_KMOD_TESTSUITE) \ $(SYSTEM_USERSPACE_TESTSUITE) \ + $(SYSTEM_BPF_TESTSUITE) \ $(SYSTEM_OFFLOADS_TESTSUITE) \ tests/atlocal.in \ $(srcdir)/package.m4 \ $(srcdir)/tests/testsuite \ - $(srcdir)/tests/testsuite.patch + $(srcdir)/tests/testsuite.patch \ + $(srcdir)/tests/system-bpf-testsuite.patch COMMON_MACROS_AT = \ tests/ovsdb-macros.at \ @@ -110,6 +113,11 @@ SYSTEM_KMOD_TESTSUITE_AT = \ tests/system-kmod-testsuite.at \ tests/system-kmod-macros.at +SYSTEM_BPF_TESTSUITE_AT = \ + tests/system-bpf-testsuite.at \ + tests/system-bpf-macros.at \ + 
tests/system-bpf-traffic.at + SYSTEM_USERSPACE_TESTSUITE_AT = \ tests/system-userspace-testsuite.at \ tests/system-ovn.at \ @@ -134,6 +142,8 @@ TESTSUITE = $(srcdir)/tests/testsuite TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite +SYSTEM_BPF_TESTSUITE = $(srcdir)/tests/system-bpf-testsuite +BPF_TESTSUITE_PATCH = $(srcdir)/tests/system-bpf-testsuite.patch SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite DISTCLEANFILES += tests/atconfig tests/atlocal @@ -174,6 +184,15 @@ check-lcov: all $(check_DATA) clean-lcov lcov $(LCOV_OPTS) -o tests/lcov/coverage.info genhtml $(GENHTML_OPTS) -o tests/lcov tests/lcov/coverage.info @echo "coverage report generated at tests/lcov/index.html" + +check-bpf-lcov: all $(check_DATA) clean-lcov + find . -name '*.gcda' | xargs -n1 rm -f + -set $(SHELL) '$(SYSTEM_BPF_TESTSUITE)' -C tests AUTOTEST_PATH=$(AUTOTEST_PATH) $(TESTSUITEFLAGS); \ + "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) + $(MKDIR_P) tests/lcov + lcov $(LCOV_OPTS) -o tests/lcov/coverage.info + genhtml $(GENHTML_OPTS) -o tests/lcov tests/lcov/coverage.info + @echo "coverage report generated at tests/lcov/index.html" # valgrind support @@ -254,6 +273,11 @@ check-system-userspace: all set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \ "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) +check-bpf: all + $(MAKE) install + set $(SHELL) '$(SYSTEM_BPF_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \ + "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) + check-offloads: all set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \ "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) @@ -282,6 +306,11 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) 
$(SYSTEM_USERSP $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at $(AM_V_at)mv $@.tmp $@ +$(SYSTEM_BPF_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_BPF_TESTSUITE_AT) $(BPF_TESTSUITE_PATCH) $(COMMON_MACROS_AT) + $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at + $(AM_V_at)mv $@.tmp $@ + $(AM_V_at)patch -p1 $@ tests/system-bpf-testsuite.patch + $(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT) $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at $(AM_V_at)mv $@.tmp $@ diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at index c8bfe5b5c262..487e40cc8ef2 100644 --- a/tests/ofproto-macros.at +++ b/tests/ofproto-macros.at @@ -335,6 +335,7 @@ m4_define([_OVS_VSWITCHD_START], AT_CAPTURE_FILE([ovs-vswitchd.log]) on_exit "kill_ovs_vswitchd `cat ovs-vswitchd.pid`" AT_CHECK([[sed < stderr ' +/bpf|INFO|/d /ovs_numa|INFO|Discovered /d /vlog|INFO|opened log file/d /vswitchd|INFO|ovs-vswitchd (Open vSwitch)/d @@ -344,6 +345,7 @@ m4_define([_OVS_VSWITCHD_START], /ofproto|INFO|datapath ID changed to fedcba9876543210/d /dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable/d /netdev: Flow API/d +/Re-using preloaded BPF datapath/d /tc: Using policy/d']]) ]) @@ -395,6 +397,11 @@ check_logs () { sed -n "$1 /reset by peer/d /Broken pipe/d +/bpf.*|WARN/d +/dpif.*|WARN/d +/bpf.*revalidator.*|ERR/d +/odp_util.*revalidator.*|ERR/d +/ofproto_dpif_upcall.*|WARN/d /timeval.*Unreasonably long [[0-9]]*ms poll interval/d /timeval.*faults: [[0-9]]* minor, [[0-9]]* major/d /timeval.*disk: [[0-9]]* reads, [[0-9]]* writes/d diff --git a/tests/system-bpf-macros.at b/tests/system-bpf-macros.at new file mode 100644 index 000000000000..23c170d73119 --- /dev/null +++ b/tests/system-bpf-macros.at @@ -0,0 +1,112 @@ +# _ADD_BR([name]) +# +# Expands into the proper ovs-vsctl commands to create a bridge with the +# appropriate type and properties +m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 
datapath_type="bpf" protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 fail-mode=secure ]]) + +# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override]) +# +# Creates a database and starts ovsdb-server, starts ovs-vswitchd +# connected to that database, calls ovs-vsctl to create a bridge named +# br0 with predictable settings, passing 'vsctl-args' as additional +# commands to ovs-vsctl. If 'vsctl-args' causes ovs-vsctl to provide +# output (e.g. because it includes "create" commands) then 'vsctl-output' +# specifies the expected output after filtering through uuidfilt.pl. +m4_define([OVS_TRAFFIC_VSWITCHD_START], + [ + export OVS_PKGDATADIR=$(`pwd`) + #OVS_WAIT_WHILE([ip link show ovs-system]) + umount /sys/fs/bpf/ + AT_CHECK([mount -t bpf none /sys/fs/bpf]) + AT_CHECK([mkdir -p /sys/fs/bpf/ovs]) + _OVS_VSWITCHD_START([--disable-system]) + dnl Add bridges, ports, etc. + ip link del br0 + #OVS_WAIT_WHILE([ip link show br0]) + AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| ${PERL} $srcdir/uuidfilt.pl])], [0], [$2]) + on_exit 'ovs-vsctl del-br br0' + on_exit 'ip link del ovs-system' + on_exit 'tail -500 /sys/kernel/debug/tracing/trace > trace' +]) + +# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds]) +# +# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files +# for messages with severity WARN or higher and signaling an error if any +# is present. The optional WHITELIST may contain shell-quoted "sed" +# commands to delete any warnings that are actually expected, e.g.: +# +# OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"]) +# +# 'extra_cmds' are shell commands to be executed after OVS_VSWITCHD_STOP() is +# invoked. They can be used to perform additional cleanups such as name space +# removal. 
+m4_define([OVS_TRAFFIC_VSWITCHD_STOP], + [OVS_VSWITCHD_STOP([$1]) + AT_CHECK([:; $2]) + AT_CHECK([umount /sys/fs/bpf]) + AT_CAPTURE_FILE([trace]) + ]) + +# CONFIGURE_VETH_OFFLOADS([VETH]) +# +# Disable TX offloads for veths. The userspace datapath uses the AF_PACKET +# socket to receive packets for veths. Unfortunately, the AF_PACKET socket +# doesn't play well with offloads: +# 1. GSO packets are received without segmentation and therefore discarded. +# 2. Packets with offloaded partial checksum are received with the wrong +# checksum, therefore discarded by the receiver. +# +# By disabling tx offloads in the non-OVS side of the veth peer we make sure +# that the AF_PACKET socket will not receive bad packets. +# +# This is a workaround, and should be removed when offloads are properly +# supported in netdev-linux. +m4_define([CONFIGURE_VETH_OFFLOADS], + [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])] +) + +# CHECK_CONNTRACK() +# +# Perform requirements checks for running conntrack tests. +# +m4_define([CHECK_CONNTRACK], + [AT_SKIP_IF([test $HAVE_PYTHON = no])] +) + +# CHECK_CONNTRACK_ALG() +# +# Perform requirements checks for running conntrack ALG tests. The userspace +# doesn't support ALGs yet, so skip the tests +# +m4_define([CHECK_CONNTRACK_ALG], +[ + AT_SKIP_IF([:]) +]) + +# CHECK_CONNTRACK_FRAG() +# +# Perform requirements checks for running conntrack fragmentations tests. +# The userspace doesn't support fragmentation yet, so skip the tests. +m4_define([CHECK_CONNTRACK_FRAG], +[ + AT_SKIP_IF([:]) +]) + +# CHECK_CONNTRACK_LOCAL_STACK() +# +# Perform requirements checks for running conntrack tests with local stack. +# While the kernel connection tracker automatically passes all the connection +# tracking state from an internal port to the OpenvSwitch kernel module, there +# is simply no way of doing that with the userspace, so skip the tests. 
+m4_define([CHECK_CONNTRACK_LOCAL_STACK], +[ + AT_SKIP_IF([:]) +]) + +# CHECK_CONNTRACK_NAT() +# +# Perform requirements checks for running conntrack NAT tests. The userspace +# datapath supports NAT. +# +m4_define([CHECK_CONNTRACK_NAT]) diff --git a/tests/system-bpf-testsuite.at b/tests/system-bpf-testsuite.at new file mode 100644 index 000000000000..54ebbcba17dc --- /dev/null +++ b/tests/system-bpf-testsuite.at @@ -0,0 +1,25 @@ +AT_INIT + +AT_COPYRIGHT([Copyright (c) 2015 Nicira, Inc. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at: + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License.]) + +m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS]) + +m4_include([tests/ovs-macros.at]) +m4_include([tests/ovsdb-macros.at]) +m4_include([tests/ofproto-macros.at]) +m4_include([tests/system-bpf-macros.at]) +m4_include([tests/system-common-macros.at]) + +m4_include([tests/system-bpf-traffic.at]) diff --git a/tests/system-bpf-testsuite.patch b/tests/system-bpf-testsuite.patch new file mode 100644 index 000000000000..94f3771d4ee9 --- /dev/null +++ b/tests/system-bpf-testsuite.patch @@ -0,0 +1,10 @@ +--- system-bpf-testsuite 2018-05-31 05:10:16.425135086 -0700 ++++ system-bpf-testsuite 2018-05-31 05:13:46.556051030 -0700 +@@ -2369,7 +2369,6 @@ + else + if test -d "$at_group_dir"; then + find "$at_group_dir" -type d ! 
-perm -700 -exec chmod u+rwx \{\} \; +- rm -fr "$at_group_dir" + fi + rm -f "$at_test_source" + fi diff --git a/tests/system-bpf-traffic.at b/tests/system-bpf-traffic.at new file mode 100644 index 000000000000..29bb80b5d954 --- /dev/null +++ b/tests/system-bpf-traffic.at @@ -0,0 +1,851 @@ +AT_BANNER([BPF datapath-sanity]) + +AT_SETUP([datapath - basic BPF commands]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-appctl dpif/dump-dps], [0], [dnl +bpf@br0 +]) +AT_CHECK([ovs-appctl dpif/show], [0], [dnl +bpf@ovs-bpf: hit:0 missed:0 + br0: + br0 65534/1: (tap) +]) +AT_CHECK([ovs-appctl dpctl/dump-flows bpf@ovs-bpf], [0], [dnl +]) +AT_CHECK([ovs-appctl dpif/dump-flows br0], [0], [dnl +]) +AT_CHECK([ovs-bpfctl show], [0], [stdout]) + +dnl NOTE: BPF datapath does not support megaflow, so the +dnl rules below won't match any packet +AT_CHECK([ovs-appctl dpctl/add-flow bpf@ovs-bpf "in_port(1),eth(),eth_type(0x0806),arp()" 2], [0], [stdout]) + +AT_CHECK([ovs-appctl dpctl/add-flow bpf@ovs-bpf "in_port(1),eth(src=00:01:02:03:04:05,dst=10:11:12:13:14:15),eth_type(0x0800),ipv4(src=35.8.2.41,dst=172.16.0.20,proto=5,tos=0x80,ttl=128,frag=no)" 2], [0], [stdout]) + +AT_CHECK([ovs-appctl dpctl/add-flow bpf@ovs-bpf "in_port(1),eth(src=00:01:02:03:04:05,dst=10:11:12:13:14:15),eth_type(0x86dd),ipv6(src=::1,dst=::2,label=0,proto=6,tclass=0,hlimit=128,frag=no),tcp(src=80,dst=8080)" 2], [0], [stdout]) + +dnl this will print "receive tunnel port not found" and cause failure +dnl AT_CHECK([ovs-appctl dpctl/add-flow bpf@ovs-bpf "skb_priority(0),tunnel(tun_id=0x7f10354,src=10.10.10.10,dst=20.20.20.20,ttl=64,flags(csum|key)),skb_mark(0x1234),recirc_id(0),dp_hash(0),in_port(1),eth(src=00:01:02:03:04:05,dst=10:11:12:13:14:15)" 2], [0], [stdout]) + +dnl AT_CHECK([ovs-appctl dpctl/add-flow bpf@ovs-bpf 
"skb_priority(0x1234),tunnel(tun_id=0xfedcba9876543210,src=10.10.10.10,dst=20.20.20.20,tos=0x8,ttl=64,flags(key)),skb_mark(0),recirc_id(0),dp_hash(0),in_port(1),eth(src=00:01:02:03:04:05,dst=10:11:12:13:14:15),eth_type(0x8100),vlan(vid=99,pcp=7),encap(eth_type(0x86dd),ipv6(src=::1,dst=::2,label=0,proto=58,tclass=0,hlimit=128,frag=no),icmpv6(type=136,code=0),nd(target=::3,sll=00:05:06:07:08:09,tll=00:0a:0b:0c:0d:0e))" 2],[0], [stdout]) + +dnl AT_CHECK([ovs-appctl dpif/del-flows br0], [0], [dnl +dnl ]) + +dnl AT_CHECK([ovs-dpctl add-flow bpf@ovs-bpf "in_port(1),eth(src=00:01:02:03:04:05,dst=10:11:12:13:14:15),eth_type(0x0800),ipv4(src=35.8.2.41,dst=172.16.0.20,proto=5,tos=0x80,ttl=128,frag=no)" 2], [0], [dnl +dnl ]) + + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping between two ports]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - http between two ports]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_START_L7([at_ns1], [http]) +NS_CHECK_EXEC([at_ns0], 
[wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping between two ports on vlan]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") + +ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24") +ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24") + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping between two ports on cvlan]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") + +ADD_SVLAN(p0, at_ns0, 4094, "10.255.2.1/24") +ADD_SVLAN(p1, at_ns1, 4094, "10.255.2.2/24") + +ADD_CVLAN(p0.4094, at_ns0, 100, "10.2.2.1/24") +ADD_CVLAN(p1.4094, at_ns1, 100, "10.2.2.2/24") + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.2.2.2]) + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping6 between two ports]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "fc00::1/96") +ADD_VETH(p1, at_ns1, br0, "fc00::2/96") + +dnl Linux seems to take a little time to get its IPv6 stack in order. 
Without +dnl waiting, we get occasional failures due to the following error: +dnl "connect: Cannot assign requested address" +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2]) + +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping6 between two ports on vlan]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "fc00::1/96") +ADD_VETH(p1, at_ns1, br0, "fc00::2/96") + +ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96") +ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96") + +dnl Linux seems to take a little time to get its IPv6 stack in order. Without +dnl waiting, we get occasional failures due to the following error: +dnl "connect: Cannot assign requested address" +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2]) + +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00:1::2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping6 between two ports on cvlan]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "fc00::1/96") +ADD_VETH(p1, at_ns1, br0, "fc00::2/96") + +ADD_SVLAN(p0, at_ns0, 4094, "fc00:ffff::1/96") +ADD_SVLAN(p1, at_ns1, 4094, "fc00:ffff::2/96") + +ADD_CVLAN(p0.4094, at_ns0, 100, "fc00:1::1/96") +ADD_CVLAN(p1.4094, at_ns1, 100, "fc00:1::2/96") + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2]) + +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +dnl NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 6 fc00:1::2 | FORMAT_PING], [0], [dnl +dnl 3 packets transmitted, 3 received, 
0% packet loss, time 0ms +dnl ]) +dnl NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 6 fc00:1::2 | FORMAT_PING], [0], [dnl +dnl 3 packets transmitted, 3 received, 0% packet loss, time 0ms +dnl ]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over bond]) +AT_SKIP_IF([echo > /dev/null]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_BOND(p1 p2, at_ns1, br0, bond0, lacp=active bond_mode=balance-tcp, "10.1.1.2/24") + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping -c 1 10.1.1.2]) + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over vxlan tunnel]) +OVS_CHECK_VXLAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +ip link del vxlan_sys_4789 +on_exit 'ip link del vxlan_sys_4789' +on_exit 'ip link del br-underlay' + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [10.1.1.1/24], + [id 0 dstport 4789]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over vxlan6 tunnel]) +OVS_CHECK_VXLAN_UDP6ZEROCSUM() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +ip link del vxlan_sys_4789 + +on_exit 'ip link del vxlan_sys_4789' +on_exit 'ip link del br-underlay' + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad") +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24], + [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over gre tunnel]) +OVS_CHECK_GRE() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) +on_exit 'ip link del br-underlay' + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24], [options:key=100]) +ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [key 100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over geneve tunnel]) +OVS_CHECK_GENEVE() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +ip link del genev_sys_6081 +on_exit 'ip link del genev_sys_6081' +on_exit 'ip link del br-underlay' + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24], [options:key=22]) +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24], + [vni 22]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over geneve tunnel with TLV]) +OVS_CHECK_GENEVE() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-tlv-map br0 "{class=0xffff,type=0,len=4}->tun_metadata0"]) +AT_CHECK([ovs-ofctl --protocols=OpenFlow15 add-flow br0 "actions=set_field:0xfaceb001->tun_metadata0, normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +ip link del genev_sys_6081 +on_exit 'ip link del genev_sys_6081' +on_exit 'ip link del br-underlay' + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24], [options:key=22]) +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24], + [vni 22]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + + +AT_SETUP([datapath - ping over geneve6 tunnel]) +OVS_CHECK_GENEVE_UDP6ZEROCSUM() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad") +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [10.1.1.1/24], + [vni 0 udp6zerocsumtx udp6zerocsumrx]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - clone action]) +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24") + +AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \ + -- set interface ovs-p1 ofport_request=2]) + +AT_DATA([flows.txt], [dnl +priority=1 actions=NORMAL +priority=10 in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst), output:2 +priority=10 in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst, controller), output:1 +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log]) +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: 
+icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - mpls actions]) +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH(p1, at_ns1, br1, "10.1.1.2/24") + +AT_CHECK([ip link add patch0 type veth peer name patch1]) +on_exit 'ip link del patch0' + +AT_CHECK([ip link set dev patch0 up]) +AT_CHECK([ip link set dev patch1 up]) +AT_CHECK([ovs-vsctl add-port br0 patch0]) +AT_CHECK([ovs-vsctl add-port br1 patch1]) + +AT_DATA([flows.txt], [dnl +table=0,priority=100,dl_type=0x0800 actions=push_mpls:0x8847,set_mpls_label:3,resubmit(,1) +table=0,priority=100,dl_type=0x8847,mpls_label=3 actions=pop_mpls:0x0800,resubmit(,1) +table=0,priority=10 actions=resubmit(,1) +table=1,priority=10 actions=normal +]) + +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) +AT_CHECK([ovs-ofctl add-flows br1 flows.txt]) + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 6 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 6 10.1.1.1 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP +AT_SETUP([datapath - basic truncate action]) +AT_SKIP_IF([test $HAVE_NC = no]) +OVS_TRAFFIC_VSWITCHD_START() +AT_CHECK([ovs-ofctl del-flows br0]) + +dnl Create p0 and ovs-p0(1) +ADD_NAMESPACES(at_ns0) +ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24") +NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11]) +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22]) + +dnl Create p1(3) and ovs-p1(2), 
packets received from ovs-p1 will appear in p1 +AT_CHECK([ip link add p1 type veth peer name ovs-p1]) +on_exit 'ip link del ovs-p1' +AT_CHECK([ip link set dev ovs-p1 up]) +AT_CHECK([ip link set dev p1 up]) +AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1 ofport_request=2]) +dnl Use p1 to check the truncated packet +AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1 ofport_request=3]) + +dnl Create p2(5) and ovs-p2(4) +AT_CHECK([ip link add p2 type veth peer name ovs-p2]) +on_exit 'ip link del ovs-p2' +AT_CHECK([ip link set dev ovs-p2 up]) +AT_CHECK([ip link set dev p2 up]) +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 ofport_request=4]) +dnl Use p2 to check the truncated packet +AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2 ofport_request=5]) + +dnl basic test +AT_CHECK([ovs-ofctl del-flows br0]) +AT_DATA([flows.txt], [dnl +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4 +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +dnl use this file as payload file for ncat +AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2> /dev/null]) +on_exit 'rm -f payload200.bin' +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin]) + +dnl packet with truncated size +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=100 +]) +dnl packet with original size +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=242 +]) + +dnl more complicated output actions +AT_CHECK([ovs-ofctl del-flows br0]) +AT_DATA([flows.txt], [dnl +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=1 
dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535) +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin]) + +dnl 100 + 100 + 242 + min(65535,242) = 684 +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=684 +]) +dnl 242 + 100 + min(242,200) = 542 +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=542 +]) + +dnl SLOW_ACTION: disable kernel datapath truncate support +dnl Repeat the test above, but exercise the SLOW_ACTION code path +AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0]) + +dnl SLOW_ACTION test1: check datapath actions +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"], [0], [stdout]) +AT_CHECK([tail -3 stdout], [0], +[Datapath actions: trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3 +This flow is handled by the userspace slow path because it: + - Uses action(s) not supported by datapath.
+]) + +dnl SLOW_ACTION test2: check actual packet truncate +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin]) + +dnl 100 + 100 + 242 + min(65535,242) = 684 +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=684 +]) + +dnl 242 + 100 + min(242,200) = 542 +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=542 +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +dnl Create 2 bridges and 2 namespaces to test truncate over +dnl GRE tunnel: +dnl br0: overlay bridge +dnl ns1: connect to br0, with IP:10.1.1.2 +dnl br-underlay: with IP: 172.31.1.100 +dnl ns0: connect to br-underlay, with IP: 10.1.1.1 +AT_SETUP([datapath - truncate and output to gre tunnel]) +AT_SKIP_IF([test $HAVE_NC = no]) +OVS_CHECK_GRE() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_BR([br-underlay]) +ADD_NAMESPACES(at_ns0) +ADD_NAMESPACES(at_ns1) +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. 
+ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24], + [], [address e6:66:c1:11:11:11]) +AT_CHECK([ovs-vsctl -- set interface at_gre0 ofport_request=1]) +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22]) + +dnl Set up (p1 and ovs-p1) at br0 +ADD_VETH(p1, at_ns1, br0, '10.1.1.2/24') +AT_CHECK([ovs-vsctl -- set interface ovs-p1 ofport_request=2]) +NS_CHECK_EXEC([at_ns1], [ip link set dev p1 address e6:66:c1:22:22:22]) +NS_CHECK_EXEC([at_ns1], [arp -s 10.1.1.1 e6:66:c1:11:11:11]) + +dnl Set up (p2 and ovs-p2) as loopback for verifying packet size +AT_CHECK([ip link add p2 type veth peer name ovs-p2]) +on_exit 'ip link del ovs-p2' +AT_CHECK([ip link set dev ovs-p2 up]) +AT_CHECK([ip link set dev p2 up]) +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 ofport_request=3]) +AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2 ofport_request=4]) + +dnl use this file as payload file for ncat +AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2> /dev/null]) +on_exit 'rm -f payload200.bin' + +AT_CHECK([ovs-ofctl del-flows br0]) +AT_DATA([flows.txt], [dnl +priority=99,in_port=1,actions=output(port=2,max_len=100),output(port=3,max_len=100) +priority=99,in_port=2,udp,actions=output(port=1,max_len=100) +priority=1,in_port=4,ip,actions=drop +priority=1,actions=drop +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +AT_CHECK([ovs-ofctl del-flows br-underlay]) +AT_DATA([flows-underlay.txt], [dnl +priority=99,dl_type=0x0800,nw_proto=47,in_port=1,actions=LOCAL +priority=99,dl_type=0x0800,nw_proto=47,in_port=LOCAL,ip_dst=172.31.1.1/24,actions=1 +priority=1,actions=drop +]) + +AT_CHECK([ovs-ofctl add-flows br-underlay flows-underlay.txt]) + +dnl check tunnel push path, from at_ns1 to at_ns0 +NS_CHECK_EXEC([at_ns1], [nc $NC_EOF_OPT -u 10.1.1.1 1234 < payload200.bin]) +AT_CHECK([ovs-appctl revalidator/purge], [0]) + +dnl Before truncation = 
ETH(14) + IP(20) + UDP(8) + 200 = 242B +AT_CHECK([ovs-ofctl dump-flows br0 | grep "in_port=2" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=242 +]) +dnl After truncation = outer ETH(14) + outer IP(20) + GRE(4) + 100 = 138B +AT_CHECK([ovs-ofctl dump-flows br-underlay | grep "in_port=LOCAL" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=138 +]) + +dnl check tunnel pop path, from at_ns0 to at_ns1 +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 5678 < payload200.bin]) +dnl After truncation = 100 bytes at loopback device p2(4) +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 | grep "in_port=4" | ofctl_strip], [0], [dnl + n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop +]) + +dnl SLOW_ACTION: disable datapath truncate support +dnl Repeat the test above, but exercise the SLOW_ACTION code path +AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0]) + +dnl SLOW_ACTION test1: check datapath actions +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +dnl SLOW_ACTION test2: check actual packet truncate +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) +AT_CHECK([ovs-ofctl del-flows br-underlay]) +AT_CHECK([ovs-ofctl add-flows br-underlay flows-underlay.txt]) + +dnl check tunnel push path, from at_ns1 to at_ns0 +NS_CHECK_EXEC([at_ns1], [nc $NC_EOF_OPT -u 10.1.1.1 1234 < payload200.bin]) +AT_CHECK([ovs-appctl revalidator/purge], [0]) + +dnl Before truncation = ETH(14) + IP(20) + UDP(8) + 200 = 242B +AT_CHECK([ovs-ofctl dump-flows br0 | grep "in_port=2" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=242 +]) +dnl After truncation = outer ETH(14) + outer IP(20) + GRE(4) + 100 = 138B +AT_CHECK([ovs-ofctl dump-flows br-underlay | grep "in_port=LOCAL" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=138 +]) + +dnl check tunnel pop path, from at_ns0 to at_ns1 +NS_CHECK_EXEC([at_ns0],
[nc $NC_EOF_OPT -u 10.1.1.2 5678 < payload200.bin]) +dnl After truncation = 100 bytes at loopback device p2(4) +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 | grep "in_port=4" | ofctl_strip], [0], [dnl + n_packets=1, n_bytes=100, priority=1,ip,in_port=4 actions=drop +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + + +dnl simple test case for BPF +AT_SETUP([ovn -- 1 LR connects 2 LSes]) +AT_KEYWORDS([ovnbpf]) + +ovn_start +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-int]) + +# Set external-ids in br-int needed for ovn-controller +# Use vxlan here +ovs-vsctl \ + -- set Open_vSwitch . external-ids:system-id=hv1 \ + -- set Open_vSwitch . external-ids:ovn-remote=unix:$ovs_base/ovn-sb/ovn-sb.sock \ + -- set Open_vSwitch . external-ids:ovn-encap-type=vxlan \ + -- set Open_vSwitch . external-ids:ovn-encap-ip=169.0.0.1 \ + -- set bridge br-int fail-mode=secure other-config:disable-in-band=true + +# Start ovn-controller +start_daemon ovn-controller + +# Logical network: +# 1 LR (R1) and 2 LSes (foo and bar). R1 has switches foo (192.168.1.0/24) +# and bar (192.168.2.0/24) connected to it. +# +# foo ------- R1 ------- bar +# 192.168.1.0/24 192.168.2.0/24 +# + +ovn-nbctl create Logical_Router name=R1 + +ovn-nbctl ls-add foo +ovn-nbctl ls-add bar + +# Connect foo to R1 +ovn-nbctl lrp-add R1 foo 00:00:01:01:02:03 192.168.1.1/24 +ovn-nbctl lsp-add foo rp-foo -- set Logical_Switch_Port rp-foo \ + type=router options:router-port=foo addresses=\"00:00:01:01:02:03\" + +# Connect bar to R1 +ovn-nbctl lrp-add R1 bar 00:00:01:01:02:04 192.168.2.1/24 +ovn-nbctl lsp-add bar rp-bar -- set Logical_Switch_Port rp-bar \ + type=router options:router-port=bar addresses=\"00:00:01:01:02:04\" + +# Logical port 'foo1' in switch 'foo'.
+ADD_NAMESPACES(foo1) +ADD_VETH(foo1, foo1, br-int, "192.168.1.2/24", "f0:00:00:01:02:03", \ + "192.168.1.1") +ovn-nbctl lsp-add foo foo1 \ + -- lsp-set-addresses foo1 "f0:00:00:01:02:03 192.168.1.2" + +ADD_NAMESPACES(bar1) +ADD_VETH(bar1, bar1, br-int, "192.168.2.2/24", "f0:00:00:01:02:05", \ +"192.168.2.1") +ovn-nbctl lsp-add bar bar1 \ + -- lsp-set-addresses bar1 "f0:00:00:01:02:05 192.168.2.2" + +# wait for ovn-controller to catch up. +ovn-nbctl --wait=hv sync + +# 'bar1' should be able to ping 'foo1' directly. +NS_CHECK_EXEC([bar1], [ping -q -c 3 -i 0.3 -w 8 192.168.1.2 | FORMAT_PING], \ +[0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_APP_EXIT_AND_WAIT([ovn-controller]) + +as ovn-sb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as ovn-nb +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as northd +OVS_APP_EXIT_AND_WAIT([ovn-northd]) + +as +OVS_TRAFFIC_VSWITCHD_STOP(["/failed to query port patch-.*/d +/connection dropped.*/d"]) +AT_CLEANUP + From patchwork Sat Jul 14 11:39:03 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Tu X-Patchwork-Id: 943923 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="qN9OouPX"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 
41SSYH6Hn6z9ryt for ; Sat, 14 Jul 2018 21:45:55 +1000 (AEST) From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Sat, 14 Jul 2018 04:39:03 -0700 Message-Id: <1531568345-80246-12-git-send-email-u9012063@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1531568345-80246-1-git-send-email-u9012063@gmail.com> References: <1531568345-80246-1-git-send-email-u9012063@gmail.com> Subject: [ovs-dev] [RFC PATCHv2 11/13] vagrant: add ebpf support using ubuntu/bionic VAGRANT_VAGRANTFILE=Vagrantfile-eBPF vagrant up Signed-off-by: William Tu Signed-off-by: Yifeng Sun --- Makefile.am | 1 + Vagrantfile-eBPF | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 100 insertions(+) create mode 100644 Vagrantfile-eBPF diff --git a/Makefile.am b/Makefile.am index ec1fc53b1060..d26c765a285a 100644 --- a/Makefile.am +++ b/Makefile.am @@ -86,6 +86,7 @@ EXTRA_DIST = \ $(MAN_ROOTS) \ Vagrantfile \
Vagrantfile-FreeBSD \ + Vagrantfile-eBPF \ .mailmap bin_PROGRAMS = sbin_PROGRAMS = diff --git a/Vagrantfile-eBPF b/Vagrantfile-eBPF new file mode 100644 index 000000000000..7b9be32b8f03 --- /dev/null +++ b/Vagrantfile-eBPF @@ -0,0 +1,99 @@ +# -*- mode: ruby -*- +# vi: set ft=ruby : + +$bootstrap = <