From patchwork Wed Nov 28 21:22:20 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Tu X-Patchwork-Id: 1004842 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="u0j1O8LD"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 434twC2m2fz9ryk for ; Thu, 29 Nov 2018 08:24:59 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 2D3A2CFE; Wed, 28 Nov 2018 21:23:19 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id ED341C7C for ; Wed, 28 Nov 2018 21:23:15 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-pl1-f195.google.com (mail-pl1-f195.google.com [209.85.214.195]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id ACF857A4 for ; Wed, 28 Nov 2018 21:23:13 +0000 (UTC) Received: by mail-pl1-f195.google.com with SMTP id gn14so18011373plb.10 for ; Wed, 28 Nov 2018 13:23:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:in-reply-to:references; bh=H552+uOu/yVoGZdqVllRqeLV6WfnK/vBQ0iz5w1d4JE=; b=u0j1O8LDnNikjAUBD02nLLD9lffx/wxr/Hjg13PXR7b87n37Q70RsIboBKy/fMj32b cDy3uLTGGlSizaxt8la6HZLDIDbJr/kOiDyUB4YtNUbYE5TQWCPe5Godg3YGQRLS1iMe fREi2xjvsiTVIm8RsqYZo2A8o+09IHaG+zx0GgoR2U07lNFbmKslOSn7R7uj8BdZ+Bpv /xS6GCb+cnegpQU7zx5715tpEQy9BvtcwcG2RoHapz2LXtmt65kqWWi8k+RJ7Y9/CNC8 +IrEe1Xu/zGyUbeKyu8PfV3jb7buVAaeLvUcVzFxUnE7GUdIIaJcezyWwQ1OhHnApJNv YxhA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=H552+uOu/yVoGZdqVllRqeLV6WfnK/vBQ0iz5w1d4JE=; b=oCFi9ylJjW1DDPGgBAKUE0FANe4zJlhxorFUW+p7J+BOqXpOUUTc8oxUaZ7rffYMdV 8gTV/+qg4LFNYBs85sQqG11RdJJMyPxg6XV+T+/WXhDsFUEMJiZ4ZgB6I0oViz89HLE8 zCZoxDETsysAfyPHgHwcbi4mQ/Zaq10Ey1dOeL2Huc8BEgtfQ6DEPnqkY9YBOAiDrKES IpVqRe3n0nTV/+UVu0QzsOBVhRO8N6Ivji9Vn8/6i2wD+4M853Ei8gjjvzzrOiayuzNH rkqvo4K0uSE4Ez1lgkvX6QRp5A7edJkgIgvw2z3Y2OkqUF681AZveY0IFhOueEife6X/ EKlg== X-Gm-Message-State: AA+aEWYqkYKsuaXw58eM8191tCAx0+1aGLHtu4djEtlQzvo62jkpikzv 4Y/lRCLoI4D8H6oM3em+zuq1rbVK X-Google-Smtp-Source: AFSGD/UHM47Zy8G6ggybFJP5AgXYAkJKmPNkOI2HGrPYm4oeNo4r17ppkxH6jhAum2yh/yx8/nUylQ== X-Received: by 2002:a17:902:4c08:: with SMTP id a8mr39040492ple.74.1543440192396; Wed, 28 Nov 2018 13:23:12 -0800 (PST) Received: from sc9-mailhost2.vmware.com ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id t5sm10899178pfb.60.2018.11.28.13.23.11 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 28 Nov 2018 13:23:11 -0800 (PST) From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Wed, 28 Nov 2018 13:22:20 -0800 Message-Id: <1543440142-27253-2-git-send-email-u9012063@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1543440142-27253-1-git-send-email-u9012063@gmail.com> References: <1543440142-27253-1-git-send-email-u9012063@gmail.com> X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [PATCHv3 RFC 1/3] netdev-afxdp: add new netdev type for AF_XDP X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org The patch creates a new netdev type called "afxdp" and re-uses some of the AF_XDP API implementation from xdpsock_user.c at linux sample code. By default, it binds a device's queue 0 and uses the generic XDP support to send and receive packets. Signed-off-by: William Tu --- acinclude.m4 | 13 + configure.ac | 1 + lib/automake.mk | 6 +- lib/dp-packet.c | 20 ++ lib/dp-packet.h | 29 ++- lib/netdev-afxdp.c | 703 ++++++++++++++++++++++++++++++++++++++++++++++++++ lib/netdev-afxdp.h | 41 +++ lib/netdev-linux.c | 72 +++++- lib/netdev-provider.h | 1 + lib/netdev.c | 1 + lib/xdpsock.c | 171 ++++++++++++ lib/xdpsock.h | 144 +++++++++++ 12 files changed, 1197 insertions(+), 5 deletions(-) create mode 100644 lib/netdev-afxdp.c create mode 100644 lib/netdev-afxdp.h create mode 100644 lib/xdpsock.c create mode 100644 lib/xdpsock.h diff --git a/acinclude.m4 b/acinclude.m4 index ed83df43df54..d89d9b7d1295 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -205,6 +205,19 @@ AC_DEFUN([OVS_CHECK_LINUX_TC], [ [Define to 1 if TCA_PEDIT_KEY_EX_HDR_TYPE_UDP is available.])]) ]) +dnl OVS_CHECK_LINUX_AF_XDP +dnl +dnl Configure Linux AF_XDP compat. +AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], + [AC_CHECK_HEADER([linux/if_xdp.h], + [HAVE_AF_XDP=yes], + [HAVE_AF_XDP=no]) + AM_CONDITIONAL([HAVE_AF_XDP], [test "$HAVE_AF_XDP" = yes]) + if test "$HAVE_AF_XDP" = yes; then + AC_DEFINE([HAVE_AF_XDP], [1], + [Define to 1 if linux/if_xdp.h is available.]) + fi]) + dnl OVS_CHECK_DPDK dnl dnl Configure DPDK source tree diff --git a/configure.ac b/configure.ac index 3e97a750c812..0c86dae192df 100644 --- a/configure.ac +++ b/configure.ac @@ -136,6 +136,7 @@ OVS_LIBTOOL_VERSIONS OVS_CHECK_CXX AX_FUNC_POSIX_MEMALIGN OVS_CHECK_UNBOUND +OVS_CHECK_LINUX_AF_XDP OVS_CHECK_INCLUDE_NEXT([stdio.h string.h]) AC_CONFIG_FILES([ diff --git a/lib/automake.mk b/lib/automake.mk index 63e9d72ac18a..3516c0784136 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -323,7 +323,11 @@ lib_libopenvswitch_la_SOURCES = \ lib/lldp/lldpd.c \ lib/lldp/lldpd.h \ lib/lldp/lldpd-structs.c \ - lib/lldp/lldpd-structs.h + lib/lldp/lldpd-structs.h \ + lib/xdpsock.c \ + lib/xdpsock.h \ + lib/netdev-afxdp.c \ + lib/netdev-afxdp.h if WIN32 lib_libopenvswitch_la_SOURCES += \ diff --git a/lib/dp-packet.c b/lib/dp-packet.c index 93b0e9c84793..b208922945a4 100644 --- a/lib/dp-packet.c +++ b/lib/dp-packet.c @@ -121,6 +121,13 @@ dp_packet_uninit(struct dp_packet *b) * created as a dp_packet */ free_dpdk_buf((struct dp_packet*) b); #endif + } else if (b->source == DPBUF_AFXDP) { + struct dp_packet_afxdp *xpacket; + + xpacket = dp_packet_cast_afxdp(b); + if (xpacket->mpool) + umem_elem_push(xpacket->mpool, dp_packet_base(b)); + return; } } } @@ -249,6 +256,18 @@ dp_packet_resize__(struct dp_packet *b, size_t new_headroom, size_t new_tailroom case DPBUF_STACK: OVS_NOT_REACHED(); + case DPBUF_AFXDP: + if (new_headroom == dp_packet_headroom(b)) { + new_base = xmalloc(new_allocated); + } else { + new_base = xmalloc(new_allocated); + dp_packet_copy__(b, new_base, new_headroom, new_tailroom); + free(dp_packet_base(b)); + } + b->source = DPBUF_MALLOC; + // put back to freelist + OVS_NOT_REACHED(); + break; case DPBUF_STUB: b->source = DPBUF_MALLOC; new_base = xmalloc(new_allocated); @@ -434,6 +453,7 @@ dp_packet_steal_data(struct dp_packet *b) { void *p; ovs_assert(b->source != DPBUF_DPDK); + ovs_assert(b->source != DPBUF_AFXDP); if (b->source == DPBUF_MALLOC && dp_packet_data(b) == dp_packet_base(b)) { p = dp_packet_data(b); diff --git a/lib/dp-packet.h b/lib/dp-packet.h index 7b85dd902cce..c115c62f4c37 100644 --- a/lib/dp-packet.h +++ b/lib/dp-packet.h @@ -30,6 +30,7 @@ #include "packets.h" #include "util.h" #include "flow.h" +#include "xdpsock.h" #ifdef __cplusplus extern "C" { @@ -42,10 +43,10 @@ enum OVS_PACKED_ENUM dp_packet_source { DPBUF_DPDK, /* buffer data is from DPDK allocated memory. * ref to dp_packet_init_dpdk() in dp-packet.c. */ + DPBUF_AFXDP, }; #define DP_PACKET_CONTEXT_SIZE 64 - /* Buffer for holding packet data. A dp_packet is automatically reallocated * as necessary if it grows too large for the available memory. * By default the packet type is set to Ethernet (PT_ETH). @@ -80,6 +81,17 @@ struct dp_packet { }; }; +struct dp_packet_afxdp { + struct umem_pool *mpool; + struct dp_packet packet; +}; + +static struct dp_packet_afxdp *dp_packet_cast_afxdp(const struct dp_packet *d) +{ + ovs_assert(d->source == DPBUF_AFXDP); + return CONTAINER_OF(d, struct dp_packet_afxdp, packet); +} + static inline void *dp_packet_data(const struct dp_packet *); static inline void dp_packet_set_data(struct dp_packet *, void *); static inline void *dp_packet_base(const struct dp_packet *); @@ -174,7 +186,20 @@ dp_packet_delete(struct dp_packet *b) free_dpdk_buf((struct dp_packet*) b); return; } - + if (b->source == DPBUF_AFXDP) { + struct dp_packet_afxdp *xpacket; + + /* if a packet is received from afxdp port, + * and tx to a system port. Then we need to + * push the rx umem back here + */ + xpacket = dp_packet_cast_afxdp(b); + if (xpacket->mpool) + umem_elem_push(xpacket->mpool, dp_packet_base(b)); + + //free(xpacket); + return; + } dp_packet_uninit(b); free(b); } diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c new file mode 100644 index 000000000000..1d33cdcb8931 --- /dev/null +++ b/lib/netdev-afxdp.c @@ -0,0 +1,703 @@ +/* + * Copyright (c) 2018 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include + +#ifndef HAVE_AF_XDP +#else +#include "netdev-linux.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "coverage.h" +#include "dp-packet.h" +#include "dpif-netlink.h" +#include "dpif-netdev.h" +#include "openvswitch/dynamic-string.h" +#include "fatal-signal.h" +#include "hash.h" +#include "openvswitch/hmap.h" +#include "netdev-provider.h" +#include "netdev-tc-offloads.h" +#include "netdev-vport.h" +#include "netlink-notifier.h" +#include "netlink-socket.h" +#include "netlink.h" +#include "netnsid.h" +#include "openvswitch/ofpbuf.h" +#include "openflow/openflow.h" +#include "ovs-atomic.h" +#include "packets.h" +#include "openvswitch/poll-loop.h" +#include "rtnetlink.h" +#include "openvswitch/shash.h" +#include "socket-util.h" +#include "sset.h" +#include "tc.h" +#include "timer.h" +#include "unaligned.h" +#include "openvswitch/vlog.h" +#include "util.h" +#include "lib/xdpsock.h" +#include "netdev-afxdp.h" + +VLOG_DEFINE_THIS_MODULE(netdev_afxdp); + +#ifndef SOL_XDP +#define SOL_XDP 283 +#endif +#ifndef AF_XDP +#define AF_XDP 44 +#endif +#ifndef PF_XDP +#define PF_XDP AF_XDP +#endif + +#define barrier() __asm__ __volatile__("": : :"memory") +#define u_smp_rmb() barrier() +#define u_smp_wmb() barrier() + +#define UMEM2DESC(elem, base) ((uint64_t)((char *)elem - (char *)base)) +#define UMEM2XPKT(base, i) \ + (struct dp_packet_afxdp *)((char *)base + i * sizeof(struct dp_packet_afxdp)) + +#define AFXDP_MODE XDP_FLAGS_SKB_MODE /* DRV_MODE or SKB_MODE */ +static uint32_t opt_xdp_flags; +static uint32_t opt_xdp_bind_flags; +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + +static inline uint32_t xq_nb_avail(struct xdp_uqueue *q, uint32_t ndescs) +{ + uint32_t entries = q->cached_prod - q->cached_cons; + + if (entries == 0) { + q->cached_prod = *q->producer; + entries = q->cached_prod - q->cached_cons; + } + + return (entries > ndescs) ? ndescs : entries; +} + +static inline uint32_t umem_nb_free(struct xdp_umem_uqueue *q, uint32_t nb) +{ + uint32_t free_entries = q->cached_cons - q->cached_prod; + + if (free_entries >= nb) + return free_entries; + + q->cached_cons = (*q->consumer + q->size) & q->mask; + + return q->cached_cons - q->cached_prod; +} + +static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq, + struct xdp_desc *d, + size_t nb) +{ + uint32_t i; + + if (umem_nb_free(fq, nb) < nb) { + VLOG_ERR("%s error\n", __func__); + return -ENOSPC; + } + + for (i = 0; i < nb; i++) { + uint32_t idx = fq->cached_prod++ & fq->mask; + + fq->ring[idx] = d[i].addr; + } + + u_smp_wmb(); + + *fq->producer = fq->cached_prod; + + return 0; +} + +static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, uint64_t *d, + size_t nb) +{ + uint32_t i; + + if (umem_nb_free(fq, nb) < nb) { + VLOG_ERR("%s Not enough free blocks\n", __func__); + return -ENOSPC; + } + + for (i = 0; i < nb; i++) { + uint32_t idx = fq->cached_prod++ & fq->mask; + + fq->ring[idx] = d[i]; + } + + u_smp_wmb(); + + *fq->producer = fq->cached_prod; + + return 0; +} + +static inline uint32_t umem_nb_avail(struct xdp_umem_uqueue *q, uint32_t nb) +{ + uint32_t entries = q->cached_prod - q->cached_cons; + + if (entries == 0) { + q->cached_prod = *q->producer; + entries = q->cached_prod - q->cached_cons; + } + + return (entries > nb) ? nb : entries; +} + +static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq, + uint64_t *d, size_t nb) +{ + uint32_t idx, i, entries = umem_nb_avail(cq, nb); + + u_smp_rmb(); + + for (i = 0; i < entries; i++) { + idx = cq->cached_cons++ & cq->mask; + d[i] = cq->ring[idx]; + } + + if (entries > 0) { + u_smp_wmb(); + + *cq->consumer = cq->cached_cons; + } + + return entries; +} + +static struct xdp_umem *xdp_umem_configure(int sfd) +{ + int fq_size = FQ_NUM_DESCS, cq_size = CQ_NUM_DESCS; + struct xdp_mmap_offsets off; + struct xdp_umem_reg mr; + struct xdp_umem *umem; + socklen_t optlen; + void *bufs; + int i; + + umem = xcalloc(1, sizeof(*umem)); + + ovs_assert(posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */ + NUM_FRAMES * FRAME_SIZE) == 0); + + VLOG_DBG("%s shared umem from %p to %p", __func__, + bufs, (char*)bufs + NUM_FRAMES * FRAME_SIZE); + + mr.addr = (uint64_t)bufs; + mr.len = NUM_FRAMES * FRAME_SIZE; + mr.chunk_size = FRAME_SIZE; + mr.headroom = FRAME_HEADROOM; + + ovs_assert(setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) == 0); + ovs_assert(setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size, + sizeof(int)) == 0); + ovs_assert(setsockopt(sfd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &cq_size, + sizeof(int)) == 0); + + optlen = sizeof(off); + ovs_assert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off, + &optlen) == 0); + + umem->fq.map = mmap(0, off.fr.desc + + FQ_NUM_DESCS * sizeof(uint64_t), + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, sfd, + XDP_UMEM_PGOFF_FILL_RING); + ovs_assert(umem->fq.map != MAP_FAILED); + + umem->fq.mask = FQ_NUM_DESCS - 1; + umem->fq.size = FQ_NUM_DESCS; + umem->fq.producer = (void *)((char *)umem->fq.map + off.fr.producer); + umem->fq.consumer = (void *)((char *)umem->fq.map + off.fr.consumer); + umem->fq.ring = (void *)((char *)umem->fq.map + off.fr.desc); + umem->fq.cached_cons = FQ_NUM_DESCS; + + umem->cq.map = mmap(0, off.cr.desc + + CQ_NUM_DESCS * sizeof(uint64_t), + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, sfd, + XDP_UMEM_PGOFF_COMPLETION_RING); + ovs_assert(umem->cq.map != MAP_FAILED); + + umem->cq.mask = CQ_NUM_DESCS - 1; + umem->cq.size = CQ_NUM_DESCS; + umem->cq.producer = (void *)((char *)umem->cq.map + off.cr.producer); + umem->cq.consumer = (void *)((char *)umem->cq.map + off.cr.consumer); + umem->cq.ring = (void *)((char *)umem->cq.map + off.cr.desc); + + umem->frames = bufs; + umem->fd = sfd; + + /* UMEM pool init */ + umem_pool_init(&umem->mpool, NUM_FRAMES); + + for (i = NUM_FRAMES - 1; i >= 0; i--) { + struct umem_elem *elem; + + elem = (struct umem_elem *)((char *)umem->frames + i * FRAME_SIZE); + umem_elem_push(&umem->mpool, elem); + } + + /* AF_XDP metadata init */ + xpacket_pool_init(&umem->xpool, NUM_FRAMES); + + VLOG_DBG("%s xpacket pool from %p to %p", __func__, + umem->xpool.array, + (char *)umem->xpool.array + + NUM_FRAMES * sizeof(struct dp_packet_afxdp)); + + for (i = NUM_FRAMES - 1; i >= 0; i--) { + struct dp_packet_afxdp *xpacket; + struct dp_packet *packet; + char *base; + + xpacket = UMEM2XPKT(umem->xpool.array, i); + xpacket->mpool = &umem->mpool; + + packet = &xpacket->packet; + packet->source = DPBUF_AFXDP; + + base = (char *)umem->frames + i * FRAME_SIZE; + dp_packet_use(packet, base, FRAME_SIZE); + packet->source = DPBUF_AFXDP; + } + return umem; +} + +void +xsk_destroy(struct xdpsock *xsk) +{ +#ifdef AFXDP_HUGETLB + munmap(xsk->umem->frames, NUM_FRAMES * FRAME_SIZE); +#else + free(xsk->umem->frames); +#endif + + /* cleanup umem pool */ + umem_pool_cleanup(&xsk->umem->mpool); + + /* cleanup metadata */ + xpacket_pool_cleanup(&xsk->umem->xpool); + + close(xsk->sfd); + return; +} + +struct xdpsock * +xsk_configure(struct xdp_umem *umem, + int ifindex, int xdp_queue_id) +{ + struct sockaddr_xdp sxdp = {}; + struct xdp_mmap_offsets off; + int sfd, ndescs = NUM_DESCS; + struct xdpsock *xsk; + bool shared = false; + socklen_t optlen; + uint64_t i; + + opt_xdp_flags |= AFXDP_MODE; + opt_xdp_bind_flags |= XDP_COPY; + opt_xdp_bind_flags |= XDP_ATTACH; + + sfd = socket(PF_XDP, SOCK_RAW, 0); + ovs_assert(sfd >= 0); + + xsk = calloc(1, sizeof(*xsk)); + ovs_assert(xsk); + + xsk->sfd = sfd; + xsk->outstanding_tx = 0; + + VLOG_DBG("%s xsk fd %d", __func__, sfd); + if (!umem) { + shared = false; + xsk->umem = xdp_umem_configure(sfd); + } else { + xsk->umem = umem; + ovs_assert(0); + } + + ovs_assert(setsockopt(sfd, SOL_XDP, XDP_RX_RING, + &ndescs, sizeof(int)) == 0); + ovs_assert(setsockopt(sfd, SOL_XDP, XDP_TX_RING, + &ndescs, sizeof(int)) == 0); + optlen = sizeof(off); + ovs_assert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off, + &optlen) == 0); + + /* Confiugre RX ring */ + xsk->rx.map = mmap(NULL, + off.rx.desc + + NUM_DESCS * sizeof(struct xdp_desc), + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, sfd, + XDP_PGOFF_RX_RING); + ovs_assert(xsk->rx.map != MAP_FAILED); + + /* Populate the FILL ring */ + for (i = 0; i < NUM_DESCS; i++) { + struct umem_elem *elem; + uint64_t desc[1]; + + elem = umem_elem_pop(&xsk->umem->mpool); + desc[0] = UMEM2DESC(elem, xsk->umem->frames); + umem_fill_to_kernel(&xsk->umem->fq, desc, 1); + } + + /* Configure Tx ring */ + xsk->tx.map = mmap(NULL, + off.tx.desc + + NUM_DESCS * sizeof(struct xdp_desc), + PROT_READ | PROT_WRITE, + MAP_SHARED | MAP_POPULATE, sfd, + XDP_PGOFF_TX_RING); + ovs_assert(xsk->tx.map != MAP_FAILED); + + xsk->rx.mask = NUM_DESCS - 1; + xsk->rx.size = NUM_DESCS; + xsk->rx.producer = (void *)((char *)xsk->rx.map + off.rx.producer); + xsk->rx.consumer = (void *)((char *)xsk->rx.map + off.rx.consumer); + xsk->rx.ring = (void *)((char *)xsk->rx.map + off.rx.desc); + + xsk->tx.mask = NUM_DESCS - 1; + xsk->tx.size = NUM_DESCS; + xsk->tx.producer = (void *)((char *)xsk->tx.map + off.tx.producer); + xsk->tx.consumer = (void *)((char *)xsk->tx.map + off.tx.consumer); + xsk->tx.ring = (void *)((char *)xsk->tx.map + off.tx.desc); + xsk->tx.cached_cons = NUM_DESCS; + + /* XSK socket */ + sxdp.sxdp_family = PF_XDP; + sxdp.sxdp_ifindex = ifindex; + sxdp.sxdp_queue_id = xdp_queue_id; + + if (shared) { + sxdp.sxdp_flags = XDP_SHARED_UMEM; + sxdp.sxdp_shared_umem_fd = umem->fd; + } else { + sxdp.sxdp_flags = opt_xdp_bind_flags; + } + + if (bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp))) { + VLOG_FATAL("afxdp bind failed (%s)", ovs_strerror(errno)); + } + + return xsk; +} + +static inline int xq_deq(struct xdp_uqueue *uq, + struct xdp_desc *descs, + int ndescs) +{ + struct xdp_desc *r = uq->ring; + unsigned int idx; + int i, entries; + + entries = xq_nb_avail(uq, ndescs); + + u_smp_rmb(); + + for (i = 0; i < entries; i++) { + idx = uq->cached_cons++ & uq->mask; + descs[i] = r[idx]; + } + + if (entries > 0) { + u_smp_wmb(); + + *uq->consumer = uq->cached_cons; + } + return entries; +} + +static inline void *xq_get_data(struct xdpsock *xsk, uint64_t addr) +{ + return &xsk->umem->frames[addr]; +} + +static void OVS_UNUSED vlog_hex_dump(const void *buf, size_t count) +{ + struct ds ds = DS_EMPTY_INITIALIZER; + ds_put_hex_dump(&ds, buf, count, 0, false); + VLOG_DBG_RL(&rl, "%s", ds_cstr(&ds)); + ds_destroy(&ds); +} + +static void kick_tx(int fd) +{ + int ret; + +#if AF_XDP_POLL + struct pollfd fds[1]; + int timeout; + fds[0].fd = fd; + fds[0].events = POLLOUT; + timeout = 1000; /* 1ns */ + + /* this is slower due to syscall */ + ret = poll(fds, 1, timeout); + if (ret < 0) + return; +#endif + ret = sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0); + if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN || errno == EBUSY) { + return; + } else { + VLOG_WARN_RL(&rl, "sendto fails %s", ovs_strerror(errno)); + } +} + +static inline uint32_t +xq_nb_free(struct xdp_uqueue *q, uint32_t ndescs) +{ + uint32_t free_entries = q->cached_cons - q->cached_prod; + + if (free_entries >= ndescs) + return free_entries; + + /* Refresh the local tail pointer */ + q->cached_cons = *q->consumer + q->size; + return q->cached_cons - q->cached_prod; +} + +static inline int xq_enq(struct xdp_uqueue *uq, + const struct xdp_desc *descs, + unsigned int ndescs) +{ + struct xdp_desc *r = uq->ring; + unsigned int i; + + if (xq_nb_free(uq, ndescs) < ndescs) + return -ENOSPC; + + for (i = 0; i < ndescs; i++) { + uint32_t idx = uq->cached_prod++ & uq->mask; + + r[idx].addr = descs[i].addr; + r[idx].len = descs[i].len; + } + + u_smp_wmb(); + + *uq->producer = uq->cached_prod; + return 0; +} + +static inline void +print_xsk_stat(struct xdpsock *xsk OVS_UNUSED) { + struct xdp_statistics stat; + socklen_t optlen; + + optlen = sizeof(stat); + ovs_assert(getsockopt(xsk->sfd, SOL_XDP, XDP_STATISTICS, + &stat, &optlen) == 0); + + VLOG_DBG_RL(&rl, "rx dropped %llu, rx_invalid %llu, tx_invalid %llu", + stat.rx_dropped, stat.rx_invalid_descs, stat.tx_invalid_descs); + return; +} + +/* Receive packet from AF_XDP socket */ +int +netdev_linux_rxq_xsk(struct xdpsock *xsk, + struct dp_packet_batch *batch) +{ + struct xdp_desc descs[NETDEV_MAX_BURST]; + unsigned int rcvd, i = 0, non_afxdp = 0; + int ret = 0; + + rcvd = xq_deq(&xsk->rx, descs, NETDEV_MAX_BURST); + if (rcvd == 0) { + /* no packet on the RX ring */ + return 0; + } + + for (i = 0; i < rcvd; i++) { + struct dp_packet_afxdp *xpacket; + struct dp_packet *packet; + void *base; + int index; + + base = xq_get_data(xsk, descs[i].addr); + index = (descs[i].addr - FRAME_HEADROOM) / FRAME_SIZE; + xpacket = UMEM2XPKT(xsk->umem->xpool.array, index); + + VLOG_DBG_RL(&rl, "rcvd %d base %p xpacket %p index %d", + rcvd, base, xpacket, index); + vlog_hex_dump(base, 14); + + packet = &xpacket->packet; + xpacket->mpool = &xsk->umem->mpool; + + if (packet->source != DPBUF_AFXDP) { + non_afxdp++; /* FIXME: might be a bug */ + continue; + } + + packet->source = DPBUF_AFXDP; + dp_packet_set_data(packet, base); + dp_packet_set_size(packet, descs[i].len); + + /* add packet into batch, increase batch->count */ + dp_packet_batch_add(batch, packet); + } + rcvd -= non_afxdp; + xsk->rx_npkts += rcvd; + + for (i = 0; i < rcvd; i++) { + struct xdp_desc fill_desc[1]; + struct umem_elem *elem; + int retry_cnt = 0; +retry: + elem = umem_elem_pop(&xsk->umem->mpool); + if (!elem && retry_cnt < 10) { + retry_cnt++; + VLOG_WARN_RL(&rl, "retry refilling the fill queue"); + xsleep(1); + goto retry; + } + descs[0].addr = (uint64_t)((char *)elem - xsk->umem->frames); + umem_fill_to_kernel_ex(&xsk->umem->fq, fill_desc, 1); + } + +#ifdef AFXDP_DEBUG + print_xsk_stat(xsk); +#endif + return ret; +} + +int +netdev_linux_afxdp_batch_send(struct xdpsock *xsk, /* send to xdp socket! */ + struct dp_packet_batch *batch) +{ + struct dp_packet *packet; + struct xdp_uqueue *uq; + struct xdp_desc *r; + int ndescs = batch->count; + uint64_t descs[BATCH_SIZE]; + unsigned int tx_done = 0, total_tx = 0; + int j; + + uq = &xsk->tx; + r = uq->ring; + + if (xq_nb_free(uq, ndescs) < ndescs) { + VLOG_WARN_RL(&rl, "no free desc, outstanding tx %d, free tx nb %d", + xsk->outstanding_tx, xq_nb_free(uq, ndescs)); + return -EAGAIN; + } + + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { + struct umem_elem *elem; + struct dp_packet_afxdp *xpacket; + + uint32_t idx = uq->cached_prod++ & uq->mask; +#ifdef AFXDP_AOID_TXCOPY + if (packet->source == DPBUF_AFXDP) { + xpacket = dp_packet_cast_afxdp(packet); + + if (xpacket->mpool == &xsk->umem->mpool) { + r[idx].addr = (uint64_t)((char *)dp_packet_base(packet) - xsk->umem->frames); + r[idx].len = dp_packet_size(packet); + xpacket->mpool = NULL; + continue; + } + } +#endif + elem = umem_elem_pop(&xsk->umem->mpool); + if (!elem) { + VLOG_ERR_RL(&rl, "no available elem!"); + return -EAGAIN; + } + + memcpy(elem, dp_packet_data(packet), dp_packet_size(packet)); + vlog_hex_dump(dp_packet_data(packet), 14); + + r[idx].addr = (uint64_t)((char *)elem - xsk->umem->frames); + r[idx].len = dp_packet_size(packet); + + if (packet->source == DPBUF_AFXDP) { + xpacket = dp_packet_cast_afxdp(packet); + umem_elem_push(xpacket->mpool, dp_packet_base(packet)); + /* Avoid freeing it twice at dp_packet_uninit */ + xpacket->mpool = NULL; + } + } + u_smp_wmb(); + + *uq->producer = uq->cached_prod; + xsk->outstanding_tx += batch->count; + +retry: + kick_tx(xsk->sfd); + + tx_done = umem_complete_from_kernel(&xsk->umem->cq, descs, BATCH_SIZE); + if (tx_done > 0) { + xsk->outstanding_tx -= tx_done; + xsk->tx_npkts += tx_done; + total_tx += tx_done; + VLOG_DBG_RL(&rl, "%s complete %d tx", __func__, tx_done); + } + + /* Recycle back to the umem pool */ + for (j = 0; j < tx_done; j++) { + struct umem_elem *elem; + + elem = (struct umem_elem *)(descs[j] + xsk->umem->frames); + umem_elem_push(&xsk->umem->mpool, elem); + } + + if (total_tx < batch->count && xsk->outstanding_tx > (CQ_NUM_DESCS/2)) { + goto retry; + } +#ifdef ADXDP_DEBUG + print_xsk_stat(xsk); +#endif + return 0; +} + +#endif diff --git a/lib/netdev-afxdp.h b/lib/netdev-afxdp.h new file mode 100644 index 000000000000..1febb6ecfbfb --- /dev/null +++ b/lib/netdev-afxdp.h @@ -0,0 +1,41 @@ +/* + * Copyright (c) 2018 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef NETDEV_AFXDP_H +#define NETDEV_AFXDP_H 1 + +#include +#include + +/* These functions are Linux AF_XDP specific, so they should be used directly + * only by Linux-specific code. */ + +struct netdev; +struct xdpsock; +struct xdp_umem; +struct dp_packet_batch; + +struct xdpsock *xsk_configure(struct xdp_umem *umem, + int ifindex, int xdp_queue_id); +void xsk_destroy(struct xdpsock *xsk); + +int netdev_linux_rxq_xsk(struct xdpsock *xsk, + struct dp_packet_batch *batch); + +int netdev_linux_afxdp_batch_send(struct xdpsock *xsk, + struct dp_packet_batch *batch); + +#endif /* netdev-afxdp.h */ diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index f86dcd06e563..a8a06abe967b 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -74,6 +74,7 @@ #include "unaligned.h" #include "openvswitch/vlog.h" #include "util.h" +#include "netdev-afxdp.h" VLOG_DEFINE_THIS_MODULE(netdev_linux); @@ -523,6 +524,7 @@ struct netdev_linux { /* LAG information. */ bool is_lag_master; /* True if the netdev is a LAG master. */ + struct xdpsock *xsk[1]; /* af_xdp socket: use only one queue */ }; struct netdev_rxq_linux { @@ -572,6 +574,12 @@ is_netdev_linux_class(const struct netdev_class *netdev_class) } static bool +is_afxdp_netdev(const struct netdev *netdev) +{ + return netdev_get_class(netdev) == &netdev_afxdp_class; +} + +static bool is_tap_netdev(const struct netdev *netdev) { return netdev_get_class(netdev) == &netdev_tap_class; @@ -1073,6 +1081,10 @@ netdev_linux_destruct(struct netdev *netdev_) atomic_count_dec(&miimon_cnt); } + if (is_afxdp_netdev(netdev_)) { + xsk_destroy(netdev->xsk[0]); + } + ovs_mutex_destroy(&netdev->mutex); } @@ -1102,6 +1114,30 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_) rx->is_tap = is_tap_netdev(netdev_); if (rx->is_tap) { rx->fd = netdev->tap_fd; + } else if (is_afxdp_netdev(netdev_)) { + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; + int ifindex, num_socks = 0; + int xdp_queue_id = 0; + struct xdpsock *xsk; + + if (setrlimit(RLIMIT_MEMLOCK, &r)) { + VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n", + ovs_strerror(errno)); + ovs_assert(0); + } + + VLOG_DBG("%s: %s: queue=%d configuring xdp sock", + __func__, netdev_->name, xdp_queue_id); + + /* Get ethernet device index. */ + error = get_ifindex(&netdev->up, &ifindex); + if (error) { + goto error; + } + + xsk = xsk_configure(NULL, ifindex, xdp_queue_id); + netdev->xsk[num_socks++] = xsk; + rx->fd = xsk->sfd; /* for netdev layer to poll */ } else { struct sockaddr_ll sll; int ifindex, val; @@ -1307,9 +1343,14 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch, { struct netdev_rxq_linux *rx = netdev_rxq_linux_cast(rxq_); struct netdev *netdev = rx->up.netdev; - struct dp_packet *buffer; + struct dp_packet *buffer = NULL; ssize_t retval; int mtu; + struct netdev_linux *netdev_ = netdev_linux_cast(netdev); + + if (is_afxdp_netdev(netdev)) { + return netdev_linux_rxq_xsk(netdev_->xsk[0], batch); + } if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) { mtu = ETH_PAYLOAD_MAX; @@ -1318,6 +1359,7 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch, /* Assume Ethernet port. No need to set packet_type. */ buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu, DP_NETDEV_HEADROOM); + retval = (rx->is_tap ? netdev_linux_rxq_recv_tap(rx->fd, buffer) : netdev_linux_rxq_recv_sock(rx->fd, buffer)); @@ -1328,6 +1370,13 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch, netdev_rxq_get_name(rxq_), ovs_strerror(errno)); } dp_packet_delete(buffer); + } else if (is_afxdp_netdev(netdev)) { + dp_packet_batch_init_packet_fields(batch); + + if (batch->count != 0) + VLOG_DBG("%s AFXDP recv %lu packets", __func__, batch->count); + + return retval; } else { dp_packet_batch_init_packet(batch, buffer); } @@ -1469,7 +1518,8 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED, int error = 0; int sock = 0; - if (!is_tap_netdev(netdev_)) { + if (!is_tap_netdev(netdev_) && + !is_afxdp_netdev(netdev_)) { if (netdev_linux_netnsid_is_remote(netdev_linux_cast(netdev_))) { error = EOPNOTSUPP; goto free_batch; @@ -1488,6 +1538,12 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED, } error = netdev_linux_sock_batch_send(sock, ifindex, batch); + } else if (is_afxdp_netdev(netdev_)) { + struct xdpsock *xsk; + struct netdev_linux *netdev = netdev_linux_cast(netdev_); + + xsk = netdev->xsk[0]; + error = netdev_linux_afxdp_batch_send(xsk, batch); } else { error = netdev_linux_tap_batch_send(netdev_, batch); } @@ -3205,6 +3261,7 @@ const struct netdev_class netdev_linux_class = { NETDEV_LINUX_CLASS_COMMON, LINUX_FLOW_OFFLOAD_API, .type = "system", + .is_pmd = false, .construct = netdev_linux_construct, .get_stats = netdev_linux_get_stats, .get_features = netdev_linux_get_features, @@ -3215,6 +3272,7 @@ const struct netdev_class netdev_linux_class = { const struct netdev_class netdev_tap_class = { NETDEV_LINUX_CLASS_COMMON, .type = "tap", + .is_pmd = false, .construct = netdev_linux_construct_tap, .get_stats = netdev_tap_get_stats, .get_features = netdev_linux_get_features, @@ -3224,6 +3282,16 @@ const struct netdev_class netdev_tap_class = { const struct netdev_class netdev_internal_class = { NETDEV_LINUX_CLASS_COMMON, .type = "internal", + .is_pmd = false, + .construct = netdev_linux_construct, + .get_stats = netdev_internal_get_stats, + .get_status = netdev_internal_get_status, +}; + +const struct netdev_class netdev_afxdp_class = { + NETDEV_LINUX_CLASS_COMMON, + .type = "afxdp", + .is_pmd = true, .construct = netdev_linux_construct, .get_stats = netdev_internal_get_stats, .get_status = netdev_internal_get_status, diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index fb0c27e6e8e8..5bf041316503 100644 --- a/lib/netdev-provider.h +++ b/lib/netdev-provider.h @@ -902,6 +902,7 @@ extern const struct netdev_class netdev_linux_class; #endif extern const struct netdev_class netdev_internal_class; extern const struct netdev_class netdev_tap_class; +extern const struct netdev_class netdev_afxdp_class; #ifdef __cplusplus } diff --git a/lib/netdev.c b/lib/netdev.c index 84874408abfd..288c914cdd25 100644 --- a/lib/netdev.c +++ b/lib/netdev.c @@ -146,6 +146,7 @@ netdev_initialize(void) netdev_register_provider(&netdev_linux_class); netdev_register_provider(&netdev_internal_class); netdev_register_provider(&netdev_tap_class); + netdev_register_provider(&netdev_afxdp_class); netdev_vport_tunnel_register(); #endif #if defined(__FreeBSD__) || defined(__NetBSD__) diff --git a/lib/xdpsock.c b/lib/xdpsock.c new file mode 100644 index 000000000000..888b2f6ccbd8 --- /dev/null +++ b/lib/xdpsock.c @@ -0,0 +1,171 @@ +/* + * Copyright (c) 2018 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#include +#include "openvswitch/vlog.h" +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "async-append.h" +#include "coverage.h" +#include "dirs.h" +#include "openvswitch/dynamic-string.h" +#include "openvswitch/ofpbuf.h" +#include "ovs-thread.h" +#include "sat-math.h" +#include "socket-util.h" +#include "svec.h" +#include "syslog-direct.h" +#include "syslog-libc.h" +#include "syslog-provider.h" +#include "timeval.h" +#include "unixctl.h" +#include "util.h" +#include "ovs-atomic.h" +#include "xdpsock.h" +#include "openvswitch/compiler.h" +#include "dp-packet.h" + +void +__umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n) +{ + void *ptr; + + if (OVS_UNLIKELY(umemp->index + n > umemp->size)) { + OVS_NOT_REACHED(); + } + + ptr = &umemp->array[umemp->index]; + memcpy(ptr, addrs, n * sizeof(void *)); + umemp->index += n; +} + +inline void +__umem_elem_push(struct umem_pool *umemp, void *addr) +{ + umemp->array[umemp->index++] = addr; +} + +void +umem_elem_push(struct umem_pool *umemp, void *addr) +{ + + if (OVS_UNLIKELY(umemp->index >= umemp->size)) { + /* stack is full */ + OVS_NOT_REACHED(); + } + + ovs_mutex_lock(&umemp->mutex); + __umem_elem_push(umemp, addr); + ovs_mutex_unlock(&umemp->mutex); +} + +void +__umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n) +{ + void *ptr; + + umemp->index -= n; + + if (OVS_UNLIKELY(umemp->index < 0)) { + OVS_NOT_REACHED(); + } + + ptr = &umemp->array[umemp->index]; + memcpy(addrs, ptr, n * sizeof(void *)); +} + +inline void * +__umem_elem_pop(struct umem_pool *umemp) +{ + return umemp->array[--umemp->index]; +} + +void * +umem_elem_pop(struct umem_pool *umemp) +{ + void *ptr; + + ovs_mutex_lock(&umemp->mutex); + ptr = __umem_elem_pop(umemp); + ovs_mutex_unlock(&umemp->mutex); + + return ptr; +} + +void ** +__umem_pool_alloc(unsigned int size) +{ + void *bufs; + + ovs_assert(posix_memalign(&bufs, getpagesize(), + size * sizeof(void *)) == 0); + memset(bufs, 0, size * sizeof(void *)); + return (void **)bufs; +} + +unsigned int +umem_elem_count(struct umem_pool *mpool) +{ + return mpool->index; +} + +int +umem_pool_init(struct umem_pool *umemp, unsigned int size) +{ + umemp->array = __umem_pool_alloc(size); + if (!umemp->array) + OVS_NOT_REACHED(); + + umemp->size = size; + umemp->index = 0; + ovs_mutex_init(&umemp->mutex); + return 0; +} + +void +umem_pool_cleanup(struct umem_pool *umemp) +{ + free(umemp->array); +} + +/* AF_XDP metadata init/destroy */ +int +xpacket_pool_init(struct xpacket_pool *xp, unsigned int size) +{ + void *bufs; + + ovs_assert(posix_memalign(&bufs, getpagesize(), + size * sizeof(struct dp_packet_afxdp)) == 0); + + xp->array = bufs; + xp->size = size; + return 0; +} + +void +xpacket_pool_cleanup(struct xpacket_pool *xp) +{ + free(xp->array); +} diff --git a/lib/xdpsock.h b/lib/xdpsock.h new file mode 100644 index 000000000000..6ff76e41a8c7 --- /dev/null +++ b/lib/xdpsock.h @@ -0,0 +1,144 @@ +/* + * Copyright (c) 2018 Nicira, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef XDPSOCK_H +#define XDPSOCK_H 1 + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "ovs-atomic.h" +#include "openvswitch/thread.h" + +#define FRAME_HEADROOM 256 +#define FRAME_SHIFT 11 +#define FRAME_SIZE 2048 +#define BATCH_SIZE NETDEV_MAX_BURST + +#ifdef AFXDP_DEBUG +#define NUM_FRAMES 128 +#define NUM_DESCS 64 +#define FQ_NUM_DESCS 64 +#define CQ_NUM_DESCS 64 +#else +#define NUM_FRAMES 10240 +#define NUM_DESCS 256 +#define FQ_NUM_DESCS 256 +#define CQ_NUM_DESCS 256 +#endif + +struct xdp_uqueue { + uint32_t cached_prod; + uint32_t cached_cons; + uint32_t mask; + uint32_t size; + uint32_t *producer; + uint32_t *consumer; + struct xdp_desc *ring; + void *map; +}; + +struct xdpsock { + struct xdp_uqueue rx; + struct xdp_uqueue tx; + int sfd; + struct xdp_umem *umem; + uint32_t outstanding_tx; + unsigned long rx_npkts; + unsigned long tx_npkts; + unsigned long prev_rx_npkts; + unsigned long prev_tx_npkts; +}; + +struct umem_elem_head { + unsigned int index; + struct ovs_mutex mutex; + uint32_t n; +}; + +struct umem_elem { + struct umem_elem *next; +}; + +/* LIFO ptr_array */ +struct umem_pool { + int index; /* point to top */ + unsigned int size; + struct ovs_mutex mutex; + void **array; /* a pointer array */ +}; + +/* array-based dp_packet_afxdp */ +struct xpacket_pool { + unsigned int size; + struct dp_packet_afxdp **array; +}; + +struct xdp_umem_uqueue { + uint32_t cached_prod; + uint32_t cached_cons; + uint32_t mask; + uint32_t size; + uint32_t *producer; + uint32_t *consumer; + uint64_t *ring; + void *map; +}; + +struct xdp_umem { + struct umem_pool mpool; /* a free list/array */ + struct xpacket_pool xpool; + char *frames; + struct xdp_umem_uqueue fq; + struct xdp_umem_uqueue cq; + int fd; +}; + +void __umem_elem_push(struct umem_pool *umemp, void *addr); +void umem_elem_push(struct umem_pool *umemp, void *addr); +void *__umem_elem_pop(struct umem_pool *umemp); +void *umem_elem_pop(struct umem_pool *umemp); +void **__umem_pool_alloc(unsigned int size); +int umem_pool_init(struct umem_pool *umemp, unsigned int size); +void umem_pool_cleanup(struct umem_pool *umemp); +unsigned int umem_elem_count(struct umem_pool *mpool); +void __umem_elem_pop_n(struct umem_pool *umemp, void **addrs, int n); +void __umem_elem_push_n(struct umem_pool *umemp, void **addrs, int n); +int xpacket_pool_init(struct xpacket_pool *xp, unsigned int size); +void xpacket_pool_cleanup(struct xpacket_pool *xp); + +#endif From patchwork Wed Nov 28 21:22:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Tu X-Patchwork-Id: 1004843 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="HcB8UqY2"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 434tx94KN6z9ryk for ; Thu, 29 Nov 2018 08:25:49 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 057E9D19; Wed, 28 Nov 2018 21:23:22 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 0F276A80 for ; Wed, 28 Nov 2018 21:23:20 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-pf1-f180.google.com (mail-pf1-f180.google.com [209.85.210.180]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id CEEAE786 for ; Wed, 28 Nov 2018 21:23:15 +0000 (UTC) Received: by mail-pf1-f180.google.com with SMTP id b85so10800776pfc.3 for ; Wed, 28 Nov 2018 13:23:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:in-reply-to:references; bh=oi33A58CsCFblLwwjoZx+GC+aYIx6B8oWDj+g+dR5oM=; b=HcB8UqY2T2Y+6wsgAazMQIdLJC/XaFg+UUYwQpq7ScawDjTWsnskBxSJYt3tj40Cha pDiE/fyxUGAXSF7gX+vjRD+zW9604gKhcUDKevJu5pjfJZ8W4O0xfAIPnyrNG/2mL+Wp ptKZL4nUkwtvlbZG0vWbSOd8cxz6VuxTCCJ2vFrcUpU+AY3Xifkjkl3WA0lnJ5b/otI1 K2tZw6puhlQuV6zQ9Y0pgD9MKVkrtALUiVIoO3jrxJ6M4Q3TGL9MX3fXmNY+zkb6oHfU ES4BsenecmmSApqSKOGDNpZ56Y/0zGag3vYXfV0Ur42gWlBVwagHa4D2HHHGcRO3aD+G iKpg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=oi33A58CsCFblLwwjoZx+GC+aYIx6B8oWDj+g+dR5oM=; b=k98U+eVOAAj/Hyd3K+AN4ceKrDhXjZW/v9oWYXXyNSCMupMYu9oRey2KtclwnKyJOq t5w84WvpDYzP0qwDaRGiqtNOCYcz59xhTqrdP4UxrePHpI/T67nxPnXWdnyy8dmMBi2N C64ntkiqbthQMYEx/0mYMD3BrIZ8np2jzHABUfFTnkQnZ3SgjmzlyWhaKuBlHVAqRdiQ Z2qWVpmsqo1A2s2U5moFyS/9B8mYDu6W2xYUTon7AB5fOY9iIhAOuATwkZzE/u5MfduF jtaqit1/gKAYrhSTN07+0KpUl9D1qltc91qk7bqdp8BmVg09oVXm82sjtLJnMFJJkJp3 A9tw== X-Gm-Message-State: AA+aEWa5Pc0JxlmIsjogv8scIoUYUol9GVkSXo9w240/5dra4zGV/VV8 TuHlXEbh53N6FhaDBEZEYL374hWa X-Google-Smtp-Source: AFSGD/VcHYHcE5BQm/FgDWclpM8Mhr8Mx3ZuO4XqSW+JeKwK2LFZ8dXYCyabUUHptB05f8tN9vYr9A== X-Received: by 2002:a62:8949:: with SMTP id v70mr4692511pfd.85.1543440193293; Wed, 28 Nov 2018 13:23:13 -0800 (PST) Received: from sc9-mailhost2.vmware.com ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id t5sm10899178pfb.60.2018.11.28.13.23.12 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 28 Nov 2018 13:23:12 -0800 (PST) From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Wed, 28 Nov 2018 13:22:21 -0800 Message-Id: <1543440142-27253-3-git-send-email-u9012063@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1543440142-27253-1-git-send-email-u9012063@gmail.com> References: <1543440142-27253-1-git-send-email-u9012063@gmail.com> X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [PATCHv3 RFC 2/3] tests: add AF_XDP netdev test cases. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org The patch adds the test framework for OVS using afxdp. Most of the test cases are slightly modified from the existing test cases at system-traffic.at. All the veth creations, ADD_VETH, are replaced by using AF_XDP veth, with new macro ADD_VETH_AFXDP. So packet I/O is based on AF_XDP socket interface. Signed-off-by: William Tu --- tests/automake.mk | 17 + tests/system-afxdp-macros.at | 153 ++++ tests/system-afxdp-testsuite.at | 26 + tests/system-afxdp-traffic.at | 1541 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 1737 insertions(+) create mode 100644 tests/system-afxdp-macros.at create mode 100644 tests/system-afxdp-testsuite.at create mode 100644 tests/system-afxdp-traffic.at diff --git a/tests/automake.mk b/tests/automake.mk index 97312cf2ce6e..38cfb7158167 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -4,11 +4,13 @@ EXTRA_DIST += \ $(SYSTEM_TESTSUITE_AT) \ $(SYSTEM_KMOD_TESTSUITE_AT) \ $(SYSTEM_USERSPACE_TESTSUITE_AT) \ + $(SYSTEM_AFXDP_TESTSUITE_AT) \ $(SYSTEM_OFFLOADS_TESTSUITE_AT) \ $(SYSTEM_DPDK_TESTSUITE_AT) \ $(TESTSUITE) \ $(SYSTEM_KMOD_TESTSUITE) \ $(SYSTEM_USERSPACE_TESTSUITE) \ + $(SYSTEM_AFXDP_TESTSUITE) \ $(SYSTEM_OFFLOADS_TESTSUITE) \ $(SYSTEM_DPDK_TESTSUITE) \ tests/atlocal.in \ @@ -152,6 +154,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \ tests/system-userspace-macros.at \ tests/system-userspace-packet-type-aware.at +SYSTEM_AFXDP_TESTSUITE_AT = \ + tests/system-afxdp-testsuite.at \ + tests/system-afxdp-traffic.at \ + tests/system-afxdp-macros.at + SYSTEM_TESTSUITE_AT = \ tests/system-common-macros.at \ tests/system-ovn.at \ @@ -176,6 +183,7 @@ TESTSUITE = $(srcdir)/tests/testsuite TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite +SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite SYSTEM_DPDK_TESTSUITE = $(srcdir)/tests/system-dpdk-testsuite DISTCLEANFILES += tests/atconfig tests/atlocal @@ -304,6 +312,11 @@ check-system-userspace: all set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \ "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) +check-afxdp: all + $(MAKE) install + set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \ + "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) + check-offloads: all set $(SHELL) '$(SYSTEM_OFFLOADS_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \ "$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck) @@ -336,6 +349,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at $(AM_V_at)mv $@.tmp $@ +$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT) + $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at + $(AM_V_at)mv $@.tmp $@ + $(SYSTEM_OFFLOADS_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_OFFLOADS_TESTSUITE_AT) $(COMMON_MACROS_AT) $(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at $(AM_V_at)mv $@.tmp $@ diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at new file mode 100644 index 000000000000..c9d2227f9ab6 --- /dev/null +++ b/tests/system-afxdp-macros.at @@ -0,0 +1,153 @@ +# _ADD_BR([name]) +# +# Expands into the proper ovs-vsctl commands to create a bridge with the +# appropriate type and properties +m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type="netdev" protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 fail-mode=secure ]]) + +# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override]) +# +# Creates a database and starts ovsdb-server, starts ovs-vswitchd +# connected to that database, calls ovs-vsctl to create a bridge named +# br0 with predictable settings, passing 'vsctl-args' as additional +# commands to ovs-vsctl. If 'vsctl-args' causes ovs-vsctl to provide +# output (e.g. because it includes "create" commands) then 'vsctl-output' +# specifies the expected output after filtering through uuidfilt. +m4_define([OVS_TRAFFIC_VSWITCHD_START], + [ + export OVS_PKGDATADIR=$(`pwd`) + _OVS_VSWITCHD_START([--disable-system]) + dnl Add bridges, ports, etc. + OVS_WAIT_WHILE([ip link show br0]) + AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2]) +]) + +# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds]) +# +# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files +# for messages with severity WARN or higher and signaling an error if any +# is present. The optional WHITELIST may contain shell-quoted "sed" +# commands to delete any warnings that are actually expected, e.g.: +# +# OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"]) +# +# 'extra_cmds' are shell commands to be executed afte OVS_VSWITCHD_STOP() is +# invoked. They can be used to perform additional cleanups such as name space +# removal. +m4_define([OVS_TRAFFIC_VSWITCHD_STOP], + [OVS_VSWITCHD_STOP([dnl +$1";/netdev_linux.*obtaining netdev stats via vport failed/d +/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded./d +"]) + AT_CHECK([:; $2]) + ]) + +m4_define([ADD_VETH_AFXDP], + [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77]) + CONFIGURE_AFXDP_VETH_OFFLOADS([$1]) + AT_CHECK([ip link set $1 netns $2]) + AT_CHECK([ip link set dev ovs-$1 up]) + AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \ + set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"]) + NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7]) + NS_CHECK_EXEC([$2], [ip link set dev $1 up]) + if test -n "$5"; then + NS_CHECK_EXEC([$2], [ip link set dev $1 address $5]) + fi + if test -n "$6"; then + NS_CHECK_EXEC([$2], [ip route add default via $6]) + fi + on_exit 'ip link del ovs-$1' + ] +) + +# CONFIGURE_AFXDP_VETH_OFFLOADS([VETH]) +# +# Disable TX offloads and VLAN offloads for veths used in AF_XDP. +m4_define([CONFIGURE_AFXDP_VETH_OFFLOADS], + [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore]) + AT_CHECK([ethtool -K $1 rxvlan off], [0], [ignore], [ignore]) + AT_CHECK([ethtool -K $1 txvlan off], [0], [ignore], [ignore]) + ] +) + +# CONFIGURE_VETH_OFFLOADS([VETH]) +# +# Disable TX offloads for veths. The userspace datapath uses the AF_PACKET +# socket to receive packets for veths. Unfortunately, the AF_PACKET socket +# doesn't play well with offloads: +# 1. GSO packets are received without segmentation and therefore discarded. +# 2. Packets with offloaded partial checksum are received with the wrong +# checksum, therefore discarded by the receiver. +# +# By disabling tx offloads in the non-OVS side of the veth peer we make sure +# that the AF_PACKET socket will not receive bad packets. +# +# This is a workaround, and should be removed when offloads are properly +# supported in netdev-linux. +m4_define([CONFIGURE_VETH_OFFLOADS], + [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])] +) + +# CHECK_CONNTRACK() +# +# Perform requirements checks for running conntrack tests. +# +m4_define([CHECK_CONNTRACK], + [AT_SKIP_IF([test $HAVE_PYTHON = no])] +) + +# CHECK_CONNTRACK_ALG() +# +# Perform requirements checks for running conntrack ALG tests. The userspace +# supports FTP and TFTP. +# +m4_define([CHECK_CONNTRACK_ALG]) + +# CHECK_CONNTRACK_FRAG() +# +# Perform requirements checks for running conntrack fragmentations tests. +# The userspace doesn't support fragmentation yet, so skip the tests. +m4_define([CHECK_CONNTRACK_FRAG], +[ + AT_SKIP_IF([:]) +]) + +# CHECK_CONNTRACK_LOCAL_STACK() +# +# Perform requirements checks for running conntrack tests with local stack. +# While the kernel connection tracker automatically passes all the connection +# tracking state from an internal port to the OpenvSwitch kernel module, there +# is simply no way of doing that with the userspace, so skip the tests. +m4_define([CHECK_CONNTRACK_LOCAL_STACK], +[ + AT_SKIP_IF([:]) +]) + +# CHECK_CONNTRACK_NAT() +# +# Perform requirements checks for running conntrack NAT tests. The userspace +# datapath supports NAT. +# +m4_define([CHECK_CONNTRACK_NAT]) + +# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE() +# +# Perform requirements checks for running ovs-dpctl flush-conntrack by +# conntrack 5-tuple test. The userspace datapath does not support +# this feature yet. +m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE], +[ + AT_SKIP_IF([:]) +]) + +# CHECK_CT_DPIF_SET_GET_MAXCONNS() +# +# Perform requirements checks for running ovs-dpctl ct-set-maxconns or +# ovs-dpctl ct-get-maxconns. The userspace datapath does support this feature. +m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS]) + +# CHECK_CT_DPIF_GET_NCONNS() +# +# Perform requirements checks for running ovs-dpctl ct-get-nconns. The +# userspace datapath does support this feature. +m4_define([CHECK_CT_DPIF_GET_NCONNS]) diff --git a/tests/system-afxdp-testsuite.at b/tests/system-afxdp-testsuite.at new file mode 100644 index 000000000000..538c0d15d556 --- /dev/null +++ b/tests/system-afxdp-testsuite.at @@ -0,0 +1,26 @@ +AT_INIT + +AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at: + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License.]) + +m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS]) + +m4_include([tests/ovs-macros.at]) +m4_include([tests/ovsdb-macros.at]) +m4_include([tests/ofproto-macros.at]) +m4_include([tests/system-afxdp-macros.at]) +m4_include([tests/system-common-macros.at]) + +m4_include([tests/system-afxdp-traffic.at]) +m4_include([tests/system-ovn.at]) diff --git a/tests/system-afxdp-traffic.at b/tests/system-afxdp-traffic.at new file mode 100644 index 000000000000..87f4dd160b51 --- /dev/null +++ b/tests/system-afxdp-traffic.at @@ -0,0 +1,1541 @@ +AT_BANNER([AF_XDP netdev datapath-sanity]) + +AT_SETUP([datapath - ping between two ports]) +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - http between two ports]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_START_L7([at_ns1], [http]) +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping between two ports on vlan]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +ADD_VLAN(p0, at_ns0, 100, "10.2.2.1/24") +ADD_VLAN(p1, at_ns1, 100, "10.2.2.2/24") + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.2.2.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping6 between two ports]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96") +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96") + +dnl Linux seems to take a little time to get its IPv6 stack in order. Without +dnl waiting, we get occasional failures due to the following error: +dnl "connect: Cannot assign requested address" +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2]) + +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 6 fc00::2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping6 between two ports on vlan]) +OVS_TRAFFIC_VSWITCHD_START() + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96") +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96") + +ADD_VLAN(p0, at_ns0, 100, "fc00:1::1/96") +ADD_VLAN(p1, at_ns1, 100, "fc00:1::2/96") + +dnl Linux seems to take a little time to get its IPv6 stack in order. Without +dnl waiting, we get occasional failures due to the following error: +dnl "connect: Cannot assign requested address" +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00:1::2]) + +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping6 -s 1600 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping6 -s 3200 -q -c 3 -i 0.3 -w 2 fc00:1::2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over vxlan tunnel]) +OVS_CHECK_VXLAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL([vxlan], [br0], [at_vxlan0], [172.31.1.1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL([vxlan], [at_vxlan1], [at_ns0], [172.31.1.100], [10.1.1.1/24], + [id 0 dstport 4789]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK +]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over vxlan6 tunnel]) +OVS_CHECK_VXLAN_UDP6ZEROCSUM() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad") +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL6([vxlan], [br0], [at_vxlan0], [fc00::1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL6([vxlan], [at_vxlan1], [at_ns0], [fc00::100], [10.1.1.1/24], + [id 0 dstport 4789 udp6zerocsumtx udp6zerocsumrx]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK +]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over gre tunnel]) +OVS_CHECK_GRE() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL([gre], [br0], [at_gre0], [172.31.1.1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL([gretap], [ns_gre0], [at_ns0], [172.31.1.100], [10.1.1.1/24]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK +]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over erspan v1 tunnel]) +OVS_CHECK_GRE() +OVS_CHECK_ERSPAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=1 options:erspan_idx=7]) +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 1 erspan 7]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK +]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over erspan v2 tunnel]) +OVS_CHECK_GRE() +OVS_CHECK_ERSPAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL([erspan], [br0], [at_erspan0], [172.31.1.1], [10.1.1.100/24], [options:key=1 options:erspan_ver=2 options:erspan_dir=1 options:erspan_hwid=0x7]) +ADD_NATIVE_TUNNEL([erspan], [ns_erspan0], [at_ns0], [172.31.1.100], [10.1.1.1/24], [seq key 1 erspan_ver 2 erspan_dir egress erspan_hwid 7]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.92/24 br-underlay], [0], [OK +]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +dnl NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +NS_CHECK_EXEC([at_ns0], [ping -s 1200 -i 0.3 -c 3 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over ip6erspan v1 tunnel]) +OVS_CHECK_GRE() +OVS_CHECK_ERSPAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad) +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24], + [options:key=123 options:erspan_ver=1 options:erspan_idx=0x7]) +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100], + [10.1.1.1/24], [local fc00:100::1 seq key 123 erspan_ver 1 erspan 7]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK +]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over ip6erspan v2 tunnel]) +OVS_CHECK_GRE() +OVS_CHECK_ERSPAN() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00:100::1/96", [], [], nodad) +AT_CHECK([ip addr add dev br-underlay "fc00:100::100/96" nodad]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL6([ip6erspan], [br0], [at_erspan0], [fc00:100::1], [10.1.1.100/24], + [options:key=121 options:erspan_ver=2 options:erspan_dir=0 options:erspan_hwid=0x7]) +ADD_NATIVE_TUNNEL6([ip6erspan], [ns_erspan0], [at_ns0], [fc00:100::100], + [10.1.1.1/24], + [local fc00:100::1 seq key 121 erspan_ver 2 erspan_dir ingress erspan_hwid 0x7]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add fc00:100::1/96 br-underlay], [0], [OK +]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 2 fc00:100::100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00:100::100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over geneve tunnel]) +OVS_CHECK_GENEVE() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "172.31.1.1/24") +AT_CHECK([ip addr add dev br-underlay "172.31.1.100/24"]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL([geneve], [br0], [at_gnv0], [172.31.1.1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL([geneve], [ns_gnv0], [at_ns0], [172.31.1.100], [10.1.1.1/24], + [vni 0]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add 172.31.1.100/24 br-underlay], [0], [OK +]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 172.31.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - ping over geneve6 tunnel]) +OVS_CHECK_GENEVE_UDP6ZEROCSUM() + +OVS_TRAFFIC_VSWITCHD_START() +ADD_BR([br-underlay]) + +AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"]) +AT_CHECK([ovs-ofctl add-flow br-underlay "actions=normal"]) + +ADD_NAMESPACES(at_ns0) + +dnl Set up underlay link from host into the namespace using veth pair. +ADD_VETH_AFXDP(p0, at_ns0, br-underlay, "fc00::1/64", [], [], "nodad") +AT_CHECK([ip addr add dev br-underlay "fc00::100/64" nodad]) +AT_CHECK([ip link set dev br-underlay up]) + +dnl Set up tunnel endpoints on OVS outside the namespace and with a native +dnl linux device inside the namespace. +ADD_OVS_TUNNEL6([geneve], [br0], [at_gnv0], [fc00::1], [10.1.1.100/24]) +ADD_NATIVE_TUNNEL6([geneve], [ns_gnv0], [at_ns0], [fc00::100], [10.1.1.1/24], + [vni 0 udp6zerocsumtx udp6zerocsumrx]) + +AT_CHECK([ovs-appctl ovs/route/add 10.1.1.100/24 br0], [0], [OK +]) +AT_CHECK([ovs-appctl ovs/route/add fc00::100/64 br-underlay], [0], [OK +]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::100]) + +dnl First, check the underlay +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +dnl Okay, now check the overlay with different packet sizes +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 1600 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) +NS_CHECK_EXEC([at_ns0], [ping -s 3200 -q -c 3 -i 0.3 -w 2 10.1.1.100 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - clone action]) +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +AT_CHECK([ovs-vsctl -- set interface ovs-p0 ofport_request=1 \ + -- set interface ovs-p1 ofport_request=2]) + +AT_DATA([flows.txt], [dnl +priority=1 actions=NORMAL +priority=10 in_port=1,ip,actions=clone(mod_dl_dst(50:54:00:00:00:0a),set_field:192.168.3.3->ip_dst), output:2 +priority=10 in_port=2,ip,actions=clone(mod_dl_src(ae:c6:7e:54:8d:4d),mod_dl_dst(50:54:00:00:00:0b),set_field:192.168.4.4->ip_dst, controller), output:1 +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log]) +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([cat ofctl_monitor.log | STRIP_MONITOR_CSUM], [0], [dnl +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: +icmp,vlan_tci=0x0000,dl_src=ae:c6:7e:54:8d:4d,dl_dst=50:54:00:00:00:0b,nw_src=10.1.1.2,nw_dst=192.168.4.4,nw_tos=0,nw_ecn=0,nw_ttl=64,icmp_type=0,icmp_code=0 icmp_csum: +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - mpls actions]) +OVS_TRAFFIC_VSWITCHD_START([_ADD_BR([br1])]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br1, "10.1.1.2/24") + +AT_CHECK([ip link add patch0 type veth peer name patch1]) +on_exit 'ip link del patch0' + +AT_CHECK([ip link set dev patch0 up]) +AT_CHECK([ip link set dev patch1 up]) +AT_CHECK([ovs-vsctl add-port br0 patch0]) +AT_CHECK([ovs-vsctl add-port br1 patch1]) + +AT_DATA([flows.txt], [dnl +table=0,priority=100,dl_type=0x0800 actions=push_mpls:0x8847,set_mpls_label:3,resubmit(,1) +table=0,priority=100,dl_type=0x8847,mpls_label=3 actions=pop_mpls:0x0800,resubmit(,1) +table=0,priority=10 actions=resubmit(,1) +table=1,priority=10 actions=normal +]) + +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) +AT_CHECK([ovs-ofctl add-flows br1 flows.txt]) + +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([datapath - basic truncate action]) +AT_SKIP_IF([test $HAVE_NC = no]) +OVS_TRAFFIC_VSWITCHD_START() +AT_CHECK([ovs-ofctl del-flows br0]) + +dnl Create p0 and ovs-p0(1) +ADD_NAMESPACES(at_ns0) +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +NS_CHECK_EXEC([at_ns0], [ip link set dev p0 address e6:66:c1:11:11:11]) +NS_CHECK_EXEC([at_ns0], [arp -s 10.1.1.2 e6:66:c1:22:22:22]) + +dnl Create p1(3) and ovs-p1(2), packets received from ovs-p1 will appear in p1 +AT_CHECK([ip link add p1 type veth peer name ovs-p1]) +on_exit 'ip link del ovs-p1' +AT_CHECK([ip link set dev ovs-p1 up]) +AT_CHECK([ip link set dev p1 up]) +AT_CHECK([ovs-vsctl add-port br0 ovs-p1 -- set interface ovs-p1 ofport_request=2]) +dnl Use p1 to check the truncated packet +AT_CHECK([ovs-vsctl add-port br0 p1 -- set interface p1 ofport_request=3]) + +dnl Create p2(5) and ovs-p2(4) +AT_CHECK([ip link add p2 type veth peer name ovs-p2]) +on_exit 'ip link del ovs-p2' +AT_CHECK([ip link set dev ovs-p2 up]) +AT_CHECK([ip link set dev p2 up]) +AT_CHECK([ovs-vsctl add-port br0 ovs-p2 -- set interface ovs-p2 ofport_request=4]) +dnl Use p2 to check the truncated packet +AT_CHECK([ovs-vsctl add-port br0 p2 -- set interface p2 ofport_request=5]) + +dnl basic test +AT_CHECK([ovs-ofctl del-flows br0]) +AT_DATA([flows.txt], [dnl +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4 +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +dnl use this file as payload file for ncat +AT_CHECK([dd if=/dev/urandom of=payload200.bin bs=200 count=1 2> /dev/null]) +on_exit 'rm -f payload200.bin' +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin]) + +dnl packet with truncated size +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=100 +]) +dnl packet with original size +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=242 +]) + +dnl more complicated output actions +AT_CHECK([ovs-ofctl del-flows br0]) +AT_DATA([flows.txt], [dnl +in_port=3 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=5 dl_dst=e6:66:c1:22:22:22 actions=drop +in_port=1 dl_dst=e6:66:c1:22:22:22 actions=output(port=2,max_len=100),output:4,output(port=2,max_len=100),output(port=4,max_len=100),output:2,output(port=4,max_len=200),output(port=2,max_len=65535) +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin]) + +dnl 100 + 100 + 242 + min(65535,242) = 684 +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=684 +]) +dnl 242 + 100 + min(242,200) = 542 +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=542 +]) + +dnl SLOW_ACTION: disable kernel datapath truncate support +dnl Repeat the test above, but exercise the SLOW_ACTION code path +AT_CHECK([ovs-appctl dpif/set-dp-features br0 trunc false], [0]) + +dnl SLOW_ACTION test1: check datapatch actions +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +AT_CHECK([ovs-appctl ofproto/trace br0 "in_port=1,dl_type=0x800,dl_src=e6:66:c1:11:11:11,dl_dst=e6:66:c1:22:22:22,nw_src=192.168.0.1,nw_dst=192.168.0.2,nw_proto=6,tp_src=8,tp_dst=9"], [0], [stdout]) +AT_CHECK([tail -3 stdout], [0], +[Datapath actions: trunc(100),3,5,trunc(100),3,trunc(100),5,3,trunc(200),5,trunc(65535),3 +This flow is handled by the userspace slow path because it: + - Uses action(s) not supported by datapath. +]) + +dnl SLOW_ACTION test2: check actual packet truncate +AT_CHECK([ovs-ofctl del-flows br0]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) +NS_CHECK_EXEC([at_ns0], [nc $NC_EOF_OPT -u 10.1.1.2 1234 < payload200.bin]) + +dnl 100 + 100 + 242 + min(65535,242) = 684 +AT_CHECK([ovs-appctl revalidator/purge], [0]) +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=3" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=684 +]) + +dnl 242 + 100 + min(242,200) = 542 +AT_CHECK([ovs-ofctl dump-flows br0 table=0 | grep "in_port=5" | sed -n 's/.*\(n\_bytes=[[0-9]]*\).*/\1/p'], [0], [dnl +n_bytes=542 +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + + +AT_BANNER([conntrack]) + +AT_SETUP([conntrack - controller]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,udp,action=ct(commit),controller +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0) +priority=100,in_port=2,ct_state=+trk+est,udp,action=controller +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +AT_CAPTURE_FILE([ofctl_monitor.log]) +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log]) + +dnl Send an unsolicited reply from port 2. This should be dropped. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000']) + +dnl OK, now start a new connection from port 1. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 1 ct\(commit\),controller '50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000']) + +dnl Now try a reply from port 2. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 2 ct\(table=0\) '50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000']) + +dnl Check this output. We only see the latter two packets, not the first. +AT_CHECK([cat ofctl_monitor.log], [0], [dnl +NXT_PACKET_IN2 (xid=0x0): total_len=42 in_port=1 (via action) data_len=42 (unbuffered) +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0 +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 ct_state=est|rpl|trk,ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2,ip,in_port=2 (via action) data_len=42 (unbuffered) +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0 +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - force commit]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() +AT_CHECK([ovs-appctl vlog/set dpif:dbg dpif_netdev:dbg ofproto_dpif_upcall:dbg]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,udp,action=ct(force,commit),controller +priority=100,in_port=2,ct_state=-trk,udp,action=ct(table=0) +priority=100,in_port=2,ct_state=+trk+est,udp,action=ct(force,commit,table=1) +table=1,in_port=2,ct_state=+trk,udp,action=controller +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +AT_CAPTURE_FILE([ofctl_monitor.log]) +AT_CHECK([ovs-ofctl monitor br0 65534 invalid_ttl --detach --no-chdir --pidfile 2> ofctl_monitor.log]) + +dnl Send an unsolicited reply from port 2. This should be dropped. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"]) + +dnl OK, now start a new connection from port 1. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"]) + +dnl Now try a reply from port 2. +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"]) + +AT_CHECK([ovs-appctl revalidator/purge], [0]) + +dnl Check this output. We only see the latter two packets, not the first. +AT_CHECK([cat ofctl_monitor.log], [0], [dnl +NXT_PACKET_IN2 (xid=0x0): cookie=0x0 total_len=42 in_port=1 (via action) data_len=42 (unbuffered) +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.1,nw_dst=10.1.1.2,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=1,tp_dst=2 udp_csum:0 +NXT_PACKET_IN2 (xid=0x0): table_id=1 cookie=0x0 total_len=42 ct_state=new|trk,ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1,ip,in_port=2 (via action) data_len=42 (unbuffered) +udp,vlan_tci=0x0000,dl_src=50:54:00:00:00:09,dl_dst=50:54:00:00:00:0a,nw_src=10.1.1.2,nw_dst=10.1.1.1,nw_tos=0,nw_ecn=0,nw_ttl=0,tp_src=2,tp_dst=1 udp_csum:0 +]) + +dnl +dnl Check that the directionality has been changed by force commit. +dnl +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [], [dnl +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2) +]) + +dnl OK, now send another packet from port 1 and see that it switches again +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"]) +AT_CHECK([ovs-appctl revalidator/purge], [0]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1) +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - ct flush by 5-tuple]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,udp,action=ct(commit),2 +priority=100,in_port=2,udp,action=ct(zone=5,commit),1 +priority=100,in_port=1,icmp,action=ct(commit),2 +priority=100,in_port=2,icmp,action=ct(zone=5,commit),1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +dnl Test UDP from port 1 +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=1 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101010a0101020001000200080000 actions=resubmit(,0)"]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [], [dnl +udp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),reply=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1) +]) + +AT_CHECK([ovs-appctl dpctl/flush-conntrack 'ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=17,ct_tp_src=2,ct_tp_dst=1']) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.1,"], [1], [dnl +]) + +dnl Test UDP from port 2 +AT_CHECK([ovs-ofctl -O OpenFlow13 packet-out br0 "in_port=2 packet=50540000000a50540000000908004500001c000000000011a4cd0a0101020a0101010002000100080000 actions=resubmit(,0)"]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [dnl +udp,orig=(src=10.1.1.2,dst=10.1.1.1,sport=2,dport=1),reply=(src=10.1.1.1,dst=10.1.1.2,sport=1,dport=2),zone=5 +]) + +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 'ct_nw_src=10.1.1.1,ct_nw_dst=10.1.1.2,ct_nw_proto=17,ct_tp_src=1,ct_tp_dst=2']) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +]) + +dnl Test ICMP traffic +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [0], [stdout]) +AT_CHECK([cat stdout | FORMAT_CT(10.1.1.1)], [0],[dnl +icmp,orig=(src=10.1.1.2,dst=10.1.1.1,id=,type=8,code=0),reply=(src=10.1.1.1,dst=10.1.1.2,id=,type=0,code=0),zone=5 +]) + +ICMP_ID=`cat stdout | cut -d ',' -f4 | cut -d '=' -f2` +ICMP_TUPLE=ct_nw_src=10.1.1.2,ct_nw_dst=10.1.1.1,ct_nw_proto=1,icmp_id=$ICMP_ID,icmp_type=8,icmp_code=0 +AT_CHECK([ovs-appctl dpctl/flush-conntrack zone=5 $ICMP_TUPLE]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | grep "orig=.src=10\.1\.1\.2,"], [1], [dnl +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - IPv4 ping]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,icmp,action=ct(commit),2 +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0) +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +dnl Pings from ns0->ns1 should work fine. +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0) +]) + +AT_CHECK([ovs-appctl dpctl/flush-conntrack]) + +dnl Pings from ns1->ns0 should fail. +NS_CHECK_EXEC([at_ns1], [ping -q -c 3 -i 0.3 -w 2 10.1.1.1 | FORMAT_PING], [0], [dnl +7 packets transmitted, 0 received, 100% packet loss, time 0ms +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - get_nconns and get/set_maxconns]) +CHECK_CONNTRACK() +CHECK_CT_DPIF_SET_GET_MAXCONNS() +CHECK_CT_DPIF_GET_NCONNS() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=100,in_port=1,icmp,action=ct(commit),2 +priority=100,in_port=2,icmp,ct_state=-trk,action=ct(table=0) +priority=100,in_port=2,icmp,ct_state=+trk+est,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +dnl Pings from ns0->ns1 should work fine. +NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +icmp,orig=(src=10.1.1.1,dst=10.1.1.2,id=,type=8,code=0),reply=(src=10.1.1.2,dst=10.1.1.1,id=,type=0,code=0) +]) + +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp], [2], [], [dnl +ovs-vswitchd: maxconns missing or malformed (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns a], [2], [], [dnl +ovs-vswitchd: maxconns missing or malformed (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns one-bad-dp 10], [2], [], [dnl +ovs-vswitchd: datapath not found (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns one-bad-dp], [2], [], [dnl +ovs-vswitchd: datapath not found (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-nconns one-bad-dp], [2], [], [dnl +ovs-vswitchd: datapath not found (Invalid argument) +ovs-appctl: ovs-vswitchd: server returned an error +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl +1 +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl +3000000 +]) + +AT_CHECK([ovs-appctl dpctl/ct-set-maxconns 10], [], [dnl +setting maxconns successful +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl +10 +]) + +AT_CHECK([ovs-appctl dpctl/flush-conntrack]) + +AT_CHECK([ovs-appctl dpctl/ct-get-nconns], [], [dnl +0 +]) + +AT_CHECK([ovs-appctl dpctl/ct-get-maxconns], [], [dnl +10 +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - IPv6 ping]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96") +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96") + +AT_DATA([flows.txt], [dnl + +dnl ICMPv6 echo request and reply go to table 1. The rest of the traffic goes +dnl through normal action. +table=0,priority=10,icmp6,icmp_type=128,action=goto_table:1 +table=0,priority=10,icmp6,icmp_type=129,action=goto_table:1 +table=0,priority=1,action=normal + +dnl Allow everything from ns0->ns1. Only allow return traffic from ns1->ns0. +table=1,priority=100,in_port=1,icmp6,action=ct(commit),2 +table=1,priority=100,in_port=2,icmp6,ct_state=-trk,action=ct(table=0) +table=1,priority=100,in_port=2,icmp6,ct_state=+trk+est,action=1 +table=1,priority=1,action=drop +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2]) + +dnl The above ping creates state in the connection tracker. We're not +dnl interested in that state. +AT_CHECK([ovs-appctl dpctl/flush-conntrack]) + +dnl Pings from ns1->ns0 should fail. +NS_CHECK_EXEC([at_ns1], [ping6 -q -c 3 -i 0.3 -w 2 fc00::1 | FORMAT_PING], [0], [dnl +7 packets transmitted, 0 received, 100% packet loss, time 0ms +]) + +dnl Pings from ns0->ns1 should work fine. +NS_CHECK_EXEC([at_ns0], [ping6 -q -c 3 -i 0.3 -w 2 fc00::2 | FORMAT_PING], [0], [dnl +3 packets transmitted, 3 received, 0% packet loss, time 0ms +]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl +icmpv6,orig=(src=fc00::1,dst=fc00::2,id=,type=128,code=0),reply=(src=fc00::2,dst=fc00::1,id=,type=129,code=0) +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - preserve registers]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2, at_ns3) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") +ADD_VETH_AFXDP(p2, at_ns2, br0, "10.1.1.3/24") +ADD_VETH_AFXDP(p3, at_ns3, br0, "10.1.1.4/24") + +dnl Allow any traffic from ns0->ns1, ns2->ns3. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(commit,table=0) +priority=100,in_port=1,tcp,ct_state=+trk,action=2 +priority=100,in_port=2,tcp,ct_state=-trk,action=ct(table=0) +priority=100,in_port=2,tcp,ct_state=+trk,action=1 +priority=100,in_port=3,tcp,ct_state=-trk,action=load:0->NXM_NX_REG0[[]],ct(table=0) +priority=100,in_port=3,tcp,ct_state=+trk,reg0=0,action=load:1->NXM_NX_REG0[[]],ct(commit,table=0) +priority=100,in_port=3,tcp,ct_state=+trk,reg0=1,action=4 +priority=100,in_port=4,tcp,ct_state=-trk,action=ct(commit,table=0) +priority=100,in_port=4,tcp,ct_state=+trk,action=3 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) +OVS_START_L7([at_ns3], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +dnl HTTP requests from p2->p3 should work fine. +NS_CHECK_EXEC([at_ns2], [wget 10.1.1.4 -t 3 -T 1 --retry-connrefused -v -o wget1.log]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - invalid]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2, at_ns3) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") +ADD_VETH_AFXDP(p2, at_ns2, br0, "10.1.1.3/24") +ADD_VETH_AFXDP(p3, at_ns3, br0, "10.1.1.4/24") + +dnl Pass traffic from ns0->ns1 without committing, but attempt to track in +dnl the opposite direction. This should fail. +dnl Pass traffic from ns3->ns4 without committing, and this time match +dnl invalid traffic and allow it through. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,action=ct(),2 +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0) +priority=100,in_port=2,ct_state=+trk+new,tcp,action=1 +priority=100,in_port=3,tcp,action=ct(),4 +priority=100,in_port=4,ct_state=-trk,tcp,action=ct(table=0) +priority=100,in_port=4,ct_state=+trk+inv,tcp,action=3 +priority=100,in_port=4,ct_state=+trk+new,tcp,action=3 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +dnl We set up our rules to allow the request without committing. The return +dnl traffic can't be identified, because the initial request wasn't committed. +dnl For the first pair of ports, this means that the connection fails. +OVS_START_L7([at_ns1], [http]) +OVS_START_L7([at_ns3], [http]) +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log], [4]) + +dnl For the second pair, we allow packets from invalid connections, so it works. +NS_CHECK_EXEC([at_ns2], [wget 10.1.1.4 -t 3 -T 1 --retry-connrefused -v -o wget1.log]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - zones]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2, at_ns3) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") +ADD_VETH_AFXDP(p2, at_ns2, br0, "10.1.1.3/24") +ADD_VETH_AFXDP(p3, at_ns3, br0, "10.1.1.4/24") + +dnl Allow any traffic from ns0->ns1. Allow return traffic, matching on zone. +dnl For ns2->ns3, use a different zone and see that the match fails. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,action=ct(commit,zone=1),2 +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=1) +priority=100,in_port=2,ct_state=+trk,ct_zone=1,tcp,action=1 +priority=100,in_port=3,tcp,action=ct(commit,zone=2),4 +priority=100,in_port=4,ct_state=-trk,tcp,action=ct(table=0,zone=2) +priority=100,in_port=4,ct_state=+trk,ct_zone=1,tcp,action=3 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) +OVS_START_L7([at_ns3], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=1,protoinfo=(state=) +]) + +dnl HTTP requests from p2->p3 should fail due to network failure. +dnl Try 3 times, in 1 second intervals. +NS_CHECK_EXEC([at_ns2], [wget 10.1.1.4 -t 3 -T 1 -v -o wget1.log], [4]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.4)], [0], [dnl +tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=,dport=),reply=(src=10.1.1.4,dst=10.1.1.3,sport=,dport=),zone=2,protoinfo=(state=) +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - zones from field]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2, at_ns3) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") +ADD_VETH_AFXDP(p2, at_ns2, br0, "10.1.1.3/24") +ADD_VETH_AFXDP(p3, at_ns3, br0, "10.1.1.4/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,action=load:0x1001->NXM_NX_REG0[[0..15]],ct(commit,zone=NXM_NX_REG0[[0..15]]),2 +priority=100,in_port=2,ct_state=-trk,tcp,action=load:0x1001->NXM_NX_REG0[[0..15]],ct(table=0,zone=NXM_NX_REG0[[0..15]]) +priority=100,in_port=2,ct_state=+trk,ct_zone=0x1001,tcp,action=1 +priority=100,in_port=3,tcp,action=load:0x1002->NXM_NX_REG0[[0..15]],ct(commit,zone=NXM_NX_REG0[[0..15]]),4 +priority=100,in_port=4,ct_state=-trk,tcp,action=load:0x1002->NXM_NX_REG0[[0..15]],ct(table=0,zone=NXM_NX_REG0[[0..15]]) +priority=100,in_port=4,ct_state=+trk,ct_zone=0x1001,tcp,action=3 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) +OVS_START_L7([at_ns3], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=4097,protoinfo=(state=) +]) + +dnl HTTP requests from p2->p3 should fail due to network failure. +dnl Try 3 times, in 1 second intervals. +NS_CHECK_EXEC([at_ns2], [wget 10.1.1.4 -t 3 -T 1 -v -o wget1.log], [4]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.4)], [0], [dnl +tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=,dport=),reply=(src=10.1.1.4,dst=10.1.1.3,sport=,dport=),zone=4098,protoinfo=(state=) +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - multiple bridges]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START( + [_ADD_BR([br1]) --\ + add-port br0 patch+ -- set int patch+ type=patch options:peer=patch- --\ + add-port br1 patch- -- set int patch- type=patch options:peer=patch+ --]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br1, "10.1.1.2/24") + +dnl Allow any traffic from ns0->br1, allow established in reverse. +AT_DATA([flows-br0.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=2,tcp,ct_state=-trk,action=ct(commit,zone=1),1 +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(table=0,zone=1) +priority=100,in_port=1,tcp,ct_state=+trk+est,ct_zone=1,action=2 +]) + +dnl Allow any traffic from br0->ns1, allow established in reverse. +AT_DATA([flows-br1.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(table=0,zone=2) +priority=100,in_port=1,tcp,ct_state=+trk+new,ct_zone=2,action=ct(commit,zone=2),2 +priority=100,in_port=1,tcp,ct_state=+trk+est,ct_zone=2,action=2 +priority=100,in_port=2,tcp,ct_state=-trk,action=ct(table=0,zone=2) +priority=100,in_port=2,tcp,ct_state=+trk+est,ct_zone=2,action=ct(commit,zone=2),1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows-br0.txt]) +AT_CHECK([ovs-ofctl --bundle add-flows br1 flows-br1.txt]) + +dnl HTTP requests from p0->p1 should work fine. +OVS_START_L7([at_ns1], [http]) +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - multiple zones]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,action=ct(commit,zone=1),ct(commit,zone=2),2 +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=2) +priority=100,in_port=2,ct_state=+trk,ct_zone=2,tcp,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +dnl (again) HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=1,protoinfo=(state=) +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=2,protoinfo=(state=) +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - multiple namespaces, internal ports]) +CHECK_CONNTRACK() +CHECK_CONNTRACK_LOCAL_STACK() +OVS_TRAFFIC_VSWITCHD_START( + [set-fail-mode br0 secure -- ]) + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_INT(p0, at_ns0, br0, "10.1.1.1/24") +ADD_INT(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +dnl +dnl If skb->nfct is leaking from inside the namespace, this test will fail. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(commit,zone=1),2 +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0,zone=1) +priority=100,in_port=2,ct_state=+trk,ct_zone=1,tcp,action=1 +]) + +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +dnl (again) HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),zone=1,protoinfo=(state=) +]) + +OVS_TRAFFIC_VSWITCHD_STOP(["dnl +/ioctl(SIOCGIFINDEX) on .* device failed: No such device/d +/removing policing failed: No such device/d"]) +AT_CLEANUP + +AT_SETUP([conntrack - ct_mark]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2, at_ns3) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") +ADD_VETH_AFXDP(p2, at_ns2, br0, "10.1.1.3/24") +ADD_VETH_AFXDP(p3, at_ns3, br0, "10.1.1.4/24") + +dnl Allow traffic between ns0<->ns1 using the ct_mark. +dnl Check that different marks do not match for traffic between ns2<->ns3. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,action=ct(commit,exec(set_field:1->ct_mark)),2 +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0) +priority=100,in_port=2,ct_state=+trk,ct_mark=1,tcp,action=1 +priority=100,in_port=3,tcp,action=ct(commit,exec(set_field:2->ct_mark)),4 +priority=100,in_port=4,ct_state=-trk,tcp,action=ct(table=0) +priority=100,in_port=4,ct_state=+trk,ct_mark=1,tcp,action=3 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) +OVS_START_L7([at_ns3], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),mark=1,protoinfo=(state=) +]) + +dnl HTTP requests from p2->p3 should fail due to network failure. +dnl Try 3 times, in 1 second intervals. +NS_CHECK_EXEC([at_ns2], [wget 10.1.1.4 -t 3 -T 1 -v -o wget1.log], [4]) +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.4)], [0], [dnl +tcp,orig=(src=10.1.1.3,dst=10.1.1.4,sport=,dport=),reply=(src=10.1.1.4,dst=10.1.1.3,sport=,dport=),mark=2,protoinfo=(state=) +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - ct_mark bit-fiddling]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2, at_ns3) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow traffic between ns0<->ns1 using the ct_mark. Return traffic should +dnl cause an additional bit to be set in the connection (and be allowed). +AT_DATA([flows.txt], [dnl +table=0,priority=1,action=drop +table=0,priority=10,arp,action=normal +table=0,priority=10,icmp,action=normal +table=0,priority=100,in_port=1,tcp,action=ct(table=1) +table=0,priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=1,commit,exec(set_field:0x2/0x6->ct_mark)) +table=1,in_port=1,ct_state=+new,tcp,action=ct(commit,exec(set_field:0x5/0x5->ct_mark)),2 +table=1,in_port=1,ct_state=-new,tcp,action=2 +table=1,in_port=2,ct_state=+trk,ct_mark=3,tcp,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),mark=3,protoinfo=(state=) +]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_BANNER([conntrack - L7]) + +AT_SETUP([conntrack - IPv4 HTTP]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,action=ct(commit),2 +priority=100,in_port=2,ct_state=-trk,tcp,action=ct(table=0) +priority=100,in_port=2,ct_state=+trk+est,tcp,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns0], [http]) +OVS_START_L7([at_ns1], [http]) + +dnl HTTP requests from ns0->ns1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(10.1.1.2)], [0], [dnl +tcp,orig=(src=10.1.1.1,dst=10.1.1.2,sport=,dport=),reply=(src=10.1.1.2,dst=10.1.1.1,sport=,dport=),protoinfo=(state=) +]) + +dnl HTTP requests from ns1->ns0 should fail due to network failure. +dnl Try 3 times, in 1 second intervals. +NS_CHECK_EXEC([at_ns1], [wget 10.1.1.1 -t 3 -T 1 --retry-connrefused -v -o wget1.log], [4]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - IPv6 HTTP]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "fc00::1/96") +ADD_VETH_AFXDP(p1, at_ns1, br0, "fc00::2/96") + +dnl Allow any traffic from ns0->ns1. Only allow nd, return traffic from ns1->ns0. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,icmp6,action=normal +priority=100,in_port=1,tcp6,action=ct(commit),2 +priority=100,in_port=2,ct_state=-trk,tcp6,action=ct(table=0) +priority=100,in_port=2,ct_state=+trk+est,tcp6,action=1 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +dnl Linux seems to take a little time to get its IPv6 stack in order. Without +dnl waiting, we get occasional failures due to the following error: +dnl "connect: Cannot assign requested address" +OVS_WAIT_UNTIL([ip netns exec at_ns0 ping6 -c 1 fc00::2]) + +OVS_START_L7([at_ns0], [http6]) +OVS_START_L7([at_ns1], [http6]) + +dnl HTTP requests from ns0->ns1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget http://[[fc00::2]] -t 3 -T 1 --retry-connrefused -v -o wget0.log]) +AT_CHECK([ovs-appctl dpctl/dump-conntrack | FORMAT_CT(fc00::2)], [0], [dnl +tcp,orig=(src=fc00::1,dst=fc00::2,sport=,dport=),reply=(src=fc00::2,dst=fc00::1,sport=,dport=),protoinfo=(state=) +]) + +dnl HTTP requests from ns1->ns0 should fail due to network failure. +dnl Try 3 times, in 1 second intervals. +NS_CHECK_EXEC([at_ns1], [wget http://[[fc00::1]] -t 3 -T 1 --retry-connrefused -v -o wget1.log], [4]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + +AT_SETUP([conntrack - commit, recirc]) +CHECK_CONNTRACK() +OVS_TRAFFIC_VSWITCHD_START() + +ADD_NAMESPACES(at_ns0, at_ns1, at_ns2, at_ns3) + +ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24") +ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24") +ADD_VETH_AFXDP(p2, at_ns2, br0, "10.1.1.3/24") +ADD_VETH_AFXDP(p3, at_ns3, br0, "10.1.1.4/24") + +dnl Allow any traffic from ns0->ns1, ns2->ns3. +AT_DATA([flows.txt], [dnl +priority=1,action=drop +priority=10,arp,action=normal +priority=10,icmp,action=normal +priority=100,in_port=1,tcp,ct_state=-trk,action=ct(commit,table=0) +priority=100,in_port=1,tcp,ct_state=+trk,action=2 +priority=100,in_port=2,tcp,ct_state=-trk,action=ct(table=0) +priority=100,in_port=2,tcp,ct_state=+trk,action=1 +priority=100,in_port=3,tcp,ct_state=-trk,action=set_field:0->metadata,ct(table=0) +priority=100,in_port=3,tcp,ct_state=+trk,metadata=0,action=set_field:1->metadata,ct(commit,table=0) +priority=100,in_port=3,tcp,ct_state=+trk,metadata=1,action=4 +priority=100,in_port=4,tcp,ct_state=-trk,action=ct(commit,table=0) +priority=100,in_port=4,tcp,ct_state=+trk,action=3 +]) + +AT_CHECK([ovs-ofctl --bundle add-flows br0 flows.txt]) + +OVS_START_L7([at_ns1], [http]) +OVS_START_L7([at_ns3], [http]) + +dnl HTTP requests from p0->p1 should work fine. +NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log]) + +dnl HTTP requests from p2->p3 should work fine. +NS_CHECK_EXEC([at_ns2], [wget 10.1.1.4 -t 3 -T 1 --retry-connrefused -v -o wget1.log]) + +OVS_TRAFFIC_VSWITCHD_STOP +AT_CLEANUP + + From patchwork Wed Nov 28 21:22:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: William Tu X-Patchwork-Id: 1004841 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=openvswitch.org (client-ip=140.211.169.12; helo=mail.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Authentication-Results: ozlabs.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="pjassp1h"; dkim-atps=neutral Received: from mail.linuxfoundation.org (mail.linuxfoundation.org [140.211.169.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 434tvQ4RSkz9s47 for ; Thu, 29 Nov 2018 08:24:18 +1100 (AEDT) Received: from mail.linux-foundation.org (localhost [127.0.0.1]) by mail.linuxfoundation.org (Postfix) with ESMTP id 8F2E0CA4; Wed, 28 Nov 2018 21:23:18 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@mail.linuxfoundation.org Received: from smtp1.linuxfoundation.org (smtp1.linux-foundation.org [172.17.192.35]) by mail.linuxfoundation.org (Postfix) with ESMTPS id 22E2F92F for ; Wed, 28 Nov 2018 21:23:15 +0000 (UTC) X-Greylist: whitelisted by SQLgrey-1.7.6 Received: from mail-pf1-f196.google.com (mail-pf1-f196.google.com [209.85.210.196]) by smtp1.linuxfoundation.org (Postfix) with ESMTPS id D26997C3 for ; Wed, 28 Nov 2018 21:23:14 +0000 (UTC) Received: by mail-pf1-f196.google.com with SMTP id w73so10783519pfk.10 for ; Wed, 28 Nov 2018 13:23:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:subject:date:message-id:in-reply-to:references; bh=g8XOiKZi18pihpqRV08pNUlejO/shzi9iiyeCHOFdeE=; b=pjassp1hTIH9PYvcnKigSnV6hNq4aUY5u9PPJv88wgmI1VpntU2P4MySvezTXfCiJm SLZY8pu5Mg16fAwK/NoEk09sW+gidkSMrX9IvUNbnTsR+g1Nget6BG1K8HoKtEZb6QBc 3kif0U/kdmKa72nuaZCSVCjYTbKFmkFeIyT/1+lkKl3oBiRQCKBxgOKT76JhqTZGPq/b a9wkZ3Yr6VSajT3nb6NBQkyMWttiweuIbWnBqJ1IEHETFANkoABEjik5fjpO+VjDMSnV EW4S+EEfewS8Ys9oAg/zC3PLJAweP7nUb7YvFQlkch4AcDiz1RyDALeyl3kqyYMT6YUM H6yA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references; bh=g8XOiKZi18pihpqRV08pNUlejO/shzi9iiyeCHOFdeE=; b=L/splUPx1BisZNZGBCT6AtXc11/ZHrGSJ23qCvr1XMLBGcReiiErOO3T1C40R430Q8 3pvM9VLUyCo4shwzppxmkp/0ybCkiEh5i3YhIj6BoKEaKESPhZd5xo1cFyjXLr4NXsux gjS9bFrnoovHiuHFOw3LtmTXpYbKYD9SpTjbb47W7q1Pn0JYgoVLH6P7hquuV8RcgL1i M81wMx/MWxiAaTzQZL4ACSfOvIxU9NS/jkWgV+IVvJkRjQE4LOJTURLKSKyY49ssE7Ln kMawNDNbi/Tpb4fvSiPvewdy5QKuX00/5p0+Y5jh4EnQZFqLGsvo3zaZGUF/AgiwZLpq UhGQ== X-Gm-Message-State: AGRZ1gLHnOw/hRJpAUiyVOAc/4SrnhR9CWi+OWI1yE5cHbbSVe3RHKBk urFabDD1sj5FHj/lsKVl6Lg6qjsv X-Google-Smtp-Source: AJdET5eprzNPS6NRUwe5ib/ZNexl1ZL7aZi9/b0e6z0eJ/VmJFcbcPL0D6/+Dl7NIM3SMPJxu5irTQ== X-Received: by 2002:a62:ab0d:: with SMTP id p13mr39194506pff.211.1543440194071; Wed, 28 Nov 2018 13:23:14 -0800 (PST) Received: from sc9-mailhost2.vmware.com ([66.170.99.1]) by smtp.gmail.com with ESMTPSA id t5sm10899178pfb.60.2018.11.28.13.23.13 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Wed, 28 Nov 2018 13:23:13 -0800 (PST) From: William Tu To: dev@openvswitch.org, iovisor-dev@lists.iovisor.org Date: Wed, 28 Nov 2018 13:22:22 -0800 Message-Id: <1543440142-27253-4-git-send-email-u9012063@gmail.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1543440142-27253-1-git-send-email-u9012063@gmail.com> References: <1543440142-27253-1-git-send-email-u9012063@gmail.com> X-Spam-Status: No, score=-1.7 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_ENVFROM_END_DIGIT,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE autolearn=no version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on smtp1.linux-foundation.org Subject: [ovs-dev] [PATCHv3 RFC 3/3] FIXME: work around the failed cases. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.12 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: ovs-dev-bounces@openvswitch.org Errors-To: ovs-dev-bounces@openvswitch.org There are still two issues causing some test cases failed. This patch provides an work-around. Signed-off-by: William Tu --- lib/dpif-netdev.c | 2 +- tests/system-afxdp-macros.at | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 1564db9c6e44..0a8941309081 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -4458,7 +4458,7 @@ rxq_scheduling(struct dp_netdev *dp, bool pinned) OVS_REQUIRES(dp->port_mutex) continue; } rxqs[i]->pmd = rr_numa_get_pmd(non_local_numa, assign_cyc); - VLOG_WARN("There's no available (non-isolated) pmd thread " + VLOG_INFO("There's no available (non-isolated) pmd thread " "on numa node %d. Queue %d on port \'%s\' will " "be assigned to the pmd on core %d " "(numa node %d). Expect reduced performance.", diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at index c9d2227f9ab6..062da7020c7a 100644 --- a/tests/system-afxdp-macros.at +++ b/tests/system-afxdp-macros.at @@ -37,6 +37,8 @@ m4_define([OVS_TRAFFIC_VSWITCHD_STOP], [OVS_VSWITCHD_STOP([dnl $1";/netdev_linux.*obtaining netdev stats via vport failed/d /dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded./d +/dpif_netdev(revalidator.*)|ERR|internal error parsing flow key/d +/dpif(revalidator.*)|WARN|netdev@ovs-netdev: failed to put/d "]) AT_CHECK([:; $2]) ])