From patchwork Sat Aug 18 00:29:34 2018
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 959158
From: William Tu
To: iovisor-dev@lists.iovisor.org, dev@openvswitch.org
Date: Fri, 17 Aug 2018 17:29:34 -0700
Message-Id: <1534552176-125735-2-git-send-email-u9012063@gmail.com>
In-Reply-To: <1534552176-125735-1-git-send-email-u9012063@gmail.com>
Subject: [ovs-dev] [PATCH RFC 1/3] afxdp: add ebpf code for afxdp and xskmap.

AF_XDP requires attaching an XDP program and an XSKMAP to each netdev.
This patch provides these programs and maps, along with the code that
loads and attaches them.

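Because every AF_XDP netdev needs its own program/map pair, the series keeps a fixed pool of MAX_AFXDP_DEV (4) pairs named `afxdpN` and `xsks_mapN`, and port_create() in patch 2/3 hands them out by incrementing `afxdp_idx`. The following is a minimal, hypothetical sketch (not code from this series) of handing out such pairs with an explicit bound, so a fifth AF_XDP port gets an error instead of indexing past the arrays. `struct afxdp_slot`, `afxdp_next_slot` and `afxdp_slot_alloc()` are illustrative names; only MAX_AFXDP_DEV and the `afxdpN`/`xsks_mapN` naming come from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mirrors MAX_AFXDP_DEV from lib/bpf.h in this patch. */
#define MAX_AFXDP_DEV 4

/* Hypothetical: names of one XDP program section and its XSKMAP. */
struct afxdp_slot {
    char prog_section[32];   /* "afxdp0" .. "afxdp3" */
    char map_name[32];       /* "xsks_map0" .. "xsks_map3" */
};

/* Hypothetical counter playing the role of afxdp_idx in port_create(). */
static int afxdp_next_slot;

/* Fills '*slot' with the next unused program/map pair and returns its
 * index, or returns -1 (leaving '*slot' untouched) once all
 * MAX_AFXDP_DEV pairs are taken. */
static int
afxdp_slot_alloc(struct afxdp_slot *slot)
{
    assert(slot != NULL);
    if (afxdp_next_slot >= MAX_AFXDP_DEV) {
        return -1;           /* No XDP program/XSKMAP left for this netdev. */
    }
    memset(slot, 0, sizeof *slot);
    snprintf(slot->prog_section, sizeof slot->prog_section,
             "afxdp%d", afxdp_next_slot);
    snprintf(slot->map_name, sizeof slot->map_name,
             "xsks_map%d", afxdp_next_slot);
    return afxdp_next_slot++;
}
```

Calling afxdp_slot_alloc() five times returns 0, 1, 2, 3 and then -1; a caller such as port_create() would then fail the port rather than attach anything.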
Signed-off-by: William Tu
---
 acinclude.m4      |  1 +
 bpf/api.h         |  6 ++++++
 bpf/helpers.h     |  2 ++
 bpf/maps.h        | 12 ++++++++++++
 bpf/xdp.h         | 34 +++++++++++++++++++++++++++++-----
 lib/bpf.c         | 41 +++++++++++++++++++++++++++++++++++++----
 lib/bpf.h         |  6 ++++--
 vswitchd/bridge.c |  1 +
 8 files changed, 92 insertions(+), 11 deletions(-)

diff --git a/acinclude.m4 b/acinclude.m4
index 257de4e178a8..badc1e564487 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -376,6 +376,7 @@ AC_DEFUN([OVS_CHECK_BPF], [
     AC_DEFINE([HAVE_BPF], [1], [Define to 1 if BPF is available.])
 
     BPF_LDADD="-lbpf -lelf"
+    AC_DEFINE([AFXDP_NETDEV], [1], [System uses the AFXDP module.])
     AC_SUBST([BPF_LDADD])
   fi
 ])
diff --git a/bpf/api.h b/bpf/api.h
index f2db1f729157..15a44b744e6a 100644
--- a/bpf/api.h
+++ b/bpf/api.h
@@ -131,6 +131,12 @@
                   sizeof(uint32_t), pin, __NR_CPUS__)
 #endif
 
+#ifndef BPF_XSKMAP
+# define BPF_XSKMAP(name, max_elem)                              \
+        __BPF_MAP(name, BPF_MAP_TYPE_XSKMAP, 0, sizeof(int),    \
+                  sizeof(int), 1, max_elem)
+#endif
+
 /** Classifier helper */
 
 #ifndef BPF_H_DEFAULT
diff --git a/bpf/helpers.h b/bpf/helpers.h
index fc4c4933e189..424cc06bd6aa 100644
--- a/bpf/helpers.h
+++ b/bpf/helpers.h
@@ -163,6 +163,8 @@ static int (*bpf_skb_change_tail)(void *ctx, int len, int flags) =
     (void *) BPF_FUNC_skb_change_tail;
 static int (*bpf_get_hash_recalc)(void *ctx) =
     (void *) BPF_FUNC_get_hash_recalc;
+static int (*bpf_redirect_map)(void *map, int key, int flags) =
+    (void *) BPF_FUNC_redirect_map;
 
 static int OVS_UNUSED vlan_push(void *ctx, ovs_be16 proto, u16 tci)
 {
diff --git a/bpf/maps.h b/bpf/maps.h
index d0e39c79a098..63953f3b045f 100644
--- a/bpf/maps.h
+++ b/bpf/maps.h
@@ -153,6 +153,18 @@ BPF_PERCPU_ARRAY(percpu_executing_key,
         1
 );
 
+/* AF_XDP maps (BPF_MAP_TYPE_XSKMAP), one per AF_XDP netdev:
+ * key: the index passed to bpf_redirect_map(); by our design it can
+ *      be any value (xdp.h always uses 0),
+ * value: the AF_XDP socket fd that userspace inserts for the receive
+ *        queue it reads from.
+ * The only parameter is the maximum number of entries (queues).
+ */
+BPF_XSKMAP(xsks_map0, 4);
+BPF_XSKMAP(xsks_map1, 4);
+BPF_XSKMAP(xsks_map2, 4);
+BPF_XSKMAP(xsks_map3, 4);
+
 struct ebpf_headers_t;
 struct ebpf_metadata_t;
 
diff --git a/bpf/xdp.h b/bpf/xdp.h
index 15c379e7f43c..c007184e950a 100644
--- a/bpf/xdp.h
+++ b/bpf/xdp.h
@@ -68,10 +68,34 @@ static int xdp_ingress(struct xdp_md *ctx OVS_UNUSED)
 #endif
 }
 
-__section("af_xdp")
-static int af_xdp_ingress(struct xdp_md *ctx OVS_UNUSED)
+#define AFXDP_REDIRECT(xskmap) {                                    \
+    int idx = 0;                                                    \
+    int flags = 0;                                                  \
+    int len = (long)ctx->data_end - (long)ctx->data;                \
+    printt("ingress_ifindex %d rx_queue_index %d pkt len %d\n",     \
+           ctx->ingress_ifindex, ctx->rx_queue_index, len);         \
+    printt("send to queue xsk queue 0\n");                          \
+    return bpf_redirect_map(xskmap, idx, flags);                    \
+}
+
+/* For AFXDP, we need one map and one afxdp program per netdev */
+__section("afxdp0")
+static int af_xdp_ingress0(struct xdp_md *ctx)
 {
-    /* TODO: see xdpsock_kern.c ans xdpsock_user.c */
-    return XDP_PASS;
+    AFXDP_REDIRECT(&xsks_map0);
+}
+__section("afxdp1")
+static int af_xdp_ingress1(struct xdp_md *ctx)
+{
+    AFXDP_REDIRECT(&xsks_map1);
+}
+__section("afxdp2")
+static int af_xdp_ingress2(struct xdp_md *ctx)
+{
+    AFXDP_REDIRECT(&xsks_map2);
+}
+__section("afxdp3")
+static int af_xdp_ingress3(struct xdp_md *ctx)
+{
+    AFXDP_REDIRECT(&xsks_map3);
 }
-
diff --git a/lib/bpf.c b/lib/bpf.c
index 48c677e54659..d59ed1bf1e65 100644
--- a/lib/bpf.c
+++ b/lib/bpf.c
@@ -174,6 +174,7 @@ bpf_format_state(struct ds *ds, struct bpf_state *state)
     bpf_format_prog(ds, &state->egress);
     bpf_format_prog(ds, &state->ingress);
     bpf_format_prog(ds, &state->xdp);
+    //bpf_format_prog(ds, &state->afxdp);
 }
 
 /* Populates 'state' with the standard set of programs and maps for openvswitch
@@ -194,6 +195,10 @@ bpf_get(struct bpf_state *state, bool verbose)
         {&state->egress.fd, "egress/0"},
         {&state->downcall.fd, "downcall/0"},
         {&state->xdp.fd, "xdp/0"},
+        {&state->afxdp[0].fd, "afxdp0/0"},
+        {&state->afxdp[1].fd, "afxdp1/0"},
+        {&state->afxdp[2].fd, "afxdp2/0"},
+        {&state->afxdp[3].fd, "afxdp3/0"},
         /* BPF Maps */
         {&state->upcalls.fd, "upcalls"},
         {&state->flow_table.fd, "flow_table"},
@@ -201,6 +206,10 @@ bpf_get(struct bpf_state *state, bool verbose)
         {&state->tailcalls.fd, "tailcalls"},
         {&state->execute_actions.fd, "execute_actions"},
         {&state->dp_flow_stats.fd, "dp_flow_stats"},
+        {&state->xsks_map[0].fd, "xsks_map0"},
+        {&state->xsks_map[1].fd, "xsks_map1"},
+        {&state->xsks_map[2].fd, "xsks_map2"},
+        {&state->xsks_map[3].fd, "xsks_map3"},
     };
     int i, k, error = 0;
     char buf[BUFSIZ];
@@ -217,7 +226,7 @@ bpf_get(struct bpf_state *state, bool verbose)
         }
         error = bpf_obj_get(buf);
         if (error > 0) {
-            VLOG_DBG("Loaded BPF object at %s fd %d", buf, error);
+            VLOG_INFO("Loaded BPF object at %s fd %d", buf, error);
             *objs[i].fd = error;
             error = 0;
             continue;
@@ -229,7 +238,7 @@ bpf_get(struct bpf_state *state, bool verbose)
 
     prog_array_fd = state->tailcalls.fd;
 
-    VLOG_DBG("start loading/pinning program array\n");
+    VLOG_INFO("start loading/pinning program array\n");
     for (k = 0; k < BPF_MAX_PROG_ARRAY; k++) {
         struct stat s;
         int prog_fd;
@@ -243,7 +252,7 @@ bpf_get(struct bpf_state *state, bool verbose)
 
         prog_fd = bpf_obj_get(buf);
         if (prog_fd > 0) {
-            VLOG_DBG("Loaded BPF object at %s", buf);
+            VLOG_INFO("Loaded BPF object at %s", buf);
             state->tailarray[k].fd = prog_fd;
             error = bpf_map_update_elem(prog_array_fd, &k, &prog_fd, BPF_ANY);
             if (error < 0) {
@@ -280,9 +289,17 @@ bpf_get(struct bpf_state *state, bool verbose)
     state->downcall.name = xstrdup("ovs_cls_downcall");
     state->upcalls.name = xstrdup("upcalls");
     state->xdp.name = xstrdup("xdp");
+    state->afxdp[0].name = xstrdup("afxdp0");
+    state->afxdp[1].name = xstrdup("afxdp1");
+    state->afxdp[2].name = xstrdup("afxdp2");
+    state->afxdp[3].name = xstrdup("afxdp3");
     state->flow_table.name = xstrdup("flow_table");
     state->datapath_stats.name = xstrdup("datapath_stats");
     state->dp_flow_stats.name = xstrdup("dp_flow_stats");
+    state->xsks_map[0].name = xstrdup("xsks_map0");
+    state->xsks_map[1].name = xstrdup("xsks_map1");
+    state->xsks_map[2].name = xstrdup("xsks_map2");
+    state->xsks_map[3].name = xstrdup("xsks_map3");
 
     // add parser, lookup, action, deparser
     state->tailcalls.name = xstrdup("tailcalls");
@@ -309,17 +326,33 @@ bpf_put(struct bpf_state *state)
     xclose(state->downcall.fd, state->downcall.name);
     xclose(state->upcalls.fd, state->upcalls.name);
     xclose(state->xdp.fd, state->xdp.name);
+    xclose(state->afxdp[0].fd, state->afxdp[0].name);
+    xclose(state->afxdp[1].fd, state->afxdp[1].name);
+    xclose(state->afxdp[2].fd, state->afxdp[2].name);
+    xclose(state->afxdp[3].fd, state->afxdp[3].name);
     xclose(state->flow_table.fd, "ovs_map_flow_table");
     xclose(state->datapath_stats.fd, "ovs_datapath_stats");
     xclose(state->dp_flow_stats.fd, state->dp_flow_stats.name);
+    xclose(state->xsks_map[0].fd, state->xsks_map[0].name);
+    xclose(state->xsks_map[1].fd, state->xsks_map[1].name);
+    xclose(state->xsks_map[2].fd, state->xsks_map[2].name);
+    xclose(state->xsks_map[3].fd, state->xsks_map[3].name);
     free((void *)state->ingress.name);
     free((void *)state->egress.name);
     free((void *)state->downcall.name);
     free((void *)state->upcalls.name);
     free((void *)state->xdp.name);
+    free((void *)state->afxdp[0].name);
+    free((void *)state->afxdp[1].name);
+    free((void *)state->afxdp[2].name);
+    free((void *)state->afxdp[3].name);
     free((void *)state->flow_table.name);
     free((void *)state->datapath_stats.name);
     free((void *)state->dp_flow_stats.name);
+    free((void *)state->xsks_map[0].name);
+    free((void *)state->xsks_map[1].name);
+    free((void *)state->xsks_map[2].name);
+    free((void *)state->xsks_map[3].name);
 }
 
 static void
@@ -335,7 +368,7 @@ process(struct bpf_object *obj)
         int error;
 
         VLOG_DBG(" - %s\n", title);
-        if (strstr(title, "xdp")) {
+        if (strstr(title, "xdp")) { /* handle both xdp and afxdp */
             error = bpf_program__set_xdp(prog);
         } else {
             error = bpf_program__set_sched_cls(prog); // or sched_act?
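The lib/bpf.c changes above spell out the four AF_XDP programs and maps one index at a time in bpf_get() and bpf_put(). As a hedged alternative, not part of this series, the same naming and cleanup can be driven by a loop over MAX_AFXDP_DEV. The sketch below uses simplified stand-in structs instead of OVS's `struct bpf_prog`/`struct bpf_map`, and a local `dup_name()` helper in place of xstrdup(); `struct named_obj`, `struct afxdp_objs`, `dup_name()`, `afxdp_objs_init()` and `afxdp_objs_destroy()` are all hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_AFXDP_DEV 4   /* Same value as lib/bpf.h in this patch. */

/* Hypothetical stand-in for struct bpf_prog / struct bpf_map. */
struct named_obj {
    int fd;
    char *name;
};

/* Hypothetical subset of struct bpf_state holding the AF_XDP objects. */
struct afxdp_objs {
    struct named_obj afxdp[MAX_AFXDP_DEV];     /* "afxdp0" .. "afxdp3" */
    struct named_obj xsks_map[MAX_AFXDP_DEV];  /* "xsks_map0" .. "xsks_map3" */
};

/* Returns a heap copy of "<prefix><i>"; aborts on allocation failure,
 * as OVS's xstrdup() does. */
static char *
dup_name(const char *prefix, int i)
{
    char buf[32];
    size_t len;
    char *s;

    snprintf(buf, sizeof buf, "%s%d", prefix, i);
    len = strlen(buf) + 1;
    s = malloc(len);
    if (!s) {
        abort();
    }
    memcpy(s, buf, len);
    return s;
}

/* Loop form of the unrolled xstrdup() assignments in bpf_get().
 * fd = -1 marks an object that has not been loaded yet. */
static void
afxdp_objs_init(struct afxdp_objs *o)
{
    for (int i = 0; i < MAX_AFXDP_DEV; i++) {
        o->afxdp[i].fd = -1;
        o->afxdp[i].name = dup_name("afxdp", i);
        o->xsks_map[i].fd = -1;
        o->xsks_map[i].name = dup_name("xsks_map", i);
    }
}

/* Loop form of the unrolled free() calls in bpf_put(); closing the fds
 * (xclose() in the patch) is left out of this sketch. */
static void
afxdp_objs_destroy(struct afxdp_objs *o)
{
    for (int i = 0; i < MAX_AFXDP_DEV; i++) {
        free(o->afxdp[i].name);
        free(o->xsks_map[i].name);
        o->afxdp[i].name = NULL;
        o->xsks_map[i].name = NULL;
    }
}
```

With this shape, raising MAX_AFXDP_DEV only requires matching `afxdpN` sections and `xsks_mapN` maps on the eBPF side, rather than editing every unrolled line in bpf_get() and bpf_put().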
diff --git a/lib/bpf.h b/lib/bpf.h
index 4b5afaf4f77f..69091aa640d3 100644
--- a/lib/bpf.h
+++ b/lib/bpf.h
@@ -38,6 +38,7 @@ struct bpf_map {
 struct bpf_state;
 struct ds;
 
+#define MAX_AFXDP_DEV 4 /* Max number of supported AFXDP netdevs */
 #define BPF_MAX_PROG_ARRAY 64
 struct bpf_state {
     /* File descriptors for programs. */
@@ -46,14 +47,15 @@ struct bpf_state {
     struct bpf_prog downcall;       /* BPF_PROG_TYPE_SCHED_CLS */
     struct bpf_prog tailarray[BPF_MAX_PROG_ARRAY];
     struct bpf_prog xdp;            /* BPF_PROG_TYPE_XDP */
-    // william: struct bpf_prog parser, deparser, action,
-
+    struct bpf_prog afxdp[MAX_AFXDP_DEV]; /* BPF_PROG_TYPE_XDP:
+                                             each netdev needs one */
     struct bpf_map upcalls;         /* BPF_MAP_TYPE_PERF_ARRAY */
     struct bpf_map flow_table;      /* BPF_MAP_TYPE_HASH */
     struct bpf_map datapath_stats;  /* BPF_MAP_TYPE_ARRAY */
     struct bpf_map tailcalls;       /* BPF_PROG_TYPE_PROG_ARRARY */
     struct bpf_map execute_actions; /* BPF_MAP_TYPE_ARRAY */
     struct bpf_map dp_flow_stats;   /* BPF_MAP_TYPE_HASH */
+    struct bpf_map xsks_map[MAX_AFXDP_DEV]; /* BPF_MAP_TYPE_XSKMAP */
 };
 
 int bpf_get(struct bpf_state *state, bool verbose);
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index ca6d73810420..56711c657dd4 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -518,6 +518,7 @@ init_ebpf(const struct ovsrec_open_vswitch *ovs_cfg OVS_UNUSED)
     if (ovsthread_once_start(&once)) {
         char *bpf_elf = xasprintf("%s/bpf/datapath.o", ovs_pkgdatadir());
 
+        VLOG_DBG("%s bpf elf: %s", __func__, bpf_elf);
         error = bpf_init();
         if (!error) {
             error = bpf_load(bpf_elf);

From patchwork Sat Aug 18 00:29:35 2018
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 959160
From: William Tu
To: iovisor-dev@lists.iovisor.org, dev@openvswitch.org
Date: Fri, 17 Aug 2018 17:29:35 -0700
Message-Id: <1534552176-125735-3-git-send-email-u9012063@gmail.com>
In-Reply-To: <1534552176-125735-1-git-send-email-u9012063@gmail.com>
Subject: [ovs-dev] [PATCH RFC 2/3] netdev-linux: add new netdev type afxdp.

This patch creates a new netdev type, "afxdp", and copies part of the
AF_XDP API implementation from the Linux kernel sample xdpsock_user.c
(samples/bpf). The AF_XDP eBPF programs and maps are loaded when
dpif-netdev is initialized. When a user adds a netdev with
type="afxdp", OVS attaches the eBPF program and map to the netdev and
initializes the AF_XDP socket.

Signed-off-by: William Tu
---
 lib/automake.mk       |   3 +-
 lib/dpif-netdev.c     |  74 ++++-
 lib/if_xdp.h          |  79 ++++++
 lib/netdev-dummy.c    |   1 +
 lib/netdev-linux.c    | 741 +++++++++++++++++++++++++++++++++++++++++++++++++-
 lib/netdev-provider.h |   2 +
 lib/netdev-vport.c    |   4 +
 lib/netdev.c          |  11 +
 lib/netdev.h          |   1 +
 9 files changed, 907 insertions(+), 9 deletions(-)
 create mode 100644 lib/if_xdp.h

diff --git a/lib/automake.mk b/lib/automake.mk
index 61fef23152d3..e0528f74989f 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -302,7 +302,8 @@ lib_libopenvswitch_la_SOURCES = \
 	lib/lldp/lldpd.c \
 	lib/lldp/lldpd.h \
 	lib/lldp/lldpd-structs.c \
-	lib/lldp/lldpd-structs.h
+	lib/lldp/lldpd-structs.h \
+	lib/if_xdp.h
 
 if WIN32
 lib_libopenvswitch_la_SOURCES += \
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index baff020fe3d0..9f0300ac4e91 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -76,6 +76,11 @@
 #include "unixctl.h"
 #include "util.h"
 
+#include "bpf.h"
+#include "netdev.h"
+#include "openvswitch/thread.h"
+#include
+
 VLOG_DEFINE_THIS_MODULE(dpif_netdev);
 
 #define FLOW_DUMP_MAX_BATCH 50
@@ -507,6 +512,12 @@ struct tx_port {
     struct dp_netdev_rxq *output_pkts_rxqs[NETDEV_MAX_BURST];
 };
 
+static struct dp_bpf {
+    struct bpf_state bpf;
+    struct netdev *outport; /* Used for downcall. */
+} bpf_datapath;
+
+
 /* A set of properties for the current processing loop that is not directly
  * associated with the pmd thread itself, but with the packets being
  * processed or the short-term system configuration (for example, time).
@@ -1121,6 +1132,8 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
 static int
 dpif_netdev_init(void)
 {
+    int error;
+    static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
     static enum pmd_info_type show_aux = PMD_INFO_SHOW_STATS,
                               clear_aux = PMD_INFO_CLEAR_STATS,
                               poll_aux = PMD_INFO_SHOW_RXQ;
@@ -1137,6 +1150,17 @@ dpif_netdev_init(void)
     unixctl_command_register("dpif-netdev/pmd-rxq-rebalance", "[dp]",
                              0, 1, dpif_netdev_pmd_rebalance,
                              NULL);
+
+    // load the bpf program
+    if (ovsthread_once_start(&once)) {
+        // we don't need downcall device here
+        error = bpf_get(&bpf_datapath.bpf, true);
+        if (error) {
+            VLOG_ERR("%s: Load BPF datapath failed", __func__);
+        }
+        ovsthread_once_done(&once);
+    }
+
     return 0;
 }
 
@@ -1504,7 +1528,26 @@ dp_netdev_reload_pmd__(struct dp_netdev_pmd_thread *pmd)
         ovs_mutex_cond_wait(&pmd->cond, &pmd->cond_mutex);
     ovs_mutex_unlock(&pmd->cond_mutex);
 }
-
+/*
+static bool output_to_local_stack(struct netdev *netdev)
+{
+    return !strcmp(netdev_get_type(netdev), "tap");
+}
+*/
+static bool netdev_support_xdp(const char *devname)
+{
+    /*
+    struct netdev_linux *netdev_linux = netdev_linux_cast(netdev_linux);
+    if (netdev_linux->ifindex == 0)
+        return false;
+*/
+    if (!strstr(devname, "afxdp")) {
+        return false;
+    } else {
+        return true;
+    }
+}
+static int afxdp_idx;
 static int
 port_create(const char *devname, const char *type,
             odp_port_t port_no, struct dp_netdev_port **portp)
@@ -1519,7 +1562,7 @@ port_create(const char *devname, const char *type,
 
     /* Open and validate network device. */
     error = netdev_open(devname, type, &netdev);
-    VLOG_INFO("%s %s error %d", __func__, devname, error);
+    VLOG_INFO("%s %s type = %s error %d", __func__, devname, type, error);
     if (error) {
         return error;
     }
@@ -1538,6 +1581,23 @@ port_create(const char *devname, const char *type,
         goto out;
     }
 
+    if (!strcmp(type, "afxdp")) {
+        // or a separate set_af_xdp?
+        // FIXME:
+        VLOG_INFO("using afxdp port idx %d", afxdp_idx);
+        error = netdev_set_xdp(netdev, &bpf_datapath.bpf.afxdp[afxdp_idx]);
+        if (error) {
+            VLOG_WARN("%s XDP set failed", __func__);
+            goto out;
+        }
+        error = netdev_set_xskmap(netdev, bpf_datapath.bpf.xsks_map[afxdp_idx].fd);
+        if (error) {
+            VLOG_ERR("%s XSK map set error\n", __func__);
+            goto out;
+        }
+        afxdp_idx++;
+    }
+
     port = xzalloc(sizeof *port);
     port->port_no = port_no;
     port->netdev = netdev;
@@ -5008,8 +5068,11 @@ emc_processing(struct dp_netdev_pmd_thread *pmd,
         struct dp_netdev_flow *flow;
 
         if (OVS_UNLIKELY(dp_packet_size(packet) < ETH_HEADER_LEN)) {
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5);
+
+            VLOG_ERR_RL(&rl, "%s dropped packet size %"PRIu32, __func__, dp_packet_size(packet));
             dp_packet_delete(packet);
             n_dropped++;
             continue;
         }
 
@@ -5254,6 +5317,13 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
     n_batches = 0;
     emc_processing(pmd, packets, keys, batches, &n_batches,
                             md_is_valid, port_no);
+/*
+    if (dp_packet_batch_is_empty(packets)) {
+        VLOG_WARN("%s: batch is empty ", __func__);
+    } else {
+        VLOG_WARN("%s: batch is %lu ", __func__, packets->count);
+    }
+*/
     if (!dp_packet_batch_is_empty(packets)) {
         /* Get ingress port from first packet's metadata. */
         in_port = packets->packets[0]->md.in_port.odp_port;
diff --git a/lib/if_xdp.h b/lib/if_xdp.h
new file mode 100644
index 000000000000..2a8c5780166f
--- /dev/null
+++ b/lib/if_xdp.h
@@ -0,0 +1,79 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * if_xdp: XDP socket user-space interface
+ * Copyright(c) 2018 Intel Corporation.
+ *
+ * Author(s): Björn Töpel
+ *            Magnus Karlsson
+ */
+
+#ifndef _LINUX_IF_XDP_H
+#define _LINUX_IF_XDP_H
+
+#include
+#include
+
+/* Options for the sxdp_flags field */
+#define XDP_SHARED_UMEM (1 << 0)
+#define XDP_COPY        (1 << 1) /* Force copy-mode */
+#define XDP_ZEROCOPY    (1 << 2) /* Force zero-copy mode */
+
+struct sockaddr_xdp {
+    __u16 sxdp_family;
+    __u16 sxdp_flags;
+    __u32 sxdp_ifindex;
+    __u32 sxdp_queue_id;
+    __u32 sxdp_shared_umem_fd;
+};
+
+struct xdp_ring_offset {
+    __u64 producer;
+    __u64 consumer;
+    __u64 desc;
+};
+
+struct xdp_mmap_offsets {
+    struct xdp_ring_offset rx;
+    struct xdp_ring_offset tx;
+    struct xdp_ring_offset fr; /* Fill */
+    struct xdp_ring_offset cr; /* Completion */
+};
+
+/* XDP socket options */
+#define XDP_MMAP_OFFSETS         1
+#define XDP_RX_RING              2
+#define XDP_TX_RING              3
+#define XDP_UMEM_REG             4
+#define XDP_UMEM_FILL_RING       5
+#define XDP_UMEM_COMPLETION_RING 6
+#define XDP_STATISTICS           7
+
+struct xdp_umem_reg {
+    __u64 addr; /* Start of packet data area */
+    __u64 len; /* Length of packet data area */
+    __u32 chunk_size;
+    __u32 headroom;
+};
+
+struct xdp_statistics {
+    __u64 rx_dropped; /* Dropped for reasons other than invalid desc */
+    __u64 rx_invalid_descs; /* Dropped due to invalid descriptor */
+    __u64 tx_invalid_descs; /* Dropped due to invalid descriptor */
+};
+
+/* Pgoff for mmaping the rings */
+#define XDP_PGOFF_RX_RING              0
+#define XDP_PGOFF_TX_RING     0x80000000
+#define XDP_UMEM_PGOFF_FILL_RING       0x100000000ULL
+#define XDP_UMEM_PGOFF_COMPLETION_RING 0x180000000ULL
+
+/* Rx/Tx descriptor */
+struct xdp_desc {
+    __u64 addr;
+    __u32 len;
+    __u32 options;
+};
+
+/* UMEM descriptor is __u64 */
+
+#endif /* _LINUX_IF_XDP_H */
diff --git a/lib/netdev-dummy.c b/lib/netdev-dummy.c
index 44c9458a9a22..c7a065ed7ba8 100644
--- a/lib/netdev-dummy.c
+++ b/lib/netdev-dummy.c
@@ -1429,6 +1429,7 @@ netdev_dummy_update_flags(struct netdev *netdev_,
     NULL,                       /* set_policing */              \
     NULL,                       /* set_filter */                \
     NULL,                       /* set_xdp */                   \
+    NULL,                       /* set_xskmap */                \
     NULL,                       /* get_qos_types */             \
     NULL,                       /* get_qos_capabilities */      \
     NULL,                       /* get_qos */                   \
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 121dd3bc738e..6546ff88aee6 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -88,6 +88,519 @@ COVERAGE_DEFINE(netdev_set_hwaddr);
 COVERAGE_DEFINE(netdev_get_ethtool);
 COVERAGE_DEFINE(netdev_set_ethtool);
 
+#ifdef AFXDP_NETDEV
+// =========================================================
+#ifndef SOL_XDP
+#define SOL_XDP 283
+#endif
+
+#ifndef AF_XDP
+#define AF_XDP 44
+#endif
+
+#ifndef PF_XDP
+#define PF_XDP AF_XDP
+#endif
+
+#define NUM_FRAMES 128
+#define FRAME_HEADROOM 0
+#define FRAME_SIZE 2048
+#define NUM_DESCS 32
+
+#define FQ_NUM_DESCS 32
+#define CQ_NUM_DESCS 32
+
+#define DEBUG_HEXDUMP 0
+
+typedef __u32 u32;
+typedef uint64_t u64;
+
+#include "lib/xdpsock.h"
+static u32 opt_xdp_flags; // now always set to SKB_MODE at bpf_set_link_xdp_fd
+static u32 opt_xdp_bind_flags;
+
+struct xdp_uqueue {
+    u32 cached_prod;
+    u32 cached_cons;
+    u32 mask;
+    u32 size;
+    u32 *producer;
+    u32 *consumer;
+    struct xdp_desc *ring;
+    void *map;
+};
+
+struct xdpsock {
+    struct xdp_uqueue rx;
+    struct xdp_uqueue tx;
+    int sfd;
+    struct xdp_umem *umem;
+    u32 outstanding_tx;
+    unsigned long rx_npkts;
+    unsigned long tx_npkts;
+    unsigned long prev_rx_npkts;
+    unsigned long prev_tx_npkts;
+};
+
+#define MAX_SOCKS 4
+
+#define barrier() __asm__ __volatile__("": : :"memory")
+#define u_smp_rmb() barrier()
+#define u_smp_wmb() barrier()
+#define likely(x) __builtin_expect(!!(x), 1)
+#define unlikely(x) __builtin_expect(!!(x), 0)
+
+static const char pkt_data[] =
+    "\x3c\xfd\xfe\x9e\x7f\x71\xec\xb1\xd7\x98\x3a\xc0\x08\x00\x45\x00"
+    "\x00\x2e\x00\x00\x00\x00\x40\x11\x88\x97\x05\x08\x07\x08\xc8\x14"
+    "\x1e\x04\x10\x92\x10\x92\x00\x1a\x6d\xa3\x34\x33\x1f\x69\x40\x6b"
+    "\x54\x59\xb6\x14\x2d\x11\x44\xbf\xaf\xd9\xbe\xaa";
+
+static inline u32 xq_nb_avail(struct xdp_uqueue *q, u32 ndescs)
+{
+    u32 entries = q->cached_prod - q->cached_cons;
+
+    if (entries == 0) {
+        q->cached_prod = *q->producer;
+        entries = q->cached_prod - q->cached_cons;
+    }
+
+    return (entries > ndescs) ? ndescs : entries;
+}
+
+static inline u32 umem_nb_free(struct xdp_umem_uqueue *q, u32 nb)
+{
+    u32 free_entries = q->cached_cons - q->cached_prod;
+    VLOG_INFO("0: %s cons %d prod %d\n", __func__, q->cached_cons, q->cached_prod);
+
+    if (free_entries >= nb)
+        return free_entries;
+
+    /* Refresh the local tail pointer (indexes are free-running, so no mask) */
+    q->cached_cons = *q->consumer + q->size;
+
+    VLOG_INFO("%s cons %d prod %d\n", __func__, q->cached_cons, q->cached_prod);
+    VLOG_INFO("consumer %d, size %d\n", *q->consumer, q->size);
+
+    return q->cached_cons - q->cached_prod;
+}
+
+static inline int umem_fill_to_kernel_ex(struct xdp_umem_uqueue *fq,
+                                         struct xdp_desc *d,
+                                         size_t nb)
+{
+    u32 i;
+
+    VLOG_INFO("%s nb = %"PRIuSIZE, __func__, nb);
+    if (umem_nb_free(fq, nb) < nb) {
+        VLOG_ERR("%s error\n", __func__);
+        return -ENOSPC;
+    }
+
+    for (i = 0; i < nb; i++) {
+        u32 idx = fq->cached_prod++ & fq->mask;
+
+        fq->ring[idx] = d[i].addr;
+    }
+
+    u_smp_wmb();
+
+    *fq->producer = fq->cached_prod;
+
+    VLOG_INFO("%s producer at %d\n", __func__, *fq->producer);
+    return 0;
+}
+
+static inline int umem_fill_to_kernel(struct xdp_umem_uqueue *fq, uint64_t *d,
+                                      size_t nb)
+{
+    u32 i;
+
+    if (umem_nb_free(fq, nb) < nb) {
+        VLOG_ERR("%s error\n", __func__);
+        return -ENOSPC;
+    }
+
+    for (i = 0; i < nb; i++) {
+        u32 idx = fq->cached_prod++ & fq->mask;
+
+        fq->ring[idx] = d[i];
+    }
+
+    u_smp_wmb();
+
+    *fq->producer = fq->cached_prod;
+
+    VLOG_INFO("%s producer at %d\n", __func__, *fq->producer);
+    return 0;
+}
+
+static inline u32 umem_nb_avail(struct xdp_umem_uqueue *q, u32 nb)
+{
+    u32 entries = q->cached_prod - q->cached_cons;
+
+    if (entries == 0) {
+        q->cached_prod = *q->producer;
+        entries = q->cached_prod - q->cached_cons;
+    }
+
+    return (entries > nb) ? nb : entries;
+}
+
+static inline size_t umem_complete_from_kernel(struct xdp_umem_uqueue *cq,
+                                               uint64_t *d, size_t nb)
+{
+    u32 idx, i, entries = umem_nb_avail(cq, nb);
+
+    u_smp_rmb();
+
+    for (i = 0; i < entries; i++) {
+        idx = cq->cached_cons++ & cq->mask;
+        d[i] = cq->ring[idx];
+    }
+
+    if (entries > 0) {
+        u_smp_wmb();
+
+        *cq->consumer = cq->cached_cons;
+    }
+
+    return entries;
+}
+
+static struct xdp_umem *xdp_umem_configure(int sfd)
+{
+    int fq_size = FQ_NUM_DESCS, cq_size = CQ_NUM_DESCS;
+    struct xdp_mmap_offsets off;
+    struct xdp_umem_reg mr;
+    struct xdp_umem *umem;
+    socklen_t optlen;
+    void *bufs;
+
+    umem = calloc(1, sizeof(*umem));
+    ovs_assert(umem);
+
+    VLOG_DBG("enter: %s \n", __func__);
+    ovs_assert(posix_memalign(&bufs, getpagesize(), /* PAGE_SIZE aligned */
+                              NUM_FRAMES * FRAME_SIZE) == 0);
+
+    VLOG_INFO("%s shared umem from %p to %p", __func__,
+              bufs, (char*)bufs + NUM_FRAMES * FRAME_SIZE);
+
+    mr.addr = (__u64)bufs;
+    mr.len = NUM_FRAMES * FRAME_SIZE;
+    mr.chunk_size = FRAME_SIZE;
+    mr.headroom = FRAME_HEADROOM;
+
+    ovs_assert(setsockopt(sfd, SOL_XDP, XDP_UMEM_REG, &mr, sizeof(mr)) == 0);
+    ovs_assert(setsockopt(sfd, SOL_XDP, XDP_UMEM_FILL_RING, &fq_size,
+                          sizeof(int)) == 0);
+    ovs_assert(setsockopt(sfd, SOL_XDP, XDP_UMEM_COMPLETION_RING, &cq_size,
+                          sizeof(int)) == 0);
+
+    optlen = sizeof(off);
+    ovs_assert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
+                          &optlen) == 0);
+
+    umem->fq.map = mmap(0, off.fr.desc +
+                        FQ_NUM_DESCS * sizeof(u64),
+                        PROT_READ | PROT_WRITE,
+                        MAP_SHARED | MAP_POPULATE, sfd,
+                        XDP_UMEM_PGOFF_FILL_RING);
+    ovs_assert(umem->fq.map != MAP_FAILED);
+
+    umem->fq.mask = FQ_NUM_DESCS - 1;
+    umem->fq.size = FQ_NUM_DESCS;
+    umem->fq.producer = (void *)((char *)umem->fq.map + off.fr.producer);
+    umem->fq.consumer = (void *)((char *)umem->fq.map + off.fr.consumer);
+    umem->fq.ring = (void *)((char *)umem->fq.map + off.fr.desc);
+    umem->fq.cached_cons = FQ_NUM_DESCS;
+
+    umem->cq.map = mmap(0, off.cr.desc +
+                        CQ_NUM_DESCS * sizeof(u64),
+                        PROT_READ | PROT_WRITE,
+                        MAP_SHARED | MAP_POPULATE, sfd,
+                        XDP_UMEM_PGOFF_COMPLETION_RING);
+    ovs_assert(umem->cq.map != MAP_FAILED);
+
+    umem->cq.mask = CQ_NUM_DESCS - 1;
+    umem->cq.size = CQ_NUM_DESCS;
+    umem->cq.producer = umem->cq.map + off.cr.producer;
+    umem->cq.consumer = umem->cq.map + off.cr.consumer;
+    umem->cq.ring = umem->cq.map + off.cr.desc;
+
+    umem->frames = bufs;
+    umem->fd = sfd;
+
+#if 0
+    if (opt_bench == BENCH_TXONLY) {
+        int i;
+
+        for (i = 0; i < NUM_FRAMES; i++)
+            (void)gen_eth_frame(&umem->frames[i][0]);
+    }
+#endif
+    return umem;
+}
+
+static struct xdpsock *xsk_configure(struct xdp_umem *umem,
+                                     int ifindex, int queue)
+{
+    struct sockaddr_xdp sxdp = {};
+    struct xdp_mmap_offsets off;
+    int sfd, ndescs = NUM_DESCS;
+    struct xdpsock *xsk;
+    bool shared = false;
+    socklen_t optlen;
+    u64 i;
+
+    opt_xdp_flags |= XDP_FLAGS_SKB_MODE;
+    opt_xdp_bind_flags |= XDP_COPY;
+
+    sfd = socket(PF_XDP, SOCK_RAW, 0);
+    ovs_assert(sfd >= 0);
+
+    xsk = calloc(1, sizeof(*xsk));
+    ovs_assert(xsk);
+
+    xsk->sfd = sfd;
+    xsk->outstanding_tx = 0;
+
+    VLOG_DBG("enter: %s xsk fd %d", __func__, sfd);
+    if (!umem) {
+        shared = false;
+        xsk->umem = xdp_umem_configure(sfd);
+    } else {
+        xsk->umem = umem;
+        ovs_assert(0);
+    }
+
+    ovs_assert(setsockopt(sfd, SOL_XDP, XDP_RX_RING,
+                          &ndescs, sizeof(int)) == 0);
+    ovs_assert(setsockopt(sfd, SOL_XDP, XDP_TX_RING,
+                          &ndescs, sizeof(int)) == 0);
+    optlen = sizeof(off);
+    ovs_assert(getsockopt(sfd, SOL_XDP, XDP_MMAP_OFFSETS, &off,
+                          &optlen) == 0);
+
+    /* Rx */
+    xsk->rx.map = mmap(NULL,
+                       off.rx.desc +
+                       NUM_DESCS * sizeof(struct xdp_desc),
+                       PROT_READ | PROT_WRITE,
+                       MAP_SHARED | MAP_POPULATE, sfd,
+                       XDP_PGOFF_RX_RING);
+    ovs_assert(xsk->rx.map != MAP_FAILED);
+
+    if (!shared) {
+        for (i = 0; i < NUM_DESCS * FRAME_SIZE; i += FRAME_SIZE)
+            ovs_assert(umem_fill_to_kernel(&xsk->umem->fq, &i, 1)
+                       == 0);
+    }
+
+    // FIXME: we also configure tx here
+    /* Tx */
+    xsk->tx.map = mmap(NULL,
+                       off.tx.desc +
+                       NUM_DESCS * sizeof(struct xdp_desc),
+                       PROT_READ | PROT_WRITE,
+                       MAP_SHARED | MAP_POPULATE, sfd,
+                       XDP_PGOFF_TX_RING);
+    ovs_assert(xsk->tx.map != MAP_FAILED);
+
+    xsk->rx.mask = NUM_DESCS - 1;
+    xsk->rx.size = NUM_DESCS;
+    xsk->rx.producer = xsk->rx.map + off.rx.producer;
+    xsk->rx.consumer = xsk->rx.map + off.rx.consumer;
+    xsk->rx.ring = xsk->rx.map + off.rx.desc;
+
+    xsk->tx.mask = NUM_DESCS - 1;
+    xsk->tx.size = NUM_DESCS;
+    xsk->tx.producer = xsk->tx.map + off.tx.producer;
+    xsk->tx.consumer = xsk->tx.map + off.tx.consumer;
+    xsk->tx.ring = xsk->tx.map + off.tx.desc;
+    xsk->tx.cached_cons = NUM_DESCS;
+
+    /* XSK socket */
+    sxdp.sxdp_family = PF_XDP;
+    sxdp.sxdp_ifindex = ifindex;
+    sxdp.sxdp_queue_id = queue;
+
+    if (shared) {
+        sxdp.sxdp_flags = XDP_SHARED_UMEM;
+        sxdp.sxdp_shared_umem_fd = umem->fd;
+    } else {
+        sxdp.sxdp_flags = opt_xdp_bind_flags;
+    }
+
+    ovs_assert(bind(sfd, (struct sockaddr *)&sxdp, sizeof(sxdp)) == 0);
+
+    return xsk;
+}
+
+static inline int xq_deq(struct xdp_uqueue *uq,
+                         struct xdp_desc *descs,
+                         int ndescs)
+{
+    struct xdp_desc *r = uq->ring;
+    unsigned int idx;
+    int i, entries;
+
+    entries = xq_nb_avail(uq, ndescs);
+
+    u_smp_rmb();
+
+    for (i = 0; i < entries; i++) {
+        idx = uq->cached_cons++ & uq->mask;
+        descs[i] = r[idx];
+    }
+
+    if (entries > 0) {
+        u_smp_wmb();
+
+        *uq->consumer = uq->cached_cons;
+        VLOG_INFO("%s entries %d consumer %d\n", __func__, entries, *uq->consumer);
+    }
+    return entries;
+}
+
+static inline void *xq_get_data(struct xdpsock *xsk, u64 addr)
+{
+    return &xsk->umem->frames[addr];
+}
+
+static void vlog_hex_dump(const void *buf, size_t count)
+{
+    struct ds ds = DS_EMPTY_INITIALIZER;
+    ds_put_hex_dump(&ds, buf, count, 0, false);
+    VLOG_INFO("\n%s", ds_cstr(&ds));
+    ds_destroy(&ds);
+}
+
+static void kick_tx(int fd)
+{
+    int ret;
+
+    VLOG_DBG("%s: send to fd %d", __func__, fd);
+    ret = sendto(fd, NULL, 0, MSG_DONTWAIT, NULL, 0);
+    if (ret >= 0 || errno == ENOBUFS || errno == EAGAIN)
+        return;
+
ovs_assert(0); +} + +static inline void complete_tx_l2fwd(struct xdpsock *xsk) +{ + u64 descs[BATCH_SIZE]; + unsigned int rcvd; + size_t ndescs; + + if (!xsk->outstanding_tx) + return; + + kick_tx(xsk->sfd); + ndescs = (xsk->outstanding_tx > BATCH_SIZE) ? BATCH_SIZE : + xsk->outstanding_tx; + + /* re-add completed Tx buffers */ + rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, ndescs); + + if (rcvd > 0) { + umem_fill_to_kernel(&xsk->umem->fq, descs, rcvd); + xsk->outstanding_tx -= rcvd; + xsk->tx_npkts += rcvd; + } +} + +static inline void complete_tx_only(struct xdpsock *xsk) +{ + u64 descs[BATCH_SIZE]; + unsigned int rcvd; + + if (!xsk->outstanding_tx) { + VLOG_DBG("no outstanding_tx"); + return; + } + + kick_tx(xsk->sfd); + + rcvd = umem_complete_from_kernel(&xsk->umem->cq, descs, BATCH_SIZE); + if (rcvd > 0) { + xsk->outstanding_tx -= rcvd; + xsk->tx_npkts += rcvd; + } +} + +static inline u32 xq_nb_free(struct xdp_uqueue *q, u32 ndescs) +{ + u32 free_entries = q->cached_cons - q->cached_prod; + + if (free_entries >= ndescs) + return free_entries; + + /* Refresh the local tail pointer */ + q->cached_cons = *q->consumer + q->size; + return q->cached_cons - q->cached_prod; +} + +static inline int xq_enq(struct xdp_uqueue *uq, + const struct xdp_desc *descs, + unsigned int ndescs) +{ + struct xdp_desc *r = uq->ring; + unsigned int i; + + if (xq_nb_free(uq, ndescs) < ndescs) + return -ENOSPC; + + for (i = 0; i < ndescs; i++) { + u32 idx = uq->cached_prod++ & uq->mask; + + r[idx].addr = descs[i].addr; + r[idx].len = descs[i].len; + } + + u_smp_wmb(); + + *uq->producer = uq->cached_prod; + return 0; +} + +static inline int xq_enq_tx_only(struct xdp_uqueue *uq, + unsigned int id, unsigned int ndescs) +{ + struct xdp_desc *r = uq->ring; + unsigned int i; + + if (xq_nb_free(uq, ndescs) < ndescs) + return -ENOSPC; + + for (i = 0; i < ndescs; i++) { + u32 idx = uq->cached_prod++ & uq->mask; + + r[idx].addr = (id + i) << FRAME_SHIFT; + r[idx].len = sizeof(pkt_data) 
- 1; + } + + u_smp_wmb(); + + *uq->producer = uq->cached_prod; + return 0; +} + +static inline void print_xsk_stat(struct xdpsock *xsk) { + struct xdp_statistics stat; + socklen_t optlen; + + optlen = sizeof(stat); + ovs_assert(getsockopt(xsk->sfd, SOL_XDP, XDP_STATISTICS, + &stat, &optlen) == 0); + + VLOG_INFO("rx dropped %llu, rx_invalid %llu, tx_invalid %llu", + stat.rx_dropped, stat.rx_invalid_descs, stat.tx_invalid_descs); + +} +// ========================================================= +#endif /* These were introduced in Linux 2.6.14, so they might be missing if we have * old headers. */ @@ -522,6 +1035,8 @@ struct netdev_linux { int tap_fd; bool present; /* If the device is present in the namespace */ uint64_t tx_dropped; /* tap device can drop if the iface is down */ + struct xdpsock *xsk[16]; /* af_xdp socket: each queue has one xdp sock */ + int xskmap_fd; /* map netdev's queue id to xsk fd */ }; struct netdev_rxq_linux { @@ -571,6 +1086,12 @@ is_netdev_linux_class(const struct netdev_class *netdev_class) } static bool +is_afxdp_netdev(const struct netdev *netdev) +{ + return netdev_get_class(netdev) == &netdev_afxdp_class; +} + +static bool is_tap_netdev(const struct netdev *netdev) { return netdev_get_class(netdev) == &netdev_tap_class; @@ -921,6 +1442,13 @@ netdev_linux_destruct(struct netdev *netdev_) atomic_count_dec(&miimon_cnt); } + if (is_afxdp_netdev(netdev_)) { + int ifindex; + + get_ifindex(netdev_, &ifindex); + bpf_set_link_xdp_fd(ifindex, -1, XDP_FLAGS_SKB_MODE); + } + ovs_mutex_destroy(&netdev->mutex); } @@ -950,6 +1478,44 @@ netdev_linux_rxq_construct(struct netdev_rxq *rxq_) rx->is_tap = is_tap_netdev(netdev_); if (rx->is_tap) { rx->fd = netdev->tap_fd; + } else if (is_afxdp_netdev(netdev_)) { + // setup AF_XDP socket here, see xsk_configure + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; + int ifindex, num_socks = 0; + struct xdpsock *xsk; + int queue_id = 0; // FIXME + int key = 0; + int xsk_fd; + + if (setrlimit(RLIMIT_MEMLOCK, 
&r)) { + VLOG_ERR("ERROR: setrlimit(RLIMIT_MEMLOCK) \"%s\"\n", + ovs_strerror(errno)); + ovs_assert(0); + } + + VLOG_INFO("%s: %s: queue=%d configuring xdp sock", + __func__, netdev_->name, queue_id); + + /* Get ethernet device index. */ + error = get_ifindex(&netdev->up, &ifindex); + if (error) { + goto error; + } + + xsk = xsk_configure(NULL, ifindex, queue_id); + + netdev->xsk[num_socks++] = xsk; + rx->fd = xsk->sfd; //for upper layer to poll + xsk_fd = xsk->sfd; + + if (xsk_fd) { + error = bpf_map_update_elem(netdev->xskmap_fd, &key, &xsk_fd, 0); + if (error) { + VLOG_ERR("failed to set xsks_map: %s", ovs_strerror(error)); + return error; + } + } + } else { struct sockaddr_ll sll; int ifindex, val; @@ -1149,6 +1715,58 @@ netdev_linux_rxq_recv_tap(int fd, struct dp_packet *buffer) return 0; } +/* Receive packet from AF_XDP socket */ +static int +netdev_linux_rxq_xsk(struct xdpsock *xsk, + struct dp_packet_batch *batch) +{ + struct xdp_desc descs[NETDEV_MAX_BURST]; + unsigned int rcvd, i = 0; + int ret = 0; + + rcvd = xq_deq(&xsk->rx, descs, NETDEV_MAX_BURST); + if (rcvd == 0) { + return 0; + } + + VLOG_INFO("%s receive %d packets xsk fd %d", + __func__, rcvd, xsk->sfd); + + for (i = 0; i < rcvd; i++) { + struct dp_packet *packet; + void *base, *new_packet; + + packet = xmalloc(sizeof *packet); + + VLOG_INFO("%s packet len %d", __func__, descs[i].len); + base = xq_get_data(xsk, descs[i].addr); + + //vlog_hex_dump(base, 14); + new_packet = malloc(2048); + memcpy(new_packet, base, descs[i].len); + + //dp_packet_use(packet, base, descs[i].len); + dp_packet_use(packet, new_packet, descs[i].len); + + packet->source = DPBUF_MALLOC; + //dp_packet_set_data(packet, base); // no offset now? + dp_packet_set_data(packet, new_packet); // no offset now? 
+        dp_packet_set_size(packet, descs[i].len);
+
+        /* add packet into batch, batch->count inc */
+        dp_packet_batch_add(batch, packet);
+    }
+
+    xsk->rx_npkts += rcvd;
+    umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
+
+    //batch->count = rcvd; // batch_add inc the counter
+    //don't put it back to FILL queue yet.
+
+    print_xsk_stat(xsk);
+    return ret;
+}
+
 static int
 netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch)
 {
@@ -1157,6 +1775,8 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch)
     struct dp_packet *buffer;
     ssize_t retval;
     int mtu;
+    struct netdev_linux *netdev_ = netdev_linux_cast(netdev);
+
 
     if (netdev_linux_get_mtu__(netdev_linux_cast(netdev), &mtu)) {
         mtu = ETH_PAYLOAD_MAX;
@@ -1166,15 +1786,20 @@ netdev_linux_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch)
     buffer = dp_packet_new_with_headroom(VLAN_ETH_HEADER_LEN + mtu,
                                          DP_NETDEV_HEADROOM);
     retval = (rx->is_tap
-              ? netdev_linux_rxq_recv_tap(rx->fd, buffer)
-              : netdev_linux_rxq_recv_sock(rx->fd, buffer));
-
+              ? netdev_linux_rxq_recv_tap(rx->fd, buffer) :
+              (is_afxdp_netdev(netdev) ? netdev_linux_rxq_xsk(netdev_->xsk[0], batch) :
+              netdev_linux_rxq_recv_sock(rx->fd, buffer)));
     if (retval) {
         if (retval != EAGAIN && retval != EMSGSIZE) {
             VLOG_WARN_RL(&rl, "error receiving Ethernet packet on %s: %s",
                          netdev_rxq_get_name(rxq_), ovs_strerror(errno));
         }
         dp_packet_delete(buffer);
+    } else if (is_afxdp_netdev(netdev)) {
+        dp_packet_batch_init_packet_fields(batch);
+        dp_packet_delete(buffer);   /* Not used on the AF_XDP path. */
+        if (batch->count != 0)
+            VLOG_INFO("%s AFXDP recv %"PRIuSIZE" packets", __func__, batch->count);
     } else {
         dp_packet_batch_init_packet(batch, buffer);
     }
@@ -1208,6 +1833,66 @@ netdev_linux_rxq_drain(struct netdev_rxq *rxq_)
 }
 
 static int
+netdev_linux_afxdp_batch_send(struct xdpsock *xsk, /* send to xdp socket! */
+                              struct dp_packet_batch *batch)
+{
+    struct dp_packet *packet;
+    struct xdp_uqueue *uq;
+    struct xdp_desc *r;
+    int ndescs = batch->count;
+    u32 id = NUM_FRAMES / 2;
+
+    VLOG_INFO("%s send %"PRIuSIZE" packets to fd %d", __func__, batch->count, xsk->sfd);
+    VLOG_INFO("%s outstanding tx %d", __func__, xsk->outstanding_tx);
+
+    /* cleanup and refill */
+    uq = &xsk->tx;
+    r = uq->ring;
+
+    // see tx_only and xq_enq_tx_only
+    if (xq_nb_free(uq, ndescs) < ndescs) {
+        VLOG_ERR("no free desc");
+        return ENOSPC;
+    }
+
+    DP_PACKET_BATCH_FOR_EACH (packet, batch) {
+        void *umem_buf;
+
+        u32 idx = uq->cached_prod++ & uq->mask;
+        // FIXME: find available id
+        umem_buf = xsk->umem->frames + (id << FRAME_SHIFT);
+
+        memcpy(umem_buf, dp_packet_data(packet), dp_packet_size(packet));
+        //vlog_hex_dump(dp_packet_data(packet), 14);
+        r[idx].addr = (id << FRAME_SHIFT);
+        r[idx].len = dp_packet_size(packet);
+        id++;
+#if 0 /* avoid copy */
+        } else {
+            u32 idx = uq->cached_prod++ & uq->mask;
+
+            VLOG_WARN("packet from umem %p", dp_packet_base(packet));
+            vlog_hex_dump(dp_packet_base(packet), 14);
+
+            r[idx].addr = (u64)(u64 *)dp_packet_base(packet);
+            r[idx].len = dp_packet_size(packet);
+        }
+#endif
+    }
+    u_smp_wmb();
+
+    *uq->producer = uq->cached_prod;
+
+    xsk->outstanding_tx += batch->count;
+
+    complete_tx_only(xsk);
+    print_xsk_stat(xsk);
+
+    return 0;
+}
+
+
+static int
 netdev_linux_sock_batch_send(int sock, int ifindex,
                              struct dp_packet_batch *batch)
 {
@@ -1312,21 +1997,32 @@ netdev_linux_send(struct netdev *netdev_, int qid OVS_UNUSED,
     int error = 0;
     int sock = 0;
 
-    if (!is_tap_netdev(netdev_)) {
+    if (!is_tap_netdev(netdev_) &&
+        !is_afxdp_netdev(netdev_)) {
         sock = af_packet_sock();
         if (sock < 0) {
             error = -sock;
+            VLOG_WARN("%s af sock < 0", __func__);
             goto free_batch;
         }
 
         int ifindex = netdev_get_ifindex(netdev_);
         if (ifindex < 0) {
+            VLOG_WARN("%s ifindex < 0", __func__);
             error = -ifindex;
             goto free_batch;
         }
 
         error = netdev_linux_sock_batch_send(sock, ifindex, batch);
+    } else if (is_afxdp_netdev(netdev_)) {
+        struct xdpsock *xsk;
+        struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+
+        xsk = netdev->xsk[0]; // FIXME: always use queue 0
+        VLOG_INFO_RL(&rl, "XXX %s sent to AFXDP dev xsk %d", __func__, xsk->sfd);
+        error = netdev_linux_afxdp_batch_send(xsk, batch);
     } else {
+        VLOG_INFO_RL(&rl, "%s sent to tap dev", __func__);
         error = netdev_linux_tap_batch_send(netdev_, batch);
     }
     if (error) {
@@ -2426,12 +3122,22 @@ netdev_linux_set_xdp__(struct netdev *netdev_, const struct bpf_prog *prog,
 {
     struct netdev_linux *netdev = netdev_linux_cast(netdev_);
     const char *netdev_name = netdev_get_name(netdev_);
-    int ifindex = netdev->ifindex;
+    int ifindex;
     int error;
 
-    VLOG_DBG("Setting %s XDP filter %d on %s (ifindex %d)", prog->name,
+    error = get_ifindex(netdev_, &ifindex);
+    if (error) {
+        return ENODEV;
+    }
+
+
+    VLOG_INFO("Setting %s XDP filter %d on %s (ifindex %d)", prog->name,
               prog->fd, netdev_name, ifindex);
 
+    if (ifindex == 0) {
+        VLOG_WARN("skip device %s", netdev_name);
+        return 0;
+    }
     if (netdev->cache_valid & valid_bit) {
         error = *filter_error;
         if (error || (prog && prog->fd == *netdev_filter)) {
@@ -2456,6 +3162,19 @@ out:
 }
 
 static int
+netdev_linux_set_xskmap(struct netdev *netdev_, int xskmap_fd)
+{
+    struct netdev_linux *netdev = netdev_linux_cast(netdev_);
+
+    ovs_assert(xskmap_fd != 0);
+
+    VLOG_INFO("%s xsks_map fd %d", __func__, xskmap_fd);
+    netdev->xskmap_fd = xskmap_fd;
+
+    return 0;
+}
+
+static int
 netdev_linux_set_xdp(struct netdev *netdev_, const struct bpf_prog *prog)
 {
     struct netdev_linux *netdev = netdev_linux_cast(netdev_);
@@ -3167,6 +3886,7 @@ netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off,
     netdev_linux_set_policing,                                  \
     netdev_linux_set_filter,                                    \
     netdev_linux_set_xdp,                                       \
+    netdev_linux_set_xskmap,                                    \
     netdev_linux_get_qos_types,                                 \
     netdev_linux_get_qos_capabilities,                          \
     netdev_linux_get_qos,                                       \
@@ -3201,6 +3921,15 @@ netdev_linux_update_flags(struct netdev *netdev_, enum netdev_flags off,
     FLOW_OFFLOAD_API                                            \
 }
 
+const struct netdev_class netdev_afxdp_class =
+    NETDEV_LINUX_CLASS(
+        "afxdp",
+        netdev_linux_construct,
+        netdev_linux_get_stats,
+        netdev_linux_get_features,
+        netdev_linux_get_status,
+        LINUX_FLOW_OFFLOAD_API);
+
 const struct netdev_class netdev_linux_class =
     NETDEV_LINUX_CLASS(
         "system",
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index 3e53a5b76272..df92275d5aff 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -515,6 +515,7 @@ struct netdev_class {
      *
      * This function may be set to null if filters are not supported. */
     int (*set_xdp)(struct netdev *netdev, const struct bpf_prog *);
+    int (*set_xskmap)(struct netdev *netdev, int xsks_map_fd);
 
     /* Adds to 'types' all of the forms of QoS supported by 'netdev', or leaves
      * it empty if 'netdev' does not support QoS.  Any names added to 'types'
@@ -884,6 +885,7 @@ extern const struct netdev_class netdev_bsd_class;
 extern const struct netdev_class netdev_windows_class;
 #else
 extern const struct netdev_class netdev_linux_class;
+extern const struct netdev_class netdev_afxdp_class;
 #endif
 extern const struct netdev_class netdev_internal_class;
 extern const struct netdev_class netdev_tap_class;
diff --git a/lib/netdev-vport.c b/lib/netdev-vport.c
index 4341c89894a3..a61ff4b6808c 100644
--- a/lib/netdev-vport.c
+++ b/lib/netdev-vport.c
@@ -1000,6 +1000,9 @@ netdev_vport_set_xdp(struct netdev *netdev_, const struct bpf_prog *prog)
     ifindex = netdev_vport_get_ifindex(netdev_);
     error = bpf_set_link_xdp_fd(ifindex, prog->fd, XDP_FLAGS_SKB_MODE);
 
+    // FIXME / TODO
+    // update xsks_map_fd
+
     ovs_mutex_unlock(&netdev->mutex);
 
     VLOG_INFO("%s %d", __func__, error);
@@ -1057,6 +1060,7 @@ netdev_vport_set_xdp(struct netdev *netdev_, const struct bpf_prog *prog)
     NULL,                       /* set_policing */          \
     netdev_vport_set_filter,    /* set_filter */            \
     netdev_vport_set_xdp,       /* set_xdp */               \
+    NULL,                       /* set_xskmap */            \
     NULL,                       /* get_qos_types */         \
     NULL,                       /* get_qos_capabilities */  \
     NULL,                       /* get_qos */               \
diff --git a/lib/netdev.c b/lib/netdev.c
index c44a1a683b92..826555dd92f6 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -142,6 +142,7 @@ netdev_initialize(void)
 
 #ifdef __linux__
         netdev_register_provider(&netdev_linux_class);
+        netdev_register_provider(&netdev_afxdp_class);
         netdev_register_provider(&netdev_internal_class);
         netdev_register_provider(&netdev_tap_class);
         netdev_vport_tunnel_register();
@@ -1474,6 +1475,16 @@ netdev_set_xdp(struct netdev *netdev, struct bpf_prog *prog)
             : EOPNOTSUPP);
 }
 
+/* set xsk map */
+int
+netdev_set_xskmap(struct netdev *netdev, int xskmap)
+{
+    return (netdev->netdev_class->set_xskmap
+            ? netdev->netdev_class->set_xskmap(netdev, xskmap)
+            : EOPNOTSUPP);
+}
+
+
 /* Adds to 'types' all of the forms of QoS supported by 'netdev', or leaves it
  * empty if 'netdev' does not support QoS.  Any names added to 'types' should
  * be documented as valid for the "type" column in the "QoS" table in
diff --git a/lib/netdev.h b/lib/netdev.h
index 3388504d85c9..3a8d7118378e 100644
--- a/lib/netdev.h
+++ b/lib/netdev.h
@@ -320,6 +320,7 @@ int netdev_set_policing(struct netdev *, uint32_t kbits_rate,
                         uint32_t kbits_burst);
 int netdev_set_filter(struct netdev *netdev, struct bpf_prog *prog);
 int netdev_set_xdp(struct netdev *netdev, struct bpf_prog *prog);
+int netdev_set_xskmap(struct netdev *netdev, int xsks_map_fd);
 
 int netdev_get_qos_types(const struct netdev *, struct sset *types);
 int netdev_get_qos_capabilities(const struct netdev *,

From patchwork Sat Aug 18 00:29:36 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: William Tu
X-Patchwork-Id: 959159
From: William Tu
To: iovisor-dev@lists.iovisor.org, dev@openvswitch.org
Date: Fri, 17 Aug 2018 17:29:36 -0700
Message-Id: <1534552176-125735-4-git-send-email-u9012063@gmail.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1534552176-125735-1-git-send-email-u9012063@gmail.com>
References: <1534552176-125735-1-git-send-email-u9012063@gmail.com>
Subject: [ovs-dev] [PATCH RFC 3/3] tests: add afxdp test cases.

This patch adds a test framework for OVS using AF_XDP. There are currently
two test cases, which send ping and HTTP traffic through OVS between two
afxdp netdevs.

Signed-off-by: William Tu
---
 tests/automake.mk               |  17 +++++
 tests/ofproto-macros.at         |   1 +
 tests/system-afxdp-macros.at    | 148 ++++++++++++++++++++++++++++++++++++++++
 tests/system-afxdp-testsuite.at |  25 +++++++
 tests/system-afxdp-traffic.at   |  38 +++++++++++
 5 files changed, 229 insertions(+)
 create mode 100644 tests/system-afxdp-macros.at
 create mode 100644 tests/system-afxdp-testsuite.at
 create mode 100644 tests/system-afxdp-traffic.at

diff --git a/tests/automake.mk b/tests/automake.mk
index 732dc4ab9bdc..c112ae3fcf13 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -4,11 +4,13 @@ EXTRA_DIST += \
 	$(SYSTEM_TESTSUITE_AT) \
 	$(SYSTEM_KMOD_TESTSUITE_AT) \
 	$(SYSTEM_USERSPACE_TESTSUITE_AT) \
+	$(SYSTEM_AFXDP_TESTSUITE_AT) \
 	$(SYSTEM_BPF_TESTSUITE_AT) \
 	$(SYSTEM_OFFLOADS_TESTSUITE_AT) \
 	$(TESTSUITE) \
 	$(SYSTEM_KMOD_TESTSUITE) \
 	$(SYSTEM_USERSPACE_TESTSUITE) \
+	$(SYSTEM_AFXDP_TESTSUITE) \
 	$(SYSTEM_BPF_TESTSUITE) \
 	$(SYSTEM_OFFLOADS_TESTSUITE) \
 	tests/atlocal.in \
@@ -124,6 +126,11 @@ SYSTEM_USERSPACE_TESTSUITE_AT = \
 	tests/system-userspace-macros.at \
 	tests/system-userspace-packet-type-aware.at
 
+SYSTEM_AFXDP_TESTSUITE_AT = \
+	tests/system-afxdp-testsuite.at \
+	tests/system-afxdp-traffic.at \
+	tests/system-afxdp-macros.at
+
 SYSTEM_TESTSUITE_AT = \
 	tests/system-common-macros.at \
 	tests/system-ovn.at \
@@ -142,6 +149,7 @@ TESTSUITE = $(srcdir)/tests/testsuite
 TESTSUITE_PATCH = $(srcdir)/tests/testsuite.patch
 SYSTEM_KMOD_TESTSUITE = $(srcdir)/tests/system-kmod-testsuite
 SYSTEM_USERSPACE_TESTSUITE = $(srcdir)/tests/system-userspace-testsuite
+SYSTEM_AFXDP_TESTSUITE = $(srcdir)/tests/system-afxdp-testsuite
 SYSTEM_BPF_TESTSUITE = $(srcdir)/tests/system-bpf-testsuite
 BPF_TESTSUITE_PATCH = $(srcdir)/tests/system-bpf-testsuite.patch
 SYSTEM_OFFLOADS_TESTSUITE = $(srcdir)/tests/system-offloads-testsuite
@@ -273,6 +281,11 @@ check-system-userspace: all
 	set $(SHELL) '$(SYSTEM_USERSPACE_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
 	"$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
 
+check-afxdp: all
+	$(MAKE) install
+	set $(SHELL) '$(SYSTEM_AFXDP_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
+	"$$@" || (test X'$(RECHECK)' = Xyes && "$$@" --recheck)
+
 check-bpf: all
 	$(MAKE) install
 	set $(SHELL) '$(SYSTEM_BPF_TESTSUITE)' -C tests AUTOTEST_PATH='$(AUTOTEST_PATH)' $(TESTSUITEFLAGS) -j1; \
@@ -306,6 +319,10 @@ $(SYSTEM_USERSPACE_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_USERSP
 	$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
 	$(AM_V_at)mv $@.tmp $@
 
+$(SYSTEM_AFXDP_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_AFXDP_TESTSUITE_AT) $(COMMON_MACROS_AT)
+	$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
+	$(AM_V_at)mv $@.tmp $@
+
 $(SYSTEM_BPF_TESTSUITE): package.m4 $(SYSTEM_TESTSUITE_AT) $(SYSTEM_BPF_TESTSUITE_AT) $(BPF_TESTSUITE_PATCH) $(COMMON_MACROS_AT)
 	$(AM_V_GEN)$(AUTOTEST) -I '$(srcdir)' -o $@.tmp $@.at
 	$(AM_V_at)mv $@.tmp $@
diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at
index 487e40cc8ef2..e39498f24800 100644
--- a/tests/ofproto-macros.at
+++ b/tests/ofproto-macros.at
@@ -336,6 +336,7 @@ m4_define([_OVS_VSWITCHD_START],
    on_exit "kill_ovs_vswitchd `cat ovs-vswitchd.pid`"
    AT_CHECK([[sed < stderr '
 /bpf|INFO|/d
+/netdev_linux|INFO|.*/d
 /ovs_numa|INFO|Discovered /d
 /vlog|INFO|opened log file/d
 /vswitchd|INFO|ovs-vswitchd (Open vSwitch)/d
diff --git a/tests/system-afxdp-macros.at b/tests/system-afxdp-macros.at
new file mode 100644
index 000000000000..21e07ea05e3b
--- /dev/null
+++ b/tests/system-afxdp-macros.at
@@ -0,0 +1,148 @@
+# _ADD_BR([name])
+#
+# Expands into the proper ovs-vsctl commands to create a bridge with the
+# appropriate type and properties
+m4_define([_ADD_BR], [[add-br $1 -- set Bridge $1 datapath_type="netdev" protocols=OpenFlow10,OpenFlow11,OpenFlow12,OpenFlow13,OpenFlow14,OpenFlow15 fail-mode=secure ]])
+
+# OVS_TRAFFIC_VSWITCHD_START([vsctl-args], [vsctl-output], [=override])
+#
+# Creates a database and starts ovsdb-server, starts ovs-vswitchd
+# connected to that database, calls ovs-vsctl to create a bridge named
+# br0 with predictable settings, passing 'vsctl-args' as additional
+# commands to ovs-vsctl.  If 'vsctl-args' causes ovs-vsctl to provide
+# output (e.g. because it includes "create" commands) then 'vsctl-output'
+# specifies the expected output after filtering through uuidfilt.
+m4_define([OVS_TRAFFIC_VSWITCHD_START],
+  [
+   export OVS_PKGDATADIR=$(`pwd`)
+   #OVS_WAIT_WHILE([ip link show ovs-netdev])
+   umount /sys/fs/bpf/
+   AT_CHECK([mount -t bpf none /sys/fs/bpf])
+   AT_CHECK([mkdir -p /sys/fs/bpf/ovs])
+   _OVS_VSWITCHD_START([--disable-system])
+   dnl Add bridges, ports, etc.
+   OVS_WAIT_WHILE([ip link show br0])
+   AT_CHECK([ovs-vsctl -- _ADD_BR([br0]) -- $1 m4_if([$2], [], [], [| uuidfilt])], [0], [$2])
+])
+
+# OVS_TRAFFIC_VSWITCHD_STOP([WHITELIST], [extra_cmds])
+#
+# Gracefully stops ovs-vswitchd and ovsdb-server, checking their log files
+# for messages with severity WARN or higher and signaling an error if any
+# is present.  The optional WHITELIST may contain shell-quoted "sed"
+# commands to delete any warnings that are actually expected, e.g.:
+#
+#   OVS_TRAFFIC_VSWITCHD_STOP(["/expected error/d"])
+#
+# 'extra_cmds' are shell commands to be executed after OVS_VSWITCHD_STOP() is
+# invoked. They can be used to perform additional cleanups such as name space
+# removal.
+m4_define([OVS_TRAFFIC_VSWITCHD_STOP],
+  [OVS_VSWITCHD_STOP([dnl
+$1";/netdev_linux.*obtaining netdev stats via vport failed/d
+/dpif_netlink.*Generic Netlink family 'ovs_datapath' does not exist. The Open vSwitch kernel module is probably not loaded./d"])
+   AT_CHECK([:; $2])
+   AT_CHECK([umount /sys/fs/bpf])
+  ])
+
+m4_define([ADD_VETH_AFXDP],
+    [ AT_CHECK([ip link add $1 type veth peer name ovs-$1 || return 77])
+      CONFIGURE_VETH_OFFLOADS([$1])
+      AT_CHECK([ip link set $1 netns $2])
+      AT_CHECK([ip link set dev ovs-$1 up])
+      AT_CHECK([ovs-vsctl add-port $3 ovs-$1 -- \
+                set interface ovs-$1 external-ids:iface-id="$1" type="afxdp"])
+      NS_CHECK_EXEC([$2], [ip addr add $4 dev $1 $7])
+      NS_CHECK_EXEC([$2], [ip link set dev $1 up])
+      if test -n "$5"; then
+        NS_CHECK_EXEC([$2], [ip link set dev $1 address $5])
+      fi
+      if test -n "$6"; then
+        NS_CHECK_EXEC([$2], [ip route add default via $6])
+      fi
+      on_exit 'ip link del ovs-$1'
+    ]
+)
+
+
+# CONFIGURE_VETH_OFFLOADS([VETH])
+#
+# Disable TX offloads for veths.  The userspace datapath uses the AF_PACKET
+# socket to receive packets for veths.  Unfortunately, the AF_PACKET socket
+# doesn't play well with offloads:
+# 1. GSO packets are received without segmentation and therefore discarded.
+# 2. Packets with offloaded partial checksum are received with the wrong
+#    checksum, therefore discarded by the receiver.
+#
+# By disabling tx offloads in the non-OVS side of the veth peer we make sure
+# that the AF_PACKET socket will not receive bad packets.
+#
+# This is a workaround, and should be removed when offloads are properly
+# supported in netdev-linux.
+m4_define([CONFIGURE_VETH_OFFLOADS],
+    [AT_CHECK([ethtool -K $1 tx off], [0], [ignore], [ignore])]
+)
+
+# CHECK_CONNTRACK()
+#
+# Perform requirements checks for running conntrack tests.
+#
+m4_define([CHECK_CONNTRACK],
+    [AT_SKIP_IF([test $HAVE_PYTHON = no])]
+)
+
+# CHECK_CONNTRACK_ALG()
+#
+# Perform requirements checks for running conntrack ALG tests. The userspace
The userspace +# supports FTP and TFTP. +# +m4_define([CHECK_CONNTRACK_ALG]) + +# CHECK_CONNTRACK_FRAG() +# +# Perform requirements checks for running conntrack fragmentations tests. +# The userspace doesn't support fragmentation yet, so skip the tests. +m4_define([CHECK_CONNTRACK_FRAG], +[ + AT_SKIP_IF([:]) +]) + +# CHECK_CONNTRACK_LOCAL_STACK() +# +# Perform requirements checks for running conntrack tests with local stack. +# While the kernel connection tracker automatically passes all the connection +# tracking state from an internal port to the OpenvSwitch kernel module, there +# is simply no way of doing that with the userspace, so skip the tests. +m4_define([CHECK_CONNTRACK_LOCAL_STACK], +[ + AT_SKIP_IF([:]) +]) + +# CHECK_CONNTRACK_NAT() +# +# Perform requirements checks for running conntrack NAT tests. The userspace +# datapath supports NAT. +# +m4_define([CHECK_CONNTRACK_NAT]) + +# CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE() +# +# Perform requirements checks for running ovs-dpctl flush-conntrack by +# conntrack 5-tuple test. The userspace datapath does not support +# this feature yet. +m4_define([CHECK_CT_DPIF_FLUSH_BY_CT_TUPLE], +[ + AT_SKIP_IF([:]) +]) + +# CHECK_CT_DPIF_SET_GET_MAXCONNS() +# +# Perform requirements checks for running ovs-dpctl ct-set-maxconns or +# ovs-dpctl ct-get-maxconns. The userspace datapath does support this feature. +m4_define([CHECK_CT_DPIF_SET_GET_MAXCONNS]) + +# CHECK_CT_DPIF_GET_NCONNS() +# +# Perform requirements checks for running ovs-dpctl ct-get-nconns. The +# userspace datapath does support this feature. +m4_define([CHECK_CT_DPIF_GET_NCONNS]) diff --git a/tests/system-afxdp-testsuite.at b/tests/system-afxdp-testsuite.at new file mode 100644 index 000000000000..ff56ba3c56ab --- /dev/null +++ b/tests/system-afxdp-testsuite.at @@ -0,0 +1,25 @@ +AT_INIT + +AT_COPYRIGHT([Copyright (c) 2018 Nicira, Inc. + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. 
+You may obtain a copy of the License at:
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.])
+
+m4_ifdef([AT_COLOR_TESTS], [AT_COLOR_TESTS])
+
+m4_include([tests/ovs-macros.at])
+m4_include([tests/ovsdb-macros.at])
+m4_include([tests/ofproto-macros.at])
+m4_include([tests/system-afxdp-macros.at])
+m4_include([tests/system-common-macros.at])
+
+m4_include([tests/system-afxdp-traffic.at])
diff --git a/tests/system-afxdp-traffic.at b/tests/system-afxdp-traffic.at
new file mode 100644
index 000000000000..cee33c274f69
--- /dev/null
+++ b/tests/system-afxdp-traffic.at
@@ -0,0 +1,38 @@
+AT_BANNER([AFXDP netdev datapath-sanity])
+
+AT_SETUP([datapath - ping between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH_AFXDP(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH_AFXDP(p1, at_ns1, br0, "10.1.1.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_SETUP([datapath - http between two ports])
+OVS_TRAFFIC_VSWITCHD_START()
+
+AT_CHECK([ovs-ofctl add-flow br0 "actions=normal"])
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+NS_CHECK_EXEC([at_ns0], [ping -q -c 3 -i 0.3 -w 2 10.1.1.2 | FORMAT_PING], [0], [dnl
+3 packets transmitted, 3 received, 0% packet loss, time 0ms
+])
+
+OVS_START_L7([at_ns1], [http])
+NS_CHECK_EXEC([at_ns0], [wget 10.1.1.2 -t 3 -T 1 --retry-connrefused -v -o wget0.log])
+
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+