From patchwork Thu Apr 2 11:13:46 2020
X-Patchwork-Submitter: Noa Ezra
X-Patchwork-Id: 1265526
From: Noa Ezra
To: ovs-dev@openvswitch.org
Date: Thu, 2 Apr 2020 11:13:46 +0000
Message-Id: <1585826027-330699-2-git-send-email-noae@mellanox.com>
In-Reply-To: <1585826027-330699-1-git-send-email-noae@mellanox.com>
Cc: elibr@mellanox.com, Ameer Mahagneh, Noa Ezra
Subject: [ovs-dev] [PATCH ovs v1 1/2] netdev-dpdk-vdpa: Introduce dpdkvdpa netdev

vDPA netdev is designed to support both SW and HW use cases. HW mode will be
used to configure vDPA capable devices. SW acceleration is used to leverage
SR-IOV offloads to virtio guests by relaying packets between VF and virtio
devices. Add the SW relay forwarding logic as a pre-step for adding the
dpdkvdpa port, with no functional change.
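The relay's forwarding loop in this patch (netdev_dpdk_vdpa_forward_traffic) buffers up to 2 * NETDEV_MAX_BURST packets per queue pair, so a full RX burst always fits while pending packets survive to the next poll. Below is a minimal, self-contained sketch of that head/tail buffering policy; rte_eth_rx_burst()/rte_eth_tx_burst() are replaced by stub callbacks and plain ints stand in for mbuf pointers, so all names here are illustrative and not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BURST 32                      /* stands in for NETDEV_MAX_BURST */

/* One direction of a relay queue pair: packets received from one port are
 * buffered here until they can be transmitted on the peer port. */
struct relay_qpair {
    uint8_t head;                     /* first pending packet */
    uint8_t tail;                     /* one past the last pending packet */
    int pkts[BURST * 2];              /* stand-in for struct rte_mbuf * */
};

/* One poll iteration: rx() fills packets, tx() drains them.  Returns the
 * number of packets received, mirroring the patch's fwd_rx counter. */
static int
relay_poll(struct relay_qpair *q,
           uint16_t (*rx)(int *dst, uint16_t max),
           uint16_t (*tx)(int *src, uint16_t n))
{
    uint16_t pending = q->tail - q->head;
    uint16_t rxed = 0;

    if (pending < BURST) {
        /* Compact pending packets to the front when a full RX burst would
         * not fit in the remaining consecutive entries. */
        if (q->tail > BURST) {
            memmove(&q->pkts[0], &q->pkts[q->head],
                    pending * sizeof q->pkts[0]);
            q->head = 0;
            q->tail = pending;
        }
        rxed = rx(&q->pkts[q->tail], BURST);
        q->tail += rxed;
        pending += rxed;
    }

    /* Prefer full bursts; send a partial one only on an idle poll. */
    if (pending >= BURST || (rxed == 0 && pending > 0)) {
        uint16_t n = pending < BURST ? pending : BURST;
        q->head += tx(&q->pkts[q->head], n);
        if (q->head == q->tail) {
            q->head = 0;
            q->tail = 0;
        }
    }
    return rxed;
}

/* Stubs for the demo: rx delivers a fixed backlog of packets, tx accepts
 * everything it is given. */
static uint16_t g_rx_avail;
static uint16_t g_tx_count;

static uint16_t
stub_rx(int *dst, uint16_t max)
{
    uint16_t n = g_rx_avail < max ? g_rx_avail : max;
    for (uint16_t i = 0; i < n; i++) {
        dst[i] = 1;
    }
    g_rx_avail -= n;
    return n;
}

static uint16_t
stub_tx(int *src, uint16_t n)
{
    (void) src;
    g_tx_count += n;
    return n;
}
```

As in the patch, a partial burst is transmitted only when no new packets arrived in the current poll; memmove() is used for the compaction here because, unlike the patch's rte_memcpy(), it is defined for overlapping regions.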
Signed-off-by: Noa Ezra Reviewed-by: Oz Shlomo --- lib/automake.mk | 4 +- lib/netdev-dpdk-vdpa.c | 820 +++++++++++++++++++++++++++++++++++++++++++++++++ lib/netdev-dpdk-vdpa.h | 55 ++++ 3 files changed, 878 insertions(+), 1 deletion(-) create mode 100755 lib/netdev-dpdk-vdpa.c create mode 100644 lib/netdev-dpdk-vdpa.h diff --git a/lib/automake.mk b/lib/automake.mk index 95925b5..b57682c 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -146,6 +146,7 @@ lib_libopenvswitch_la_SOURCES = \ lib/netdev-offload.h \ lib/netdev-offload-provider.h \ lib/netdev-provider.h \ + lib/netdev-dpdk-vdpa.h \ lib/netdev-vport.c \ lib/netdev-vport.h \ lib/netdev-vport-private.h \ @@ -429,7 +430,8 @@ if DPDK_NETDEV lib_libopenvswitch_la_SOURCES += \ lib/dpdk.c \ lib/netdev-dpdk.c \ - lib/netdev-offload-dpdk.c + lib/netdev-offload-dpdk.c \ + lib/netdev-dpdk-vdpa.c else lib_libopenvswitch_la_SOURCES += \ lib/dpdk-stub.c diff --git a/lib/netdev-dpdk-vdpa.c b/lib/netdev-dpdk-vdpa.c new file mode 100755 index 0000000..c6ed061 --- /dev/null +++ b/lib/netdev-dpdk-vdpa.c @@ -0,0 +1,820 @@ +/* + * Copyright (c) 2019 Mellanox Technologies, Ltd. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include +#include "netdev-dpdk-vdpa.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "netdev-provider.h" +#include "openvswitch/vlog.h" +#include "dp-packet.h" +#include "util.h" + +VLOG_DEFINE_THIS_MODULE(netdev_dpdk_vdpa); + +#define NETDEV_DPDK_VDPA_SIZEOF_MBUF (sizeof(struct rte_mbuf *)) +#define NETDEV_DPDK_VDPA_MAX_QPAIRS 16 +#define NETDEV_DPDK_VDPA_INVALID_QUEUE_ID 0xFFFF +#define NETDEV_DPDK_VDPA_STATS_MAX_STR_SIZE 64 +#define NETDEV_DPDK_VDPA_RX_DESC_DEFAULT 512 + +enum netdev_dpdk_vdpa_port_type { + NETDEV_DPDK_VDPA_PORT_TYPE_VM, + NETDEV_DPDK_VDPA_PORT_TYPE_VF +}; + +struct netdev_dpdk_vdpa_relay_flow { + struct rte_flow *flow; + bool queues_en[RTE_MAX_QUEUES_PER_PORT]; + uint32_t priority; +}; + +struct netdev_dpdk_vdpa_qpair { + uint16_t port_id_rx; + uint16_t port_id_tx; + uint16_t pr_queue; + uint8_t mb_head; + uint8_t mb_tail; + struct rte_mbuf *pkts[NETDEV_MAX_BURST * 2]; +}; + +struct netdev_dpdk_vdpa_relay { + PADDED_MEMBERS(CACHE_LINE_SIZE, + struct netdev_dpdk_vdpa_qpair qpair[NETDEV_DPDK_VDPA_MAX_QPAIRS * 2]; + uint16_t num_queues; + struct netdev_dpdk_vdpa_relay_flow flow_params; + int port_id_vm; + int port_id_vf; + uint16_t vf_mtu; + int n_rxq; + char *vf_pci; + char *vm_socket; + char *vhost_name; + bool started; + ); +}; + +static int +netdev_dpdk_vdpa_port_from_name(const char *name) +{ + int port_id; + size_t len; + + len = strlen(name); + for (port_id = 0; port_id < RTE_MAX_ETHPORTS; port_id++) { + if (rte_eth_dev_is_valid_port(port_id) && + !strncmp(name, rte_eth_devices[port_id].device->name, len)) { + return port_id; + } + } + VLOG_ERR("No port was found for %s", name); + return ENODEV; +} + +static void +netdev_dpdk_vdpa_free(void *ptr) +{ + if (ptr == NULL) { + return; + } + free(ptr); + ptr = NULL; +} + +static void +netdev_dpdk_vdpa_clear_relay(struct netdev_dpdk_vdpa_relay *relay) +{ + uint16_t q; + uint8_t i; + + for (q = 0; q < relay->num_queues; q++) { + for 
(i = relay->qpair[q].mb_head; i < relay->qpair[q].mb_tail; i++) { + rte_pktmbuf_free(relay->qpair[q].pkts[i]); + } + relay->qpair[q].mb_head = 0; + relay->qpair[q].mb_tail = 0; + relay->qpair[q].port_id_rx = 0; + relay->qpair[q].port_id_tx = 0; + relay->qpair[q].pr_queue = NETDEV_DPDK_VDPA_INVALID_QUEUE_ID; + } + + relay->started = false; + relay->port_id_vm = 0; + relay->port_id_vf = 0; + relay->num_queues = 0; + relay->flow_params.flow = NULL; + memset(&relay->flow_params, 0, sizeof relay->flow_params); +} + +static void +netdev_dpdk_vdpa_free_relay_strings(struct netdev_dpdk_vdpa_relay *relay) +{ + netdev_dpdk_vdpa_free(relay->vm_socket); + netdev_dpdk_vdpa_free(relay->vf_pci); + netdev_dpdk_vdpa_free(relay->vhost_name); +} + +static int +netdev_dpdk_vdpa_generate_rss_flow(struct netdev_dpdk_vdpa_relay *relay) +{ + struct rte_flow_attr attr; + struct rte_flow_item pattern[2]; + struct rte_flow_action action[2]; + struct rte_flow *flow = NULL; + struct rte_flow_error error; + static struct rte_flow_action_rss action_rss = { + .func = RTE_ETH_HASH_FUNCTION_DEFAULT, + .level = 0, + .types = ETH_RSS_IP | ETH_RSS_UDP | ETH_RSS_TCP, + .key_len = 0, + .key = NULL, + }; + uint16_t queue[RTE_MAX_QUEUES_PER_PORT]; + uint32_t i; + uint32_t j; + int err = 0; + + memset(pattern, 0, sizeof pattern); + memset(action, 0, sizeof action); + memset(&attr, 0, sizeof(struct rte_flow_attr)); + attr.ingress = 1; + attr.priority = !relay->flow_params.priority; + + for (i = 0, j = 0; i < RTE_MAX_QUEUES_PER_PORT; i++) { + if (relay->flow_params.queues_en[i]) { + queue[j++] = i; + } + } + + action_rss.queue = queue; + action_rss.queue_num = j; + + action[0].type = RTE_FLOW_ACTION_TYPE_RSS; + action[0].conf = &action_rss; + action[1].type = RTE_FLOW_ACTION_TYPE_END; + + pattern[0].type = RTE_FLOW_ITEM_TYPE_ETH; + pattern[0].spec = NULL; + pattern[0].mask = NULL; + pattern[1].type = RTE_FLOW_ITEM_TYPE_END; + + flow = rte_flow_create(relay->port_id_vf, &attr, pattern, action, &error); + if 
(flow == NULL) { + VLOG_ERR("Failed to create flow. msg: %s", + error.message ? error.message : "(no stated reason)"); + err = EINVAL; + goto out; + } + + if (relay->flow_params.flow != NULL) { + err = rte_flow_destroy(relay->port_id_vf, relay->flow_params.flow, + &error); + if (err < 0) { + VLOG_ERR("Failed to destroy flow. msg: %s", + error.message ? error.message : "(no stated reason)"); + goto out; + } + } + + relay->flow_params.flow = flow; + relay->flow_params.priority = attr.priority; +out: + return err; +} + +static int +netdev_dpdk_vdpa_queue_state(struct netdev_dpdk_vdpa_relay *relay, + uint16_t port) +{ + struct rte_eth_vhost_queue_event event; + uint32_t q_id; + int err = 0; + + while (!rte_eth_vhost_get_queue_event(port, &event)) { + q_id = (event.rx ? event.queue_id * 2 : event.queue_id * 2 + 1); + if ((q_id >= relay->num_queues) && event.enable) { + VLOG_ERR("netdev_dpdk_vdpa_queue_state: " + "Queue %u is higher than max queues configured for port " + "%u. Max queues configured: %u", + q_id, port, relay->num_queues); + return ENODEV; + } + relay->flow_params.queues_en[event.queue_id] = event.enable; + /* Load balance the relay's queues on the pr's queues in round robin */ + relay->qpair[q_id].pr_queue = (event.enable ?
q_id % relay->n_rxq : + NETDEV_DPDK_VDPA_INVALID_QUEUE_ID); + if (!event.rx) { + relay->flow_params.queues_en[event.queue_id] = event.enable; + err = netdev_dpdk_vdpa_generate_rss_flow(relay); + if (err) { + VLOG_ERR("netdev_dpdk_vdpa_generate_rss_flow failed"); + return err; + } + } + } + + return 0; +} + +static int +netdev_dpdk_vdpa_queue_state_cb_fn(uint16_t port_id, + enum rte_eth_event_type type OVS_UNUSED, + void *param, + void *ret_param OVS_UNUSED) +{ + struct netdev_dpdk_vdpa_relay *relay = param; + int ret = 0; + + ret = netdev_dpdk_vdpa_queue_state(relay, port_id); + if (ret) { + VLOG_ERR("netdev_dpdk_vdpa_queue_state failed for port %u", port_id); + return ret; + } + + return 0; +} + +static int +netdev_dpdk_vdpa_link_status_cb_fn(uint16_t port_id, + enum rte_eth_event_type type OVS_UNUSED, + void *param, + void *ret_param OVS_UNUSED) +{ + struct netdev_dpdk_vdpa_relay *relay = param; + struct rte_eth_link link; + int q; + + rte_eth_link_get_nowait(port_id, &link); + if (!link.link_status) { + for (q = 0; q < NETDEV_DPDK_VDPA_MAX_QPAIRS; q++) { + relay->qpair[q].pr_queue = NETDEV_DPDK_VDPA_INVALID_QUEUE_ID; + } + for (q = 0; q < RTE_MAX_QUEUES_PER_PORT; q++) { + relay->flow_params.queues_en[q] = false; + } + } + + return 0; +} + +static void +netdev_dpdk_vdpa_close_dev(struct netdev_dpdk_vdpa_relay *relay, + int port_id) +{ + rte_eth_dev_stop(port_id); + if (rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_QUEUE_STATE, + netdev_dpdk_vdpa_queue_state_cb_fn, + relay)) { + VLOG_ERR("rte_eth_dev_callback_unregister failed for port id %u" + "event type RTE_ETH_EVENT_QUEUE_STATE", port_id); + } + if (rte_eth_dev_callback_unregister(port_id, RTE_ETH_EVENT_INTR_LSC, + netdev_dpdk_vdpa_link_status_cb_fn, + relay)) { + VLOG_ERR("rte_eth_dev_callback_unregister failed for port id %u" + "event type RTE_ETH_EVENT_INTR_LSC", port_id); + } + + if (port_id == relay->port_id_vf) { + if (relay->flow_params.flow != NULL) { + struct rte_flow_error error; + if 
(rte_flow_destroy(port_id, relay->flow_params.flow, &error)) { + VLOG_ERR("rte_flow_destroy failed, Port id %u." + "rte flow destroy error: %u : message : %s", + port_id, error.type, error.message); + } + } + } + rte_eth_dev_close(port_id); +} + +static int +netdev_dpdk_vdpa_port_init(struct netdev_dpdk_vdpa_relay *relay, + struct rte_mempool *mp, + enum netdev_dpdk_vdpa_port_type port_type) +{ + struct rte_eth_dev_info dev_info; + struct rte_eth_txconf txconf; + struct rte_eth_conf conf = { + .rxmode = { + .mq_mode = ETH_MQ_RX_RSS, + }, + .txmode = { + .mq_mode = ETH_MQ_TX_NONE, + }, + }; + uint64_t csum_offloads, tso_offloads; + bool csum_support, tso_support; + uint16_t port = (port_type == NETDEV_DPDK_VDPA_PORT_TYPE_VM) ? + relay->port_id_vm : relay->port_id_vf; + uint16_t q; + int err = 0; + + if (!rte_eth_dev_is_valid_port(port)) { + VLOG_ERR("rte_eth_dev_is_valid_port failed, invalid port %d", port); + err = ENODEV; + goto out; + } + if (relay->started) { + rte_eth_dev_stop(port); + relay->started = false; + } + rte_eth_dev_info_get(port, &dev_info); + conf.rxmode.offloads = 0; + + conf.txmode.offloads = 0; + if (port_type == NETDEV_DPDK_VDPA_PORT_TYPE_VF) { + /* enable checksum and TSO for vf */ + csum_offloads = (DEV_TX_OFFLOAD_UDP_CKSUM | + DEV_TX_OFFLOAD_TCP_CKSUM); + tso_offloads = (DEV_TX_OFFLOAD_TCP_TSO | + DEV_TX_OFFLOAD_MULTI_SEGS); + + tso_support = (tso_offloads & dev_info.tx_offload_capa) == + tso_offloads; + csum_support = (csum_offloads & dev_info.tx_offload_capa) == + csum_offloads; + + if ((!tso_support) || (!csum_support)) { + VLOG_ERR("Device %s doesn't support needed features:%s%s", + dev_info.device->name, + tso_support ? "":" TSO offloads", + csum_support ? 
"":" checksum offloads"); + err = EINVAL; + goto out; + } + + conf.txmode.offloads |= (csum_offloads | tso_offloads); + } + + err = rte_eth_dev_configure(port, relay->num_queues, relay->num_queues, + &conf); + if (err < 0) { + VLOG_ERR("rte_eth_dev_configure failed for port %d", port); + goto out; + } + for (q = 0; q < relay->num_queues; q++) { + err = rte_eth_rx_queue_setup(port, q, + NETDEV_DPDK_VDPA_RX_DESC_DEFAULT, + rte_eth_dev_socket_id(port), + NULL, mp); + if (err) { + VLOG_ERR("rte_eth_rx_queue_setup failed for port %d, error %d", + port, err); + goto dev_close; + } + } + txconf = dev_info.default_txconf; + txconf.offloads = conf.txmode.offloads; + for (q = 0; q < relay->num_queues; q++) { + err = rte_eth_tx_queue_setup(port, q, + NETDEV_DPDK_VDPA_RX_DESC_DEFAULT, + rte_eth_dev_socket_id(port), + &txconf); + if (err < 0) { + VLOG_ERR("rte_eth_tx_queue_setup failed for port %d, error %d", + port, err); + goto out; + } + } + + if (port_type == NETDEV_DPDK_VDPA_PORT_TYPE_VM) { + err = netdev_dpdk_vdpa_queue_state(relay, port); + if (err) { + VLOG_ERR("netdev_dpdk_vdpa_queue_state failed for port %u", port); + goto dev_close; + } + } + + err = rte_eth_dev_callback_register(port, RTE_ETH_EVENT_QUEUE_STATE, + netdev_dpdk_vdpa_queue_state_cb_fn, + relay); + if (err < 0) { + VLOG_ERR("rte_eth_dev_callback_register failed," + "event QUEUE_STATE error %d", err); + goto dev_close; + } + + err = rte_eth_dev_callback_register(port, RTE_ETH_EVENT_INTR_LSC, + netdev_dpdk_vdpa_link_status_cb_fn, + relay); + if (err < 0) { + VLOG_ERR("rte_eth_dev_callback_register failed," + "event INTR_LSC error %d", err); + goto dev_close; + } + + err = rte_eth_dev_start(port); + if (err < 0) { + VLOG_ERR("rte_eth_dev_start failed for port %d", port); + goto dev_close; + } + relay->started = true; + goto out; + +dev_close: + if (relay->started == true) { + rte_eth_dev_stop(port); + relay->started = false; + } + rte_eth_dev_close(port); +out: + return err; +} + +static void 
+netdev_dpdk_vdpa_parse_pkt(struct rte_mbuf *m, uint16_t mtu) +{ + const struct ovs_16aligned_ip6_frag *frag_hdr; + const struct ovs_16aligned_ip6_hdr *ipv6; + const struct vlan_header *vlan; + const struct eth_header *eth; + const struct ip_header *ipv4; + const struct tcp_header *tcp; + uint8_t nw_frag = 0; + uint8_t l4_proto_id; + uint64_t ol_flags; + const void *data; + uint32_t l2_len; + uint32_t l3_len; + uint32_t l4_len; + ovs_be16 proto; + size_t size; + + eth = rte_pktmbuf_mtod(m, const struct eth_header *); + l2_len = sizeof *eth; + vlan = (struct vlan_header *)(eth + 1); + proto = eth->eth_type; + + while (eth_type_vlan(proto)) { + l2_len += sizeof *vlan; + proto = vlan->vlan_next_type; + vlan++; + } + + if ((rte_pktmbuf_pkt_len(m) - l2_len) <= mtu) { + return; + } + + switch (ntohs(proto)) { + case ETH_TYPE_IP: + ipv4 = (const struct ip_header *)vlan; + l3_len = (ipv4->ip_ihl_ver & 0x0f) << 2; + l4_proto_id = ipv4->ip_proto; + if (l4_proto_id == IPPROTO_TCP) { + tcp = (const struct tcp_header *)((char *)ipv4 + l3_len); + l4_len = TCP_OFFSET(tcp->tcp_ctl) * 4; + ol_flags = (PKT_TX_IPV4 | PKT_TX_IP_CKSUM); + } + break; + case ETH_TYPE_IPV6: + ipv6 = (const struct ovs_16aligned_ip6_hdr *)vlan; + data = ipv6 + 1; + size = rte_pktmbuf_data_len(m) - l2_len - sizeof *ipv6; + l4_proto_id = ipv6->ip6_nxt; + if (!parse_ipv6_ext_hdrs(&data, &size, &l4_proto_id, &nw_frag, + &frag_hdr) || nw_frag) { + return; + } + l3_len = (char *)data - (char *)ipv6; + if (l4_proto_id == IPPROTO_TCP) { + tcp = (const struct tcp_header *)data; + l4_len = TCP_OFFSET(tcp->tcp_ctl) * 4; + ol_flags = PKT_TX_IPV6; + } + break; + default: + return; + } + + if (l4_proto_id == IPPROTO_TCP) { + ol_flags |= (PKT_TX_TCP_SEG | PKT_TX_TCP_CKSUM); + m->l2_len = l2_len; + m->l3_len = l3_len; + m->l4_len = l4_len; + m->ol_flags = ol_flags; + m->tso_segsz = mtu - l3_len - l4_len; + } +} + +static int +netdev_dpdk_vdpa_forward_traffic(struct netdev_dpdk_vdpa_qpair *qpair, + uint16_t queue_id, 
uint16_t mtu) +{ + bool tx_vf = (queue_id & 1) ? true : false; + uint16_t num_rx_packets = 0; + uint8_t buffered_packets; + uint16_t num_tx_packets; + uint8_t num_packets; + uint32_t fwd_rx = 0; + int i; + + queue_id = queue_id >> 1; + buffered_packets = qpair->mb_tail - qpair->mb_head; + num_packets = buffered_packets; + if (buffered_packets >= NETDEV_MAX_BURST) { + goto send; + } + + /* Allocate 2 * NETDEV_MAX_BURST packets entries to always allow a full + * RX burst while not dropping any pending packets. + * Move pending packets to the head of the array when there are not enough + * consecutive entries for a full RX burst. + */ + if (unlikely(qpair->mb_tail > NETDEV_MAX_BURST)) { + rte_memcpy(&qpair->pkts[0], &qpair->pkts[qpair->mb_head], + num_packets * NETDEV_DPDK_VDPA_SIZEOF_MBUF); + qpair->mb_head = 0; + qpair->mb_tail = num_packets; + } + + num_rx_packets = rte_eth_rx_burst(qpair->port_id_rx, queue_id, + qpair->pkts + qpair->mb_tail, + NETDEV_MAX_BURST); + qpair->mb_tail += num_rx_packets; + num_packets += num_rx_packets; + fwd_rx += num_rx_packets; + +send: + if (tx_vf) { + for (i = buffered_packets; i < num_packets; i++) { + netdev_dpdk_vdpa_parse_pkt(qpair->pkts[qpair->mb_head + i], mtu); + } + } + /* It is preferred to send a full burst of packets. 
+ * Send a partial burst only if no new packets were received during the + * current poll iteration */ + if (((num_rx_packets == 0) && (num_packets > 0)) || + (num_packets > NETDEV_MAX_BURST)) { + num_packets = MIN(num_packets,NETDEV_MAX_BURST); + } else if (num_packets < NETDEV_MAX_BURST) { + goto out; + } + + num_tx_packets = rte_eth_tx_burst(qpair->port_id_tx, queue_id, + qpair->pkts + qpair->mb_head, + num_packets); + qpair->mb_head += num_tx_packets; + if (likely(qpair->mb_head == qpair->mb_tail)) { + qpair->mb_head = 0; + qpair->mb_tail = 0; + } +out: + return fwd_rx; +} + +void * +netdev_dpdk_vdpa_alloc_relay(void) +{ + return rte_zmalloc("ovs_dpdk", + sizeof(struct netdev_dpdk_vdpa_relay), + CACHE_LINE_SIZE); +} + +int +netdev_dpdk_vdpa_rxq_recv_impl(struct netdev_dpdk_vdpa_relay *relay, + int pr_queue) +{ + uint32_t fwd_rx = 0; + uint16_t q; + + /* Apply the multi core distribution policy by receiving only from queues + * that are associated with the current port representor's queue. 
*/ + for (q = 0; q < (relay->num_queues * 2); q++) { + if (relay->qpair[q].pr_queue == pr_queue) { + fwd_rx += netdev_dpdk_vdpa_forward_traffic(&relay->qpair[q], q, + relay->vf_mtu); + } else if (relay->qpair[q].pr_queue == + NETDEV_DPDK_VDPA_INVALID_QUEUE_ID) { + break; + } + } + return fwd_rx; +} + +int +netdev_dpdk_vdpa_config_impl(struct netdev_dpdk_vdpa_relay *relay, + uint16_t port_id, + const char *vm_socket, + const char *vf_pci, + int max_queues) +{ + char *vhost_args; + uint16_t q; + int err = 0; + + /* if fwd_config already been done, don't run it again */ + if (relay->vf_pci) { + goto out; + } + + if (max_queues < 0) { + max_queues = NETDEV_DPDK_VDPA_MAX_QPAIRS; + } + + relay->vf_pci = xstrdup(vf_pci); + relay->vm_socket = xstrdup(vm_socket); + relay->vhost_name = xasprintf("net_vhost%d",port_id); + vhost_args = xasprintf("iface=%s,queues=%d,client=1", + relay->vm_socket, max_queues); + + /* create virtio vdev:*/ + err = rte_eal_hotplug_add("vdev", relay->vhost_name, vhost_args); + if (err) { + VLOG_ERR("rte_eal_hotplug_add failed for vdev, socket %s", + relay->vm_socket); + goto err_clear_relay; + } + relay->port_id_vm = netdev_dpdk_vdpa_port_from_name(relay->vhost_name); + if (relay->port_id_vm < 0) { + VLOG_ERR("No port id was found for vm %s", relay->vhost_name); + err = ENODEV; + goto err_clear_vdev; + } + + /* create vf:*/ + err = rte_eal_hotplug_add("pci", relay->vf_pci, ""); + if (err) { + VLOG_ERR("rte_eal_hotplug_add failed for pci %s", relay->vf_pci); + goto err_clear_vdev; + } + relay->port_id_vf = netdev_dpdk_vdpa_port_from_name(relay->vf_pci); + if (relay->port_id_vf < 0) { + VLOG_ERR("No port id was found for vf %s", relay->vf_pci); + err = ENODEV; + goto err_clear_vf; + } + + relay->num_queues = max_queues; + relay->flow_params.priority = 0; + relay->flow_params.flow = NULL; + memset(relay->flow_params.queues_en, false, + sizeof(bool) * RTE_MAX_QUEUES_PER_PORT); + + for (q = 0; q < (relay->num_queues * 2); q++) { + 
relay->qpair[q].pr_queue = NETDEV_DPDK_VDPA_INVALID_QUEUE_ID; + if (q & 1) { + relay->qpair[q].port_id_rx = relay->port_id_vm; + relay->qpair[q].port_id_tx = relay->port_id_vf; + } else { + relay->qpair[q].port_id_rx = relay->port_id_vf; + relay->qpair[q].port_id_tx = relay->port_id_vm; + } + relay->qpair[q].mb_head = 0; + relay->qpair[q].mb_tail = 0; + } + + goto out_clear; + +err_clear_vf: + rte_eal_hotplug_remove("pci", relay->vf_pci); +err_clear_vdev: + rte_eal_hotplug_remove("vdev", relay->vhost_name); +err_clear_relay: + netdev_dpdk_vdpa_clear_relay(relay); +out_clear: + netdev_dpdk_vdpa_free(vhost_args); +out: + return err; +} + +int +netdev_dpdk_vdpa_update_relay(struct netdev_dpdk_vdpa_relay *relay, + struct rte_mempool *mp, + int n_rxq) +{ + uint16_t mtu; + int err = 0; + + err = rte_eth_dev_get_mtu(relay->port_id_vf, &mtu); + if (err < 0) { + mtu = RTE_ETHER_MTU; + err = 0; + } + relay->vf_mtu = mtu; + relay->n_rxq = n_rxq; + + /* port init vf */ + err = netdev_dpdk_vdpa_port_init(relay, mp, + NETDEV_DPDK_VDPA_PORT_TYPE_VF); + if (err) { + VLOG_ERR("port_init failed for port_id %u", relay->port_id_vf); + goto clear_relay; + } + + /* port init vm */ + err = netdev_dpdk_vdpa_port_init(relay, mp, + NETDEV_DPDK_VDPA_PORT_TYPE_VM); + if (err) { + VLOG_ERR("port_init failed for port_id %u", relay->port_id_vm); + goto vf_close; + } + + goto out; + +vf_close: + rte_eth_dev_stop(relay->port_id_vf); + rte_eth_dev_close(relay->port_id_vf); +clear_relay: + rte_eal_hotplug_remove("pci", relay->vf_pci); + rte_eal_hotplug_remove("vdev", relay->vhost_name); + netdev_dpdk_vdpa_clear_relay(relay); +out: + return err; +} + +void +netdev_dpdk_vdpa_destruct_impl(struct netdev_dpdk_vdpa_relay *relay) +{ + if (!(rte_eth_dev_is_valid_port(relay->port_id_vm))) { + goto destruct_vf; + } + netdev_dpdk_vdpa_close_dev(relay, relay->port_id_vm); + rte_eal_hotplug_remove("vdev", relay->vhost_name); + +destruct_vf: + if (!(rte_eth_dev_is_valid_port(relay->port_id_vf))) { + goto out; + 
} + netdev_dpdk_vdpa_close_dev(relay, relay->port_id_vf); + rte_eal_hotplug_remove("pci", relay->vf_pci); + +out: + netdev_dpdk_vdpa_clear_relay(relay); + netdev_dpdk_vdpa_free_relay_strings(relay); +} + +int +netdev_dpdk_vdpa_get_custom_stats_impl(struct netdev_dpdk_vdpa_relay *relay, + struct netdev_custom_stats *cstm_stats) +{ + enum stats_vals { + VDPA_CUSTOM_STATS_VM_PACKETS, + VDPA_CUSTOM_STATS_VM_BYTES, + VDPA_CUSTOM_STATS_VF_PACKETS, + VDPA_CUSTOM_STATS_VF_BYTES, + VDPA_CUSTOM_STATS_TOTAL_SIZE + }; + const char *stats_names[] = { + [VDPA_CUSTOM_STATS_VM_PACKETS] = "VM packets", + [VDPA_CUSTOM_STATS_VM_BYTES] = "VM bytes", + [VDPA_CUSTOM_STATS_VF_PACKETS] = "VF packets", + [VDPA_CUSTOM_STATS_VF_BYTES] = "VF bytes" + }; + struct rte_eth_stats rte_stats; + uint16_t i; + + cstm_stats->size = VDPA_CUSTOM_STATS_TOTAL_SIZE; + cstm_stats->counters = xcalloc(cstm_stats->size, + sizeof *cstm_stats->counters); + + for (i = 0; i < cstm_stats->size; i++) { + ovs_strlcpy(cstm_stats->counters[i].name, stats_names[i], + NETDEV_CUSTOM_STATS_NAME_SIZE); + } + + if (rte_eth_stats_get(relay->port_id_vm, &rte_stats)) { + VLOG_ERR("rte_eth_stats_get failed." + "Can't get ETH statistics for port id %u", + relay->port_id_vm); + return EPROTO; + } + cstm_stats->counters[VDPA_CUSTOM_STATS_VM_PACKETS].value = + rte_stats.opackets; + cstm_stats->counters[VDPA_CUSTOM_STATS_VM_BYTES].value = rte_stats.obytes; + + if (rte_eth_stats_get(relay->port_id_vf, &rte_stats)) { + VLOG_ERR("rte_eth_stats_get failed." + "Can't get ETH statistics for port id %u", + relay->port_id_vf); + return EPROTO; + } + cstm_stats->counters[VDPA_CUSTOM_STATS_VF_PACKETS].value = + rte_stats.opackets; + cstm_stats->counters[VDPA_CUSTOM_STATS_VF_BYTES].value = rte_stats.obytes; + + return 0; +} + diff --git a/lib/netdev-dpdk-vdpa.h b/lib/netdev-dpdk-vdpa.h new file mode 100644 index 0000000..accb95b --- /dev/null +++ b/lib/netdev-dpdk-vdpa.h @@ -0,0 +1,55 @@ +/* + * Copyright (c) 2019 Mellanox Technologies, Ltd. 
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_DPDK_VDPA_H
+#define NETDEV_DPDK_VDPA_H 1
+
+#include "netdev.h"
+
+struct netdev_dpdk_vdpa_relay;
+struct rte_mempool;
+
+/*
+ * Functions that implement the relay forwarding for the netdev dpdkvdpa,
+ * which is defined and implemented in netdev-dpdk.
+ * Each relay is associated with a port representor, which is a regular
+ * dpdk netdev. The port representor is the calling context of the relay's
+ * rx_recv function. The idle cycles of the port representor's rx_recv are
+ * used to forward packets from the VF to the VM and vice versa.
+ */
+
+void *
+netdev_dpdk_vdpa_alloc_relay(void);
+int
+netdev_dpdk_vdpa_update_relay(struct netdev_dpdk_vdpa_relay *relay,
+                              struct rte_mempool *mp,
+                              int n_rxq);
+int
+netdev_dpdk_vdpa_rxq_recv_impl(struct netdev_dpdk_vdpa_relay *relay,
+                               int pr_queue);
+int
+netdev_dpdk_vdpa_config_impl(struct netdev_dpdk_vdpa_relay *relay,
+                             uint16_t port_id,
+                             const char *vm_socket,
+                             const char *vf_pci,
+                             int max_queues);
+void
+netdev_dpdk_vdpa_destruct_impl(struct netdev_dpdk_vdpa_relay *relay);
+int
+netdev_dpdk_vdpa_get_custom_stats_impl(struct netdev_dpdk_vdpa_relay *relay,
+                                       struct netdev_custom_stats *cstm_stats);
+
+#endif /* netdev-dpdk-vdpa.h */

From patchwork Thu Apr 2 11:13:47 2020
X-Patchwork-Submitter: Noa Ezra
X-Patchwork-Id: 1265525
From: Noa Ezra
To: ovs-dev@openvswitch.org
Date: Thu, 2 Apr 2020 11:13:47 +0000
Message-Id: <1585826027-330699-3-git-send-email-noae@mellanox.com>
In-Reply-To: <1585826027-330699-1-git-send-email-noae@mellanox.com>
Cc: elibr@mellanox.com, Ameer Mahagneh, Noa Ezra
Subject: [ovs-dev] [PATCH ovs v1 2/2] netdev-dpdk: Add dpdkvdpa port

A dpdkvdpa netdev works with three components: a vhost-user socket, a vDPA
device (a real vDPA device or a VF) and a representor of the vDPA device.
In order to add a new vDPA port, add a new port to an existing bridge with
type dpdkvdpa and the vDPA options:

ovs-vsctl add-port br0 vdpa0 -- set Interface vdpa0 type=dpdkvdpa
    options:vdpa-socket-path=
    options:vdpa-accelerator-devargs=
    options:dpdk-devargs=,representor=[id]

On this command OVS creates a new netdev:
1. Register the vhost-user-client device.
2. Open and configure the VF dpdk port.
3. Open and configure the representor dpdk port.

The new netdev uses the netdev_rxq_recv() function in order to receive
packets from the VF and push them to vhost-user, and to receive packets
from vhost-user and push them to the VF.
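The bidirectional relay described above follows a simple convention in patch 1/2: relay queue 2*i polls the VF side and pushes to the VM (vhost-user), queue 2*i+1 does the reverse, and enabled queues are spread over the representor's rx queues in round robin. The sketch below condenses that mapping using simplified types in place of the relay structs; note that the patch itself assigns pr_queue from vhost queue-state events rather than at wiring time, so this is an illustrative approximation, not the patch's code:

```c
#include <assert.h>
#include <stdint.h>

/* Stands in for NETDEV_DPDK_VDPA_INVALID_QUEUE_ID in the patch. */
#define INVALID_QUEUE 0xFFFF

struct qpair_cfg {
    uint16_t rx_port;                 /* port polled by this relay queue */
    uint16_t tx_port;                 /* port the packets are pushed to */
    uint16_t pr_queue;                /* representor rx queue servicing it */
};

/* Wire 2 * num_queues relay queues between a VM (vhost-user) port and a VF
 * port: even indices carry VF -> VM traffic, odd indices VM -> VF, matching
 * the "q & 1" convention in netdev_dpdk_vdpa_config_impl.  Each queue is
 * then spread over the representor's n_rxq queues round-robin, as in
 * netdev_dpdk_vdpa_queue_state. */
static void
wire_relay(struct qpair_cfg *qp, uint16_t num_queues,
           uint16_t vm_port, uint16_t vf_port, int n_rxq)
{
    for (uint16_t q = 0; q < num_queues * 2; q++) {
        if (q & 1) {                  /* odd: poll the VM, push to the VF */
            qp[q].rx_port = vm_port;
            qp[q].tx_port = vf_port;
        } else {                      /* even: poll the VF, push to the VM */
            qp[q].rx_port = vf_port;
            qp[q].tx_port = vm_port;
        }
        qp[q].pr_queue = n_rxq > 0 ? q % n_rxq : INVALID_QUEUE;
    }
}
```

With this layout a representor rx queue services a fixed subset of relay queues, which is what lets the representor's idle rx cycles drive the relay without extra threads.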
Signed-off-by: Noa Ezra
Reviewed-by: Oz Shlomo
---
 Documentation/automake.mk           |   1 +
 Documentation/topics/dpdk/index.rst |   1 +
 Documentation/topics/dpdk/vdpa.rst  |  90 ++++++++++++++++++++
 NEWS                                |   1 +
 lib/netdev-dpdk.c                   | 164 +++++++++++++++++++++++++++++++++-
 vswitchd/vswitch.xml                |  25 ++++++
 6 files changed, 281 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/topics/dpdk/vdpa.rst

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index f85c432..7caf6e7 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -41,6 +41,7 @@ DOC_SOURCE = \
 	Documentation/topics/dpdk/qos.rst \
 	Documentation/topics/dpdk/vdev.rst \
 	Documentation/topics/dpdk/vhost-user.rst \
+	Documentation/topics/dpdk/vdpa.rst \
 	Documentation/topics/fuzzing/index.rst \
 	Documentation/topics/fuzzing/what-is-fuzzing.rst \
 	Documentation/topics/fuzzing/ovs-fuzzing-infrastructure.rst \
diff --git a/Documentation/topics/dpdk/index.rst b/Documentation/topics/dpdk/index.rst
index a5be5e3..e8595c3 100644
--- a/Documentation/topics/dpdk/index.rst
+++ b/Documentation/topics/dpdk/index.rst
@@ -39,3 +39,4 @@ DPDK Support
    /topics/dpdk/qos
    /topics/dpdk/jumbo-frames
    /topics/dpdk/memory
+   /topics/dpdk/vdpa
diff --git a/Documentation/topics/dpdk/vdpa.rst b/Documentation/topics/dpdk/vdpa.rst
new file mode 100644
index 0000000..34c5300
--- /dev/null
+++ b/Documentation/topics/dpdk/vdpa.rst
@@ -0,0 +1,90 @@
+..
+      Copyright (c) 2019 Mellanox Technologies, Ltd.
+
+      Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License.
+      You may obtain a copy of the License at:
+
+          http://www.apache.org/licenses/LICENSE-2.0
+
+      Unless required by applicable law or agreed to in writing, software
+      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+      See the
+      License for the specific language governing permissions and limitations
+      under the License.
+
+      Convention for heading levels in Open vSwitch documentation:
+
+      =======  Heading 0 (reserved for the title in a document)
+      -------  Heading 1
+      ~~~~~~~  Heading 2
+      +++++++  Heading 3
+      '''''''  Heading 4
+
+      Avoid deeper levels because they do not render well.
+
+===============
+DPDK VDPA Ports
+===============
+
+In user space there are two main approaches to communicate with a guest (VM):
+virtIO ports (e.g. netdev type=dpdkvhostuser/dpdkvhostuserclient) or SR-IOV
+using phy ports (e.g. netdev type=dpdk).
+Phy ports allow working with a port representor that is attached to OVS while
+a matching VF is passed through to the guest.
+HW rules can process packets from the up-link and direct them to the VF
+without going through SW (OVS), so phy ports give the best performance.
+However, the SR-IOV architecture requires the guest to use a driver that is
+specific to the underlying HW. A HW-specific driver has two main drawbacks:
+1. It breaks virtualization in some sense (the guest is aware of the HW),
+   which can also limit the type of images supported.
+2. Live migration is harder to support.
+
+Using a virtIO port solves both problems, but reduces performance and loses
+some functionality; for example, some HW offloads cannot be supported when
+working directly with virtIO.
+
+The new netdev type, dpdkvdpa, resolves this conflict. It is very similar to
+a regular dpdk netdev but adds some functionality: the port translates
+between a phy port and a virtIO port. It takes packets from an rx queue and
+sends them to the suitable tx queue, transferring packets between a virtIO
+guest (VM) and a VF in both directions and thus benefiting from both SR-IOV
+and virtIO.
+
+Quick Example
+-------------
+
+Configure OVS bridge and ports
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+You must first create a bridge and add ports to the switch.
+Since the dpdkvdpa port is configured as a client, the vdpa-socket-path must
+be configured by the user.
+VHOST_USER_SOCKET_PATH=/path/to/socket
+
+    $ ovs-vsctl add-br br0-ovs -- set bridge br0-ovs datapath_type=netdev
+    $ ovs-vsctl add-port br0-ovs pf -- set Interface pf \
+        type=dpdk options:dpdk-devargs=<pf pci id>
+    $ ovs-vsctl add-port br0-ovs vdpa0 -- set Interface vdpa0 type=dpdkvdpa \
+        options:vdpa-socket-path=$VHOST_USER_SOCKET_PATH \
+        options:vdpa-accelerator-devargs=<vf pci id> \
+        options:dpdk-devargs=<pf pci id>,representor=[id]
+
+Once the ports have been added to the switch, they must be added to the guest.
+
+Adding vhost-user ports to the guest (QEMU)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Attach the vhost-user device sockets to the guest. To do this, you must pass
+the following parameters to QEMU:
+
+    -chardev socket,id=char1,path=$VHOST_USER_SOCKET_PATH,server
+    -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
+    -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
+
+QEMU will wait until the port is created successfully in OVS before booting
+the VM.
+In this mode, if the switch crashes, the vhost ports will reconnect
+automatically once it is brought back up.
diff --git a/NEWS b/NEWS
index 70bd175..79ed080 100644
--- a/NEWS
+++ b/NEWS
@@ -45,6 +45,7 @@ v2.13.0 - 14 Feb 2020
      * Add hardware offload support for output, drop, set of MAC, IPv4 and
        TCP/UDP ports actions (experimental).
      * Add experimental support for TSO.
+     * 'dpdkvdpa' port type.
   - RSTP:
      * The rstp_statistics column in Port table will only be updated every
        stats-update-interval configured in Open_vSwitch table.
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 44ebf96..ce7ed7e 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -54,6 +54,7 @@
 #include "fatal-signal.h"
 #include "if-notifier.h"
 #include "netdev-provider.h"
+#include "netdev-dpdk-vdpa.h"
 #include "netdev-vport.h"
 #include "odp-util.h"
 #include "openvswitch/dynamic-string.h"
@@ -532,6 +533,8 @@ struct netdev_dpdk {
         int rte_xstats_ids_size;
         uint64_t *rte_xstats_ids;
     );
+
+    struct netdev_dpdk_vdpa_relay *relay;
 };
 
 struct netdev_rxq_dpdk {
@@ -541,6 +544,7 @@ struct netdev_rxq_dpdk {
 
 static void netdev_dpdk_destruct(struct netdev *netdev);
 static void netdev_dpdk_vhost_destruct(struct netdev *netdev);
+static void netdev_dpdk_vdpa_destruct(struct netdev *netdev);
 
 static int netdev_dpdk_get_sw_custom_stats(const struct netdev *,
                                            struct netdev_custom_stats *);
@@ -555,7 +559,8 @@ static bool
 is_dpdk_class(const struct netdev_class *class)
 {
     return class->destruct == netdev_dpdk_destruct
-           || class->destruct == netdev_dpdk_vhost_destruct;
+           || class->destruct == netdev_dpdk_vhost_destruct
+           || class->destruct == netdev_dpdk_vdpa_destruct;
 }
 
 /* DPDK NIC drivers allocate RX buffers at a particular granularity, typically
@@ -1432,6 +1437,30 @@ netdev_dpdk_construct(struct netdev *netdev)
     return err;
 }
 
+static int
+netdev_dpdk_vdpa_construct(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev;
+    int err;
+
+    err = netdev_dpdk_construct(netdev);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_construct failed. Port: %s\n", netdev->name);
+        goto out;
+    }
+
+    ovs_mutex_lock(&dpdk_mutex);
+    dev = netdev_dpdk_cast(netdev);
+    dev->relay = netdev_dpdk_vdpa_alloc_relay();
+    if (!dev->relay) {
+        err = ENOMEM;
+    }
+
+    ovs_mutex_unlock(&dpdk_mutex);
+out:
+    return err;
+}
+
 static void
 common_destruct(struct netdev_dpdk *dev)
     OVS_REQUIRES(dpdk_mutex)
@@ -1515,6 +1544,19 @@ dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED,
 }
 
 static void
+netdev_dpdk_vdpa_destruct(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+    ovs_mutex_lock(&dpdk_mutex);
+    netdev_dpdk_vdpa_destruct_impl(dev->relay);
+    rte_free(dev->relay);
+    ovs_mutex_unlock(&dpdk_mutex);
+
+    netdev_dpdk_destruct(netdev);
+}
+
+static void
 netdev_dpdk_vhost_destruct(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -2018,6 +2060,50 @@ out:
 }
 
 static int
+netdev_dpdk_vdpa_set_config(struct netdev *netdev, const struct smap *args,
+                            char **errp)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    const char *vdpa_accelerator_devargs =
+        smap_get(args, "vdpa-accelerator-devargs");
+    const char *vdpa_socket_path =
+        smap_get(args, "vdpa-socket-path");
+    int vdpa_max_queues = smap_get_int(args, "vdpa-max-queues", -1);
+    int err = 0;
+
+    if ((vdpa_accelerator_devargs == NULL) || (vdpa_socket_path == NULL)) {
+        VLOG_ERR("netdev_dpdk_vdpa_set_config failed. "
+                 "Required arguments are missing for VDPA port %s",
+                 netdev->name);
+        err = EINVAL;
+        goto free_relay;
+    }
+
+    err = netdev_dpdk_set_config(netdev, args, errp);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_set_config failed. Port: %s", netdev->name);
+        goto free_relay;
+    }
+
+    err = netdev_dpdk_vdpa_config_impl(dev->relay, dev->port_id,
+                                       vdpa_socket_path,
+                                       vdpa_accelerator_devargs,
+                                       vdpa_max_queues);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_config_impl failed. Port %s",
+                 netdev->name);
+        goto free_relay;
+    }
+
+    goto out;
+
+free_relay:
+    rte_free(dev->relay);
+out:
+    return err;
+}
+
+static int
 netdev_dpdk_vhost_client_set_config(struct netdev *netdev,
                                     const struct smap *args,
                                     char **errp OVS_UNUSED)
@@ -2479,6 +2565,23 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch,
     return 0;
 }
 
+static int
+netdev_dpdk_vdpa_rxq_recv(struct netdev_rxq *rxq,
+                          struct dp_packet_batch *batch,
+                          int *qfill)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev);
+    int fwd_rx;
+    int ret;
+
+    fwd_rx = netdev_dpdk_vdpa_rxq_recv_impl(dev->relay, rxq->queue_id);
+    ret = netdev_dpdk_rxq_recv(rxq, batch, qfill);
+    if ((ret == EAGAIN) && fwd_rx) {
+        return 0;
+    }
+    return ret;
+}
+
 static inline int
 netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts,
                     int cnt, bool should_steal)
@@ -3244,6 +3347,26 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev,
 }
 
 static int
+netdev_dpdk_vdpa_get_custom_stats(const struct netdev *netdev,
+                                  struct netdev_custom_stats *custom_stats)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err = 0;
+
+    ovs_mutex_lock(&dev->mutex);
+
+    err = netdev_dpdk_vdpa_get_custom_stats_impl(dev->relay,
+                                                 custom_stats);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_get_custom_stats_impl failed. "
+                 "Port %s\n", netdev->name);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+    return err;
+}
+
+static int
 netdev_dpdk_get_features(const struct netdev *netdev,
                          enum netdev_features *current,
                          enum netdev_features *advertised,
@@ -5022,6 +5145,31 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev)
 }
 
 static int
+netdev_dpdk_vdpa_reconfigure(struct netdev *netdev)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err;
+
+    err = netdev_dpdk_reconfigure(netdev);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_reconfigure failed. Port %s", netdev->name);
+        goto out;
+    }
+
+    ovs_mutex_lock(&dev->mutex);
+    err = netdev_dpdk_vdpa_update_relay(dev->relay, dev->dpdk_mp->mp,
+                                        dev->up.n_rxq);
+    if (err) {
+        VLOG_ERR("netdev_dpdk_vdpa_update_relay failed. Port %s",
+                 netdev->name);
+    }
+
+    ovs_mutex_unlock(&dev->mutex);
+out:
+    return err;
+}
+
+static int
 netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
@@ -5310,10 +5458,24 @@ static const struct netdev_class dpdk_vhost_client_class = {
     .rxq_enabled = netdev_dpdk_vhost_rxq_enabled,
 };
 
+static const struct netdev_class dpdk_vdpa_class = {
+    .type = "dpdkvdpa",
+    NETDEV_DPDK_CLASS_COMMON,
+    .construct = netdev_dpdk_vdpa_construct,
+    .destruct = netdev_dpdk_vdpa_destruct,
+    .rxq_recv = netdev_dpdk_vdpa_rxq_recv,
+    .set_config = netdev_dpdk_vdpa_set_config,
+    .reconfigure = netdev_dpdk_vdpa_reconfigure,
+    .get_stats = netdev_dpdk_get_stats,
+    .get_custom_stats = netdev_dpdk_vdpa_get_custom_stats,
+    .send = netdev_dpdk_eth_send
+};
+
 void
 netdev_dpdk_register(void)
 {
     netdev_register_provider(&dpdk_class);
     netdev_register_provider(&dpdk_vhost_class);
     netdev_register_provider(&dpdk_vhost_client_class);
+    netdev_register_provider(&dpdk_vdpa_class);
 }
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index f9339af..e7715f5 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2671,6 +2671,13 @@

+        <dt><code>dpdkvdpa</code></dt>
+        <dd>
+          The dpdk vDPA port allows forwarding bi-directional traffic between
+          SR-IOV virtual functions (VFs) and VirtIO devices in virtual
+          machines (VMs).
+        </dd>
+
@@ -3219,6 +3226,24 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
+      <column name="options" key="vdpa-socket-path">
+        <p>
+          The value specifies the path to the socket associated with a VDPA
+          port that will be created by QEMU.
+          Only supported by dpdkvdpa interfaces.
+        </p>
+      </column>
+
+      <column name="options" key="vdpa-accelerator-devargs">
+        <p>
+          The value specifies the PCI address associated with the virtual
+          function.
+          Only supported by dpdkvdpa interfaces.
+        </p>
+      </column>
+
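
For readers managing guests through libvirt rather than raw QEMU, the QEMU parameters shown in the documentation above map to a vhost-user interface definition along these lines. This is a sketch, not part of the patch; the socket path and MAC address are placeholders, and `mode='server'` corresponds to the `server` flag on the QEMU chardev:

```xml
<interface type='vhostuser'>
  <!-- QEMU acts as the server on this socket; OVS (dpdkvdpa) connects
       as the client, matching the vdpa-socket-path option. -->
  <source type='unix' path='/path/to/socket' mode='server'/>
  <mac address='00:00:00:00:00:01'/>
  <model type='virtio'/>
</interface>
```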