From patchwork Wed Apr 1 09:13:08 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 2218450
X-Patchwork-Delegate: echaudro@redhat.com
From: Eli Britstein
Date: Wed, 1 Apr 2026 12:13:08 +0300
Message-ID: <20260401091318.2671624-2-elibr@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com>
References: <20260401091318.2671624-1-elibr@nvidia.com>
Subject: [ovs-dev] [PATCH v3 01/11] ovs-rcu: Add support for embedded variant.
Cc: Eli Britstein, Ilya Maximets, David Marchand, Maor Dickman

From: Gaetan Rivet

Add a way to schedule executions with the RCU using memory embedded
within the object being scheduled, if applicable. This way, freeing a
high volume of objects does not require many small allocations,
potentially increasing heap fragmentation and memory pressure.

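For readers unfamiliar with the embedded (intrusive) approach, here is a
minimal single-threaded sketch of the idea, not the code from this patch.
All names (deferred_node, deferred_queue, deferred_postpone, deferred_run,
item, item_free) are hypothetical, and the "grace period" is simply
whenever deferred_run() is called. It only illustrates that the link and
the callback live inside the object, so queuing a deferred free performs
no allocation of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for 'struct ovsrcu_node': the link and the callback
 * are stored inside the object being deferred. */
struct deferred_node {
    struct deferred_node *next;
    void (*cb)(void *aux);
    void *aux;
};

/* Hypothetical FIFO of pending nodes (single-threaded, no locking). */
struct deferred_queue {
    struct deferred_node *head;
    struct deferred_node **tail;
    size_t n;
};

static void
deferred_queue_init(struct deferred_queue *q)
{
    q->head = NULL;
    q->tail = &q->head;
    q->n = 0;
}

/* Queues 'cb(aux)' using storage provided by 'node'.  No allocation. */
static void
deferred_postpone(struct deferred_queue *q, struct deferred_node *node,
                  void (*cb)(void *aux), void *aux)
{
    assert(node != NULL && cb != NULL);
    node->cb = cb;
    node->aux = aux;
    node->next = NULL;
    *q->tail = node;
    q->tail = &node->next;
    q->n++;
}

/* Runs every queued callback in FIFO order and returns how many ran.  This
 * stands in for "after the next grace period" in this sketch. */
static size_t
deferred_run(struct deferred_queue *q)
{
    struct deferred_node *node = q->head;
    size_t n_run = 0;

    deferred_queue_init(q);
    while (node) {
        /* Read 'next' first: the callback may free the enclosing object. */
        struct deferred_node *next = node->next;

        node->cb(node->aux);
        node = next;
        n_run++;
    }
    return n_run;
}

/* Example object embedding the node. */
struct item {
    struct deferred_node rcu_node;
    int *freed_count;
};

static void
item_free(void *item_)
{
    struct item *item = item_;

    (*item->freed_count)++;
    free(item);
}
```

A caller would allocate each 'struct item' once, pass &item->rcu_node to
deferred_postpone(), and let deferred_run() invoke item_free() later. The
only allocation per object is the object itself.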
Signed-off-by: Gaetan Rivet
Co-authored-by: Eli Britstein
Signed-off-by: Eli Britstein
---
 lib/guarded-list.c |  10 ++++
 lib/guarded-list.h |   2 +
 lib/ovs-rcu.c      | 110 ++++++++++++++++++++++---------------
 lib/ovs-rcu.h      |  39 ++++++++++++++
 tests/test-rcu.c   | 131 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 249 insertions(+), 43 deletions(-)

diff --git a/lib/guarded-list.c b/lib/guarded-list.c
index 2186d074e..bb77fb55f 100644
--- a/lib/guarded-list.c
+++ b/lib/guarded-list.c
@@ -65,6 +65,16 @@ guarded_list_push_back(struct guarded_list *list,
     return retval;
 }
 
+void
+guarded_list_push_back_all(struct guarded_list *list,
+                           struct ovs_list *nodes, size_t n)
+{
+    ovs_mutex_lock(&list->mutex);
+    ovs_list_push_back_all(&list->list, nodes);
+    list->n += n;
+    ovs_mutex_unlock(&list->mutex);
+}
+
 struct ovs_list *
 guarded_list_pop_front(struct guarded_list *list)
 {
diff --git a/lib/guarded-list.h b/lib/guarded-list.h
index 80ce22c12..b575dc425 100644
--- a/lib/guarded-list.h
+++ b/lib/guarded-list.h
@@ -40,6 +40,8 @@ bool guarded_list_is_empty(const struct guarded_list *);
 
 size_t guarded_list_push_back(struct guarded_list *, struct ovs_list *,
                               size_t max);
+void guarded_list_push_back_all(struct guarded_list *, struct ovs_list *,
+                                size_t n);
 struct ovs_list *guarded_list_pop_front(struct guarded_list *);
 size_t guarded_list_pop_all(struct guarded_list *, struct ovs_list *);
 
diff --git a/lib/ovs-rcu.c b/lib/ovs-rcu.c
index 49afcc55c..54e6c469d 100644
--- a/lib/ovs-rcu.c
+++ b/lib/ovs-rcu.c
@@ -38,7 +38,7 @@ struct ovsrcu_cb {
 };
 
 struct ovsrcu_cbset {
-    struct ovs_list list_node;
+    struct ovsrcu_node rcu_node;
     struct ovsrcu_cb *cbs;
     size_t n_allocated;
     int n_cbs;
@@ -49,6 +49,8 @@ struct ovsrcu_perthread {
 
     uint64_t seqno;
     struct ovsrcu_cbset *cbset;
+    struct ovs_list pending;    /* Thread-local list of ovsrcu_node. */
+    size_t n_pending;
     char name[16];              /* This thread's name. */
 };
 
@@ -58,15 +60,15 @@ static pthread_key_t perthread_key;
 static struct ovs_list ovsrcu_threads;
 static struct ovs_mutex ovsrcu_threads_mutex;
 
-static struct guarded_list flushed_cbsets;
-static struct seq *flushed_cbsets_seq;
+static struct guarded_list flushed_nodes;
+static struct seq *flushed_nodes_seq;
 
 static struct latch postpone_exit;
 static struct ovs_barrier postpone_barrier;
 
 static void ovsrcu_init_module(void);
-static void ovsrcu_flush_cbset__(struct ovsrcu_perthread *, bool);
-static void ovsrcu_flush_cbset(struct ovsrcu_perthread *);
+static void ovsrcu_flush_nodes__(struct ovsrcu_perthread *, bool);
+static void ovsrcu_flush_nodes(struct ovsrcu_perthread *);
 static void ovsrcu_unregister__(struct ovsrcu_perthread *);
 static bool ovsrcu_call_postponed(void);
 static void *ovsrcu_postpone_thread(void *arg OVS_UNUSED);
@@ -85,6 +87,8 @@ ovsrcu_perthread_get(void)
         perthread = xmalloc(sizeof *perthread);
         perthread->seqno = seq_read(global_seqno);
         perthread->cbset = NULL;
+        ovs_list_init(&perthread->pending);
+        perthread->n_pending = 0;
         ovs_strlcpy(perthread->name, name[0] ? name : "main",
                     sizeof perthread->name);
 
@@ -153,9 +157,7 @@ ovsrcu_quiesce(void)
 
     perthread = ovsrcu_perthread_get();
     perthread->seqno = seq_read(global_seqno);
-    if (perthread->cbset) {
-        ovsrcu_flush_cbset(perthread);
-    }
+    ovsrcu_flush_nodes(perthread);
     seq_change(global_seqno);
 
     ovsrcu_quiesced();
@@ -171,9 +173,7 @@ ovsrcu_try_quiesce(void)
     perthread = ovsrcu_perthread_get();
     if (!seq_try_lock()) {
         perthread->seqno = seq_read(global_seqno);
-        if (perthread->cbset) {
-            ovsrcu_flush_cbset__(perthread, true);
-        }
+        ovsrcu_flush_nodes__(perthread, true);
         seq_change_protected(global_seqno);
         seq_unlock();
         ovsrcu_quiesced();
@@ -264,10 +264,10 @@ ovsrcu_exit(void)
     /* Repeatedly:
      *
      *    - Wait for a grace period.  One important side effect is to push the
-     *      running thread's cbset into 'flushed_cbsets' so that the next call
+     *      running thread's nodes into 'flushed_nodes' so that the next call
      *      has something to call.
      *
-     *    - Call all the callbacks in 'flushed_cbsets'.  If there aren't any,
+     *    - Call all the callbacks in 'flushed_nodes'.  If there aren't any,
      *      we're done, otherwise the callbacks themselves might have requested
      *      more deferred callbacks so we go around again.
      *
@@ -282,6 +282,32 @@ ovsrcu_exit(void)
     }
 }
 
+static void
+ovsrcu_run_cbset(void *aux)
+{
+    struct ovsrcu_cbset *cbset = aux;
+    struct ovsrcu_cb *cb;
+
+    for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
+        cb->function(cb->aux);
+    }
+
+    free(cbset->cbs);
+    free(cbset);
+}
+
+void
+ovsrcu_postpone_embedded__(void (*function)(void *aux), void *aux,
+                           struct ovsrcu_node *rcu_node)
+{
+    struct ovsrcu_perthread *perthread = ovsrcu_perthread_get();
+
+    rcu_node->cb = function;
+    rcu_node->aux = aux;
+    ovs_list_push_back(&perthread->pending, &rcu_node->list_node);
+    perthread->n_pending++;
+}
+
 /* Registers 'function' to be called, passing 'aux' as argument, after the
  * next grace period.
  *
@@ -314,6 +340,7 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
         cbset->cbs = xmalloc(MIN_CBS * sizeof *cbset->cbs);
         cbset->n_allocated = MIN_CBS;
         cbset->n_cbs = 0;
+        ovsrcu_postpone_embedded(ovsrcu_run_cbset, cbset, rcu_node);
     }
 
     if (cbset->n_cbs == cbset->n_allocated) {
@@ -329,24 +356,18 @@ ovsrcu_postpone__(void (*function)(void *aux), void *aux)
 static bool OVS_NO_SANITIZE_FUNCTION
 ovsrcu_call_postponed(void)
 {
-    struct ovsrcu_cbset *cbset;
-    struct ovs_list cbsets;
+    struct ovs_list nodes = OVS_LIST_INITIALIZER(&nodes);
+    struct ovsrcu_node *node;
 
-    guarded_list_pop_all(&flushed_cbsets, &cbsets);
-    if (ovs_list_is_empty(&cbsets)) {
+    guarded_list_pop_all(&flushed_nodes, &nodes);
+    if (ovs_list_is_empty(&nodes)) {
         return false;
     }
 
     ovsrcu_synchronize();
 
-    LIST_FOR_EACH_POP (cbset, list_node, &cbsets) {
-        struct ovsrcu_cb *cb;
-
-        for (cb = cbset->cbs; cb < &cbset->cbs[cbset->n_cbs]; cb++) {
-            cb->function(cb->aux);
-        }
-        free(cbset->cbs);
-        free(cbset);
+    LIST_FOR_EACH_POP (node, list_node, &nodes) {
+        node->cb(node->aux);
     }
 
     return true;
@@ -358,9 +379,9 @@ ovsrcu_postpone_thread(void *arg OVS_UNUSED)
     pthread_detach(pthread_self());
 
     while (!latch_is_set(&postpone_exit)) {
-        uint64_t seqno = seq_read(flushed_cbsets_seq);
+        uint64_t cb_seqno = seq_read(flushed_nodes_seq);
         if (!ovsrcu_call_postponed()) {
-            seq_wait(flushed_cbsets_seq, seqno);
+            seq_wait(flushed_nodes_seq, cb_seqno);
             latch_wait(&postpone_exit);
             poll_block();
         }
@@ -371,33 +392,36 @@ ovsrcu_postpone_thread(void *arg OVS_UNUSED)
 }
 
 static void
-ovsrcu_flush_cbset__(struct ovsrcu_perthread *perthread, bool protected)
+ovsrcu_flush_nodes__(struct ovsrcu_perthread *perthread, bool protected)
 {
-    struct ovsrcu_cbset *cbset = perthread->cbset;
+    if (ovs_list_is_empty(&perthread->pending)) {
+        return;
+    }
 
-    if (cbset) {
-        guarded_list_push_back(&flushed_cbsets, &cbset->list_node, SIZE_MAX);
-        perthread->cbset = NULL;
+    perthread->cbset = NULL;
+    guarded_list_push_back_all(&flushed_nodes, &perthread->pending,
+                               perthread->n_pending);
+    ovs_list_init(&perthread->pending);
+    perthread->n_pending = 0;
 
-        if (protected) {
-            seq_change_protected(flushed_cbsets_seq);
-        } else {
-            seq_change(flushed_cbsets_seq);
-        }
+    if (protected) {
+        seq_change_protected(flushed_nodes_seq);
+    } else {
+        seq_change(flushed_nodes_seq);
     }
 }
 
 static void
-ovsrcu_flush_cbset(struct ovsrcu_perthread *perthread)
+ovsrcu_flush_nodes(struct ovsrcu_perthread *perthread)
 {
-    ovsrcu_flush_cbset__(perthread, false);
+    ovsrcu_flush_nodes__(perthread, false);
 }
 
 static void
 ovsrcu_unregister__(struct ovsrcu_perthread *perthread)
 {
-    if (perthread->cbset) {
-        ovsrcu_flush_cbset(perthread);
+    if (!ovs_list_is_empty(&perthread->pending)) {
+        ovsrcu_flush_nodes(perthread);
     }
 
     ovs_mutex_lock(&ovsrcu_threads_mutex);
@@ -438,8 +462,8 @@ ovsrcu_init_module(void)
         ovs_list_init(&ovsrcu_threads);
         ovs_mutex_init(&ovsrcu_threads_mutex);
 
-        guarded_list_init(&flushed_cbsets);
-        flushed_cbsets_seq = seq_create();
+        guarded_list_init(&flushed_nodes);
+        flushed_nodes_seq = seq_create();
 
         ovsthread_once_done(&once);
     }
diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h
index a1c15c126..efd43a1a2 100644
--- a/lib/ovs-rcu.h
+++ b/lib/ovs-rcu.h
@@ -125,6 +125,22 @@
  *         ovs_mutex_unlock(&mutex);
  *     }
  *
+ * As an alternative to ovsrcu_postpone(), the same deferred execution can be
+ * achieved using ovsrcu_postpone_embedded():
+ *
+ *     struct deferrable {
+ *         struct ovsrcu_node rcu_node;
+ *     };
+ *
+ *     void
+ *     deferred_free(struct deferrable *d)
+ *     {
+ *         ovsrcu_postpone_embedded(free, d, rcu_node);
+ *     }
+ *
+ * Using embedded fields can be preferred sometimes to avoid the small
+ * allocations done in ovsrcu_postpone().
+ *
  * In some rare cases an object may not be addressable with a pointer, but only
  * through an array index (e.g. because it's provided by another library).  It
  * is still possible to have RCU semantics by using the ovsrcu_index type.
@@ -173,6 +189,8 @@
 #include "compiler.h"
 #include "ovs-atomic.h"
 
+#include "openvswitch/list.h"
+
 #if __GNUC__
 #define OVSRCU_TYPE(TYPE) struct { ATOMIC(TYPE) p; }
 #define OVSRCU_INITIALIZER(VALUE) { VALUE }
@@ -256,6 +274,27 @@ void ovsrcu_postpone__(void (*function)(void *aux), void *aux);
      (void) sizeof(*(ARG)),                                     \
      ovsrcu_postpone__((void (*)(void *))(FUNCTION), ARG))
 
+struct ovsrcu_node {
+    struct ovs_list list_node;
+    void (*cb)(void *aux);
+    void *aux;
+};
+
+/* Calls FUNCTION passing ARG as its pointer-type argument, which
+ * contains an 'ovsrcu_node' as a field named MEMBER.  The function
+ * is called following the next grace period.  See 'Usage' above for an
+ * example.
+ */
+void ovsrcu_postpone_embedded__(void (*function)(void *aux), void *aux,
+                                struct ovsrcu_node *node);
+#define ovsrcu_postpone_embedded(FUNCTION, ARG, MEMBER)             \
+    (/* Verify that ARG is appropriate for FUNCTION. */             \
+     (void) sizeof((FUNCTION)(ARG), 1),                             \
+     /* Verify that ARG is a pointer type. */                       \
+     (void) sizeof(*(ARG)),                                         \
+     ovsrcu_postpone_embedded__((void (*)(void *))(FUNCTION), ARG,  \
+                                &(ARG)->MEMBER))
+
 /* An array index protected by RCU semantics.  This is an easier alternative to
  * an RCU protected pointer to a malloc'd int. */
 typedef struct { atomic_int v; } ovsrcu_index;
diff --git a/tests/test-rcu.c b/tests/test-rcu.c
index bb17092bf..26150e7d9 100644
--- a/tests/test-rcu.c
+++ b/tests/test-rcu.c
@@ -17,11 +17,16 @@
 #include <config.h>
 #undef NDEBUG
 #include "fatal-signal.h"
+#include "ovs-atomic.h"
 #include "ovs-rcu.h"
 #include "ovs-thread.h"
 #include "ovstest.h"
+#include "seq.h"
+#include "timeval.h"
 #include "util.h"
 
+#include "openvswitch/poll-loop.h"
+
 static void *
 quiescer_main(void *aux OVS_UNUSED)
 {
@@ -67,10 +72,136 @@ test_rcu_barrier(void)
     ovs_assert(count == 10);
 }
 
+struct element {
+    struct ovsrcu_node rcu_node;
+    struct seq *trigger;
+    atomic_bool wait;
+};
+
+static void
+trigger_cb(void *e_)
+{
+    struct element *e = (struct element *) e_;
+
+    seq_change(e->trigger);
+}
+
+static void *
+wait_main(void *aux)
+{
+    struct element *e = aux;
+
+    for (;;) {
+        bool wait;
+
+        atomic_read(&e->wait, &wait);
+        if (!wait) {
+            break;
+        }
+    }
+
+    seq_wait(e->trigger, seq_read(e->trigger));
+    poll_block();
+
+    return NULL;
+}
+
+static void
+test_rcu_postpone_embedded(bool multithread)
+{
+    long long int timeout;
+    pthread_t waiter;
+    struct element e;
+    uint64_t seqno;
+
+    atomic_init(&e.wait, true);
+
+    if (multithread) {
+        waiter = ovs_thread_create("waiter", wait_main, &e);
+    }
+
+    e.trigger = seq_create();
+    seqno = seq_read(e.trigger);
+
+    ovsrcu_postpone_embedded(trigger_cb, &e, rcu_node);
+
+    /* Check that GC holds out until all threads are quiescent. */
+    timeout = time_msec();
+    if (multithread) {
+        timeout += 200;
+    }
+    while (time_msec() <= timeout) {
+        ovs_assert(seq_read(e.trigger) == seqno);
+    }
+
+    atomic_store(&e.wait, false);
+
+    seq_wait(e.trigger, seqno);
+    poll_timer_wait_until(time_msec() + 200);
+    poll_block();
+
+    /* Verify that GC executed. */
+    ovs_assert(seq_read(e.trigger) != seqno);
+    seq_destroy(e.trigger);
+
+    if (multithread) {
+        xpthread_join(waiter, NULL);
+    }
+}
+
+#define N_ORDER_CBS 5
+
+struct order_element {
+    struct ovsrcu_node rcu_node;
+    int id;
+    int *log;
+    int *log_idx;
+};
+
+static void
+order_cb(void *aux)
+{
+    struct order_element *e = aux;
+    e->log[(*e->log_idx)++] = e->id;
+}
+
+static void
+test_rcu_ordering(void)
+{
+    struct order_element elems[N_ORDER_CBS];
+    int log[N_ORDER_CBS];
+    int log_idx = 0;
+
+    for (int i = 0; i < N_ORDER_CBS; i++) {
+        elems[i].id = i;
+        elems[i].log = log;
+        elems[i].log_idx = &log_idx;
+        ovsrcu_postpone_embedded(order_cb, &elems[i], rcu_node);
+    }
+
+    ovsrcu_barrier();
+
+    ovs_assert(log_idx == N_ORDER_CBS);
+    for (int i = 0; i < N_ORDER_CBS; i++) {
+        if (log[i] != i) {
+            ovs_abort(0, "RCU embedded callback ordering violated: "
+                      "expected cb %d at position %d, got %d",
+                      i, i, log[i]);
+        }
+    }
+}
+
 static void
 test_rcu(int argc OVS_UNUSED, char *argv[] OVS_UNUSED) {
+    const bool multithread = true;
+
+    /* Execute single-threaded check before spawning additional threads. */
+    test_rcu_postpone_embedded(!multithread);
+    test_rcu_postpone_embedded(multithread);
+
     test_rcu_quiesce();
     test_rcu_barrier();
+    test_rcu_ordering();
 }
 
 OVSTEST_REGISTER("test-rcu", test_rcu);

From patchwork Wed Apr 1 09:13:09 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 2218451
X-Patchwork-Delegate: echaudro@redhat.com
smtp2.osuosl.org; dkim=fail reason="signature verification failed" (2048-bit key, unprotected) header.d=Nvidia.com header.i=@Nvidia.com header.a=rsa-sha256 header.s=selector2 header.b=DKMTEQMD Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp2.osuosl.org (Postfix) with ESMTPS id 050024087E; Wed, 1 Apr 2026 09:15:11 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id DFD77C0070; Wed, 1 Apr 2026 09:15:10 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 1365EC054D for ; Wed, 1 Apr 2026 09:15:10 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id D668F80BDF for ; Wed, 1 Apr 2026 09:15:07 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id iZZ0qg78TqAW for ; Wed, 1 Apr 2026 09:15:04 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a01:111:f403:c110::1; helo=bn1pr04cu002.outbound.protection.outlook.com; envelope-from=elibr@nvidia.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp1.osuosl.org A7BAF807FA Authentication-Results: smtp1.osuosl.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org A7BAF807FA Authentication-Results: smtp1.osuosl.org; dkim=pass (2048-bit key, unprotected) header.d=Nvidia.com header.i=@Nvidia.com header.a=rsa-sha256 header.s=selector2 header.b=DKMTEQMD Received: from BN1PR04CU002.outbound.protection.outlook.com (mail-eastus2azlp170100001.outbound.protection.outlook.com [IPv6:2a01:111:f403:c110::1]) by smtp1.osuosl.org (Postfix) with ESMTPS id A7BAF807FA for ; Wed, 1 Apr 2026 09:15:03 
+0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=CqXDWL8hpOZryHQ1Hvgu5ZURVoIDZzThqZTYVWaXGSkpXkiFaXGfCTJjeedQxSm+AFbEI8cdk4jNNA8DflF5d6XuMuTlxkoi3JxaR1eINH67cs41tFhmqa4lyqsSHiIfpQEmHjPSskprVqHIZAW5ew0BpGsxPglDpew6ff+wmNhoxzIeDVOK4Wbb8mN3Vqn3URsu97AgPXAisjeuem4ReYkTqqw6+8rPl3p2X4eRcjTfmdKe69mtgcjtKgiykyF3bpXhSU8bJjm/AMH3C3T/0kUWEoPKabHBZOBSoAJfZ7VvwsERjuHFnH0K0iHP4h8MzzhBZDLkiOzqEClvnlI1ng== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=Y0Hj6AIlhQ2Ij2uxi5C3g5YNmqeNot3nLeC8tgu4hZ4=; b=iSt+J08of6hcgnaCu1kfREadQy8SMYVKaJ9uUBXO5WM0Nm3lakuJDE83FTYm+zIf6vOPoF0sF6eB8ufA2TcZ2C4b5HfXnBI74oa3NrHIvZTgFFryikcb2xFeMS9TnFxovFW2YuWUOZfDOJH1Z/XSCnMWfOqyEPkzug1SlKM6XWTcFzsTdcy+3AUt/DSqujfc69rZP/yjvDQd3A+FWtEYYKD2oSpXfWJ4d/P1VYr8u4+l/zQm+Xmq5SoW4AgzAOatQGsxoL8l0s+UTU4SX06My0CYg/wTyLbioRbSlGpA8W3lradomFiXim1+XCLwJb+PdX53KD8hPg/5eG6Iuv0sTQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=openvswitch.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Y0Hj6AIlhQ2Ij2uxi5C3g5YNmqeNot3nLeC8tgu4hZ4=; b=DKMTEQMDn7gj+ybNHKM7Sq/auuqSWNHQ6edvAdxoXic9PSNhvGmjQqlOn4xj0Oe4ywOOYJpqiGwFFuEWnm3VDqsiu8Fw6v5rzpcAHuhKBE5y+bvDaoc4ulLu8ijwdcx/rBg8WUyyG4Y7c1LqdVgAofpjs7wCA+zwH779ztGqEXsSVP52emQ41elIJkxiQ2rbK0ONRO28vszsMNs5DGYjBgTT677xtYp2NkTQ+0yhm1nuz+7PGAYeXiXSZPHBPj/Y/prALDyWALqCdoGEn5ut0AEnX7ESY3ZIxsIrPxlBINNzgtlJS14GOdJP+uvFyi19Q+CkrjaWRPUZFj1f9ghdFw== Received: from PH7P220CA0009.NAMP220.PROD.OUTLOOK.COM 
(2603:10b6:510:326::14) by CH1PPF0B4A257F6.namprd12.prod.outlook.com (2603:10b6:61f:fc00::605) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.15; Wed, 1 Apr 2026 09:14:50 +0000 Received: from SA2PEPF000015CD.namprd03.prod.outlook.com (2603:10b6:510:326:cafe::21) by PH7P220CA0009.outlook.office365.com (2603:10b6:510:326::14) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9745.29 via Frontend Transport; Wed, 1 Apr 2026 09:14:44 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by SA2PEPF000015CD.mail.protection.outlook.com (10.167.241.203) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17 via Frontend Transport; Wed, 1 Apr 2026 09:14:49 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 02:14:31 -0700 Received: from nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 02:14:28 -0700 From: Eli Britstein To: Date: Wed, 1 Apr 2026 12:13:09 +0300 Message-ID: <20260401091318.2671624-3-elibr@nvidia.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com> References: <20260401091318.2671624-1-elibr@nvidia.com> MIME-Version: 1.0 X-Originating-IP: [10.126.231.35] X-ClientProxiedBy: 
rnnvmail203.nvidia.com (10.129.68.9) To rnnvmail201.nvidia.com (10.129.68.8)
Subject: [ovs-dev] [PATCH v3 02/11] refmap: Introduce reference map. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list Cc: Eli Britstein , Ilya Maximets , David Marchand , Maor Dickman Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Gaetan Rivet Add a reference-counted key-value map. Duplicates take a reference on the original entry within the map, leaving it in place. To be able to execute an entry creation after determining whether it is already present or not in the map, store relevant initialization and de-initialization functions. 
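For illustration only (this sketch is not part of the patch): the semantics described above can be modelled with a small single-threaded toy. The names toy_map, toy_entry, toy_ref, toy_unref and toy_demo are hypothetical, and the model has no locking, RCU or hashing. It only shows that the init step runs on the first reference to a key, that duplicates share the original entry, and that the uninit step runs when the last reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical toy model, NOT the refmap implementation from this patch. */
struct toy_entry {
    struct toy_entry *next;
    unsigned int refcount;
    int key;
    int value;
};

struct toy_map {
    struct toy_entry *head;
    int n_init;     /* Number of times the "init" step ran. */
    int n_uninit;   /* Number of times the "uninit" step ran. */
};

/* Returns the value for 'key', creating it from 'init_arg' only if the key
 * is not present yet.  Duplicates just take one more reference. */
static int *
toy_ref(struct toy_map *m, int key, int init_arg)
{
    struct toy_entry *e;

    for (e = m->head; e; e = e->next) {
        if (e->key == key) {
            break;
        }
    }
    if (!e) {
        e = calloc(1, sizeof *e);
        if (!e) {
            return NULL;
        }
        e->key = key;
        e->value = init_arg;    /* Stands in for a value_init callback. */
        m->n_init++;
        e->next = m->head;
        m->head = e;
    }
    e->refcount++;
    return &e->value;
}

/* Drops one reference on 'value'.  Returns true only when it was the last
 * reference, in which case the entry is removed and freed. */
static bool
toy_unref(struct toy_map *m, int *value)
{
    struct toy_entry **p;

    for (p = &m->head; *p; p = &(*p)->next) {
        struct toy_entry *e = *p;

        if (&e->value != value) {
            continue;
        }
        if (--e->refcount > 0) {
            return false;
        }
        m->n_uninit++;          /* Stands in for a value_uninit callback. */
        *p = e->next;
        free(e);
        return true;
    }
    return false;
}

/* Walks through ref, duplicate ref, unref, last unref. */
static int
toy_demo(void)
{
    struct toy_map m = { NULL, 0, 0 };
    int *a = toy_ref(&m, 7, 42);    /* First reference: init runs. */
    int *b = toy_ref(&m, 7, 99);    /* Duplicate: 'init_arg' is ignored. */
    bool last_a = toy_unref(&m, a);
    bool last_b = toy_unref(&m, b);

    assert(a && a == b && m.n_init == 1);
    assert(!last_a && last_b);
    assert(m.n_uninit == 1 && !m.head);
    return m.n_init + m.n_uninit;
}
```

In the patch itself, refmap_unref() likewise reports whether the last reference was released, but the value memory is only freed after an RCU grace period and one internal reference is kept to synchronize init and uninit; the toy above deliberately leaves all of that out.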
Signed-off-by: Gaetan Rivet Co-authored-by: Eli Britstein Signed-off-by: Eli Britstein --- lib/automake.mk | 2 + lib/refmap.c | 485 ++++++++++++++++++ lib/refmap.h | 130 +++++ tests/automake.mk | 1 + tests/library.at | 5 + tests/test-refmap.c | 894 ++++++++++++++++++++++++++++++++++ utilities/checkpatch_dict.txt | 2 + 7 files changed, 1519 insertions(+) create mode 100644 lib/refmap.c create mode 100644 lib/refmap.h create mode 100644 tests/test-refmap.c diff --git a/lib/automake.mk b/lib/automake.mk index c6e988906..cb6458b0d 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -323,6 +323,8 @@ lib_libopenvswitch_la_SOURCES = \ lib/rculist.h \ lib/reconnect.c \ lib/reconnect.h \ + lib/refmap.c \ + lib/refmap.h \ lib/rstp.c \ lib/rstp.h \ lib/rstp-common.h \ diff --git a/lib/refmap.c b/lib/refmap.c new file mode 100644 index 000000000..c2c435238 --- /dev/null +++ b/lib/refmap.c @@ -0,0 +1,485 @@ +/* + * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. + * All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include + +#include "cmap.h" +#include "hash.h" +#include "fatal-signal.h" +#include "ovs-atomic.h" +#include "ovs-thread.h" +#include "refmap.h" +#include "timeval.h" + +#include "openvswitch/list.h" +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(refmap); +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(600, 600); + +static struct ovs_mutex refmap_destroy_lock = OVS_MUTEX_INITIALIZER; +static struct ovs_list refmap_destroy_list + OVS_GUARDED_BY(refmap_destroy_lock) = + OVS_LIST_INITIALIZER(&refmap_destroy_list); + +struct refmap { + struct cmap map; + struct ovs_mutex map_lock; + size_t key_size; + size_t value_size; + refmap_value_init value_init; + refmap_value_uninit value_uninit; + refmap_value_format value_format; + char *name; + struct ovs_list in_destroy_list; +}; + +struct refmap_node { + struct ovsrcu_node rcu_node; + /* CMAP related: */ + struct cmap_node map_node; + uint32_t hash; + /* Content: */ + struct ovs_refcount refcount; + /* Key, then Value follows. 
*/ +}; + +static void +refmap_destroy__(struct refmap *rfm, bool global_destroy); + +static void +refmap_destroy_unregister_protected(struct refmap *rfm) + OVS_REQUIRES(refmap_destroy_lock) +{ + ovs_list_remove(&rfm->in_destroy_list); +} + +static void +refmap_destroy_unregister(struct refmap *rfm) + OVS_EXCLUDED(refmap_destroy_lock) +{ + ovs_mutex_lock(&refmap_destroy_lock); + refmap_destroy_unregister_protected(rfm); + ovs_mutex_unlock(&refmap_destroy_lock); +} + +static void +refmap_destroy_register(struct refmap *rfm) + OVS_EXCLUDED(refmap_destroy_lock) +{ + ovs_mutex_lock(&refmap_destroy_lock); + ovs_list_push_back(&refmap_destroy_list, &rfm->in_destroy_list); + ovs_mutex_unlock(&refmap_destroy_lock); +} + +static void +refmap_destroy_all(void *aux OVS_UNUSED) +{ + struct refmap *rfm; + + ovs_mutex_lock(&refmap_destroy_lock); + LIST_FOR_EACH_SAFE (rfm, in_destroy_list, &refmap_destroy_list) { + refmap_destroy_unregister_protected(rfm); + refmap_destroy__(rfm, true); + } + ovs_mutex_unlock(&refmap_destroy_lock); + ovs_mutex_destroy(&refmap_destroy_lock); +} + +static void +refmap_fatal_signal_hook(void *aux OVS_UNUSED) +{ + /* This argument is only for the type check in 'ovsrcu_postpone', + * it is not otherwise used. */ + static int dummy_arg; + + /* Do not run all destroys right in the signal handler. + * Let other modules execute their own cleanup, and then + * iterate over any remaining to warn about leaks. 
*/ + ovsrcu_postpone(refmap_destroy_all, &dummy_arg); +} + +struct refmap * +refmap_create(const char *name, + size_t key_size, + size_t value_size, + refmap_value_init value_init, + refmap_value_uninit value_uninit, + refmap_value_format value_format) +{ + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; + struct refmap *rfm; + + ovs_assert(value_init && value_uninit); + + if (ovsthread_once_start(&once)) { + fatal_signal_add_hook(refmap_fatal_signal_hook, NULL, NULL, true); + ovsthread_once_done(&once); + } + + rfm = xzalloc(sizeof *rfm); + rfm->name = xstrdup(name); + rfm->key_size = key_size; + rfm->value_size = value_size; + rfm->value_init = value_init; + rfm->value_uninit = value_uninit; + rfm->value_format = value_format; + + ovs_mutex_init(&rfm->map_lock); + cmap_init(&rfm->map); + + refmap_destroy_register(rfm); + + return rfm; +} + +static void +refmap_destroy__(struct refmap *rfm, bool global_destroy) +{ + bool leaks_detected = false; + + if (!rfm) { + return; + } + + VLOG_DBG("%s: destroying the map", rfm->name); + + ovs_mutex_lock(&rfm->map_lock); + if (!cmap_is_empty(&rfm->map)) { + struct refmap_node *node; + + VLOG_WARN("%s: %s called with elements remaining in the map", + rfm->name, __func__); + leaks_detected = true; + CMAP_FOR_EACH (node, map_node, &rfm->map) { + /* No need to remove the node from the CMAP, it will + * be destroyed immediately. */ + ovsrcu_postpone_embedded(free, node, rcu_node); + } + } + cmap_destroy(&rfm->map); + ovs_mutex_unlock(&rfm->map_lock); + + ovs_mutex_destroy(&rfm->map_lock); + free(rfm->name); + free(rfm); + + /* During the very last stage of execution of RCU callbacks, + * the VLOG subsystem has been disabled. All logs are thus muted. + * If leaks are detected, abort the process, even though we were + * exiting due to a fatal signal. The SIGABRT generated will still + * be visible. 
*/ + if (global_destroy && leaks_detected) { + ovs_abort(-1, "Refmap values leak detected."); + } +} + +void +refmap_destroy(struct refmap *rfm) +{ + if (!rfm) { + return; + } + + refmap_destroy_unregister(rfm); + refmap_destroy__(rfm, false); +} + +static size_t +refmap_aligned_key_size(struct refmap *rfm) +{ + return ROUND_UP(rfm->key_size, 8); +} + +static void * +refmap_node_key(struct refmap_node *node) +{ + if (!node) { + return NULL; + } + + return node + 1; +} + +static void * +refmap_node_value(struct refmap *rfm, struct refmap_node *node) +{ + if (!node) { + return NULL; + } + + return ((char *) refmap_node_key(node)) + refmap_aligned_key_size(rfm); +} + +static size_t +refmap_node_total_size(struct refmap *rfm) +{ + return sizeof(struct refmap_node) + + refmap_aligned_key_size(rfm) + rfm->value_size; +} + +static struct refmap_node * +refmap_node_from_value(struct refmap *rfm, void *value) +{ + size_t offset = sizeof(struct refmap_node) + refmap_aligned_key_size(rfm); + + if ((uintptr_t) value < offset) { + return NULL; + } + return (void *) (((char *) value) - offset); +} + +static void +log_node(struct refmap *rfm, const char *prefix, struct refmap_node *node) +{ + void *key, *value; + struct ds s; + + if (OVS_LIKELY(VLOG_DROP_DBG(&rl) || !rfm->value_format)) { + return; + } + + key = refmap_node_key(node); + value = refmap_node_value(rfm, node); + + ds_init(&s); + ds_put_cstr(&s, ", '"); + rfm->value_format(&s, key, value); + ds_put_cstr(&s, "'"); + VLOG_DBG("%s: %p %s, refcnt=%u%s", rfm->name, value, prefix, + ovs_refcount_read(&node->refcount), ds_cstr(&s)); + ds_destroy(&s); +} + +/* Increments 'refcount', but only if it is over one. + * + * Returns false if the refcount was zero or one. + * In refmap, the last reference is only used to synchronize between + * value init and uninit in case of contention. In such state, the + * object is not valid anymore for external readers, until the + * value_{init,uninit} critical section is completed. 
+ */ +static inline bool +refmap_refcount_try_ref_one(struct ovs_refcount *refcount) +{ + unsigned int count; + + atomic_read_explicit(&refcount->count, &count, memory_order_relaxed); + do { + if (count <= 1) { + return false; + } + } while (!atomic_compare_exchange_weak_explicit(&refcount->count, &count, + count + 1, + memory_order_relaxed, + memory_order_relaxed)); + return true; +} + +void +refmap_for_each(struct refmap *rfm, + void (*cb)(void *value, void *key, void *arg), void *arg) +{ + struct refmap_node *node; + + CMAP_FOR_EACH (node, map_node, &rfm->map) { + void *value; + + if (!refmap_refcount_try_ref_one(&node->refcount)) { + continue; + } + log_node(rfm, "foreach", node); + value = refmap_node_value(rfm, node); + cb(value, refmap_node_key(node), arg); + refmap_unref(rfm, value); + } +} + +static uint32_t +refmap_key_hash(const struct refmap *rfm, const void *key) +{ + return hash_bytes(key, rfm->key_size, 0); +} + +static void * +refmap_lookup_protected(struct refmap *rfm, void *key, uint32_t hash) +{ + struct refmap_node *node; + + CMAP_FOR_EACH_WITH_HASH_PROTECTED (node, map_node, hash, &rfm->map) { + if (!memcmp(key, refmap_node_key(node), rfm->key_size) && + ovs_refcount_read(&node->refcount) > 1) { + return node; + } + } + + return NULL; +} + +static void * +refmap_lookup(struct refmap *rfm, void *key, uint32_t hash) +{ + struct refmap_node *node; + + CMAP_FOR_EACH_WITH_HASH (node, map_node, hash, &rfm->map) { + if (!memcmp(key, refmap_node_key(node), rfm->key_size) && + ovs_refcount_read(&node->refcount) > 1) { + return node; + } + } + + return NULL; +} + +void * +refmap_try_ref(struct refmap *rfm, void *key) +{ + struct refmap_node *node; + + node = refmap_lookup(rfm, key, refmap_key_hash(rfm, key)); + if (!node || !refmap_refcount_try_ref_one(&node->refcount)) { + return NULL; + } + + log_node(rfm, "try_ref", node); + return refmap_node_value(rfm, node); +} + +void * +refmap_ref(struct refmap *rfm, void *key, void *arg) +{ + struct refmap_node 
*node; + bool error = false; + uint32_t hash; + void *value; + + hash = refmap_key_hash(rfm, key); + + node = refmap_lookup(rfm, key, hash); + if (node && refmap_refcount_try_ref_one(&node->refcount)) { + value = refmap_node_value(rfm, node); + goto out; + } + + ovs_mutex_lock(&rfm->map_lock); + + node = refmap_lookup_protected(rfm, key, hash); + if (node && refmap_refcount_try_ref_one(&node->refcount)) { + ovs_mutex_unlock(&rfm->map_lock); + value = refmap_node_value(rfm, node); + goto out; + } + + node = xzalloc(refmap_node_total_size(rfm)); + node->hash = hash; + ovs_refcount_init(&node->refcount); + memcpy(refmap_node_key(node), key, rfm->key_size); + value = refmap_node_value(rfm, node); + if (rfm->value_init(value, arg) == 0) { + cmap_insert(&rfm->map, &node->map_node, node->hash); + ovs_refcount_ref(&node->refcount); + } else { + value = NULL; + error = true; + VLOG_WARN("%s: value_init failed", rfm->name); + } + ovs_mutex_unlock(&rfm->map_lock); + +out: + if (error) { + free(node); + return NULL; + } + + log_node(rfm, "ref", node); + + return value; +} + +bool +refmap_try_ref_value(struct refmap *rfm, void *value) +{ + struct refmap_node *node; + + if (!value) { + return false; + } + + node = refmap_node_from_value(rfm, value); + if (!node || !refmap_refcount_try_ref_one(&node->refcount)) { + return false; + } + + log_node(rfm, "try_ref_value", node); + return true; +} + +bool +refmap_unref(struct refmap *rfm, void *value) +{ + struct refmap_node *node; + + if (!value) { + return false; + } + + node = refmap_node_from_value(rfm, value); + if (!node) { + return false; + } + + log_node(rfm, "unref", node); + + if (ovs_refcount_unref(&node->refcount) == 2) { + ovs_mutex_lock(&rfm->map_lock); + if (ovs_refcount_read(&node->refcount) > 1) { + ovs_mutex_unlock(&rfm->map_lock); + return false; + } + rfm->value_uninit(refmap_node_value(rfm, node)); + cmap_remove(&rfm->map, &node->map_node, node->hash); + ovs_assert(ovs_refcount_unref(&node->refcount) == 1); + 
ovs_mutex_unlock(&rfm->map_lock); + ovsrcu_postpone_embedded(free, node, rcu_node); + return true; + } + return false; +} + +void * +refmap_key_from_value(struct refmap *rfm, void *value) +{ + return refmap_node_key(refmap_node_from_value(rfm, value)); +} + +unsigned int +refmap_value_refcount_read(struct refmap *rfm, void *value) +{ + struct refmap_node *node; + + if (!value) { + return 0; + } + + node = refmap_node_from_value(rfm, value); + if (node) { + return ovs_refcount_read(&node->refcount) - 1; + } + + return 0; +} diff --git a/lib/refmap.h b/lib/refmap.h new file mode 100644 index 000000000..e41f1d888 --- /dev/null +++ b/lib/refmap.h @@ -0,0 +1,130 @@ +/* + * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. + * All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef REFMAP_H +#define REFMAP_H + +#include + +#include +#include + +#include "openvswitch/dynamic-string.h" + +/* + * Reference map + * ============= + * + * This key-value store acts like a regular concurrent hashmap, + * except that insertion takes a reference on the value if already + * present. + * The key provided must be fully initialized, including potential pad bytes. 
+ * + * Whether a value must be created depends on whether it is already present + * in the map, which the user cannot predict. The map therefore requires + * value_init and value_uninit callbacks, which are called only at + * creation (first reference taken) and destruction (last reference + * released). + * + * Example: + * 1. struct key key; + * 2. memset(&key, 0, sizeof key); + * 3. refmap_create() + * 4. value = refmap_ref(key); + * Since it's the first reference for 'key', value_init is called. + * 5. refmap_ref(key); + * This is not the first reference for 'key'. Only ref-count is updated. + * 6. refmap_unref(value); + * This is not the last reference released. Only ref-count is updated. + * 7. refmap_unref(value); + * This is the last reference released. value_uninit is immediately + * called, while the value memory is freed after the RCU grace period. + * + * Thread safety + * ============= + * + * MT-unsafe: + * * refmap_create + * * refmap_destroy + * + * MT-safe: + * * refmap_for_each + * * refmap_ref + * * refmap_try_ref + * * refmap_try_ref_value + * * refmap_unref + * + */ + +struct refmap; + +/* Called once on a newly created 'value', i.e. when the first + * reference is taken. */ +typedef int (*refmap_value_init)(void *value, void *arg); + +/* Called once on the last dereference to value. */ +typedef void (*refmap_value_uninit)(void *value); + +/* Format a (key, value) pair in 's'. This is an optional (can be NULL) + * callback, used for debug log purposes. + */ +typedef struct ds *(*refmap_value_format)(struct ds *s, void *key, + void *value); + +/* Allocate and return a map handle. + * + * The user must ensure the 'key' is fully initialized, including potential + * pad bytes. + */ +struct refmap *refmap_create(const char *name, + size_t key_size, + size_t value_size, + refmap_value_init value_init, + refmap_value_uninit value_uninit, + refmap_value_format value_format); + +/* Frees the map memory. 
+ * + * The client is responsible for unreferencing any data previously held in + * the map. */ +void refmap_destroy(struct refmap *rfm); + +/* refmap_try_ref takes a reference for the found value upon success. + * It's the user's responsibility to unref it. */ +void *refmap_try_ref(struct refmap *rfm, void *key); +void *refmap_ref(struct refmap *rfm, void *key, void *arg); +bool refmap_try_ref_value(struct refmap *rfm, void *value); +void refmap_for_each(struct refmap *rfm, + void (*cb)(void *value, void *key, void *arg), + void *arg); +void *refmap_key_from_value(struct refmap *rfm, void *value); + +/* Return 'true' if it was the last 'value' dereference and + * 'value_uninit' has been called. */ +bool refmap_unref(struct refmap *rfm, void *value); + +/* The refmap_value_refcount_read() API requires the caller to hold a + * reference, so a returned value of 1 only indicates you were the sole owner + * at the moment of the read, but may no longer be by the time you receive the + * value. This makes it unsuitable for logic decisions and only useful for + * debug logging. + */ +unsigned int +refmap_value_refcount_read(struct refmap *rfm, void *value); + +#endif /* REFMAP_H */ diff --git a/tests/automake.mk b/tests/automake.mk index a9d972a86..a74e56454 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -502,6 +502,7 @@ tests_ovstest_SOURCES = \ tests/test-rcu.c \ tests/test-rculist.c \ tests/test-reconnect.c \ + tests/test-refmap.c \ tests/test-rstp.c \ tests/test-sflow.c \ tests/test-sha1.c \ diff --git a/tests/library.at b/tests/library.at index 449f15fd5..3662d488e 100644 --- a/tests/library.at +++ b/tests/library.at @@ -44,6 +44,11 @@ AT_CHECK([ovstest test-ccmap check 1], [0], [... 
]) AT_CLEANUP +AT_SETUP([refmap]) +AT_KEYWORDS([refmap]) +AT_CHECK([ovstest test-refmap check], [0], []) +AT_CLEANUP + AT_SETUP([atomic operations]) AT_CHECK([ovstest test-atomic]) AT_CLEANUP diff --git a/tests/test-refmap.c b/tests/test-refmap.c new file mode 100644 index 000000000..4639088bc --- /dev/null +++ b/tests/test-refmap.c @@ -0,0 +1,894 @@ +/* + * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. + * All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include + +#undef NDEBUG +#include +#include +#include +#include + +#include "ovs-atomic.h" +#include "ovs-numa.h" +#include "ovs-rcu.h" +#include "ovs-thread.h" +#include "ovstest.h" +#include "random.h" +#include "refmap.h" +#include "timeval.h" +#include "util.h" + +#include "openvswitch/util.h" +#include "openvswitch/vlog.h" + +#define N 100 + +static struct refmap_test_params { + unsigned int n_threads; + unsigned int n_ids; + int step_idx; + bool debug; + bool csv_format; +} params = { + .n_threads = 1, + .n_ids = N, + .debug = false, + .csv_format = false, +}; + +DEFINE_STATIC_PER_THREAD_DATA(unsigned int, thread_id, OVSTHREAD_ID_UNSET); + +static unsigned int +thread_id(void) +{ + static atomic_count next_id = ATOMIC_COUNT_INIT(0); + unsigned int id = *thread_id_get(); + + if (OVS_UNLIKELY(id == OVSTHREAD_ID_UNSET)) { + id = atomic_count_inc(&next_id); + *thread_id_get() = id; + } + + return id; +} + +struct key { + size_t idx; + bool b; + uint8_t pad[7]; +}; + +struct value { + uint32_t *hdl; +}; + +struct arg { + uint32_t *ptr; +}; + +static int +value_init(void *value_, void *arg_) +{ + struct value *value = value_; + struct arg *arg = arg_; + + /* Verify that we don't double-init value. */ + ovs_assert(!value->hdl); + + *arg->ptr = 1; + value->hdl = arg->ptr; + return 0; +} + +static void +value_uninit(void *value_) +{ + struct value *value = value_; + + /* Verify that we don't double-uninit value. */ + ovs_assert(value->hdl); + + *value->hdl = 2; + value->hdl = NULL; +} + +static struct ds * +value_format(struct ds *s, void *key_, void *value_) +{ + struct key *key = key_; + struct value *value = value_; + + ds_put_format(s, "idx=%"PRIuSIZE", b=%s, hdl=%p", + key->idx, key->b ? 
"1" : "0", + value->hdl); + return s; +} + +struct check_refmap_ctx { + struct refmap *rfm; + struct value **values; + int count; +}; + +static void +check_refmap_cb(void *value_, void *key_, void *arg_) +{ + struct check_refmap_ctx *ctx = arg_; + struct key *key = key_; + + ovs_assert(key->idx < N); + if (ctx->values) { + ovs_assert(ctx->values[key->idx] == value_); + } + + ctx->count++; +} + +static void +check_refmap(struct refmap *rfm, struct value **values, int n_expected) +{ + struct check_refmap_ctx ctx = { + .rfm = rfm, + .values = values, + .count = 0, + }; + + refmap_for_each(rfm, check_refmap_cb, &ctx); + ovs_assert(ctx.count == n_expected); +} + +struct iter_modify_ctx { + struct refmap *rfm; + struct key *keys; + struct value **extra_refs; + int ref_count; + int unref_count; +}; + +static void +iter_ref_cb(void *value_, void *key_, void *arg_) +{ + struct iter_modify_ctx *ctx = arg_; + struct key *key = key_; + + ovs_assert(refmap_try_ref_value(ctx->rfm, value_)); + ctx->extra_refs[key->idx] = value_; + ctx->ref_count++; +} + +static void +iter_unref_cb(void *value_, void *key_ OVS_UNUSED, void *arg_) +{ + struct iter_modify_ctx *ctx = arg_; + + refmap_unref(ctx->rfm, value_); + ctx->unref_count++; +} + +struct try_ref_race_ctx { + struct refmap *rfm; + struct key key; + atomic_bool stop; +}; + +static void +race_for_each_cb(void *value_, void *key_ OVS_UNUSED, void *arg_ OVS_UNUSED) +{ + struct value *value = value_; + + ovs_assert(value->hdl); +} + +static void * +try_ref_racer(void *arg) +{ + struct try_ref_race_ctx *ctx = arg; + + for (;;) { + bool stop_; + + atomic_read(&ctx->stop, &stop_); + if (stop_) { + break; + } + + void *value = refmap_try_ref(ctx->rfm, &ctx->key); + if (value) { + struct value *v = value; + + ovs_assert(v->hdl); + refmap_unref(ctx->rfm, value); + } + + refmap_for_each(ctx->rfm, race_for_each_cb, NULL); + } + + return NULL; +} + +/* Stress-test that try_ref rejects entries at refcount 1 (the internal + * reference used 
during value init/uninit synchronization). + * + * Thread A repeatedly creates and destroys the same entry. + * Thread B continuously calls try_ref and for_each. */ +static void +check_try_ref_race(void) +{ + struct try_ref_race_ctx race_ctx; + pthread_t worker; + struct refmap *rfm; + + rfm = refmap_create("try-ref-race", sizeof(struct key), + sizeof(struct value), value_init, value_uninit, + value_format); + + memset(&race_ctx.key, 0, sizeof race_ctx.key); + race_ctx.key.idx = 0; + race_ctx.rfm = rfm; + atomic_init(&race_ctx.stop, false); + + worker = ovs_thread_create("try-ref-racer", try_ref_racer, &race_ctx); + + for (int i = 0; i < 10000; i++) { + uint32_t arg_val = 0; + struct arg arg = { .ptr = &arg_val }; + void *value; + + value = refmap_ref(rfm, &race_ctx.key, &arg); + refmap_unref(rfm, value); + } + + atomic_store(&race_ctx.stop, true); + xpthread_join(worker, NULL); + + refmap_destroy(rfm); +} + +static void +run_check(struct ovs_cmdl_context *ctx OVS_UNUSED) +{ + struct iter_modify_ctx im_ctx; + struct value *extra_refs[N]; + struct value *values[N]; + struct key keys[N]; + struct refmap *rfm; + uint32_t args[N]; + + rfm = refmap_create("check-rfm", sizeof(struct key), sizeof(struct value), + value_init, value_uninit, value_format); + + check_refmap(rfm, NULL, 0); + + memset(keys, 0, sizeof keys); + for (int i = 0; i < N; i++) { + struct arg arg = { + .ptr = &args[i], + }; + struct value *value; + + keys[i].idx = i; + args[i] = i; + ovs_assert(!refmap_try_ref(rfm, &keys[i])); + value = refmap_ref(rfm, &keys[i], &arg); + ovs_assert(value); + ovs_assert(value == refmap_ref(rfm, &keys[i], &arg)); + refmap_unref(rfm, value); + ovs_assert(value == refmap_try_ref(rfm, &keys[i])); + refmap_unref(rfm, value); + values[i] = value; + } + + check_refmap(rfm, (struct value **) values, N); + + for (int i = 0; i < N; i++) { + /* Verify that value_init is properly called. 
*/ + ovs_assert(values[i]->hdl == &args[i]); + ovs_assert(args[i] == 1); + } + + /* Verify refmap_value_refcount_read: each value has one user ref. */ + for (int i = 0; i < N; i++) { + ovs_assert(refmap_value_refcount_read(rfm, values[i]) == 1); + } + + ovs_assert(refmap_value_refcount_read(rfm, NULL) == 0); + + /* Verify refmap_key_from_value. */ + for (int i = 0; i < N; i++) { + struct key *k = refmap_key_from_value(rfm, values[i]); + ovs_assert(k->idx == keys[i].idx); + } + + /* Verify refmap_try_ref_value and refcount changes. */ + for (int i = 0; i < N; i++) { + ovs_assert(refmap_try_ref_value(rfm, values[i])); + ovs_assert(refmap_value_refcount_read(rfm, values[i]) == 2); + refmap_unref(rfm, values[i]); + ovs_assert(refmap_value_refcount_read(rfm, values[i]) == 1); + } + + ovs_assert(!refmap_try_ref_value(rfm, NULL)); + + check_refmap(rfm, (struct value **) values, N); + + /* Take extra refs from within refmap_for_each callback. */ + memset(&im_ctx, 0, sizeof im_ctx); + im_ctx.rfm = rfm; + im_ctx.keys = keys; + im_ctx.extra_refs = (struct value **) extra_refs; + memset(extra_refs, 0, sizeof extra_refs); + refmap_for_each(rfm, iter_ref_cb, &im_ctx); + ovs_assert(im_ctx.ref_count == N); + for (int i = 0; i < N; i++) { + ovs_assert(extra_refs[i] == values[i]); + ovs_assert(refmap_value_refcount_read(rfm, values[i]) == 2); + } + + check_refmap(rfm, (struct value **) values, N); + + /* Drop extra refs from within refmap_for_each callback. */ + memset(&im_ctx, 0, sizeof im_ctx); + im_ctx.rfm = rfm; + refmap_for_each(rfm, iter_unref_cb, &im_ctx); + ovs_assert(im_ctx.unref_count == N); + for (int i = 0; i < N; i++) { + ovs_assert(refmap_value_refcount_read(rfm, values[i]) == 1); + } + + check_refmap(rfm, (struct value **) values, N); + + for (int i = 0; i < N; i++) { + refmap_unref(rfm, values[i]); + } + + for (int i = 0; i < N; i++) { + ovs_assert(!refmap_try_ref(rfm, &keys[i])); + } + + for (int i = 0; i < N; i++) { + /* Verify that value_uninit is executed. 
*/ + ovs_assert(args[i] == 2); + } + + check_refmap(rfm, NULL, 0); + + refmap_destroy(rfm); + + check_try_ref_race(); +} + +static uint32_t *ids; +static void **values; +static atomic_uint *thread_working_ms; /* Measured work time. */ + +static struct ovs_barrier barrier_outer; +static struct ovs_barrier barrier_inner; + +static atomic_uint running_time_ms; +static atomic_bool stop; + +static unsigned int +elapsed(unsigned int start) +{ + unsigned int running_time_ms_; + + atomic_read(&running_time_ms, &running_time_ms_); + + return running_time_ms_ - start; +} + +static void * +clock_main(void *arg OVS_UNUSED) +{ + struct timeval start; + struct timeval end; + + xgettimeofday(&start); + for (;;) { + bool stop_; + + atomic_read(&stop, &stop_); + if (stop_) { + break; + } + + xgettimeofday(&end); + atomic_store(&running_time_ms, + timeval_to_msec(&end) - timeval_to_msec(&start)); + xnanosleep(100 * 1000); + } + + return NULL; +} + +enum step_id { + STEP_NONE, + STEP_ALLOC, + STEP_REF, + STEP_UNREF, + STEP_FREE, + STEP_MIXED, + STEP_POS_QUERY, + STEP_NEG_QUERY, +}; + +static const char *step_names[] = { + [STEP_NONE] = "", + [STEP_ALLOC] = "alloc", + [STEP_REF] = "ref", + [STEP_UNREF] = "unref", + [STEP_FREE] = "free", + [STEP_MIXED] = "mixed", + [STEP_POS_QUERY] = "pos-query", + [STEP_NEG_QUERY] = "neg-query", +}; + +#define MAX_N_STEP 10 + +#define FOREACH_STEP(STEP_VAR, SCHEDULE) \ + for (int __idx = 0, STEP_VAR = (SCHEDULE)[__idx]; \ + (STEP_VAR = (SCHEDULE)[__idx]) != STEP_NONE; \ + __idx++) + +struct test_case { + int idx; + enum step_id schedule[MAX_N_STEP]; +}; + +static void +print_header(void) +{ + if (params.csv_format) { + return; + } + + printf("Benchmarking n=%u on %u thread%s.\n", + params.n_ids, params.n_threads, + params.n_threads > 1 ? 
"s" : ""); + + printf(" step\\thread: "); + printf(" Avg"); + for (size_t i = 0; i < params.n_threads; i++) { + printf(" %3" PRIuSIZE, i + 1); + } + + printf("\n"); +} + +static void +print_test_header(struct test_case *test) +{ + if (params.csv_format) { + return; + } + + printf("[%d]---------------------------", test->idx); + for (size_t i = 0; i < params.n_threads; i++) { + printf("-------"); + } + + printf("\n"); +} + +static void +print_test_result(struct test_case *test, enum step_id step, int step_idx) +{ + char test_name[50]; + uint32_t *twm; + uint32_t avg; + size_t i; + + twm = xcalloc(params.n_threads, sizeof *twm); + for (i = 0; i < params.n_threads; i++) { + atomic_read(&thread_working_ms[i], &twm[i]); + } + + avg = 0; + for (i = 0; i < params.n_threads; i++) { + avg += twm[i]; + } + + ovs_assert(params.n_threads); + avg /= params.n_threads; + + snprintf(test_name, sizeof test_name, "%d.%d-%s", + test->idx, step_idx, + step_names[step]); + if (params.csv_format) { + printf("%s,%" PRIu32, test_name, avg); + } else { + printf("%*s: ", 18, test_name); + printf(" %6" PRIu32, avg); + for (i = 0; i < params.n_threads; i++) { + printf(" %6" PRIu32, twm[i]); + } + printf(" ms"); + } + + printf("\n"); + + free(twm); +} + +static struct test_case test_cases[] = { + { + .schedule = { + STEP_ALLOC, + STEP_FREE, + }, + }, + { + .schedule = { + STEP_ALLOC, + STEP_REF, + STEP_UNREF, + STEP_FREE, + }, + }, + { + .schedule = { + STEP_MIXED, + STEP_FREE, + }, + }, + { + .schedule = { + STEP_ALLOC, + STEP_POS_QUERY, + /* Test negative query with map full. */ + STEP_NEG_QUERY, + STEP_FREE, + /* Test negative query with map empty. 
*/
+            STEP_NEG_QUERY,
+        },
+    },
+};
+
+static void
+swap_ptr(void **a, void **b)
+{
+    void *t;
+    t = *a;
+    *a = *b;
+    *b = t;
+}
+
+struct aux {
+    struct test_case test;
+    struct refmap *rfm;
+};
+
+static void *
+benchmark_thread_worker(void *aux_)
+{
+    unsigned int tid = thread_id();
+    unsigned int n_ids_per_thread;
+    unsigned int start_idx;
+    struct aux *aux = aux_;
+    struct refmap *rfm;
+    unsigned int start;
+    uint32_t *th_ids;
+    void **th_privs;
+    void *value;
+    size_t i;
+
+    n_ids_per_thread = params.n_ids / params.n_threads;
+    start_idx = tid * n_ids_per_thread;
+    th_privs = &values[start_idx];
+    th_ids = &ids[start_idx];
+
+    for (;;) {
+        bool stop_;
+
+        ovs_barrier_block(&barrier_outer);
+        atomic_read(&stop, &stop_);
+        if (stop_) {
+            break;
+        }
+
+        /* Wait for main thread to finish initializing
+         * rfm and step schedule. */
+        ovs_barrier_block(&barrier_inner);
+        rfm = aux->rfm;
+
+        FOREACH_STEP(step, aux->test.schedule) {
+            ovs_barrier_block(&barrier_inner);
+            atomic_read(&running_time_ms, &start);
+            switch (step) {
+            case STEP_ALLOC:
+            case STEP_REF:
+                for (i = 0; i < n_ids_per_thread; i++) {
+                    struct key key = {
+                        .idx = start_idx + i,
+                    };
+                    struct arg arg = {
+                        .ptr = &th_ids[i],
+                    };
+
+                    th_privs[i] = refmap_ref(rfm, &key, &arg);
+                }
+                break;
+            case STEP_POS_QUERY:
+                for (i = 0; i < n_ids_per_thread; i++) {
+                    struct key key = {
+                        .idx = start_idx + i,
+                    };
+                    value = refmap_try_ref(rfm, &key);
+                    refmap_unref(rfm, value);
+                }
+                break;
+            case STEP_NEG_QUERY:
+                for (i = 0; i < n_ids_per_thread; i++) {
+                    struct key key = {
+                        .idx = params.n_ids + 1,
+                    };
+                    value = refmap_try_ref(rfm, &key);
+                    refmap_unref(rfm, value);
+                }
+                break;
+            case STEP_UNREF:
+            case STEP_FREE:
+                for (i = 0; i < n_ids_per_thread; i++) {
+                    refmap_unref(rfm, th_privs[i]);
+                }
+                break;
+            case STEP_MIXED:
+                for (i = 0; i < n_ids_per_thread; i++) {
+                    struct arg arg;
+                    struct key key;
+                    int shuffled;
+
+                    /* Mixed mode is doing:
+                     * 1. Alloc.
+                     * 2. Shuffle two elements.
+                     * 3. Delete shuffled element.
+                     * 4. Alloc again.
+                     * The loop ends with all elements allocated.
+                     */
+
+                    memset(&key, 0, sizeof key);
+                    key.idx = start_idx + i;
+                    shuffled = random_range(i + 1);
+
+                    arg.ptr = &th_ids[i];
+                    th_privs[i] = refmap_ref(rfm, &key, &arg);
+                    swap_ptr(&th_privs[i], &th_privs[shuffled]);
+                    refmap_unref(rfm, th_privs[i]);
+                    arg.ptr = &th_ids[i];
+                    th_privs[i] = refmap_ref(rfm, &key, &arg);
+                }
+                break;
+            default:
+                fprintf(stderr, "[%u]: Reached step %d\n",
+                        tid, step);
+                OVS_NOT_REACHED();
+                break;
+            }
+            atomic_store(&thread_working_ms[tid], elapsed(start));
+            ovs_barrier_block(&barrier_inner);
+            /* Main thread prints result now. */
+        }
+    }
+
+    return NULL;
+}
+
+static void
+benchmark_thread_main(struct aux *aux)
+{
+    int step_idx;
+
+    memset(ids, 0, params.n_ids * sizeof *ids);
+    memset(values, 0, params.n_ids * sizeof *values);
+
+    aux->rfm = refmap_create("benchmark-rfm", sizeof(struct key),
+                             sizeof(struct value), value_init, value_uninit,
+                             value_format);
+
+    print_test_header(&aux->test);
+    ovs_barrier_block(&barrier_inner);
+    /* Init is done, worker can start preparing to work. */
+    step_idx = 0;
+    FOREACH_STEP(step, aux->test.schedule) {
+        ovs_barrier_block(&barrier_inner);
+        /* Workers do the scheduled work now. */
+        ovs_barrier_block(&barrier_inner);
+        print_test_result(&aux->test, step, step_idx++);
+    }
+
+    refmap_destroy(aux->rfm);
+}
+
+static bool
+parse_benchmark_params(int argc, char *argv[])
+{
+    long int l_threads = 0;
+    long int l_ids = 0;
+    bool valid = true;
+    long int l;
+    int i;
+
+    params.step_idx = -1;
+    for (i = 0; i < argc; i++) {
+        if (!strcmp(argv[i], "benchmark") ||
+            !strcmp(argv[i], "debug")) {
+            continue;
+        } else if (!strcmp(argv[i], "csv")) {
+            params.csv_format = true;
+        } else if (!strncmp(argv[i], "step=", 5)) {
+            if (!str_to_long(&argv[i][5], 10, &l)) {
+                fprintf(stderr,
+                        "Invalid parameter '%s', expected positive integer.\n",
+                        argv[i]);
+                valid = false;
+                goto out;
+            }
+
+            params.step_idx = l;
+        } else {
+            if (!str_to_long(argv[i], 10, &l)) {
+                fprintf(stderr,
+                        "Invalid parameter '%s', expected positive integer.\n",
+                        argv[i]);
+                valid = false;
+                goto out;
+            }
+
+            if (l_ids == 0) {
+                l_ids = l;
+            } else if (l_threads == 0) {
+                l_threads = l;
+            } else {
+                fprintf(stderr,
+                        "Invalid parameter '%s', too many integer values.\n",
+                        argv[i]);
+                valid = false;
+                goto out;
+            }
+        }
+    }
+
+    if (l_ids != 0) {
+        params.n_ids = l_ids;
+    } else {
+        fprintf(stderr, "Invalid parameters: no number of elements given.\n");
+        valid = false;
+    }
+
+    if (l_threads != 0) {
+        params.n_threads = l_threads;
+    } else {
+        fprintf(stderr, "Invalid parameters: no number of threads given.\n");
+        valid = false;
+    }
+
+out:
+    return valid;
+}
+
+static void
+run_benchmark(struct ovs_cmdl_context *ctx)
+{
+    pthread_t *threads;
+    pthread_t clock;
+    struct aux aux;
+    size_t i;
+
+    if (!parse_benchmark_params(ctx->argc, ctx->argv)) {
+        return;
+    }
+
+    ids = xcalloc(params.n_ids, sizeof *ids);
+    values = xcalloc(params.n_ids, sizeof *values);
+    thread_working_ms = xcalloc(params.n_threads,
+                                sizeof *thread_working_ms);
+    for (i = 0; i < params.n_threads; i++) {
+        atomic_init(&thread_working_ms[i], 0);
+    }
+
+    atomic_init(&stop, false);
+
+    clock = ovs_thread_create("clock", clock_main, NULL);
+
+    ovsrcu_quiesce_start();
+    ovs_barrier_init(&barrier_outer, params.n_threads + 1);
+    ovs_barrier_init(&barrier_inner, params.n_threads + 1);
+    threads = xmalloc(params.n_threads * sizeof *threads);
+    for (i = 0; i < params.n_threads; i++) {
+        threads[i] = ovs_thread_create("worker",
+                                       benchmark_thread_worker, &aux);
+    }
+
+    print_header();
+    for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+        test_cases[i].idx = i;
+        if (params.step_idx != -1 &&
+            params.step_idx != i) {
+            continue;
+        }
+        /* If we don't block workers from progressing now,
+         * there would be a race for access to aux.test,
+         * leading to some workers not respecting the schedule.
+         */
+        ovs_barrier_block(&barrier_outer);
+        memcpy(&aux.test, &test_cases[i], sizeof aux.test);
+        benchmark_thread_main(&aux);
+    }
+
+    atomic_store(&stop, true);
+    ovs_barrier_block(&barrier_outer);
+
+    for (i = 0; i < params.n_threads; i++) {
+        xpthread_join(threads[i], NULL);
+    }
+
+    free(threads);
+
+    ovs_barrier_destroy(&barrier_outer);
+    ovs_barrier_destroy(&barrier_inner);
+    free(ids);
+    free(values);
+    free(thread_working_ms);
+    xpthread_join(clock, NULL);
+}
+
+static const struct ovs_cmdl_command commands[] = {
+    {"check", "[debug]", 0, 1, run_check, OVS_RO},
+    {"benchmark", " [step=] [csv]", 0, 4,
+     run_benchmark, OVS_RO},
+    {NULL, NULL, 0, 0, NULL, OVS_RO},
+};
+
+static void
+parse_test_params(int argc, char *argv[])
+{
+    int i;
+
+    for (i = 0; i < argc; i++) {
+        if (!strcmp(argv[i], "debug")) {
+            params.debug = true;
+        }
+    }
+}
+
+static void
+refmap_test_main(int argc, char *argv[])
+{
+    struct ovs_cmdl_context ctx = {
+        .argc = argc - optind,
+        .argv = argv + optind,
+    };
+
+    parse_test_params(argc - optind, argv + optind);
+
+    vlog_set_levels(NULL, VLF_ANY_DESTINATION, VLL_OFF);
+    if (params.debug) {
+        vlog_set_levels_from_string_assert("refmap:console:dbg");
+    }
+
+    /* Quiesce to start the RCU. */
+    ovsrcu_quiesce();
+
+    set_program_name(argv[0]);
+    ovs_cmdl_run_command(&ctx, commands);
+
+    ovsrcu_exit();
+}
+
+OVSTEST_REGISTER("test-refmap", refmap_test_main);
diff --git a/utilities/checkpatch_dict.txt b/utilities/checkpatch_dict.txt
index c1f43e5af..5ad599c1d 100644
--- a/utilities/checkpatch_dict.txt
+++ b/utilities/checkpatch_dict.txt
@@ -107,6 +107,7 @@ icmp
 icmp4
 icmpv6
 idl
+ie
 ifdef
 ifindex
 initializer
@@ -232,6 +233,7 @@ rebased
 recirc
 recirculation
 recirculations
+refmap
 revalidate
 revalidation
 revalidator

From patchwork Wed Apr 1 09:13:10 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 2218448
X-Patchwork-Delegate: echaudro@redhat.com
Date: Wed, 1 Apr 2026 12:13:10 +0300
Message-ID: <20260401091318.2671624-4-elibr@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com>
References: <20260401091318.2671624-1-elibr@nvidia.com>
Subject: [ovs-dev] [PATCH v3 03/11] packets: Move ETH_TYPE_LLDP to be a public define.
X-Patchwork-Original-From: Eli Britstein via dev
From: Eli Britstein
Reply-To: Eli Britstein
Cc: Eli Britstein, Ilya Maximets, David Marchand, Maor Dickman

Move this define to be public as a pre-step towards using it in other
files. Also reorder the ETH_TYPE_XXX defines by their values.

Signed-off-by: Eli Britstein
Acked-by: Eelco Chaudron <echaudro@redhat.com>
---
 lib/ovs-lldp.c  | 1 -
 lib/packets.h   | 7 ++++---
 tests/test-aa.c | 2 --
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/lib/ovs-lldp.c b/lib/ovs-lldp.c
index 152777248..9411de3a4 100644
--- a/lib/ovs-lldp.c
+++ b/lib/ovs-lldp.c
@@ -54,7 +54,6 @@ VLOG_DEFINE_THIS_MODULE(ovs_lldp);
 #define LLDP_PROTOCOL_VERSION 0x00
 #define LLDP_TYPE_CONFIG 0x00
 #define LLDP_CHASSIS_TTL 120
-#define ETH_TYPE_LLDP 0x88cc
 #define MINIMUM_ETH_PACKET_SIZE 68
 
 #define AA_STATUS_MULTIPLE \
diff --git a/lib/packets.h b/lib/packets.h
index 61666f3ad..3c56c95d2 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -406,18 +406,19 @@ void add_mpls(struct dp_packet *packet, ovs_be16 ethtype, ovs_be32 lse,
 
 #define ETH_TYPE_IP 0x0800
 #define ETH_TYPE_ARP 0x0806
+#define ETH_TYPE_ERSPAN2 0x22eb /* version 2 type III */
 #define ETH_TYPE_TEB 0x6558
+#define ETH_TYPE_RARP 0x8035
 #define ETH_TYPE_VLAN_8021Q 0x8100
 #define ETH_TYPE_VLAN ETH_TYPE_VLAN_8021Q
-#define ETH_TYPE_VLAN_8021AD 0x88a8
 #define ETH_TYPE_IPV6 0x86dd
 #define ETH_TYPE_LACP 0x8809
-#define ETH_TYPE_RARP 0x8035
 #define ETH_TYPE_MPLS 0x8847
 #define ETH_TYPE_MPLS_MCAST 0x8848
 #define ETH_TYPE_NSH 0x894f
+#define ETH_TYPE_VLAN_8021AD 0x88a8
 #define ETH_TYPE_ERSPAN1 0x88be /* version 1 type II */
-#define ETH_TYPE_ERSPAN2 0x22eb /* version 2 type III */
+#define ETH_TYPE_LLDP 0x88cc
 
 static inline bool eth_type_mpls(ovs_be16 eth_type)
 {
diff --git a/tests/test-aa.c b/tests/test-aa.c
index ba21d5908..c7687bb1b 100644
--- a/tests/test-aa.c
+++ b/tests/test-aa.c
@@ -23,8 +23,6 @@
 #include "ovs-lldp.h"
 #include "ovstest.h"
 
-#define ETH_TYPE_LLDP 0x88cc
-
 /* Dummy MAC addresses */
 static const struct eth_addr chassis_mac = ETH_ADDR_C(5e,10,8e,e7,84,ad);
 static const struct eth_addr eth_src = ETH_ADDR_C(5e,10,8e,e7,84,ad);

From patchwork Wed Apr 1 09:13:11 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 2218459
X-Patchwork-Delegate: echaudro@redhat.com
Date: Wed, 1 Apr 2026 12:13:11 +0300
Message-ID: <20260401091318.2671624-5-elibr@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com>
References: <20260401091318.2671624-1-elibr@nvidia.com>
Subject: [ovs-dev] [PATCH v3 04/11] netdev-dpdk-private: Refactor declarations from netdev-dpdk.
X-Patchwork-Original-From: Eli Britstein via dev
From: Eli Britstein
Reply-To: Eli Britstein
Cc: Eli Britstein, Ilya Maximets, David Marchand, Maor Dickman

As a pre-step towards introducing netdev-doca, which has parts in common
with netdev-dpdk, refactor the shared declarations to be non-static and
declare them in a new file, netdev-dpdk-private.h.

Signed-off-by: Eli Britstein
---
 lib/automake.mk           |    1 +
 lib/netdev-dpdk-private.h |  173 +++++
 lib/netdev-dpdk.c         | 1519 +++++++++++++++++-------------------
 3 files changed, 880 insertions(+), 813 deletions(-)
 create mode 100644 lib/netdev-dpdk-private.h

diff --git a/lib/automake.mk b/lib/automake.mk
index cb6458b0d..bab03c3e7 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -209,6 +209,7 @@ lib_libopenvswitch_la_SOURCES = \
 	lib/multipath.h \
 	lib/namemap.c \
 	lib/netdev-dpdk.h \
+	lib/netdev-dpdk-private.h \
 	lib/netdev-dummy.c \
 	lib/netdev-provider.h \
 	lib/netdev-vport.c \
diff --git a/lib/netdev-dpdk-private.h b/lib/netdev-dpdk-private.h
new file mode 100644
index 000000000..9b82db750
--- /dev/null
+++ b/lib/netdev-dpdk-private.h
@@ -0,0 +1,173 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES.
+ * All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETDEV_DPDK_PRIVATE_H
+#define NETDEV_DPDK_PRIVATE_H
+
+#include
+
+#include
+#include
+#include
+
+#include "netdev-provider.h"
+#include "util.h"
+
+#include "openvswitch/thread.h"
+
+extern const struct rte_eth_conf port_conf;
+
+/* Defines. */
+
+#define SOCKET0 0
+
+/*
+ * need to reserve tons of extra space in the mbufs so we can align the
+ * DMA addresses to 4KB.
+ * The minimum mbuf size is limited to avoid scatter behaviour and drop in
+ * performance for standard Ethernet MTU.
+ */
+#define ETHER_HDR_MAX_LEN (RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN \
+                           + (2 * VLAN_HEADER_LEN))
+#define MTU_TO_FRAME_LEN(mtu) ((mtu) + RTE_ETHER_HDR_LEN + \
+                               RTE_ETHER_CRC_LEN)
+#define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN)
+#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \
+                                     - RTE_ETHER_HDR_LEN - RTE_ETHER_CRC_LEN)
+#define NETDEV_DPDK_MBUF_ALIGN 1024
+
+#define MP_CACHE_SZ RTE_MEMPOOL_CACHE_MAX_SIZE
+
+/* Default size of Physical NIC RXQ */
+#define NIC_PORT_DEFAULT_RXQ_SIZE 2048
+/* Default size of Physical NIC TXQ */
+#define NIC_PORT_DEFAULT_TXQ_SIZE 2048
+
+#define DPDK_ETH_PORT_ID_INVALID RTE_MAX_ETHPORTS
+
+/* DPDK library uses uint16_t for port_id. */
+typedef uint16_t dpdk_port_t;
+#define DPDK_PORT_ID_FMT "%"PRIu16
+
+/* Enums. */
+
+enum dpdk_hw_ol_features {
+    NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0,
+    NETDEV_RX_HW_CRC_STRIP = 1 << 1,
+    NETDEV_RX_HW_SCATTER = 1 << 2,
+    NETDEV_TX_IPV4_CKSUM_OFFLOAD = 1 << 3,
+    NETDEV_TX_TCP_CKSUM_OFFLOAD = 1 << 4,
+    NETDEV_TX_UDP_CKSUM_OFFLOAD = 1 << 5,
+    NETDEV_TX_SCTP_CKSUM_OFFLOAD = 1 << 6,
+    NETDEV_TX_TSO_OFFLOAD = 1 << 7,
+    NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD = 1 << 8,
+    NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD = 1 << 9,
+    NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD = 1 << 10,
+    NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD = 1 << 11,
+    NETDEV_TX_GRE_TNL_TSO_OFFLOAD = 1 << 12,
+};
+
+/* Structs. */
+
+#ifndef NETDEV_DPDK_TX_Q_TYPE
+#error "NETDEV_DPDK_TX_Q_TYPE must be defined before" \
+       "including netdev-dpdk-private.h"
+#endif
+
+#ifndef NETDEV_DPDK_SW_STATS_TYPE
+#error "NETDEV_DPDK_SW_STATS_TYPE must be defined before" \
+       "including netdev-dpdk-private.h"
+#endif
+
+#ifndef NETDEV_DPDK_GLOBAL_MUTEX
+#error "NETDEV_DPDK_GLOBAL_MUTEX must be defined before" \
+       "including netdev-dpdk-private.h"
+#endif
+
+struct netdev_rxq_dpdk {
+    struct netdev_rxq up;
+    dpdk_port_t port_id;
+};
+
+struct netdev_dpdk_common {
+    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0,
+        uint16_t port_id;
+        bool attached;
+        bool is_representor;
+        bool started;
+        struct eth_addr hwaddr;
+        int mtu;
+        int socket_id;
+        int max_packet_len;
+        enum netdev_flags flags;
+        int link_reset_cnt;
+        char *devargs;
+        NETDEV_DPDK_TX_Q_TYPE *tx_q;
+        struct rte_eth_link link;
+    );
+
+    PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,
+        struct ovs_mutex mutex OVS_ACQ_AFTER(NETDEV_DPDK_GLOBAL_MUTEX);
+        struct dpdk_mp *dpdk_mp;
+    );
+
+    PADDED_MEMBERS(CACHE_LINE_SIZE,
+        struct netdev up;
+        struct ovs_list list_node OVS_GUARDED_BY(NETDEV_DPDK_GLOBAL_MUTEX);
+        bool rx_metadata_delivery_configured;
+    );
+
+    PADDED_MEMBERS(CACHE_LINE_SIZE,
+        struct netdev_stats stats;
+        NETDEV_DPDK_SW_STATS_TYPE *sw_stats;
+        rte_spinlock_t stats_lock;
+    );
+
+    PADDED_MEMBERS(CACHE_LINE_SIZE,
+        /* Configuration fields */
+        int requested_mtu;
+        int requested_n_txq;
+        int user_n_rxq;
+        int requested_n_rxq;
+        int requested_rxq_size;
+        int requested_txq_size;
+        int rxq_size;
+        int txq_size;
+        int requested_socket_id;
+        struct rte_eth_fc_conf fc_conf;
+        uint32_t hw_ol_features;
+        bool requested_lsc_interrupt_mode;
+        bool lsc_interrupt_mode;
+        struct eth_addr requested_hwaddr;
+    );
+
+    PADDED_MEMBERS(CACHE_LINE_SIZE,
+        struct rte_eth_xstat_name *rte_xstats_names;
+        int rte_xstats_names_size;
+        int rte_xstats_ids_size;
+        uint64_t *rte_xstats_ids;
+    );
+};
+
+static inline struct netdev_dpdk_common *
+netdev_dpdk_common_cast(const struct netdev *netdev)
+{
+    return CONTAINER_OF(netdev, struct netdev_dpdk_common, up);
+}
+
+#endif /* NETDEV_DPDK_PRIVATE_H */
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 54959ff0d..e34e96dd3 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -17,6 +17,14 @@
 #include
 #include "netdev-dpdk.h"
 
+#include "openvswitch/thread.h"
+
+#define NETDEV_DPDK_TX_Q_TYPE struct dpdk_tx_queue
+#define NETDEV_DPDK_SW_STATS_TYPE struct netdev_dpdk_sw_stats
+static struct ovs_mutex dpdk_mutex;
+#define NETDEV_DPDK_GLOBAL_MUTEX dpdk_mutex
+#include "netdev-dpdk-private.h"
+
 #include
 #include
 #include
@@ -94,20 +102,6 @@ static bool per_port_memory = false; /* Status of per port memory support */
 #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
 #define OVS_VPORT_DPDK "ovs_dpdk"
 
-/*
- * need to reserve tons of extra space in the mbufs so we can align the
- * DMA addresses to 4KB.
- * The minimum mbuf size is limited to avoid scatter behaviour and drop in
- * performance for standard Ethernet MTU.
- */
-#define ETHER_HDR_MAX_LEN (RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN \
-                           + (2 * VLAN_HEADER_LEN))
-#define MTU_TO_FRAME_LEN(mtu) ((mtu) + RTE_ETHER_HDR_LEN + \
-                               RTE_ETHER_CRC_LEN)
-#define MTU_TO_MAX_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_MAX_LEN)
-#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) \
-                                     - RTE_ETHER_HDR_LEN - RTE_ETHER_CRC_LEN)
-#define NETDEV_DPDK_MBUF_ALIGN 1024
 #define NETDEV_DPDK_MAX_PKT_LEN 9728
 
 /* Max and min number of packets in the mempool.
OVS tries to allocate a @@ -117,7 +111,6 @@ static bool per_port_memory = false; /* Status of per port memory support */ #define MAX_NB_MBUF (4096 * 64) #define MIN_NB_MBUF (4096 * 4) -#define MP_CACHE_SZ RTE_MEMPOOL_CACHE_MAX_SIZE /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */ BUILD_ASSERT_DECL(MAX_NB_MBUF % ROUND_DOWN_POW2(MAX_NB_MBUF / MIN_NB_MBUF) @@ -128,24 +121,11 @@ BUILD_ASSERT_DECL(MAX_NB_MBUF % ROUND_DOWN_POW2(MAX_NB_MBUF / MIN_NB_MBUF) BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF / MIN_NB_MBUF)) % MP_CACHE_SZ == 0); -#define SOCKET0 0 - -/* Default size of Physical NIC RXQ */ -#define NIC_PORT_DEFAULT_RXQ_SIZE 2048 -/* Default size of Physical NIC TXQ */ -#define NIC_PORT_DEFAULT_TXQ_SIZE 2048 - #define OVS_VHOST_MAX_QUEUE_NUM 1024 /* Maximum number of vHost TX queues. */ #define OVS_VHOST_QUEUE_MAP_UNKNOWN (-1) /* Mapping not initialized. */ #define OVS_VHOST_QUEUE_DISABLED (-2) /* Queue was disabled by guest and not * yet mapped to another queue. */ -#define DPDK_ETH_PORT_ID_INVALID RTE_MAX_ETHPORTS - -/* DPDK library uses uint16_t for port_id. */ -typedef uint16_t dpdk_port_t; -#define DPDK_PORT_ID_FMT "%"PRIu16 - /* Minimum amount of vhost tx retries, effectively a disable. */ #define VHOST_ENQ_RETRY_MIN 0 /* Maximum amount of vhost tx retries. */ @@ -160,7 +140,7 @@ typedef uint16_t dpdk_port_t; #define IF_NAME_SZ (PATH_MAX > IFNAMSIZ ? 
PATH_MAX : IFNAMSIZ) -static const struct rte_eth_conf port_conf = { +const struct rte_eth_conf port_conf = { .rxmode = { .offloads = 0, }, @@ -402,22 +382,6 @@ struct ingress_policer { rte_spinlock_t policer_lock; }; -enum dpdk_hw_ol_features { - NETDEV_RX_CHECKSUM_OFFLOAD = 1 << 0, - NETDEV_RX_HW_CRC_STRIP = 1 << 1, - NETDEV_RX_HW_SCATTER = 1 << 2, - NETDEV_TX_IPV4_CKSUM_OFFLOAD = 1 << 3, - NETDEV_TX_TCP_CKSUM_OFFLOAD = 1 << 4, - NETDEV_TX_UDP_CKSUM_OFFLOAD = 1 << 5, - NETDEV_TX_SCTP_CKSUM_OFFLOAD = 1 << 6, - NETDEV_TX_TSO_OFFLOAD = 1 << 7, - NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD = 1 << 8, - NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD = 1 << 9, - NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD = 1 << 10, - NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD = 1 << 11, - NETDEV_TX_GRE_TNL_TSO_OFFLOAD = 1 << 12, -}; - enum dpdk_rx_steer_flags { DPDK_RX_STEER_LACP = 1 << 0, }; @@ -447,151 +411,40 @@ enum dpdk_rx_steer_flags { * struct netdev *netdev = netdev_from_name(name); * struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); * - * Also, 'netdev' should be used instead of 'dev->up', where 'netdev' was - * already defined. + * Also, 'netdev' should be used instead of 'dev->common.up', + * where 'netdev' was already defined. */ struct netdev_dpdk { - PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0, - dpdk_port_t port_id; - - /* If true, device was attached by rte_eth_dev_attach(). */ - bool attached; - /* If true, rte_eth_dev_start() was successfully called. */ - bool started; - /* If true, this is a port representor. */ - bool is_representor; - struct eth_addr hwaddr; - /* 1 pad bytes here. */ - int mtu; - int socket_id; - int buf_size; - int max_packet_len; - enum dpdk_dev_type type; - enum netdev_flags flags; - int link_reset_cnt; - union { - /* Device arguments for dpdk ports. */ - char *devargs; - /* Identifier used to distinguish vhost devices from each other. 
*/ - char *vhost_id; - }; - struct dpdk_tx_queue *tx_q; - struct rte_eth_link link; - ); - - PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1, - struct ovs_mutex mutex OVS_ACQ_AFTER(dpdk_mutex); - struct dpdk_mp *dpdk_mp; - - /* virtio identifier for vhost devices */ - ovsrcu_index vid; - - /* True if vHost device is 'up' and has been reconfigured at least once */ - bool vhost_reconfigured; - - atomic_uint8_t vhost_tx_retries_max; - - /* Flags for virtio features recovery mechanism. */ - uint8_t virtio_features_state; - - /* 1 pad byte here. */ - ); + struct netdev_dpdk_common common; - PADDED_MEMBERS(CACHE_LINE_SIZE, - struct netdev up; - /* In dpdk_list. */ - struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex); - - /* QoS configuration and lock for the device */ - OVSRCU_TYPE(struct qos_conf *) qos_conf; - - /* Ingress Policer */ - OVSRCU_TYPE(struct ingress_policer *) ingress_policer; - uint32_t policer_rate; - uint32_t policer_burst; - - /* Array of vhost rxq states, see vring_state_changed. */ - bool *vhost_rxq_enabled; - - /* Ensures that Rx metadata delivery is configured only once. */ - bool rx_metadata_delivery_configured; - ); + enum dpdk_dev_type type; + int buf_size; - PADDED_MEMBERS(CACHE_LINE_SIZE, - struct netdev_stats stats; - struct netdev_dpdk_sw_stats *sw_stats; - /* Protects stats */ - rte_spinlock_t stats_lock; - /* 36 pad bytes here. */ - ); - - PADDED_MEMBERS(CACHE_LINE_SIZE, - /* The following properties cannot be changed when a device is running, - * so we remember the request and update them next time - * netdev_dpdk*_reconfigure() is called */ - int requested_mtu; - int requested_n_txq; - /* User input for n_rxq (see dpdk_set_rxq_config). */ - int user_n_rxq; - /* user_n_rxq + an optional rx steering queue (see - * netdev_dpdk_reconfigure). This field is different from the other - * requested_* fields as it may contain a different value than the user - * input. 
*/ - int requested_n_rxq; - int requested_rxq_size; - int requested_txq_size; - - /* Number of rx/tx descriptors for physical devices */ - int rxq_size; - int txq_size; - - /* Socket ID detected when vHost device is brought up */ - int requested_socket_id; - - /* Ignored by DPDK for vhost-user backends, only for VDUSE. */ - uint8_t vhost_max_queue_pairs; - - /* Denotes whether vHost port is client/server mode */ - uint64_t vhost_driver_flags; - - /* DPDK-ETH Flow control */ - struct rte_eth_fc_conf fc_conf; - - /* DPDK-ETH hardware offload features, - * from the enum set 'dpdk_hw_ol_features' */ - uint32_t hw_ol_features; - - /* Properties for link state change detection mode. - * If lsc_interrupt_mode is set to false, poll mode is used, - * otherwise interrupt mode is used. */ - bool requested_lsc_interrupt_mode; - bool lsc_interrupt_mode; - - /* VF configuration. */ - struct eth_addr requested_hwaddr; - - /* Requested rx queue steering flags, - * from the enum set 'dpdk_rx_steer_flags'. 
*/ - uint64_t requested_rx_steer_flags; - uint64_t rx_steer_flags; - size_t rx_steer_flows_num; - struct rte_flow **rx_steer_flows; - ); - - PADDED_MEMBERS(CACHE_LINE_SIZE, - /* Names of all XSTATS counters */ - struct rte_eth_xstat_name *rte_xstats_names; - int rte_xstats_names_size; - int rte_xstats_ids_size; - uint64_t *rte_xstats_ids; - ); + /* vHost-specific fields */ + char *vhost_id; + ovsrcu_index vid; + bool vhost_reconfigured; + atomic_uint8_t vhost_tx_retries_max; + uint8_t virtio_features_state; + bool *vhost_rxq_enabled; + uint8_t vhost_max_queue_pairs; + uint64_t vhost_driver_flags; + + /* QoS fields */ + OVSRCU_TYPE(struct qos_conf *) qos_conf; + OVSRCU_TYPE(struct ingress_policer *) ingress_policer; + uint32_t policer_rate; + uint32_t policer_burst; + + /* Rx steering */ + uint64_t requested_rx_steer_flags; + uint64_t rx_steer_flags; + size_t rx_steer_flows_num; + struct rte_flow **rx_steer_flows; }; -struct netdev_rxq_dpdk { - struct netdev_rxq up; - dpdk_port_t port_id; -}; +BUILD_ASSERT_DECL(offsetof(struct netdev_dpdk, common) == 0); static void netdev_dpdk_destruct(struct netdev *netdev); static void netdev_dpdk_vhost_destruct(struct netdev *netdev); @@ -763,10 +616,14 @@ dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu) * + * + */ - n_mbufs = dev->requested_n_rxq * dev->requested_rxq_size - + dev->requested_n_txq * dev->requested_txq_size - + MIN(RTE_MAX_LCORE, dev->requested_n_rxq) * NETDEV_MAX_BURST - + MIN_NB_MBUF; + n_mbufs = dev->common.requested_n_rxq * + dev->common.requested_rxq_size + + dev->common.requested_n_txq * + dev->common.requested_txq_size + + MIN(RTE_MAX_LCORE, + dev->common.requested_n_rxq) * + NETDEV_MAX_BURST + + MIN_NB_MBUF; } return n_mbufs; @@ -776,8 +633,8 @@ static struct dpdk_mp * dpdk_mp_create(struct netdev_dpdk *dev, int mtu) { char mp_name[RTE_MEMPOOL_NAMESIZE]; - const char *netdev_name = netdev_get_name(&dev->up); - int socket_id = dev->requested_socket_id; + const char *netdev_name = 
netdev_get_name(&dev->common.up); + int socket_id = dev->common.requested_socket_id; uint32_t n_mbufs = 0; uint32_t mbuf_size = 0; uint32_t aligned_mbuf_size = 0; @@ -823,7 +680,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) "on socket %d for %d Rx and %d Tx queues, " "cache line size of %u", netdev_name, n_mbufs, mbuf_size, socket_id, - dev->requested_n_rxq, dev->requested_n_txq, + dev->common.requested_n_rxq, dev->common.requested_n_txq, RTE_CACHE_LINE_SIZE); /* The size of the mbuf's private area (i.e. area that holds OvS' @@ -895,10 +752,10 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu) if (!per_port_memory) { /* If user has provided defined mempools, check if one is suitable * and get new buffer size.*/ - mtu = dpdk_get_user_adjusted_mtu(mtu, dev->requested_mtu, - dev->requested_socket_id); + mtu = dpdk_get_user_adjusted_mtu(mtu, dev->common.requested_mtu, + dev->common.requested_socket_id); LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) { - if (dmp->socket_id == dev->requested_socket_id + if (dmp->socket_id == dev->common.requested_socket_id && dmp->mtu == mtu) { VLOG_DBG("Reusing mempool \"%s\"", dmp->mp->name); dmp->refcount++; @@ -961,17 +818,17 @@ dpdk_mp_put(struct dpdk_mp *dmp) * On error, device will be left unchanged. 
*/ static int netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { - uint32_t buf_size = dpdk_buf_size(dev->requested_mtu); + uint32_t buf_size = dpdk_buf_size(dev->common.requested_mtu); struct dpdk_mp *dmp; int ret = 0; /* With shared memory we do not need to configure a mempool if the MTU * and socket ID have not changed, the previous configuration is still * valid so return 0 */ - if (!per_port_memory && dev->mtu == dev->requested_mtu - && dev->socket_id == dev->requested_socket_id) { + if (!per_port_memory && dev->common.mtu == dev->common.requested_mtu + && dev->common.socket_id == dev->common.requested_socket_id) { return ret; } @@ -979,14 +836,16 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) if (!dmp) { VLOG_ERR("Failed to create memory pool for netdev " "%s, with MTU %d on socket %d: %s\n", - dev->up.name, dev->requested_mtu, dev->requested_socket_id, + dev->common.up.name, + dev->common.requested_mtu, + dev->common.requested_socket_id, rte_strerror(rte_errno)); ret = rte_errno; } else { /* Check for any pre-existing dpdk_mp for the device before accessing * the associated mempool. */ - if (dev->dpdk_mp != NULL) { + if (dev->common.dpdk_mp != NULL) { /* A new MTU was requested, decrement the reference count for the * devices current dpdk_mp. This is required even if a pointer to * same dpdk_mp is returned by dpdk_mp_get. The refcount for dmp @@ -994,12 +853,12 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) * must be decremented to keep an accurate refcount for the * dpdk_mp. 
*/ - dpdk_mp_put(dev->dpdk_mp); + dpdk_mp_put(dev->common.dpdk_mp); } - dev->dpdk_mp = dmp; - dev->mtu = dev->requested_mtu; - dev->socket_id = dev->requested_socket_id; - dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); + dev->common.dpdk_mp = dmp; + dev->common.mtu = dev->common.requested_mtu; + dev->common.socket_id = dev->common.requested_socket_id; + dev->common.max_packet_len = MTU_TO_FRAME_LEN(dev->common.mtu); } return ret; @@ -1010,27 +869,29 @@ check_link_status(struct netdev_dpdk *dev) { struct rte_eth_link link; - if (rte_eth_link_get_nowait(dev->port_id, &link) < 0) { + if (rte_eth_link_get_nowait(dev->common.port_id, &link) < 0) { VLOG_DBG_RL(&rl, "Failed to retrieve link status for port "DPDK_PORT_ID_FMT, - dev->port_id); + dev->common.port_id); return; } - if (dev->link.link_status != link.link_status) { - netdev_change_seq_changed(&dev->up); + if (dev->common.link.link_status != link.link_status) { + netdev_change_seq_changed(&dev->common.up); - dev->link_reset_cnt++; - dev->link = link; - if (dev->link.link_status) { + dev->common.link_reset_cnt++; + dev->common.link = link; + if (dev->common.link.link_status) { VLOG_DBG_RL(&rl, "Port "DPDK_PORT_ID_FMT" Link Up - speed %u Mbps - %s", - dev->port_id, (unsigned) dev->link.link_speed, - (dev->link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX) + dev->common.port_id, + (unsigned) dev->common.link.link_speed, + (dev->common.link.link_duplex == + RTE_ETH_LINK_FULL_DUPLEX) ? 
"full-duplex" : "half-duplex"); } else { VLOG_DBG_RL(&rl, "Port "DPDK_PORT_ID_FMT" Link Down", - dev->port_id); + dev->common.port_id); } } } @@ -1044,12 +905,12 @@ dpdk_watchdog(void *dummy OVS_UNUSED) for (;;) { ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - ovs_mutex_lock(&dev->mutex); + LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { + ovs_mutex_lock(&dev->common.mutex); if (dev->type == DPDK_DEV_ETH) { check_link_status(dev); } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); } ovs_mutex_unlock(&dpdk_mutex); xsleep(DPDK_PORT_WATCHDOG_INTERVAL); @@ -1062,11 +923,11 @@ static void netdev_dpdk_update_netdev_flag(struct netdev_dpdk *dev, enum dpdk_hw_ol_features hw_ol_features, enum netdev_ol_flags flag) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { - struct netdev *netdev = &dev->up; + struct netdev *netdev = &dev->common.up; - if (dev->hw_ol_features & hw_ol_features) { + if (dev->common.hw_ol_features & hw_ol_features) { netdev->ol_flags |= flag; } else { netdev->ol_flags &= ~flag; @@ -1075,7 +936,7 @@ netdev_dpdk_update_netdev_flag(struct netdev_dpdk *dev, static void netdev_dpdk_update_netdev_flags(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_IPV4_CKSUM_OFFLOAD, NETDEV_TX_OFFLOAD_IPV4_CKSUM); @@ -1113,60 +974,60 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, * scatter to support jumbo RX. * Setting scatter for the device is done after checking for * scatter support in the device capabilites. 
*/ - if (dev->mtu > RTE_ETHER_MTU) { - if (dev->hw_ol_features & NETDEV_RX_HW_SCATTER) { + if (dev->common.mtu > RTE_ETHER_MTU) { + if (dev->common.hw_ol_features & NETDEV_RX_HW_SCATTER) { conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_SCATTER; } } - conf.intr_conf.lsc = dev->lsc_interrupt_mode; + conf.intr_conf.lsc = dev->common.lsc_interrupt_mode; - if (dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) { conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_CHECKSUM; } - if (!(dev->hw_ol_features & NETDEV_RX_HW_CRC_STRIP) + if (!(dev->common.hw_ol_features & NETDEV_RX_HW_CRC_STRIP) && info->rx_offload_capa & RTE_ETH_RX_OFFLOAD_KEEP_CRC) { conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_KEEP_CRC; } - if (dev->hw_ol_features & NETDEV_TX_IPV4_CKSUM_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_IPV4_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_IPV4_CKSUM; } - if (dev->hw_ol_features & NETDEV_TX_TCP_CKSUM_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_TCP_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_CKSUM; } - if (dev->hw_ol_features & NETDEV_TX_UDP_CKSUM_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_UDP_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_UDP_CKSUM; } - if (dev->hw_ol_features & NETDEV_TX_SCTP_CKSUM_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_SCTP_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_SCTP_CKSUM; } - if (dev->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_TSO; } - if (dev->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; } - if (dev->hw_ol_features & NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD) { conf.txmode.offloads 
|= RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; } - if (dev->hw_ol_features & NETDEV_TX_GRE_TNL_TSO_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_GRE_TNL_TSO_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO; } - if (dev->hw_ol_features & NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM; } - if (dev->hw_ol_features & NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD) { + if (dev->common.hw_ol_features & NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; } @@ -1189,36 +1050,38 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq); } - diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf); + diag = rte_eth_dev_configure(dev->common.port_id, n_rxq, n_txq, &conf); if (diag) { VLOG_WARN("Interface %s eth_dev setup error %s\n", - dev->up.name, rte_strerror(-diag)); + dev->common.up.name, rte_strerror(-diag)); break; } - diag = rte_eth_dev_set_mtu(dev->port_id, dev->mtu); + diag = rte_eth_dev_set_mtu(dev->common.port_id, dev->common.mtu); if (diag) { /* A device may not support rte_eth_dev_set_mtu, in this case * flag a warning to the user and include the devices configured * MTU value that will be used instead. 
*/ if (-ENOTSUP == diag) { - rte_eth_dev_get_mtu(dev->port_id, &conf_mtu); + rte_eth_dev_get_mtu(dev->common.port_id, &conf_mtu); VLOG_WARN("Interface %s does not support MTU configuration, " "max packet size supported is %"PRIu16".", - dev->up.name, conf_mtu); + dev->common.up.name, conf_mtu); } else { VLOG_ERR("Interface %s MTU (%d) setup error: %s", - dev->up.name, dev->mtu, rte_strerror(-diag)); + dev->common.up.name, dev->common.mtu, + rte_strerror(-diag)); break; } } for (i = 0; i < n_txq; i++) { - diag = rte_eth_tx_queue_setup(dev->port_id, i, dev->txq_size, - dev->socket_id, NULL); + diag = rte_eth_tx_queue_setup(dev->common.port_id, + i, dev->common.txq_size, + dev->common.socket_id, NULL); if (diag) { VLOG_INFO("Interface %s unable to setup txq(%d): %s", - dev->up.name, i, rte_strerror(-diag)); + dev->common.up.name, i, rte_strerror(-diag)); break; } } @@ -1230,12 +1093,13 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, } for (i = 0; i < n_rxq; i++) { - diag = rte_eth_rx_queue_setup(dev->port_id, i, dev->rxq_size, - dev->socket_id, NULL, - dev->dpdk_mp->mp); + diag = rte_eth_rx_queue_setup(dev->common.port_id, i, + dev->common.rxq_size, + dev->common.socket_id, NULL, + dev->common.dpdk_mp->mp); if (diag) { VLOG_INFO("Interface %s unable to setup rxq(%d): %s", - dev->up.name, i, rte_strerror(-diag)); + dev->common.up.name, i, rte_strerror(-diag)); break; } } @@ -1246,8 +1110,8 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, continue; } - dev->up.n_rxq = n_rxq; - dev->up.n_txq = n_txq; + dev->common.up.n_rxq = n_rxq; + dev->common.up.n_txq = n_txq; return 0; } @@ -1256,11 +1120,12 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, } static void -dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) OVS_REQUIRES(dev->mutex) +dpdk_eth_flow_ctrl_setup(struct netdev_dpdk *dev) + OVS_REQUIRES(dev->common.mutex) { - if (rte_eth_dev_flow_ctrl_set(dev->port_id, &dev->fc_conf)) { + if (rte_eth_dev_flow_ctrl_set(dev->common.port_id, &dev->common.fc_conf)) { 
VLOG_WARN("Failed to enable flow control on device "DPDK_PORT_ID_FMT, - dev->port_id); + dev->common.port_id); } } @@ -1270,7 +1135,7 @@ dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev) uint64_t rx_metadata = 0; int ret; - if (dev->rx_metadata_delivery_configured) { + if (dev->common.rx_metadata_delivery_configured) { return; } @@ -1282,30 +1147,30 @@ dpdk_eth_dev_init_rx_metadata(struct netdev_dpdk *dev) rx_metadata |= RTE_ETH_RX_METADATA_TUNNEL_ID; #endif /* ALLOW_EXPERIMENTAL_API */ - ret = rte_eth_rx_metadata_negotiate(dev->port_id, &rx_metadata); + ret = rte_eth_rx_metadata_negotiate(dev->common.port_id, &rx_metadata); if (ret == 0) { if (!(rx_metadata & RTE_ETH_RX_METADATA_USER_MARK)) { VLOG_DBG("%s: The NIC will not provide per-packet USER_MARK", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); } #ifdef ALLOW_EXPERIMENTAL_API if (!(rx_metadata & RTE_ETH_RX_METADATA_TUNNEL_ID)) { VLOG_DBG("%s: The NIC will not provide per-packet TUNNEL_ID", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); } #endif /* ALLOW_EXPERIMENTAL_API */ } else { VLOG(ret == -ENOTSUP ? 
VLL_DBG : VLL_WARN, "%s: Cannot negotiate Rx metadata: %s", - netdev_get_name(&dev->up), rte_strerror(-ret)); + netdev_get_name(&dev->common.up), rte_strerror(-ret)); } - dev->rx_metadata_delivery_configured = true; + dev->common.rx_metadata_delivery_configured = true; } static int dpdk_eth_dev_init(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { struct rte_pktmbuf_pool_private *mbp_priv; struct rte_eth_dev_info info; @@ -1328,142 +1193,143 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) dpdk_eth_dev_init_rx_metadata(dev); } - diag = rte_eth_dev_info_get(dev->port_id, &info); + diag = rte_eth_dev_info_get(dev->common.port_id, &info); if (diag < 0) { VLOG_ERR("Interface %s rte_eth_dev_info_get error: %s", - dev->up.name, rte_strerror(-diag)); + dev->common.up.name, rte_strerror(-diag)); return -diag; } - dev->is_representor = !!(*info.dev_flags & RTE_ETH_DEV_REPRESENTOR); + dev->common.is_representor = !!(*info.dev_flags & RTE_ETH_DEV_REPRESENTOR); if (strstr(info.driver_name, "vf") != NULL) { VLOG_INFO("Virtual function detected, HW_CRC_STRIP will be enabled"); - dev->hw_ol_features |= NETDEV_RX_HW_CRC_STRIP; + dev->common.hw_ol_features |= NETDEV_RX_HW_CRC_STRIP; } else { - dev->hw_ol_features &= ~NETDEV_RX_HW_CRC_STRIP; + dev->common.hw_ol_features &= ~NETDEV_RX_HW_CRC_STRIP; } if ((info.rx_offload_capa & rx_chksm_offload_capa) != rx_chksm_offload_capa) { VLOG_WARN("Rx checksum offload is not supported on port " - DPDK_PORT_ID_FMT, dev->port_id); - dev->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; + DPDK_PORT_ID_FMT, dev->common.port_id); + dev->common.hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; } else { - dev->hw_ol_features |= NETDEV_RX_CHECKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_RX_CHECKSUM_OFFLOAD; } if (info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_SCATTER) { - dev->hw_ol_features |= NETDEV_RX_HW_SCATTER; + dev->common.hw_ol_features |= NETDEV_RX_HW_SCATTER; } else { /* Do not warn on lack of scatter 
support */ - dev->hw_ol_features &= ~NETDEV_RX_HW_SCATTER; + dev->common.hw_ol_features &= ~NETDEV_RX_HW_SCATTER; } if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM) { - dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; } else { - dev->hw_ol_features &= ~NETDEV_TX_IPV4_CKSUM_OFFLOAD; + dev->common.hw_ol_features &= ~NETDEV_TX_IPV4_CKSUM_OFFLOAD; } if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) { - dev->hw_ol_features |= NETDEV_TX_TCP_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_TCP_CKSUM_OFFLOAD; } else { - dev->hw_ol_features &= ~NETDEV_TX_TCP_CKSUM_OFFLOAD; + dev->common.hw_ol_features &= ~NETDEV_TX_TCP_CKSUM_OFFLOAD; } if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_UDP_CKSUM) { - dev->hw_ol_features |= NETDEV_TX_UDP_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_UDP_CKSUM_OFFLOAD; } else { - dev->hw_ol_features &= ~NETDEV_TX_UDP_CKSUM_OFFLOAD; + dev->common.hw_ol_features &= ~NETDEV_TX_UDP_CKSUM_OFFLOAD; } if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_SCTP_CKSUM) { - dev->hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD; } else { - dev->hw_ol_features &= ~NETDEV_TX_SCTP_CKSUM_OFFLOAD; + dev->common.hw_ol_features &= ~NETDEV_TX_SCTP_CKSUM_OFFLOAD; } if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM) { - dev->hw_ol_features |= NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD; } else { - dev->hw_ol_features &= ~NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD; + dev->common.hw_ol_features &= ~NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD; } if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM) { - dev->hw_ol_features |= NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD; } else { - dev->hw_ol_features &= ~NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD; + dev->common.hw_ol_features &= ~NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD; } - 
dev->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; + dev->common.hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; if (userspace_tso_enabled()) { if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_TCP_TSO) { - dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; } else { VLOG_WARN("%s: Tx TSO offload is not supported.", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); } if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO) { - dev->hw_ol_features |= NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD; } else { VLOG_WARN("%s: Tx Vxlan tunnel TSO offload is not supported.", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); } if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO) { - dev->hw_ol_features |= NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD; } else { VLOG_WARN("%s: Tx Geneve tunnel TSO offload is not supported.", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); } if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO) { - dev->hw_ol_features |= NETDEV_TX_GRE_TNL_TSO_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_GRE_TNL_TSO_OFFLOAD; } else { VLOG_WARN("%s: Tx GRE tunnel TSO offload is not supported.", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); } } - n_rxq = MIN(info.max_rx_queues, dev->up.n_rxq); - n_txq = MIN(info.max_tx_queues, dev->up.n_txq); + n_rxq = MIN(info.max_rx_queues, dev->common.up.n_rxq); + n_txq = MIN(info.max_tx_queues, dev->common.up.n_txq); diag = dpdk_eth_dev_port_config(dev, &info, n_rxq, n_txq); if (diag) { VLOG_ERR("Interface %s(rxq:%d txq:%d lsc interrupt mode:%s) " "configure error: %s", - dev->up.name, n_rxq, n_txq, - dev->lsc_interrupt_mode ? "true" : "false", + dev->common.up.name, n_rxq, n_txq, + dev->common.lsc_interrupt_mode ? 
"true" : "false", rte_strerror(-diag)); return -diag; } - diag = rte_eth_dev_start(dev->port_id); + diag = rte_eth_dev_start(dev->common.port_id); if (diag) { - VLOG_ERR("Interface %s start error: %s", dev->up.name, + VLOG_ERR("Interface %s start error: %s", dev->common.up.name, rte_strerror(-diag)); return -diag; } - dev->started = true; + dev->common.started = true; netdev_dpdk_configure_xstats(dev); - rte_eth_promiscuous_enable(dev->port_id); - rte_eth_allmulticast_enable(dev->port_id); + rte_eth_promiscuous_enable(dev->common.port_id); + rte_eth_allmulticast_enable(dev->common.port_id); memset(&eth_addr, 0x0, sizeof(eth_addr)); - rte_eth_macaddr_get(dev->port_id, &eth_addr); + rte_eth_macaddr_get(dev->common.port_id, &eth_addr); VLOG_INFO_RL(&rl, "Port "DPDK_PORT_ID_FMT": "ETH_ADDR_FMT, - dev->port_id, ETH_ADDR_BYTES_ARGS(eth_addr.addr_bytes)); + dev->common.port_id, + ETH_ADDR_BYTES_ARGS(eth_addr.addr_bytes)); - memcpy(dev->hwaddr.ea, eth_addr.addr_bytes, ETH_ADDR_LEN); - if (rte_eth_link_get_nowait(dev->port_id, &dev->link) < 0) { - memset(&dev->link, 0, sizeof dev->link); + memcpy(dev->common.hwaddr.ea, eth_addr.addr_bytes, ETH_ADDR_LEN); + if (rte_eth_link_get_nowait(dev->common.port_id, &dev->common.link) < 0) { + memset(&dev->common.link, 0, sizeof dev->common.link); } - mbp_priv = rte_mempool_get_priv(dev->dpdk_mp->mp); + mbp_priv = rte_mempool_get_priv(dev->common.dpdk_mp->mp); dev->buf_size = mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM; return 0; } @@ -1471,7 +1337,9 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) static struct netdev_dpdk * netdev_dpdk_cast(const struct netdev *netdev) { - return CONTAINER_OF(netdev, struct netdev_dpdk, up); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); + + return CONTAINER_OF(common, struct netdev_dpdk, common); } static struct netdev * @@ -1481,7 +1349,7 @@ netdev_dpdk_alloc(void) dev = dpdk_rte_mzalloc(sizeof *dev); if (dev) { - return &dev->up; + return &dev->common.up; } return NULL; @@ 
-1512,26 +1380,26 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no, { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_init(&dev->mutex); + ovs_mutex_init(&dev->common.mutex); - rte_spinlock_init(&dev->stats_lock); + rte_spinlock_init(&dev->common.stats_lock); /* If the 'sid' is negative, it means that the kernel fails * to obtain the pci numa info. In that situation, always * use 'SOCKET0'. */ - dev->socket_id = socket_id < 0 ? SOCKET0 : socket_id; - dev->requested_socket_id = dev->socket_id; - dev->port_id = port_no; + dev->common.socket_id = socket_id < 0 ? SOCKET0 : socket_id; + dev->common.requested_socket_id = dev->common.socket_id; + dev->common.port_id = port_no; dev->type = type; - dev->flags = 0; - dev->requested_mtu = RTE_ETHER_MTU; - dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); - dev->requested_lsc_interrupt_mode = 0; + dev->common.flags = 0; + dev->common.requested_mtu = RTE_ETHER_MTU; + dev->common.max_packet_len = MTU_TO_FRAME_LEN(dev->common.mtu); + dev->common.requested_lsc_interrupt_mode = 0; ovsrcu_index_init(&dev->vid, -1); dev->vhost_reconfigured = false; dev->virtio_features_state = OVS_VIRTIO_F_CLEAN; - dev->attached = false; - dev->started = false; + dev->common.attached = false; + dev->common.started = false; ovsrcu_init(&dev->qos_conf, NULL); @@ -1541,38 +1409,39 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no, netdev->n_rxq = 0; netdev->n_txq = 0; - dev->user_n_rxq = NR_QUEUE; - dev->requested_n_rxq = NR_QUEUE; - dev->requested_n_txq = NR_QUEUE; - dev->requested_rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE; - dev->requested_txq_size = NIC_PORT_DEFAULT_TXQ_SIZE; + dev->common.user_n_rxq = NR_QUEUE; + dev->common.requested_n_rxq = NR_QUEUE; + dev->common.requested_n_txq = NR_QUEUE; + dev->common.requested_rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE; + dev->common.requested_txq_size = NIC_PORT_DEFAULT_TXQ_SIZE; dev->requested_rx_steer_flags = 0; dev->rx_steer_flags = 0; dev->rx_steer_flows_num = 0; 
dev->rx_steer_flows = NULL; /* Initialize the flow control to NULL */ - memset(&dev->fc_conf, 0, sizeof dev->fc_conf); + memset(&dev->common.fc_conf, 0, sizeof dev->common.fc_conf); /* Initilize the hardware offload flags to 0 */ - dev->hw_ol_features = 0; + dev->common.hw_ol_features = 0; - dev->rx_metadata_delivery_configured = false; + dev->common.rx_metadata_delivery_configured = false; - dev->flags = NETDEV_UP | NETDEV_PROMISC; + dev->common.flags = NETDEV_UP | NETDEV_PROMISC; - ovs_list_push_back(&dpdk_list, &dev->list_node); + ovs_list_push_back(&dpdk_list, &dev->common.list_node); netdev_request_reconfigure(netdev); - dev->rte_xstats_names = NULL; - dev->rte_xstats_names_size = 0; + dev->common.rte_xstats_names = NULL; + dev->common.rte_xstats_names_size = 0; - dev->rte_xstats_ids = NULL; - dev->rte_xstats_ids_size = 0; + dev->common.rte_xstats_ids = NULL; + dev->common.rte_xstats_ids_size = 0; - dev->sw_stats = xzalloc(sizeof *dev->sw_stats); - dev->sw_stats->tx_retries = (dev->type == DPDK_DEV_VHOST) ? 0 : UINT64_MAX; + dev->common.sw_stats = xzalloc(sizeof *dev->common.sw_stats); + dev->common.sw_stats->tx_retries = + (dev->type == DPDK_DEV_VHOST) ? 
0 : UINT64_MAX; return 0; } @@ -1589,8 +1458,8 @@ vhost_common_construct(struct netdev *netdev) if (!dev->vhost_rxq_enabled) { return ENOMEM; } - dev->tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM); - if (!dev->tx_q) { + dev->common.tx_q = netdev_dpdk_alloc_txq(OVS_VHOST_MAX_QUEUE_NUM); + if (!dev->common.tx_q) { rte_free(dev->vhost_rxq_enabled); return ENOMEM; } @@ -1716,16 +1585,16 @@ netdev_dpdk_construct(struct netdev *netdev) static void common_destruct(struct netdev_dpdk *dev) OVS_REQUIRES(dpdk_mutex) - OVS_EXCLUDED(dev->mutex) + OVS_EXCLUDED(dev->common.mutex) { - rte_free(dev->tx_q); - dpdk_mp_put(dev->dpdk_mp); + rte_free(dev->common.tx_q); + dpdk_mp_put(dev->common.dpdk_mp); - ovs_list_remove(&dev->list_node); + ovs_list_remove(&dev->common.list_node); free(ovsrcu_get_protected(struct ingress_policer *, &dev->ingress_policer)); - free(dev->sw_stats); - ovs_mutex_destroy(&dev->mutex); + free(dev->common.sw_stats); + ovs_mutex_destroy(&dev->common.mutex); } static void dpdk_rx_steer_unconfigure(struct netdev_dpdk *); @@ -1740,10 +1609,10 @@ netdev_dpdk_destruct(struct netdev *netdev) /* Destroy any rx-steering flows to allow RXQs to be removed. */ dpdk_rx_steer_unconfigure(dev); - rte_eth_dev_stop(dev->port_id); - dev->started = false; + rte_eth_dev_stop(dev->common.port_id); + dev->common.started = false; - if (dev->attached) { + if (dev->common.attached) { bool dpdk_resources_still_used = false; struct rte_eth_dev_info dev_info; dpdk_port_t sibling_port_id; @@ -1751,16 +1620,16 @@ netdev_dpdk_destruct(struct netdev *netdev) /* Check if this netdev has siblings (i.e. shares DPDK resources) among * other OVS netdevs. */ - RTE_ETH_FOREACH_DEV_SIBLING (sibling_port_id, dev->port_id) { + RTE_ETH_FOREACH_DEV_SIBLING (sibling_port_id, dev->common.port_id) { struct netdev_dpdk *sibling; - /* RTE_ETH_FOREACH_DEV_SIBLING lists dev->port_id as part of the - * loop. 
*/ - if (sibling_port_id == dev->port_id) { + /* RTE_ETH_FOREACH_DEV_SIBLING lists dev->common.port_id + * as part of the loop. */ + if (sibling_port_id == dev->common.port_id) { continue; } - LIST_FOR_EACH (sibling, list_node, &dpdk_list) { - if (sibling->port_id != sibling_port_id) { + LIST_FOR_EACH (sibling, common.list_node, &dpdk_list) { + if (sibling->common.port_id != sibling_port_id) { continue; } dpdk_resources_still_used = true; @@ -1772,10 +1641,10 @@ netdev_dpdk_destruct(struct netdev *netdev) } /* Retrieve eth device data before closing it. */ - diag = rte_eth_dev_info_get(dev->port_id, &dev_info); + diag = rte_eth_dev_info_get(dev->common.port_id, &dev_info); /* Remove the eth device. */ - rte_eth_dev_close(dev->port_id); + rte_eth_dev_close(dev->common.port_id); /* Remove the rte device if no associated eth device is used by OVS. * Note: any remaining eth devices associated to this rte device are @@ -1787,33 +1656,33 @@ netdev_dpdk_destruct(struct netdev *netdev) if (diag < 0) { VLOG_ERR("Device '%s' can not be detached: %s.", - dev->devargs, rte_strerror(-diag)); + dev->common.devargs, rte_strerror(-diag)); } else { /* Device was closed and detached. */ VLOG_INFO("Device '%s' has been removed and detached", - dev->devargs); + dev->common.devargs); } } else { /* Device was only closed. rte_dev_remove() was not called. */ - VLOG_INFO("Device '%s' has been removed", dev->devargs); + VLOG_INFO("Device '%s' has been removed", dev->common.devargs); } } netdev_dpdk_clear_xstats(dev); - free(dev->devargs); + free(dev->common.devargs); common_destruct(dev); ovs_mutex_unlock(&dpdk_mutex); } /* rte_vhost_driver_unregister() can call back destroy_device(), which will - * try to acquire 'dpdk_mutex' and possibly 'dev->mutex'. To avoid a + * try to acquire 'dpdk_mutex' and possibly 'dev->common.mutex'. To avoid a * deadlock, none of the mutexes must be held while calling this function. 
*/ static int dpdk_vhost_driver_unregister(struct netdev_dpdk *dev OVS_UNUSED, char *vhost_id) OVS_EXCLUDED(dpdk_mutex) - OVS_EXCLUDED(dev->mutex) + OVS_EXCLUDED(dev->common.mutex) { return rte_vhost_driver_unregister(vhost_id); } @@ -1870,24 +1739,24 @@ netdev_dpdk_dealloc(struct netdev *netdev) static void netdev_dpdk_clear_xstats(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { - free(dev->rte_xstats_names); - dev->rte_xstats_names = NULL; - dev->rte_xstats_names_size = 0; - free(dev->rte_xstats_ids); - dev->rte_xstats_ids = NULL; - dev->rte_xstats_ids_size = 0; + free(dev->common.rte_xstats_names); + dev->common.rte_xstats_names = NULL; + dev->common.rte_xstats_names_size = 0; + free(dev->common.rte_xstats_ids); + dev->common.rte_xstats_ids = NULL; + dev->common.rte_xstats_ids_size = 0; } static const char * netdev_dpdk_get_xstat_name(struct netdev_dpdk *dev, uint64_t id) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { - if (id >= dev->rte_xstats_names_size) { + if (id >= dev->common.rte_xstats_names_size) { return "UNKNOWN"; } - return dev->rte_xstats_names[id].name; + return dev->common.rte_xstats_names[id].name; } static bool @@ -1902,7 +1771,7 @@ is_queue_stat(const char *s) static void netdev_dpdk_configure_xstats(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { struct rte_eth_xstat_name *rte_xstats_names = NULL; struct rte_eth_xstat *rte_xstats = NULL; @@ -1913,39 +1782,40 @@ netdev_dpdk_configure_xstats(struct netdev_dpdk *dev) netdev_dpdk_clear_xstats(dev); - rte_xstats_names_size = rte_eth_xstats_get_names(dev->port_id, NULL, 0); + rte_xstats_names_size = + rte_eth_xstats_get_names(dev->common.port_id, NULL, 0); if (rte_xstats_names_size < 0) { VLOG_WARN("Cannot get XSTATS names for port: "DPDK_PORT_ID_FMT, - dev->port_id); + dev->common.port_id); goto out; } rte_xstats_names = xcalloc(rte_xstats_names_size, sizeof *rte_xstats_names); - rte_xstats_len = 
rte_eth_xstats_get_names(dev->port_id, + rte_xstats_len = rte_eth_xstats_get_names(dev->common.port_id, rte_xstats_names, rte_xstats_names_size); if (rte_xstats_len < 0 || rte_xstats_len != rte_xstats_names_size) { VLOG_WARN("Cannot get XSTATS names for port: "DPDK_PORT_ID_FMT, - dev->port_id); + dev->common.port_id); goto out; } rte_xstats = xcalloc(rte_xstats_names_size, sizeof *rte_xstats); - rte_xstats_len = rte_eth_xstats_get(dev->port_id, rte_xstats, + rte_xstats_len = rte_eth_xstats_get(dev->common.port_id, rte_xstats, rte_xstats_names_size); if (rte_xstats_len < 0 || rte_xstats_len != rte_xstats_names_size) { VLOG_WARN("Cannot get XSTATS for port: "DPDK_PORT_ID_FMT, - dev->port_id); + dev->common.port_id); goto out; } - dev->rte_xstats_names = rte_xstats_names; + dev->common.rte_xstats_names = rte_xstats_names; rte_xstats_names = NULL; - dev->rte_xstats_names_size = rte_xstats_names_size; + dev->common.rte_xstats_names_size = rte_xstats_names_size; - dev->rte_xstats_ids = xcalloc(rte_xstats_names_size, - sizeof *dev->rte_xstats_ids); + dev->common.rte_xstats_ids = xcalloc(rte_xstats_names_size, + sizeof *dev->common.rte_xstats_ids); for (unsigned int i = 0; i < rte_xstats_names_size; i++) { id = rte_xstats[i].id; name = netdev_dpdk_get_xstat_name(dev, id); @@ -1957,8 +1827,8 @@ netdev_dpdk_configure_xstats(struct netdev_dpdk *dev) strstr(name, "_management_") || string_ends_with(name, "_dropped")) { - dev->rte_xstats_ids[dev->rte_xstats_ids_size] = id; - dev->rte_xstats_ids_size++; + dev->common.rte_xstats_ids[dev->common.rte_xstats_ids_size] = id; + dev->common.rte_xstats_ids_size++; } } @@ -1972,44 +1842,44 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); - if (dev->devargs && dev->devargs[0]) { - smap_add_format(args, "dpdk-devargs", "%s", dev->devargs); + if (dev->common.devargs && dev->common.devargs[0]) { 
+ smap_add_format(args, "dpdk-devargs", "%s", dev->common.devargs); } - smap_add_format(args, "n_rxq", "%d", dev->user_n_rxq); + smap_add_format(args, "n_rxq", "%d", dev->common.user_n_rxq); - if (dev->fc_conf.mode == RTE_ETH_FC_TX_PAUSE || - dev->fc_conf.mode == RTE_ETH_FC_FULL) { + if (dev->common.fc_conf.mode == RTE_ETH_FC_TX_PAUSE || + dev->common.fc_conf.mode == RTE_ETH_FC_FULL) { smap_add(args, "rx-flow-ctrl", "true"); } - if (dev->fc_conf.mode == RTE_ETH_FC_RX_PAUSE || - dev->fc_conf.mode == RTE_ETH_FC_FULL) { + if (dev->common.fc_conf.mode == RTE_ETH_FC_RX_PAUSE || + dev->common.fc_conf.mode == RTE_ETH_FC_FULL) { smap_add(args, "tx-flow-ctrl", "true"); } - if (dev->fc_conf.autoneg) { + if (dev->common.fc_conf.autoneg) { smap_add(args, "flow-ctrl-autoneg", "true"); } - smap_add_format(args, "n_rxq_desc", "%d", dev->rxq_size); - smap_add_format(args, "n_txq_desc", "%d", dev->txq_size); + smap_add_format(args, "n_rxq_desc", "%d", dev->common.rxq_size); + smap_add_format(args, "n_txq_desc", "%d", dev->common.txq_size); if (dev->rx_steer_flags == DPDK_RX_STEER_LACP) { smap_add(args, "rx-steering", "rss+lacp"); } smap_add(args, "dpdk-lsc-interrupt", - dev->lsc_interrupt_mode ? "true" : "false"); + dev->common.lsc_interrupt_mode ? 
"true" : "false"); - if (dev->is_representor) { + if (dev->common.is_representor) { smap_add_format(args, "dpdk-vf-mac", ETH_ADDR_FMT, - ETH_ADDR_ARGS(dev->requested_hwaddr)); + ETH_ADDR_ARGS(dev->common.requested_hwaddr)); } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -2020,8 +1890,8 @@ netdev_dpdk_lookup_by_port_id(dpdk_port_t port_id) { struct netdev_dpdk *dev; - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - if (dev->port_id == port_id) { + LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { + if (dev->common.port_id == port_id) { return dev; } } @@ -2099,7 +1969,7 @@ netdev_dpdk_process_devargs(struct netdev_dpdk *dev, new_port_id = netdev_dpdk_get_port_by_devargs(devargs); if (rte_eth_dev_is_valid_port(new_port_id)) { /* Attach successful */ - dev->attached = true; + dev->common.attached = true; VLOG_INFO("Device '%s' attached to DPDK", devargs); } else { /* Attach unsuccessful */ @@ -2155,11 +2025,11 @@ netdev_dpdk_run(const struct netdev_class *netdev_class OVS_UNUSED) ovs_mutex_lock(&dpdk_mutex); dev = netdev_dpdk_lookup_by_port_id(port_id); if (dev) { - ovs_mutex_lock(&dev->mutex); - netdev_request_reconfigure(&dev->up); + ovs_mutex_lock(&dev->common.mutex); + netdev_request_reconfigure(&dev->common.up); VLOG_DBG_RL(&rl, "%s: Device reset requested.", - netdev_get_name(&dev->up)); - ovs_mutex_unlock(&dev->mutex); + netdev_get_name(&dev->common.up)); + ovs_mutex_unlock(&dev->common.mutex); } ovs_mutex_unlock(&dpdk_mutex); } @@ -2185,14 +2055,14 @@ dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type type, static void dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { int new_n_rxq; new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); - if (new_n_rxq != dev->user_n_rxq) { - dev->user_n_rxq = new_n_rxq; - netdev_request_reconfigure(&dev->up); + if (new_n_rxq != dev->common.user_n_rxq) { + dev->common.user_n_rxq = 
new_n_rxq; + netdev_request_reconfigure(&dev->common.up); } } @@ -2209,14 +2079,14 @@ dpdk_process_queue_size(struct netdev *netdev, const struct smap *args, if (is_rx) { default_size = NIC_PORT_DEFAULT_RXQ_SIZE; new_requested_size = smap_get_int(args, "n_rxq_desc", default_size); - cur_size = dev->rxq_size; - cur_requested_size = &dev->requested_rxq_size; + cur_size = dev->common.rxq_size; + cur_requested_size = &dev->common.requested_rxq_size; lim = info ? &info->rx_desc_lim : NULL; } else { default_size = NIC_PORT_DEFAULT_TXQ_SIZE; new_requested_size = smap_get_int(args, "n_txq_desc", default_size); - cur_size = dev->txq_size; - cur_requested_size = &dev->requested_txq_size; + cur_size = dev->common.txq_size; + cur_requested_size = &dev->common.requested_txq_size; lim = info ? &info->tx_desc_lim : NULL; } @@ -2302,7 +2172,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, int err = 0; ovs_mutex_lock(&dpdk_mutex); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); dpdk_set_rx_steer_config(netdev, dev, args, errp); @@ -2310,7 +2180,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, new_devargs = smap_get(args, "dpdk-devargs"); - if (dev->devargs && new_devargs && strcmp(new_devargs, dev->devargs)) { + if (dev->common.devargs && new_devargs && + strcmp(new_devargs, dev->common.devargs)) { /* The user requested a new device. If we return error, the caller * will delete this netdev and try to recreate it. 
*/ err = EAGAIN; @@ -2321,14 +2192,14 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, if (new_devargs && new_devargs[0]) { /* Don't process dpdk-devargs if value is unchanged and port id * is valid */ - if (!(dev->devargs && !strcmp(dev->devargs, new_devargs) - && rte_eth_dev_is_valid_port(dev->port_id))) { + if (!(dev->common.devargs && !strcmp(dev->common.devargs, new_devargs) + && rte_eth_dev_is_valid_port(dev->common.port_id))) { dpdk_port_t new_port_id = netdev_dpdk_process_devargs(dev, new_devargs, errp); if (!rte_eth_dev_is_valid_port(new_port_id)) { err = EINVAL; - } else if (new_port_id == dev->port_id) { + } else if (new_port_id == dev->common.port_id) { /* Already configured, do not reconfigure again */ err = 0; } else { @@ -2339,15 +2210,15 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' " "which is already in use by '%s'", netdev_get_name(netdev), new_devargs, - netdev_get_name(&dup_dev->up)); + netdev_get_name(&dup_dev->common.up)); err = EADDRINUSE; } else { int sid = rte_eth_dev_socket_id(new_port_id); - dev->requested_socket_id = sid < 0 ? SOCKET0 : sid; - dev->devargs = xstrdup(new_devargs); - dev->port_id = new_port_id; - netdev_request_reconfigure(&dev->up); + dev->common.requested_socket_id = sid < 0 ? 
SOCKET0 : sid; + dev->common.devargs = xstrdup(new_devargs); + dev->common.port_id = new_port_id; + netdev_request_reconfigure(&dev->common.up); err = 0; } } @@ -2363,7 +2234,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, goto out; } - err = -rte_eth_dev_info_get(dev->port_id, &info); + err = -rte_eth_dev_info_get(dev->common.port_id, &info); if (err) { VLOG_WARN_BUF(errp, "%s: Failed to get device info: %s" , netdev_get_name(netdev), rte_strerror(err)); @@ -2377,7 +2248,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, if (vf_mac) { struct eth_addr mac; - if (!dev->is_representor) { + if (!dev->common.is_representor) { VLOG_WARN("'%s' is trying to set the VF MAC '%s' " "but 'options:dpdk-vf-mac' is only supported for " "VF representors.", @@ -2388,8 +2259,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, } else if (eth_addr_is_multicast(mac)) { VLOG_WARN("interface '%s': cannot set VF MAC to multicast " "address '%s'.", netdev_get_name(netdev), vf_mac); - } else if (!eth_addr_equals(dev->requested_hwaddr, mac)) { - dev->requested_hwaddr = mac; + } else if (!eth_addr_equals(dev->common.requested_hwaddr, mac)) { + dev->common.requested_hwaddr = mac; netdev_request_reconfigure(netdev); } } @@ -2406,8 +2277,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, netdev_get_name(netdev)); lsc_interrupt_mode = false; } - if (dev->requested_lsc_interrupt_mode != lsc_interrupt_mode) { - dev->requested_lsc_interrupt_mode = lsc_interrupt_mode; + if (dev->common.requested_lsc_interrupt_mode != lsc_interrupt_mode) { + dev->common.requested_lsc_interrupt_mode = lsc_interrupt_mode; netdev_request_reconfigure(netdev); } @@ -2426,7 +2297,8 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, } /* Get the Flow control configuration. 
*/ - err = -rte_eth_dev_flow_ctrl_get(dev->port_id, &dev->fc_conf); + err = -rte_eth_dev_flow_ctrl_get(dev->common.port_id, + &dev->common.fc_conf); if (err) { if (err == ENOTSUP) { if (flow_control_requested) { @@ -2441,14 +2313,15 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, goto out; } - if (dev->fc_conf.mode != fc_mode || autoneg != dev->fc_conf.autoneg) { - dev->fc_conf.mode = fc_mode; - dev->fc_conf.autoneg = autoneg; + if (dev->common.fc_conf.mode != fc_mode || + autoneg != dev->common.fc_conf.autoneg) { + dev->common.fc_conf.mode = fc_mode; + dev->common.fc_conf.autoneg = autoneg; dpdk_eth_flow_ctrl_setup(dev); } out: - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); ovs_mutex_unlock(&dpdk_mutex); return err; @@ -2461,7 +2334,7 @@ netdev_dpdk_vhost_client_get_config(const struct netdev *netdev, struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int tx_retries_max; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); if (dev->vhost_id) { smap_add(args, "vhost-server-path", dev->vhost_id); @@ -2472,7 +2345,7 @@ netdev_dpdk_vhost_client_get_config(const struct netdev *netdev, smap_add_format(args, "tx-retries-max", "%d", tx_retries_max); } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -2487,7 +2360,7 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev, int max_tx_retries, cur_max_tx_retries; uint32_t max_queue_pairs; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); if (!(dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT)) { path = smap_get(args, "vhost-server-path"); if (!nullable_string_is_equal(path, dev->vhost_id)) { @@ -2518,7 +2391,7 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev, VLOG_INFO("Max Tx retries for vhost device '%s' set to %d", netdev_get_name(netdev), max_tx_retries); } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -2528,7 +2401,7 @@ 
netdev_dpdk_get_numa_id(const struct netdev *netdev) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - return dev->socket_id; + return dev->common.socket_id; } /* Sets the number of tx queues for the dpdk interface. */ @@ -2537,17 +2410,17 @@ netdev_dpdk_set_tx_multiq(struct netdev *netdev, unsigned int n_txq) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); - if (dev->requested_n_txq == n_txq) { + if (dev->common.requested_n_txq == n_txq) { goto out; } - dev->requested_n_txq = n_txq; + dev->common.requested_n_txq = n_txq; netdev_request_reconfigure(netdev); out: - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -2575,9 +2448,9 @@ netdev_dpdk_rxq_construct(struct netdev_rxq *rxq) struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq); struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); - ovs_mutex_lock(&dev->mutex); - rx->port_id = dev->port_id; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + rx->port_id = dev->common.port_id; + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -2635,8 +2508,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) if (OVS_UNLIKELY(unexpected)) { VLOG_WARN_RL(&rl, "%s: Unexpected Tx offload flags: %#"PRIx64, - netdev_get_name(&dev->up), unexpected); - netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up), + netdev_get_name(&dev->common.up), unexpected); + netdev_dpdk_mbuf_dump(netdev_get_name(&dev->common.up), "Packet with unexpected ol_flags", mbuf); return false; } @@ -2737,11 +2610,12 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) hdr_len += mbuf->outer_l2_len + mbuf->outer_l3_len; } - if (OVS_UNLIKELY((hdr_len + mbuf->tso_segsz) > dev->max_packet_len)) { + if (OVS_UNLIKELY((hdr_len + mbuf->tso_segsz) > + dev->common.max_packet_len)) { VLOG_WARN_RL(&rl, "%s: Oversized TSO packet. 
hdr: %"PRIu32", " "gso: %"PRIu32", max len: %"PRIu32"", - dev->up.name, hdr_len, mbuf->tso_segsz, - dev->max_packet_len); + dev->common.up.name, hdr_len, mbuf->tso_segsz, + dev->common.max_packet_len); return false; } mbuf->ol_flags |= RTE_MBUF_F_TX_TCP_SEG; @@ -2822,19 +2696,20 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, uint32_t nb_tx = 0; uint16_t nb_tx_prep = cnt; - nb_tx_prep = rte_eth_tx_prepare(dev->port_id, qid, pkts, cnt); + nb_tx_prep = rte_eth_tx_prepare(dev->common.port_id, qid, pkts, cnt); if (nb_tx_prep != cnt) { VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. " - "Only %u/%u are valid: %s", netdev_get_name(&dev->up), + "Only %u/%u are valid: %s", + netdev_get_name(&dev->common.up), nb_tx_prep, cnt, rte_strerror(rte_errno)); - netdev_dpdk_mbuf_dump(netdev_get_name(&dev->up), + netdev_dpdk_mbuf_dump(netdev_get_name(&dev->common.up), "First invalid packet", pkts[nb_tx_prep]); } while (nb_tx != nb_tx_prep) { uint32_t ret; - ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, + ret = rte_eth_tx_burst(dev->common.port_id, qid, pkts + nb_tx, nb_tx_prep - nb_tx); if (!ret) { break; @@ -2928,11 +2803,11 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, int vid = netdev_dpdk_get_vid(dev); if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured - || !(dev->flags & NETDEV_UP))) { + || !(dev->common.flags & NETDEV_UP))) { return EAGAIN; } - nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->dpdk_mp->mp, + nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->common.dpdk_mp->mp, (struct rte_mbuf **) batch->packets, NETDEV_MAX_BURST); if (!nb_rx) { @@ -2958,10 +2833,10 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, } if (OVS_UNLIKELY(qos_drops)) { - rte_spinlock_lock(&dev->stats_lock); - dev->stats.rx_dropped += qos_drops; - dev->sw_stats->rx_qos_drops += qos_drops; - rte_spinlock_unlock(&dev->stats_lock); + rte_spinlock_lock(&dev->common.stats_lock); + dev->common.stats.rx_dropped += qos_drops; + dev->common.sw_stats->rx_qos_drops 
+= qos_drops; + rte_spinlock_unlock(&dev->common.stats_lock); } batch->count = nb_rx; @@ -2988,7 +2863,7 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, int nb_rx; int dropped = 0; - if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) { + if (OVS_UNLIKELY(!(dev->common.flags & NETDEV_UP))) { return EAGAIN; } @@ -3017,10 +2892,10 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, /* Update stats to reflect dropped packets */ if (OVS_UNLIKELY(dropped)) { - rte_spinlock_lock(&dev->stats_lock); - dev->stats.rx_dropped += dropped; - dev->sw_stats->rx_qos_drops += dropped; - rte_spinlock_unlock(&dev->stats_lock); + rte_spinlock_lock(&dev->common.stats_lock); + dev->common.stats.rx_dropped += dropped; + dev->common.sw_stats->rx_qos_drops += dropped; + rte_spinlock_unlock(&dev->common.stats_lock); } batch->count = nb_rx; @@ -3056,11 +2931,12 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts, * during the offloading preparation for performance reasons. 
*/ for (i = 0; i < pkt_cnt; i++) { pkt = pkts[i]; - if (OVS_UNLIKELY((pkt->pkt_len > dev->max_packet_len) + if (OVS_UNLIKELY((pkt->pkt_len > dev->common.max_packet_len) && !pkt->tso_segsz)) { VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " " - "max_packet_len %d", dev->up.name, pkt->pkt_len, - dev->max_packet_len); + "max_packet_len %d", + dev->common.up.name, pkt->pkt_len, + dev->common.max_packet_len); rte_pktmbuf_free(pkt); continue; } @@ -3241,7 +3117,8 @@ dpdk_copy_batch_to_mbuf(struct netdev *netdev, struct dp_packet_batch *batch) } else { struct dp_packet *pktcopy; - pktcopy = dpdk_copy_dp_packet_to_mbuf(dev->dpdk_mp->mp, packet); + pktcopy = dpdk_copy_dp_packet_to_mbuf( + dev->common.dpdk_mp->mp, packet); if (pktcopy) { dp_packet_batch_refill(batch, pktcopy, i); } @@ -3313,19 +3190,19 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid, int retries; batch_cnt = cnt = dp_packet_batch_size(batch); - qid = dev->tx_q[qid % netdev->n_txq].map; + qid = dev->common.tx_q[qid % netdev->n_txq].map; if (OVS_UNLIKELY(vid < 0 || !dev->vhost_reconfigured || qid < 0 - || !(dev->flags & NETDEV_UP))) { - rte_spinlock_lock(&dev->stats_lock); - dev->stats.tx_dropped += cnt; - rte_spinlock_unlock(&dev->stats_lock); + || !(dev->common.flags & NETDEV_UP))) { + rte_spinlock_lock(&dev->common.stats_lock); + dev->common.stats.tx_dropped += cnt; + rte_spinlock_unlock(&dev->common.stats_lock); dp_packet_delete_batch(batch, true); return 0; } - if (OVS_UNLIKELY(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) { + if (OVS_UNLIKELY(!rte_spinlock_trylock(&dev->common.tx_q[qid].tx_lock))) { COVERAGE_INC(vhost_tx_contention); - rte_spinlock_lock(&dev->tx_q[qid].tx_lock); + rte_spinlock_lock(&dev->common.tx_q[qid].tx_lock); } cnt = netdev_dpdk_common_send(netdev, batch, &stats); @@ -3357,23 +3234,23 @@ netdev_dpdk_vhost_send(struct netdev *netdev, int qid, } } while (cnt && (retries++ < max_retries)); - rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); + 
rte_spinlock_unlock(&dev->common.tx_q[qid].tx_lock); stats.tx_failure_drops += cnt; dropped += cnt; stats.tx_retries = MIN(retries, max_retries); if (OVS_UNLIKELY(dropped || stats.tx_retries)) { - struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats; + struct netdev_dpdk_sw_stats *sw_stats = dev->common.sw_stats; - rte_spinlock_lock(&dev->stats_lock); - dev->stats.tx_dropped += dropped; + rte_spinlock_lock(&dev->common.stats_lock); + dev->common.stats.tx_dropped += dropped; sw_stats->tx_retries += stats.tx_retries; sw_stats->tx_failure_drops += stats.tx_failure_drops; sw_stats->tx_mtu_exceeded_drops += stats.tx_mtu_exceeded_drops; sw_stats->tx_qos_drops += stats.tx_qos_drops; sw_stats->tx_invalid_hwol_drops += stats.tx_invalid_hwol_drops; - rte_spinlock_unlock(&dev->stats_lock); + rte_spinlock_unlock(&dev->common.stats_lock); } pkts = (struct rte_mbuf **) batch->packets; @@ -3392,17 +3269,17 @@ netdev_dpdk_eth_send(struct netdev *netdev, int qid, struct netdev_dpdk_sw_stats stats; int cnt, dropped; - if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) { - rte_spinlock_lock(&dev->stats_lock); - dev->stats.tx_dropped += dp_packet_batch_size(batch); - rte_spinlock_unlock(&dev->stats_lock); + if (OVS_UNLIKELY(!(dev->common.flags & NETDEV_UP))) { + rte_spinlock_lock(&dev->common.stats_lock); + dev->common.stats.tx_dropped += dp_packet_batch_size(batch); + rte_spinlock_unlock(&dev->common.stats_lock); dp_packet_delete_batch(batch, true); return 0; } if (OVS_UNLIKELY(concurrent_txq)) { - qid = qid % dev->up.n_txq; - rte_spinlock_lock(&dev->tx_q[qid].tx_lock); + qid = qid % dev->common.up.n_txq; + rte_spinlock_lock(&dev->common.tx_q[qid].tx_lock); } cnt = netdev_dpdk_common_send(netdev, batch, &stats); @@ -3411,19 +3288,19 @@ netdev_dpdk_eth_send(struct netdev *netdev, int qid, stats.tx_failure_drops += dropped; dropped += batch_cnt - cnt; if (OVS_UNLIKELY(dropped)) { - struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats; + struct netdev_dpdk_sw_stats *sw_stats = 
dev->common.sw_stats; - rte_spinlock_lock(&dev->stats_lock); - dev->stats.tx_dropped += dropped; + rte_spinlock_lock(&dev->common.stats_lock); + dev->common.stats.tx_dropped += dropped; sw_stats->tx_failure_drops += stats.tx_failure_drops; sw_stats->tx_mtu_exceeded_drops += stats.tx_mtu_exceeded_drops; sw_stats->tx_qos_drops += stats.tx_qos_drops; sw_stats->tx_invalid_hwol_drops += stats.tx_invalid_hwol_drops; - rte_spinlock_unlock(&dev->stats_lock); + rte_spinlock_unlock(&dev->common.stats_lock); } if (OVS_UNLIKELY(concurrent_txq)) { - rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); + rte_spinlock_unlock(&dev->common.tx_q[qid].tx_lock); } return 0; @@ -3431,7 +3308,7 @@ netdev_dpdk_eth_send(struct netdev *netdev, int qid, static int netdev_dpdk_set_etheraddr__(struct netdev_dpdk *dev, const struct eth_addr mac) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { int err = 0; @@ -3439,13 +3316,13 @@ netdev_dpdk_set_etheraddr__(struct netdev_dpdk *dev, const struct eth_addr mac) struct rte_ether_addr ea; memcpy(ea.addr_bytes, mac.ea, ETH_ADDR_LEN); - err = -rte_eth_dev_default_mac_addr_set(dev->port_id, &ea); + err = -rte_eth_dev_default_mac_addr_set(dev->common.port_id, &ea); } if (!err) { - dev->hwaddr = mac; + dev->common.hwaddr = mac; } else { VLOG_WARN("%s: Failed to set requested mac("ETH_ADDR_FMT"): %s", - netdev_get_name(&dev->up), ETH_ADDR_ARGS(mac), + netdev_get_name(&dev->common.up), ETH_ADDR_ARGS(mac), rte_strerror(err)); } @@ -3458,14 +3335,14 @@ netdev_dpdk_set_etheraddr(struct netdev *netdev, const struct eth_addr mac) struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int err = 0; - ovs_mutex_lock(&dev->mutex); - if (!eth_addr_equals(dev->hwaddr, mac)) { + ovs_mutex_lock(&dev->common.mutex); + if (!eth_addr_equals(dev->common.hwaddr, mac)) { err = netdev_dpdk_set_etheraddr__(dev, mac); if (!err) { netdev_change_seq_changed(netdev); } } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return err; } @@ -3475,9 +3352,9 
@@ netdev_dpdk_get_etheraddr(const struct netdev *netdev, struct eth_addr *mac) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); - *mac = dev->hwaddr; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + *mac = dev->common.hwaddr; + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -3487,9 +3364,9 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); - *mtup = dev->mtu; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + *mtup = dev->common.mtu; + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -3512,16 +3389,16 @@ netdev_dpdk_set_mtu(struct netdev *netdev, int mtu) */ if (MTU_TO_MAX_FRAME_LEN(mtu) > NETDEV_DPDK_MAX_PKT_LEN || mtu < RTE_ETHER_MIN_MTU) { - VLOG_WARN("%s: unsupported MTU %d\n", dev->up.name, mtu); + VLOG_WARN("%s: unsupported MTU %d\n", dev->common.up.name, mtu); return EINVAL; } - ovs_mutex_lock(&dev->mutex); - if (dev->requested_mtu != mtu) { - dev->requested_mtu = mtu; + ovs_mutex_lock(&dev->common.mutex); + if (dev->common.requested_mtu != mtu) { + dev->common.requested_mtu = mtu; netdev_request_reconfigure(netdev); } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -3538,7 +3415,7 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev, int qid; int vid; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); if (!is_vhost_running(dev)) { err = EPROTO; @@ -3580,11 +3457,11 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev, VHOST_RXQ_STAT(rx_1024_to_1522_packets, "size_1024_1518_packets") \ VHOST_RXQ_STAT(rx_1523_to_max_packets, "size_1519_max_packets") -#define VHOST_RXQ_STAT(MEMBER, NAME) dev->stats.MEMBER = 0; +#define VHOST_RXQ_STAT(MEMBER, NAME) dev->common.stats.MEMBER = 0; VHOST_RXQ_STATS; #undef VHOST_RXQ_STAT - for (int q = 0; q < dev->up.n_rxq; q++) { + for (int q = 0; q < dev->common.up.n_rxq; 
q++) { qid = q * VIRTIO_QNUM + VIRTIO_TXQ; err = rte_vhost_vring_stats_get(vid, qid, vhost_stats, @@ -3597,7 +3474,7 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev, for (int i = 0; i < vhost_stats_count; i++) { #define VHOST_RXQ_STAT(MEMBER, NAME) \ if (string_ends_with(vhost_stats_names[i].name, NAME)) { \ - dev->stats.MEMBER += vhost_stats[i].value; \ + dev->common.stats.MEMBER += vhost_stats[i].value; \ continue; \ } VHOST_RXQ_STATS; @@ -3609,11 +3486,12 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev, * Since vhost only reports good packets and has no error counter, * rx_undersized_errors is highjacked (see above) to retrieve * "undersize_packets". */ - dev->stats.rx_1_to_64_packets += dev->stats.rx_undersized_errors; - memset(&dev->stats.rx_undersized_errors, 0xff, - sizeof dev->stats.rx_undersized_errors); + dev->common.stats.rx_1_to_64_packets += + dev->common.stats.rx_undersized_errors; + memset(&dev->common.stats.rx_undersized_errors, 0xff, + sizeof dev->common.stats.rx_undersized_errors); -#define VHOST_RXQ_STAT(MEMBER, NAME) stats->MEMBER = dev->stats.MEMBER; +#define VHOST_RXQ_STAT(MEMBER, NAME) stats->MEMBER = dev->common.stats.MEMBER; VHOST_RXQ_STATS; #undef VHOST_RXQ_STAT @@ -3655,11 +3533,11 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev, VHOST_TXQ_STAT(tx_1024_to_1522_packets, "size_1024_1518_packets") \ VHOST_TXQ_STAT(tx_1523_to_max_packets, "size_1519_max_packets") -#define VHOST_TXQ_STAT(MEMBER, NAME) dev->stats.MEMBER = 0; +#define VHOST_TXQ_STAT(MEMBER, NAME) dev->common.stats.MEMBER = 0; VHOST_TXQ_STATS; #undef VHOST_TXQ_STAT - for (int q = 0; q < dev->up.n_txq; q++) { + for (int q = 0; q < dev->common.up.n_txq; q++) { qid = q * VIRTIO_QNUM; err = rte_vhost_vring_stats_get(vid, qid, vhost_stats, @@ -3672,7 +3550,7 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev, for (int i = 0; i < vhost_stats_count; i++) { #define VHOST_TXQ_STAT(MEMBER, NAME) \ if (string_ends_with(vhost_stats_names[i].name, 
NAME)) { \ - dev->stats.MEMBER += vhost_stats[i].value; \ + dev->common.stats.MEMBER += vhost_stats[i].value; \ continue; \ } VHOST_TXQ_STATS; @@ -3682,23 +3560,24 @@ netdev_dpdk_vhost_get_stats(const struct netdev *netdev, /* OVS reports 64 bytes and smaller packets into "tx_1_to_64_packets". * Same as for rx, rx_undersized_errors is highjacked. */ - dev->stats.tx_1_to_64_packets += dev->stats.rx_undersized_errors; - memset(&dev->stats.rx_undersized_errors, 0xff, - sizeof dev->stats.rx_undersized_errors); + dev->common.stats.tx_1_to_64_packets += + dev->common.stats.rx_undersized_errors; + memset(&dev->common.stats.rx_undersized_errors, 0xff, + sizeof dev->common.stats.rx_undersized_errors); -#define VHOST_TXQ_STAT(MEMBER, NAME) stats->MEMBER = dev->stats.MEMBER; +#define VHOST_TXQ_STAT(MEMBER, NAME) stats->MEMBER = dev->common.stats.MEMBER; VHOST_TXQ_STATS; #undef VHOST_TXQ_STAT - rte_spinlock_lock(&dev->stats_lock); - stats->rx_dropped = dev->stats.rx_dropped; - stats->tx_dropped = dev->stats.tx_dropped; - rte_spinlock_unlock(&dev->stats_lock); + rte_spinlock_lock(&dev->common.stats_lock); + stats->rx_dropped = dev->common.stats.rx_dropped; + stats->tx_dropped = dev->common.stats.tx_dropped; + rte_spinlock_unlock(&dev->common.stats_lock); err = 0; out: - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); free(vhost_stats); free(vhost_stats_names); @@ -3723,7 +3602,7 @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev, netdev_dpdk_get_sw_custom_stats(netdev, custom_stats); stat_offset = custom_stats->size; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); if (!is_vhost_running(dev)) { goto out; @@ -3745,8 +3624,8 @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev, } vhost_txq_stats_count = err; - stat_offset += dev->up.n_rxq * vhost_rxq_stats_count; - stat_offset += dev->up.n_txq * vhost_txq_stats_count; + stat_offset += dev->common.up.n_rxq * vhost_rxq_stats_count; + stat_offset += 
dev->common.up.n_txq * vhost_txq_stats_count; custom_stats->counters = xrealloc(custom_stats->counters, stat_offset * sizeof *custom_stats->counters); @@ -3756,7 +3635,7 @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev, sizeof *vhost_stats_names); vhost_stats = xcalloc(vhost_rxq_stats_count, sizeof *vhost_stats); - for (int q = 0; q < dev->up.n_rxq; q++) { + for (int q = 0; q < dev->common.up.n_rxq; q++) { qid = q * VIRTIO_QNUM + VIRTIO_TXQ; err = rte_vhost_vring_stats_get_names(vid, qid, vhost_stats_names, @@ -3790,7 +3669,7 @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev, sizeof *vhost_stats_names); vhost_stats = xcalloc(vhost_txq_stats_count, sizeof *vhost_stats); - for (int q = 0; q < dev->up.n_txq; q++) { + for (int q = 0; q < dev->common.up.n_txq; q++) { qid = q * VIRTIO_QNUM; err = rte_vhost_vring_stats_get_names(vid, qid, vhost_stats_names, @@ -3816,7 +3695,7 @@ netdev_dpdk_vhost_get_custom_stats(const struct netdev *netdev, } out: - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); custom_stats->size = stat_offset; free(vhost_stats_names); @@ -3879,24 +3758,24 @@ netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) bool gg; netdev_dpdk_get_carrier(netdev, &gg); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); struct rte_eth_xstat *rte_xstats = NULL; struct rte_eth_xstat_name *rte_xstats_names = NULL; int rte_xstats_len, rte_xstats_new_len, rte_xstats_ret; - if (rte_eth_stats_get(dev->port_id, &rte_stats)) { + if (rte_eth_stats_get(dev->common.port_id, &rte_stats)) { VLOG_ERR("Can't get ETH statistics for port: "DPDK_PORT_ID_FMT, - dev->port_id); - ovs_mutex_unlock(&dev->mutex); + dev->common.port_id); + ovs_mutex_unlock(&dev->common.mutex); return EPROTO; } /* Get length of statistics */ - rte_xstats_len = rte_eth_xstats_get_names(dev->port_id, NULL, 0); + rte_xstats_len = rte_eth_xstats_get_names(dev->common.port_id, NULL, 0); if (rte_xstats_len < 0) { 
VLOG_WARN("Cannot get XSTATS values for port: "DPDK_PORT_ID_FMT, - dev->port_id); + dev->common.port_id); goto out; } /* Reserve memory for xstats names and values */ @@ -3904,24 +3783,24 @@ netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) rte_xstats = xcalloc(rte_xstats_len, sizeof *rte_xstats); /* Retreive xstats names */ - rte_xstats_new_len = rte_eth_xstats_get_names(dev->port_id, + rte_xstats_new_len = rte_eth_xstats_get_names(dev->common.port_id, rte_xstats_names, rte_xstats_len); if (rte_xstats_new_len != rte_xstats_len) { VLOG_WARN("Cannot get XSTATS names for port: "DPDK_PORT_ID_FMT, - dev->port_id); + dev->common.port_id); goto out; } /* Retreive xstats values */ memset(rte_xstats, 0xff, sizeof *rte_xstats * rte_xstats_len); - rte_xstats_ret = rte_eth_xstats_get(dev->port_id, rte_xstats, + rte_xstats_ret = rte_eth_xstats_get(dev->common.port_id, rte_xstats, rte_xstats_len); if (rte_xstats_ret > 0 && rte_xstats_ret <= rte_xstats_len) { netdev_dpdk_convert_xstats(stats, rte_xstats, rte_xstats_names, rte_xstats_len); } else { VLOG_WARN("Cannot get XSTATS values for port: "DPDK_PORT_ID_FMT, - dev->port_id); + dev->common.port_id); } out: @@ -3935,17 +3814,17 @@ out: stats->rx_errors = rte_stats.ierrors; stats->tx_errors = rte_stats.oerrors; - rte_spinlock_lock(&dev->stats_lock); - stats->tx_dropped = dev->stats.tx_dropped; - stats->rx_dropped = dev->stats.rx_dropped; - rte_spinlock_unlock(&dev->stats_lock); + rte_spinlock_lock(&dev->common.stats_lock); + stats->tx_dropped = dev->common.stats.tx_dropped; + stats->rx_dropped = dev->common.stats.rx_dropped; + rte_spinlock_unlock(&dev->common.stats_lock); /* These are the available DPDK counters for packets not received due to * local resource constraints in DPDK and NIC respectively. 
*/ stats->rx_dropped += rte_stats.rx_nombuf + rte_stats.imissed; stats->rx_missed_errors = rte_stats.imissed; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -3961,18 +3840,20 @@ netdev_dpdk_get_custom_stats(const struct netdev *netdev, netdev_dpdk_get_sw_custom_stats(netdev, custom_stats); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); - if (dev->rte_xstats_ids_size > 0) { - uint64_t *values = xcalloc(dev->rte_xstats_ids_size, + if (dev->common.rte_xstats_ids_size > 0) { + uint64_t *values = xcalloc(dev->common.rte_xstats_ids_size, sizeof(uint64_t)); rte_xstats_ret = - rte_eth_xstats_get_by_id(dev->port_id, dev->rte_xstats_ids, - values, dev->rte_xstats_ids_size); + rte_eth_xstats_get_by_id(dev->common.port_id, + dev->common.rte_xstats_ids, + values, + dev->common.rte_xstats_ids_size); if (rte_xstats_ret > 0 && - rte_xstats_ret <= dev->rte_xstats_ids_size) { + rte_xstats_ret <= dev->common.rte_xstats_ids_size) { sw_stats_size = custom_stats->size; custom_stats->size += rte_xstats_ret; @@ -3982,20 +3863,20 @@ netdev_dpdk_get_custom_stats(const struct netdev *netdev, for (i = 0; i < rte_xstats_ret; i++) { ovs_strlcpy(custom_stats->counters[sw_stats_size + i].name, - netdev_dpdk_get_xstat_name(dev, - dev->rte_xstats_ids[i]), + netdev_dpdk_get_xstat_name( + dev, dev->common.rte_xstats_ids[i]), NETDEV_CUSTOM_STATS_NAME_SIZE); custom_stats->counters[sw_stats_size + i].value = values[i]; } } else { VLOG_WARN("Cannot get XSTATS values for port: "DPDK_PORT_ID_FMT, - dev->port_id); + dev->common.port_id); } free(values); } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -4021,17 +3902,17 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev, custom_stats->counters = xcalloc(custom_stats->size, sizeof *custom_stats->counters); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); - rte_spinlock_lock(&dev->stats_lock); + 
rte_spinlock_lock(&dev->common.stats_lock); i = 0; #define SW_CSTAT(NAME) \ - custom_stats->counters[i++].value = dev->sw_stats->NAME; + custom_stats->counters[i++].value = dev->common.sw_stats->NAME; SW_CSTATS; #undef SW_CSTAT - rte_spinlock_unlock(&dev->stats_lock); + rte_spinlock_unlock(&dev->common.stats_lock); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); i = 0; n = 0; @@ -4061,9 +3942,9 @@ netdev_dpdk_get_features(const struct netdev *netdev, struct rte_eth_link link; uint32_t feature = 0; - ovs_mutex_lock(&dev->mutex); - link = dev->link; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + link = dev->common.link; + ovs_mutex_unlock(&dev->common.mutex); /* Match against OpenFlow defined link speed values. */ if (link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX) { @@ -4124,10 +4005,10 @@ netdev_dpdk_get_speed(const struct netdev *netdev, uint32_t *current, struct rte_eth_link link; int diag; - ovs_mutex_lock(&dev->mutex); - link = dev->link; - diag = rte_eth_dev_info_get(dev->port_id, &dev_info); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + link = dev->common.link; + diag = rte_eth_dev_info_get(dev->common.port_id, &dev_info); + ovs_mutex_unlock(&dev->common.mutex); *current = link.link_speed != RTE_ETH_SPEED_NUM_UNKNOWN ? 
link.link_speed : 0; @@ -4179,13 +4060,14 @@ netdev_dpdk_get_duplex(const struct netdev *netdev, bool *full_duplex) struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int err = 0; - ovs_mutex_lock(&dev->mutex); - if (dev->link.link_speed != RTE_ETH_SPEED_NUM_UNKNOWN) { - *full_duplex = dev->link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX; + ovs_mutex_lock(&dev->common.mutex); + if (dev->common.link.link_speed != RTE_ETH_SPEED_NUM_UNKNOWN) { + *full_duplex = dev->common.link.link_duplex == + RTE_ETH_LINK_FULL_DUPLEX; } else { err = EOPNOTSUPP; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return err; } @@ -4240,7 +4122,7 @@ netdev_dpdk_set_policing(struct netdev* netdev, uint32_t policer_rate, : !policer_burst ? 8000 : policer_burst); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); policer = ovsrcu_get_protected(struct ingress_policer *, &dev->ingress_policer); @@ -4248,7 +4130,7 @@ netdev_dpdk_set_policing(struct netdev* netdev, uint32_t policer_rate, if (dev->policer_rate == policer_rate && dev->policer_burst == policer_burst) { /* Assume that settings haven't changed since we last set them. */ - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -4265,7 +4147,7 @@ netdev_dpdk_set_policing(struct netdev* netdev, uint32_t policer_rate, ovsrcu_set(&dev->ingress_policer, policer); dev->policer_rate = policer_rate; dev->policer_burst = policer_burst; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -4275,12 +4157,12 @@ netdev_dpdk_get_ifindex(const struct netdev *netdev) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); /* Calculate hash from the netdev name. Ensure that ifindex is a 24-bit * postive integer to meet RFC 2863 recommendations. 
*/ int ifindex = hash_string(netdev->name, 0) % 0xfffffe + 1; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return ifindex; } @@ -4290,11 +4172,11 @@ netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); check_link_status(dev); - *carrier = dev->link.link_status; + *carrier = dev->common.link.link_status; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -4304,7 +4186,7 @@ netdev_dpdk_vhost_get_carrier(const struct netdev *netdev, bool *carrier) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); if (is_vhost_running(dev)) { *carrier = 1; @@ -4312,7 +4194,7 @@ netdev_dpdk_vhost_get_carrier(const struct netdev *netdev, bool *carrier) *carrier = 0; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -4323,9 +4205,9 @@ netdev_dpdk_get_carrier_resets(const struct netdev *netdev) struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); long long int carrier_resets; - ovs_mutex_lock(&dev->mutex); - carrier_resets = dev->link_reset_cnt; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + carrier_resets = dev->common.link_reset_cnt; + ovs_mutex_unlock(&dev->common.mutex); return carrier_resets; } @@ -4341,46 +4223,46 @@ static int netdev_dpdk_update_flags__(struct netdev_dpdk *dev, enum netdev_flags off, enum netdev_flags on, enum netdev_flags *old_flagsp) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) { return EINVAL; } - *old_flagsp = dev->flags; - dev->flags |= on; - dev->flags &= ~off; + *old_flagsp = dev->common.flags; + dev->common.flags |= on; + dev->common.flags &= ~off; - if (dev->flags == *old_flagsp) { + if (dev->common.flags == *old_flagsp) { return 0; } if (dev->type == 
DPDK_DEV_ETH) { - if ((dev->flags ^ *old_flagsp) & NETDEV_UP) { + if ((dev->common.flags ^ *old_flagsp) & NETDEV_UP) { int err; - if (dev->flags & NETDEV_UP) { - err = rte_eth_dev_set_link_up(dev->port_id); + if (dev->common.flags & NETDEV_UP) { + err = rte_eth_dev_set_link_up(dev->common.port_id); } else { - err = rte_eth_dev_set_link_down(dev->port_id); + err = rte_eth_dev_set_link_down(dev->common.port_id); } if (err == -ENOTSUP) { VLOG_INFO("Interface %s does not support link state " - "configuration", netdev_get_name(&dev->up)); + "configuration", netdev_get_name(&dev->common.up)); } else if (err < 0) { VLOG_ERR("Interface %s link change error: %s", - netdev_get_name(&dev->up), rte_strerror(-err)); - dev->flags = *old_flagsp; + netdev_get_name(&dev->common.up), rte_strerror(-err)); + dev->common.flags = *old_flagsp; return -err; } } - if (dev->flags & NETDEV_PROMISC) { - rte_eth_promiscuous_enable(dev->port_id); + if (dev->common.flags & NETDEV_PROMISC) { + rte_eth_promiscuous_enable(dev->common.port_id); } - netdev_change_seq_changed(&dev->up); + netdev_change_seq_changed(&dev->common.up); } else { /* If DPDK_DEV_VHOST device's NETDEV_UP flag was changed and vhost is * running then change netdev's change_seq to trigger link state @@ -4388,14 +4270,15 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, if ((NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off))) && is_vhost_running(dev)) { - netdev_change_seq_changed(&dev->up); + netdev_change_seq_changed(&dev->common.up); /* Clear statistics if device is getting up. 
*/ if (NETDEV_UP & on) { - rte_spinlock_lock(&dev->stats_lock); - memset(&dev->stats, 0, sizeof dev->stats); - memset(dev->sw_stats, 0, sizeof *dev->sw_stats); - rte_spinlock_unlock(&dev->stats_lock); + rte_spinlock_lock(&dev->common.stats_lock); + memset(&dev->common.stats, 0, sizeof dev->common.stats); + memset(dev->common.sw_stats, 0, + sizeof *dev->common.sw_stats); + rte_spinlock_unlock(&dev->common.stats_lock); } } } @@ -4411,9 +4294,9 @@ netdev_dpdk_update_flags(struct netdev *netdev, struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int error; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); error = netdev_dpdk_update_flags__(dev, off, on, old_flagsp); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return error; } @@ -4424,7 +4307,7 @@ netdev_dpdk_vhost_user_get_status(const struct netdev *netdev, { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); bool client_mode = dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT; smap_add_format(args, "mode", "%s", client_mode ? 
"client" : "server"); @@ -4432,7 +4315,7 @@ netdev_dpdk_vhost_user_get_status(const struct netdev *netdev, int vid = netdev_dpdk_get_vid(dev); if (vid < 0) { smap_add_format(args, "status", "disconnected"); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } else { smap_add_format(args, "status", "connected"); @@ -4480,7 +4363,7 @@ netdev_dpdk_vhost_user_get_status(const struct netdev *netdev, smap_add_format(args, "n_rxq", "%d", netdev->n_rxq); smap_add_format(args, "n_txq", "%d", netdev->n_txq); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return 0; } @@ -4519,28 +4402,28 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) int n_rxq; int diag; - if (!rte_eth_dev_is_valid_port(dev->port_id)) { + if (!rte_eth_dev_is_valid_port(dev->common.port_id)) { return ENODEV; } ovs_mutex_lock(&dpdk_mutex); - ovs_mutex_lock(&dev->mutex); - diag = rte_eth_dev_info_get(dev->port_id, &dev_info); - link_speed = dev->link.link_speed; + ovs_mutex_lock(&dev->common.mutex); + diag = rte_eth_dev_info_get(dev->common.port_id, &dev_info); + link_speed = dev->common.link.link_speed; rx_steer_flags = dev->rx_steer_flags; rx_steer_flows_num = dev->rx_steer_flows_num; n_rxq = netdev->n_rxq; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); ovs_mutex_unlock(&dpdk_mutex); - smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->port_id); + smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->common.port_id); smap_add_format(args, "numa_id", "%d", - rte_eth_dev_socket_id(dev->port_id)); + rte_eth_dev_socket_id(dev->common.port_id)); if (!diag) { smap_add_format(args, "driver_name", "%s", dev_info.driver_name); smap_add_format(args, "min_rx_bufsize", "%u", dev_info.min_rx_bufsize); } - smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len); + smap_add_format(args, "max_rx_pktlen", "%u", dev->common.max_packet_len); if (!diag) { smap_add_format(args, "max_rx_queues", "%u", 
dev_info.max_rx_queues); smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues); @@ -4555,7 +4438,7 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) smap_add_format(args, "n_txq", "%d", netdev->n_txq); smap_add(args, "rx_csum_offload", - dev->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD + dev->common.hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD ? "true" : "false"); /* Querying the DPDK library for iftype may be done in future, pending @@ -4581,9 +4464,9 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) smap_add(args, "link_speed", netdev_dpdk_link_speed_to_str__(link_speed)); - if (dev->is_representor) { + if (dev->common.is_representor) { smap_add_format(args, "dpdk-vf-mac", ETH_ADDR_FMT, - ETH_ADDR_ARGS(dev->hwaddr)); + ETH_ADDR_ARGS(dev->common.hwaddr)); } if (rx_steer_flags && !rx_steer_flows_num) { @@ -4609,7 +4492,7 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) static void netdev_dpdk_set_admin_state__(struct netdev_dpdk *dev, bool admin_state) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { enum netdev_flags old_flags; @@ -4641,9 +4524,9 @@ netdev_dpdk_set_admin_state(struct unixctl_conn *conn, int argc, if (netdev && is_dpdk_class(netdev->netdev_class)) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); netdev_dpdk_set_admin_state__(dev, up); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); netdev_close(netdev); } else { @@ -4655,10 +4538,10 @@ netdev_dpdk_set_admin_state(struct unixctl_conn *conn, int argc, struct netdev_dpdk *dev; ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - ovs_mutex_lock(&dev->mutex); + LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { + ovs_mutex_lock(&dev->common.mutex); netdev_dpdk_set_admin_state__(dev, up); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); } 
ovs_mutex_unlock(&dpdk_mutex); } @@ -4692,13 +4575,13 @@ netdev_dpdk_detach(struct unixctl_conn *conn, int argc OVS_UNUSED, RTE_ETH_FOREACH_DEV_SIBLING (sibling_port_id, port_id) { struct netdev_dpdk *dev; - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - if (dev->port_id != sibling_port_id) { + LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { + if (dev->common.port_id != sibling_port_id) { continue; } used = true; ds_put_format(&used_interfaces, " %s", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); break; } } @@ -4762,20 +4645,20 @@ netdev_dpdk_get_mempool_info(struct unixctl_conn *conn, if (netdev) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); ovs_mutex_lock(&dpdk_mp_mutex); - if (dev->dpdk_mp) { - rte_mempool_dump(stream, dev->dpdk_mp->mp); + if (dev->common.dpdk_mp) { + rte_mempool_dump(stream, dev->common.dpdk_mp->mp); fprintf(stream, " count: avail (%u), in use (%u)\n", - rte_mempool_avail_count(dev->dpdk_mp->mp), - rte_mempool_in_use_count(dev->dpdk_mp->mp)); + rte_mempool_avail_count(dev->common.dpdk_mp->mp), + rte_mempool_in_use_count(dev->common.dpdk_mp->mp)); } else { error = "Not allocated"; } ovs_mutex_unlock(&dpdk_mp_mutex); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); } else { ovs_mutex_lock(&dpdk_mp_mutex); rte_mempool_list_dump(stream); @@ -4813,16 +4696,16 @@ set_irq_status(int vid) */ static void netdev_dpdk_remap_txqs(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { int *enabled_queues, n_enabled = 0; - int i, k, total_txqs = dev->up.n_txq; + int i, k, total_txqs = dev->common.up.n_txq; enabled_queues = xcalloc(total_txqs, sizeof *enabled_queues); for (i = 0; i < total_txqs; i++) { /* Enabled queues always mapped to themselves. 
*/ - if (dev->tx_q[i].map == i) { + if (dev->common.tx_q[i].map == i) { enabled_queues[n_enabled++] = i; } } @@ -4834,8 +4717,8 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev) k = 0; for (i = 0; i < total_txqs; i++) { - if (dev->tx_q[i].map != i) { - dev->tx_q[i].map = enabled_queues[k]; + if (dev->common.tx_q[i].map != i) { + dev->common.tx_q[i].map = enabled_queues[k]; k = (k + 1) % n_enabled; } } @@ -4844,9 +4727,10 @@ netdev_dpdk_remap_txqs(struct netdev_dpdk *dev) struct ds mapping = DS_EMPTY_INITIALIZER; ds_put_format(&mapping, "TX queue mapping for port '%s':\n", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); for (i = 0; i < total_txqs; i++) { - ds_put_format(&mapping, "%2d --> %2d\n", i, dev->tx_q[i].map); + ds_put_format(&mapping, "%2d --> %2d\n", + i, dev->common.tx_q[i].map); } VLOG_DBG("%s", ds_cstr(&mapping)); @@ -4870,9 +4754,10 @@ new_device(int vid) rte_vhost_get_ifname(vid, ifname, sizeof ifname); ovs_mutex_lock(&dpdk_mutex); + /* Add device to the vhost port with the same name as that passed down. 
*/ - LIST_FOR_EACH(dev, list_node, &dpdk_list) { - ovs_mutex_lock(&dev->mutex); + LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { + ovs_mutex_lock(&dev->common.mutex); if (nullable_string_is_equal(ifname, dev->vhost_id)) { uint32_t qp_num = rte_vhost_get_vring_num(vid) / VIRTIO_QNUM; uint64_t features; @@ -4884,19 +4769,19 @@ new_device(int vid) VLOG_INFO("Error getting NUMA info for vHost Device '%s'", ifname); #endif - newnode = dev->socket_id; + newnode = dev->common.socket_id; } dev->virtio_features_state |= OVS_VIRTIO_F_NEGOTIATED; - if (dev->requested_n_txq < qp_num - || dev->requested_n_rxq < qp_num - || dev->requested_socket_id != newnode - || dev->dpdk_mp == NULL) { - dev->requested_socket_id = newnode; - dev->requested_n_rxq = qp_num; - dev->requested_n_txq = qp_num; - netdev_request_reconfigure(&dev->up); + if (dev->common.requested_n_txq < qp_num + || dev->common.requested_n_rxq < qp_num + || dev->common.requested_socket_id != newnode + || dev->common.dpdk_mp == NULL) { + dev->common.requested_socket_id = newnode; + dev->common.requested_n_rxq = qp_num; + dev->common.requested_n_txq = qp_num; + netdev_request_reconfigure(&dev->common.up); } else { /* Reconfiguration not required. */ dev->vhost_reconfigured = true; @@ -4907,13 +4792,13 @@ new_device(int vid) "vHost Device '%s'", dev->vhost_id); } else { if (features & (1ULL << VIRTIO_NET_F_GUEST_CSUM)) { - dev->hw_ol_features |= NETDEV_TX_TCP_CKSUM_OFFLOAD; - dev->hw_ol_features |= NETDEV_TX_UDP_CKSUM_OFFLOAD; - dev->hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_TCP_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_UDP_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD; /* There is no support in virtio net to offload IPv4 csum, * but the vhost library handles IPv4 csum offloading. 
*/ - dev->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; } if (userspace_tso_enabled() @@ -4922,12 +4807,12 @@ new_device(int vid) if (features & (1ULL << VIRTIO_NET_F_GUEST_TSO4) && features & (1ULL << VIRTIO_NET_F_GUEST_TSO6)) { - dev->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; + dev->common.hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; VLOG_DBG("%s: TSO enabled on vhost port", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); } else { VLOG_WARN("%s: Tx TSO offload is not supported.", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); } } } @@ -4939,11 +4824,11 @@ new_device(int vid) /* Disable notifications. */ set_irq_status(vid); - netdev_change_seq_changed(&dev->up); - ovs_mutex_unlock(&dev->mutex); + netdev_change_seq_changed(&dev->common.up); + ovs_mutex_unlock(&dev->common.mutex); break; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); } ovs_mutex_unlock(&dpdk_mutex); @@ -4962,12 +4847,12 @@ new_device(int vid) /* Clears mapping for all available queues of vhost interface. 
*/ static void netdev_dpdk_txq_map_clear(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { int i; - for (i = 0; i < dev->up.n_txq; i++) { - dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN; + for (i = 0; i < dev->common.up.n_txq; i++) { + dev->common.tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN; } } @@ -4987,22 +4872,22 @@ destroy_device(int vid) rte_vhost_get_ifname(vid, ifname, sizeof ifname); ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, list_node, &dpdk_list) { + LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { if (netdev_dpdk_get_vid(dev) == vid) { - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); dev->vhost_reconfigured = false; ovsrcu_index_set(&dev->vid, -1); memset(dev->vhost_rxq_enabled, 0, - dev->up.n_rxq * sizeof *dev->vhost_rxq_enabled); + dev->common.up.n_rxq * sizeof *dev->vhost_rxq_enabled); netdev_dpdk_txq_map_clear(dev); /* Clear offload capabilities before next new_device. */ - dev->hw_ol_features = 0; + dev->common.hw_ol_features = 0; netdev_dpdk_update_netdev_flags(dev); - netdev_change_seq_changed(&dev->up); - ovs_mutex_unlock(&dev->mutex); + netdev_change_seq_changed(&dev->common.up); + ovs_mutex_unlock(&dev->common.mutex); exists = true; break; } @@ -5047,29 +4932,29 @@ vring_state_changed__(struct vhost_state_change *sc) bool is_rx = (sc->queue_id % VIRTIO_QNUM) == VIRTIO_TXQ; ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - ovs_mutex_lock(&dev->mutex); + LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { + ovs_mutex_lock(&dev->common.mutex); if (nullable_string_is_equal(sc->ifname, dev->vhost_id)) { if (is_rx) { bool old_state = dev->vhost_rxq_enabled[qid]; dev->vhost_rxq_enabled[qid] = sc->enable != 0; if (old_state != dev->vhost_rxq_enabled[qid]) { - netdev_change_seq_changed(&dev->up); + netdev_change_seq_changed(&dev->common.up); } } else { if (sc->enable) { - dev->tx_q[qid].map = qid; + dev->common.tx_q[qid].map = qid; } else { - 
dev->tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED; + dev->common.tx_q[qid].map = OVS_VHOST_QUEUE_DISABLED; } netdev_dpdk_remap_txqs(dev); } exists = true; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); break; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); } ovs_mutex_unlock(&dpdk_mutex); @@ -5153,8 +5038,8 @@ destroy_connection(int vid) rte_vhost_get_ifname(vid, ifname, sizeof ifname); ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, list_node, &dpdk_list) { - ovs_mutex_lock(&dev->mutex); + LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { + ovs_mutex_lock(&dev->common.mutex); if (nullable_string_is_equal(ifname, dev->vhost_id)) { uint32_t qp_num = NR_QUEUE; @@ -5164,11 +5049,11 @@ destroy_connection(int vid) } /* Restore the number of queue pairs to default. */ - if (dev->requested_n_txq != qp_num - || dev->requested_n_rxq != qp_num) { - dev->requested_n_rxq = qp_num; - dev->requested_n_txq = qp_num; - netdev_request_reconfigure(&dev->up); + if (dev->common.requested_n_txq != qp_num + || dev->common.requested_n_rxq != qp_num) { + dev->common.requested_n_rxq = qp_num; + dev->common.requested_n_txq = qp_num; + netdev_request_reconfigure(&dev->common.up); } if (!(dev->virtio_features_state & OVS_VIRTIO_F_NEGOTIATED)) { @@ -5205,15 +5090,15 @@ destroy_connection(int vid) } if (!(dev->virtio_features_state & OVS_VIRTIO_F_NEGOTIATED)) { dev->virtio_features_state |= OVS_VIRTIO_F_RECONF_PENDING; - netdev_request_reconfigure(&dev->up); + netdev_request_reconfigure(&dev->common.up); } } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); exists = true; break; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); } ovs_mutex_unlock(&dpdk_mutex); @@ -5355,7 +5240,7 @@ netdev_dpdk_get_qos(const struct netdev *netdev, struct qos_conf *qos_conf; int error = 0; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); qos_conf = ovsrcu_get_protected(struct qos_conf *, 
&dev->qos_conf); if (qos_conf) { *typep = qos_conf->ops->qos_name; @@ -5365,7 +5250,7 @@ netdev_dpdk_get_qos(const struct netdev *netdev, /* No QoS configuration set, return an empty string */ *typep = ""; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return error; } @@ -5379,7 +5264,7 @@ netdev_dpdk_set_qos(struct netdev *netdev, const char *type, struct qos_conf *qos_conf, *new_qos_conf = NULL; int error = 0; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf); @@ -5409,7 +5294,7 @@ netdev_dpdk_set_qos(struct netdev *netdev, const char *type, } } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return error; } @@ -5422,7 +5307,7 @@ netdev_dpdk_get_queue(const struct netdev *netdev, uint32_t queue_id, struct qos_conf *qos_conf; int error = 0; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf); if (!qos_conf || !qos_conf->ops || !qos_conf->ops->qos_queue_get) { @@ -5431,7 +5316,7 @@ netdev_dpdk_get_queue(const struct netdev *netdev, uint32_t queue_id, error = qos_conf->ops->qos_queue_get(details, queue_id, qos_conf); } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return error; } @@ -5444,7 +5329,7 @@ netdev_dpdk_set_queue(struct netdev *netdev, uint32_t queue_id, struct qos_conf *qos_conf; int error = 0; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf); if (!qos_conf || !qos_conf->ops || !qos_conf->ops->qos_queue_construct) { @@ -5459,7 +5344,7 @@ netdev_dpdk_set_queue(struct netdev *netdev, uint32_t queue_id, queue_id, netdev_get_name(netdev), rte_strerror(error)); } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return error; } @@ -5471,7 +5356,7 @@ netdev_dpdk_delete_queue(struct netdev *netdev, uint32_t 
queue_id) struct qos_conf *qos_conf; int error = 0; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf); if (qos_conf && qos_conf->ops && qos_conf->ops->qos_queue_destruct) { @@ -5480,7 +5365,7 @@ netdev_dpdk_delete_queue(struct netdev *netdev, uint32_t queue_id) error = EOPNOTSUPP; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return error; } @@ -5493,7 +5378,7 @@ netdev_dpdk_get_queue_stats(const struct netdev *netdev, uint32_t queue_id, struct qos_conf *qos_conf; int error = 0; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf); if (qos_conf && qos_conf->ops && qos_conf->ops->qos_queue_get_stats) { @@ -5502,7 +5387,7 @@ netdev_dpdk_get_queue_stats(const struct netdev *netdev, uint32_t queue_id, error = EOPNOTSUPP; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return error; } @@ -5514,7 +5399,7 @@ netdev_dpdk_queue_dump_start(const struct netdev *netdev, void **statep) struct qos_conf *qos_conf; struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); qos_conf = ovsrcu_get_protected(struct qos_conf *, &dev->qos_conf); if (qos_conf && qos_conf->ops @@ -5527,7 +5412,7 @@ netdev_dpdk_queue_dump_start(const struct netdev *netdev, void **statep) error = EOPNOTSUPP; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return error; } @@ -5541,7 +5426,7 @@ netdev_dpdk_queue_dump_next(const struct netdev *netdev, void *state_, struct qos_conf *qos_conf; int error = EOF; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); while (state->cur_queue < state->n_queues) { uint32_t queue_id = state->queues[state->cur_queue++]; @@ -5554,7 +5439,7 @@ netdev_dpdk_queue_dump_next(const struct netdev *netdev, void *state_, } } - ovs_mutex_unlock(&dev->mutex); 
+ ovs_mutex_unlock(&dev->common.mutex); return error; } @@ -6007,7 +5892,7 @@ dpdk_rx_steer_add_flow(struct netdev_dpdk *dev, { .type = RTE_FLOW_ACTION_TYPE_QUEUE, .conf = &(const struct rte_flow_action_queue) { - .index = dev->up.n_rxq - 1, + .index = dev->common.up.n_rxq - 1, }, }, { .type = RTE_FLOW_ACTION_TYPE_END }, @@ -6018,19 +5903,20 @@ dpdk_rx_steer_add_flow(struct netdev_dpdk *dev, int err; set_error(&error, RTE_FLOW_ERROR_TYPE_NONE); - err = rte_flow_validate(dev->port_id, &attr, items, actions, &error); + err = rte_flow_validate(dev->common.port_id, &attr, + items, actions, &error); if (err) { VLOG_WARN("%s: rx-steering: device does not support %s flow: %s", - netdev_get_name(&dev->up), desc, + netdev_get_name(&dev->common.up), desc, error.message ? error.message : ""); goto out; } set_error(&error, RTE_FLOW_ERROR_TYPE_NONE); - flow = rte_flow_create(dev->port_id, &attr, items, actions, &error); + flow = rte_flow_create(dev->common.port_id, &attr, items, actions, &error); if (flow == NULL) { VLOG_WARN("%s: rx-steering: failed to add %s flow: %s", - netdev_get_name(&dev->up), desc, + netdev_get_name(&dev->common.up), desc, error.message ? 
error.message : ""); err = rte_errno; goto out; @@ -6042,7 +5928,8 @@ dpdk_rx_steer_add_flow(struct netdev_dpdk *dev, dev->rx_steer_flows_num = num; VLOG_INFO("%s: rx-steering: redirected %s traffic to rx queue %d", - netdev_get_name(&dev->up), desc, dev->up.n_rxq - 1); + netdev_get_name(&dev->common.up), desc, + dev->common.up.n_rxq - 1); out: return err; } @@ -6056,10 +5943,10 @@ dpdk_rx_steer_rss_configure(struct netdev_dpdk *dev, int rss_n_rxq) struct rte_eth_dev_info info; int err; - err = rte_eth_dev_info_get(dev->port_id, &info); + err = rte_eth_dev_info_get(dev->common.port_id, &info); if (err < 0) { VLOG_WARN("%s: failed to query RSS info: %s", - netdev_get_name(&dev->up), rte_strerror(-err)); + netdev_get_name(&dev->common.up), rte_strerror(-err)); goto error; } @@ -6101,10 +5988,11 @@ dpdk_rx_steer_rss_configure(struct netdev_dpdk *dev, int rss_n_rxq) reta_conf[idx].reta[shift] = i % rss_n_rxq; } - err = rte_eth_dev_rss_reta_update(dev->port_id, reta_conf, info.reta_size); + err = rte_eth_dev_rss_reta_update(dev->common.port_id, + reta_conf, info.reta_size); if (err < 0) { VLOG_WARN("%s: failed to configure RSS redirection table: err=%d", - netdev_get_name(&dev->up), err); + netdev_get_name(&dev->common.up), err); } error: @@ -6116,10 +6004,10 @@ dpdk_rx_steer_configure(struct netdev_dpdk *dev) { int err = 0; - if (dev->up.n_rxq < 2) { + if (dev->common.up.n_rxq < 2) { err = ENOTSUP; VLOG_WARN("%s: rx-steering: not enough available rx queues", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); goto out; } @@ -6144,16 +6032,17 @@ dpdk_rx_steer_configure(struct netdev_dpdk *dev) if (dev->rx_steer_flows_num) { /* Reconfigure RSS reta in all but the rx steering queue. 
*/ - err = dpdk_rx_steer_rss_configure(dev, dev->up.n_rxq - 1); + err = dpdk_rx_steer_rss_configure(dev, dev->common.up.n_rxq - 1); if (err) { goto out; } - if (dev->up.n_rxq == 2) { + if (dev->common.up.n_rxq == 2) { VLOG_INFO("%s: rx-steering: redirected other traffic to " - "rx queue 0", netdev_get_name(&dev->up)); + "rx queue 0", netdev_get_name(&dev->common.up)); } else { - VLOG_INFO("%s: rx-steering: applied rss on rx queues 0-%u", - netdev_get_name(&dev->up), dev->up.n_rxq - 2); + VLOG_INFO("%s: rx-steering: applied rss on rx queues" + " 0-%u", netdev_get_name(&dev->common.up), + dev->common.up.n_rxq - 2); } } @@ -6170,13 +6059,14 @@ dpdk_rx_steer_unconfigure(struct netdev_dpdk *dev) return; } - VLOG_DBG("%s: rx-steering: reset flows", netdev_get_name(&dev->up)); + VLOG_DBG("%s: rx-steering: reset flows", netdev_get_name(&dev->common.up)); for (int i = 0; i < dev->rx_steer_flows_num; i++) { set_error(&error, RTE_FLOW_ERROR_TYPE_NONE); - if (rte_flow_destroy(dev->port_id, dev->rx_steer_flows[i], &error)) { + if (rte_flow_destroy(dev->common.port_id, + dev->rx_steer_flows[i], &error)) { VLOG_WARN("%s: rx-steering: failed to destroy flow: %s", - netdev_get_name(&dev->up), + netdev_get_name(&dev->common.up), error.message ? 
error.message : ""); } } @@ -6198,27 +6088,28 @@ netdev_dpdk_reconfigure(struct netdev *netdev) bool try_rx_steer; int err = 0; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); try_rx_steer = dev->requested_rx_steer_flags != 0; - dev->requested_n_rxq = dev->user_n_rxq; + dev->common.requested_n_rxq = dev->common.user_n_rxq; if (try_rx_steer) { - dev->requested_n_rxq += 1; + dev->common.requested_n_rxq += 1; } - atomic_read_relaxed(&netdev_dpdk_pending_reset[dev->port_id], + atomic_read_relaxed(&netdev_dpdk_pending_reset[dev->common.port_id], &pending_reset); - if (netdev->n_txq == dev->requested_n_txq - && netdev->n_rxq == dev->requested_n_rxq + if (netdev->n_txq == dev->common.requested_n_txq + && netdev->n_rxq == dev->common.requested_n_rxq && dev->rx_steer_flags == dev->requested_rx_steer_flags - && dev->mtu == dev->requested_mtu - && dev->lsc_interrupt_mode == dev->requested_lsc_interrupt_mode - && dev->rxq_size == dev->requested_rxq_size - && dev->txq_size == dev->requested_txq_size - && eth_addr_equals(dev->hwaddr, dev->requested_hwaddr) - && dev->socket_id == dev->requested_socket_id - && dev->started && !pending_reset) { + && dev->common.mtu == dev->common.requested_mtu + && dev->common.lsc_interrupt_mode == + dev->common.requested_lsc_interrupt_mode + && dev->common.rxq_size == dev->common.requested_rxq_size + && dev->common.txq_size == dev->common.requested_txq_size + && eth_addr_equals(dev->common.hwaddr, dev->common.requested_hwaddr) + && dev->common.socket_id == dev->common.requested_socket_id + && dev->common.started && !pending_reset) { /* Reconfiguration is unnecessary */ goto out; @@ -6232,33 +6123,34 @@ retry: * Set false before reset to avoid missing a new reset interrupt event * in a race with event callback. 
*/ - atomic_store_relaxed(&netdev_dpdk_pending_reset[dev->port_id], false); - rte_eth_dev_reset(dev->port_id); + atomic_store_relaxed( + &netdev_dpdk_pending_reset[dev->common.port_id], false); + rte_eth_dev_reset(dev->common.port_id); if_notifier_manual_report(); } else { - rte_eth_dev_stop(dev->port_id); + rte_eth_dev_stop(dev->common.port_id); } - dev->started = false; + dev->common.started = false; err = netdev_dpdk_mempool_configure(dev); if (err && err != EEXIST) { goto out; } - dev->lsc_interrupt_mode = dev->requested_lsc_interrupt_mode; + dev->common.lsc_interrupt_mode = dev->common.requested_lsc_interrupt_mode; - netdev->n_txq = dev->requested_n_txq; - netdev->n_rxq = dev->requested_n_rxq; + netdev->n_txq = dev->common.requested_n_txq; + netdev->n_rxq = dev->common.requested_n_rxq; - dev->rxq_size = dev->requested_rxq_size; - dev->txq_size = dev->requested_txq_size; + dev->common.rxq_size = dev->common.requested_rxq_size; + dev->common.txq_size = dev->common.requested_txq_size; - rte_free(dev->tx_q); - dev->tx_q = NULL; + rte_free(dev->common.tx_q); + dev->common.tx_q = NULL; - if (!eth_addr_equals(dev->hwaddr, dev->requested_hwaddr)) { - err = netdev_dpdk_set_etheraddr__(dev, dev->requested_hwaddr); + if (!eth_addr_equals(dev->common.hwaddr, dev->common.requested_hwaddr)) { + err = netdev_dpdk_set_etheraddr__(dev, dev->common.requested_hwaddr); if (err) { goto out; } @@ -6280,7 +6172,7 @@ retry: * configured by the user, as netdev_dpdk_set_etheraddr__() * will have succeeded to get to this point. */ - dev->requested_hwaddr = dev->hwaddr; + dev->common.requested_hwaddr = dev->common.hwaddr; if (try_rx_steer) { err = dpdk_rx_steer_configure(dev); @@ -6291,46 +6183,47 @@ retry: * The extra queue must be explicitly removed here to ensure that * it is unconfigured immediately. 
*/ - dev->requested_n_rxq = dev->user_n_rxq; + dev->common.requested_n_rxq = dev->common.user_n_rxq; goto retry; } } else { - VLOG_INFO("%s: rx-steering: default rss", netdev_get_name(&dev->up)); + VLOG_INFO("%s: rx-steering: default rss", + netdev_get_name(&dev->common.up)); } dev->rx_steer_flags = dev->requested_rx_steer_flags; - dev->tx_q = netdev_dpdk_alloc_txq(netdev->n_txq); - if (!dev->tx_q) { + dev->common.tx_q = netdev_dpdk_alloc_txq(netdev->n_txq); + if (!dev->common.tx_q) { err = ENOMEM; } netdev_change_seq_changed(netdev); out: - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return err; } static int dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->mutex) + OVS_REQUIRES(dev->common.mutex) { - dev->up.n_txq = dev->requested_n_txq; - dev->up.n_rxq = dev->requested_n_rxq; + dev->common.up.n_txq = dev->common.requested_n_txq; + dev->common.up.n_rxq = dev->common.requested_n_rxq; /* Always keep RX queue 0 enabled for implementations that won't * report vring states. */ dev->vhost_rxq_enabled[0] = true; /* Enable TX queue 0 by default if it wasn't disabled. */ - if (dev->tx_q[0].map == OVS_VHOST_QUEUE_MAP_UNKNOWN) { - dev->tx_q[0].map = 0; + if (dev->common.tx_q[0].map == OVS_VHOST_QUEUE_MAP_UNKNOWN) { + dev->common.tx_q[0].map = 0; } - rte_spinlock_lock(&dev->stats_lock); - memset(&dev->stats, 0, sizeof dev->stats); - memset(dev->sw_stats, 0, sizeof *dev->sw_stats); - rte_spinlock_unlock(&dev->stats_lock); + rte_spinlock_lock(&dev->common.stats_lock); + memset(&dev->common.stats, 0, sizeof dev->common.stats); + memset(dev->common.sw_stats, 0, sizeof *dev->common.sw_stats); + rte_spinlock_unlock(&dev->common.stats_lock); netdev_dpdk_remap_txqs(dev); @@ -6340,7 +6233,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) err = netdev_dpdk_mempool_configure(dev); if (!err) { /* A new mempool was created or re-used. 
*/ - netdev_change_seq_changed(&dev->up); + netdev_change_seq_changed(&dev->common.up); } else if (err != EEXIST) { return err; } @@ -6348,7 +6241,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) if (dev->vhost_reconfigured == false) { dev->vhost_reconfigured = true; /* Carrier status may need updating. */ - netdev_change_seq_changed(&dev->up); + netdev_change_seq_changed(&dev->common.up); } } @@ -6363,9 +6256,9 @@ netdev_dpdk_vhost_reconfigure(struct netdev *netdev) struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int err; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); err = dpdk_vhost_reconfigure_helper(dev); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return err; } @@ -6378,7 +6271,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) char *vhost_id; int err; - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); if (dev->vhost_driver_flags & RTE_VHOST_USER_CLIENT && dev->vhost_id && dev->virtio_features_state & OVS_VIRTIO_F_RECONF_PENDING) { @@ -6391,13 +6284,13 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) unregister = true; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); if (unregister) { dpdk_vhost_driver_unregister(dev, vhost_id); } - ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); /* Configure vHost client mode if requested and if the following criteria * are met: @@ -6451,14 +6344,14 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) dev->vhost_driver_flags |= vhost_flags; VLOG_INFO("vHost User device '%s' created in 'client' mode, " "using client socket '%s'", - dev->up.name, dev->vhost_id); + dev->common.up.name, dev->vhost_id); } err = rte_vhost_driver_callback_register(dev->vhost_id, &virtio_net_device_ops); if (err) { VLOG_ERR("rte_vhost_driver_callback_register failed for " - "vhost user client port: %s\n", dev->up.name); + "vhost user client port: %s\n", dev->common.up.name); goto 
unlock; } @@ -6466,7 +6359,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) virtio_unsup_features = 1ULL << VIRTIO_NET_F_HOST_ECN | 1ULL << VIRTIO_NET_F_HOST_UFO; VLOG_DBG("%s: TSO enabled on vhost port", - netdev_get_name(&dev->up)); + netdev_get_name(&dev->common.up)); } else { /* Advertise checksum offloading to the guest, but explicitly * disable TSO and friends. @@ -6481,7 +6374,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) virtio_unsup_features); if (err) { VLOG_ERR("rte_vhost_driver_disable_features failed for " - "vhost user client port: %s\n", dev->up.name); + "vhost user client port: %s\n", dev->common.up.name); goto unlock; } @@ -6492,7 +6385,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) err = rte_vhost_driver_set_max_queue_num(dev->vhost_id, max_qp); if (err) { VLOG_ERR("rte_vhost_driver_set_max_queue_num failed for " - "vhost-user client port: %s\n", dev->up.name); + "vhost-user client port: %s\n", dev->common.up.name); goto unlock; } } @@ -6500,7 +6393,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) err = rte_vhost_driver_start(dev->vhost_id); if (err) { VLOG_ERR("rte_vhost_driver_start failed for vhost user " - "client port: %s\n", dev->up.name); + "client port: %s\n", dev->common.up.name); goto unlock; } } @@ -6508,7 +6401,7 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev) err = dpdk_vhost_reconfigure_helper(dev); unlock: - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return err; } @@ -6524,9 +6417,9 @@ netdev_dpdk_get_port_id(struct netdev *netdev) } dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); - ret = dev->port_id; - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + ret = dev->common.port_id; + ovs_mutex_unlock(&dev->common.mutex); out: return ret; } @@ -6549,7 +6442,7 @@ netdev_dpdk_flow_api_supported(struct netdev *netdev, bool check_only) } dev = netdev_dpdk_cast(netdev); - 
ovs_mutex_lock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); if (dev->type == DPDK_DEV_ETH) { if (dev->requested_rx_steer_flags && !check_only) { VLOG_WARN("%s: rx-steering is mutually exclusive with hw-offload," @@ -6561,7 +6454,7 @@ netdev_dpdk_flow_api_supported(struct netdev *netdev, bool check_only) /* TODO: Check if we able to offload some minimal flow. */ ret = true; } - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); out: return ret; } @@ -6574,7 +6467,7 @@ netdev_dpdk_rte_flow_destroy(struct netdev *netdev, struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); int ret; - ret = rte_flow_destroy(dev->port_id, rte_flow, error); + ret = rte_flow_destroy(dev->common.port_id, rte_flow, error); return ret; } @@ -6588,7 +6481,7 @@ netdev_dpdk_rte_flow_create(struct netdev *netdev, struct rte_flow *flow; struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - flow = rte_flow_create(dev->port_id, attr, items, actions, error); + flow = rte_flow_create(dev->common.port_id, attr, items, actions, error); return flow; } @@ -6616,7 +6509,7 @@ netdev_dpdk_rte_flow_query_count(struct netdev *netdev, } dev = netdev_dpdk_cast(netdev); - ret = rte_flow_query(dev->port_id, rte_flow, actions, query, error); + ret = rte_flow_query(dev->common.port_id, rte_flow, actions, query, error); return ret; } @@ -6637,10 +6530,10 @@ netdev_dpdk_rte_flow_tunnel_decap_set(struct netdev *netdev, } dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); - ret = rte_flow_tunnel_decap_set(dev->port_id, tunnel, actions, + ovs_mutex_lock(&dev->common.mutex); + ret = rte_flow_tunnel_decap_set(dev->common.port_id, tunnel, actions, num_of_actions, error); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return ret; } @@ -6659,10 +6552,10 @@ netdev_dpdk_rte_flow_tunnel_match(struct netdev *netdev, } dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); - ret = rte_flow_tunnel_match(dev->port_id, tunnel, items, num_of_items, - error); - 
ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + ret = rte_flow_tunnel_match(dev->common.port_id, tunnel, + items, num_of_items, error); + ovs_mutex_unlock(&dev->common.mutex); return ret; } @@ -6681,9 +6574,9 @@ netdev_dpdk_rte_flow_get_restore_info(struct netdev *netdev, } dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); - ret = rte_flow_get_restore_info(dev->port_id, m, info, error); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + ret = rte_flow_get_restore_info(dev->common.port_id, m, info, error); + ovs_mutex_unlock(&dev->common.mutex); return ret; } @@ -6702,10 +6595,10 @@ netdev_dpdk_rte_flow_tunnel_action_decap_release( } dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); - ret = rte_flow_tunnel_action_decap_release(dev->port_id, actions, + ovs_mutex_lock(&dev->common.mutex); + ret = rte_flow_tunnel_action_decap_release(dev->common.port_id, actions, num_of_actions, error); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_unlock(&dev->common.mutex); return ret; } @@ -6723,10 +6616,10 @@ netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *netdev, } dev = netdev_dpdk_cast(netdev); - ovs_mutex_lock(&dev->mutex); - ret = rte_flow_tunnel_item_release(dev->port_id, items, num_of_items, - error); - ovs_mutex_unlock(&dev->mutex); + ovs_mutex_lock(&dev->common.mutex); + ret = rte_flow_tunnel_item_release(dev->common.port_id, + items, num_of_items, error); + ovs_mutex_unlock(&dev->common.mutex); return ret; }

From patchwork Wed Apr 1 09:13:12 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 2218455
X-Patchwork-Delegate: echaudro@redhat.com
From: Eli Britstein
To:
Date: Wed, 1 Apr 2026 12:13:12 +0300
Message-ID: <20260401091318.2671624-6-elibr@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com>
References: <20260401091318.2671624-1-elibr@nvidia.com>
Subject: [ovs-dev] [PATCH v3 05/11] netdev-dpdk: Change access from dev->common.xxx to common->xxx.
Cc: Eli Britstein, Ilya Maximets, David Marchand, Maor Dickman

Change function signatures to take struct netdev_dpdk_common instead of struct netdev_dpdk, and update internal accesses from dev->common.xxx to common->xxx. This makes these functions reusable by netdev-doca which also operates on netdev_dpdk_common.

Signed-off-by: Eli Britstein
---
lib/netdev-dpdk.c | 456 +++++++++++++++++++++++-----------------------
1 file changed, 230 insertions(+), 226 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e34e96dd3..5167ef1b0 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -451,8 +451,8 @@ static void netdev_dpdk_vhost_destruct(struct netdev *netdev); static int netdev_dpdk_get_sw_custom_stats(const struct netdev *, struct netdev_custom_stats *); -static void netdev_dpdk_configure_xstats(struct netdev_dpdk *dev); -static void netdev_dpdk_clear_xstats(struct netdev_dpdk *dev); +static void netdev_dpdk_configure_xstats(struct netdev_dpdk_common *common); +static void netdev_dpdk_clear_xstats(struct netdev_dpdk_common *common); int netdev_dpdk_get_vid(const struct netdev_dpdk *dev); @@ -865,33 +865,37 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) } static void -check_link_status(struct netdev_dpdk *dev) +check_link_status(struct netdev_dpdk_common *common) { struct rte_eth_link link; - if (rte_eth_link_get_nowait(dev->common.port_id, &link) < 0) { +
if (common->port_id == DPDK_ETH_PORT_ID_INVALID) { + return; + } + + if (rte_eth_link_get_nowait(common->port_id, &link) < 0) { VLOG_DBG_RL(&rl, "Failed to retrieve link status for port "DPDK_PORT_ID_FMT, - dev->common.port_id); + common->port_id); return; } - if (dev->common.link.link_status != link.link_status) { - netdev_change_seq_changed(&dev->common.up); + if (common->link.link_status != link.link_status) { + netdev_change_seq_changed(&common->up); - dev->common.link_reset_cnt++; - dev->common.link = link; - if (dev->common.link.link_status) { + common->link_reset_cnt++; + common->link = link; + if (common->link.link_status) { VLOG_DBG_RL(&rl, "Port "DPDK_PORT_ID_FMT" Link Up - speed %u Mbps - %s", - dev->common.port_id, - (unsigned) dev->common.link.link_speed, - (dev->common.link.link_duplex == + common->port_id, + (unsigned) common->link.link_speed, + (common->link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ? "full-duplex" : "half-duplex"); } else { VLOG_DBG_RL(&rl, "Port "DPDK_PORT_ID_FMT" Link Down", - dev->common.port_id); + common->port_id); } } } @@ -899,18 +903,16 @@ check_link_status(struct netdev_dpdk *dev) static void * dpdk_watchdog(void *dummy OVS_UNUSED) { - struct netdev_dpdk *dev; + struct netdev_dpdk_common *common; pthread_detach(pthread_self()); for (;;) { ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { - ovs_mutex_lock(&dev->common.mutex); - if (dev->type == DPDK_DEV_ETH) { - check_link_status(dev); - } - ovs_mutex_unlock(&dev->common.mutex); + LIST_FOR_EACH (common, list_node, &dpdk_list) { + ovs_mutex_lock(&common->mutex); + check_link_status(common); + ovs_mutex_unlock(&common->mutex); } ovs_mutex_unlock(&dpdk_mutex); xsleep(DPDK_PORT_WATCHDOG_INTERVAL); @@ -920,48 +922,46 @@ dpdk_watchdog(void *dummy OVS_UNUSED) } static void -netdev_dpdk_update_netdev_flag(struct netdev_dpdk *dev, +netdev_dpdk_update_netdev_flag(struct netdev_dpdk_common *common, enum dpdk_hw_ol_features hw_ol_features, enum 
netdev_ol_flags flag) - OVS_REQUIRES(dev->common.mutex) + OVS_REQUIRES(common->mutex) { - struct netdev *netdev = &dev->common.up; - - if (dev->common.hw_ol_features & hw_ol_features) { - netdev->ol_flags |= flag; + if (common->hw_ol_features & hw_ol_features) { + common->up.ol_flags |= flag; } else { - netdev->ol_flags &= ~flag; + common->up.ol_flags &= ~flag; } } static void -netdev_dpdk_update_netdev_flags(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->common.mutex) +netdev_dpdk_update_netdev_flags(struct netdev_dpdk_common *common) + OVS_REQUIRES(common->mutex) { - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_IPV4_CKSUM_OFFLOAD, + netdev_dpdk_update_netdev_flag(common, NETDEV_TX_IPV4_CKSUM_OFFLOAD, NETDEV_TX_OFFLOAD_IPV4_CKSUM); - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_TCP_CKSUM_OFFLOAD, + netdev_dpdk_update_netdev_flag(common, NETDEV_TX_TCP_CKSUM_OFFLOAD, NETDEV_TX_OFFLOAD_TCP_CKSUM); - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_UDP_CKSUM_OFFLOAD, + netdev_dpdk_update_netdev_flag(common, NETDEV_TX_UDP_CKSUM_OFFLOAD, NETDEV_TX_OFFLOAD_UDP_CKSUM); - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_SCTP_CKSUM_OFFLOAD, + netdev_dpdk_update_netdev_flag(common, NETDEV_TX_SCTP_CKSUM_OFFLOAD, NETDEV_TX_OFFLOAD_SCTP_CKSUM); - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_TSO_OFFLOAD, + netdev_dpdk_update_netdev_flag(common, NETDEV_TX_TSO_OFFLOAD, NETDEV_TX_OFFLOAD_TCP_TSO); - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD, + netdev_dpdk_update_netdev_flag(common, NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD, NETDEV_TX_VXLAN_TNL_TSO); - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_GRE_TNL_TSO_OFFLOAD, + netdev_dpdk_update_netdev_flag(common, NETDEV_TX_GRE_TNL_TSO_OFFLOAD, NETDEV_TX_GRE_TNL_TSO); - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD, + netdev_dpdk_update_netdev_flag(common, NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD, NETDEV_TX_GENEVE_TNL_TSO); - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD, + 
netdev_dpdk_update_netdev_flag(common, NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD, NETDEV_TX_OFFLOAD_OUTER_IP_CKSUM); - netdev_dpdk_update_netdev_flag(dev, NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD, + netdev_dpdk_update_netdev_flag(common, NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD, NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM); } static int -dpdk_eth_dev_port_config(struct netdev_dpdk *dev, +dpdk_eth_dev_port_config(struct netdev_dpdk_common *common, const struct rte_eth_dev_info *info, int n_rxq, int n_txq) { @@ -974,60 +974,60 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, * scatter to support jumbo RX. * Setting scatter for the device is done after checking for * scatter support in the device capabilites. */ - if (dev->common.mtu > RTE_ETHER_MTU) { - if (dev->common.hw_ol_features & NETDEV_RX_HW_SCATTER) { + if (common->mtu > RTE_ETHER_MTU) { + if (common->hw_ol_features & NETDEV_RX_HW_SCATTER) { conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_SCATTER; } } - conf.intr_conf.lsc = dev->common.lsc_interrupt_mode; + conf.intr_conf.lsc = common->lsc_interrupt_mode; - if (dev->common.hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) { + if (common->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) { conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_CHECKSUM; } - if (!(dev->common.hw_ol_features & NETDEV_RX_HW_CRC_STRIP) + if (!(common->hw_ol_features & NETDEV_RX_HW_CRC_STRIP) && info->rx_offload_capa & RTE_ETH_RX_OFFLOAD_KEEP_CRC) { conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_KEEP_CRC; } - if (dev->common.hw_ol_features & NETDEV_TX_IPV4_CKSUM_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_IPV4_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_IPV4_CKSUM; } - if (dev->common.hw_ol_features & NETDEV_TX_TCP_CKSUM_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_TCP_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_CKSUM; } - if (dev->common.hw_ol_features & NETDEV_TX_UDP_CKSUM_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_UDP_CKSUM_OFFLOAD) { conf.txmode.offloads |= 
RTE_ETH_TX_OFFLOAD_UDP_CKSUM; } - if (dev->common.hw_ol_features & NETDEV_TX_SCTP_CKSUM_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_SCTP_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_SCTP_CKSUM; } - if (dev->common.hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_TSO; } - if (dev->common.hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; } - if (dev->common.hw_ol_features & NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; } - if (dev->common.hw_ol_features & NETDEV_TX_GRE_TNL_TSO_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_GRE_TNL_TSO_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO; } - if (dev->common.hw_ol_features & NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM; } - if (dev->common.hw_ol_features & NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD) { + if (common->hw_ol_features & NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD) { conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; } @@ -1050,38 +1050,38 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq); } - diag = rte_eth_dev_configure(dev->common.port_id, n_rxq, n_txq, &conf); + diag = rte_eth_dev_configure(common->port_id, n_rxq, n_txq, &conf); if (diag) { VLOG_WARN("Interface %s eth_dev setup error %s\n", - dev->common.up.name, rte_strerror(-diag)); + common->up.name, rte_strerror(-diag)); break; } - diag = rte_eth_dev_set_mtu(dev->common.port_id, dev->common.mtu); + diag = rte_eth_dev_set_mtu(common->port_id, common->mtu); if (diag) { /* A device may not support 
rte_eth_dev_set_mtu, in this case * flag a warning to the user and include the devices configured * MTU value that will be used instead. */ if (-ENOTSUP == diag) { - rte_eth_dev_get_mtu(dev->common.port_id, &conf_mtu); + rte_eth_dev_get_mtu(common->port_id, &conf_mtu); VLOG_WARN("Interface %s does not support MTU configuration, " "max packet size supported is %"PRIu16".", - dev->common.up.name, conf_mtu); + common->up.name, conf_mtu); } else { VLOG_ERR("Interface %s MTU (%d) setup error: %s", - dev->common.up.name, dev->common.mtu, + common->up.name, common->mtu, rte_strerror(-diag)); break; } } for (i = 0; i < n_txq; i++) { - diag = rte_eth_tx_queue_setup(dev->common.port_id, - i, dev->common.txq_size, - dev->common.socket_id, NULL); + diag = rte_eth_tx_queue_setup(common->port_id, + i, common->txq_size, + common->socket_id, NULL); if (diag) { VLOG_INFO("Interface %s unable to setup txq(%d): %s", - dev->common.up.name, i, rte_strerror(-diag)); + common->up.name, i, rte_strerror(-diag)); break; } } @@ -1093,13 +1093,13 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, } for (i = 0; i < n_rxq; i++) { - diag = rte_eth_rx_queue_setup(dev->common.port_id, i, - dev->common.rxq_size, - dev->common.socket_id, NULL, - dev->common.dpdk_mp->mp); + diag = rte_eth_rx_queue_setup(common->port_id, i, + common->rxq_size, + common->socket_id, NULL, + common->dpdk_mp->mp); if (diag) { VLOG_INFO("Interface %s unable to setup rxq(%d): %s", - dev->common.up.name, i, rte_strerror(-diag)); + common->up.name, i, rte_strerror(-diag)); break; } } @@ -1110,8 +1110,8 @@ dpdk_eth_dev_port_config(struct netdev_dpdk *dev, continue; } - dev->common.up.n_rxq = n_rxq; - dev->common.up.n_txq = n_txq; + common->up.n_rxq = n_rxq; + common->up.n_txq = n_txq; return 0; } @@ -1295,7 +1295,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) n_rxq = MIN(info.max_rx_queues, dev->common.up.n_rxq); n_txq = MIN(info.max_tx_queues, dev->common.up.n_txq); - diag = dpdk_eth_dev_port_config(dev, &info, n_rxq, 
n_txq); + diag = dpdk_eth_dev_port_config(&dev->common, &info, n_rxq, n_txq); if (diag) { VLOG_ERR("Interface %s(rxq:%d txq:%d lsc interrupt mode:%s) " "configure error: %s", @@ -1313,7 +1313,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) } dev->common.started = true; - netdev_dpdk_configure_xstats(dev); + netdev_dpdk_configure_xstats(&dev->common); rte_eth_promiscuous_enable(dev->common.port_id); rte_eth_allmulticast_enable(dev->common.port_id); @@ -1668,7 +1668,7 @@ netdev_dpdk_destruct(struct netdev *netdev) } } - netdev_dpdk_clear_xstats(dev); + netdev_dpdk_clear_xstats(&dev->common); free(dev->common.devargs); common_destruct(dev); @@ -1738,25 +1738,25 @@ netdev_dpdk_dealloc(struct netdev *netdev) } static void -netdev_dpdk_clear_xstats(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->common.mutex) +netdev_dpdk_clear_xstats(struct netdev_dpdk_common *common) + OVS_REQUIRES(common->mutex) { - free(dev->common.rte_xstats_names); - dev->common.rte_xstats_names = NULL; - dev->common.rte_xstats_names_size = 0; - free(dev->common.rte_xstats_ids); - dev->common.rte_xstats_ids = NULL; - dev->common.rte_xstats_ids_size = 0; + free(common->rte_xstats_names); + common->rte_xstats_names = NULL; + common->rte_xstats_names_size = 0; + free(common->rte_xstats_ids); + common->rte_xstats_ids = NULL; + common->rte_xstats_ids_size = 0; } static const char * -netdev_dpdk_get_xstat_name(struct netdev_dpdk *dev, uint64_t id) - OVS_REQUIRES(dev->common.mutex) +netdev_dpdk_get_xstat_name(struct netdev_dpdk_common *common, uint64_t id) + OVS_REQUIRES(common->mutex) { - if (id >= dev->common.rte_xstats_names_size) { + if (id >= common->rte_xstats_names_size) { return "UNKNOWN"; } - return dev->common.rte_xstats_names[id].name; + return common->rte_xstats_names[id].name; } static bool @@ -1770,8 +1770,8 @@ is_queue_stat(const char *s) } static void -netdev_dpdk_configure_xstats(struct netdev_dpdk *dev) - OVS_REQUIRES(dev->common.mutex) +netdev_dpdk_configure_xstats(struct 
netdev_dpdk_common *common) + OVS_REQUIRES(common->mutex) { struct rte_eth_xstat_name *rte_xstats_names = NULL; struct rte_eth_xstat *rte_xstats = NULL; @@ -1780,45 +1780,45 @@ netdev_dpdk_configure_xstats(struct netdev_dpdk *dev) const char *name; uint64_t id; - netdev_dpdk_clear_xstats(dev); + netdev_dpdk_clear_xstats(common); rte_xstats_names_size = - rte_eth_xstats_get_names(dev->common.port_id, NULL, 0); + rte_eth_xstats_get_names(common->port_id, NULL, 0); if (rte_xstats_names_size < 0) { VLOG_WARN("Cannot get XSTATS names for port: "DPDK_PORT_ID_FMT, - dev->common.port_id); + common->port_id); goto out; } rte_xstats_names = xcalloc(rte_xstats_names_size, sizeof *rte_xstats_names); - rte_xstats_len = rte_eth_xstats_get_names(dev->common.port_id, + rte_xstats_len = rte_eth_xstats_get_names(common->port_id, rte_xstats_names, rte_xstats_names_size); if (rte_xstats_len < 0 || rte_xstats_len != rte_xstats_names_size) { VLOG_WARN("Cannot get XSTATS names for port: "DPDK_PORT_ID_FMT, - dev->common.port_id); + common->port_id); goto out; } rte_xstats = xcalloc(rte_xstats_names_size, sizeof *rte_xstats); - rte_xstats_len = rte_eth_xstats_get(dev->common.port_id, rte_xstats, + rte_xstats_len = rte_eth_xstats_get(common->port_id, rte_xstats, rte_xstats_names_size); if (rte_xstats_len < 0 || rte_xstats_len != rte_xstats_names_size) { VLOG_WARN("Cannot get XSTATS for port: "DPDK_PORT_ID_FMT, - dev->common.port_id); + common->port_id); goto out; } - dev->common.rte_xstats_names = rte_xstats_names; + common->rte_xstats_names = rte_xstats_names; rte_xstats_names = NULL; - dev->common.rte_xstats_names_size = rte_xstats_names_size; + common->rte_xstats_names_size = rte_xstats_names_size; - dev->common.rte_xstats_ids = xcalloc(rte_xstats_names_size, - sizeof *dev->common.rte_xstats_ids); + common->rte_xstats_ids = xcalloc(rte_xstats_names_size, + sizeof *common->rte_xstats_ids); for (unsigned int i = 0; i < rte_xstats_names_size; i++) { id = rte_xstats[i].id; - name = 
netdev_dpdk_get_xstat_name(dev, id); + name = netdev_dpdk_get_xstat_name(common, id); /* For custom stats, we filter out everything except per rxq/txq basic * stats, and dropped, error and management counters. */ @@ -1827,8 +1827,8 @@ netdev_dpdk_configure_xstats(struct netdev_dpdk *dev) strstr(name, "_management_") || string_ends_with(name, "_dropped")) { - dev->common.rte_xstats_ids[dev->common.rte_xstats_ids_size] = id; - dev->common.rte_xstats_ids_size++; + common->rte_xstats_ids[common->rte_xstats_ids_size] = id; + common->rte_xstats_ids_size++; } } @@ -2054,15 +2054,16 @@ dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type type, } static void -dpdk_set_rxq_config(struct netdev_dpdk *dev, const struct smap *args) - OVS_REQUIRES(dev->common.mutex) +dpdk_set_rxq_config(struct netdev_dpdk_common *common, + const struct smap *args) + OVS_REQUIRES(common->mutex) { int new_n_rxq; new_n_rxq = MAX(smap_get_int(args, "n_rxq", NR_QUEUE), 1); - if (new_n_rxq != dev->common.user_n_rxq) { - dev->common.user_n_rxq = new_n_rxq; - netdev_request_reconfigure(&dev->common.up); + if (new_n_rxq != common->user_n_rxq) { + common->user_n_rxq = new_n_rxq; + netdev_request_reconfigure(&common->up); } } @@ -2176,7 +2177,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, dpdk_set_rx_steer_config(netdev, dev, args, errp); - dpdk_set_rxq_config(dev, args); + dpdk_set_rxq_config(&dev->common, args); new_devargs = smap_get(args, "dpdk-devargs"); @@ -2399,28 +2400,28 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev, static int netdev_dpdk_get_numa_id(const struct netdev *netdev) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); - return dev->common.socket_id; + return common->socket_id; } /* Sets the number of tx queues for the dpdk interface. 
*/ static int netdev_dpdk_set_tx_multiq(struct netdev *netdev, unsigned int n_txq) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); - ovs_mutex_lock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); - if (dev->common.requested_n_txq == n_txq) { + if (common->requested_n_txq == n_txq) { goto out; } - dev->common.requested_n_txq = n_txq; + common->requested_n_txq = n_txq; netdev_request_reconfigure(netdev); out: - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_unlock(&common->mutex); return 0; } @@ -2492,7 +2493,8 @@ netdev_dpdk_batch_init_packet_fields(struct dp_packet_batch *batch) /* Prepare the packet for HWOL. * Return True if the packet is OK to continue. */ static bool -netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) +netdev_dpdk_prep_hwol_packet(struct netdev_dpdk_common *common, + struct rte_mbuf *mbuf) { struct dp_packet *pkt = CONTAINER_OF(mbuf, struct dp_packet, mbuf); uint64_t unexpected = mbuf->ol_flags & RTE_MBUF_F_TX_OFFLOAD_MASK; @@ -2508,8 +2510,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) if (OVS_UNLIKELY(unexpected)) { VLOG_WARN_RL(&rl, "%s: Unexpected Tx offload flags: %#"PRIx64, - netdev_get_name(&dev->common.up), unexpected); - netdev_dpdk_mbuf_dump(netdev_get_name(&dev->common.up), + netdev_get_name(&common->up), unexpected); + netdev_dpdk_mbuf_dump(netdev_get_name(&common->up), "Packet with unexpected ol_flags", mbuf); return false; } @@ -2611,11 +2613,11 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) } if (OVS_UNLIKELY((hdr_len + mbuf->tso_segsz) > - dev->common.max_packet_len)) { + common->max_packet_len)) { VLOG_WARN_RL(&rl, "%s: Oversized TSO packet. 
hdr: %"PRIu32", " "gso: %"PRIu32", max len: %"PRIu32"", - dev->common.up.name, hdr_len, mbuf->tso_segsz, - dev->common.max_packet_len); + common->up.name, hdr_len, mbuf->tso_segsz, + common->max_packet_len); return false; } mbuf->ol_flags |= RTE_MBUF_F_TX_TCP_SEG; @@ -2632,8 +2634,8 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk *dev, struct rte_mbuf *mbuf) /* Prepare a batch for HWOL. * Return the number of good packets in the batch. */ static int -netdev_dpdk_prep_hwol_batch(struct netdev_dpdk *dev, struct rte_mbuf **pkts, - int pkt_cnt) +netdev_dpdk_prep_hwol_batch(struct netdev_dpdk_common *common, + struct rte_mbuf **pkts, int pkt_cnt) { int i = 0; int cnt = 0; @@ -2642,7 +2644,7 @@ netdev_dpdk_prep_hwol_batch(struct netdev_dpdk *dev, struct rte_mbuf **pkts, /* Prepare and filter bad HWOL packets. */ for (i = 0; i < pkt_cnt; i++) { pkt = pkts[i]; - if (!netdev_dpdk_prep_hwol_packet(dev, pkt)) { + if (!netdev_dpdk_prep_hwol_packet(common, pkt)) { rte_pktmbuf_free(pkt); continue; } @@ -3105,9 +3107,9 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet *pkt_orig) * * Returns the number of good packets in the batch. */ static size_t -dpdk_copy_batch_to_mbuf(struct netdev *netdev, struct dp_packet_batch *batch) +dpdk_copy_batch_to_mbuf(struct netdev_dpdk_common *common, + struct dp_packet_batch *batch) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); size_t i, size = dp_packet_batch_size(batch); struct dp_packet *packet; @@ -3118,7 +3120,7 @@ dpdk_copy_batch_to_mbuf(struct netdev *netdev, struct dp_packet_batch *batch) struct dp_packet *pktcopy; pktcopy = dpdk_copy_dp_packet_to_mbuf( - dev->common.dpdk_mp->mp, packet); + common->dpdk_mp->mp, packet); if (pktcopy) { dp_packet_batch_refill(batch, pktcopy, i); } @@ -3151,7 +3153,7 @@ netdev_dpdk_common_send(struct netdev *netdev, struct dp_packet_batch *batch, /* Copy dp-packets to mbufs. 
*/ if (OVS_UNLIKELY(need_copy)) { - cnt = dpdk_copy_batch_to_mbuf(netdev, batch); + cnt = dpdk_copy_batch_to_mbuf(&dev->common, batch); stats->tx_failure_drops += pkt_cnt - cnt; pkt_cnt = cnt; } @@ -3163,7 +3165,7 @@ netdev_dpdk_common_send(struct netdev *netdev, struct dp_packet_batch *batch, if (netdev->ol_flags) { /* Prepare each mbuf for hardware offloading. */ - cnt = netdev_dpdk_prep_hwol_batch(dev, pkts, pkt_cnt); + cnt = netdev_dpdk_prep_hwol_batch(&dev->common, pkts, pkt_cnt); stats->tx_invalid_hwol_drops += pkt_cnt - cnt; pkt_cnt = cnt; } @@ -3310,19 +3312,20 @@ static int netdev_dpdk_set_etheraddr__(struct netdev_dpdk *dev, const struct eth_addr mac) OVS_REQUIRES(dev->common.mutex) { + struct netdev_dpdk_common *common = &dev->common; int err = 0; if (dev->type == DPDK_DEV_ETH) { struct rte_ether_addr ea; memcpy(ea.addr_bytes, mac.ea, ETH_ADDR_LEN); - err = -rte_eth_dev_default_mac_addr_set(dev->common.port_id, &ea); + err = -rte_eth_dev_default_mac_addr_set(common->port_id, &ea); } if (!err) { - dev->common.hwaddr = mac; + common->hwaddr = mac; } else { VLOG_WARN("%s: Failed to set requested mac("ETH_ADDR_FMT"): %s", - netdev_get_name(&dev->common.up), ETH_ADDR_ARGS(mac), + netdev_get_name(&common->up), ETH_ADDR_ARGS(mac), rte_strerror(err)); } @@ -3350,11 +3353,11 @@ netdev_dpdk_set_etheraddr(struct netdev *netdev, const struct eth_addr mac) static int netdev_dpdk_get_etheraddr(const struct netdev *netdev, struct eth_addr *mac) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); - ovs_mutex_lock(&dev->common.mutex); - *mac = dev->common.hwaddr; - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); + *mac = common->hwaddr; + ovs_mutex_unlock(&common->mutex); return 0; } @@ -3362,11 +3365,11 @@ netdev_dpdk_get_etheraddr(const struct netdev *netdev, struct eth_addr *mac) static int netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup) { - struct 
netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); - ovs_mutex_lock(&dev->common.mutex); - *mtup = dev->common.mtu; - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); + *mtup = common->mtu; + ovs_mutex_unlock(&common->mutex); return 0; } @@ -3753,29 +3756,29 @@ netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier); static int netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); struct rte_eth_stats rte_stats; bool gg; netdev_dpdk_get_carrier(netdev, &gg); - ovs_mutex_lock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); struct rte_eth_xstat *rte_xstats = NULL; struct rte_eth_xstat_name *rte_xstats_names = NULL; int rte_xstats_len, rte_xstats_new_len, rte_xstats_ret; - if (rte_eth_stats_get(dev->common.port_id, &rte_stats)) { + if (rte_eth_stats_get(common->port_id, &rte_stats)) { VLOG_ERR("Can't get ETH statistics for port: "DPDK_PORT_ID_FMT, - dev->common.port_id); - ovs_mutex_unlock(&dev->common.mutex); + common->port_id); + ovs_mutex_unlock(&common->mutex); return EPROTO; } /* Get length of statistics */ - rte_xstats_len = rte_eth_xstats_get_names(dev->common.port_id, NULL, 0); + rte_xstats_len = rte_eth_xstats_get_names(common->port_id, NULL, 0); if (rte_xstats_len < 0) { VLOG_WARN("Cannot get XSTATS values for port: "DPDK_PORT_ID_FMT, - dev->common.port_id); + common->port_id); goto out; } /* Reserve memory for xstats names and values */ @@ -3783,24 +3786,24 @@ netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) rte_xstats = xcalloc(rte_xstats_len, sizeof *rte_xstats); /* Retreive xstats names */ - rte_xstats_new_len = rte_eth_xstats_get_names(dev->common.port_id, + rte_xstats_new_len = rte_eth_xstats_get_names(common->port_id, rte_xstats_names, rte_xstats_len); if 
(rte_xstats_new_len != rte_xstats_len) { VLOG_WARN("Cannot get XSTATS names for port: "DPDK_PORT_ID_FMT, - dev->common.port_id); + common->port_id); goto out; } /* Retreive xstats values */ memset(rte_xstats, 0xff, sizeof *rte_xstats * rte_xstats_len); - rte_xstats_ret = rte_eth_xstats_get(dev->common.port_id, rte_xstats, + rte_xstats_ret = rte_eth_xstats_get(common->port_id, rte_xstats, rte_xstats_len); if (rte_xstats_ret > 0 && rte_xstats_ret <= rte_xstats_len) { netdev_dpdk_convert_xstats(stats, rte_xstats, rte_xstats_names, rte_xstats_len); } else { VLOG_WARN("Cannot get XSTATS values for port: "DPDK_PORT_ID_FMT, - dev->common.port_id); + common->port_id); } out: @@ -3814,17 +3817,17 @@ out: stats->rx_errors = rte_stats.ierrors; stats->tx_errors = rte_stats.oerrors; - rte_spinlock_lock(&dev->common.stats_lock); - stats->tx_dropped = dev->common.stats.tx_dropped; - stats->rx_dropped = dev->common.stats.rx_dropped; - rte_spinlock_unlock(&dev->common.stats_lock); + rte_spinlock_lock(&common->stats_lock); + stats->tx_dropped = common->stats.tx_dropped; + stats->rx_dropped = common->stats.rx_dropped; + rte_spinlock_unlock(&common->stats_lock); /* These are the available DPDK counters for packets not received due to * local resource constraints in DPDK and NIC respectively. 
*/ stats->rx_dropped += rte_stats.rx_nombuf + rte_stats.imissed; stats->rx_missed_errors = rte_stats.imissed; - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_unlock(&common->mutex); return 0; } @@ -3833,27 +3836,26 @@ static int netdev_dpdk_get_custom_stats(const struct netdev *netdev, struct netdev_custom_stats *custom_stats) { - - uint32_t i; - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); int rte_xstats_ret, sw_stats_size; + uint32_t i; netdev_dpdk_get_sw_custom_stats(netdev, custom_stats); - ovs_mutex_lock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); - if (dev->common.rte_xstats_ids_size > 0) { - uint64_t *values = xcalloc(dev->common.rte_xstats_ids_size, + if (common->rte_xstats_ids_size > 0) { + uint64_t *values = xcalloc(common->rte_xstats_ids_size, sizeof(uint64_t)); rte_xstats_ret = - rte_eth_xstats_get_by_id(dev->common.port_id, - dev->common.rte_xstats_ids, + rte_eth_xstats_get_by_id(common->port_id, + common->rte_xstats_ids, values, - dev->common.rte_xstats_ids_size); + common->rte_xstats_ids_size); if (rte_xstats_ret > 0 && - rte_xstats_ret <= dev->common.rte_xstats_ids_size) { + rte_xstats_ret <= common->rte_xstats_ids_size) { sw_stats_size = custom_stats->size; custom_stats->size += rte_xstats_ret; @@ -3864,19 +3866,19 @@ netdev_dpdk_get_custom_stats(const struct netdev *netdev, for (i = 0; i < rte_xstats_ret; i++) { ovs_strlcpy(custom_stats->counters[sw_stats_size + i].name, netdev_dpdk_get_xstat_name( - dev, dev->common.rte_xstats_ids[i]), + common, common->rte_xstats_ids[i]), NETDEV_CUSTOM_STATS_NAME_SIZE); custom_stats->counters[sw_stats_size + i].value = values[i]; } } else { VLOG_WARN("Cannot get XSTATS values for port: "DPDK_PORT_ID_FMT, - dev->common.port_id); + common->port_id); } free(values); } - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_unlock(&common->mutex); return 0; } @@ -3938,13 +3940,13 @@ netdev_dpdk_get_features(const struct netdev 
*netdev, enum netdev_features *supported, enum netdev_features *peer) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); struct rte_eth_link link; uint32_t feature = 0; - ovs_mutex_lock(&dev->common.mutex); - link = dev->common.link; - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); + link = common->link; + ovs_mutex_unlock(&common->mutex); /* Match against OpenFlow defined link speed values. */ if (link.link_duplex == RTE_ETH_LINK_FULL_DUPLEX) { @@ -4000,15 +4002,15 @@ static int netdev_dpdk_get_speed(const struct netdev *netdev, uint32_t *current, uint32_t *max) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); struct rte_eth_dev_info dev_info; struct rte_eth_link link; int diag; - ovs_mutex_lock(&dev->common.mutex); - link = dev->common.link; - diag = rte_eth_dev_info_get(dev->common.port_id, &dev_info); - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); + link = common->link; + diag = rte_eth_dev_info_get(common->port_id, &dev_info); + ovs_mutex_unlock(&common->mutex); *current = link.link_speed != RTE_ETH_SPEED_NUM_UNKNOWN ? link.link_speed : 0; @@ -4155,14 +4157,14 @@ netdev_dpdk_set_policing(struct netdev* netdev, uint32_t policer_rate, static int netdev_dpdk_get_ifindex(const struct netdev *netdev) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); - ovs_mutex_lock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); /* Calculate hash from the netdev name. Ensure that ifindex is a 24-bit * postive integer to meet RFC 2863 recommendations. 
*/ int ifindex = hash_string(netdev->name, 0) % 0xfffffe + 1; - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_unlock(&common->mutex); return ifindex; } @@ -4170,13 +4172,13 @@ netdev_dpdk_get_ifindex(const struct netdev *netdev) static int netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); - ovs_mutex_lock(&dev->common.mutex); - check_link_status(dev); - *carrier = dev->common.link.link_status; + ovs_mutex_lock(&common->mutex); + check_link_status(common); + *carrier = common->link.link_status; - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_unlock(&common->mutex); return 0; } @@ -4202,12 +4204,12 @@ netdev_dpdk_vhost_get_carrier(const struct netdev *netdev, bool *carrier) static long long int netdev_dpdk_get_carrier_resets(const struct netdev *netdev) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); long long int carrier_resets; - ovs_mutex_lock(&dev->common.mutex); - carrier_resets = dev->common.link_reset_cnt; - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); + carrier_resets = common->link_reset_cnt; + ovs_mutex_unlock(&common->mutex); return carrier_resets; } @@ -4225,15 +4227,17 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, enum netdev_flags *old_flagsp) OVS_REQUIRES(dev->common.mutex) { + struct netdev_dpdk_common *common = &dev->common; + if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) { return EINVAL; } - *old_flagsp = dev->common.flags; - dev->common.flags |= on; - dev->common.flags &= ~off; + *old_flagsp = common->flags; + common->flags |= on; + common->flags &= ~off; - if (dev->common.flags == *old_flagsp) { + if (common->flags == *old_flagsp) { return 0; } @@ -4242,27 +4246,27 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, if ((dev->common.flags ^ *old_flagsp) & NETDEV_UP) { int err; - 
if (dev->common.flags & NETDEV_UP) { - err = rte_eth_dev_set_link_up(dev->common.port_id); + if (common->flags & NETDEV_UP) { + err = rte_eth_dev_set_link_up(common->port_id); } else { - err = rte_eth_dev_set_link_down(dev->common.port_id); + err = rte_eth_dev_set_link_down(common->port_id); } if (err == -ENOTSUP) { VLOG_INFO("Interface %s does not support link state " - "configuration", netdev_get_name(&dev->common.up)); + "configuration", netdev_get_name(&common->up)); } else if (err < 0) { VLOG_ERR("Interface %s link change error: %s", - netdev_get_name(&dev->common.up), rte_strerror(-err)); - dev->common.flags = *old_flagsp; + netdev_get_name(&common->up), rte_strerror(-err)); + common->flags = *old_flagsp; return -err; } } - if (dev->common.flags & NETDEV_PROMISC) { - rte_eth_promiscuous_enable(dev->common.port_id); + if (common->flags & NETDEV_PROMISC) { + rte_eth_promiscuous_enable(common->port_id); } - netdev_change_seq_changed(&dev->common.up); + netdev_change_seq_changed(&common->up); } else { /* If DPDK_DEV_VHOST device's NETDEV_UP flag was changed and vhost is * running then change netdev's change_seq to trigger link state @@ -4270,15 +4274,14 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, if ((NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off))) && is_vhost_running(dev)) { - netdev_change_seq_changed(&dev->common.up); + netdev_change_seq_changed(&common->up); /* Clear statistics if device is getting up. 
*/ if (NETDEV_UP & on) { - rte_spinlock_lock(&dev->common.stats_lock); - memset(&dev->common.stats, 0, sizeof dev->common.stats); - memset(dev->common.sw_stats, 0, - sizeof *dev->common.sw_stats); - rte_spinlock_unlock(&dev->common.stats_lock); + rte_spinlock_lock(&common->stats_lock); + memset(&common->stats, 0, sizeof common->stats); + memset(common->sw_stats, 0, sizeof *common->sw_stats); + rte_spinlock_unlock(&common->stats_lock); } } } @@ -4394,6 +4397,7 @@ netdev_dpdk_link_speed_to_str__(uint32_t link_speed) static int netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) { + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); struct rte_eth_dev_info dev_info; size_t rx_steer_flows_num; @@ -4402,28 +4406,28 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) int n_rxq; int diag; - if (!rte_eth_dev_is_valid_port(dev->common.port_id)) { + if (!rte_eth_dev_is_valid_port(common->port_id)) { return ENODEV; } ovs_mutex_lock(&dpdk_mutex); - ovs_mutex_lock(&dev->common.mutex); - diag = rte_eth_dev_info_get(dev->common.port_id, &dev_info); - link_speed = dev->common.link.link_speed; + ovs_mutex_lock(&common->mutex); + diag = rte_eth_dev_info_get(common->port_id, &dev_info); + link_speed = common->link.link_speed; rx_steer_flags = dev->rx_steer_flags; rx_steer_flows_num = dev->rx_steer_flows_num; n_rxq = netdev->n_rxq; - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_unlock(&common->mutex); ovs_mutex_unlock(&dpdk_mutex); - smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->common.port_id); + smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, common->port_id); smap_add_format(args, "numa_id", "%d", - rte_eth_dev_socket_id(dev->common.port_id)); + rte_eth_dev_socket_id(common->port_id)); if (!diag) { smap_add_format(args, "driver_name", "%s", dev_info.driver_name); smap_add_format(args, "min_rx_bufsize", "%u", dev_info.min_rx_bufsize); } - 
smap_add_format(args, "max_rx_pktlen", "%u", dev->common.max_packet_len); + smap_add_format(args, "max_rx_pktlen", "%u", common->max_packet_len); if (!diag) { smap_add_format(args, "max_rx_queues", "%u", dev_info.max_rx_queues); smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues); @@ -4438,7 +4442,7 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) smap_add_format(args, "n_txq", "%d", netdev->n_txq); smap_add(args, "rx_csum_offload", - dev->common.hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD + common->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD ? "true" : "false"); /* Querying the DPDK library for iftype may be done in future, pending @@ -4464,9 +4468,9 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) smap_add(args, "link_speed", netdev_dpdk_link_speed_to_str__(link_speed)); - if (dev->common.is_representor) { + if (common->is_representor) { smap_add_format(args, "dpdk-vf-mac", ETH_ADDR_FMT, - ETH_ADDR_ARGS(dev->common.hwaddr)); + ETH_ADDR_ARGS(common->hwaddr)); } if (rx_steer_flags && !rx_steer_flows_num) { @@ -4817,7 +4821,7 @@ new_device(int vid) } } - netdev_dpdk_update_netdev_flags(dev); + netdev_dpdk_update_netdev_flags(&dev->common); ovsrcu_index_set(&dev->vid, vid); exists = true; @@ -4884,7 +4888,7 @@ destroy_device(int vid) /* Clear offload capabilities before next new_device. 
*/ dev->common.hw_ol_features = 0; - netdev_dpdk_update_netdev_flags(dev); + netdev_dpdk_update_netdev_flags(&dev->common); netdev_change_seq_changed(&dev->common.up); ovs_mutex_unlock(&dev->common.mutex); @@ -6160,7 +6164,7 @@ retry: if (err) { goto out; } - netdev_dpdk_update_netdev_flags(dev); + netdev_dpdk_update_netdev_flags(&dev->common); /* If both requested and actual hwaddr were previously * unset (initialized to 0), then first device init above @@ -6245,7 +6249,7 @@ dpdk_vhost_reconfigure_helper(struct netdev_dpdk *dev) } } - netdev_dpdk_update_netdev_flags(dev); + netdev_dpdk_update_netdev_flags(&dev->common); return 0; } From patchwork Wed Apr 1 09:13:13 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eli Britstein X-Patchwork-Id: 2218452 X-Patchwork-Delegate: echaudro@redhat.com Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=Nvidia.com header.i=@Nvidia.com header.a=rsa-sha256 header.s=selector2 header.b=h7IC9X63; dkim-atps=neutral Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org) Received: from smtp3.osuosl.org (smtp3.osuosl.org [140.211.166.136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4flzpz0Kn5z1yGH for ; Wed, 01 Apr 2026 20:15:23 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id 09AAB6112E; Wed, 1 Apr 2026 09:15:21 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org 
Date: Wed, 1 Apr 2026 12:13:13 +0300
Message-ID: <20260401091318.2671624-7-elibr@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com>
References: <20260401091318.2671624-1-elibr@nvidia.com>
MIME-Version: 1.0
Subject: [ovs-dev] [PATCH v3 06/11] netdev-dpdk: Make 'started' field atomic.
X-Patchwork-Original-From: Eli Britstein via dev
From: Eli Britstein
Reply-To: Eli Britstein
Cc: Eli Britstein, Ilya Maximets, David Marchand, Maor Dickman

The 'started' field is currently a plain 'bool' that netdev-dpdk accesses
only while holding a mutex. netdev-doca will access it without holding a
lock. Make the field atomic as a preparatory step, and add a
dpdk_dev_is_started() helper for reading it.

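For illustration only, the pattern can be sketched outside of OVS with C11
<stdatomic.h>. This is a minimal sketch under stated assumptions, not the
patch itself: OVS uses its own wrappers from lib/ovs-atomic.h (its
atomic_read() takes an output pointer, as the diff below shows), and
'struct dev_common', dev_common_init(), dev_set_started() and
dev_is_started() are hypothetical stand-ins for struct netdev_dpdk_common
and dpdk_dev_is_started().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of netdev_dpdk_common. */
struct dev_common {
    atomic_bool started;
};

/* Called once at construction time, before other threads see the device. */
static void
dev_common_init(struct dev_common *common)
{
    atomic_init(&common->started, false);
}

/* Writers publish every state change with an atomic store. */
static void
dev_set_started(struct dev_common *common, bool started)
{
    atomic_store(&common->started, started);
}

/* Readers may call this without holding the device mutex. */
static bool
dev_is_started(struct dev_common *common)
{
    return atomic_load(&common->started);
}
```

In the patch, the store of 'true' moves to the end of dpdk_eth_dev_init(),
presumably so that a lock-free reader sees the device as started only once
the rest of the initialization has run.
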
Signed-off-by: Eli Britstein
---
 lib/netdev-dpdk-private.h | 11 ++++++++++-
 lib/netdev-dpdk.c         | 12 +++++++-----
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/lib/netdev-dpdk-private.h b/lib/netdev-dpdk-private.h
index 9b82db750..79aa2292a 100644
--- a/lib/netdev-dpdk-private.h
+++ b/lib/netdev-dpdk-private.h
@@ -109,7 +109,7 @@ struct netdev_dpdk_common {
         uint16_t port_id;
         bool attached;
         bool is_representor;
-        bool started;
+        atomic_bool started;
         struct eth_addr hwaddr;
         int mtu;
         int socket_id;
@@ -170,4 +170,13 @@ netdev_dpdk_common_cast(const struct netdev *netdev)
     return CONTAINER_OF(netdev, struct netdev_dpdk_common, up);
 }

+static inline bool
+dpdk_dev_is_started(struct netdev_dpdk_common *common)
+{
+    bool started;
+
+    atomic_read(&common->started, &started);
+    return started;
+}
+
 #endif /* NETDEV_DPDK_PRIVATE_H */
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 5167ef1b0..02b346561 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1311,7 +1311,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
                  rte_strerror(-diag));
         return -diag;
     }
-    dev->common.started = true;

     netdev_dpdk_configure_xstats(&dev->common);

@@ -1331,6 +1330,9 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)

     mbp_priv = rte_mempool_get_priv(dev->common.dpdk_mp->mp);
     dev->buf_size = mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;
+
+    atomic_store(&dev->common.started, true);
+
     return 0;
 }

@@ -1399,7 +1401,7 @@ common_construct(struct netdev *netdev, dpdk_port_t port_no,
     dev->vhost_reconfigured = false;
     dev->virtio_features_state = OVS_VIRTIO_F_CLEAN;
     dev->common.attached = false;
-    dev->common.started = false;
+    atomic_init(&dev->common.started, false);

     ovsrcu_init(&dev->qos_conf, NULL);

@@ -1610,7 +1612,7 @@ netdev_dpdk_destruct(struct netdev *netdev)
     dpdk_rx_steer_unconfigure(dev);

     rte_eth_dev_stop(dev->common.port_id);
-    dev->common.started = false;
+    atomic_store(&dev->common.started, false);

     if (dev->common.attached) {
         bool dpdk_resources_still_used = false;
@@ -6113,7 +6115,7 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
         && dev->common.txq_size == dev->common.requested_txq_size
         && eth_addr_equals(dev->common.hwaddr, dev->common.requested_hwaddr)
         && dev->common.socket_id == dev->common.requested_socket_id
-        && dev->common.started && !pending_reset) {
+        && dpdk_dev_is_started(&dev->common) && !pending_reset) {
         /* Reconfiguration is unnecessary */

         goto out;
@@ -6135,7 +6137,7 @@ retry:
         rte_eth_dev_stop(dev->common.port_id);
     }

-    dev->common.started = false;
+    atomic_store(&dev->common.started, false);

     err = netdev_dpdk_mempool_configure(dev);
     if (err && err != EEXIST) {

From patchwork Wed Apr 1 09:13:14 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 2218454
X-Patchwork-Delegate: echaudro@redhat.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=Nvidia.com header.i=@Nvidia.com header.a=rsa-sha256 header.s=selector2 header.b=Uv/8bVUW; dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::137; helo=smtp4.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org)
Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4flzqV1m7Qz1yGH for ; Wed, 01 Apr 2026 20:15:50 +1100 (AEDT)
Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id D098C4112B; Wed, 1 Apr 2026 09:15:48 +0000 (UTC)
X-Virus-Scanned: amavis at osuosl.org
Received:
 from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id 5xph0EnIJ6pg; Wed, 1 Apr 2026 09:15:47 +0000 (UTC)
Date: Wed, 1 Apr 2026 12:13:14 +0300
Message-ID: <20260401091318.2671624-8-elibr@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com>
References: <20260401091318.2671624-1-elibr@nvidia.com>
MIME-Version: 1.0
Subject: [ovs-dev] [PATCH v3 07/11] netdev-dpdk: Direct mempool usage.
X-Patchwork-Original-From: Eli Britstein via dev
From: Eli Britstein
Reply-To: Eli Britstein
Cc: Eli Britstein, Ilya Maximets, David Marchand, Maor Dickman

The mempool object 'mp' resides inside 'struct dpdk_mp', which carries
additional metadata fields used to manage the mempools, share them, etc.
netdev-doca will not need this metadata handling. As a preparatory step,
keep 'mp' as a separate field in netdev_dpdk_common and move the 'dpdk_mp'
pointer into struct netdev_dpdk. As a small side benefit, each access to
the mempool needs one less pointer dereference.

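For illustration only, the layout change can be sketched with plain C
structs. Everything below is a hypothetical stand-in chosen for the sketch:
'struct pool' plays the role of struct rte_mempool, 'struct pool_wrapper'
the role of struct dpdk_mp, and dev_set_pool() and common_pool_size() are
invented helpers, not functions from OVS or DPDK.

```c
#include <stddef.h>

/* Stand-in for struct rte_mempool. */
struct pool {
    unsigned int n_bufs;
};

/* Stand-in for struct dpdk_mp: the pool plus sharing metadata. */
struct pool_wrapper {
    struct pool *mp;
    int refcount;
};

/* Stand-in for netdev_dpdk_common: only the direct pointer lives here. */
struct dev_common {
    struct pool *mp;
};

/* Stand-in for struct netdev_dpdk: keeps the wrapper for bookkeeping. */
struct dev {
    struct dev_common common;
    struct pool_wrapper *wrapper;
};

/* Update the wrapper and the cached direct pointer together. */
static void
dev_set_pool(struct dev *dev, struct pool_wrapper *wrapper)
{
    dev->wrapper = wrapper;
    dev->common.mp = wrapper ? wrapper->mp : NULL;
}

/* Code that only has 'common' reaches the pool with one dereference. */
static unsigned int
common_pool_size(const struct dev_common *common)
{
    return common->mp ? common->mp->n_bufs : 0;
}
```

The cost of caching the pointer is that both fields have to be updated
together, which is why the diff below sets 'dev->common.mp' next to
'dev->dpdk_mp' and clears it in common_destruct().
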
Signed-off-by: Eli Britstein
Acked-by: Eelco Chaudron
---
 lib/netdev-dpdk-private.h |  2 +-
 lib/netdev-dpdk.c         | 30 ++++++++++++++++--------------
 2 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/lib/netdev-dpdk-private.h b/lib/netdev-dpdk-private.h
index 79aa2292a..083ddacb3 100644
--- a/lib/netdev-dpdk-private.h
+++ b/lib/netdev-dpdk-private.h
@@ -123,7 +123,7 @@ struct netdev_dpdk_common {

     PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline1,
         struct ovs_mutex mutex OVS_ACQ_AFTER(NETDEV_DPDK_GLOBAL_MUTEX);
-        struct dpdk_mp *dpdk_mp;
+        struct rte_mempool *mp;
     );

     PADDED_MEMBERS(CACHE_LINE_SIZE,
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 02b346561..2562eb4b4 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -418,6 +418,7 @@ enum dpdk_rx_steer_flags {

 struct netdev_dpdk {
     struct netdev_dpdk_common common;
+    struct dpdk_mp *dpdk_mp;
     enum dpdk_dev_type type;
     int buf_size;

@@ -845,7 +846,7 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
         /* Check for any pre-existing dpdk_mp for the device before accessing
          * the associated mempool.
          */
-        if (dev->common.dpdk_mp != NULL) {
+        if (dev->dpdk_mp != NULL) {
             /* A new MTU was requested, decrement the reference count for the
              * devices current dpdk_mp. This is required even if a pointer to
              * same dpdk_mp is returned by dpdk_mp_get. The refcount for dmp
@@ -853,9 +854,10 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
              * must be decremented to keep an accurate refcount for the
              * dpdk_mp.
              */
-            dpdk_mp_put(dev->common.dpdk_mp);
+            dpdk_mp_put(dev->dpdk_mp);
         }
-        dev->common.dpdk_mp = dmp;
+        dev->dpdk_mp = dmp;
+        dev->common.mp = dmp->mp;
         dev->common.mtu = dev->common.requested_mtu;
         dev->common.socket_id = dev->common.requested_socket_id;
         dev->common.max_packet_len = MTU_TO_FRAME_LEN(dev->common.mtu);
@@ -1096,7 +1098,7 @@ dpdk_eth_dev_port_config(struct netdev_dpdk_common *common,
             diag = rte_eth_rx_queue_setup(common->port_id, i,
                                           common->rxq_size,
                                           common->socket_id, NULL,
-                                          common->dpdk_mp->mp);
+                                          common->mp);
             if (diag) {
                 VLOG_INFO("Interface %s unable to setup rxq(%d): %s",
                           common->up.name, i, rte_strerror(-diag));
@@ -1328,7 +1330,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev)
         memset(&dev->common.link, 0, sizeof dev->common.link);
     }

-    mbp_priv = rte_mempool_get_priv(dev->common.dpdk_mp->mp);
+    mbp_priv = rte_mempool_get_priv(dev->common.mp);
     dev->buf_size = mbp_priv->mbuf_data_room_size - RTE_PKTMBUF_HEADROOM;

     atomic_store(&dev->common.started, true);
@@ -1590,7 +1592,8 @@ common_destruct(struct netdev_dpdk *dev)
     OVS_EXCLUDED(dev->common.mutex)
 {
     rte_free(dev->common.tx_q);
-    dpdk_mp_put(dev->common.dpdk_mp);
+    dpdk_mp_put(dev->dpdk_mp);
+    dev->common.mp = NULL;

     ovs_list_remove(&dev->common.list_node);
     free(ovsrcu_get_protected(struct ingress_policer *,
@@ -2811,7 +2814,7 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
         return EAGAIN;
     }

-    nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->common.dpdk_mp->mp,
+    nb_rx = rte_vhost_dequeue_burst(vid, qid, dev->common.mp,
                                     (struct rte_mbuf **) batch->packets,
                                     NETDEV_MAX_BURST);
     if (!nb_rx) {
@@ -3121,8 +3124,7 @@ dpdk_copy_batch_to_mbuf(struct netdev_dpdk_common *common,
         } else {
             struct dp_packet *pktcopy;

-            pktcopy = dpdk_copy_dp_packet_to_mbuf(
-                common->dpdk_mp->mp, packet);
+            pktcopy = dpdk_copy_dp_packet_to_mbuf(common->mp, packet);
             if (pktcopy) {
                 dp_packet_batch_refill(batch, pktcopy, i);
             }
@@ -4654,11 +4656,11 @@ netdev_dpdk_get_mempool_info(struct unixctl_conn *conn,
         ovs_mutex_lock(&dev->common.mutex);
         ovs_mutex_lock(&dpdk_mp_mutex);

-        if (dev->common.dpdk_mp) {
-            rte_mempool_dump(stream, dev->common.dpdk_mp->mp);
+        if (dev->common.mp) {
+            rte_mempool_dump(stream, dev->common.mp);
             fprintf(stream, " count: avail (%u), in use (%u)\n",
-                    rte_mempool_avail_count(dev->common.dpdk_mp->mp),
-                    rte_mempool_in_use_count(dev->common.dpdk_mp->mp));
+                    rte_mempool_avail_count(dev->common.mp),
+                    rte_mempool_in_use_count(dev->common.mp));
         } else {
             error = "Not allocated";
         }
@@ -4783,7 +4785,7 @@ new_device(int vid)
             if (dev->common.requested_n_txq < qp_num
                 || dev->common.requested_n_rxq < qp_num
                 || dev->common.requested_socket_id != newnode
-                || dev->common.dpdk_mp == NULL) {
+                || dev->common.mp == NULL) {
                 dev->common.requested_socket_id = newnode;
                 dev->common.requested_n_rxq = qp_num;
                 dev->common.requested_n_txq = qp_num;

From patchwork Wed Apr 1 09:13:15 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 2218457
X-Patchwork-Delegate: echaudro@redhat.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=Nvidia.com header.i=@Nvidia.com header.a=rsa-sha256 header.s=selector2 header.b=QNAcx5d5; dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org)
Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4flzrH5WN7z1yGH for ; Wed, 01 Apr
 2026 20:16:31 +1100 (AEDT)
From: Eli Britstein
Date: Wed, 1 Apr 2026 12:13:15 +0300
Message-ID: <20260401091318.2671624-9-elibr@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com>
References: <20260401091318.2671624-1-elibr@nvidia.com>
MIME-Version: 1.0
X-Forefront-Antispam-Report: CIP:216.228.117.161; CTRY:US;
LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:dc6edge2.nvidia.com; CAT:NONE; SFS:(13230040)(82310400026)(1800799024)(376014)(36860700016)(18002099003)(22082099003)(56012099003); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: v5av7yBdrqelU2+bFTjxGR6eDB6pppcEssb3N3frETLhYAZZLWRtDOxjEvq3Nm5pYUtc9FX0LyU1nt3wCNbMQllDpq8BcHus94JP7NFsI/Q6H6fFgIFaapkPDoPGm17BiYngWfLC4StofPJsMIAMfy2RI2XKUxcmA5SXcikvzzoS4ZWX64A5F7aDvDZlfz2CMjVsAmUDHOgIwAfw0ZfDFFsJnZsjRnC14koWa2oQMTghy0VXrBwSovRB4xrpXBaugBEmWjsqGS2X+yJyKX1meq3KofLJRhUFeWZNlGDgjgLSyly1x9dyGsR9WE8/++MvpjmxXIoixUrjLN82VODxEma06h/m6+hQGEIKekXoGOyv2UGANQJAqV5pTtzg7NnNQVThkfO3s6rNW9BuS5N0kd6kn8Y/GhJqrtcIAnj+q3sZodaVRXLmA8NYIx1y30A0 X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 09:15:07.0726 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 7616278f-2192-4484-27a3-08de8fcf2acc X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.117.161]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: SA2PEPF000015C8.namprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: DM6PR12MB4402 Subject: [ovs-dev] [PATCH v3 08/11] netdev-dpdk: Refactor common functions for reuse by netdev-doca. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Eli Britstein , Ilya Maximets , David Marchand , Maor Dickman Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Refactor common functions from netdev-dpdk to be declared in netdev-dpdk-private to be reused by netdev-doca. 
Signed-off-by: Eli Britstein --- lib/netdev-dpdk-private.h | 108 ++++++ lib/netdev-dpdk.c | 692 +++++++++++++++++++++----------------- 2 files changed, 492 insertions(+), 308 deletions(-) diff --git a/lib/netdev-dpdk-private.h b/lib/netdev-dpdk-private.h index 083ddacb3..1b33c27a4 100644 --- a/lib/netdev-dpdk-private.h +++ b/lib/netdev-dpdk-private.h @@ -64,6 +64,16 @@ extern const struct rte_eth_conf port_conf; typedef uint16_t dpdk_port_t; #define DPDK_PORT_ID_FMT "%"PRIu16 +struct dp_packet; +struct dp_packet_batch; +struct eth_addr; +struct netdev; +struct netdev_stats; +struct rte_eth_xstat; +struct rte_eth_xstat_name; +struct smap; +enum netdev_features; + /* Enums. */ enum dpdk_hw_ol_features { @@ -84,6 +94,11 @@ enum dpdk_hw_ol_features { /* Structs. */ +struct netdev_dpdk_watchdog_params { + struct ovs_mutex *mutex; + struct ovs_list *list; +}; + #ifndef NETDEV_DPDK_TX_Q_TYPE #error "NETDEV_DPDK_TX_Q_TYPE must be defined before" \ "including netdev-dpdk-private.h" @@ -104,6 +119,12 @@ struct netdev_rxq_dpdk { dpdk_port_t port_id; }; +static inline struct netdev_rxq_dpdk * +netdev_rxq_dpdk_cast(const struct netdev_rxq *rxq) +{ + return CONTAINER_OF(rxq, struct netdev_rxq_dpdk, up); +} + struct netdev_dpdk_common { PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline0, uint16_t port_id; @@ -179,4 +200,91 @@ dpdk_dev_is_started(struct netdev_dpdk_common *common) return started; } +/* Common functions shared between netdev-dpdk and netdev-doca. */ + +/* Type-independent helpers. 
*/ +struct rte_mempool *netdev_dpdk_mp_create_pool(const char *pool_name, + uint32_t n_mbufs, + uint32_t mbuf_size, + int socket_id); +uint32_t netdev_dpdk_buf_size(int mtu); +size_t netdev_dpdk_copy_batch_to_mbuf(struct netdev_dpdk_common *common, + struct dp_packet_batch *batch); +const char *netdev_dpdk_link_speed_to_str__(uint32_t link_speed); +void netdev_dpdk_mbuf_dump(const char *prefix, const char *message, + const struct rte_mbuf *mbuf); + +/* Functions operating on struct netdev_dpdk_common. */ +void netdev_dpdk_detect_hw_ol_features(struct netdev_dpdk_common *common, + const struct rte_eth_dev_info *info); +void netdev_dpdk_build_port_conf(struct netdev_dpdk_common *common, + const struct rte_eth_dev_info *info, + struct rte_eth_conf *conf); +void netdev_dpdk_check_link_status(struct netdev_dpdk_common *common); + +void *netdev_dpdk_watchdog(void *params); + +void netdev_dpdk_update_netdev_flags(struct netdev_dpdk_common *common); +void netdev_dpdk_clear_xstats(struct netdev_dpdk_common *common); +void netdev_dpdk_configure_xstats(struct netdev_dpdk_common *common); +void netdev_dpdk_set_rxq_config(struct netdev_dpdk_common *common, + const struct smap *args); +int netdev_dpdk_prep_hwol_batch(struct netdev_dpdk_common *common, + struct rte_mbuf **pkts, int pkt_cnt); +int netdev_dpdk_filter_packet_len(struct netdev_dpdk_common *common, + struct rte_mbuf **pkts, int pkt_cnt); +int netdev_dpdk_eth_tx_burst(struct netdev_dpdk_common *common, + dpdk_port_t port_id, int qid, + struct rte_mbuf **pkts, int cnt); +void netdev_dpdk_get_config_common(struct netdev_dpdk_common *common, + struct smap *args); +struct netdev_dpdk_common * +netdev_dpdk_lookup_by_port_id__(dpdk_port_t port_id, + struct ovs_list *list); +dpdk_port_t netdev_dpdk_get_port_by_devargs(const char *devargs); + +/* Rxq ops shared between dpdk and doca. 
*/ +struct netdev_rxq *netdev_dpdk_rxq_alloc(void); +int netdev_dpdk_rxq_construct(struct netdev_rxq *rxq); +void netdev_dpdk_rxq_destruct(struct netdev_rxq *rxq); +void netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq); + +/* Netdev provider ops usable by both dpdk and doca. */ + +int netdev_dpdk_get_numa_id(const struct netdev *netdev); +int netdev_dpdk_set_tx_multiq(struct netdev *netdev, unsigned int n_txq); +int netdev_dpdk_set_etheraddr__(struct netdev_dpdk_common *common, + const struct eth_addr mac); +int netdev_dpdk_update_flags(struct netdev *netdev, + enum netdev_flags off, enum netdev_flags on, + enum netdev_flags *old_flagsp); +int netdev_dpdk_update_flags__(struct netdev_dpdk_common *common, + enum netdev_flags off, enum netdev_flags on, + enum netdev_flags *old_flagsp); +int netdev_dpdk_set_etheraddr(struct netdev *netdev, + const struct eth_addr mac); +int netdev_dpdk_get_etheraddr(const struct netdev *netdev, + struct eth_addr *mac); +int netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup); +int netdev_dpdk_get_ifindex(const struct netdev *netdev); +int netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier); +long long int netdev_dpdk_get_carrier_resets(const struct netdev *netdev); +int netdev_dpdk_set_miimon(struct netdev *netdev, long long int interval); +int netdev_dpdk_get_speed(const struct netdev *netdev, uint32_t *current, + uint32_t *max); +int netdev_dpdk_get_features(const struct netdev *netdev, + enum netdev_features *current, + enum netdev_features *advertised, + enum netdev_features *supported, + enum netdev_features *peer); +void netdev_dpdk_convert_xstats(struct netdev_stats *stats, + const struct rte_eth_xstat *xstats, + const struct rte_eth_xstat_name *names, + const unsigned int size); +int netdev_dpdk_get_stats(const struct netdev *netdev, + struct netdev_stats *stats); +int netdev_dpdk_get_status__(const struct netdev *netdev, + struct ovs_mutex *dev_mutex, + struct smap *args); + #endif /* 
NETDEV_DPDK_PRIVATE_H */ diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 2562eb4b4..dbf988de4 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -452,17 +452,11 @@ static void netdev_dpdk_vhost_destruct(struct netdev *netdev); static int netdev_dpdk_get_sw_custom_stats(const struct netdev *, struct netdev_custom_stats *); -static void netdev_dpdk_configure_xstats(struct netdev_dpdk_common *common); -static void netdev_dpdk_clear_xstats(struct netdev_dpdk_common *common); - int netdev_dpdk_get_vid(const struct netdev_dpdk *dev); struct ingress_policer * netdev_dpdk_get_ingress_policer(const struct netdev_dpdk *dev); -static void netdev_dpdk_mbuf_dump(const char *prefix, const char *message, - const struct rte_mbuf *); - static bool is_dpdk_class(const struct netdev_class *class) { @@ -479,8 +473,8 @@ is_dpdk_class(const struct netdev_class *class) * behaviour, which reduces performance. To prevent this, use a buffer size * that is closest to 'mtu', but which satisfies the aforementioned criteria. */ -static uint32_t -dpdk_buf_size(int mtu) +uint32_t +netdev_dpdk_buf_size(int mtu) { return ROUND_UP(MTU_TO_MAX_FRAME_LEN(mtu), NETDEV_DPDK_MBUF_ALIGN) + RTE_PKTMBUF_HEADROOM; @@ -630,6 +624,49 @@ dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu) return n_mbufs; } +struct rte_mempool * +netdev_dpdk_mp_create_pool(const char *pool_name, uint32_t n_mbufs, + uint32_t mbuf_size, int socket_id) +{ + uint32_t mbuf_priv_data_len; + uint32_t aligned_mbuf_size; + struct rte_mempool *mp; + uint32_t pkt_size; + + /* The size of the mbuf's private area (i.e. area that holds OvS' + * dp_packet data)*/ + mbuf_priv_data_len = sizeof(struct dp_packet) - + sizeof(struct rte_mbuf); + /* The size of the entire dp_packet. */ + pkt_size = sizeof(struct dp_packet) + mbuf_size; + /* mbuf size, rounded up to cacheline size. */ + aligned_mbuf_size = ROUND_UP(pkt_size, RTE_CACHE_LINE_SIZE); + /* If there is a size discrepancy, add padding to mbuf_priv_data_len. 
+ * This maintains mbuf size cache alignment, while also honoring RX + * buffer alignment in the data portion of the mbuf. If this adjustment + * is not made, there is a possiblity later on that for an element of + * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off). + * This is problematic in the case of multi-segment mbufs, particularly + * when an mbuf segment needs to be resized (when [push|popp]ing a VLAN + * header, for example. + */ + mbuf_priv_data_len += (aligned_mbuf_size - pkt_size); + + mp = rte_pktmbuf_pool_create(pool_name, n_mbufs, MP_CACHE_SZ, + mbuf_priv_data_len, mbuf_size, + socket_id); + + if (mp) { + /* rte_pktmbuf_pool_create has done some initialization of the + * rte_mbuf part of each dp_packet, while ovs_rte_pktmbuf_init + * initializes some OVS specific fields of dp_packet. + */ + rte_mempool_obj_iter(mp, ovs_rte_pktmbuf_init, NULL); + } + + return mp; +} + static struct dpdk_mp * dpdk_mp_create(struct netdev_dpdk *dev, int mtu) { @@ -638,9 +675,6 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) int socket_id = dev->common.requested_socket_id; uint32_t n_mbufs = 0; uint32_t mbuf_size = 0; - uint32_t aligned_mbuf_size = 0; - uint32_t mbuf_priv_data_len = 0; - uint32_t pkt_size = 0; uint32_t hash = hash_string(netdev_name, 0); struct dpdk_mp *dmp = NULL; int ret; @@ -659,13 +693,6 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) n_mbufs = dpdk_calculate_mbufs(dev, mtu); do { - /* Full DPDK memory pool name must be unique and cannot be - * longer than RTE_MEMPOOL_NAMESIZE. Note that for the shared - * mempool case this can result in one device using a mempool - * which references a different device in it's name. However as - * mempool names are hashed, the device name will not be readable - * so this is not an issue for tasks such as debugging. 
- */ ret = snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs%08x%02d%05d%07u", hash, socket_id, mtu, n_mbufs); @@ -684,38 +711,12 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu) dev->common.requested_n_rxq, dev->common.requested_n_txq, RTE_CACHE_LINE_SIZE); - /* The size of the mbuf's private area (i.e. area that holds OvS' - * dp_packet data)*/ - mbuf_priv_data_len = sizeof(struct dp_packet) - - sizeof(struct rte_mbuf); - /* The size of the entire dp_packet. */ - pkt_size = sizeof(struct dp_packet) + mbuf_size; - /* mbuf size, rounded up to cacheline size. */ - aligned_mbuf_size = ROUND_UP(pkt_size, RTE_CACHE_LINE_SIZE); - /* If there is a size discrepancy, add padding to mbuf_priv_data_len. - * This maintains mbuf size cache alignment, while also honoring RX - * buffer alignment in the data portion of the mbuf. If this adjustment - * is not made, there is a possiblity later on that for an element of - * the mempool, buf, buf->data_len < (buf->buf_len - buf->data_off). - * This is problematic in the case of multi-segment mbufs, particularly - * when an mbuf segment needs to be resized (when [push|popp]ing a VLAN - * header, for example. - */ - mbuf_priv_data_len += (aligned_mbuf_size - pkt_size); - - dmp->mp = rte_pktmbuf_pool_create(mp_name, n_mbufs, MP_CACHE_SZ, - mbuf_priv_data_len, - mbuf_size, - socket_id); + dmp->mp = netdev_dpdk_mp_create_pool(mp_name, n_mbufs, mbuf_size, + socket_id); if (dmp->mp) { VLOG_DBG("Allocated \"%s\" mempool with %u mbufs", mp_name, n_mbufs); - /* rte_pktmbuf_pool_create has done some initialization of the - * rte_mbuf part of each dp_packet, while ovs_rte_pktmbuf_init - * initializes some OVS specific fields of dp_packet. - */ - rte_mempool_obj_iter(dmp->mp, ovs_rte_pktmbuf_init, NULL); return dmp; } else if (rte_errno == EEXIST) { /* A mempool with the same name already exists. 
We just @@ -821,7 +822,7 @@ static int netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) OVS_REQUIRES(dev->common.mutex) { - uint32_t buf_size = dpdk_buf_size(dev->common.requested_mtu); + uint32_t buf_size = netdev_dpdk_buf_size(dev->common.requested_mtu); struct dpdk_mp *dmp; int ret = 0; @@ -866,8 +867,8 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev) return ret; } -static void -check_link_status(struct netdev_dpdk_common *common) +void +netdev_dpdk_check_link_status(struct netdev_dpdk_common *common) { struct rte_eth_link link; @@ -902,21 +903,24 @@ check_link_status(struct netdev_dpdk_common *common) } } -static void * -dpdk_watchdog(void *dummy OVS_UNUSED) +void * +netdev_dpdk_watchdog(void *args_) { + struct netdev_dpdk_watchdog_params *params = args_; struct netdev_dpdk_common *common; + ovs_assert(params); + pthread_detach(pthread_self()); for (;;) { - ovs_mutex_lock(&dpdk_mutex); - LIST_FOR_EACH (common, list_node, &dpdk_list) { + ovs_mutex_lock(params->mutex); + LIST_FOR_EACH (common, list_node, params->list) { ovs_mutex_lock(&common->mutex); - check_link_status(common); + netdev_dpdk_check_link_status(common); ovs_mutex_unlock(&common->mutex); } - ovs_mutex_unlock(&dpdk_mutex); + ovs_mutex_unlock(params->mutex); xsleep(DPDK_PORT_WATCHDOG_INTERVAL); } @@ -936,7 +940,7 @@ netdev_dpdk_update_netdev_flag(struct netdev_dpdk_common *common, } } -static void +void netdev_dpdk_update_netdev_flags(struct netdev_dpdk_common *common) OVS_REQUIRES(common->mutex) { @@ -962,85 +966,192 @@ netdev_dpdk_update_netdev_flags(struct netdev_dpdk_common *common) NETDEV_TX_OFFLOAD_OUTER_UDP_CKSUM); } -static int -dpdk_eth_dev_port_config(struct netdev_dpdk_common *common, - const struct rte_eth_dev_info *info, - int n_rxq, int n_txq) +void +netdev_dpdk_detect_hw_ol_features(struct netdev_dpdk_common *common, + const struct rte_eth_dev_info *info) + OVS_REQUIRES(common->mutex) { - struct rte_eth_conf conf = port_conf; - uint16_t conf_mtu; - int diag = 0; - int 
i; + uint32_t rx_chksm_offload_capa = RTE_ETH_RX_OFFLOAD_UDP_CKSUM | + RTE_ETH_RX_OFFLOAD_TCP_CKSUM | + RTE_ETH_RX_OFFLOAD_IPV4_CKSUM; + + if (strstr(info->driver_name, "vf") != NULL) { + VLOG_INFO("Virtual function detected, HW_CRC_STRIP will be enabled"); + common->hw_ol_features |= NETDEV_RX_HW_CRC_STRIP; + } else { + common->hw_ol_features &= ~NETDEV_RX_HW_CRC_STRIP; + } + if ((info->rx_offload_capa & rx_chksm_offload_capa) != + rx_chksm_offload_capa) { + VLOG_WARN("Rx checksum offload is not supported on port " + DPDK_PORT_ID_FMT, common->port_id); + common->hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; + } else { + common->hw_ol_features |= NETDEV_RX_CHECKSUM_OFFLOAD; + } + + if (info->rx_offload_capa & RTE_ETH_RX_OFFLOAD_SCATTER) { + common->hw_ol_features |= NETDEV_RX_HW_SCATTER; + } else { + common->hw_ol_features &= ~NETDEV_RX_HW_SCATTER; + } + + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM) { + common->hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; + } else { + common->hw_ol_features &= ~NETDEV_TX_IPV4_CKSUM_OFFLOAD; + } + + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) { + common->hw_ol_features |= NETDEV_TX_TCP_CKSUM_OFFLOAD; + } else { + common->hw_ol_features &= ~NETDEV_TX_TCP_CKSUM_OFFLOAD; + } + + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_UDP_CKSUM) { + common->hw_ol_features |= NETDEV_TX_UDP_CKSUM_OFFLOAD; + } else { + common->hw_ol_features &= ~NETDEV_TX_UDP_CKSUM_OFFLOAD; + } + + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_SCTP_CKSUM) { + common->hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD; + } else { + common->hw_ol_features &= ~NETDEV_TX_SCTP_CKSUM_OFFLOAD; + } + + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM) { + common->hw_ol_features |= NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD; + } else { + common->hw_ol_features &= ~NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD; + } + + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM) { + common->hw_ol_features |= NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD; + } 
else { + common->hw_ol_features &= ~NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD; + } + + common->hw_ol_features &= ~NETDEV_TX_TSO_OFFLOAD; + if (userspace_tso_enabled()) { + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_TCP_TSO) { + common->hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; + } else { + VLOG_WARN("%s: Tx TSO offload is not supported.", + netdev_get_name(&common->up)); + } + + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO) { + common->hw_ol_features |= NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD; + } else { + VLOG_WARN("%s: Tx Vxlan tunnel TSO offload is not supported.", + netdev_get_name(&common->up)); + } + + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO) { + common->hw_ol_features |= NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD; + } else { + VLOG_WARN("%s: Tx Geneve tunnel TSO offload is not supported.", + netdev_get_name(&common->up)); + } + + if (info->tx_offload_capa & RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO) { + common->hw_ol_features |= NETDEV_TX_GRE_TNL_TSO_OFFLOAD; + } else { + VLOG_WARN("%s: Tx GRE tunnel TSO offload is not supported.", + netdev_get_name(&common->up)); + } + } +} + +void +netdev_dpdk_build_port_conf(struct netdev_dpdk_common *common, + const struct rte_eth_dev_info *info, + struct rte_eth_conf *conf) +{ /* As of DPDK 17.11.1 a few PMDs require to explicitly enable * scatter to support jumbo RX. * Setting scatter for the device is done after checking for * scatter support in the device capabilites. 
*/ if (common->mtu > RTE_ETHER_MTU) { if (common->hw_ol_features & NETDEV_RX_HW_SCATTER) { - conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_SCATTER; + conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_SCATTER; } } - conf.intr_conf.lsc = common->lsc_interrupt_mode; + conf->intr_conf.lsc = common->lsc_interrupt_mode; if (common->hw_ol_features & NETDEV_RX_CHECKSUM_OFFLOAD) { - conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_CHECKSUM; + conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_CHECKSUM; } if (!(common->hw_ol_features & NETDEV_RX_HW_CRC_STRIP) && info->rx_offload_capa & RTE_ETH_RX_OFFLOAD_KEEP_CRC) { - conf.rxmode.offloads |= RTE_ETH_RX_OFFLOAD_KEEP_CRC; + conf->rxmode.offloads |= RTE_ETH_RX_OFFLOAD_KEEP_CRC; } if (common->hw_ol_features & NETDEV_TX_IPV4_CKSUM_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_IPV4_CKSUM; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_IPV4_CKSUM; } if (common->hw_ol_features & NETDEV_TX_TCP_CKSUM_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_CKSUM; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_CKSUM; } if (common->hw_ol_features & NETDEV_TX_UDP_CKSUM_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_UDP_CKSUM; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_UDP_CKSUM; } if (common->hw_ol_features & NETDEV_TX_SCTP_CKSUM_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_SCTP_CKSUM; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_SCTP_CKSUM; } if (common->hw_ol_features & NETDEV_TX_TSO_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_TSO; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_TCP_TSO; } if (common->hw_ol_features & NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO; } if (common->hw_ol_features & NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO; } if (common->hw_ol_features & 
NETDEV_TX_GRE_TNL_TSO_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO; } if (common->hw_ol_features & NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM; } if (common->hw_ol_features & NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD) { - conf.txmode.offloads |= RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; + conf->txmode.offloads |= RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM; } /* Limit configured rss hash functions to only those supported * by the eth device. */ - conf.rx_adv_conf.rss_conf.rss_hf &= info->flow_type_rss_offloads; - if (conf.rx_adv_conf.rss_conf.rss_hf == 0) { - conf.rxmode.mq_mode = RTE_ETH_MQ_RX_NONE; + conf->rx_adv_conf.rss_conf.rss_hf &= info->flow_type_rss_offloads; + if (conf->rx_adv_conf.rss_conf.rss_hf == 0) { + conf->rxmode.mq_mode = RTE_ETH_MQ_RX_NONE; } else { - conf.rxmode.mq_mode = RTE_ETH_MQ_RX_RSS; + conf->rxmode.mq_mode = RTE_ETH_MQ_RX_RSS; } +} + +static int +dpdk_eth_dev_port_config(struct netdev_dpdk_common *common, + const struct rte_eth_dev_info *info, + int n_rxq, int n_txq) +{ + struct rte_eth_conf conf = port_conf; + uint16_t conf_mtu; + int diag = 0; + int i; + + netdev_dpdk_build_port_conf(common, info, &conf); /* A device may report more queues than it makes available (this has * been observed for Intel xl710, which reserves some of them for @@ -1179,9 +1290,6 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) struct rte_ether_addr eth_addr; int diag; int n_rxq, n_txq; - uint32_t rx_chksm_offload_capa = RTE_ETH_RX_OFFLOAD_UDP_CKSUM | - RTE_ETH_RX_OFFLOAD_TCP_CKSUM | - RTE_ETH_RX_OFFLOAD_IPV4_CKSUM; if (dpif_offload_enabled()) { /* @@ -1204,95 +1312,7 @@ dpdk_eth_dev_init(struct netdev_dpdk *dev) dev->common.is_representor = !!(*info.dev_flags & RTE_ETH_DEV_REPRESENTOR); - if (strstr(info.driver_name, "vf") != NULL) { - VLOG_INFO("Virtual function detected, HW_CRC_STRIP 
will be enabled"); - dev->common.hw_ol_features |= NETDEV_RX_HW_CRC_STRIP; - } else { - dev->common.hw_ol_features &= ~NETDEV_RX_HW_CRC_STRIP; - } - - if ((info.rx_offload_capa & rx_chksm_offload_capa) != - rx_chksm_offload_capa) { - VLOG_WARN("Rx checksum offload is not supported on port " - DPDK_PORT_ID_FMT, dev->common.port_id); - dev->common.hw_ol_features &= ~NETDEV_RX_CHECKSUM_OFFLOAD; - } else { - dev->common.hw_ol_features |= NETDEV_RX_CHECKSUM_OFFLOAD; - } - - if (info.rx_offload_capa & RTE_ETH_RX_OFFLOAD_SCATTER) { - dev->common.hw_ol_features |= NETDEV_RX_HW_SCATTER; - } else { - /* Do not warn on lack of scatter support */ - dev->common.hw_ol_features &= ~NETDEV_RX_HW_SCATTER; - } - - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_IPV4_CKSUM) { - dev->common.hw_ol_features |= NETDEV_TX_IPV4_CKSUM_OFFLOAD; - } else { - dev->common.hw_ol_features &= ~NETDEV_TX_IPV4_CKSUM_OFFLOAD; - } - - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_TCP_CKSUM) { - dev->common.hw_ol_features |= NETDEV_TX_TCP_CKSUM_OFFLOAD; - } else { - dev->common.hw_ol_features &= ~NETDEV_TX_TCP_CKSUM_OFFLOAD; - } - - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_UDP_CKSUM) { - dev->common.hw_ol_features |= NETDEV_TX_UDP_CKSUM_OFFLOAD; - } else { - dev->common.hw_ol_features &= ~NETDEV_TX_UDP_CKSUM_OFFLOAD; - } - - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_SCTP_CKSUM) { - dev->common.hw_ol_features |= NETDEV_TX_SCTP_CKSUM_OFFLOAD; - } else { - dev->common.hw_ol_features &= ~NETDEV_TX_SCTP_CKSUM_OFFLOAD; - } - - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_OUTER_IPV4_CKSUM) { - dev->common.hw_ol_features |= NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD; - } else { - dev->common.hw_ol_features &= ~NETDEV_TX_OUTER_IP_CKSUM_OFFLOAD; - } - - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_OUTER_UDP_CKSUM) { - dev->common.hw_ol_features |= NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD; - } else { - dev->common.hw_ol_features &= ~NETDEV_TX_OUTER_UDP_CKSUM_OFFLOAD; - } - - dev->common.hw_ol_features &= 
~NETDEV_TX_TSO_OFFLOAD; - if (userspace_tso_enabled()) { - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_TCP_TSO) { - dev->common.hw_ol_features |= NETDEV_TX_TSO_OFFLOAD; - } else { - VLOG_WARN("%s: Tx TSO offload is not supported.", - netdev_get_name(&dev->common.up)); - } - - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_VXLAN_TNL_TSO) { - dev->common.hw_ol_features |= NETDEV_TX_VXLAN_TNL_TSO_OFFLOAD; - } else { - VLOG_WARN("%s: Tx Vxlan tunnel TSO offload is not supported.", - netdev_get_name(&dev->common.up)); - } - - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_GENEVE_TNL_TSO) { - dev->common.hw_ol_features |= NETDEV_TX_GENEVE_TNL_TSO_OFFLOAD; - } else { - VLOG_WARN("%s: Tx Geneve tunnel TSO offload is not supported.", - netdev_get_name(&dev->common.up)); - } - - if (info.tx_offload_capa & RTE_ETH_TX_OFFLOAD_GRE_TNL_TSO) { - dev->common.hw_ol_features |= NETDEV_TX_GRE_TNL_TSO_OFFLOAD; - } else { - VLOG_WARN("%s: Tx GRE tunnel TSO offload is not supported.", - netdev_get_name(&dev->common.up)); - } - } + netdev_dpdk_detect_hw_ol_features(&dev->common, &info); n_rxq = MIN(info.max_rx_queues, dev->common.up.n_rxq); n_txq = MIN(info.max_tx_queues, dev->common.up.n_txq); @@ -1742,7 +1762,7 @@ netdev_dpdk_dealloc(struct netdev *netdev) rte_free(dev); } -static void +void netdev_dpdk_clear_xstats(struct netdev_dpdk_common *common) OVS_REQUIRES(common->mutex) { @@ -1774,7 +1794,7 @@ is_queue_stat(const char *s) ovs_scan(s + 1, "x_q%"SCNu16"_bytes", &tmp)); } -static void +void netdev_dpdk_configure_xstats(struct netdev_dpdk_common *common) OVS_REQUIRES(common->mutex) { @@ -1842,46 +1862,54 @@ out: free(rte_xstats_names); } -static int -netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args) +void +netdev_dpdk_get_config_common(struct netdev_dpdk_common *common, + struct smap *args) + OVS_REQUIRES(common->mutex) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); - - ovs_mutex_lock(&dev->common.mutex); - - if (dev->common.devargs && 
dev->common.devargs[0]) { - smap_add_format(args, "dpdk-devargs", "%s", dev->common.devargs); + if (common->devargs && common->devargs[0]) { + smap_add_format(args, "dpdk-devargs", "%s", common->devargs); } - smap_add_format(args, "n_rxq", "%d", dev->common.user_n_rxq); + smap_add_format(args, "n_rxq", "%d", common->user_n_rxq); - if (dev->common.fc_conf.mode == RTE_ETH_FC_TX_PAUSE || - dev->common.fc_conf.mode == RTE_ETH_FC_FULL) { + if (common->fc_conf.mode == RTE_ETH_FC_TX_PAUSE || + common->fc_conf.mode == RTE_ETH_FC_FULL) { smap_add(args, "rx-flow-ctrl", "true"); } - if (dev->common.fc_conf.mode == RTE_ETH_FC_RX_PAUSE || - dev->common.fc_conf.mode == RTE_ETH_FC_FULL) { + if (common->fc_conf.mode == RTE_ETH_FC_RX_PAUSE || + common->fc_conf.mode == RTE_ETH_FC_FULL) { smap_add(args, "tx-flow-ctrl", "true"); } - if (dev->common.fc_conf.autoneg) { + if (common->fc_conf.autoneg) { smap_add(args, "flow-ctrl-autoneg", "true"); } - smap_add_format(args, "n_rxq_desc", "%d", dev->common.rxq_size); - smap_add_format(args, "n_txq_desc", "%d", dev->common.txq_size); - - if (dev->rx_steer_flags == DPDK_RX_STEER_LACP) { - smap_add(args, "rx-steering", "rss+lacp"); - } + smap_add_format(args, "n_rxq_desc", "%d", common->rxq_size); + smap_add_format(args, "n_txq_desc", "%d", common->txq_size); smap_add(args, "dpdk-lsc-interrupt", - dev->common.lsc_interrupt_mode ? "true" : "false"); + common->lsc_interrupt_mode ? 
"true" : "false"); - if (dev->common.is_representor) { + if (common->is_representor) { smap_add_format(args, "dpdk-vf-mac", ETH_ADDR_FMT, - ETH_ADDR_ARGS(dev->common.requested_hwaddr)); + ETH_ADDR_ARGS(common->requested_hwaddr)); + } +} + +static int +netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + + ovs_mutex_lock(&dev->common.mutex); + + netdev_dpdk_get_config_common(&dev->common, args); + + if (dev->rx_steer_flags == DPDK_RX_STEER_LACP) { + smap_add(args, "rx-steering", "rss+lacp"); } ovs_mutex_unlock(&dev->common.mutex); @@ -1889,21 +1917,30 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args) return 0; } -static struct netdev_dpdk * -netdev_dpdk_lookup_by_port_id(dpdk_port_t port_id) - OVS_REQUIRES(dpdk_mutex) +struct netdev_dpdk_common * +netdev_dpdk_lookup_by_port_id__(dpdk_port_t port_id, struct ovs_list *list) { - struct netdev_dpdk *dev; + struct netdev_dpdk_common *common; - LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { - if (dev->common.port_id == port_id) { - return dev; + LIST_FOR_EACH (common, list_node, list) { + if (common->port_id == port_id) { + return common; } } return NULL; } +static struct netdev_dpdk * +netdev_dpdk_lookup_by_port_id(dpdk_port_t port_id) + OVS_REQUIRES(dpdk_mutex) +{ + struct netdev_dpdk_common *common; + + common = netdev_dpdk_lookup_by_port_id__(port_id, &dpdk_list); + return common ? CONTAINER_OF(common, struct netdev_dpdk, common) : NULL; +} + static dpdk_port_t netdev_dpdk_get_port_by_mac(const char *mac_str) { @@ -1929,7 +1966,7 @@ netdev_dpdk_get_port_by_mac(const char *mac_str) } /* Return the first DPDK port id matching the devargs pattern. 
*/ -static dpdk_port_t netdev_dpdk_get_port_by_devargs(const char *devargs) +dpdk_port_t netdev_dpdk_get_port_by_devargs(const char *devargs) OVS_REQUIRES(dpdk_mutex) { dpdk_port_t port_id; @@ -2058,8 +2095,8 @@ dpdk_eth_event_callback(dpdk_port_t port_id, enum rte_eth_event_type type, return 0; } -static void -dpdk_set_rxq_config(struct netdev_dpdk_common *common, +void +netdev_dpdk_set_rxq_config(struct netdev_dpdk_common *common, const struct smap *args) OVS_REQUIRES(common->mutex) { @@ -2182,7 +2219,7 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args, dpdk_set_rx_steer_config(netdev, dev, args, errp); - dpdk_set_rxq_config(&dev->common, args); + netdev_dpdk_set_rxq_config(&dev->common, args); new_devargs = smap_get(args, "dpdk-devargs"); @@ -2402,7 +2439,7 @@ netdev_dpdk_vhost_client_set_config(struct netdev *netdev, return 0; } -static int +int netdev_dpdk_get_numa_id(const struct netdev *netdev) { struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); @@ -2411,7 +2448,7 @@ netdev_dpdk_get_numa_id(const struct netdev *netdev) } /* Sets the number of tx queues for the dpdk interface. 
*/ -static int +int netdev_dpdk_set_tx_multiq(struct netdev *netdev, unsigned int n_txq) { struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); @@ -2430,7 +2467,7 @@ out: return 0; } -static struct netdev_rxq * +struct netdev_rxq * netdev_dpdk_rxq_alloc(void) { struct netdev_rxq_dpdk *rx = dpdk_rte_mzalloc(sizeof *rx); @@ -2442,31 +2479,25 @@ netdev_dpdk_rxq_alloc(void) return NULL; } -static struct netdev_rxq_dpdk * -netdev_rxq_dpdk_cast(const struct netdev_rxq *rxq) -{ - return CONTAINER_OF(rxq, struct netdev_rxq_dpdk, up); -} - -static int +int netdev_dpdk_rxq_construct(struct netdev_rxq *rxq) { struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq); - struct netdev_dpdk *dev = netdev_dpdk_cast(rxq->netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(rxq->netdev); - ovs_mutex_lock(&dev->common.mutex); - rx->port_id = dev->common.port_id; - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); + rx->port_id = common->port_id; + ovs_mutex_unlock(&common->mutex); return 0; } -static void +void netdev_dpdk_rxq_destruct(struct netdev_rxq *rxq OVS_UNUSED) { } -static void +void netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq) { struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq); @@ -2638,7 +2669,7 @@ netdev_dpdk_prep_hwol_packet(struct netdev_dpdk_common *common, /* Prepare a batch for HWOL. * Return the number of good packets in the batch. */ -static int +int netdev_dpdk_prep_hwol_batch(struct netdev_dpdk_common *common, struct rte_mbuf **pkts, int pkt_cnt) { @@ -2663,7 +2694,7 @@ netdev_dpdk_prep_hwol_batch(struct netdev_dpdk_common *common, return cnt; } -static void +void netdev_dpdk_mbuf_dump(const char *prefix, const char *message, const struct rte_mbuf *mbuf) { @@ -2696,27 +2727,32 @@ netdev_dpdk_mbuf_dump(const char *prefix, const char *message, * 'pkts', even in case of failure. * * Returns the number of packets that weren't transmitted. 
*/ -static inline int -netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, +int +netdev_dpdk_eth_tx_burst(struct netdev_dpdk_common *common, + dpdk_port_t port_id, int qid, struct rte_mbuf **pkts, int cnt) { uint32_t nb_tx = 0; uint16_t nb_tx_prep = cnt; - nb_tx_prep = rte_eth_tx_prepare(dev->common.port_id, qid, pkts, cnt); + if (OVS_UNLIKELY(!dpdk_dev_is_started(common))) { + goto out; + } + + nb_tx_prep = rte_eth_tx_prepare(port_id, qid, pkts, cnt); if (nb_tx_prep != cnt) { VLOG_WARN_RL(&rl, "%s: Output batch contains invalid packets. " "Only %u/%u are valid: %s", - netdev_get_name(&dev->common.up), + netdev_get_name(&common->up), nb_tx_prep, cnt, rte_strerror(rte_errno)); - netdev_dpdk_mbuf_dump(netdev_get_name(&dev->common.up), + netdev_dpdk_mbuf_dump(netdev_get_name(&common->up), "First invalid packet", pkts[nb_tx_prep]); } while (nb_tx != nb_tx_prep) { uint32_t ret; - ret = rte_eth_tx_burst(dev->common.port_id, qid, pkts + nb_tx, + ret = rte_eth_tx_burst(port_id, qid, pkts + nb_tx, nb_tx_prep - nb_tx); if (!ret) { break; @@ -2725,6 +2761,7 @@ netdev_dpdk_eth_tx_burst(struct netdev_dpdk *dev, int qid, nb_tx += ret; } +out: if (OVS_UNLIKELY(nb_tx != cnt)) { /* Free buffers, which we couldn't transmit. */ rte_pktmbuf_free_bulk(&pkts[nb_tx], cnt - nb_tx); @@ -2926,9 +2963,9 @@ netdev_dpdk_qos_run(struct netdev_dpdk *dev, struct rte_mbuf **pkts, return cnt; } -static int -netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts, - int pkt_cnt) +int +netdev_dpdk_filter_packet_len(struct netdev_dpdk_common *common, + struct rte_mbuf **pkts, int pkt_cnt) { int i = 0; int cnt = 0; @@ -2938,12 +2975,12 @@ netdev_dpdk_filter_packet_len(struct netdev_dpdk *dev, struct rte_mbuf **pkts, * during the offloading preparation for performance reasons. 
*/ for (i = 0; i < pkt_cnt; i++) { pkt = pkts[i]; - if (OVS_UNLIKELY((pkt->pkt_len > dev->common.max_packet_len) + if (OVS_UNLIKELY((pkt->pkt_len > common->max_packet_len) && !pkt->tso_segsz)) { VLOG_WARN_RL(&rl, "%s: Too big size %" PRIu32 " " "max_packet_len %d", - dev->common.up.name, pkt->pkt_len, - dev->common.max_packet_len); + common->up.name, pkt->pkt_len, + common->max_packet_len); rte_pktmbuf_free(pkt); continue; } @@ -3111,8 +3148,8 @@ dpdk_copy_dp_packet_to_mbuf(struct rte_mempool *mp, struct dp_packet *pkt_orig) * DPDK memory. * * Returns the number of good packets in the batch. */ -static size_t -dpdk_copy_batch_to_mbuf(struct netdev_dpdk_common *common, +size_t +netdev_dpdk_copy_batch_to_mbuf(struct netdev_dpdk_common *common, struct dp_packet_batch *batch) { size_t i, size = dp_packet_batch_size(batch); @@ -3157,13 +3194,13 @@ netdev_dpdk_common_send(struct netdev *netdev, struct dp_packet_batch *batch, /* Copy dp-packets to mbufs. */ if (OVS_UNLIKELY(need_copy)) { - cnt = dpdk_copy_batch_to_mbuf(&dev->common, batch); + cnt = netdev_dpdk_copy_batch_to_mbuf(&dev->common, batch); stats->tx_failure_drops += pkt_cnt - cnt; pkt_cnt = cnt; } /* Drop oversized packets. 
*/ - cnt = netdev_dpdk_filter_packet_len(dev, pkts, pkt_cnt); + cnt = netdev_dpdk_filter_packet_len(&dev->common, pkts, pkt_cnt); stats->tx_mtu_exceeded_drops += pkt_cnt - cnt; pkt_cnt = cnt; @@ -3290,7 +3327,8 @@ netdev_dpdk_eth_send(struct netdev *netdev, int qid, cnt = netdev_dpdk_common_send(netdev, batch, &stats); - dropped = netdev_dpdk_eth_tx_burst(dev, qid, pkts, cnt); + dropped = netdev_dpdk_eth_tx_burst(&dev->common, dev->common.port_id, + qid, pkts, cnt); stats.tx_failure_drops += dropped; dropped += batch_cnt - cnt; if (OVS_UNLIKELY(dropped)) { @@ -3312,14 +3350,14 @@ netdev_dpdk_eth_send(struct netdev *netdev, int qid, return 0; } -static int -netdev_dpdk_set_etheraddr__(struct netdev_dpdk *dev, const struct eth_addr mac) - OVS_REQUIRES(dev->common.mutex) +int +netdev_dpdk_set_etheraddr__(struct netdev_dpdk_common *common, + const struct eth_addr mac) + OVS_REQUIRES(common->mutex) { - struct netdev_dpdk_common *common = &dev->common; int err = 0; - if (dev->type == DPDK_DEV_ETH) { + if (common->port_id != DPDK_ETH_PORT_ID_INVALID) { struct rte_ether_addr ea; memcpy(ea.addr_bytes, mac.ea, ETH_ADDR_LEN); @@ -3336,25 +3374,25 @@ netdev_dpdk_set_etheraddr__(struct netdev_dpdk *dev, const struct eth_addr mac) return err; } -static int +int netdev_dpdk_set_etheraddr(struct netdev *netdev, const struct eth_addr mac) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); int err = 0; - ovs_mutex_lock(&dev->common.mutex); - if (!eth_addr_equals(dev->common.hwaddr, mac)) { - err = netdev_dpdk_set_etheraddr__(dev, mac); + ovs_mutex_lock(&common->mutex); + if (!eth_addr_equals(common->hwaddr, mac)) { + err = netdev_dpdk_set_etheraddr__(common, mac); if (!err) { netdev_change_seq_changed(netdev); } } - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_unlock(&common->mutex); return err; } -static int +int netdev_dpdk_get_etheraddr(const struct netdev *netdev, struct eth_addr *mac) { struct 
netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); @@ -3366,7 +3404,7 @@ netdev_dpdk_get_etheraddr(const struct netdev *netdev, struct eth_addr *mac) return 0; } -static int +int netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup) { struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); @@ -3711,7 +3749,7 @@ out: return 0; } -static void +void netdev_dpdk_convert_xstats(struct netdev_stats *stats, const struct rte_eth_xstat *xstats, const struct rte_eth_xstat_name *names, @@ -3754,10 +3792,10 @@ netdev_dpdk_convert_xstats(struct netdev_stats *stats, #undef DPDK_XSTATS } -static int +int netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier); -static int +int netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) { struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); @@ -3767,6 +3805,12 @@ netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) netdev_dpdk_get_carrier(netdev, &gg); ovs_mutex_lock(&common->mutex); + if (!dpdk_dev_is_started(common)) { + memset(stats, 0, sizeof *stats); + ovs_mutex_unlock(&common->mutex); + return 0; + } + struct rte_eth_xstat *rte_xstats = NULL; struct rte_eth_xstat_name *rte_xstats_names = NULL; int rte_xstats_len, rte_xstats_new_len, rte_xstats_ret; @@ -3789,7 +3833,7 @@ netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) rte_xstats_names = xcalloc(rte_xstats_len, sizeof *rte_xstats_names); rte_xstats = xcalloc(rte_xstats_len, sizeof *rte_xstats); - /* Retreive xstats names */ + /* Retrieve 'xstats' names. */ rte_xstats_new_len = rte_eth_xstats_get_names(common->port_id, rte_xstats_names, rte_xstats_len); @@ -3798,7 +3842,7 @@ netdev_dpdk_get_stats(const struct netdev *netdev, struct netdev_stats *stats) common->port_id); goto out; } - /* Retreive xstats values */ + /* Retrieve 'xstats' values. 
*/ memset(rte_xstats, 0xff, sizeof *rte_xstats * rte_xstats_len); rte_xstats_ret = rte_eth_xstats_get(common->port_id, rte_xstats, rte_xstats_len); @@ -3937,7 +3981,7 @@ netdev_dpdk_get_sw_custom_stats(const struct netdev *netdev, return 0; } -static int +int netdev_dpdk_get_features(const struct netdev *netdev, enum netdev_features *current, enum netdev_features *advertised, @@ -4002,7 +4046,7 @@ netdev_dpdk_get_features(const struct netdev *netdev, return 0; } -static int +int netdev_dpdk_get_speed(const struct netdev *netdev, uint32_t *current, uint32_t *max) { @@ -4013,7 +4057,12 @@ netdev_dpdk_get_speed(const struct netdev *netdev, uint32_t *current, ovs_mutex_lock(&common->mutex); link = common->link; - diag = rte_eth_dev_info_get(common->port_id, &dev_info); + if (dpdk_dev_is_started(common)) { + diag = rte_eth_dev_info_get(common->port_id, &dev_info); + } else { + memset(&dev_info, 0, sizeof dev_info); + diag = -ENODEV; + } ovs_mutex_unlock(&common->mutex); *current = link.link_speed != RTE_ETH_SPEED_NUM_UNKNOWN @@ -4158,7 +4207,7 @@ netdev_dpdk_set_policing(struct netdev* netdev, uint32_t policer_rate, return 0; } -static int +int netdev_dpdk_get_ifindex(const struct netdev *netdev) { struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); @@ -4173,13 +4222,13 @@ netdev_dpdk_get_ifindex(const struct netdev *netdev) return ifindex; } -static int +int netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier) { struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); ovs_mutex_lock(&common->mutex); - check_link_status(common); + netdev_dpdk_check_link_status(common); *carrier = common->link.link_status; ovs_mutex_unlock(&common->mutex); @@ -4205,7 +4254,7 @@ netdev_dpdk_vhost_get_carrier(const struct netdev *netdev, bool *carrier) return 0; } -static long long int +long long int netdev_dpdk_get_carrier_resets(const struct netdev *netdev) { struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); @@ -4218,21 
+4267,19 @@ netdev_dpdk_get_carrier_resets(const struct netdev *netdev) return carrier_resets; } -static int +int netdev_dpdk_set_miimon(struct netdev *netdev OVS_UNUSED, long long int interval OVS_UNUSED) { return EOPNOTSUPP; } -static int -netdev_dpdk_update_flags__(struct netdev_dpdk *dev, +int +netdev_dpdk_update_flags__(struct netdev_dpdk_common *common, enum netdev_flags off, enum netdev_flags on, enum netdev_flags *old_flagsp) - OVS_REQUIRES(dev->common.mutex) + OVS_REQUIRES(common->mutex) { - struct netdev_dpdk_common *common = &dev->common; - if ((off | on) & ~(NETDEV_UP | NETDEV_PROMISC)) { return EINVAL; } @@ -4245,9 +4292,8 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, return 0; } - if (dev->type == DPDK_DEV_ETH) { - - if ((dev->common.flags ^ *old_flagsp) & NETDEV_UP) { + if (common->port_id != DPDK_ETH_PORT_ID_INVALID) { + if ((common->flags ^ *old_flagsp) & NETDEV_UP) { int err; if (common->flags & NETDEV_UP) { @@ -4272,6 +4318,8 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, netdev_change_seq_changed(&common->up); } else { + struct netdev_dpdk *dev = netdev_dpdk_cast(&common->up); + /* If DPDK_DEV_VHOST device's NETDEV_UP flag was changed and vhost is * running then change netdev's change_seq to trigger link state * update. 
*/ @@ -4293,17 +4341,17 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, return 0; } -static int +int netdev_dpdk_update_flags(struct netdev *netdev, enum netdev_flags off, enum netdev_flags on, enum netdev_flags *old_flagsp) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); int error; - ovs_mutex_lock(&dev->common.mutex); - error = netdev_dpdk_update_flags__(dev, off, on, old_flagsp); - ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_lock(&common->mutex); + error = netdev_dpdk_update_flags__(common, off, on, old_flagsp); + ovs_mutex_unlock(&common->mutex); return error; } @@ -4378,7 +4426,7 @@ netdev_dpdk_vhost_user_get_status(const struct netdev *netdev, * Convert a given uint32_t link speed defined in DPDK to a string * equivalent. */ -static const char * +const char * netdev_dpdk_link_speed_to_str__(uint32_t link_speed) { switch (link_speed) { @@ -4398,31 +4446,28 @@ netdev_dpdk_link_speed_to_str__(uint32_t link_speed) } } -static int -netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) +int +netdev_dpdk_get_status__(const struct netdev *netdev, + struct ovs_mutex *dev_mutex, + struct smap *args) { struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); struct rte_eth_dev_info dev_info; - size_t rx_steer_flows_num; - uint64_t rx_steer_flags; uint32_t link_speed; - int n_rxq; int diag; if (!rte_eth_dev_is_valid_port(common->port_id)) { return ENODEV; } - ovs_mutex_lock(&dpdk_mutex); + ovs_assert(dev_mutex); + + ovs_mutex_lock(dev_mutex); ovs_mutex_lock(&common->mutex); diag = rte_eth_dev_info_get(common->port_id, &dev_info); link_speed = common->link.link_speed; - rx_steer_flags = dev->rx_steer_flags; - rx_steer_flows_num = dev->rx_steer_flows_num; - n_rxq = netdev->n_rxq; ovs_mutex_unlock(&common->mutex); - ovs_mutex_unlock(&dpdk_mutex); + ovs_mutex_unlock(dev_mutex); smap_add_format(args, 
"port_no", DPDK_PORT_ID_FMT, common->port_id); smap_add_format(args, "numa_id", "%d", @@ -4477,6 +4522,29 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) ETH_ADDR_ARGS(common->hwaddr)); } + return 0; +} + +static int +netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) +{ + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + size_t rx_steer_flows_num; + uint64_t rx_steer_flags; + int n_rxq; + int ret; + + ret = netdev_dpdk_get_status__(netdev, &dpdk_mutex, args); + if (ret) { + return ret; + } + + ovs_mutex_lock(&dev->common.mutex); + rx_steer_flags = dev->rx_steer_flags; + rx_steer_flows_num = dev->rx_steer_flows_num; + n_rxq = netdev->n_rxq; + ovs_mutex_unlock(&dev->common.mutex); + if (rx_steer_flags && !rx_steer_flows_num) { smap_add(args, "rx-steering", "unsupported"); } else if (rx_steer_flags == DPDK_RX_STEER_LACP) { @@ -4499,15 +4567,16 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args) } static void -netdev_dpdk_set_admin_state__(struct netdev_dpdk *dev, bool admin_state) - OVS_REQUIRES(dev->common.mutex) +netdev_dpdk_set_admin_state__(struct netdev_dpdk_common *common, + bool admin_state) + OVS_REQUIRES(common->mutex) { enum netdev_flags old_flags; if (admin_state) { - netdev_dpdk_update_flags__(dev, 0, NETDEV_UP, &old_flags); + netdev_dpdk_update_flags__(common, 0, NETDEV_UP, &old_flags); } else { - netdev_dpdk_update_flags__(dev, NETDEV_UP, 0, &old_flags); + netdev_dpdk_update_flags__(common, NETDEV_UP, 0, &old_flags); } } @@ -4530,11 +4599,12 @@ netdev_dpdk_set_admin_state(struct unixctl_conn *conn, int argc, struct netdev *netdev = netdev_from_name(argv[1]); if (netdev && is_dpdk_class(netdev->netdev_class)) { - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + struct netdev_dpdk_common *common = + netdev_dpdk_common_cast(netdev); - ovs_mutex_lock(&dev->common.mutex); - netdev_dpdk_set_admin_state__(dev, up); - ovs_mutex_unlock(&dev->common.mutex); + 
ovs_mutex_lock(&common->mutex); + netdev_dpdk_set_admin_state__(common, up); + ovs_mutex_unlock(&common->mutex); netdev_close(netdev); } else { @@ -4548,7 +4618,7 @@ netdev_dpdk_set_admin_state(struct unixctl_conn *conn, int argc, ovs_mutex_lock(&dpdk_mutex); LIST_FOR_EACH (dev, common.list_node, &dpdk_list) { ovs_mutex_lock(&dev->common.mutex); - netdev_dpdk_set_admin_state__(dev, up); + netdev_dpdk_set_admin_state__(&dev->common, up); ovs_mutex_unlock(&dev->common.mutex); } ovs_mutex_unlock(&dpdk_mutex); @@ -5144,9 +5214,14 @@ netdev_dpdk_class_init(void) /* This function can be called for different classes. The initialization * needs to be done only once */ if (ovsthread_once_start(&once)) { + static struct netdev_dpdk_watchdog_params watchdog_params = { + .mutex = &dpdk_mutex, + .list = &dpdk_list, + }; int ret; - ovs_thread_create("dpdk_watchdog", dpdk_watchdog, NULL); + ovs_thread_create("dpdk_watchdog", netdev_dpdk_watchdog, + &watchdog_params); unixctl_command_register("netdev-dpdk/set-admin-state", "[netdev] up|down", 1, 2, netdev_dpdk_set_admin_state, NULL); @@ -6158,7 +6233,8 @@ retry: dev->common.tx_q = NULL; if (!eth_addr_equals(dev->common.hwaddr, dev->common.requested_hwaddr)) { - err = netdev_dpdk_set_etheraddr__(dev, dev->common.requested_hwaddr); + err = netdev_dpdk_set_etheraddr__(&dev->common, + dev->common.requested_hwaddr); if (err) { goto out; } @@ -6676,7 +6752,7 @@ parse_user_mempools_list(const struct smap *ovs_other_config) user_mempools = xrealloc(user_mempools, (n_user_mempools + 1) * sizeof(struct user_mempool_config)); - adj_mtu = FRAME_LEN_TO_MTU(dpdk_buf_size(mtu)); + adj_mtu = FRAME_LEN_TO_MTU(netdev_dpdk_buf_size(mtu)); user_mempools[n_user_mempools].adj_mtu = adj_mtu; user_mempools[n_user_mempools].socket_id = socket_id; n_user_mempools++; From patchwork Wed Apr 1 09:13:16 2026 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eli Britstein X-Patchwork-Id: 2218453 
X-Patchwork-Delegate: echaudro@redhat.com
From: Eli Britstein
Date: Wed, 1 Apr 2026 12:13:16 +0300
Message-ID: <20260401091318.2671624-10-elibr@nvidia.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com>
References: <20260401091318.2671624-1-elibr@nvidia.com>
Subject: [ovs-dev] [PATCH v3 09/11] unixctl: Introduce unixctl_mem_stream().
Cc: Eli Britstein, Ilya Maximets, David Marchand, Maor Dickman

Introduce unixctl_mem_stream(), a generic unixctl command handler that
passes an in-memory stream (open_memstream()) to the dump callback given
as 'aux' and replies with the captured output. It replaces the private
dpdk_unixctl_mem_stream() helper in lib/dpdk.c and will be reused later
for DOCA. The function is only available when configure detects
open_memstream().
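For readers unfamiliar with the capture pattern that unixctl_mem_stream() relies on, a minimal standalone sketch follows. It is not part of the patch: capture_dump() and demo_dump() are hypothetical names, demo_dump() stands in for callbacks such as rte_lcore_dump(), and the unixctl reply calls are replaced by simply returning the captured string.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* open_memstream() is POSIX.1-2008. */
#endif

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical dump routine; a stand-in for rte_lcore_dump() and similar
 * functions that take a FILE * and print a report to it. */
static void
demo_dump(FILE *stream)
{
    fprintf(stream, "lcore 0: enabled\n");
    fprintf(stream, "lcore 1: enabled\n");
}

/* Runs 'callback' against an in-memory stream and returns the captured text
 * as a malloc()'ed, NUL-terminated string, or NULL on failure.  The caller
 * owns the returned buffer and must free() it. */
static char *
capture_dump(void (*callback)(FILE *))
{
    char *buf = NULL;
    size_t size = 0;
    FILE *stream;

    assert(callback);

    stream = open_memstream(&buf, &size);
    if (!stream) {
        return NULL;
    }

    callback(stream);

    /* 'buf' and 'size' are only guaranteed to be up to date after
     * fflush() or fclose(), so close the stream before using them. */
    if (fclose(stream) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

A caller would obtain the text with `char *out = capture_dump(demo_dump);`, send or print it, and then free() it. Closing the stream before reading the buffer is the important step, since the buffer pointer and size are only defined once the stream has been flushed or closed.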
Signed-off-by: Eli Britstein
---
 configure.ac  |  2 +-
 lib/dpdk.c    | 32 ++++----------------------------
 lib/unixctl.c | 44 +++++++++++++++++++++++++++++++++++++++-----
 lib/unixctl.h |  3 +++
 4 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/configure.ac b/configure.ac
index 56eacbbc7..cd063f811 100644
--- a/configure.ac
+++ b/configure.ac
@@ -115,7 +115,7 @@ AC_CHECK_MEMBERS([struct sockaddr_in6.sin6_scope_id], [], [],
   [[#include
 #include
 #include ]])
-AC_CHECK_FUNCS([mlockall strnlen getloadavg statvfs getmntent_r sendmmsg clock_gettime])
+AC_CHECK_FUNCS([mlockall strnlen getloadavg statvfs getmntent_r sendmmsg clock_gettime open_memstream])
 AC_CHECK_HEADERS([mntent.h sys/statvfs.h linux/types.h linux/if_ether.h])
 AC_CHECK_HEADERS([linux/net_namespace.h stdatomic.h bits/floatn-common.h])
 AC_CHECK_HEADERS([net/if_mib.h], [], [], [[#include
diff --git a/lib/dpdk.c b/lib/dpdk.c
index d27b95cd9..edd07e3ac 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -273,30 +273,6 @@ static cookie_io_functions_t dpdk_log_func = {
     .write = dpdk_log_write,
 };

-static void
-dpdk_unixctl_mem_stream(struct unixctl_conn *conn, int argc OVS_UNUSED,
-                        const char *argv[] OVS_UNUSED, void *aux)
-{
-    void (*callback)(FILE *) = aux;
-    char *response = NULL;
-    FILE *stream;
-    size_t size;
-
-    stream = open_memstream(&response, &size);
-    if (!stream) {
-        response = xasprintf("Unable to open memstream: %s.",
-                             ovs_strerror(errno));
-        unixctl_command_reply_error(conn, response);
-        goto out;
-    }
-
-    callback(stream);
-    fclose(stream);
-    unixctl_command_reply(conn, response);
-out:
-    free(response);
-}
-
 static int
 dpdk_parse_log_level(const char *s)
 {
@@ -491,16 +467,16 @@ dpdk_init__(const struct smap *ovs_other_config)
     }

     unixctl_command_register("dpdk/lcore-list", "", 0, 0,
-                             dpdk_unixctl_mem_stream, rte_lcore_dump);
+                             unixctl_mem_stream, rte_lcore_dump);
     unixctl_command_register("dpdk/log-list", "", 0, 0,
-                             dpdk_unixctl_mem_stream, rte_log_dump);
+                             unixctl_mem_stream, rte_log_dump);
     unixctl_command_register("dpdk/log-set", "{level | pattern:level}",
                              0, INT_MAX, dpdk_unixctl_log_set, NULL);
     unixctl_command_register("dpdk/get-malloc-stats", "", 0, 0,
-                             dpdk_unixctl_mem_stream,
+                             unixctl_mem_stream,
                              malloc_dump_stats_wrapper);
     unixctl_command_register("dpdk/get-memzone-stats", "", 0, 0,
-                             dpdk_unixctl_mem_stream, rte_memzone_dump);
+                             unixctl_mem_stream, rte_memzone_dump);

     /* We are called from the main thread here */
     RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
diff --git a/lib/unixctl.c b/lib/unixctl.c
index 4fd150959..b8499394f 100644
--- a/lib/unixctl.c
+++ b/lib/unixctl.c
@@ -15,22 +15,26 @@
  */

 #include
-#include "unixctl.h"
+
 #include
 #include
+#include
 #include
+
 #include "command-line.h"
 #include "coverage.h"
 #include "dirs.h"
+#include "jsonrpc.h"
+#include "stream.h"
+#include "stream-provider.h"
+#include "svec.h"
+#include "unixctl.h"
+
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/json.h"
-#include "jsonrpc.h"
 #include "openvswitch/list.h"
 #include "openvswitch/poll-loop.h"
 #include "openvswitch/shash.h"
-#include "stream.h"
-#include "stream-provider.h"
-#include "svec.h"
 #include "openvswitch/vlog.h"

 VLOG_DEFINE_THIS_MODULE(unixctl);
@@ -643,3 +647,33 @@ unixctl_client_transact(struct jsonrpc *client, const char *command, int argc,
     jsonrpc_msg_destroy(reply);
     return error;
 }
+
+#ifdef HAVE_OPEN_MEMSTREAM
+
+void
+unixctl_mem_stream(struct unixctl_conn *conn, int argc OVS_UNUSED,
+                   const char *argv[] OVS_UNUSED, void *aux)
+{
+    void (*callback)(FILE *) = aux;
+    char *response = NULL;
+    FILE *stream;
+    size_t size;
+
+    ovs_assert(callback);
+
+    stream = open_memstream(&response, &size);
+    if (!stream) {
+        response = xasprintf("Unable to open memstream: %s.",
+                             ovs_strerror(errno));
+        unixctl_command_reply_error(conn, response);
+        goto out;
+    }
+
+    callback(stream);
+    fclose(stream);
+    unixctl_command_reply(conn, response);
+out:
+    free(response);
+}
+
+#endif
diff --git a/lib/unixctl.h b/lib/unixctl.h
index 1965f100d..377ecd0a9 100644
--- a/lib/unixctl.h
+++ b/lib/unixctl.h
@@ -62,6 +62,9 @@ void unixctl_command_reply_error(struct unixctl_conn *, const char *error);
 void unixctl_command_reply(struct unixctl_conn *, const char *body);
 void unixctl_command_reply_json(struct unixctl_conn *,
                                 struct json *body);
+#ifdef HAVE_OPEN_MEMSTREAM
+unixctl_cb_func unixctl_mem_stream;
+#endif

 #ifdef __cplusplus
 }

From patchwork Wed Apr 1 09:13:17 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 2218456
X-Patchwork-Delegate: echaudro@redhat.com
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@legolas.ozlabs.org
Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=Nvidia.com header.i=@Nvidia.com header.a=rsa-sha256 header.s=selector2 header.b=pTuBRooV; dkim-atps=neutral
Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::138; helo=smtp1.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=patchwork.ozlabs.org)
Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange x25519 server-signature ECDSA (secp384r1) server-digest SHA384) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4flzr668BXz1yGH for ; Wed, 01 Apr 2026 20:16:22 +1100 (AEDT)
Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 6E0C582016; Wed, 1 Apr 2026 09:16:21 +0000 (UTC)
X-Virus-Scanned: amavis at osuosl.org
Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id HwVj8gWSVH1J; Wed, 1 Apr 2026 09:16:19 +0000 (UTC)
X-Comment: SPF check N/A for local connections -
client-ip=2605:bc80:3010:104::8cd3:938; helo=lists.linuxfoundation.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver= DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 5055781DEF Authentication-Results: smtp1.osuosl.org; dkim=fail reason="signature verification failed" (2048-bit key, unprotected) header.d=Nvidia.com header.i=@Nvidia.com header.a=rsa-sha256 header.s=selector2 header.b=pTuBRooV Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp1.osuosl.org (Postfix) with ESMTPS id 5055781DEF; Wed, 1 Apr 2026 09:16:19 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 38E72C0070; Wed, 1 Apr 2026 09:16:19 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 6C481C0070 for ; Wed, 1 Apr 2026 09:16:18 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id EFACD81DEF for ; Wed, 1 Apr 2026 09:15:26 +0000 (UTC) X-Virus-Scanned: amavis at osuosl.org Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavis, port 10024) with ESMTP id dI_pEiYQJqzI for ; Wed, 1 Apr 2026 09:15:25 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=2a01:111:f403:c10d::3; helo=sn4pr0501cu005.outbound.protection.outlook.com; envelope-from=elibr@nvidia.com; receiver= DMARC-Filter: OpenDMARC Filter v1.4.2 smtp1.osuosl.org 6138E81DF3 Authentication-Results: smtp1.osuosl.org; dmarc=pass (p=reject dis=none) header.from=nvidia.com DKIM-Filter: OpenDKIM Filter v2.11.0 smtp1.osuosl.org 6138E81DF3 Received: from SN4PR0501CU005.outbound.protection.outlook.com (mail-southcentralusazlp170110003.outbound.protection.outlook.com [IPv6:2a01:111:f403:c10d::3]) by smtp1.osuosl.org (Postfix) with 
ESMTPS id 6138E81DF3 for ; Wed, 1 Apr 2026 09:15:25 +0000 (UTC) ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none; b=Ac+GfkuSuljUknDP8YfPgV4QJEfIeSjgGuHI+rcl/RJ5o4IuTGSKAbkW4C7Cicmr9watcQtnS3XZCDvq+dNHQxBSwvopbNescYwxBp4J7diNuqmZ5ixfA/DYjn53dN0vWiqYc8yRQawiqDbTDFtgUUI/1+mjgD0OA2hUGQdmdh/AyHVzZCdZtAharYiA+DjbCg0gGDw76rxSUORYzQKJSTPF2sH6N4lzi/1YG45Tiyhbtraeqr6ty0UxfkDqVSJghEdZXfm+ODPJxN6tGErs+yAhijlwCuxheV4NroHM9tWkrA31M9MrrEHJADwuSMT1h61hzIACPA0/J9Wf2P5d2w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector10001; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=J1gdZhWuet6S+390boHU1C4sls7ts8v4vvD5YjHFBgk=; b=uXMVKd0e+kSHf/eznCa++c40hSqGCbJAOSXfs2sAswMWOE+u2PfrKrt6uW+WOOEhtUYO5DWW8qDgbsVo9UafBMaxXTPROZ3091XKqB1f+R7Dd7Yeg8hH689YarFFDcW2T+bhOcgMp1I/qLYFh6vdwb0nDVzqsVC2SfBwS0ol69fajU916hBeI2oqOka8BAa0M9QXMW5qgf8puYpd5+NyM3oU1rG4prBeH2MRP7zGvdXOdGSbtIcCwDkHh8j+iBMthwAV3BmiFUSCpQWVrvGwTxgwZCNhMiVQayP48qEDmEwqNwVUYWY+HuUy8qw4kSuF1sw2lU9fAyQIGsUH80Bt2Q== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is 216.228.117.161) smtp.rcpttodomain=openvswitch.org smtp.mailfrom=nvidia.com; dmarc=pass (p=reject sp=reject pct=100) action=none header.from=nvidia.com; dkim=none (message not signed); arc=none (0) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=Nvidia.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=J1gdZhWuet6S+390boHU1C4sls7ts8v4vvD5YjHFBgk=; b=pTuBRooVjyb1aPJcte01j0Ho19bzGhZb1afEoKvtePYsQhvXf7qkIeXN0gy3rKeFZiRYyZh5kIEjM9ia0un/0uVaQBD7of4g9OlXRav+JpTL6crlWcR3dtsIJ7pTkBNCizrV42NsEoWWUZDKMpFU3EGXWC49hujzbqnUNJ3V0y7knAzv75rlruoWKK1JPPc3R+SwWywtHom9iVvaOJCDAd3ldhk3UtL7CJ63W258uWdKIKDionp6IoDM0xEjDJ7iOMTCNLj1+oYa/TAwWmpiE/bXcIG8+frtq8laR2ADkes5DFvCt2ZQ8ucsdQZWiXUEqOczVs07gJ5QiMQwzx72Bg== 
Received: from PH1PEPF000132EA.NAMP220.PROD.OUTLOOK.COM (2603:10b6:518:1::2e) by BL1PR12MB5899.namprd12.prod.outlook.com (2603:10b6:208:397::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17; Wed, 1 Apr 2026 09:15:15 +0000 Received: from SA2PEPF000015CB.namprd03.prod.outlook.com (2a01:111:f403:c801::5) by PH1PEPF000132EA.outlook.office365.com (2603:1036:903:47::3) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9745.28 via Frontend Transport; Wed, 1 Apr 2026 09:15:14 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.161) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.161 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.161; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.161) by SA2PEPF000015CB.mail.protection.outlook.com (10.167.241.201) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17 via Frontend Transport; Wed, 1 Apr 2026 09:15:14 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.67) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 02:14:55 -0700 Received: from nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 02:14:52 -0700 From: Eli Britstein To: Date: Wed, 1 Apr 2026 12:13:17 +0300 Message-ID: <20260401091318.2671624-11-elibr@nvidia.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com> References: <20260401091318.2671624-1-elibr@nvidia.com> MIME-Version: 1.0 X-Originating-IP: 
[10.126.231.35] X-ClientProxiedBy: rnnvmail203.nvidia.com (10.129.68.9) To rnnvmail201.nvidia.com (10.129.68.8) X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: SA2PEPF000015CB:EE_|BL1PR12MB5899:EE_ X-MS-Office365-Filtering-Correlation-Id: 556857ba-c0e2-4e97-b675-08de8fcf2f55 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; ARA:13230040|36860700016|376014|82310400026|1800799024|13003099007|18002099003|56012099003|22082099003; X-Microsoft-Antispam-Message-Info: nImoXpXE5f4KagG6MA+kmrB42i44WtP02/85R1ldcBa87EGQSQTUYaWBDrO20XDLCJsXwnzCBFrFbF9LkjP/iNzuYfFk9iFVwLWmVk2qtrVMBilu/97inZmrgj8vCMfFjsPjHgp5ALr6tDuv90eJQg1vtcgkRXOx1ZIJ8hNz4lZUR3gle+5PG1KLEv7jQYmqmhb9N1gwL6FY1Z4b0JLCHVN3W0pBN85LoNhomeCs72G1ib1BbFsu4LTl2a40cb5wv54bzJjEtaJ1n4EBJhEyuukDY0QpdVnXbV7Pjui4XGyGjRLN5Oe5+MBrX7NbaF8gZiCcGa97IyY6X+bx21PO3nD5JH1pep/zzIIf6oBW0KC06He8q8O688bLN6gP2uCqsMNBtH3chZhoH5esrjAC/pCLjK58CIZ8MVMSu9mwnHmUgPT9Eu3Ro5/lCmR+SzmJeA7v5TbAEWlIajJqPH3Ys3pqLEMB7vZzI3fSAAmJOLjUmtBsoKtBEKz7ocGoyKt4mDnvW0cgF6SCS3HEyz8gSI/vGVUabJX6BaMxbGSe7Owt0tUkAZR8kGnDWeMaaupE4qI1jDFC1E3afnXSmXODxBCIVqlsdkNygsGHzSzGlBX/lsSHI4OS7z/Kh1jdNXHuMyclgSbU4edXLpt8phmBHmEx6nA/bRtvRKpyFDLKICicx7bMm11hHIcrZ1BpZcEOxWRWl8NwB/afozzhLgj3l73EgEaHS8pFXUoNOoZG8tQdJ0JLXbTEjINADL2Er8wZh/tTUkPHC8hop6H4m+55qQ== X-Forefront-Antispam-Report: CIP:216.228.117.161; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:dc6edge2.nvidia.com; CAT:NONE; SFS:(13230040)(36860700016)(376014)(82310400026)(1800799024)(13003099007)(18002099003)(56012099003)(22082099003); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
pPCAaXgi9zQsE6Vnpq8eY7mDQQ5HF925q658AtDGlthze+ymOb2jRHfDxNggHJI8IVxB/EpZ9HBKpCPhhjj/xMmzAVHAiFma0h9kVHIgjuCcC9hrD/Q6ou6Df6GcxdSCCBwvdB+PEHOKs5REOt9Bpo0wDeFnlgjwIyoK8/stVupVqEYu7nai6d/gpyZsKeYrJPKPH8WxPodlrhJ1B01fS7m+F4u9cmcxyU48sbQAYJNDFTD5Tn3AL2/CCKQS2XH5GYIe4QAlDXgFzue0CCt8iV6tR8u5mXOoeVYhJjtYNJag9rIpV/oyXVCOKu+wjpPTYV3/XqgiCiI+cmR8SPvDjdSEfTcdWamc3uQ0ymbjXKeLhpZQdRQY5fXhehQbPeSfufKLPo/na6gGxslZ50zof0oU0B5pnHsnMj/m1VuBZAcyZBNzkM+Z2HZ4L7dbgYIU X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 09:15:14.6838 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 556857ba-c0e2-4e97-b675-08de8fcf2f55 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.117.161]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: SA2PEPF000015CB.namprd03.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: BL1PR12MB5899 Subject: [ovs-dev] [PATCH v3 10/11] acinclude.m4: Add '--with-doca' option. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Eli Britstein , Ariel Levkovich , Ilya Maximets , David Marchand , Maor Dickman Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" From: Ariel Levkovich Add a new option to build ovs with doca by specifying '--with-doca' in the configure line. This flag must be used along with '--with-dpdk'. Otherwise the configure step will fail. 

An example:
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc \
    --with-dpdk=static --with-doca=static

Co-authored-by: Salem Sol
Signed-off-by: Salem Sol
Co-authored-by: Eli Britstein
Signed-off-by: Eli Britstein
Signed-off-by: Ariel Levkovich
---
 .ci/doca-build.sh                    |  36 ++++++
 .ci/doca-install.sh                  |  20 +++
 .github/workflows/build-and-test.yml |  58 +++++++++
 Makefile.am                          |   2 +
 acinclude.m4                         | 174 +++++++++++++++++++++++++++
 configure.ac                         |   1 +
 lib/automake.mk                      |   4 +
 lib/dpdk.h                           |   1 +
 lib/ovs-doca.c                       |  86 +++++++++++++
 lib/ovs-doca.h                       |  31 +++++
 utilities/checkpatch_dict.txt        |   1 +
 vswitchd/bridge.c                    |   5 +
 vswitchd/ovs-vswitchd.c              |   3 +
 vswitchd/vswitch.ovsschema           |   9 +-
 vswitchd/vswitch.xml                 |  10 ++
 15 files changed, 439 insertions(+), 2 deletions(-)
 create mode 100755 .ci/doca-build.sh
 create mode 100755 .ci/doca-install.sh
 create mode 100644 lib/ovs-doca.c
 create mode 100644 lib/ovs-doca.h

diff --git a/.ci/doca-build.sh b/.ci/doca-build.sh
new file mode 100755
index 000000000..ed9dd7dc9
--- /dev/null
+++ b/.ci/doca-build.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+set -o errexit
+set -x
+
+CFLAGS_FOR_OVS="-g -O2"
+EXTRA_OPTS="--enable-Werror"
+JOBS=${JOBS:-"-j4"}
+
+DOCA_LINK="${DOCA_LINK:-static}"
+
+for pc_dir in $(find /opt/mellanox -name pkgconfig -type d 2>/dev/null); do
+    PKG_CONFIG_PATH="${pc_dir}:${PKG_CONFIG_PATH}"
+done
+export PKG_CONFIG_PATH
+
+if [ "$DOCA_LINK" = "shared" ]; then
+    DOCA_LIB=$(find /opt/mellanox -name pkgconfig -type d 2>/dev/null \
+               | head -1 | sed 's|/pkgconfig$||')
+    export LD_LIBRARY_PATH="${DOCA_LIB}:${LD_LIBRARY_PATH}"
+fi
+sudo ldconfig
+
+if [ "$CC" = "clang" ]; then
+    CFLAGS_FOR_OVS="${CFLAGS_FOR_OVS} -Wno-error=unused-command-line-argument"
+fi
+
+EXTRA_OPTS="$EXTRA_OPTS --with-dpdk=$DOCA_LINK --with-doca=$DOCA_LINK"
+
+if [ "$DOCA_LINK" = "shared" ]; then
+    EXTRA_OPTS="$EXTRA_OPTS --enable-shared"
+fi
+
+./boot.sh
+./configure CFLAGS="${CFLAGS_FOR_OVS}" $EXTRA_OPTS
+make $JOBS
diff --git a/.ci/doca-install.sh b/.ci/doca-install.sh
new file mode 100755
index 000000000..5931bf821
--- /dev/null
+++ b/.ci/doca-install.sh
@@ -0,0 +1,20 @@
+#!/bin/bash
+
+set -ev
+
+# Install DOCA SDK packages.
+#
+# Download the DOCA host repo package from:
+# https://developer.nvidia.com/doca-downloads
+# deployment_platform=Host-Server, deployment_package=DOCA-Host,
+# target_os=Linux, Architecture=x86_64, Profile=doca-all
+#
+
+DOCA_REPO_PKG_URL="${DOCA_REPO_PKG_URL:?Set to .deb repo package URL}"
+
+wget -q "$DOCA_REPO_PKG_URL" -O /tmp/doca-repo.deb
+sudo dpkg -i /tmp/doca-repo.deb
+sudo apt-get update
+sudo apt-get install -y dpdk-community-dev \
+    libdoca-sdk-flow-dev libdoca-sdk-dpdk-bridge-dev
+
diff --git a/.github/workflows/build-and-test.yml b/.github/workflows/build-and-test.yml
index f1d006de6..8924d517c 100644
--- a/.github/workflows/build-and-test.yml
+++ b/.github/workflows/build-and-test.yml
@@ -699,3 +699,61 @@ jobs:
         path: |
           rpm/rpmbuild/SRPMS/*.rpm
           rpm/rpmbuild/RPMS/*/*.rpm
+
+  build-doca:
+    env:
+      dependencies: |
+        automake libtool gcc bc libssl-dev llvm-dev libnuma-dev \
+        libunbound-dev libunwind-dev libsystemd-dev wget python3-pip
+      DOCA_REPO_PKG_URL: "https://www.mellanox.com/downloads/DOCA/DOCA_v3.3.0/host/doca-host_3.3.0-088000-26.01-ubuntu2404_amd64.deb"
+      CC: ${{ matrix.compiler }}
+      DOCA_LINK: ${{ matrix.doca_link }}
+
+    name: doca ubuntu ${{ matrix.compiler }} ${{ matrix.doca_link }}
+    runs-on: ubuntu-24.04
+    timeout-minutes: 30
+
+    strategy:
+      fail-fast: false
+      matrix:
+        include:
+          - compiler: gcc
+            doca_link: static
+          - compiler: gcc
+            doca_link: shared
+          - compiler: clang
+            doca_link: static
+          - compiler: clang
+            doca_link: shared
+
+    steps:
+    - name: checkout
+      uses: actions/checkout@v4
+
+    - name: update PATH
+      run: |
+        echo "$HOME/bin" >> $GITHUB_PATH
+        echo "$HOME/.local/bin" >> $GITHUB_PATH
+
+    - name: set up python
+      uses: actions/setup-python@v5
+      with:
+        python-version: ${{ env.python_default }}
+
+    - name: update APT cache
+      run: sudo apt update || true
+    - name: install common dependencies
+      run: sudo apt install -y ${{ env.dependencies }}
+
+    - name: install DOCA
+      run: ./.ci/doca-install.sh
+
+    - name: build
+      run: ./.ci/doca-build.sh
+
+    - name: upload logs on failure
+      if: failure() || cancelled()
+      uses: actions/upload-artifact@v4
+      with:
+        name: logs-doca-ubuntu-${{ matrix.compiler }}-${{ matrix.doca_link }}
+        path: config.log
diff --git a/Makefile.am b/Makefile.am
index a805f21d1..ddc3e931e 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -77,6 +77,8 @@ EXTRA_DIST = \
 	MAINTAINERS.rst \
 	README.rst \
 	NOTICE \
+	.ci/doca-build.sh \
+	.ci/doca-install.sh \
 	.ci/dpdk-build.sh \
 	.ci/dpdk-prepare.sh \
 	.ci/linux-build.sh \
diff --git a/acinclude.m4 b/acinclude.m4
index 060c416f8..e8d475f37 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -374,6 +374,179 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
   AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
 ])
 
+dnl OVS_CHECK_DOCA
+dnl
+dnl Configure DOCA source tree
+AC_DEFUN([OVS_CHECK_DOCA], [
+  AC_ARG_WITH([doca],
+              [AS_HELP_STRING([--with-doca=static],
+                              [Specify "static" depending on the
+                              DOCA libraries to use. A custom DOCA install path
+                              can be used otherwise for local builds.])],
+              [have_doca=true])
+
+  if test "$have_dpdk" != true || test "$with_dpdk" = no; then
+    if test "$have_doca" = true; then
+      AC_MSG_ERROR([Cannot link against doca without dpdk, please add --with-dpdk])
+    fi
+  fi
+  AC_MSG_CHECKING([whether doca is enabled])
+  if test "$have_doca" != true || test "$with_doca" = no; then
+    AC_MSG_RESULT([no])
+    DOCALIB_FOUND=false
+  else
+    AC_MSG_RESULT([yes])
+    if test -d "$with_doca"; then
+      DOCA_INSTALL="$with_doca"
+    elif test -d "/opt/mellanox/doca"; then
+      DOCA_INSTALL=/opt/mellanox/doca
+    else
+      DOCA_INSTALL=/usr/local
+    fi
+    DOCA_PKGCONFIG="$(find ${DOCA_INSTALL} -type f -name doca-flow.pc -exec dirname {} \; | head -1)"
+    if test -n "$DOCA_PKGCONFIG"; then
+      if test -n "$PKG_CONFIG_PATH"; then
+        export PKG_CONFIG_PATH="${DOCA_PKGCONFIG}:${PKG_CONFIG_PATH}"
+      else
+        export PKG_CONFIG_PATH="${DOCA_PKGCONFIG}"
+      fi
+    fi
+
+    echo "checking for DOCA in PKG_CONFIG_PATH='${PKG_CONFIG_PATH}'"
+    case "$with_doca" in
+      "static"|"shared")
+          DOCA_LINK="$with_doca"
+          ;;
+      *)
+          if test "$enable_shared" = yes; then
+            DOCA_LINK="shared"
+          else
+            DOCA_LINK="static"
+          fi
+          ;;
+    esac
+
+    DOCA_PKGS="doca-flow doca-dpdk-bridge doca-common"
+    if test "$DOCA_LINK" = static; then
+      PKG_CHECK_MODULES_STATIC([DOCA], [$DOCA_PKGS], [],
+          [AC_MSG_ERROR([unable to use $DOCA_PKGS .pc files for $DOCA_LINK build])])
+    else
+      PKG_CHECK_MODULES([DOCA], [$DOCA_PKGS], [],
+          [AC_MSG_ERROR([unable to use $DOCA_PKGS .pc files for $DOCA_LINK build])])
+    fi
+    DOCA_INCLUDE="$DOCA_CFLAGS -DDOCA_ALLOW_EXPERIMENTAL_API"
+
+    if test "$DOCA_LINK" = static; then
+      # pkg-config --static may emit the same library in duplicate
+      # --whole-archive blocks when multiple packages share a dependency
+      # (both doca-flow and doca-dpdk-bridge pull in doca-common).
+      # Linking the same .a under --whole-archive twice causes "multiple
+      # definition" errors. Remove every occurrence after the first using
+      # the GNU sed "2g" flag, and strip redundant shared-lib flags (-l)
+      # since the static .a is already linked via --whole-archive.
+      DOCA_DEDUP_LIBS="doca_common doca_dpdk_bridge"
+      for lib in $DOCA_DEDUP_LIBS; do
+        lib_count=$(echo "$DOCA_LIBS" | grep -o "l:lib${lib}\.a" | wc -l)
+        if test "$lib_count" -ge 2; then
+          DOCA_LIBS=$(echo "$DOCA_LIBS" | sed "s@-Wl,--whole-archive -L[[^ ]]* -l:lib${lib}\.a -Wl,--no-whole-archive -Wl,--as-needed @@2g")
+        fi
+        if echo "$DOCA_LIBS" | grep -q "l:lib${lib}\.a"; then
+          DOCA_LIBS=$(echo "$DOCA_LIBS" | sed "s/-l${lib}//g")
+        fi
+      done
+    fi
+
+    USED_PATH=`$PKG_CONFIG --variable=prefix doca-flow`
+    echo "Using DOCA release: '$USED_PATH'"
+
+    ovs_save_CFLAGS="$CFLAGS"
+    ovs_save_LDFLAGS="$LDFLAGS"
+    # Statically linked libraries might have been built with sanitizers enabled.
+    # In such case, use the generated sanitizer cflags.
+    CFLAGS="$CFLAGS $SANITIZER_CFLAGS $DOCA_INCLUDE"
+
+    AC_MSG_CHECKING([for doca_flow.h])
+    AC_COMPILE_IFELSE(
+      [AC_LANG_PROGRAM([#include ],
+                       [struct doca_flow_port *port = NULL ;])],
+      [AC_MSG_RESULT([yes])],
+      [AC_MSG_RESULT([no])
+       AC_MSG_ERROR(m4_normalize([
+          Unable to include doca_flow.h, check the config.log for more details.
+          As a DOCA library was found in the current search path, a missing doca_flow.h
+          usually means that it was built without DOCA-flow support.
+          Verify that you fulfilled all DOCA-flow build dependencies and that it
+          was not automatically disabled.]))
+      ])
+
+    # DOCA's static pkg-config output already includes DPDK through
+    # its transitive dependency on libdpdk (via doca-dpdk-bridge), so
+    # no need to add DPDK_LIB separately for static link tests.
+    if test "$enable_shared" = yes; then
+      LIBS="$DOCA_LIBS $LIBS"
+    else
+      LIBS="$DOCA_LIBS $ovs_save_libs_before_dpdk"
+    fi
+    AC_MSG_CHECKING([for DOCA-flow link])
+    AC_LINK_IFELSE(
+      [AC_LANG_PROGRAM([#include
+                        #include ],
+                       [struct doca_flow_cfg *cfg;
+                        int rv;
+                        doca_flow_cfg_create(&cfg);
+                        rv = doca_flow_init(cfg);
+                        doca_flow_cfg_destroy(cfg);
+                        return rv;])],
+      [AC_MSG_RESULT([yes])
+       DOCALIB_FOUND=true],
+      [AC_MSG_RESULT([no])
+       AC_MSG_ERROR(m4_normalize([
+          Unable to link with DOCA-flow, check the config.log for more details.
+          If a working DOCA-flow library was not found in the current search path,
+          update PKG_CONFIG_PATH for pkg-config to find the .pc file in a proper location.]))
+      ])
+    CFLAGS="$ovs_save_CFLAGS"
+    LDFLAGS="$ovs_save_LDFLAGS"
+    OVS_CFLAGS="$OVS_CFLAGS $DOCA_INCLUDE -Wno-deprecated-declarations -DALLOW_EXPERIMENTAL_API"
+
+    # DOCA libraries are very specific in their ordering and inherit DPDK
+    # libraries which contain --whole-archive. Autotools will reorder
+    # them, breaking static links. Use the same solution as DPDK below.
+    # Transform the pkg-config output into a single linker parameter, separated
+    # by commas and wrapped by -Wl.
+    DOCA_LDFLAGS=$(echo "$DOCA_LIBS" | tr -s ' ' ',' | sed 's/-Wl,//g')
+    # Replace -pthread with -lpthread for LD and remove the last extra comma.
+    DOCA_LDFLAGS=$(echo "$DOCA_LDFLAGS"| sed 's/,$//' | sed 's/-pthread/-lpthread/g')
+    # Prepend "-Wl,".
+    DOCA_LDFLAGS="-Wl,$DOCA_LDFLAGS"
+
+    # The full DOCA linker parameters must be made available to every
+    # object trying to link against libopenvswitch. It means every
+    # binary generated will contain DOCA unfortunately.
+    if test "$DOCA_LINK" = static; then
+      # DOCA's static pkg-config output already includes DPDK through
+      # its transitive dependency on libdpdk (via doca-dpdk-bridge).
+      OVS_LDFLAGS="$OVS_LDFLAGS $DOCA_LDFLAGS"
+      # Clear to prevent double linkage from Makefile.am
+      DPDK_vswitchd_LDFLAGS=""
+    else
+      # For shared builds, DPDK is not in DOCA's output (libdpdk is in
+      # Requires.private, not followed by pkg-config without --static).
+      # Link DPDK separately. Add PMD drivers explicitly as they may
+      # not be in Libs field for shared builds.
+      for pmd in rte_net_mlx5 rte_net_vhost; do
+        if ! echo "$DPDK_LIB" | grep -q "\-l$pmd"; then
+          DPDK_LIB="$DPDK_LIB -l$pmd"
+        fi
+      done
+      OVS_LDFLAGS="$OVS_LDFLAGS $DOCA_LDFLAGS $DPDK_LIB"
+    fi
+    AC_DEFINE([DOCA_NETDEV], [1], [System uses the DOCA module.])
+  fi
+
+  AM_CONDITIONAL([DOCA_NETDEV], [$DOCALIB_FOUND])
+])
+
 dnl OVS_CHECK_DPDK
 dnl
 dnl Configure DPDK source tree
@@ -478,6 +651,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
     OVS_FIND_DEPENDENCY([dlopen], [dl], [libdl])
 
     AC_MSG_CHECKING([whether linking with dpdk works])
+    ovs_save_libs_before_dpdk="$LIBS"
     LIBS="$DPDK_LIB $LIBS"
     AC_LINK_IFELSE(
       [AC_LANG_PROGRAM([#include
diff --git a/configure.ac b/configure.ac
index cd063f811..031d38c90 100644
--- a/configure.ac
+++ b/configure.ac
@@ -204,6 +204,7 @@ OVS_CHECK_LINUX_TC
 OVS_CHECK_LINUX_SCTP_CT
 OVS_CHECK_LINUX_VIRTIO_TYPES
 OVS_CHECK_DPDK
+OVS_CHECK_DOCA
 OVS_CHECK_PRAGMA_MESSAGE
 OVS_CHECK_VERSION_SUFFIX
 AC_SUBST([CFLAGS])
diff --git a/lib/automake.mk b/lib/automake.mk
index bab03c3e7..66c5c3d93 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -517,6 +517,10 @@ lib_libopenvswitch_la_SOURCES += \
 	lib/dpdk-stub.c
 endif
 
+lib_libopenvswitch_la_SOURCES += \
+	lib/ovs-doca.c \
+	lib/ovs-doca.h
+
 if WIN32
 lib_libopenvswitch_la_SOURCES += \
 	lib/dpif-netlink.c \
diff --git a/lib/dpdk.h b/lib/dpdk.h
index 1b790e682..7571604dd 100644
--- a/lib/dpdk.h
+++ b/lib/dpdk.h
@@ -18,6 +18,7 @@
 #define DPDK_H
 
 #include
+#include
 
 #ifdef DPDK_NETDEV
 
diff --git a/lib/ovs-doca.c b/lib/ovs-doca.c
new file mode 100644
index 000000000..eae361a21
--- /dev/null
+++ b/lib/ovs-doca.c
@@ -0,0 +1,86 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES.
+ * All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include
+
+#include "compiler.h"
+#include "ovs-doca.h"
+#include "vswitch-idl.h"
+
+#ifdef DOCA_NETDEV
+
+#include
+#include
+
+#include
+
+/* DOCA disables dpdk steering as a constructor in higher priority.
+ * Set a lower priority one to enable it back. Disable it only upon using
+ * doca ports.
+ */
+RTE_INIT(dpdk_steering_enable)
+{
+    rte_pmd_mlx5_enable_steering();
+}
+
+void
+ovs_doca_init(const struct smap *ovs_other_config OVS_UNUSED)
+{
+}
+
+void
+print_doca_version(void)
+{
+    puts(doca_version_runtime());
+}
+
+void
+ovs_doca_status(const struct ovsrec_open_vswitch *cfg)
+{
+    if (!cfg) {
+        return;
+    }
+
+    ovsrec_open_vswitch_set_doca_initialized(cfg, false);
+    ovsrec_open_vswitch_set_doca_version(cfg, doca_version_runtime());
+}
+
+#else /* DOCA_NETDEV */
+
+void
+ovs_doca_init(const struct smap *ovs_other_config OVS_UNUSED)
+{
+}
+
+void
+print_doca_version(void)
+{
+}
+
+void
+ovs_doca_status(const struct ovsrec_open_vswitch *cfg)
+{
+    if (!cfg) {
+        return;
+    }
+
+    ovsrec_open_vswitch_set_doca_initialized(cfg, false);
+    ovsrec_open_vswitch_set_doca_version(cfg, "none");
+}
+
+#endif /* DOCA_NETDEV */
diff --git a/lib/ovs-doca.h b/lib/ovs-doca.h
new file mode 100644
index 000000000..9bd96c941
--- /dev/null
+++ b/lib/ovs-doca.h
@@ -0,0 +1,31 @@
+/*
+ * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES.
+ * All rights reserved.
+ * SPDX-License-Identifier: Apache-2.0
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef OVS_DOCA_H
+#define OVS_DOCA_H
+
+#include
+
+struct ovsrec_open_vswitch;
+struct smap;
+
+void ovs_doca_init(const struct smap *ovs_other_config);
+void print_doca_version(void);
+void ovs_doca_status(const struct ovsrec_open_vswitch *);
+
+#endif /* OVS_DOCA_H */
diff --git a/utilities/checkpatch_dict.txt b/utilities/checkpatch_dict.txt
index 5ad599c1d..6a454bcf8 100644
--- a/utilities/checkpatch_dict.txt
+++ b/utilities/checkpatch_dict.txt
@@ -55,6 +55,7 @@ dhcpv4
 dhcpv6
 dnat
 dns
+doca
 dpcls
 dpctl
 dpdk
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 7a68e19ac..dd494b0f6 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -50,6 +50,7 @@
 #include "openvswitch/ofpbuf.h"
 #include "openvswitch/vconn.h"
 #include "openvswitch/vlog.h"
+#include "ovs-doca.h"
 #include "ovs-lldp.h"
 #include "ovs-numa.h"
 #include "packets.h"
@@ -451,6 +452,8 @@ bridge_init(const char *remote)
     ovsdb_idl_omit(idl, &ovsrec_open_vswitch_col_system_version);
     ovsdb_idl_omit_alert(idl, &ovsrec_open_vswitch_col_dpdk_version);
     ovsdb_idl_omit_alert(idl, &ovsrec_open_vswitch_col_dpdk_initialized);
+    ovsdb_idl_omit_alert(idl, &ovsrec_open_vswitch_col_doca_version);
+    ovsdb_idl_omit_alert(idl, &ovsrec_open_vswitch_col_doca_initialized);
 
     ovsdb_idl_omit_alert(idl, &ovsrec_bridge_col_datapath_id);
     ovsdb_idl_omit_alert(idl, &ovsrec_bridge_col_datapath_version);
@@ -3260,6 +3263,7 @@ run_status_update(void)
             connectivity_seqno = seq;
             status_txn = ovsdb_idl_txn_create(idl);
             dpdk_status(cfg);
+            ovs_doca_status(cfg);
             HMAP_FOR_EACH (br, node, &all_bridges) {
                 struct port *port;
 
@@ -3400,6 +3404,7 @@ bridge_run(void)
 
     if (cfg) {
         dpdk_init(&cfg->other_config);
+        ovs_doca_init(&cfg->other_config);
         userspace_tso_init(&cfg->other_config);
     }
 
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index 6d90c73b8..03c739443 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -30,6 +30,7 @@
 #include "compiler.h"
 #include "daemon.h"
 #include "dirs.h"
+#include "dpdk.h"
 #include "dpif.h"
 #include "dummy.h"
 #include "fatal-signal.h"
@@ -37,6 +38,7 @@
 #include "netdev.h"
 #include "openflow/openflow.h"
 #include "ovsdb-idl.h"
+#include "ovs-doca.h"
 #include "ovs-rcu.h"
 #include "ovs-router.h"
 #include "ovs-thread.h"
@@ -220,6 +222,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
         case 'V':
             ovs_print_version(0, 0);
             print_dpdk_version();
+            print_doca_version();
             exit(EXIT_SUCCESS);
 
         case OPT_MLOCKALL:
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index c658291c7..d3b84ac30 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
 {"name": "Open_vSwitch",
- "version": "8.8.0",
- "cksum": "2823623553 27869",
+ "version": "8.9.0",
+ "cksum": "2639123554 28037",
  "tables": {
    "Open_vSwitch": {
      "columns": {
@@ -56,6 +56,11 @@
        "dpdk_initialized": {
          "type": "boolean"},
        "dpdk_version": {
+         "type": {"key": {"type": "string"},
+                  "min": 0, "max": 1}},
+       "doca_initialized": {
+         "type": "boolean"},
+       "doca_version": {
          "type": {"key": {"type": "string"},
                   "min": 0, "max": 1}}},
      "isRoot": true,
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index b7a5afc0a..9edd1027e 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -941,6 +941,10 @@
 true and the DPDK
library is successfully initialized. + + Always false. + +

The statistics column contains key-value pairs that @@ -1131,6 +1135,12 @@

+ +

+ The version of the linked DOCA library. +

+
+
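
A note on the DOCA_LDFLAGS handling in OVS_CHECK_DOCA above: the stand-alone sketch below replays the same tr/sed steps on a made-up flag string, to show how the pkg-config output collapses into a single "-Wl," argument. The function name flatten_ldflags and the sample paths and library names are hypothetical and are not part of the patch; printf is used in place of echo only to keep the sketch independent of shell echo quirks.

```shell
#!/bin/sh
# Sketch only: mirrors the three DOCA_LDFLAGS steps from OVS_CHECK_DOCA.
# flatten_ldflags and the sample input are illustrative assumptions.

flatten_ldflags() {
    # Turn spaces into commas (squeezing repeats) and drop every "-Wl,".
    flags=$(printf '%s\n' "$1" | tr -s ' ' ',' | sed 's/-Wl,//g')
    # Remove the trailing comma and replace -pthread with -lpthread for ld.
    flags=$(printf '%s\n' "$flags" | sed 's/,$//' | sed 's/-pthread/-lpthread/g')
    # Wrap the whole list in one -Wl, argument.
    printf '%s\n' "-Wl,$flags"
}

# Hypothetical "pkg-config --static --libs" output (note the trailing space).
sample="-Wl,--whole-archive -L/opt/doca/lib -l:libdoca_flow.a -Wl,--no-whole-archive -pthread -lm "

flatten_ldflags "$sample"
```

As the patch comment explains, the point of the single argument is that the build tooling cannot reorder individual flags inside it, so the --whole-archive / --no-whole-archive pairs stay around the archives they were meant for.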
From patchwork Wed Apr 1 09:13:18 2026
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Eli Britstein
X-Patchwork-Id: 2218458
X-Patchwork-Delegate: echaudro@redhat.com
(2603:10b6:a03:539::18) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.10; Wed, 1 Apr 2026 09:15:16 +0000 Received: from CY4PEPF0000E9DA.namprd05.prod.outlook.com (2603:10b6:510:32b:cafe::83) by PH7P220CA0048.outlook.office365.com (2603:10b6:510:32b::19) with Microsoft SMTP Server (version=TLS1_3, cipher=TLS_AES_256_GCM_SHA384) id 15.20.9745.29 via Frontend Transport; Wed, 1 Apr 2026 09:15:08 +0000 X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 216.228.117.160) smtp.mailfrom=nvidia.com; dkim=none (message not signed) header.d=none;dmarc=pass action=none header.from=nvidia.com; Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 216.228.117.160 as permitted sender) receiver=protection.outlook.com; client-ip=216.228.117.160; helo=mail.nvidia.com; pr=C Received: from mail.nvidia.com (216.228.117.160) by CY4PEPF0000E9DA.mail.protection.outlook.com (10.167.241.73) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9769.17 via Frontend Transport; Wed, 1 Apr 2026 09:15:15 +0000 Received: from rnnvmail201.nvidia.com (10.129.68.8) by mail.nvidia.com (10.129.200.66) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 02:14:59 -0700 Received: from nvidia.com (10.126.231.35) by rnnvmail201.nvidia.com (10.129.68.8) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.2562.20; Wed, 1 Apr 2026 02:14:55 -0700 From: Eli Britstein To: Date: Wed, 1 Apr 2026 12:13:18 +0300 Message-ID: <20260401091318.2671624-12-elibr@nvidia.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20260401091318.2671624-1-elibr@nvidia.com> References: <20260401091318.2671624-1-elibr@nvidia.com> MIME-Version: 1.0 X-Originating-IP: [10.126.231.35] X-ClientProxiedBy: rnnvmail203.nvidia.com (10.129.68.9) To rnnvmail201.nvidia.com (10.129.68.8) 
X-EOPAttributedMessage: 0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: CY4PEPF0000E9DA:EE_|SJ2PR12MB8876:EE_ X-MS-Office365-Filtering-Correlation-Id: 30c7b794-5cbf-437e-abb9-08de8fcf2fe1 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; ARA:13230040|376014|82310400026|1800799024|36860700016|13003099007|18092099006|22082099003|18002099003|56012099003; X-Microsoft-Antispam-Message-Info: 52UPJXPrMfRT8h8ZL4raSWvUxpUje/GRQHWsTgM+j1tR0GYeanDStdbOQOr+IeM7UE3daL6D8Ayggue6csD8drXTphcAaZSwCMuTLgepkVbvPRbh+X99DtI4iI2GfX1Tx3vTjWE9nV/TC5M9N/zkU78YFAFMjXqXC6ATLbnLGTDG0mRbqoIW1X9JmPv8NsLSP1pqpRvqUSMBUbcyV59+ghHLBxNViapn7JcUr/0vheJmXwQkB0vc9IWrUL0IgS+NqoDXDEFWLHjzTPjsWEEbiORXWkEagwuXwlunkoDN31hewqYUz+AkIUIBdD9eRnnAd2Vc2XUX9UiPHNLDfizo+TL/9PYXdPRvHE4DRKZqIs+D8CZVS4v2lAtXOMrhtysK0ZkcM9Q5KNyhTvYAb8pH1wCX4+z87G99JmPCo863BTuSqOM8UQNmW9KPlq62h351ovkF84fTXomJevMEOvvAjRgQ38u8/y+ks/9BEpBYBlSlfLTIiFV3UsfPx9Voa3sABF/cTAYrZw6UH/jGNHBy2cEzfiY0YSe9i+J0weJDPVOcFBPQX/1em3Gecl9mmiT9Qx8SDm7i82uhwhsiSB1AVNAfExrcRvnSVkc6bnK0VPYLyAWMQeM+Z9M2v9YeOdGEw0JNUGkAdeRU+Xc7NI8UxyZ5fvsKvhWTBEYNrggaexCJUYQj5VKg+b/k+6bBm6UAKQky5kfBLqfuYiZ0YrOSSb62w7MhMqg9+h/0V1TEWkkrgboECAR0de+aAzfHtVNOvg6pjHY89oVD3dXKrJdeWQ== X-Forefront-Antispam-Report: CIP:216.228.117.160; CTRY:US; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:mail.nvidia.com; PTR:dc6edge1.nvidia.com; CAT:NONE; SFS:(13230040)(376014)(82310400026)(1800799024)(36860700016)(13003099007)(18092099006)(22082099003)(18002099003)(56012099003); DIR:OUT; SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: 
nko/ATWKXCFmutU59YAwGdUU39gxTZ6aarlh0x/PtdJi4PLL90Q8NkYtt6L3YaSTSVrA3RylfE3Dqpo77P7Asr+9n9xg2usp+UERrdHkgDzcAdLoRdXhir8QBvuSA2SLxqgTJC9F0a8A1saVpJvfHqKmeuB7G43FvnqrYVcmU1Dg2YhMOcT/Ff5ifbuleXqQol/Xw2DTArocVRa8gjHLy8aasanZ0X2I7S3Jznrj40j1bNL+gEodLnLCHd1qyZ5ExQd79lYYhkfE8L6oh/zGGHM1kJRqArVhBy0Fi9qcLYf8vhsH79U+864yk4BnPxPhX8ZiZLOtHIw3us9VItW0apEB7ZZrKuS5kEoMjPHWnEX2CLsr+NVMwjc1Kszu8PTyrRXGnpDfgi1/nqIIrLnZ73v7abDQ547kxoxrRxit1amU94L2Yq64aOBV5PDu7PK1 X-OriginatorOrg: Nvidia.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 01 Apr 2026 09:15:15.6070 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 30c7b794-5cbf-437e-abb9-08de8fcf2fe1 X-MS-Exchange-CrossTenant-Id: 43083d15-7273-40c1-b7db-39efd9ccc17a X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=43083d15-7273-40c1-b7db-39efd9ccc17a; Ip=[216.228.117.160]; Helo=[mail.nvidia.com] X-MS-Exchange-CrossTenant-AuthSource: CY4PEPF0000E9DA.namprd05.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: SJ2PR12MB8876 Subject: [ovs-dev] [PATCH v3 11/11] netdev-doca: Introduce doca netdev. X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Eli Britstein , Ilya Maximets , David Marchand , Maor Dickman Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev"

Introduce a new netdev type - "doca". The code is placed in new files.
- ovs-doca: initialization of the DOCA library and utility functions that
  are currently used by netdev-doca and will also be used by future
  hw-offload code.
- netdev-doca: implementation of the new netdev.

Supported ports are mlx5 ports in switchdev mode only, on a NIC that
supports hw-steering.

The netdev has the concept of an ESW manager. A representor port is
functional only if its ESW manager is attached to OVS.
If it is not, the representor appears functional in ovs-vsctl show, but
it is not. Upon initialization of an ESW manager port, each representor is
reconfigured to be functional, and upon destruction, they are first
stopped.

Steering infrastructure:
- RX packets of all ports are steered to a common queue. This queue is
  polled using the DPDK API and the packets are classified to a per-port
  memory structure.
- TX packets are marked with the target port as metadata and sent to a
  common queue. The egress pipe matches on the metadata and forwards the
  packets accordingly.

Signed-off-by: Eli Britstein
---
 Documentation/automake.mk             |    2 +
 Documentation/howto/doca.rst          |  143 ++
 Documentation/howto/index.rst         |    1 +
 Documentation/intro/install/doca.rst  |  104 +
 Documentation/intro/install/index.rst |    1 +
 NEWS                                  |    4 +
 lib/automake.mk                       |    6 +
 lib/netdev-doca.c                     | 2898 +++++++++++++++++++++++++
 lib/netdev-doca.h                     |  159 ++
 lib/ovs-doca.c                        |  732 ++++++-
 lib/ovs-doca.h                        |   82 +
 tests/ofproto-macros.at               |    1 +
 utilities/checkpatch_dict.txt         |    1 +
 vswitchd/vswitch.xml                  |   87 +-
 14 files changed, 4187 insertions(+), 34 deletions(-)
 create mode 100644 Documentation/howto/doca.rst
 create mode 100644 Documentation/intro/install/doca.rst
 create mode 100644 lib/netdev-doca.c
 create mode 100644 lib/netdev-doca.h

diff --git a/Documentation/automake.mk b/Documentation/automake.mk
index ea9459b55..230128efb 100644
--- a/Documentation/automake.mk
+++ b/Documentation/automake.mk
@@ -14,6 +14,7 @@ DOC_SOURCE = \
  Documentation/intro/install/debian.rst \
  Documentation/intro/install/documentation.rst \
  Documentation/intro/install/distributions.rst \
+ Documentation/intro/install/doca.rst \
  Documentation/intro/install/dpdk.rst \
  Documentation/intro/install/fedora.rst \
  Documentation/intro/install/general.rst \
@@ -63,6 +64,7 @@ DOC_SOURCE = \
  Documentation/topics/userspace-tx-steering.rst \
  Documentation/topics/windows.rst \
  Documentation/howto/index.rst \
+ Documentation/howto/doca.rst \
  Documentation/howto/dpdk.rst
\ Documentation/howto/ipsec.rst \ Documentation/howto/kvm.rst \ diff --git a/Documentation/howto/doca.rst b/Documentation/howto/doca.rst new file mode 100644 index 000000000..4afb749d1 --- /dev/null +++ b/Documentation/howto/doca.rst @@ -0,0 +1,143 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +============================ +Using Open vSwitch with DOCA +============================ + +This document describes how to use Open vSwitch with DOCA on NVIDIA +BlueField DPUs and ConnectX NICs. + +.. important:: + + Using DOCA with OVS requires building OVS with both DPDK and DOCA + support. For build instructions refer to :doc:`/intro/install/doca`. + +Prerequisites +------------- + +Enabling DOCA +~~~~~~~~~~~~~ + +The ``doca-init`` option must be set to ``true`` before starting +``ovs-vswitchd``. If DOCA cannot be initialized, the process will abort:: + + $ ovs-vsctl --no-wait set Open_vSwitch . other_config:doca-init=true + +DOCA also requires DPDK, so ``dpdk-init`` must be enabled as well:: + + $ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true + +.. note:: + Changing either value requires restarting ``ovs-vswitchd``. 
+ +DOCA initialization can be confirmed by checking the ``doca_initialized`` +value:: + + $ ovs-vsctl get Open_vSwitch . doca_initialized + true + +E-Switch Configuration +~~~~~~~~~~~~~~~~~~~~~~ + +The NIC embedded switch (E-Switch) must be set to ``switchdev`` mode. + +Set the E-Switch to switchdev mode using the PF PCI address:: + + $ sudo devlink dev eswitch set pci/0000:08:00.0 mode switchdev + +DPDK PCI Device Probing +~~~~~~~~~~~~~~~~~~~~~~~ + +DPDK must not automatically probe PCI devices when using DOCA ports. Disable +automatic probing by passing a dummy allow-list address via ``dpdk-extra``:: + + $ ovs-vsctl set Open_vSwitch . \ + other_config:dpdk-extra="-a pci:0000:00:00.0" + +Device Capabilities +~~~~~~~~~~~~~~~~~~~ + +DOCA requires ``CAP_SYS_RAWIO`` to configure the E-Switch manager. Without +it, OVS fails to detect the ESW manager port and all DOCA ports are +non-functional. The ``ovs-vswitchd`` process must be started with the +``--hw-rawio-access`` command line option. + +On RHEL/Fedora systems, edit ``/etc/sysconfig/openvswitch``:: + + OPTIONS="--ovs-vswitchd-options='--hw-rawio-access'" + +On Debian/Ubuntu systems, ``ovs-vswitchd`` runs as root by default and +already has all capabilities, so this step is not required. If running as +a non-root user, edit ``/etc/default/openvswitch-switch``:: + + OVS_CTL_OPTS="--ovs-vswitchd-options='--hw-rawio-access'" + +Restart ``ovs-vswitchd`` after making the change. + +Ports and Bridges +----------------- + +Bridges and ports are configured with ``ovs-vsctl``. Bridges should be +created with ``datapath_type=netdev``:: + + $ ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev + +DOCA ports are added by referencing the Linux network interface name of the +port representor and setting the interface type to ``doca``. 
For example, +given a NIC where ``enp8s0f0`` is the E-Switch uplink and ``enp8s0f0_0``, +``enp8s0f0_1`` are VF representors:: + + $ ovs-vsctl add-port br0 enp8s0f0 -- set Interface enp8s0f0 type=doca + $ ovs-vsctl add-port br0 enp8s0f0_0 -- set Interface enp8s0f0_0 type=doca + $ ovs-vsctl add-port br0 enp8s0f0_1 -- set Interface enp8s0f0_1 type=doca + +.. important:: + + The E-Switch uplink representor (e.g. ``enp8s0f0``) must be attached to + OVS. Without it, VF representor ports are silently non-functional. + +.. important:: + + DOCA ports and mlx5 DPDK ports (``type=dpdk``) cannot coexist in the + same OVS instance. NVIDIA NIC ports must be either all ``type=doca`` or + all ``type=dpdk``. Other (non-mlx5) DPDK port types and kernel ports + are not affected by this restriction and can be used alongside DOCA ports. + +Configuration Notes +------------------- + +The ``other_config:flow-limit`` value is read during DOCA initialization and +cannot be changed dynamically. Modifying ``flow-limit`` requires restarting +``ovs-vswitchd`` for the new value to take effect with DOCA:: + + $ ovs-vsctl set Open_vSwitch . other_config:flow-limit=100000 + +Further Reading +--------------- + +- :doc:`/intro/install/doca` -- Build and installation instructions. +- :doc:`/intro/install/dpdk` -- DPDK build prerequisites. +- :doc:`dpdk` -- General DPDK usage with OVS. +- `NVIDIA DOCA Documentation `_ -- Upstream + DOCA SDK reference. diff --git a/Documentation/howto/index.rst b/Documentation/howto/index.rst index 1491de3f3..9e083361e 100644 --- a/Documentation/howto/index.rst +++ b/Documentation/howto/index.rst @@ -48,5 +48,6 @@ OVS vtep sflow dpdk + doca tc-offload diff --git a/Documentation/intro/install/doca.rst b/Documentation/intro/install/doca.rst new file mode 100644 index 000000000..a3393077f --- /dev/null +++ b/Documentation/intro/install/doca.rst @@ -0,0 +1,104 @@ +.. 
+ Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +====================== +Open vSwitch with DOCA +====================== + +This document describes how to build and install Open vSwitch with DOCA +support on NVIDIA BlueField and ConnectX network platforms. + +.. important:: + + Building OVS with DOCA requires a working DPDK build first. Refer to + :doc:`dpdk` for DPDK build and installation instructions. + +Build Requirements +------------------ + +In addition to the requirements described in :doc:`general` and :doc:`dpdk`, +building Open vSwitch with DOCA requires the following: + +- DPDK with mlx5 PMD driver enabled (see :doc:`dpdk`). The DOCA SDK + includes a compatible DPDK build (``dpdk-community-dev``); alternatively, + DPDK can be built from source with ``-Denable_drivers=net/mlx5``. + +- DOCA SDK packages (``libdoca-sdk-flow-dev``, ``libdoca-sdk-dpdk-bridge-dev``) + +- An NVIDIA BlueField DPU or ConnectX NIC with a supported firmware version + +.. _doca-install: + +Installing +---------- + +Install DOCA SDK +~~~~~~~~~~~~~~~~ + +The DOCA SDK can be installed from the NVIDIA package repository. + +#. 
Download the DOCA host repo package from the `NVIDIA DOCA Downloads`_ page: + + Select *Host-Server* deployment platform, *DOCA-Host* deployment package, + *Linux* target OS, and *x86_64* architecture. + +#. Install the repository and SDK packages:: + + $ sudo dpkg -i doca-repo.deb + $ sudo apt-get update + $ sudo apt-get install -y dpdk-community-dev \ + libdoca-sdk-flow-dev libdoca-sdk-dpdk-bridge-dev + + On RPM-based distributions:: + + $ sudo rpm -i doca-repo.rpm + $ sudo dnf install -y dpdk-community-devel \ + libdoca-sdk-flow-devel libdoca-sdk-dpdk-bridge-devel + +.. _NVIDIA DOCA Downloads: https://developer.nvidia.com/doca-downloads + +Install OVS +~~~~~~~~~~~~ + +OVS must be configured with both ``--with-dpdk`` and ``--with-doca`` flags. + +#. Ensure the standard OVS requirements, described in + :ref:`general-build-reqs`, are installed + +#. Bootstrap, if required, as described in :ref:`general-bootstrapping` + +#. Configure the package with DPDK and DOCA support:: + + $ ./configure --with-dpdk=static --with-doca=static + + .. note:: + ``--with-doca`` requires ``--with-dpdk``. The configure step will fail + if DPDK is not enabled. + + .. note:: + While ``--with-dpdk`` and ``--with-doca`` are required, you can pass + any other configuration option described in :ref:`general-configuring`. + +#. Build and install OVS, as described in :ref:`general-building` + +Additional information can be found in :doc:`general`. diff --git a/Documentation/intro/install/index.rst b/Documentation/intro/install/index.rst index 885a65d6e..767b4afd3 100644 --- a/Documentation/intro/install/index.rst +++ b/Documentation/intro/install/index.rst @@ -44,6 +44,7 @@ Installation from Source windows userspace dpdk + doca afxdp Installation from Packages diff --git a/NEWS b/NEWS index 1a3044cbf..2ccb6ea39 100644 --- a/NEWS +++ b/NEWS @@ -3,6 +3,10 @@ Post-v3.7.0 - Userspace datapath: * ARP/ND lookups for native tunnel are now rate limited. 
The holdout timer can be configured with 'tnl/neigh/retrans_time'. + - DOCA: + * New netdev type "doca", available under "netdev" datapath, + using the DOCA API for NVIDIA ConnectX and BlueField NICs. + See Documentation/howto/doca.rst. v3.7.0 - 16 Feb 2026 diff --git a/lib/automake.mk b/lib/automake.mk index 66c5c3d93..09a2936a9 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -521,6 +521,12 @@ lib_libopenvswitch_la_SOURCES += \ lib/ovs-doca.c \ lib/ovs-doca.h +if DOCA_NETDEV +lib_libopenvswitch_la_SOURCES += \ + lib/netdev-doca.c \ + lib/netdev-doca.h +endif + if WIN32 lib_libopenvswitch_la_SOURCES += \ lib/dpif-netlink.c \ diff --git a/lib/netdev-doca.c b/lib/netdev-doca.c new file mode 100644 index 000000000..c3b2fdc95 --- /dev/null +++ b/lib/netdev-doca.c @@ -0,0 +1,2898 @@ +/* + * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. + * All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#include + +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include + +#include "coverage.h" +#include "dp-packet.h" +#include "dpif-netdev.h" +#include "netdev-doca.h" +#include "netdev-provider.h" +#include "ovs-doca.h" +#include "ovs-thread.h" +#include "refmap.h" +#include "rtnetlink.h" +#include "unixctl.h" +#include "userspace-tso.h" +#include "util.h" + +#include "openvswitch/vlog.h" + +VLOG_DEFINE_THIS_MODULE(netdev_doca); +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(600, 600); + +COVERAGE_DEFINE(netdev_doca_drop_ring_full); +COVERAGE_DEFINE(netdev_doca_invalid_classify_port); +COVERAGE_DEFINE(netdev_doca_no_mark); + +#define NETDEV_DOCA_MAX_MEGAFLOWS_COUNTERS (1 << 19) +#define NETDEV_DOCA_ACTIONS_MEM_SIZE \ + (64 * 2 * NETDEV_DOCA_MAX_MEGAFLOWS_COUNTERS) + +#define MAX_PHYS_ITEM_ID_LEN 32 + +struct netdev_doca_esw_key { + struct rte_pci_addr rte_pci; +}; + +struct netdev_doca_esw_ctx_arg { + struct netdev_doca_esw_key *esw_key; + struct netdev_doca *dev; +}; + +struct rss_match_type { + enum doca_flow_l3_meta l3_type; + enum doca_flow_l4_meta l4_type; +}; + +static uint16_t pre_miss_mapping[NUM_SEND_TO_KERNEL] = { + [SEND_TO_KERNEL_LACP] = ETH_TYPE_LACP, + [SEND_TO_KERNEL_LLDP] = ETH_TYPE_LLDP, +}; + +static struct refmap *netdev_doca_esw_rfm; +static struct atomic_count n_doca_ports = ATOMIC_COUNT_INIT(0); +struct ovs_mutex doca_mutex = OVS_MUTEX_INITIALIZER; +/* Contains all 'struct doca_dev's. 
*/ +static struct ovs_list doca_list OVS_GUARDED_BY(doca_mutex) + = OVS_LIST_INITIALIZER(&doca_list); + +static void +netdev_doca_destruct(struct netdev *netdev); + +static int +netdev_doca_port_stop(struct netdev *netdev) + OVS_REQUIRES(doca_mutex); + +static dpdk_port_t +netdev_doca_get_esw_mgr_port_id(const struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + + if (!rte_eth_dev_is_valid_port(dev->common.port_id) || + !rte_eth_dev_is_valid_port(dev->esw_mgr_port_id)) { + return DPDK_ETH_PORT_ID_INVALID; + } + + return dev->esw_mgr_port_id; +} + +static dpdk_port_t +netdev_doca_get_port_id(const struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + + if (!rte_eth_dev_is_valid_port(dev->common.port_id)) { + return DPDK_ETH_PORT_ID_INVALID; + } + + return dev->common.port_id; +} + +static bool +netdev_doca_is_esw_mgr(const struct netdev *netdev) +{ + dpdk_port_t esw_mgr_id = netdev_doca_get_esw_mgr_port_id(netdev); + + return esw_mgr_id == netdev_doca_get_port_id(netdev) && + esw_mgr_id != DPDK_ETH_PORT_ID_INVALID; +} + +static int +netdev_doca_egress_pipe_init(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct ovs_doca_flow_match match; + struct doca_flow_monitor monitor; + struct doca_flow_fwd fwd; + + memset(&match, 0, sizeof match); + memset(&fwd, 0, sizeof fwd); + memset(&monitor, 0, sizeof monitor); + + /* Meta to match on is defined per entry. */ + memset(&match.d.meta.pkt_meta, 0xFF, sizeof match.d.meta.pkt_meta); + + /* Port ID to forward to is defined per entry. 
*/ + fwd.type = DOCA_FLOW_FWD_PORT; + memset(&fwd.port_id, 0xFF, sizeof fwd.port_id); + monitor.counter_type = DOCA_FLOW_RESOURCE_TYPE_NON_SHARED; + + return ovs_doca_pipe_create(&dev->common.up, &match, NULL, &monitor, NULL, + NULL, NULL, &fwd, NULL, RTE_MAX_ETHPORTS, true, + true, UINT64_C(1) << AUX_QUEUE, "EGRESS", + &dev->esw_ctx->egress_pipe); +} + +static void +netdev_doca_egress_pipe_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + + ovs_doca_destroy_pipe(&esw->egress_pipe); +} + +static int +netdev_doca_rss_pipe_init(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct ovs_doca_flow_actions actions; + struct ovs_doca_flow_match match; + struct doca_flow_monitor monitor; + struct doca_flow_fwd fwd; + int rv; + + memset(&match, 0, sizeof match); + memset(&fwd, 0, sizeof fwd); + memset(&actions, 0, sizeof actions); + memset(&monitor, 0, sizeof monitor); + + memset(&match.d.parser_meta.port_id, 0xFF, + sizeof match.d.parser_meta.port_id); + memset(&match.d.parser_meta.outer_l3_type, 0xFF, + sizeof match.d.parser_meta.outer_l3_type); + memset(&match.d.parser_meta.outer_l4_type, 0xFF, + sizeof match.d.parser_meta.outer_l4_type); + + memset(&actions.mark, 0xFF, sizeof actions.mark); + + monitor.counter_type = DOCA_FLOW_RESOURCE_TYPE_NON_SHARED; + + fwd.type = DOCA_FLOW_FWD_RSS; + fwd.rss_type = DOCA_FLOW_RESOURCE_TYPE_NON_SHARED; + memset(&fwd.rss.nr_queues, 0xFF, sizeof fwd.rss.nr_queues); + + rv = ovs_doca_pipe_create(&dev->common.up, &match, NULL, &monitor, + &actions, &actions, NULL, &fwd, NULL, + NETDEV_DOCA_RSS_NUM_ENTRIES * RTE_MAX_ETHPORTS, + false, false, UINT64_C(1) << AUX_QUEUE, "RSS", + &dev->esw_ctx->rss_pipe); + return rv; +} + +static void +netdev_doca_rss_pipe_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + + 
ovs_doca_destroy_pipe(&esw->rss_pipe); +} + +static uint32_t +netdev_doca_rss_flags(enum netdev_doca_rss_type type) +{ + switch (type) { + case NETDEV_DOCA_RSS_IPV4_TCP: + return DOCA_FLOW_RSS_IPV4 | DOCA_FLOW_RSS_TCP; + case NETDEV_DOCA_RSS_IPV4_UDP: + return DOCA_FLOW_RSS_IPV4 | DOCA_FLOW_RSS_UDP; + case NETDEV_DOCA_RSS_IPV4_ICMP: + return DOCA_FLOW_RSS_IPV4; + case NETDEV_DOCA_RSS_IPV4_ESP: + return DOCA_FLOW_RSS_IPV4; + case NETDEV_DOCA_RSS_IPV4_OTHER: + return DOCA_FLOW_RSS_IPV4; + case NETDEV_DOCA_RSS_IPV6_TCP: + return DOCA_FLOW_RSS_IPV6 | DOCA_FLOW_RSS_TCP; + case NETDEV_DOCA_RSS_IPV6_UDP: + return DOCA_FLOW_RSS_IPV6 | DOCA_FLOW_RSS_UDP; + case NETDEV_DOCA_RSS_IPV6_ICMP: + return DOCA_FLOW_RSS_IPV6; + case NETDEV_DOCA_RSS_IPV6_ESP: + return DOCA_FLOW_RSS_IPV6; + case NETDEV_DOCA_RSS_IPV6_OTHER: + return DOCA_FLOW_RSS_IPV6; + case NETDEV_DOCA_RSS_OTHER: + return 0; + } + + OVS_NOT_REACHED(); + return 0; +} + +static struct rss_match_type +netdev_doca_rss_match_type(enum netdev_doca_rss_type type) +{ + switch (type) { + case NETDEV_DOCA_RSS_IPV4_TCP: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_IPV4, + .l4_type = DOCA_FLOW_L4_META_TCP, + }; + case NETDEV_DOCA_RSS_IPV4_UDP: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_IPV4, + .l4_type = DOCA_FLOW_L4_META_UDP, + }; + case NETDEV_DOCA_RSS_IPV4_ICMP: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_IPV4, + .l4_type = DOCA_FLOW_L4_META_ICMP, + }; + case NETDEV_DOCA_RSS_IPV4_ESP: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_IPV4, + .l4_type = DOCA_FLOW_L4_META_ESP, + }; + case NETDEV_DOCA_RSS_IPV4_OTHER: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_IPV4, + .l4_type = DOCA_FLOW_L4_META_NONE, + }; + case NETDEV_DOCA_RSS_IPV6_TCP: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_IPV6, + .l4_type = DOCA_FLOW_L4_META_TCP, + }; + case NETDEV_DOCA_RSS_IPV6_UDP: + return (struct rss_match_type) { + .l3_type 
= DOCA_FLOW_L3_META_IPV6, + .l4_type = DOCA_FLOW_L4_META_UDP, + }; + case NETDEV_DOCA_RSS_IPV6_ICMP: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_IPV6, + .l4_type = DOCA_FLOW_L4_META_ICMP, + }; + case NETDEV_DOCA_RSS_IPV6_ESP: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_IPV6, + .l4_type = DOCA_FLOW_L4_META_ESP, + }; + case NETDEV_DOCA_RSS_IPV6_OTHER: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_IPV6, + .l4_type = DOCA_FLOW_L4_META_NONE, + }; + case NETDEV_DOCA_RSS_OTHER: + return (struct rss_match_type) { + .l3_type = DOCA_FLOW_L3_META_NONE, + .l4_type = DOCA_FLOW_L4_META_NONE, + }; + } + + OVS_NOT_REACHED(); + return (struct rss_match_type) {}; +} + +static const char * +netdev_doca_stats_name(enum netdev_doca_rss_type type) +{ + switch (type) { + case NETDEV_DOCA_RSS_IPV4_TCP: + return "rx_ipv4_tcp"; + case NETDEV_DOCA_RSS_IPV4_UDP: + return "rx_ipv4_udp"; + case NETDEV_DOCA_RSS_IPV4_ICMP: + return "rx_ipv4_icmp"; + case NETDEV_DOCA_RSS_IPV4_ESP: + return "rx_ipv4_esp"; + case NETDEV_DOCA_RSS_IPV4_OTHER: + return "rx_ipv4_other"; + case NETDEV_DOCA_RSS_IPV6_TCP: + return "rx_ipv6_tcp"; + case NETDEV_DOCA_RSS_IPV6_UDP: + return "rx_ipv6_udp"; + case NETDEV_DOCA_RSS_IPV6_ICMP: + return "rx_ipv6_icmp"; + case NETDEV_DOCA_RSS_IPV6_ESP: + return "rx_ipv6_esp"; + case NETDEV_DOCA_RSS_IPV6_OTHER: + return "rx_ipv6_other"; + case NETDEV_DOCA_RSS_OTHER: + return "rx_other"; + } + + OVS_NOT_REACHED(); + return "ERR"; +} + +static int +netdev_doca_rss_entries_init(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + dpdk_port_t port_id = common->port_id; + struct ovs_doca_flow_actions actions; + struct doca_flow_pipe_entry *entry; + struct ovs_doca_flow_match match; + unsigned int num_of_queues; + struct doca_flow_fwd fwd; + uint16_t *rss_queues; + int ret; + int i; + + 
num_of_queues = esw->n_rxq; + + rss_queues = xcalloc(num_of_queues, sizeof *rss_queues); + for (i = 0; i < num_of_queues; i++) { + rss_queues[i] = i; + } + + memset(&match, 0, sizeof match); + memset(&actions, 0, sizeof actions); + memset(&fwd, 0, sizeof fwd); + + fwd.type = DOCA_FLOW_FWD_RSS; + fwd.rss.queues_array = rss_queues; + fwd.rss.nr_queues = num_of_queues; + fwd.rss_type = DOCA_FLOW_RESOURCE_TYPE_NON_SHARED; + + match.d.parser_meta.port_id = port_id; + actions.mark = (OVS_FORCE doca_be32_t) DOCA_HTOBE32(port_id); + + for (i = 0; i < NETDEV_DOCA_RSS_NUM_ENTRIES; i++) { + struct rss_match_type match_type = netdev_doca_rss_match_type(i); + + match.d.parser_meta.outer_l3_type = match_type.l3_type; + match.d.parser_meta.outer_l4_type = match_type.l4_type; + fwd.rss.outer_flags = netdev_doca_rss_flags(i); + + ret = ovs_doca_add_entry(&common->up, AUX_QUEUE, esw->rss_pipe, &match, + &actions, NULL, &fwd, + DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, &entry); + if (ret) { + VLOG_ERR("%s: Failed to create '%s' rss entry. 
Error: %d (%s)", + netdev_get_name(&common->up), netdev_doca_stats_name(i), + ret, doca_error_get_descr(ret)); + break; + } + + dev->rss_entries[i] = entry; + } + + free(rss_queues); + + return ret; +} + +static void +netdev_doca_rss_entries_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + + for (int i = 0; i < NETDEV_DOCA_RSS_NUM_ENTRIES; i++) { + ovs_doca_remove_entry(esw, AUX_QUEUE, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, + &dev->rss_entries[i]); + } +} + +static int +netdev_doca_meta_tag0_pipe_init(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct ovs_doca_flow_actions actions_masks; + struct ovs_doca_flow_actions actions; + struct ovs_doca_flow_match match; + struct doca_flow_fwd fwd = { + .type = DOCA_FLOW_FWD_PIPE, + .next_pipe = dev->esw_ctx->rss_pipe, + }; + + memset(&match, 0, sizeof match); + memset(&actions, 0, sizeof actions); + memset(&actions_masks, 0, sizeof actions_masks); + + memset(&actions_masks.d.meta.u32[0], 0xFF, + sizeof actions_masks.d.meta.u32[0]); + + return ovs_doca_pipe_create(netdev, &match, NULL, NULL, &actions, + &actions_masks, NULL, &fwd, NULL, 1, false, + false, UINT64_C(1) << AUX_QUEUE, "META_TAG0", + &dev->esw_ctx->meta_tag0_pipe); +} + +static void +netdev_doca_meta_tag0_pipe_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + + ovs_doca_destroy_pipe(&esw->meta_tag0_pipe); +} + +static int +netdev_doca_meta_tag0_rule_init(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct doca_flow_pipe_entry **pentry; + struct doca_flow_pipe *pipe; + int ret; + + pentry = &dev->esw_ctx->meta_tag0_entry; + pipe = dev->esw_ctx->meta_tag0_pipe; + + ret = ovs_doca_add_entry(netdev, AUX_QUEUE, pipe, NULL, NULL, NULL, NULL, + DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, pentry); + if (ret) { + VLOG_ERR("%s: Failed to 
create meta-tag0 rule. Error: %d (%s)", + netdev_get_name(netdev), ret, doca_error_get_descr(ret)); + } + + return ret; +} + +static void +netdev_doca_meta_tag0_rule_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + + ovs_doca_remove_entry(esw, AUX_QUEUE, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, + &dev->esw_ctx->meta_tag0_entry); +} + +static int +netdev_doca_pre_miss_pipe_init(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct ovs_doca_flow_match match = { .d = { + .parser_meta.outer_l2_type = DOCA_FLOW_L2_META_NO_VLAN, + .outer.eth.type = UINT16_MAX, + }, }; + struct doca_flow_target *kernel_target; + struct doca_flow_fwd fwd, miss; + int err; + + memset(&miss, 0, sizeof miss); + memset(&fwd, 0, sizeof fwd); + + miss.type = DOCA_FLOW_FWD_PIPE; + miss.next_pipe = dev->esw_ctx->meta_tag0_pipe; + + err = doca_flow_get_target(DOCA_FLOW_TARGET_KERNEL, &kernel_target); + if (err) { + VLOG_ERR("%s: Could not get miss to kernel target. 
Error: %d (%s)", + netdev_get_name(netdev), err, doca_error_get_descr(err)); + return err; + } + + fwd.type = DOCA_FLOW_FWD_TARGET; + fwd.target = kernel_target; + + return ovs_doca_pipe_create(netdev, &match, NULL, NULL, NULL, NULL, NULL, + &fwd, &miss, NUM_SEND_TO_KERNEL, false, + false, UINT64_C(1) << AUX_QUEUE, "PRE_MISS", + &dev->esw_ctx->pre_miss_pipe); +} + +static void +netdev_doca_pre_miss_pipe_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + + ovs_doca_destroy_pipe(&esw->pre_miss_pipe); +} + +static int +netdev_doca_pre_miss_rules_init(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct doca_flow_pipe_entry **pentry; + struct ovs_doca_flow_match match; + int ret; + + memset(&match, 0, sizeof match); + + for (int i = 0 ; i < NUM_SEND_TO_KERNEL ; i++) { + pentry = &dev->esw_ctx->pre_miss_entries[i]; + + match.d.outer.eth.type = htons(pre_miss_mapping[i]); + ret = ovs_doca_add_entry(netdev, AUX_QUEUE, + dev->esw_ctx->pre_miss_pipe, &match, NULL, + NULL, NULL, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, + pentry); + if (ret) { + VLOG_ERR("%s: Failed to create pre_miss %x rule. 
Error: %d (%s)", + netdev_get_name(netdev), pre_miss_mapping[i], + ret, doca_error_get_descr(ret)); + break; + } + } + + return ret; +} + +static void +netdev_doca_pre_miss_rules_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + + for (int i = 0 ; i < NUM_SEND_TO_KERNEL ; i++) { + ovs_doca_remove_entry(esw, AUX_QUEUE, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, + &esw->pre_miss_entries[i]); + } +} + +static int +netdev_doca_root_pipe_init(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct doca_flow_fwd miss; + + memset(&miss, 0, sizeof miss); + miss.type = DOCA_FLOW_FWD_PIPE; + miss.next_pipe = dev->esw_ctx->pre_miss_pipe; + + return ovs_doca_pipe_create(netdev, NULL, NULL, NULL, NULL, NULL, NULL, + NULL, &miss, 0, false, true, 0, "ROOT", + &dev->esw_ctx->root_pipe); +} + +static void +netdev_doca_root_pipe_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + + ovs_doca_destroy_pipe(&esw->root_pipe); +} + +static int +netdev_doca_egress_entry_init(struct netdev_doca *dev) +{ + struct doca_flow_pipe *pipe = dev->esw_ctx->egress_pipe; + struct netdev_dpdk_common *common = &dev->common; + dpdk_port_t port_id = common->port_id; + struct ovs_doca_flow_match match; + struct doca_flow_fwd fwd; + int ret; + + memset(&match, 0, sizeof match); + memset(&fwd, 0, sizeof fwd); + + match.d.meta.pkt_meta = (OVS_FORCE doca_be32_t) DOCA_HTOBE32(port_id); + + fwd.type = DOCA_FLOW_FWD_PORT; + fwd.port_id = port_id; + + ret = ovs_doca_add_entry(&common->up, AUX_QUEUE, pipe, &match, NULL, NULL, + &fwd, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, + &dev->egress_entry); + if (ret) { + VLOG_ERR("Failed to create egress pipe entry. 
Error: %d (%s)", ret, + doca_error_get_descr(ret)); + } + + return ret; +} + +static void +netdev_doca_egress_entry_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + + ovs_doca_remove_entry(esw, AUX_QUEUE, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT, + &dev->egress_entry); +} + +static void +netdev_doca_slowpath_esw_uninit(struct netdev *netdev) +{ + netdev_doca_root_pipe_uninit(netdev); + netdev_doca_pre_miss_rules_uninit(netdev); + netdev_doca_pre_miss_pipe_uninit(netdev); + netdev_doca_meta_tag0_rule_uninit(netdev); + netdev_doca_meta_tag0_pipe_uninit(netdev); + netdev_doca_rss_pipe_uninit(netdev); + netdev_doca_egress_pipe_uninit(netdev); +} + +static int +netdev_doca_slowpath_esw_init(struct netdev *netdev) +{ + int rv; + +#define ESW_INIT_CMD(func) \ + do { \ + rv = (func)(netdev); \ + if (!rv) { \ + break; \ + } \ + VLOG_ERR("%s: %s failed: %d", netdev_get_name(netdev), \ + #func, rv); \ + return rv; \ + } while (0) + + ESW_INIT_CMD(netdev_doca_egress_pipe_init); + ESW_INIT_CMD(netdev_doca_rss_pipe_init); + ESW_INIT_CMD(netdev_doca_meta_tag0_pipe_init); + ESW_INIT_CMD(netdev_doca_meta_tag0_rule_init); + ESW_INIT_CMD(netdev_doca_pre_miss_pipe_init); + ESW_INIT_CMD(netdev_doca_pre_miss_rules_init); + ESW_INIT_CMD(netdev_doca_root_pipe_init); + + return 0; +} + +static void +netdev_doca_esw_port_uninit(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + uint16_t pid; + + if (!esw) { + return; + } + + for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) { + if (esw->port_queues[pid]) { + for (uint16_t qid = 0; qid < esw->n_rxq; qid++) { + struct rte_ring **pring = &esw->port_queues[pid][qid].ring; + struct dp_packet *pkt; + int deq; + + if (!*pring) { + continue; + } + + while (1) { + deq = rte_ring_dequeue(*pring, (void **) &pkt); + if (deq) { + break; + } + dp_packet_delete(pkt); + } + rte_ring_free(*pring); + 
*pring = NULL; + } + + rte_free(esw->port_queues[pid]); + esw->port_queues[pid] = NULL; + } + } + + netdev_doca_slowpath_esw_uninit(netdev); +} + +static int +netdev_doca_esw_init(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + uint16_t pid; + int rv; + + esw->esw_port = dev->port; + esw->esw_netdev = netdev; + esw->port_id = common->port_id; + esw->n_rxq = netdev->n_rxq; + + rv = netdev_doca_slowpath_esw_init(netdev); + if (rv) { + return rv; + } + + for (pid = 0; pid < RTE_MAX_ETHPORTS; pid++) { + uint16_t qid; + + esw->port_queues[pid] = + rte_calloc_socket("port_queues", esw->n_rxq, + sizeof(struct netdev_doca_port_queue), + RTE_CACHE_LINE_SIZE, + common->socket_id); + if (!esw->port_queues[pid]) { + VLOG_ERR("%s: port_queues alloc failed for pid=%d", + netdev_get_name(netdev), pid); + rv = ENOMEM; + goto err; + } + + for (qid = 0; qid < esw->n_rxq; qid++) { + char *ring_name; + + ring_name = xasprintf("%s-%d-%d", netdev_get_name(netdev), pid, + qid); + if (!ring_name) { + VLOG_ERR("%s: ring_name alloc failed for pid=%d qid=%d", + netdev_get_name(netdev), pid, qid); + rv = ENOMEM; + goto err; + } + + if (strlen(ring_name) >= RTE_RING_NAMESIZE) { + VLOG_ERR("%s: ring_name too long for pid=%d qid=%d", + netdev_get_name(netdev), pid, qid); + free(ring_name); + rv = ENAMETOOLONG; + goto err; + } + + esw->port_queues[pid][qid].ring = + rte_ring_create(ring_name, NETDEV_MAX_BURST * 2, + common->socket_id, + RING_F_SC_DEQ | RING_F_SP_ENQ); + free(ring_name); + if (!esw->port_queues[pid][qid].ring) { + VLOG_ERR("%s: ring creation failed for pid=%d qid=%d", + netdev_get_name(netdev), pid, qid); + rv = ENOMEM; + goto err; + } + + atomic_init(&esw->port_queues[pid][qid].n_packets, 0); + atomic_init(&esw->port_queues[pid][qid].n_bytes, 0); + } + } + + return 0; +err: + netdev_doca_esw_port_uninit(netdev); + return rv; +} + +static int 
+get_sys(const char *prefix, const char *devname, const char *suffix, + char *outp, size_t maxlen) +{ + char str[PATH_MAX]; + size_t len; + FILE *fp; + char *p; + int n; + + n = snprintf(str, sizeof str, "/sys/%s/%s/%s", prefix, devname, suffix); + if (!(n >= 0 && n < sizeof str)) { + VLOG_DBG("%s: snprintf overflow for %s/%s/%s", OVS_SOURCE_LOCATOR, + prefix, devname, suffix); + return ENOSPC; + } + + fp = fopen(str, "r"); + if (!fp) { + VLOG_DBG("%s: fopen failed for %s", OVS_SOURCE_LOCATOR, str); + return errno; + } + + p = fgets(str, sizeof str, fp); + fclose(fp); + + if (!p) { + VLOG_DBG("%s: fgets failed for %s", OVS_SOURCE_LOCATOR, str); + return EIO; + } + + /* The string is terminated by \n. Drop it. */ + if (outp) { + len = strnlen(str, maxlen); + if (maxlen <= len) { + VLOG_DBG("%s: maxlen exceeded for %s/%s/%s", OVS_SOURCE_LOCATOR, + prefix, devname, suffix); + return ERANGE; + } + ovs_strlcpy(outp, str, len); + } + + return 0; +} + +static int +get_phys_port_name(const char *devname, char *outp, size_t maxlen) +{ + return get_sys("class/net", devname, "phys_port_name", outp, maxlen); +} + +static int +get_bonding_slaves(const char *devname, char *outp, size_t maxlen) +{ + return get_sys("class/net", devname, "bonding/slaves", outp, maxlen); +} + +static doca_error_t +dev_get_rep(const char *name, struct doca_devinfo *devinfo, bool *found) +{ + char dev_name[DOCA_DEVINFO_IFACE_NAME_SIZE]; + struct doca_devinfo_rep **dev_list_rep; + struct doca_dev *ddev; + uint32_t nb_devs_rep; + doca_error_t ret; + int i; + + ret = doca_dev_open(devinfo, &ddev); + if (ret != DOCA_SUCCESS) { + VLOG_ERR("%s: Failed to open device. Error: %d (%s)", name, ret, + doca_error_get_descr(ret)); + return ret; + } + + ret = doca_devinfo_rep_create_list(ddev, DOCA_DEVINFO_REP_FILTER_NET, + &dev_list_rep, &nb_devs_rep); + if (ret != DOCA_SUCCESS) { + VLOG_ERR("%s: Failed to create a rep list. 
Error: %d (%s)", name, ret, + doca_error_get_descr(ret)); + goto err_list; + } + + for (i = 0; i < nb_devs_rep; i++) { + ret = doca_devinfo_rep_get_iface_name(dev_list_rep[i], dev_name, + sizeof dev_name); + if (ret != DOCA_SUCCESS) { + VLOG_ERR("%s: Failed to get rep iface name. Error: %d (%s)", name, + ret, doca_error_get_descr(ret)); + goto out; + } + + if (!strcmp(name, dev_name)) { + *found = true; + break; + } + } + +out: + ret = doca_devinfo_rep_destroy_list(dev_list_rep); + if (ret != DOCA_SUCCESS) { + VLOG_ERR("%s: Failed to destroy rep list. Error: %d (%s)", name, ret, + doca_error_get_descr(ret)); + } + +err_list: + ret = doca_dev_close(ddev); + if (ret != DOCA_SUCCESS) { + VLOG_ERR("%s: Failed to close dev. Error: %d (%s)", name, ret, + doca_error_get_descr(ret)); + } + + return ret; +} + +static int +get_pci(const char *name, char *pci, size_t maxlen, bool *is_rep) +{ + struct doca_devinfo **dev_list; + bool found = false; + uint32_t nb_devs; + doca_error_t ret; + int i; + + if (maxlen <= PCI_PRI_STR_SIZE) { + return DOCA_ERROR_INVALID_VALUE; + } + + ret = doca_devinfo_create_list(&dev_list, &nb_devs); + if (ret != DOCA_SUCCESS) { + VLOG_ERR("%s: Failed to create a dev list. Error: %d (%s)", name, ret, + doca_error_get_descr(ret)); + return ret; + } + + /* Traverse the list of devices. + * 1. If the device is not an ESW, continue. + * 2. If the device name is what we look for, done. + * 3. If not, try to find in the representors of this ESW. + */ + for (i = 0; i < nb_devs; i++) { + char dev_name[DOCA_DEVINFO_IFACE_NAME_SIZE]; + uint8_t net_supported; + + /* If not an ESW, continue. */ + ret = doca_devinfo_rep_cap_is_filter_net_supported( + dev_list[i], &net_supported); + if (ret != DOCA_SUCCESS) { + VLOG_ERR("%s: Failed to check rep_cap. 
Error: %d (%s)", name, ret, + doca_error_get_descr(ret)); + goto out; + } + + if (!net_supported) { + continue; + } + + ret = doca_devinfo_get_pci_addr_str(dev_list[i], pci); + if (ret != DOCA_SUCCESS) { + VLOG_ERR("%s: Failed to get pci. Error: %d (%s)", name, ret, + doca_error_get_descr(ret)); + goto out; + } + + ret = doca_devinfo_get_iface_name(dev_list[i], dev_name, + sizeof dev_name); + if (ret != DOCA_SUCCESS) { + VLOG_ERR("%s: Failed to get iface name. Error: %d (%s)", name, ret, + doca_error_get_descr(ret)); + goto out; + } + + if (!strcmp(name, dev_name)) { + found = true; + *is_rep = false; + break; + } + + /* Search in its representor devices. */ + ret = dev_get_rep(name, dev_list[i], &found); + if (ret != DOCA_SUCCESS) { + goto out; + } + + if (found) { + *is_rep = true; + break; + } + } + + if (!found) { + ret = DOCA_ERROR_NOT_FOUND; + VLOG_ERR("%s: Not found. Error: %d (%s)", name, ret, + doca_error_get_descr(ret)); + } + +out: + doca_devinfo_destroy_list(dev_list); + return ret; +} + +static int +get_dpdk_iface_name(const char *name, char iface[IFNAMSIZ]) +{ + char phys_port_name[IFNAMSIZ]; + char slaves[PATH_MAX]; + char *save_ptr; + char *lower; + + /* In case the device is a bond, there is a lower_p0 symbolic link, with + * the format of ../../.../. Extract the lower device. + */ + + if (get_bonding_slaves(name, slaves, sizeof slaves)) { + goto fallback; + } + + lower = strtok_r(slaves, " ", &save_ptr); + while (lower) { + if (!get_phys_port_name(lower, phys_port_name, + sizeof phys_port_name) && + !strcmp(phys_port_name, "p0")) { + break; + } + lower = strtok_r(NULL, " ", &save_ptr); + } + + if (!lower) { + goto fallback; + } + + /* Reached here if found a lower device p0. 
*/ + ovs_strlcpy(iface, lower, IFNAMSIZ); + goto out; + +fallback: + ovs_strlcpy(iface, name, IFNAMSIZ); +out: + return 0; +} + +struct netdev_doca * +netdev_doca_cast(const struct netdev *netdev) +{ + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); + + return CONTAINER_OF(common, struct netdev_doca, common); +} + +/* Allocates an area of 'sz' bytes from DPDK. The memory is zero'ed. + * + * Unlike xmalloc(), this function can return NULL on failure. */ +static void * +doca_rte_mzalloc(const char *type, size_t sz) +{ + return rte_zmalloc(type, sz, CACHE_LINE_SIZE); +} + +static struct netdev * +netdev_doca_alloc(void) +{ + struct netdev_doca *dev; + + dev = doca_rte_mzalloc("ovs_doca_netdev", sizeof *dev); + if (!dev) { + return NULL; + } + + /* Upon the first port disable dpdk steering to allow doca to work. */ + if (!atomic_count_inc(&n_doca_ports)) { + rte_pmd_mlx5_disable_steering(); + } + + return &dev->common.up; +} + +static void +netdev_doca_dealloc(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + + /* Upon the last doca port going down, enable back dpdk steering. 
*/ + if (atomic_count_dec(&n_doca_ports) == 1) { + rte_pmd_mlx5_enable_steering(); + } + + rte_free(dev); +} + +static int +netdev_doca_set_mtu(struct netdev *netdev, int mtu) +{ + struct netdev_dpdk_common *common = netdev_dpdk_common_cast(netdev); + + ovs_mutex_lock(&common->mutex); + if (common->requested_mtu != mtu) { + if (!netdev_doca_is_esw_mgr(netdev)) { + VLOG_WARN("%s: setting requested MTU %d is ignored for " + "representor", netdev_get_name(netdev), mtu); + goto out; + } + + common->requested_mtu = mtu; + netdev_request_reconfigure(netdev); + } +out: + ovs_mutex_unlock(&common->mutex); + + return 0; +} + + +static int +netdev_doca_dev_open_pci(struct rte_pci_addr *rte_pci, struct doca_dev **pdev) +{ + struct doca_devinfo **dev_list; + char pci[PCI_PRI_STR_SIZE]; + uint8_t is_esw_manager = 0; + uint8_t is_addr_equal = 0; + uint32_t nb_devs; + size_t i; + int res; + + /* Set default return value. */ + *pdev = NULL; + + res = doca_devinfo_create_list(&dev_list, &nb_devs); + if (res != DOCA_SUCCESS) { + VLOG_ERR("Failed to load doca devices list. Error: %d (%s)", + res, doca_error_get_descr(res)); + return res; + } + + rte_pci_device_name(rte_pci, pci, sizeof pci); + /* Search. */ + for (i = 0; i < nb_devs; i++) { + res = doca_devinfo_is_equal_pci_addr(dev_list[i], pci, &is_addr_equal); + if (res != DOCA_SUCCESS || !is_addr_equal) { + continue; + } + + res = doca_dpdk_cap_is_rep_port_supported(dev_list[i], + &is_esw_manager); + if (res != DOCA_SUCCESS || !is_esw_manager) { + continue; + } + + VLOG_DBG("Opening '%s'", pci); + res = doca_dev_open(dev_list[i], pdev); + if (res != DOCA_SUCCESS) { + VLOG_ERR("Failed to open DOCA device. 
Error: %d (%s)", + res, doca_error_get_descr(res)); + } + + goto out; + } + + VLOG_WARN("No matching doca device found"); + res = DOCA_ERROR_NOT_FOUND; + +out: + doca_devinfo_destroy_list(dev_list); + return res; +} + +static int +netdev_doca_esw_ctx_init(void *ctx_, void *arg_) +{ + struct netdev_doca_esw_ctx_arg *arg = arg_; + struct netdev_doca_esw_ctx *ctx = ctx_; + + if (netdev_doca_dev_open_pci(&arg->esw_key->rte_pci, &ctx->dev)) { + return ENODEV; + } + + rte_pci_device_name(&arg->esw_key->rte_pci, ctx->pci_addr, + sizeof ctx->pci_addr); + ctx->cmd_fd = -1; + memset(ctx->offload_queues, 0, sizeof ctx->offload_queues); + + return 0; +} + +static void +netdev_doca_esw_ctx_uninit(void *ctx_) +{ + struct netdev_doca_esw_ctx *ctx = ctx_; + + memset(ctx->pci_addr, 0, sizeof ctx->pci_addr); +} + +static struct ds * +netdev_doca_esw_ctx_dump(struct ds *s, void *key_, void *ctx OVS_UNUSED) +{ + struct netdev_doca_esw_key *key = key_; + char pci_addr[PCI_PRI_STR_SIZE]; + + rte_pci_device_name(&key->rte_pci, pci_addr, sizeof pci_addr); + ds_put_format(s, "pci=%s", pci_addr); + + return s; +} + +static int +netdev_doca_class_init(void) +{ + static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; + static struct netdev_dpdk_watchdog_params watchdog_params = { + .mutex = &doca_mutex, + .list = &doca_list, + }; + + if (!ovsthread_once_start(&once)) { + return 0; + } + + ovs_thread_create("doca_watchdog", netdev_dpdk_watchdog, &watchdog_params); + netdev_doca_esw_rfm = refmap_create("netdev-doca-esw", + sizeof(struct netdev_doca_esw_key), + sizeof(struct netdev_doca_esw_ctx), + netdev_doca_esw_ctx_init, + netdev_doca_esw_ctx_uninit, + netdev_doca_esw_ctx_dump); + + ovsthread_once_done(&once); + return 0; +} + +/* Extract the PCI part from 'devargs' to rte_pci. + * Return -EINVAL for error or 0 for success. 
+ */ +static int +netdev_doca_parse_dpdk_devargs_pci(const char *devargs, + struct rte_pci_addr *rte_pci) +{ + struct rte_devargs da; + int rv = 0; + + if (rte_devargs_parse(&da, devargs)) { + VLOG_ERR("%s: rte_devargs_parse failed for %s", + OVS_SOURCE_LOCATOR, devargs); + return EINVAL; + } + + if (rte_pci_addr_parse(da.name, rte_pci)) { + VLOG_ERR("%s: rte_pci_addr_parse failed for %s", + OVS_SOURCE_LOCATOR, da.name); + rv = EINVAL; + goto out; + } + +out: + rte_devargs_reset(&da); + return rv; +} + +/* Changing the netdev of the ESW requires changes to its representor ports. + * This helper traverses them with a callback to run on each representor. + * For each representor, request a reconfigure of it. + */ + +static void +netdev_doca_do_foreach_representor(struct netdev_doca *esw_dev, + bool (*cb)(struct netdev_doca *)) + OVS_REQUIRES(doca_mutex) +{ + bool need_reconfigure = false; + struct rte_pci_addr esw_pci; + struct rte_pci_addr rep_pci; + struct netdev_doca *dev; + + if (netdev_doca_parse_dpdk_devargs_pci(esw_dev->common.devargs, + &esw_pci)) { + return; + } + + LIST_FOR_EACH (dev, common.list_node, &doca_list) { + if (esw_dev == dev) { + continue; + } + + if (!dev->common.devargs || + netdev_doca_parse_dpdk_devargs_pci(dev->common.devargs, + &rep_pci)) { + continue; + } + + if (rte_pci_addr_cmp(&rep_pci, &esw_pci)) { + continue; + } + + ovs_mutex_lock(&dev->common.mutex); + need_reconfigure |= cb(dev); + ovs_mutex_unlock(&dev->common.mutex); + netdev_request_reconfigure(&dev->common.up); + } + + if (need_reconfigure) { + /* If a representor is reconfigured as a result of its ESW manager + * change, it might not be synced in the bridge's database. Signal it + * to reconfigure, to make it right. 
+ */ + rtnetlink_report_link(); + } +} + +static void +netdev_doca_dev_close(struct netdev_doca *dev) +{ + struct netdev_dpdk_common *common = &dev->common; + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + struct rte_eth_dev_info dev_info; + char *pci_addr; + bool last; + int err; + + memset(&dev_info, 0, sizeof dev_info); + + if (rte_eth_dev_is_valid_port(common->port_id)) { + err = rte_eth_dev_info_get(common->port_id, &dev_info); + if (err) { + VLOG_ERR("Failed to get info of port "DPDK_PORT_ID_FMT": %s", + common->port_id, rte_strerror(-err)); + } + + err = rte_eth_dev_close(common->port_id); + if (err) { + VLOG_ERR("Failed to close port "DPDK_PORT_ID_FMT": %s", + common->port_id, rte_strerror(-err)); + } + } + + if (!esw) { + return; + } + + pci_addr = xstrdup(esw->pci_addr); + if (common->port_id != dev->esw_mgr_port_id && dev->dev_rep) { + VLOG_DBG("%s: Closing doca dev_rep for port_id "DPDK_PORT_ID_FMT + ". %p", netdev_get_name(&common->up), + common->port_id, dev->dev_rep); + err = doca_dev_rep_close(dev->dev_rep); + if (err) { + VLOG_ERR("Failed to close doca dev_rep with port id " + DPDK_PORT_ID_FMT". Error: %d (%s)", + common->port_id, err, doca_error_get_descr(err)); + } + + dev->dev_rep = NULL; + } + + last = refmap_unref(netdev_doca_esw_rfm, esw); + /* The last is the ESW. */ + if (last && esw->dev) { + /* The esw->cmd_fd is closed inside. */ + if (dev_info.device) { + err = rte_dev_remove(dev_info.device); + if (err) { + VLOG_ERR("Failed to remove device %s: %s", common->devargs, + rte_strerror(-err)); + } + } + + VLOG_DBG("Closing '%s'", pci_addr); + err = doca_dev_close(esw->dev); + if (err) { + VLOG_ERR("Failed to close doca dev %s. 
Error: %d (%s)", pci_addr, + err, doca_error_get_descr(err)); + } + + esw->dev = NULL; + esw->cmd_fd = -1; + } + + dev->esw_ctx = NULL; + free(pci_addr); +} + +static bool +netdev_doca_rep_stop(struct netdev_doca *dev) + OVS_REQUIRES(doca_mutex) +{ + struct netdev_dpdk_common *common = &dev->common; + + if (!dpdk_dev_is_started(common)) { + return false; + } + + netdev_doca_port_stop(&common->up); + netdev_doca_dev_close(dev); + common->port_id = DPDK_ETH_PORT_ID_INVALID; + dev->esw_mgr_port_id = DPDK_ETH_PORT_ID_INVALID; + common->attached = false; + + return true; +} + +static int +netdev_doca_port_stop(struct netdev *netdev) + OVS_REQUIRES(doca_mutex) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + bool started = dpdk_dev_is_started(common); + int err = 0; + + if (started) { + if (netdev_doca_get_esw_mgr_port_id(netdev) == common->port_id) { + netdev_doca_do_foreach_representor(dev, netdev_doca_rep_stop); + } + + VLOG_INFO("%s: Stopping '%s', port_id="DPDK_PORT_ID_FMT, + netdev_get_name(netdev), common->devargs, + common->port_id); + atomic_store(&common->started, false); + } + + netdev_doca_rss_entries_uninit(netdev); + netdev_doca_egress_entry_uninit(netdev); + + if (common->port_id == dev->esw_mgr_port_id) { + netdev_doca_esw_port_uninit(netdev); + } + + if (dev->port) { + err = doca_flow_port_stop(dev->port); + dev->port = NULL; + } + + rte_eth_dev_stop(common->port_id); + + return err; +} + +static void +common_destruct(struct netdev_doca *dev) + OVS_REQUIRES(doca_mutex) + OVS_EXCLUDED(dev->common.mutex) +{ + struct netdev_dpdk_common *common = &dev->common; + + rte_free(common->tx_q); + ovs_list_remove(&common->list_node); + free(dev->sw_tx_stats); + free(common->sw_stats); + ovs_mutex_destroy(&common->mutex); +} + +static void +netdev_doca_destruct(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + bool 
is_esw_mgr = netdev_doca_is_esw_mgr(netdev); + + ovs_mutex_lock(&doca_mutex); + + netdev_doca_port_stop(netdev); + + if (common->attached) { + netdev_doca_dev_close(dev); + common->port_id = DPDK_ETH_PORT_ID_INVALID; + + VLOG_INFO("Device '%s' has been removed", common->devargs); + } + + ovs_mutex_lock(&common->mutex); + netdev_dpdk_clear_xstats(common); + ovs_mutex_unlock(&common->mutex); + free(common->devargs); + common_destruct(dev); + if (is_esw_mgr) { + rte_mempool_free(common->mp); + } + + common->mp = NULL; + + ovs_mutex_unlock(&doca_mutex); +} + +static int +netdev_doca_get_sw_custom_stats(const struct netdev *netdev, + struct netdev_custom_stats *custom_stats) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + int i, n; + +#define SW_CSTATS \ + SW_CSTAT(tx_retries) \ + SW_CSTAT(tx_failure_drops) \ + SW_CSTAT(tx_mtu_exceeded_drops) \ + SW_CSTAT(tx_invalid_hwol_drops) + +#define SW_CSTAT(NAME) + 1 + custom_stats->size = SW_CSTATS; +#undef SW_CSTAT + custom_stats->counters = xcalloc(custom_stats->size, + sizeof *custom_stats->counters); + + ovs_mutex_lock(&common->mutex); + + rte_spinlock_lock(&common->stats_lock); + i = 0; +#define SW_CSTAT(NAME) \ + custom_stats->counters[i++].value = common->sw_stats->NAME; + SW_CSTATS; +#undef SW_CSTAT + rte_spinlock_unlock(&common->stats_lock); + + ovs_mutex_unlock(&common->mutex); + + i = 0; + n = 0; +#define SW_CSTAT(NAME) \ + if (custom_stats->counters[i].value != UINT64_MAX) { \ + ovs_strlcpy(custom_stats->counters[n].name, \ + "ovs_"#NAME, NETDEV_CUSTOM_STATS_NAME_SIZE); \ + custom_stats->counters[n].value = custom_stats->counters[i].value; \ + n++; \ + } \ + i++; + SW_CSTATS; +#undef SW_CSTAT + + custom_stats->size = n; + return 0; +} + +static int +netdev_doca_get_custom_stats(const struct netdev *netdev, + struct netdev_custom_stats *custom_stats) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw_ctx = 
dev->esw_ctx; + struct netdev_dpdk_common *common = &dev->common; + dpdk_port_t port_id = common->port_id; + struct doca_flow_resource_query stats; + struct netdev_custom_counter *counter; + uint64_t n_sw_packets, n_sw_bytes; + uint64_t n_packets, n_bytes; + int n_txq = netdev->n_txq; + unsigned int n_rxq; + int sw_stats_size; + enum { + PACKETS, + BYTES, + }; + int err; + int i; + + if (!dpdk_dev_is_started(common)) { + return EAGAIN; + } + + netdev_doca_get_sw_custom_stats(netdev, custom_stats); + + sw_stats_size = custom_stats->size; + n_rxq = dev->esw_ctx->n_rxq; + custom_stats->size += 2 * (NETDEV_DOCA_RSS_NUM_ENTRIES + n_rxq + n_txq + + 1); + + custom_stats->counters = xrealloc(custom_stats->counters, + custom_stats->size * + sizeof *custom_stats->counters); + counter = &custom_stats->counters[sw_stats_size]; + + for (i = 0; i < NETDEV_DOCA_RSS_NUM_ENTRIES; i++, counter += 2) { + const char *stats_name = netdev_doca_stats_name(i); + + err = doca_flow_resource_query_entry(dev->rss_entries[i], &stats); + if (err) { + VLOG_ERR("%s: Failed to query '%s' RSS entry. 
Error: %d (%s)", + common->devargs, stats_name, err, + doca_error_get_descr(err)); + return err; + } + + counter[PACKETS].value = stats.counter.total_pkts; + snprintf(counter[PACKETS].name, NETDEV_CUSTOM_STATS_NAME_SIZE, + "%s_packets", stats_name); + counter[BYTES].value = stats.counter.total_bytes; + snprintf(counter[BYTES].name, NETDEV_CUSTOM_STATS_NAME_SIZE, + "%s_bytes", stats_name); + } + + n_sw_packets = 0; + n_sw_bytes = 0; + + for (i = 0; i < n_rxq; i++, counter += 2) { + atomic_read_relaxed(&esw_ctx->port_queues[port_id][i].n_packets, + &n_packets); + atomic_read_relaxed(&esw_ctx->port_queues[port_id][i].n_bytes, + &n_bytes); + + n_sw_packets += n_packets; + n_sw_bytes += n_bytes; + + counter[PACKETS].value = n_packets; + snprintf(counter[PACKETS].name, NETDEV_CUSTOM_STATS_NAME_SIZE, + "rx_q%d_packets", i); + counter[BYTES].value = n_bytes; + snprintf(counter[BYTES].name, NETDEV_CUSTOM_STATS_NAME_SIZE, + "rx_q%d_bytes", i); + } + + counter[PACKETS].value = n_sw_packets; + snprintf(counter[PACKETS].name, NETDEV_CUSTOM_STATS_NAME_SIZE, + "sw_rx_packets"); + counter[BYTES].value = n_sw_bytes; + snprintf(counter[BYTES].name, NETDEV_CUSTOM_STATS_NAME_SIZE, + "sw_rx_bytes"); + counter += 2; + + for (i = 0; i < n_txq; i++, counter += 2) { + atomic_read_relaxed(&dev->sw_tx_stats[i].n_packets, &n_packets); + atomic_read_relaxed(&dev->sw_tx_stats[i].n_bytes, &n_bytes); + + counter[PACKETS].value = n_packets; + snprintf(counter[PACKETS].name, NETDEV_CUSTOM_STATS_NAME_SIZE, + "tx_q%d_packets", i); + counter[BYTES].value = n_bytes; + snprintf(counter[BYTES].name, NETDEV_CUSTOM_STATS_NAME_SIZE, + "tx_q%d_bytes", i); + } + + return 0; +} + +static int +netdev_doca_get_status(const struct netdev *netdev, struct smap *args) +{ + return netdev_dpdk_get_status__(netdev, &doca_mutex, args); +} + +/* Mempools are allocated for ESW managers only. 
+ * Estimation of number of mbufs required for this port: + * ( + * + + * + + * + ) + */ +static uint32_t +doca_calculate_mbufs(struct netdev_doca *dev) +{ + struct netdev_dpdk_common *common = &dev->common; + uint32_t n_mbufs; + + n_mbufs = common->requested_n_rxq * common->requested_rxq_size + + common->requested_n_txq * common->requested_txq_size + + MIN(RTE_MAX_LCORE, common->requested_n_rxq) * NETDEV_MAX_BURST + + MIN(RTE_MAX_LCORE, 1 + common->requested_n_rxq) * MP_CACHE_SZ; + + return n_mbufs; +} + +static int +netdev_doca_mempool_configure(struct netdev_doca *dev) + OVS_REQUIRES(dev->common.mutex) +{ + struct netdev_dpdk_common *common = &dev->common; + uint32_t buf_size = netdev_dpdk_buf_size(common->requested_mtu); + const char *netdev_name = netdev_get_name(&common->up); + int socket_id = common->requested_socket_id; + struct rte_mempool *mp; + uint32_t mbuf_size; + uint32_t n_mbufs; + int mtu; + + + if (!netdev_doca_is_esw_mgr(&common->up)) { + struct netdev_doca *esw_dev; + + esw_dev = netdev_doca_cast(dev->esw_ctx->esw_netdev); + + common->mp = esw_dev->common.mp; + common->mtu = esw_dev->common.mtu; + common->requested_mtu = common->mtu; + common->max_packet_len = esw_dev->common.max_packet_len; + return 0; + } + + mtu = FRAME_LEN_TO_MTU(buf_size); + mbuf_size = MTU_TO_FRAME_LEN(mtu); + n_mbufs = doca_calculate_mbufs(dev); + + mp = netdev_dpdk_mp_create_pool(netdev_name, n_mbufs, mbuf_size, + socket_id); + if (!mp) { + VLOG_ERR("%s: Failed to create mempool", netdev_name); + return ENOMEM; + } + + VLOG_DBG("%s: Allocated a mempool of %u mbufs of size %u " + "on socket %d with %d Rx and %d Tx queues, " + "cache line size of %u", + netdev_name, n_mbufs, mbuf_size, socket_id, + common->requested_n_rxq, common->requested_n_txq, + RTE_CACHE_LINE_SIZE); + common->mp = mp; + common->mtu = common->requested_mtu; + common->socket_id = common->requested_socket_id; + common->max_packet_len = MTU_TO_FRAME_LEN(common->mtu); + + return 0; +} + +static int 
+dpdk_eth_dev_port_config_complete(struct netdev_doca *dev, + int n_rxq, int n_txq) +{ + struct netdev_dpdk_common *common = &dev->common; + uint16_t conf_mtu; + int diag; + + free(dev->sw_tx_stats); + dev->sw_tx_stats = xcalloc(n_txq, sizeof *dev->sw_tx_stats); + for (int i = 0; i < n_txq; i++) { + atomic_init(&dev->sw_tx_stats[i].n_packets, 0); + atomic_init(&dev->sw_tx_stats[i].n_bytes, 0); + } + + common->up.n_rxq = n_rxq; + common->up.n_txq = n_txq; + + diag = rte_eth_dev_set_mtu(common->port_id, common->mtu); + if (diag) { + /* A device may not support rte_eth_dev_set_mtu, in this case + * flag a warning to the user and include the devices configured + * MTU value that will be used instead. */ + if (-ENOTSUP == diag) { + rte_eth_dev_get_mtu(common->port_id, &conf_mtu); + VLOG_WARN("Interface %s does not support MTU configuration, " + "max packet size supported is %"PRIu16".", + common->up.name, conf_mtu); + } else { + VLOG_ERR("Interface %s MTU (%d) setup error: %s", + common->up.name, common->mtu, rte_strerror(-diag)); + } + } + + return diag; +} + +static int +dpdk_eth_dev_port_config(struct netdev_doca *dev, + const struct rte_eth_dev_info *info, + int n_rxq, int n_txq) +{ + struct netdev_dpdk_common *common = &dev->common; + struct rte_eth_conf conf = port_conf; + int diag = 0; + int i; + + netdev_dpdk_build_port_conf(common, info, &conf); + + if (!netdev_doca_is_esw_mgr(&common->up)) { + rte_eth_dev_configure(common->port_id, 0, 0, &conf); + return dpdk_eth_dev_port_config_complete(dev, n_rxq, n_txq); + } + + /* A device may report more queues than it makes available (this has + * been observed for Intel xl710, which reserves some of them for + * SRIOV): rte_eth_*_queue_setup will fail if a queue is not + * available. When this happens we can retry the configuration + * and request less queues. 
*/ + while (n_rxq && n_txq) { + if (diag) { + VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq); + } + + diag = rte_eth_dev_configure(common->port_id, n_rxq, + n_txq, &conf); + if (diag) { + VLOG_WARN("Interface %s eth_dev setup error %s\n", + common->up.name, rte_strerror(-diag)); + break; + } + + for (i = 0; i < n_txq; i++) { + diag = rte_eth_tx_queue_setup(common->port_id, i, common->txq_size, + common->socket_id, NULL); + if (diag) { + VLOG_INFO("Interface %s unable to setup txq(%d): %s", + common->up.name, i, rte_strerror(-diag)); + break; + } + } + + if (i != n_txq) { + /* Retry with less tx queues. */ + n_txq = i; + continue; + } + + for (i = 0; i < n_rxq; i++) { + diag = rte_eth_rx_queue_setup(common->port_id, i, common->rxq_size, + common->socket_id, NULL, + common->mp); + if (diag) { + VLOG_INFO("Interface %s unable to setup rxq(%d): %s", + common->up.name, i, rte_strerror(-diag)); + break; + } + } + + if (i != n_rxq) { + /* Retry with less rx queues. */ + n_rxq = i; + continue; + } + + return dpdk_eth_dev_port_config_complete(dev, n_rxq, n_txq); + } + + return diag; +} + +static int +netdev_doca_esw_key_parse(const char *devargs, + struct netdev_doca_esw_key *esw_key) +{ + struct rte_pci_addr *rte_pci = &esw_key->rte_pci; + + memset(esw_key, 0, sizeof *esw_key); + return netdev_doca_parse_dpdk_devargs_pci(devargs, rte_pci); +} + +static int +netdev_doca_dev_probe(struct netdev_doca *dev, const char *devargs) +{ + struct ds rte_devargs = DS_EMPTY_INITIALIZER; + struct netdev_doca_esw_ctx_arg ctx_arg; + struct netdev_doca_esw_key esw_key; + struct ibv_pd *pd; + int rv = 0; + + ovs_assert(!dev->esw_ctx); + + if (netdev_doca_esw_key_parse(devargs, &esw_key)) { + VLOG_ERR("%s: esw_key_parse failed for %s", + OVS_SOURCE_LOCATOR, devargs); + return EINVAL; + } + + ctx_arg = (struct netdev_doca_esw_ctx_arg) { + .esw_key = &esw_key, + .dev = dev, + }; + + dev->esw_ctx = refmap_ref(netdev_doca_esw_rfm, &esw_key, &ctx_arg); + if (!dev->esw_ctx) { + 
VLOG_ERR("Could not get esw context for %s", devargs); + return EINVAL; + } + + if (doca_rdma_bridge_get_dev_pd(dev->esw_ctx->dev, &pd)) { + VLOG_ERR("Could not get pd for %s", devargs); + rv = EINVAL; + goto out; + } + + if (dev->esw_ctx->cmd_fd == -1) { + dev->esw_ctx->cmd_fd = dup(pd->context->cmd_fd); + if (dev->esw_ctx->cmd_fd == -1) { + VLOG_ERR("Could not dup fd for %s. Error %s", devargs, + ovs_strerror(errno)); + rv = EBADF; + goto out; + } + } + + ds_put_format(&rte_devargs, "%s,cmd_fd=%d,pd_handle=%u", devargs, + dev->esw_ctx->cmd_fd, pd->handle); + + VLOG_DBG("Probing '%s'", ds_cstr(&rte_devargs)); + if (rte_dev_probe(ds_cstr(&rte_devargs))) { + VLOG_ERR("%s: rte_dev_probe failed for %s", OVS_SOURCE_LOCATOR, + ds_cstr(&rte_devargs)); + close(dev->esw_ctx->cmd_fd); + dev->esw_ctx->cmd_fd = -1; + rv = ENODEV; + goto out; + } + +out: + ds_destroy(&rte_devargs); + if (rv) { + netdev_doca_dev_close(dev); + } + return rv; +} + +static int +netdev_doca_port_start(struct netdev *netdev) + OVS_REQUIRES(doca_mutex) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + const char *devargs = common->devargs; + dpdk_port_t port_id = common->port_id; + struct doca_flow_port_cfg *port_cfg; + struct netdev_doca_esw_ctx *esw; + int err; + + if (!rte_eth_dev_is_valid_port(dev->esw_mgr_port_id)) { + VLOG_ERR("Cannot start port "DPDK_PORT_ID_FMT" '%s', invalid proxy " + "port", port_id, + devargs); + return DOCA_ERROR_NOT_FOUND; + } + + err = doca_flow_port_cfg_create(&port_cfg); + if (err) { + VLOG_ERR("Failed to create doca flow port_cfg. Error: %d (%s)", + err, doca_error_get_descr(err)); + return err; + } + + esw = dev->esw_ctx; + if (!esw) { + err = DOCA_ERROR_INVALID_VALUE; + goto out; + } + + err = doca_flow_port_cfg_set_port_id(port_cfg, port_id); + if (err) { + VLOG_ERR("%s: Failed to set doca flow port_cfg port_id " + DPDK_PORT_ID_FMT". 
Error: %d (%s)", + netdev_get_name(netdev), port_id, err, + doca_error_get_descr(err)); + goto out; + } + + if (!netdev_doca_is_esw_mgr(netdev)) { + err = doca_dpdk_open_dev_rep_by_port_id(port_id, esw->dev, + &dev->dev_rep); + if (err) { + VLOG_ERR("%s: Failed to open doca dev_rep for port_id " + DPDK_PORT_ID_FMT". Error: %d (%s)", + netdev_get_name(netdev), port_id, err, + doca_error_get_descr(err)); + goto out; + } + + VLOG_DBG("%s: Opening doca dev_rep for port_id "DPDK_PORT_ID_FMT + ". %p", netdev_get_name(netdev), port_id, dev->dev_rep); + + err = doca_flow_port_cfg_set_dev_rep(port_cfg, dev->dev_rep); + if (err) { + VLOG_ERR("%s: Failed to set doca flow port_cfg dev_rep. " + "Error: %d (%s)", netdev_get_name(netdev), err, + doca_error_get_descr(err)); + goto out; + } + } + + err = doca_flow_port_cfg_set_dev(port_cfg, esw->dev); + if (err) { + VLOG_ERR("%s: Failed to set doca flow port_cfg dev. Error: %d (%s)", + netdev_get_name(netdev), err, doca_error_get_descr(err)); + goto out; + } + + VLOG_INFO("%s: Starting '%s', port_id="DPDK_PORT_ID_FMT, + netdev_get_name(netdev), devargs, port_id); + if (common->port_id == dev->esw_mgr_port_id) { + err = doca_flow_port_cfg_set_actions_mem_size( + port_cfg, NETDEV_DOCA_ACTIONS_MEM_SIZE); + if (err) { + VLOG_ERR("Failed set_actions_mem_size for port_id " + DPDK_PORT_ID_FMT". Error: %d (%s)", + common->port_id, err, + doca_error_get_descr(err)); + goto out; + } + + err = rte_eth_dev_start(common->port_id); + if (err) { + VLOG_ERR("Failed to start dpdk port_id "DPDK_PORT_ID_FMT + ". Error: %s", common->port_id, + rte_strerror(-err)); + err = DOCA_ERROR_DRIVER; + goto out; + } + + err = doca_flow_port_cfg_set_nr_resources(port_cfg, + DOCA_FLOW_RESOURCE_COUNTER, + ovs_doca_max_counters()); + if (err) { + VLOG_ERR("Failed set_nr_resources counters for port_id " + DPDK_PORT_ID_FMT". 
Error: %d (%s)", + common->port_id, err, + doca_error_get_descr(err)); + goto out; + } + } + + err = doca_flow_port_start(port_cfg, &dev->port); + if (err) { + VLOG_ERR("Failed to start doca flow port_id "DPDK_PORT_ID_FMT + ". Error: %d (%s)", port_id, err, + doca_error_get_descr(err)); + goto out; + } + + if (common->port_id == dev->esw_mgr_port_id) { + err = netdev_doca_esw_init(netdev); + if (err) { + goto out; + } + } + + err = netdev_doca_egress_entry_init(dev); + if (err) { + goto out; + } + + err = netdev_doca_rss_entries_init(netdev); + if (err) { + goto out; + } + +out: + doca_flow_port_cfg_destroy(port_cfg); + if (err) { + netdev_doca_port_stop(netdev); + } + return err; +} + +static bool +netdev_doca_rep_reconfigure(struct netdev_doca *dev OVS_UNUSED) +{ + return true; +} + +static int +dpdk_eth_dev_init(struct netdev_doca *dev) + OVS_REQUIRES(doca_mutex) + OVS_REQUIRES(dev->common.mutex) +{ + struct netdev_dpdk_common *common = &dev->common; + struct netdev *netdev = &common->up; + struct rte_ether_addr eth_addr; + struct rte_eth_dev_info info; + int n_rxq, n_txq; + int diag; + + diag = rte_eth_dev_info_get(common->port_id, &info); + if (diag < 0) { + VLOG_ERR("Interface %s rte_eth_dev_info_get error: %s", + common->up.name, rte_strerror(-diag)); + return -diag; + } + + common->is_representor = common->devargs && + strstr(common->devargs, "representor="); + + netdev_dpdk_detect_hw_ol_features(common, &info); + + n_rxq = MIN(info.max_rx_queues, common->up.n_rxq); + n_txq = MIN(info.max_tx_queues, common->up.n_txq); + + diag = dpdk_eth_dev_port_config(dev, &info, n_rxq, n_txq); + if (diag) { + VLOG_ERR("Interface %s(rxq:%d txq:%d lsc interrupt mode:%s) " + "configure error: %s", + common->up.name, n_rxq, n_txq, + common->lsc_interrupt_mode ? "true" : "false", + rte_strerror(-diag)); + return -diag; + } + + /* When a representor is probed before its ESW, dpdk implicitly + * probes the latter, thus probe is not called from + * netdev_doca_process_devargs(). 
In this case we call probe at + * netdev_doca_port_start(), and make sure the device is marked as + * "attached". + */ + common->attached = true; + diag = netdev_doca_port_start(netdev); + if (diag) { + VLOG_ERR("Failed to init DOCA port %s port_id "DPDK_PORT_ID_FMT + ". Error: %d (%s)", netdev_get_name(netdev), + common->port_id, diag, doca_error_get_descr(diag)); + return diag; + } + + atomic_store(&common->started, true); + + netdev_dpdk_configure_xstats(common); + + memset(&eth_addr, 0x0, sizeof(eth_addr)); + rte_eth_macaddr_get(common->port_id, &eth_addr); + VLOG_INFO_RL(&rl, "Port %d: "ETH_ADDR_FMT, + common->port_id, ETH_ADDR_BYTES_ARGS(eth_addr.addr_bytes)); + + memcpy(common->hwaddr.ea, eth_addr.addr_bytes, ETH_ADDR_LEN); + if (rte_eth_link_get_nowait(common->port_id, &common->link) < 0) { + memset(&common->link, 0, sizeof common->link); + } + + /* Upon success of esw_mgr port, update the representor's field of it. */ + if (netdev_doca_get_esw_mgr_port_id(netdev) == common->port_id) { + netdev_doca_do_foreach_representor(dev, netdev_doca_rep_reconfigure); + } + + return 0; +} + +static struct doca_tx_queue * +netdev_doca_alloc_txq(unsigned int n_txqs) +{ + struct doca_tx_queue *txqs; + unsigned i; + + txqs = doca_rte_mzalloc("ovs_doca_txq", n_txqs * sizeof *txqs); + if (txqs) { + for (i = 0; i < n_txqs; i++) { + rte_spinlock_init(&txqs[i].tx_lock); + } + } + + return txqs; +} + +static int +netdev_doca_reconfigure(struct netdev *netdev) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + int err = 0; + + /* If an ESW manager is not attached to OVS, a representor cannot be + * configured. 
+ */ + if (!netdev_doca_is_esw_mgr(netdev) && + netdev_doca_get_esw_mgr_port_id(netdev) == + DPDK_ETH_PORT_ID_INVALID) { + return EOPNOTSUPP; + } + + ovs_mutex_lock(&doca_mutex); + ovs_mutex_lock(&dev->common.mutex); + + common->requested_n_rxq = common->user_n_rxq; + + if (netdev->n_txq == common->requested_n_txq + && netdev->n_rxq == common->requested_n_rxq + && common->mtu == common->requested_mtu + && common->lsc_interrupt_mode == common->requested_lsc_interrupt_mode + && common->rxq_size == common->requested_rxq_size + && common->txq_size == common->requested_txq_size + && eth_addr_equals(common->hwaddr, common->requested_hwaddr) + && common->socket_id == common->requested_socket_id + && dpdk_dev_is_started(common)) { + /* Reconfiguration is unnecessary. */ + goto out; + } + + netdev_doca_port_stop(netdev); + + err = netdev_doca_mempool_configure(dev); + if (err) { + goto out; + } + + common->lsc_interrupt_mode = common->requested_lsc_interrupt_mode; + + netdev->n_txq = common->requested_n_txq; + netdev->n_rxq = common->requested_n_rxq; + if (!netdev_doca_is_esw_mgr(netdev)) { + int esw_n_rxq; + + esw_n_rxq = dev->esw_ctx->n_rxq; + if (esw_n_rxq < 0) { + err = -1; + goto out; + } + if (common->requested_n_rxq != esw_n_rxq) { + VLOG_WARN("%s: requested_n_rxq=%d is ignored. 
DOCA binds the " + "number of rx queues to the esw's n_rxq=%d", + netdev_get_name(netdev), common->requested_n_rxq, + esw_n_rxq); + } + netdev->n_rxq = esw_n_rxq; + } + + common->rxq_size = common->requested_rxq_size; + common->txq_size = common->requested_txq_size; + + rte_free(common->tx_q); + common->tx_q = NULL; + + if (!eth_addr_equals(common->hwaddr, common->requested_hwaddr)) { + err = netdev_dpdk_set_etheraddr__(common, + common->requested_hwaddr); + if (err) { + goto out; + } + } + + err = dpdk_eth_dev_init(dev); + if (err) { + goto out; + } + netdev_dpdk_update_netdev_flags(common); + + /* If both requested and actual hw-addr were previously + * unset (initialized to 0), then first device init above + * will have set actual hw-addr to something new. + * This would trigger spurious MAC reconfiguration unless + * the requested MAC is kept in sync. + * + * This is harmless in case requested_hwaddr was + * configured by the user, as netdev_dpdk_set_etheraddr__() + * will have succeeded to get to this point. + */ + common->requested_hwaddr = common->hwaddr; + + common->tx_q = netdev_doca_alloc_txq(netdev->n_txq); + if (!common->tx_q) { + err = ENOMEM; + } + + netdev_change_seq_changed(netdev); + +out: + ovs_mutex_unlock(&dev->common.mutex); + ovs_mutex_unlock(&doca_mutex); + return err; +} + +static int +common_construct(struct netdev *netdev, dpdk_port_t port_no, int socket_id) + OVS_REQUIRES(doca_mutex) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + + ovs_mutex_init(&common->mutex); + + rte_spinlock_init(&common->stats_lock); + + /* If the 'sid' is negative, it means that the kernel fails + * to obtain the pci numa info. In that situation, always + * use 'SOCKET0'. */ + common->socket_id = socket_id < 0 ? 
SOCKET0 : socket_id; + common->requested_socket_id = common->socket_id; + common->port_id = port_no; + dev->esw_mgr_port_id = port_no; + common->flags = 0; + common->requested_mtu = RTE_ETHER_MTU; + common->max_packet_len = MTU_TO_FRAME_LEN(common->mtu); + common->requested_lsc_interrupt_mode = 0; + common->attached = false; + atomic_store(&common->started, false); + + netdev->n_rxq = 0; + netdev->n_txq = 0; + common->user_n_rxq = NR_QUEUE; + common->requested_n_rxq = NR_QUEUE; + common->requested_n_txq = NR_QUEUE; + common->requested_rxq_size = NIC_PORT_DEFAULT_RXQ_SIZE; + common->requested_txq_size = NIC_PORT_DEFAULT_TXQ_SIZE; + + /* Initialize the flow control to NULL. */ + memset(&common->fc_conf, 0, sizeof common->fc_conf); + + /* Initialize the hardware offload flags to 0. */ + common->hw_ol_features = 0; + + common->rx_metadata_delivery_configured = false; + + common->flags = NETDEV_UP | NETDEV_PROMISC; + + ovs_list_push_back(&doca_list, &common->list_node); + + netdev_request_reconfigure(netdev); + + common->rte_xstats_names = NULL; + common->rte_xstats_names_size = 0; + + common->rte_xstats_ids = NULL; + common->rte_xstats_ids_size = 0; + + common->sw_stats = xzalloc(sizeof *common->sw_stats); + common->sw_stats->tx_retries = UINT64_MAX; + + return 0; +} + +static int +netdev_doca_construct(struct netdev *netdev) +{ + int err; + + ovs_mutex_lock(&doca_mutex); + err = common_construct(netdev, DPDK_ETH_PORT_ID_INVALID, SOCKET0); + ovs_mutex_unlock(&doca_mutex); + + return err; +} + +static int +netdev_doca_get_config(const struct netdev *netdev, struct smap *args) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + + ovs_mutex_lock(&common->mutex); + netdev_dpdk_get_config_common(common, args); + ovs_mutex_unlock(&common->mutex); + + return 0; +} + +static char * +netdev_doca_generate_devargs(const char *name, char *devargs, size_t maxlen, + char iface[IFNAMSIZ]) +{ + char phys_port_name_[IFNAMSIZ], 
*phys_port_name = phys_port_name_; + char iface_tmp[IFNAMSIZ]; + char device[PATH_MAX]; + char *mlx5_devargs; + char *rep_part; + bool is_rep; + bool is_pf; + char *pci; + int port; + int len; + + if (get_dpdk_iface_name(name, iface_tmp)) { + VLOG_ERR("%s: get_dpdk_iface_name failed for %s", + OVS_SOURCE_LOCATOR, name); + return NULL; + } + + name = iface_tmp; + ovs_strlcpy(iface, name, IFNAMSIZ); + + if (get_pci(name, device, sizeof device, &is_rep)) { + VLOG_ERR("%s: get_pci failed for %s", OVS_SOURCE_LOCATOR, name); + return NULL; + } + + pci = device; + + if (get_phys_port_name(name, phys_port_name_, sizeof phys_port_name_)) { + VLOG_ERR("%s: get_phys_port_name failed for %s", + OVS_SOURCE_LOCATOR, name); + return NULL; + } + + /* In some kernels, there is a controller prefix, like "c1". Ignore it. */ + if (sscanf(phys_port_name, "c%d", &port) == 1) { + phys_port_name += 2; + } + + is_pf = false; + + if (sscanf(phys_port_name, "p%d", &port) == 1) { + is_pf = true; + } else if (sscanf(phys_port_name, "pf%d", &port) != 1) { + VLOG_ERR("%s: unrecognized phys_port_name %s", + OVS_SOURCE_LOCATOR, phys_port_name); + return NULL; + } + + mlx5_devargs = + "dv_xmeta_en=4," + "dv_flow_en=2," + "probe_opt_en=1"; + + len = strlen(phys_port_name); + /* HPF's phys_port_name is pf0/pf1. */ + if (len == 3 && !strncmp(phys_port_name, "pf", 2)) { + /* "" to workaround a false positive checkpatch issue. */ + if (snprintf(devargs, maxlen, "%s,%s,representor=(pf%d)""vf65535", pci, + mlx5_devargs, port) < 0) { + VLOG_ERR("%s: snprintf failed for HPF devargs", + OVS_SOURCE_LOCATOR); + return NULL; + } + + return devargs; + } + + /* PF ports. 
*/ + if (is_pf) { + if (!is_rep) { + len = snprintf(devargs, maxlen, "%s,%s", pci, mlx5_devargs); + } else { + len = snprintf(devargs, maxlen, "%s,%s,representor=pf%d", pci, + mlx5_devargs, port); + } + + if (len < 0) { + VLOG_ERR("%s: snprintf failed for PF devargs", OVS_SOURCE_LOCATOR); + return NULL; + } + + return devargs; + } + + /* Representors. */ + rep_part = strstr(phys_port_name, "vf"); + if (!rep_part) { + rep_part = strstr(phys_port_name, "sf"); + } + + if (!rep_part) { + VLOG_ERR("%s: no vf/sf in phys_port_name %s", + OVS_SOURCE_LOCATOR, phys_port_name); + return NULL; + } + + /* Format as (pfX)vfY or (pfX)sfY. */ + if (snprintf(devargs, maxlen, "%s,%s,representor=(%.*s)%s", pci, + mlx5_devargs, (int) (rep_part - phys_port_name), + phys_port_name, rep_part) < 0) { + VLOG_ERR("%s: snprintf failed for representor devargs", + OVS_SOURCE_LOCATOR); + return NULL; + } + + return devargs; +} + +static dpdk_port_t +netdev_doca_process_devargs(struct netdev_doca *dev, + const char *devargs, char **errp) + OVS_REQUIRES(doca_mutex) +{ + dpdk_port_t new_port_id; + + new_port_id = netdev_dpdk_get_port_by_devargs(devargs); + if (!rte_eth_dev_is_valid_port(new_port_id)) { + int err; + + /* Device not found in DPDK, attempt to attach it. */ + err = netdev_doca_dev_probe(dev, devargs); + if (err) { + new_port_id = DPDK_ETH_PORT_ID_INVALID; + } else { + new_port_id = netdev_dpdk_get_port_by_devargs(devargs); + if (rte_eth_dev_is_valid_port(new_port_id)) { + /* Attach successful. */ + dev->common.attached = true; + VLOG_INFO("Device '%s' attached to DPDK", devargs); + } else { + /* Attach unsuccessful. 
*/ + new_port_id = DPDK_ETH_PORT_ID_INVALID; + } + } + } + + if (new_port_id == DPDK_ETH_PORT_ID_INVALID) { + VLOG_WARN_BUF(errp, "Error attaching device '%s' to DPDK", devargs); + } + + return new_port_id; +} + +static struct netdev_doca * +netdev_doca_lookup_by_port_id(dpdk_port_t port_id) + OVS_REQUIRES(doca_mutex) +{ + struct netdev_dpdk_common *common; + + common = netdev_dpdk_lookup_by_port_id__(port_id, &doca_list); + if (common) { + return CONTAINER_OF(common, struct netdev_doca, common); + } + + return NULL; +} + +static dpdk_port_t +netdev_doca_find_esw_mgr_port_id(dpdk_port_t dev_port_id) + OVS_REQUIRES(doca_mutex) +{ + struct rte_eth_dev_info info; + struct netdev_doca *dev; + uint16_t domain_id; + + if (!rte_eth_dev_is_valid_port(dev_port_id)) { + return DPDK_ETH_PORT_ID_INVALID; + } + + if (rte_eth_dev_info_get(dev_port_id, &info) < 0) { + VLOG_DBG_RL(&rl, "Failed to retrieve device info for port " + DPDK_PORT_ID_FMT, dev_port_id); + return DPDK_ETH_PORT_ID_INVALID; + } + + domain_id = info.switch_info.domain_id; + LIST_FOR_EACH (dev, common.list_node, &doca_list) { + if (!rte_eth_dev_is_valid_port(dev->common.port_id)) { + continue; + } + + if (rte_eth_dev_info_get(dev->common.port_id, &info) < 0) { + VLOG_DBG_RL(&rl, "Failed to retrieve device info for port " + DPDK_PORT_ID_FMT, dev->common.port_id); + continue; + } + + if (info.switch_info.domain_id == domain_id && + !(*info.dev_flags & RTE_ETH_DEV_REPRESENTOR)) { + VLOG_INFO("Found ESW manager port "DPDK_PORT_ID_FMT" for " + "device "DPDK_PORT_ID_FMT, dev->common.port_id, + dev_port_id); + return dev->common.port_id; + } + } + + return DPDK_ETH_PORT_ID_INVALID; +} + +static int +netdev_doca_set_config(struct netdev *netdev, const struct smap *args, + char **errp) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + char generated[PATH_MAX]; + bool lsc_interrupt_mode; + const char *new_devargs; + char iface[IFNAMSIZ]; + const char *dev_name; 
+ const char *vf_mac; + int err = 0; + bool is_rep; + + ovs_mutex_lock(&doca_mutex); + ovs_mutex_lock(&common->mutex); + + memset(iface, 0, sizeof iface); + if (!common->devargs) { + dev_name = netdev_get_name(netdev); + new_devargs = netdev_doca_generate_devargs(dev_name, generated, + sizeof generated, iface); + if (!new_devargs) { + VLOG_WARN("%s: Could not generate DPDK devargs", + netdev_get_name(netdev)); + err = ENODEV; + goto out; + } + + common->devargs = xstrdup(new_devargs); + } + + is_rep = strstr(common->devargs, "representor="); + if (is_rep) { + struct netdev_doca_esw_key esw_key; + struct netdev_doca_esw_ctx *esw; + + if (netdev_doca_esw_key_parse(common->devargs, &esw_key)) { + VLOG_ERR("%s: esw_key_parse failed for %s", + OVS_SOURCE_LOCATOR, common->devargs); + err = EINVAL; + goto out; + } + + esw = refmap_try_ref(netdev_doca_esw_rfm, &esw_key); + if (!esw) { + goto out; + } + + refmap_unref(netdev_doca_esw_rfm, esw); + } + + netdev_dpdk_set_rxq_config(common, args); + + /* Don't process dpdk-devargs if value is unchanged and port id + * is valid. */ + if (!(rte_eth_dev_is_valid_port(common->port_id) && common->attached)) { + dpdk_port_t new_port_id = + netdev_doca_process_devargs(dev, common->devargs, errp); + + if (!rte_eth_dev_is_valid_port(new_port_id)) { + err = EINVAL; + } else if (new_port_id == common->port_id) { + /* Already configured, do not reconfigure again. */ + err = 0; + } else { + struct netdev_doca *dup_dev; + + dup_dev = netdev_doca_lookup_by_port_id(new_port_id); + if (dup_dev) { + VLOG_WARN_BUF(errp, "'%s' is trying to use device '%s' " + "which is already in use by '%s'", + netdev_get_name(netdev), common->devargs, + netdev_get_name(&dup_dev->common.up)); + err = EADDRINUSE; + } else { + int sid = rte_eth_dev_socket_id(new_port_id); + + common->requested_socket_id = sid < 0 ? 
SOCKET0 : sid; + common->port_id = new_port_id; + dev->esw_mgr_port_id = + netdev_doca_find_esw_mgr_port_id(new_port_id); + netdev_request_reconfigure(&common->up); + err = 0; + } + } + } + + if (err) { + goto out; + } + + vf_mac = smap_get(args, "dpdk-vf-mac"); + if (vf_mac) { + struct eth_addr mac; + + if (!common->is_representor) { + VLOG_WARN("'%s' is trying to set the VF MAC '%s' " + "but 'options:dpdk-vf-mac' is only supported for " + "VF representors.", + netdev_get_name(netdev), vf_mac); + } else if (!eth_addr_from_string(vf_mac, &mac)) { + VLOG_WARN("interface '%s': cannot parse VF MAC '%s'.", + netdev_get_name(netdev), vf_mac); + } else if (eth_addr_is_multicast(mac)) { + VLOG_WARN("interface '%s': cannot set VF MAC to multicast " + "address '%s'.", netdev_get_name(netdev), vf_mac); + } else if (!eth_addr_equals(common->requested_hwaddr, mac)) { + common->requested_hwaddr = mac; + netdev_request_reconfigure(netdev); + } + } + + lsc_interrupt_mode = smap_get_bool(args, "dpdk-lsc-interrupt", false); + if (common->requested_lsc_interrupt_mode != lsc_interrupt_mode) { + common->requested_lsc_interrupt_mode = lsc_interrupt_mode; + netdev_request_reconfigure(netdev); + } + +out: + ovs_mutex_unlock(&common->mutex); + ovs_mutex_unlock(&doca_mutex); + + return err; +} + +static void +classify_in_port(struct dp_packet_batch *rx_batch, + struct netdev_doca_port_queue *pq[RTE_MAX_ETHPORTS], + uint16_t queue_id) +{ + struct dp_packet *pkt; + uint64_t old_count; + uint32_t pkt_size; + uint32_t port_id; + int rv; + + DP_PACKET_BATCH_FOR_EACH (i, pkt, rx_batch) { + dp_packet_reset_cutlen(pkt); + pkt->packet_type = htonl(PT_ETH); + pkt->has_hash = !!(pkt->mbuf.ol_flags & RTE_MBUF_F_RX_RSS_HASH); + pkt->has_mark = !!(pkt->mbuf.ol_flags & RTE_MBUF_F_RX_FDIR_ID); + pkt->offloads = + pkt->mbuf.ol_flags & (RTE_MBUF_F_RX_IP_CKSUM_BAD + | RTE_MBUF_F_RX_IP_CKSUM_GOOD + | RTE_MBUF_F_RX_L4_CKSUM_BAD + | RTE_MBUF_F_RX_L4_CKSUM_GOOD); + + if (!dp_packet_has_flow_mark(pkt, &port_id)) 
{ + COVERAGE_INC(netdev_doca_no_mark); + dp_packet_delete(pkt); + continue; + } + + pkt->has_mark = false; + if (!rte_eth_dev_is_valid_port(port_id)) { + COVERAGE_INC(netdev_doca_invalid_classify_port); + dp_packet_delete(pkt); + continue; + } + + pkt_size = dp_packet_size(pkt); + rv = rte_ring_sp_enqueue(pq[port_id][queue_id].ring, pkt); + if (rv) { + COVERAGE_INC(netdev_doca_drop_ring_full); + dp_packet_delete(pkt); + continue; + } + + atomic_add_relaxed(&pq[port_id][queue_id].n_bytes, pkt_size, + &old_count); + } +} + +static int +netdev_doca_rxq_recv(struct netdev_rxq *rxq, struct dp_packet_batch *batch, + int *qfill) +{ + struct netdev_doca *dev = netdev_doca_cast(rxq->netdev); + struct netdev_rxq_dpdk *rx = netdev_rxq_dpdk_cast(rxq); + struct netdev_dpdk_common *common = &dev->common; + struct netdev_doca_port_queue *pq; + struct dp_packet_batch rx_batch; + dpdk_port_t esw_mgr_port_id; + dpdk_port_t port_id; + uint64_t old_count; + int nb_rx; + + if (OVS_UNLIKELY(!(common->flags & NETDEV_UP) || + !dpdk_dev_is_started(common))) { + return EAGAIN; + } + + esw_mgr_port_id = dev->esw_ctx->port_id; + port_id = common->port_id; + + if (port_id == esw_mgr_port_id) { + rx_batch.count = + rte_eth_rx_burst(esw_mgr_port_id, rxq->queue_id, + (struct rte_mbuf **) rx_batch.packets, + NETDEV_MAX_BURST); + if (rx_batch.count == 0) { + return 0; + } + + classify_in_port(&rx_batch, dev->esw_ctx->port_queues, rxq->queue_id); + } + + pq = &dev->esw_ctx->port_queues[port_id][rxq->queue_id]; + batch->count = + rte_ring_sc_dequeue_burst(pq->ring, (void **) batch->packets, + NETDEV_MAX_BURST, NULL); + atomic_add_relaxed(&pq->n_packets, batch->count, &old_count); + + nb_rx = batch->count; + + if (!nb_rx) { + return EAGAIN; + } + + if (qfill) { + if (nb_rx == NETDEV_MAX_BURST) { + *qfill = rte_eth_rx_queue_count(rx->port_id, rxq->queue_id); + } else { + *qfill = 0; + } + } + + return 0; +} + +static size_t +netdev_doca_common_send(struct netdev *netdev, struct dp_packet_batch *batch, 
+ struct netdev_doca_sw_stats *stats) +{ + struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets; + struct netdev_doca *dev = netdev_doca_cast(netdev); + size_t cnt, pkt_cnt = dp_packet_batch_size(batch); + struct netdev_dpdk_common *common = &dev->common; + struct dp_packet *packet; + bool need_copy = false; + + memset(stats, 0, sizeof *stats); + + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { + if (packet->source != DPBUF_DPDK) { + need_copy = true; + break; + } + } + + /* Copy dp-packets to mbufs. */ + if (OVS_UNLIKELY(need_copy)) { + cnt = netdev_dpdk_copy_batch_to_mbuf(common, batch); + stats->tx_failure_drops += pkt_cnt - cnt; + pkt_cnt = cnt; + } + + /* Drop over-sized packets. */ + cnt = netdev_dpdk_filter_packet_len(common, pkts, pkt_cnt); + stats->tx_mtu_exceeded_drops += pkt_cnt - cnt; + pkt_cnt = cnt; + + /* Prepare each mbuf for hardware offloading. */ + cnt = netdev_dpdk_prep_hwol_batch(common, pkts, pkt_cnt); + stats->tx_invalid_hwol_drops += pkt_cnt - cnt; + pkt_cnt = cnt; + + return cnt; +} + +static inline void +packet_set_meta(struct dp_packet *p, uint32_t meta) +{ + *RTE_MBUF_DYNFIELD(&p->mbuf, rte_flow_dynf_metadata_offs, + uint32_t *) = meta; + p->mbuf.ol_flags |= RTE_MBUF_DYNFLAG_TX_METADATA; +} + +static int +netdev_doca_eth_send(struct netdev *netdev, int qid, + struct dp_packet_batch *batch, bool concurrent_txq) +{ + struct rte_mbuf **pkts = (struct rte_mbuf **) batch->packets; + uint32_t port_id_meta = netdev_doca_get_port_id(netdev); + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_dpdk_common *common = &dev->common; + int batch_cnt = dp_packet_batch_size(batch); + struct netdev_doca_sw_stats stats; + struct dp_packet *packet; + uint64_t n_bytes = 0; + uint64_t old_count; + int cnt, dropped; + + if (OVS_UNLIKELY(!(common->flags & NETDEV_UP))) { + rte_spinlock_lock(&common->stats_lock); + common->stats.tx_dropped += dp_packet_batch_size(batch); + rte_spinlock_unlock(&common->stats_lock); + 
dp_packet_delete_batch(batch, true); + return 0; + } + + if (OVS_UNLIKELY(concurrent_txq)) { + qid = qid % common->up.n_txq; + rte_spinlock_lock(&common->tx_q[qid].tx_lock); + } + + cnt = netdev_doca_common_send(netdev, batch, &stats); + + DP_PACKET_BATCH_FOR_EACH (i, packet, batch) { + /* Set metadata for egress pipe rules to match on. */ + packet_set_meta(packet, port_id_meta); + n_bytes += dp_packet_size(packet); + } + + atomic_add_relaxed(&dev->sw_tx_stats[qid].n_packets, batch->count, + &old_count); + atomic_add_relaxed(&dev->sw_tx_stats[qid].n_bytes, n_bytes, &old_count); + + dropped = netdev_dpdk_eth_tx_burst(common, dev->esw_mgr_port_id, + qid, pkts, cnt); + stats.tx_failure_drops += dropped; + dropped += batch_cnt - cnt; + if (OVS_UNLIKELY(dropped)) { + struct netdev_doca_sw_stats *sw_stats = common->sw_stats; + + rte_spinlock_lock(&common->stats_lock); + common->stats.tx_dropped += dropped; + sw_stats->tx_failure_drops += stats.tx_failure_drops; + sw_stats->tx_mtu_exceeded_drops += stats.tx_mtu_exceeded_drops; + sw_stats->tx_invalid_hwol_drops += stats.tx_invalid_hwol_drops; + rte_spinlock_unlock(&common->stats_lock); + } + + if (OVS_UNLIKELY(concurrent_txq)) { + rte_spinlock_unlock(&common->tx_q[qid].tx_lock); + } + + return 0; +} + +#define NETDEV_DOCA_CLASS_COMMON \ + .is_pmd = true, \ + .alloc = netdev_doca_alloc, \ + .dealloc = netdev_doca_dealloc, \ + .get_numa_id = netdev_dpdk_get_numa_id, \ + .set_etheraddr = netdev_dpdk_set_etheraddr, \ + .get_etheraddr = netdev_dpdk_get_etheraddr, \ + .get_mtu = netdev_dpdk_get_mtu, \ + .set_mtu = netdev_doca_set_mtu, \ + .get_ifindex = netdev_dpdk_get_ifindex, \ + .get_carrier_resets = netdev_dpdk_get_carrier_resets, \ + .set_miimon_interval = netdev_dpdk_set_miimon, \ + .update_flags = netdev_dpdk_update_flags, \ + .rxq_alloc = netdev_dpdk_rxq_alloc, \ + .rxq_construct = netdev_dpdk_rxq_construct, \ + .rxq_destruct = netdev_dpdk_rxq_destruct, \ + .rxq_dealloc = netdev_dpdk_rxq_dealloc + +#define 
NETDEV_DOCA_CLASS_BASE \ + NETDEV_DOCA_CLASS_COMMON, \ + .init = netdev_doca_class_init, \ + .destruct = netdev_doca_destruct, \ + .set_tx_multiq = netdev_dpdk_set_tx_multiq, \ + .get_carrier = netdev_dpdk_get_carrier, \ + .get_stats = netdev_dpdk_get_stats, \ + .get_custom_stats = netdev_doca_get_custom_stats, \ + .get_features = netdev_dpdk_get_features, \ + .get_speed = netdev_dpdk_get_speed, \ + .get_status = netdev_doca_get_status, \ + .reconfigure = netdev_doca_reconfigure, \ + .rxq_recv = netdev_doca_rxq_recv + +static const struct netdev_class netdev_doca_class = { + .type = "doca", + NETDEV_DOCA_CLASS_BASE, + .construct = netdev_doca_construct, + .get_config = netdev_doca_get_config, + .set_config = netdev_doca_set_config, + .send = netdev_doca_eth_send, +}; + +void +netdev_doca_register(void) +{ + netdev_register_provider(&netdev_doca_class); +} diff --git a/lib/netdev-doca.h b/lib/netdev-doca.h new file mode 100644 index 000000000..b774318fa --- /dev/null +++ b/lib/netdev-doca.h @@ -0,0 +1,159 @@ +/* + * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. + * All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +#ifndef NETDEV_DOCA_H +#define NETDEV_DOCA_H + +#include + +#include +#include +#include + +#include + +#include "netdev-provider.h" +#include "ovs-doca.h" +#include "util.h" + +#include "openvswitch/list.h" + +struct doca_tx_queue; +struct netdev_doca_sw_stats; +extern struct ovs_mutex doca_mutex; +#define NETDEV_DPDK_TX_Q_TYPE struct doca_tx_queue +#define NETDEV_DPDK_SW_STATS_TYPE struct netdev_doca_sw_stats +#define NETDEV_DPDK_GLOBAL_MUTEX doca_mutex +#include "netdev-dpdk-private.h" + +struct rte_ring; + +enum netdev_doca_rss_type { + NETDEV_DOCA_RSS_IPV4_TCP, + NETDEV_DOCA_RSS_IPV4_UDP, + NETDEV_DOCA_RSS_IPV4_ICMP, + NETDEV_DOCA_RSS_IPV4_ESP, + NETDEV_DOCA_RSS_IPV4_OTHER, + NETDEV_DOCA_RSS_IPV6_TCP, + NETDEV_DOCA_RSS_IPV6_UDP, + NETDEV_DOCA_RSS_IPV6_ICMP, + NETDEV_DOCA_RSS_IPV6_ESP, + NETDEV_DOCA_RSS_IPV6_OTHER, + NETDEV_DOCA_RSS_OTHER, +}; +/* Must be the last enum type. */ +#define NETDEV_DOCA_RSS_NUM_ENTRIES (NETDEV_DOCA_RSS_OTHER + 1) + +/* Custom software stats for dpdk ports */ +struct netdev_doca_sw_stats { + /* No. of retries when unable to transmit. */ + uint64_t tx_retries; + /* Packet drops when unable to transmit; Probably Tx queue is full. */ + uint64_t tx_failure_drops; + /* Packet length greater than device MTU. */ + uint64_t tx_mtu_exceeded_drops; + /* Packet drops in HWOL processing. 
*/ + uint64_t tx_invalid_hwol_drops; +}; + +struct netdev_doca_tx_stats { + PADDED_MEMBERS(CACHE_LINE_SIZE, + atomic_uint64_t n_packets; + atomic_uint64_t n_bytes; + ); +}; + +enum netdev_doca_port_dir { + NETDEV_DOCA_PORT_DIR_RX, + NETDEV_DOCA_PORT_DIR_TX, + NUM_NETDEV_DOCA_PORT_DIR, +}; + +enum pre_miss_types { + SEND_TO_KERNEL_LACP, + SEND_TO_KERNEL_LLDP, + NUM_SEND_TO_KERNEL, +}; + +struct netdev_doca_port_queue { + PADDED_MEMBERS(CACHE_LINE_SIZE, + struct rte_ring *ring; + atomic_uint64_t n_packets; + atomic_uint64_t n_bytes; + ); +}; + +struct netdev_doca_esw_ctx { + struct netdev_doca_port_queue *port_queues[RTE_MAX_ETHPORTS]; + dpdk_port_t port_id; + struct ovs_doca_offload_queue + offload_queues[OVS_DOCA_MAX_OFFLOAD_QUEUES]; + struct doca_flow_port *esw_port; + struct netdev *esw_netdev; + /* miss-path */ + struct { + struct doca_flow_pipe *egress_pipe; + struct doca_flow_pipe *rss_pipe; + struct doca_flow_pipe *meta_tag0_pipe; + struct doca_flow_pipe_entry *meta_tag0_entry; + struct doca_flow_pipe *pre_miss_pipe; + struct doca_flow_pipe_entry *pre_miss_entries[NUM_SEND_TO_KERNEL]; + struct doca_flow_pipe *root_pipe; + }; + unsigned int n_rxq; + char pci_addr[PCI_PRI_STR_SIZE]; + struct doca_dev *dev; + uint32_t op_state; + int cmd_fd; +}; + +/* There should be one 'struct doca_tx_queue' created for + * each netdev tx queue. */ +struct doca_tx_queue { + /* Padding to make doca_tx_queue exactly one cache line long. */ + PADDED_MEMBERS(CACHE_LINE_SIZE, + /* Protects the members and the NIC queue from concurrent access. + * It is used only if the queue is shared among different pmd threads + * (see 'concurrent_txq'). */ + rte_spinlock_t tx_lock; + ); +}; + +struct netdev_doca { + struct netdev_dpdk_common common; /* Must be first (offset 0). 
*/ + + dpdk_port_t esw_mgr_port_id; + struct netdev_doca_tx_stats *sw_tx_stats; + + PADDED_MEMBERS_CACHELINE_MARKER(CACHE_LINE_SIZE, cacheline10, + struct doca_flow_port *port; + struct netdev_doca_esw_ctx *esw_ctx; + struct doca_flow_pipe_entry *rss_entries[NETDEV_DOCA_RSS_NUM_ENTRIES]; + struct doca_flow_pipe_entry *egress_entry; + char *peer_name; + enum netdev_doca_port_dir port_dir; + struct doca_dev_rep *dev_rep; + ); +}; + +void netdev_doca_register(void); + +struct netdev_doca * +netdev_doca_cast(const struct netdev *netdev); + +#endif /* NETDEV_DOCA_H */ diff --git a/lib/ovs-doca.c b/lib/ovs-doca.c index eae361a21..f78d0efdd 100644 --- a/lib/ovs-doca.c +++ b/lib/ovs-doca.c @@ -24,13 +24,34 @@ #ifdef DOCA_NETDEV +#include +#include +#include +#include + #include +#include +#include #include +#include +#include +#include #include +#include "coverage.h" +#include "dpdk.h" +#include "netdev.h" +#include "netdev-doca.h" +#include "smap.h" +#include "unixctl.h" +#include "util.h" + +#include "openvswitch/list.h" +#include "openvswitch/vlog.h" + /* DOCA disables dpdk steering as a constructor in higher priority. - * Set a lower priority one to enable it back. Disable it only upon using + * Set a lower priority one to enable it back. Disable it only upon using * doca ports. 
 */ RTE_INIT(dpdk_steering_enable) @@ -38,9 +59,714 @@ RTE_INIT(dpdk_steering_enable) rte_pmd_mlx5_enable_steering(); } +VLOG_DEFINE_THIS_MODULE(ovs_doca); +static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(600, 600); + +#define OVS_DOCA_SLOWPATH_COUNTERS \ + ((NETDEV_DOCA_RSS_NUM_ENTRIES + 1) * RTE_MAX_ETHPORTS) + +COVERAGE_DEFINE(ovs_doca_queue_block); +COVERAGE_DEFINE(ovs_doca_queue_empty); +COVERAGE_DEFINE(ovs_doca_queue_none_processed); + +static atomic_bool doca_initialized = false; +static unsigned int ovs_doca_max_megaflows_counters; +static FILE *log_stream = NULL; /* Stream for DOCA log redirection */ +static struct doca_log_backend *ovs_doca_log = NULL; + +static const char * const levels[] = { + [DOCA_LOG_LEVEL_CRIT] = "CRT", + [DOCA_LOG_LEVEL_ERROR] = "ERR", + [DOCA_LOG_LEVEL_WARNING] = "WRN", + [DOCA_LOG_LEVEL_INFO] = "INF", + [DOCA_LOG_LEVEL_DEBUG] = "DBG", + [DOCA_LOG_LEVEL_TRACE] = "TRC", +}; + +static int +ovs_doca_parse_log_level(const char *s) +{ + for (int i = 0; i < ARRAY_SIZE(levels); ++i) { + if (levels[i] && !strncmp(s, levels[i], strlen(levels[i]))) { + return i; + } + } + + return -1; +} + +static const char * +ovs_doca_log_level_to_str(uint32_t log_level) +{ + for (int i = 0; i < ARRAY_SIZE(levels); ++i) { + if (i == log_level && levels[i]) { + return levels[i]; + } + } + + OVS_NOT_REACHED(); + return "UNKNOWN"; +} + +static enum doca_log_level +get_buf_log_level(const char *buf, size_t size) +{ + const char *p = buf; + int level; + + for (int i = 0; i < 4; i++) { + while (size && *p && *p != '[') { + size--; + p++; + } + + if (!size || !*p) { + return DOCA_LOG_LEVEL_DISABLE; + } + + size--; + p++; + } + + level = ovs_doca_parse_log_level(p); + if (level < 0) { + return DOCA_LOG_LEVEL_DISABLE; + } + + return level; +} + +static ssize_t +ovs_doca_log_write(void *c OVS_UNUSED, const char *buf, size_t size) +{ + static struct vlog_rate_limit dbg_rl = VLOG_RATE_LIMIT_INIT(600, 600); + enum doca_log_level level = get_buf_log_level(buf, 
size); + + switch (level) { + case DOCA_LOG_LEVEL_DISABLE: + VLOG_EMER("(Failed to parse level): %.*s", (int) size, buf); + break; + case DOCA_LOG_LEVEL_TRACE: + case DOCA_LOG_LEVEL_DEBUG: + VLOG_DBG_RL(&dbg_rl, "%.*s", (int) size, buf); + break; + case DOCA_LOG_LEVEL_INFO: + VLOG_INFO_RL(&rl, "%.*s", (int) size, buf); + break; + case DOCA_LOG_LEVEL_WARNING: + VLOG_WARN_RL(&rl, "%.*s", (int) size, buf); + break; + case DOCA_LOG_LEVEL_ERROR: + VLOG_ERR_RL(&rl, "%.*s", (int) size, buf); + break; + case DOCA_LOG_LEVEL_CRIT: + VLOG_EMER("%.*s", (int) size, buf); + break; + default: + OVS_NOT_REACHED(); + } + + return size; +} + +static cookie_io_functions_t ovs_doca_log_func = { + .write = ovs_doca_log_write, +}; + +static void +ovs_doca_unixctl_log_set(struct unixctl_conn *conn, int argc, + const char *argv[], void *aux OVS_UNUSED) +{ + int level = DOCA_LOG_LEVEL_DEBUG; + + /* With no argument, level is set to 'debug'. */ + + if (argc == 2) { + const char *level_string; + + level_string = argv[1]; + level = ovs_doca_parse_log_level(level_string); + if (level < 0) { + char *err_msg = xasprintf("invalid log level: '%s'", level_string); + + unixctl_command_reply_error(conn, err_msg); + free(err_msg); + return; + } + } + + doca_log_level_set_global_sdk_limit(level); + unixctl_command_reply(conn, NULL); +} + +static void +ovs_doca_log_get(FILE *stream) +{ + uint32_t log_level; + + log_level = doca_log_level_get_global_sdk_limit(); + fprintf(stream, "DOCA log level is %s", + ovs_doca_log_level_to_str(log_level)); +} + +static void +ovs_doca_destroy_defs(struct doca_flow_definitions *defs, + struct doca_flow_definitions_cfg *defs_cfg) +{ + if (defs) { + doca_flow_definitions_destroy(defs); + } + + if (defs_cfg) { + doca_flow_definitions_cfg_destroy(defs_cfg); + } +} + +static doca_error_t +ovs_doca_init_defs(struct doca_flow_cfg *cfg, + struct doca_flow_definitions **defs, + struct doca_flow_definitions_cfg **defs_cfg) +{ +#define DEF_FIELD(str_val, struct_name, field_name) 
{ \ + .str = str_val, \ + .offset = offsetof(struct struct_name, field_name), \ + .size = MEMBER_SIZEOF(struct struct_name, field_name) \ +} + struct def_field { + const char *str; + size_t offset; + size_t size; + } def_fields[] = { + DEF_FIELD("actions.packet.meta.mark", ovs_doca_flow_actions, mark), + }; + doca_error_t result; + + result = doca_flow_definitions_cfg_create(defs_cfg); + if (result != DOCA_SUCCESS) { + VLOG_ERR("Failed to create defs cfg. Error: %d (%s)", result, + doca_error_get_descr(result)); + return result; + } + + result = doca_flow_definitions_create(*defs_cfg, defs); + if (result != DOCA_SUCCESS) { + VLOG_ERR("Failed to create definitions. Error: %d (%s)", result, + doca_error_get_descr(result)); + goto out; + } + + for (int i = 0; i < ARRAY_SIZE(def_fields); i++) { + result = doca_flow_definitions_add_field(*defs, def_fields[i].str, + def_fields[i].offset, + def_fields[i].size); + if (result != DOCA_SUCCESS) { + VLOG_ERR("Failed to add definition field '%s'. Error: %d (%s)", + def_fields[i].str, result, doca_error_get_descr(result)); + goto out; + } + } + + result = doca_flow_cfg_set_definitions(cfg, *defs); + if (result != DOCA_SUCCESS) { + VLOG_ERR("Failed to set doca_flow_cfg defs. 
Error: %d (%s)", result, + doca_error_get_descr(result)); + goto out; + } + +out: + if (result) { + ovs_doca_destroy_defs(*defs, *defs_cfg); + } + + return result; +} + +static void +ovs_doca_offload_entry_process(struct doca_flow_pipe_entry *entry, + uint16_t qid, + enum doca_flow_entry_status status, + enum doca_flow_entry_op op, + void *aux) +{ + static const char *status_desc[] = { + [DOCA_FLOW_ENTRY_STATUS_IN_PROCESS] = "in-process", + [DOCA_FLOW_ENTRY_STATUS_SUCCESS] = "success", + [DOCA_FLOW_ENTRY_STATUS_ERROR] = "failure", + }; + static const char *op_desc[] = { + [DOCA_FLOW_ENTRY_OP_ADD] = "add", + [DOCA_FLOW_ENTRY_OP_DEL] = "del", + [DOCA_FLOW_ENTRY_OP_UPD] = "mod", + [DOCA_FLOW_ENTRY_OP_AGED] = "age", + }; + bool error = status == DOCA_FLOW_ENTRY_STATUS_ERROR; + struct ovs_doca_offload_queue *queues = aux; + + ovs_assert(status < ARRAY_SIZE(status_desc)); + ovs_assert(op < ARRAY_SIZE(op_desc)); + + VLOG_RL(&rl, error ? VLL_ERR : VLL_DBG, + "%s: [qid:%" PRIu16 "] %s aux=%p entry %p %s", + __func__, qid, op_desc[op], aux, entry, status_desc[status]); + + if (queues && status != DOCA_FLOW_ENTRY_STATUS_IN_PROCESS) { + queues[qid].n_waiting_entries--; + } +} + +static int +ovs_doca_init__(const struct smap *ovs_other_config) +{ + struct doca_flow_definitions_cfg *defs_cfg = NULL; + struct doca_flow_definitions *defs = NULL; + struct doca_flow_cfg *cfg; + doca_error_t err; + + if (!dpdk_available()) { + VLOG_ERR("DOCA requires DPDK. 
Set other_config:dpdk-init=true."); + return ENODEV; + } + + if (rte_flow_dynf_metadata_register() < 0) { + VLOG_ERR("Failed to register dynamic metadata, err: %s.", + rte_strerror(rte_errno)); + return ENOTSUP; + } + + log_stream = fopencookie(NULL, "w+", ovs_doca_log_func); + if (!log_stream) { + VLOG_ERR("Can't redirect DOCA log: %s.", ovs_strerror(errno)); + } else { + /* Create a logger back-end that prints to the redirected log */ + err = doca_log_backend_create_with_file_sdk(log_stream, + &ovs_doca_log); + if (err != DOCA_SUCCESS) { + VLOG_ERR("%s: doca_log_backend_create_with_file_sdk failed." + " Error: %d (%s)", + OVS_SOURCE_LOCATOR, err, doca_error_get_descr(err)); + return ENODEV; + } + + doca_log_level_set_global_sdk_limit(DOCA_LOG_LEVEL_WARNING); + } + + unixctl_command_register("doca/log-set", "{level}. " + "level=CRT/ERR/WRN/INF/DBG/TRC", 0, 1, + ovs_doca_unixctl_log_set, NULL); + unixctl_command_register("doca/log-get", "", 0, 0, + unixctl_mem_stream, ovs_doca_log_get); + + /* DOCA configuration happens earlier than dpif-netdev's. + * To avoid reorganizing them, read the relevant item directly. */ + ovs_doca_max_megaflows_counters = + smap_get_uint(ovs_other_config, "flow-limit", + OVS_DOCA_MAX_MEGAFLOWS_COUNTERS); + +#define RV_TEST(call) \ + do { \ + err = (call); \ + if (err) { \ + VLOG_ERR("%s failed. 
Error: %d (%s)", \ + #call, err, doca_error_get_descr(err)); \ + return ENODEV; \ + } \ + } while (0) + + RV_TEST(doca_flow_cfg_create(&cfg)); + RV_TEST(doca_flow_cfg_set_pipe_queues(cfg, + OVS_DOCA_MAX_OFFLOAD_QUEUES)); + RV_TEST(doca_flow_cfg_set_resource_mode(cfg, + DOCA_FLOW_RESOURCE_MODE_PORT)); + RV_TEST(doca_flow_cfg_set_mode_args(cfg, + "switch" + ",hws" + ",isolated" + ",expert" + "")); + RV_TEST(doca_flow_cfg_set_queue_depth(cfg, OVS_DOCA_QUEUE_DEPTH)); + RV_TEST(doca_flow_cfg_set_cb_entry_process( + cfg, ovs_doca_offload_entry_process)); + RV_TEST(ovs_doca_init_defs(cfg, &defs, &defs_cfg)); + + VLOG_INFO("DOCA Enabled - initializing..."); + RV_TEST(doca_flow_init(cfg)); + ovs_doca_destroy_defs(defs, defs_cfg); + RV_TEST(doca_flow_cfg_destroy(cfg)); + +#undef RV_TEST + + netdev_doca_register(); + return 0; +} + +static bool +ovs_doca_available(void) +{ + bool available; + + atomic_read_relaxed(&doca_initialized, &available); + return available; +} + +/* Complete the queue 'qid' on the netdev's ESW until OVS_DOCA_QUEUE_DEPTH + * entries are available. + */ +static doca_error_t +ovs_doca_complete_queue_esw(struct netdev_doca_esw_ctx *esw, + unsigned int qid) +{ + struct ovs_doca_offload_queue *queue; + long long int timeout_ms; + unsigned int n_waiting; + doca_error_t err; + uint32_t room; + int retries; + + queue = &esw->offload_queues[qid]; + n_waiting = queue->n_waiting_entries; + + if (n_waiting == 0) { + COVERAGE_INC(ovs_doca_queue_empty); + return DOCA_SUCCESS; + } + + /* 1 second timeout. */ + timeout_ms = time_msec() + 1 * 1000; + retries = 100; + do { + unsigned int n_processed; + + /* Use 'max_processed_entries' == 0 to always attempt processing + * the full length of the queue. */ + err = doca_flow_entries_process(esw->esw_port, qid, + OVS_DOCA_ENTRY_PROCESS_TIMEOUT_US, 0); + if (err) { + VLOG_WARN_RL(&rl, "%s: Failed to process entries in queue " + "%u. 
Error: %d (%s)", + netdev_get_name(esw->esw_netdev), qid, + err, doca_error_get_descr(err)); + return err; + } + + n_processed = n_waiting - queue->n_waiting_entries; + if (n_processed == 0) { + COVERAGE_INC(ovs_doca_queue_none_processed); + } + n_waiting = queue->n_waiting_entries; + + room = OVS_DOCA_QUEUE_DEPTH - n_waiting; + if (n_processed == 0 && retries-- <= 0) { + COVERAGE_INC(ovs_doca_queue_block); + break; + } + + if (timeout_ms && time_msec() > timeout_ms) { + ovs_abort(0, "Timeout reached trying to complete queue %u: " + "%u remaining entries", qid, n_waiting); + } + } while (err == DOCA_SUCCESS && room < OVS_DOCA_QUEUE_DEPTH); + + return err; +} + +static doca_error_t +ovs_doca_add_generic(unsigned int qid, + uint32_t hash_index, + struct doca_flow_pipe *pipe, + enum doca_flow_pipe_type pipe_type, + const struct ovs_doca_flow_match *omatch, + const struct ovs_doca_flow_actions *oactions, + const struct doca_flow_monitor *monitor, + const struct doca_flow_fwd *fwd, + uint32_t flags, + struct netdev_doca_esw_ctx *esw, + struct doca_flow_pipe_entry **pentry) + OVS_NO_THREAD_SAFETY_ANALYSIS +{ + const struct doca_flow_actions *actions; + struct ovs_doca_offload_queue *queues; + const struct doca_flow_match *match; + doca_error_t err; + + ovs_assert(qid < OVS_DOCA_MAX_OFFLOAD_QUEUES); + + ovs_assert(esw); + queues = esw->offload_queues; + match = omatch ? &omatch->d : NULL; + actions = oactions ? 
&oactions->d : NULL; + + ovs_assert(queues); + + switch (pipe_type) { + case DOCA_FLOW_PIPE_BASIC: + err = doca_flow_pipe_basic_add_entry(qid, pipe, match, 0, actions, + monitor, fwd, flags, queues, + pentry); + break; + case DOCA_FLOW_PIPE_HASH: + err = doca_flow_pipe_hash_add_entry(qid, pipe, hash_index, 0, actions, + monitor, fwd, flags, queues, + pentry); + break; + case DOCA_FLOW_PIPE_CONTROL: + case DOCA_FLOW_PIPE_LPM: + case DOCA_FLOW_PIPE_CT: + case DOCA_FLOW_PIPE_ACL: + case DOCA_FLOW_PIPE_ORDERED_LIST: + OVS_NOT_REACHED(); + } + + if (err == DOCA_SUCCESS) { + queues[qid].n_waiting_entries++; + } + + return err; +} + +doca_error_t +ovs_doca_add_entry(struct netdev *netdev, + unsigned int qid, + struct doca_flow_pipe *pipe, + const struct ovs_doca_flow_match *match, + const struct ovs_doca_flow_actions *actions, + const struct doca_flow_monitor *monitor, + const struct doca_flow_fwd *fwd, + uint32_t flags, + struct doca_flow_pipe_entry **pentry) +{ + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct netdev_doca_esw_ctx *esw = dev->esw_ctx; + doca_error_t err; + + err = ovs_doca_add_generic(qid, 0, pipe, DOCA_FLOW_PIPE_BASIC, match, + actions, monitor, fwd, flags, esw, pentry); + if (err) { + VLOG_WARN_RL(&rl, "%s: Failed to create basic pipe entry. " + "Error: %d (%s)", netdev_get_name(netdev), err, + doca_error_get_descr(err)); + return err; + } + + if (DOCA_FLOW_FLAGS_IS_SET(flags, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT)) { + err = ovs_doca_complete_queue_esw(esw, qid); + if (err != DOCA_SUCCESS) { + VLOG_ERR("%s: ovs_doca_complete_queue_esw failed." 
+ " Error: %d (%s)", + OVS_SOURCE_LOCATOR, err, doca_error_get_descr(err)); + return err; + } + } + + return err; +} + +doca_error_t +ovs_doca_remove_entry(struct netdev_doca_esw_ctx *esw, + unsigned int qid, uint32_t flags, + struct doca_flow_pipe_entry **entry) +{ + doca_error_t err; + + if (!*entry) { + return DOCA_SUCCESS; + } + + ovs_assert(qid < OVS_DOCA_MAX_OFFLOAD_QUEUES); + + err = doca_flow_pipe_remove_entry(qid, flags, *entry); + if (err == DOCA_SUCCESS) { + esw->offload_queues[qid].n_waiting_entries++; + if (DOCA_FLOW_FLAGS_IS_SET(flags, DOCA_FLOW_ENTRY_FLAGS_NO_WAIT)) { + /* Ignore potential errors here, as even if the queue completion + * failed, the entry removal would still be issued. The caller + * requires knowing so. */ + ovs_doca_complete_queue_esw(esw, qid); + } + *entry = NULL; + } else { + VLOG_ERR("Failed to remove entry %p qid=%d. Error: %d (%s)", + *entry, qid, err, doca_error_get_descr(err)); + } + + return err; +} + +doca_error_t +ovs_doca_pipe_cfg_allow_queues(struct doca_flow_pipe_cfg *cfg, + uint64_t queues_bitmap) +{ + ovs_assert(cfg); + + for (unsigned int qid = 0; qid < OVS_DOCA_MAX_OFFLOAD_QUEUES; qid++) { + doca_error_t err; + + if ((UINT64_C(1) << qid) & queues_bitmap) { + continue; + } + + err = doca_flow_pipe_cfg_set_excluded_queue(cfg, qid); + if (DOCA_IS_ERROR(err)) { + VLOG_ERR("Failed to exclude queue %u in pipe configuration." 
+ " Error: %d (%s)", qid, err, doca_error_get_descr(err)); + return err; + } + } + + return DOCA_SUCCESS; +} + void -ovs_doca_init(const struct smap *ovs_other_config OVS_UNUSED) +ovs_doca_destroy_pipe(struct doca_flow_pipe **ppipe) { + if (!ppipe || !*ppipe) { + return; + } + + doca_flow_pipe_destroy(*ppipe); + *ppipe = NULL; +} + +doca_error_t +ovs_doca_pipe_create(struct netdev *netdev, + struct ovs_doca_flow_match *match, + struct ovs_doca_flow_match *match_mask, + struct doca_flow_monitor *monitor, + struct ovs_doca_flow_actions *actions, + struct ovs_doca_flow_actions *actions_mask, + struct doca_flow_action_desc *desc, + struct doca_flow_fwd *fwd, + struct doca_flow_fwd *fwd_miss, + uint32_t nr_entries, + bool is_egress, bool is_root, + uint64_t queues_bitmap, + const char *pipe_str, + struct doca_flow_pipe **pipe) +{ + struct doca_flow_actions *actions_arr[1], *actions_masks_arr[1]; + struct netdev_doca *dev = netdev_doca_cast(netdev); + struct doca_flow_action_descs descs, *descs_arr[1]; + char pipe_name[OVS_DOCA_MAX_PIPE_NAME_LEN]; + struct doca_flow_port *doca_port; + struct doca_flow_pipe_cfg *cfg; + int ret; + + ovs_assert(!*pipe); + + doca_port = doca_flow_port_switch_get(dev->port); + ovs_assert(doca_port); + + snprintf(pipe_name, sizeof pipe_name, "%s: %s", netdev_get_name(netdev), + pipe_str); + + ret = doca_flow_pipe_cfg_create(&cfg, doca_port); + if (ret) { + VLOG_ERR("%s: Could not create doca_flow_pipe_cfg for %s." + " Error: %d (%s)", netdev_get_name(netdev), pipe_name, + ret, doca_error_get_descr(ret)); + return ret; + } + + actions_arr[0] = actions ? &actions->d : NULL; + actions_masks_arr[0] = actions_mask ? &actions_mask->d : NULL; + descs.desc_array = desc; + descs.nb_action_desc = 1; + descs_arr[0] = &descs; + +#define PIPE_CFG_SET(call) \ + do { \ + ret = (call); \ + if (ret) { \ + VLOG_ERR("%s: %s failed for %s. 
Error: %d (%s)", \ + netdev_get_name(netdev), #call, pipe_name, \ + ret, doca_error_get_descr(ret)); \ + goto error; \ + } \ + } while (0) + + PIPE_CFG_SET(doca_flow_pipe_cfg_set_name(cfg, pipe_name)); + PIPE_CFG_SET(doca_flow_pipe_cfg_set_type(cfg, DOCA_FLOW_PIPE_BASIC)); + PIPE_CFG_SET(doca_flow_pipe_cfg_set_nr_entries(cfg, nr_entries)); + PIPE_CFG_SET(ovs_doca_pipe_cfg_allow_queues(cfg, queues_bitmap)); + if (is_egress) { + PIPE_CFG_SET(doca_flow_pipe_cfg_set_domain( + cfg, DOCA_FLOW_PIPE_DOMAIN_EGRESS)); + } + + PIPE_CFG_SET(doca_flow_pipe_cfg_set_is_root(cfg, is_root)); + if (match) { + PIPE_CFG_SET(doca_flow_pipe_cfg_set_match(cfg, &match->d, + match_mask + ? &match_mask->d + : &match->d)); + } + + if (monitor) { + PIPE_CFG_SET(doca_flow_pipe_cfg_set_monitor(cfg, monitor)); + } + + if (actions) { + PIPE_CFG_SET(doca_flow_pipe_cfg_set_actions(cfg, actions_arr, + actions_mask + ? actions_masks_arr + : actions_arr, + desc + ? descs_arr + : NULL, 1)); + } + +#undef PIPE_CFG_SET + + ret = doca_flow_pipe_create(cfg, fwd, fwd_miss, pipe); + if (ret) { + VLOG_ERR("%s: Failed to create basic pipe '%s'. 
Error: %d (%s)", + netdev_get_name(netdev), pipe_name, ret, + doca_error_get_descr(ret)); + } + +error: + doca_flow_pipe_cfg_destroy(cfg); + return ret; +} + +unsigned int +ovs_doca_max_counters(void) +{ + return ovs_doca_max_megaflows_counters + OVS_DOCA_SLOWPATH_COUNTERS; +} + +void +ovs_doca_init(const struct smap *ovs_other_config) +{ + static bool enabled = false; + int rv; + + if (enabled || !ovs_other_config) { + return; + } + + if (smap_get_bool(ovs_other_config, "doca-init", false)) { + static struct ovsthread_once once_enable = OVSTHREAD_ONCE_INITIALIZER; + + if (!ovsthread_once_start(&once_enable)) { + return; + } + + VLOG_INFO("Using DOCA %s", doca_version_runtime()); + VLOG_INFO("DOCA Enabled - initializing..."); + rv = ovs_doca_init__(ovs_other_config); + if (!rv) { + VLOG_INFO("DOCA Enabled - initialized"); + enabled = true; + } else { + ovs_abort(rv, "DOCA Initialization Failed."); + } + + ovsthread_once_done(&once_enable); + } else { + VLOG_INFO_ONCE("DOCA Disabled - Use other_config:doca-init to enable"); + } + + atomic_store_relaxed(&doca_initialized, enabled); } void @@ -56,7 +782,7 @@ ovs_doca_status(const struct ovsrec_open_vswitch *cfg) return; } - ovsrec_open_vswitch_set_doca_initialized(cfg, false); + ovsrec_open_vswitch_set_doca_initialized(cfg, ovs_doca_available()); ovsrec_open_vswitch_set_doca_version(cfg, doca_version_runtime()); } diff --git a/lib/ovs-doca.h b/lib/ovs-doca.h index 9bd96c941..8a66572ef 100644 --- a/lib/ovs-doca.h +++ b/lib/ovs-doca.h @@ -24,6 +24,88 @@ struct ovsrec_open_vswitch; struct smap; +#ifdef DOCA_NETDEV + +#include +#include + +#include "dp-packet.h" +#include "ovs-thread.h" +#include "util.h" + +#define AUX_QUEUE 0 +#define OVS_DOCA_MAX_OFFLOAD_QUEUES 1 +#define OVS_DOCA_QUEUE_DEPTH 32 +#define OVS_DOCA_ENTRY_PROCESS_TIMEOUT_US 1000 + +/* Estimated maximum number of megaflows */ +#define OVS_DOCA_MAX_MEGAFLOWS_COUNTERS (1 << 19) + +#define OVS_DOCA_MAX_PIPE_NAME_LEN 128 + +struct netdev_doca_esw_ctx; + 
+struct ovs_doca_offload_queue { + PADDED_MEMBERS(CACHE_LINE_SIZE, + unsigned int n_waiting_entries; + ); +}; + +struct ovs_doca_flow_actions { + struct doca_flow_actions d; + uint32_t mark; +}; +BUILD_ASSERT_DECL(offsetof(struct ovs_doca_flow_actions, d) == 0); + +struct ovs_doca_flow_match { + struct doca_flow_match d; +}; +BUILD_ASSERT_DECL(offsetof(struct ovs_doca_flow_match, d) == 0); + +doca_error_t +ovs_doca_add_entry(struct netdev *netdev, + unsigned int qid, + struct doca_flow_pipe *pipe, + const struct ovs_doca_flow_match *match, + const struct ovs_doca_flow_actions *actions, + const struct doca_flow_monitor *monitor, + const struct doca_flow_fwd *fwd, + uint32_t flags, + struct doca_flow_pipe_entry **pentry); + +doca_error_t +ovs_doca_remove_entry(struct netdev_doca_esw_ctx *esw, + unsigned int qid, uint32_t flags, + struct doca_flow_pipe_entry **entry); + +void +ovs_doca_destroy_pipe(struct doca_flow_pipe **ppipe); + +doca_error_t +ovs_doca_pipe_create(struct netdev *netdev, + struct ovs_doca_flow_match *match, + struct ovs_doca_flow_match *match_mask, + struct doca_flow_monitor *monitor, + struct ovs_doca_flow_actions *actions, + struct ovs_doca_flow_actions *actions_mask, + struct doca_flow_action_desc *desc, + struct doca_flow_fwd *fwd, + struct doca_flow_fwd *fwd_miss, + uint32_t nr_entries, + bool is_egress, bool is_root, + uint64_t queues_bitmap, + const char *pipe_str, + struct doca_flow_pipe **pipe); + +doca_error_t +ovs_doca_pipe_cfg_allow_queues(struct doca_flow_pipe_cfg *cfg, + uint64_t queues_bitmap); + +unsigned int +ovs_doca_max_counters(void); + +#endif /* DOCA_NETDEV */ + void ovs_doca_init(const struct smap *ovs_other_config); void print_doca_version(void); void ovs_doca_status(const struct ovsrec_open_vswitch *); diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at index 7f6ab8904..e19f7a2d0 100644 --- a/tests/ofproto-macros.at +++ b/tests/ofproto-macros.at @@ -223,6 +223,7 @@ m4_define([_OVS_VSWITCHD_START], 
/netdev_linux|INFO|.*device has unknown hardware address family/d /ofproto|INFO|datapath ID changed to fedcba9876543210/d /dpdk|INFO|DPDK Disabled - Use other_config:dpdk-init to enable/d +/ovs_doca|INFO|DOCA Disabled - Use other_config:doca-init to enable/d /netlink_socket|INFO|netlink: could not enable listening to all nsid/d /dpif_offload|INFO|Flow HW offload is enabled/d /probe tc:/d diff --git a/utilities/checkpatch_dict.txt b/utilities/checkpatch_dict.txt index 6a454bcf8..37a30077f 100644 --- a/utilities/checkpatch_dict.txt +++ b/utilities/checkpatch_dict.txt @@ -235,6 +235,7 @@ recirc recirculation recirculations refmap +representor revalidate revalidation revalidator diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index 9edd1027e..df0819f26 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -337,6 +337,22 @@

+ +

+ A value of true will cause the ovs-vswitchd process to + abort if DOCA cannot be initialized. +

+

+ The default value is false. Changing this value + requires restarting the daemon. +

+

+ If this value is false at startup, any doca ports which + are configured in the bridge will fail as an unknown port type. +

+
+

@@ -510,6 +526,10 @@ +

+ NOTE: For DOCA, there is a mempool per ESW manager, which is shared + with all its representors. This configuration does not apply to DOCA ports. +

By default OVS DPDK uses a shared memory model wherein devices that have the same MTU and socket values can share the same @@ -526,36 +546,40 @@ -

Specifies dpdk shared mempool config.

-

Value should be set in the following form:

-

- other_config:shared-mempool-config=< - user-shared-mempool-mtu-list> -

-

where

-

-

    -
  • - <user-shared-mempool-mtu-list> ::= - NULL | <non-empty-list> -
  • -
  • - <non-empty-list> ::= <user-mtus> | - <user-mtus> , - <non-empty-list> -
  • -
  • - <user-mtus> ::= <mtu-all-socket> | - <mtu-socket-pair> -
  • -
  • - <mtu-all-socket> ::= <mtu> -
  • -
  • - <mtu-socket-pair> ::= <mtu> : <socket-id> -
  • -
-

+

+ NOTE: For DOCA, there is a mempool per ESW manager, which is shared + with all its representors. This configuration does not apply to DOCA ports. +

+

Specifies dpdk shared mempool config.

+

Value should be set in the following form:

+

+ other_config:shared-mempool-config=< + user-shared-mempool-mtu-list> +

+

where

+

+

    +
  • + <user-shared-mempool-mtu-list> ::= + NULL | <non-empty-list> +
  • +
  • + <non-empty-list> ::= <user-mtus> | + <user-mtus> , + <non-empty-list> +
  • +
  • + <user-mtus> ::= <mtu-all-socket> | + <mtu-socket-pair> +
  • +
  • + <mtu-all-socket> ::= <mtu> +
  • +
  • + <mtu-socket-pair> ::= <mtu> : <socket-id> +
  • +
+

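For readers checking the grammar above, here is a minimal sketch of a parser for the other_config:shared-mempool-config value. It is an illustration only and not the parser used by OVS: the names struct mtu_entry and parse_mtu_list are hypothetical, and representing <mtu-all-socket> with a socket id of -1 is an assumption made for this example.

```c
#define _POSIX_C_SOURCE 200809L  /* For strdup() and strtok_r(). */
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for this sketch: one parsed <user-mtus> item. */
struct mtu_entry {
    int mtu;
    int socket_id;  /* -1 stands for <mtu-all-socket> (assumption). */
};

/* Parses 'config' into 'out', which has room for 'max' entries.
 * Returns the number of entries parsed, or -1 on malformed input.
 * A NULL or empty string is the NULL production and yields 0.
 *
 * Simplification: strtok_r() skips empty items, so "9000,,1500" is
 * accepted here even though the grammar does not allow it. */
static int
parse_mtu_list(const char *config, struct mtu_entry *out, int max)
{
    char *copy, *item, *save = NULL;
    int n = 0;

    if (!config || !*config) {
        return 0;
    }

    copy = strdup(config);
    if (!copy) {
        return -1;
    }

    for (item = strtok_r(copy, ",", &save); item;
         item = strtok_r(NULL, ",", &save)) {
        long socket_id = -1;
        char *end;
        long mtu;

        /* <mtu> */
        errno = 0;
        mtu = strtol(item, &end, 10);
        if (end == item || errno || mtu <= 0 || mtu > INT_MAX) {
            goto bad;
        }

        /* Optional ": <socket-id>" turns it into <mtu-socket-pair>. */
        if (*end == ':') {
            char *s = end + 1;

            errno = 0;
            socket_id = strtol(s, &end, 10);
            if (end == s || errno || socket_id < 0 || socket_id > INT_MAX) {
                goto bad;
            }
        }

        if (*end != '\0' || n == max) {
            goto bad;
        }

        out[n].mtu = (int) mtu;
        out[n].socket_id = (int) socket_id;
        n++;
    }

    free(copy);
    return n;

bad:
    free(copy);
    return -1;
}
```

Under these assumptions, the value "9000,1500:1" yields two entries: MTU 9000 for all sockets and MTU 1500 for socket 1.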
Changing this value requires restarting the daemon if dpdk-init has already been set to true. @@ -942,7 +966,8 @@ - Always false. + True if other_config:doca-init is set to + true and the DOCA library is successfully initialized.