From patchwork Wed Dec 22 09:16:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eelco Chaudron X-Patchwork-Id: 1572097 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=OrtnwAFA; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JJnl36JScz9sRN for ; Wed, 22 Dec 2021 20:16:47 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 4FC7940A96; Wed, 22 Dec 2021 09:16:45 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id vhMSNxnBUvFH; Wed, 22 Dec 2021 09:16:44 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp2.osuosl.org (Postfix) with ESMTPS id 3418740A99; Wed, 22 Dec 2021 09:16:43 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 018B0C001E; Wed, 22 Dec 2021 09:16:43 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 65761C0012 for ; Wed, 22 Dec 2021 09:16:41 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 463E9812CA for ; Wed, 22 Dec 2021 09:16:41 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp1.osuosl.org (amavisd-new); dkim=pass (1024-bit key) header.d=redhat.com Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MYp6s1ifrvTO for ; Wed, 22 Dec 2021 09:16:40 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id 90ACD80E42 for ; Wed, 22 Dec 2021 09:16:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1640164599; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YQQxrgceISGyi28ll+pjylQXVMz3F2kwYm3aWWDWJgA=; b=OrtnwAFAHMt5sb9a9yKri0+0WJaKdv9f83mhR9DtphwtiwqGOL8torGD8QSzuit75slidD KKFxRmvn84zdow/DCJFf9ulxfnc9UC/GhSGUHyamkUcPkDhVELd5xhmXaQZwJO+qOybuET DKH52wpVo0mMlszo+kL2jtMH6nIOo80= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-552-pxbJDHELMyONCuWLcLKNlw-1; Wed, 22 Dec 2021 04:16:37 -0500 X-MC-Unique: pxbJDHELMyONCuWLcLKNlw-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1FD6E1006AA5 for ; Wed, 22 Dec 2021 09:16:37 +0000 (UTC) Received: from localhost.localdomain (unknown [10.39.194.186]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7BC1B5BD0F; Wed, 22 Dec 2021 09:16:36 +0000 (UTC) From: Eelco Chaudron To: dev@openvswitch.org Date: Wed, 22 Dec 2021 10:16:25 +0100 Message-Id: <164016456502.146460.501036453462533691.stgit@ebuild> In-Reply-To: <164016453900.146460.16964767898519268258.stgit@ebuild> References: <164016453900.146460.16964767898519268258.stgit@ebuild> User-Agent: StGit/1.1 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.11 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=echaudro@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [PATCH v2 1/5] configure: add --enable-usdt option to enable USDT probes X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Allow inclusion of User Statically Defined Trace (USDT) probes in the OVS binaries using the --enable-usdt-probes option to the ./configure script. Signed-off-by: Eelco Chaudron Acked-by: Paolo Valerio --- v2: - Fixed option name in the commit message - Fixed help string (added space) configure.ac | 1 + m4/openvswitch.m4 | 28 ++++++++++++++++++++++++++++ 2 files changed, 29 insertions(+) diff --git a/configure.ac b/configure.ac index eaa9bf7ee..3e72e28bf 100644 --- a/configure.ac +++ b/configure.ac @@ -88,6 +88,7 @@ OVS_CHECK_WIN32 OVS_CHECK_VISUAL_STUDIO_DDK OVS_CHECK_COVERAGE OVS_CHECK_NDEBUG +OVS_CHECK_USDT OVS_CHECK_NETLINK OVS_CHECK_OPENSSL OVS_CHECK_LIBCAPNG diff --git a/m4/openvswitch.m4 b/m4/openvswitch.m4 index 772825a71..4c3bace6e 100644 --- a/m4/openvswitch.m4 +++ b/m4/openvswitch.m4 @@ -60,6 +60,34 @@ AC_DEFUN([OVS_CHECK_NDEBUG], [ndebug=false]) AM_CONDITIONAL([NDEBUG], [test x$ndebug = xtrue])]) +dnl Checks for --enable-usdt-probes and defines HAVE_USDT if it is specified. +AC_DEFUN([OVS_CHECK_USDT], [ + AC_ARG_ENABLE( + [usdt-probes], + [AC_HELP_STRING([--enable-usdt-probes], + [Enable User Statically Defined Tracing (USDT) probes])], + [case "${enableval}" in + (yes) usdt=true ;; + (no) usdt=false ;; + (*) AC_MSG_ERROR([bad value ${enableval} for --enable-usdt-probes]) ;; + esac], + [usdt=false]) + + AC_MSG_CHECKING([whether USDT probes are enabled]) + if test "$usdt" != true; then + AC_MSG_RESULT([no]) + else + AC_MSG_RESULT([yes]) + + AC_CHECK_HEADER([sys/sdt.h], [], + [AC_MSG_ERROR([unable to find sys/sdt.h needed for USDT support])]) + + AC_DEFINE([HAVE_USDT_PROBES], [1], + [Define to 1 if USDT probes are enabled.]) + fi + AM_CONDITIONAL([HAVE_USDT_PROBES], [test $usdt = true]) +]) + dnl Checks for MSVC x64 compiler. AC_DEFUN([OVS_CHECK_WIN64], [AC_CACHE_CHECK( From patchwork Wed Dec 22 09:17:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eelco Chaudron X-Patchwork-Id: 1572098 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=gh5um50F; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JJnly1XCWz9sPC for ; Wed, 22 Dec 2021 20:17:34 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 574E640A95; Wed, 22 Dec 2021 09:17:32 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id yf_-I2JmFtxg; Wed, 22 Dec 2021 09:17:31 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp2.osuosl.org (Postfix) with ESMTPS id 2C24340201; Wed, 22 Dec 2021 09:17:30 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id E982AC0012; Wed, 22 Dec 2021 09:17:29 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp4.osuosl.org (smtp4.osuosl.org [IPv6:2605:bc80:3010::137]) by lists.linuxfoundation.org (Postfix) with ESMTP id 43659C0012 for ; Wed, 22 Dec 2021 09:17:29 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp4.osuosl.org (Postfix) with ESMTP id 25495415DD for ; Wed, 22 Dec 2021 09:17:29 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp4.osuosl.org (amavisd-new); dkim=pass (1024-bit key) header.d=redhat.com Received: from smtp4.osuosl.org ([127.0.0.1]) by localhost (smtp4.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id VZRfPK-zkDkY for ; Wed, 22 Dec 2021 09:17:28 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp4.osuosl.org (Postfix) with ESMTPS id 26FEB415D9 for ; Wed, 22 Dec 2021 09:17:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1640164646; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Bp19wEhr363TKKbgx+Gxklq+ivJ1a1JZfOEdM0Kyf08=; b=gh5um50FR18WNrxEz1T8/rjTkuX+TvNeEwm36pykRbbeZuw0KICnglrm9rm22rQq0vjKc5 I326fgkRFYi1vvt84jHB+hjNKWK1JAr9ROWpEPLUmjp4Oe6lpTgyt3m1kspxuhwtMl2Mf8 ba60LpDVgRwHyKSs5leaJ/WX7hbafPM= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-68-eIcITwTlNLOPAqewy72MIg-1; Wed, 22 Dec 2021 04:17:25 -0500 X-MC-Unique: eIcITwTlNLOPAqewy72MIg-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 8674818B9EC4 for ; Wed, 22 Dec 2021 09:17:24 +0000 (UTC) Received: from localhost.localdomain (unknown [10.39.194.186]) by smtp.corp.redhat.com (Postfix) with ESMTP id DEA784BC65; Wed, 22 Dec 2021 09:17:23 +0000 (UTC) From: Eelco Chaudron To: dev@openvswitch.org Date: Wed, 22 Dec 2021 10:17:12 +0100 Message-Id: <164016460235.146460.602090268145085032.stgit@ebuild> In-Reply-To: <164016453900.146460.16964767898519268258.stgit@ebuild> References: <164016453900.146460.16964767898519268258.stgit@ebuild> User-Agent: StGit/1.1 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=echaudro@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [PATCH v2 2/5] openvswitch: define the OVS_STATIC_TRACE() macro X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This patch defines the OVS_STATIC_TRACE() macro, and as an example, adds two of them in the bridge run loop. Signed-off-by: Eelco Chaudron Acked-by: Paolo Valerio --- v2: - Added Paolo's Acked-by include/openvswitch/automake.mk | 1 + include/openvswitch/usdt_probes.h | 50 +++++++++++++++++++++++++++++++++++++ vswitchd/ovs-vswitchd.c | 3 ++ 3 files changed, 54 insertions(+) create mode 100644 include/openvswitch/usdt_probes.h diff --git a/include/openvswitch/automake.mk b/include/openvswitch/automake.mk index 1fa6d88fa..317dc404f 100644 --- a/include/openvswitch/automake.mk +++ b/include/openvswitch/automake.mk @@ -43,6 +43,7 @@ openvswitchinclude_HEADERS = \ include/openvswitch/tun-metadata.h \ include/openvswitch/type-props.h \ include/openvswitch/types.h \ + include/openvswitch/usdt_probes.h \ include/openvswitch/util.h \ include/openvswitch/uuid.h \ include/openvswitch/version.h \ diff --git a/include/openvswitch/usdt_probes.h b/include/openvswitch/usdt_probes.h new file mode 100644 index 000000000..2f8882bd4 --- /dev/null +++ b/include/openvswitch/usdt_probes.h @@ -0,0 +1,50 @@ +/* + * Copyright (c) 2021 Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef OPENVSWITCH_USDT_PROBES_H +#define OPENVSWITCH_USDT_PROBES_H 1 + +#ifdef HAVE_USDT_PROBES +#include +#endif + +#ifdef __cplusplus +extern "C" { +#endif + +#ifdef HAVE_USDT_PROBES + +#define GET_DTRACE_FUNC(_0, _1, _2, _3, _4, _5, _6, _7, _8, _9, _10, \ + NAME, ...) NAME + +#define OVS_USDT_PROBE(provider, name, ...) \ + GET_DTRACE_FUNC(_0, ##__VA_ARGS__, DTRACE_PROBE10, DTRACE_PROBE9, \ + DTRACE_PROBE8, DTRACE_PROBE7, DTRACE_PROBE6, \ + DTRACE_PROBE5, DTRACE_PROBE4, DTRACE_PROBE3, \ + DTRACE_PROBE2, DTRACE_PROBE1, DTRACE_PROBE) \ + (provider, name, ##__VA_ARGS__) + +#else + +#define OVS_USDT_PROBE(...) + +#endif + +#ifdef __cplusplus +} +#endif + +#endif /* usdt_probes.h */ diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c index f007f9c0b..932a1ba0b 100644 --- a/vswitchd/ovs-vswitchd.c +++ b/vswitchd/ovs-vswitchd.c @@ -50,6 +50,7 @@ #include "util.h" #include "openvswitch/vconn.h" #include "openvswitch/vlog.h" +#include "openvswitch/usdt_probes.h" #include "lib/vswitch-idl.h" #include "lib/dns-resolve.h" @@ -115,6 +116,7 @@ main(int argc, char *argv[]) exiting = false; cleanup = false; while (!exiting) { + OVS_USDT_PROBE(main, run_start); memory_run(); if (memory_should_report()) { struct simap usage; @@ -135,6 +137,7 @@ main(int argc, char *argv[]) if (exiting) { poll_immediate_wake(); } + OVS_USDT_PROBE(main, poll_block); poll_block(); if (should_service_stop()) { exiting = true; From patchwork Wed Dec 22 09:17:44 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eelco Chaudron X-Patchwork-Id: 1572099 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=XDiRZ2oj; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JJnmd0309z9sPC for ; Wed, 22 Dec 2021 20:18:08 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 325E540A98; Wed, 22 Dec 2021 09:18:07 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id mvLcwcm9KafD; Wed, 22 Dec 2021 09:18:05 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp2.osuosl.org (Postfix) with ESMTPS id 4624440291; Wed, 22 Dec 2021 09:18:04 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 17306C001E; Wed, 22 Dec 2021 09:18:04 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp2.osuosl.org (smtp2.osuosl.org [IPv6:2605:bc80:3010::133]) by lists.linuxfoundation.org (Postfix) with ESMTP id E425FC0012 for ; Wed, 22 Dec 2021 09:18:02 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id C33B740291 for ; Wed, 22 Dec 2021 09:18:02 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id V5HmjSpYhnfl for ; Wed, 22 Dec 2021 09:18:01 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by smtp2.osuosl.org (Postfix) with ESMTPS id D46EE4015F for ; Wed, 22 Dec 2021 09:18:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1640164679; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UQszk0EhSl8An+NjNDElu0t2CtjV8c/SS9aYGZ2dOOk=; b=XDiRZ2ojDO552AoEKjepjrAmzM0s+maI2lTUjiEwsAP4MyV4YTXq7dphOCt1SyNzprMv0N B+ggO0p19rzGMfz75650oc8QFwiKcBbHLTVZVPThX+WUIo1OvMr1/MrixJgvQ5nRGPOKsu hphSJDZJxpwbwoj9VjoOo+Ys7K8JpCU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-249-_kKACegWPFuSfUZLLowU6Q-1; Wed, 22 Dec 2021 04:17:56 -0500 X-MC-Unique: _kKACegWPFuSfUZLLowU6Q-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id CE0CE1030C22 for ; Wed, 22 Dec 2021 09:17:55 +0000 (UTC) Received: from localhost.localdomain (unknown [10.39.194.186]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0BFE2E144; Wed, 22 Dec 2021 09:17:54 +0000 (UTC) From: Eelco Chaudron To: dev@openvswitch.org Date: Wed, 22 Dec 2021 10:17:44 +0100 Message-Id: <164016464977.146460.10393819877295313822.stgit@ebuild> In-Reply-To: <164016453900.146460.16964767898519268258.stgit@ebuild> References: <164016453900.146460.16964767898519268258.stgit@ebuild> User-Agent: StGit/1.1 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=echaudro@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [PATCH v2 3/5] Documentation: add USDT documentation and bpftrace example X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Add the USDT documentation and a bpftrace example using the bridge run USDT probes. Signed-off-by: Eelco Chaudron Acked-by: Paolo Valerio --- v2: - Fixed path naming - Added Paolo's Acked-by Documentation/automake.mk | 1 Documentation/topics/index.rst | 1 Documentation/topics/usdt-probes.rst | 267 +++++++++++++++++++++++++++++++++ NEWS | 1 utilities/automake.mk | 13 +- utilities/usdt_scripts/bridge_loop.bt | 120 +++++++++++++++ 6 files changed, 397 insertions(+), 6 deletions(-) create mode 100644 Documentation/topics/usdt-probes.rst create mode 100755 utilities/usdt_scripts/bridge_loop.bt diff --git a/Documentation/automake.mk b/Documentation/automake.mk index 137cc57c5..9b9c72cff 100644 --- a/Documentation/automake.mk +++ b/Documentation/automake.mk @@ -58,6 +58,7 @@ DOC_SOURCE = \ Documentation/topics/record-replay.rst \ Documentation/topics/tracing.rst \ Documentation/topics/userspace-tso.rst \ + Documentation/topics/usdt-probes.rst \ Documentation/topics/windows.rst \ Documentation/howto/index.rst \ Documentation/howto/dpdk.rst \ diff --git a/Documentation/topics/index.rst b/Documentation/topics/index.rst index d8ccbd757..7ca364340 100644 --- a/Documentation/topics/index.rst +++ b/Documentation/topics/index.rst @@ -55,3 +55,4 @@ OVS userspace-tso idl-compound-indexes ovs-extensions + usdt-probes diff --git a/Documentation/topics/usdt-probes.rst b/Documentation/topics/usdt-probes.rst new file mode 100644 index 000000000..345f078dd --- /dev/null +++ b/Documentation/topics/usdt-probes.rst @@ -0,0 +1,267 @@ +.. + Licensed under the Apache License, Version 2.0 (the "License"); you may + not use this file except in compliance with the License. You may obtain + a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, WITHOUT + WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the + License for the specific language governing permissions and limitations + under the License. + + Convention for heading levels in Open vSwitch documentation: + + ======= Heading 0 (reserved for the title in a document) + ------- Heading 1 + ~~~~~~~ Heading 2 + +++++++ Heading 3 + ''''''' Heading 4 + + Avoid deeper levels because they do not render well. + +============================================= +User Statically-Defined Tracing (USDT) probes +============================================= + +Sometimes it's desired to troubleshoot one of OVS's components in the field. +One of the techniques used is to add dynamic tracepoints, for example using +perf_. However, the desired dynamic tracepoint and/or the desired variable, +might not be available due to compiler optimizations. + +In this case, a well-thought-off, static tracepoint could be permanently added, +so it's always available. For OVS we use the DTrace probe macro's, which have +little to no overhead when disabled. Various tools exist to enable them. See +some examples below. + + +Compiling with USDT probes enabled +---------------------------------- + +Since USDT probes are compiled out by default, a compile-time option is +available to include them. To add the probes to the generated code, use the +following configure option :: + + $ ./configure --enable-usdt-probes + +The following line should be seen in the configure output when the above option +is used :: + + checking whether USDT probes are enabled... yes + + +Listing available probes +------------------------ + +There are various ways to display USDT probes available in a userspace +application. Here we show three examples. All assuming ovs-vswitchd is in the +search path with USDT probes enabled: + +You can use the **perf** tool as follows :: + + $ perf buildid-cache --add $(which ovs-vswitchd) + $ perf list | grep sdt_ + sdt_main:poll_block [SDT event] + sdt_main:run_start [SDT event] + +You can use the bpftrace_ tool :: + + # bpftrace -l "usdt:$(which ovs-vswitchd):*" + usdt:/usr/sbin/ovs-vswitchd:main:poll_block + usdt:/usr/sbin/ovs-vswitchd:main:run_start + +NOTE: If you execute this on a running process, +``bpftrace -lp $(pidof ovs-vswitchd) "usdt:*"`` , it will list all USDT events, +i.e., also the ones available in the used shared libraries. + +Finally, you can use the **tplist** tool which is part of the bcc_ framework :: + + $ /usr/share/bcc/tools/tplist -vv -l $(which ovs-vswitchd) + b'main':b'poll_block' [sema 0x0] + location #1 b'/usr/sbin/ovs-vswitchd' 0x407fdc + b'main':b'run_start' [sema 0x0] + location #1 b'/usr/sbin/ovs-vswitchd' 0x407ff6 + + +Using probes +------------ + +We will use the OVS sandbox environment in combination with the probes shown +above to try out some of the available trace tools. To start up the virtual +environment use the ``make sandbox`` command. In addition we have to create +a bridge to kick of the main bridge loop :: + + $ ovs-vsctl add-br test_bridge + $ ovs-vsctl show + 055acdca-2f0c-4f6e-b542-f4b6d2c44e08 + Bridge test_bridge + Port test_bridge + Interface test_bridge + type: internal + +perf +~~~~ + +Perf is using Linux uprobe based event tracing to for capturing the events. +To enable the main:\* probes as displayed above and take an actual trace, you +need to execute the following sequence of perf commands :: + + # perf buildid-cache --add $(which ovs-vswitchd) + + # perf list | grep sdt_ + sdt_main:poll_block [SDT event] + sdt_main:run_start [SDT event] + + # perf probe --add=sdt_main:poll_block --add=sdt_main:run_start + Added new events: + sdt_main:poll_block (on %poll_block in /usr/sbin/ovs-vswitchd) + sdt_main:run_start (on %run_start in /usr/sbin/ovs-vswitchd) + + You can now use it in all perf tools, such as: + + perf record -e sdt_main:run_start -aR sleep 1 + + # perf record -e sdt_main:run_start -e sdt_main:poll_block \ + -p $(pidof ovs-vswitchd) sleep 30 + [ perf record: Woken up 1 times to write data ] + [ perf record: Captured and wrote 0.039 MB perf.data (132 samples) ] + + # perf script + ovs-vswitchd 8576 [011] 21031.340433: sdt_main:run_start: (407ff6) + ovs-vswitchd 8576 [011] 21031.340516: sdt_main:poll_block: (407fdc) + ovs-vswitchd 8576 [011] 21031.841726: sdt_main:run_start: (407ff6) + ovs-vswitchd 8576 [011] 21031.842088: sdt_main:poll_block: (407fdc) + ... + +Note that the above examples works with the sandbox environment, so make sure +you execute the above command while in the sandbox shell! + +There are a lot more options available with perf, for example, the +``--call-graph dwarf`` option, which would give you a call graph in the +``perf script`` output. See the perf documentation for more information. + +One other interesting feature is that the perf data can be converted for use +by the trace visualizer `Trace Compass`_. This can be done using the +``--all --to-ctf`` option to the ``perf data convert`` tool. + + +bpftrace +~~~~~~~~ + +bpftrace is a high-level tracing language based on eBPF, which can be used to +script USDT probes. Here we will show a simple one-liner to display the +USDT probes being hit. However, the script section below reference some more +advanced bpftrace scripts. + +This is a simple bpftrace one-liner to show all ``main:*`` USDT probes :: + + # bpftrace -p $(pidof ovs-vswitchd) -e \ + 'usdt::main:* { printf("%s %u [%u] %u %s\n", + comm, pid, cpu, elapsed, probe); }' + Attaching 2 probes... + ovs-vswitchd 8576 [11] 203833199 usdt:main:run_start + ovs-vswitchd 8576 [11] 204086854 usdt:main:poll_block + ovs-vswitchd 8576 [11] 221611985 usdt:main:run_start + ovs-vswitchd 8576 [11] 221892019 usdt:main:poll_block + + +bcc +~~~ + +The BPF Compiler Collection (BCC) is a set of tools and scripts that also use +eBPF for tracing. The example below uses the ``trace`` tool to show the events +while they are being generated :: + + # /usr/share/bcc/tools/trace -T -p $(pidof ovs-vswitchd) \ + 'u::main:run_start' 'u::main:poll_block' + TIME PID TID COMM FUNC + 15:49:06 8576 8576 ovs-vswitchd main:run_start + 15:49:06 8576 8576 ovs-vswitchd main:poll_block + 15:49:06 8576 8576 ovs-vswitchd main:run_start + 15:49:06 8576 8576 ovs-vswitchd main:poll_block + ^C + + +Scripts +------- +To not have to re-invent the wheel when trying to debug complex OVS issues, a +set of scripts are provided in the source repository. They are located in the +``utilities/usdt_scripts/`` directory, and each script contains detailed +information on how they should be used, and what information they provide. + + +Available probes +---------------- +The next sections describes all the available probes, their use case, and if +used in any script, which one. Any new probes being added to OVS should get +their own section. See the below "Adding your own probes" section for the +used naming convention. + +Available probes in ``ovs_vswitchd``: + +- main:poll_block +- main:run_start + + +probe main:run_start +~~~~~~~~~~~~~~~~~~~~ + +**Description**: +The ovs-vswitchd's main process contains a loop that runs every time some work +needs to be done. This probe gets triggered every time the loop starts from the +beginning. See also the ``main:poll_block`` probe below. + +**Arguments**: + +*None* + +**Script references**: + +- ``utilities/usdt_scripts/bridge_loop.bt`` + + +probe main:poll_block +~~~~~~~~~~~~~~~~~~~~~ + +**Description**: +The ovs-vswitchd's main process contains a loop that runs every time some work +needs to be done. This probe gets triggered every time the loop is done, and +it's about to wait for being re-started by a poll_block() call returning. +See also the ``main:run_start`` probe above. + +**Arguments**: + +*None* + +**Script references**: + +- ``utilities/usdt_scripts/bridge_loop.bt`` + + +Adding your own probes +---------------------- + +Adding your own probes is as simple as adding the ``OVS_USDT_PROBE()`` macro +to the code. It's similar to the ``DTRACE_PROBExx`` macro's with the difference +that it does automatically determine the number of optional arguments. + +The macro requires at least two arguments. The first one being the *provider*, +and the second one being the *name*. To keep some consistency with the probe +naming, please use the following convention. The *provider* should be the +function name, and the *name* should be the name of the tracepoint. If you do +function entry and exit like probes, please use ``entry`` and ``exit``. + +If, for some reason, you do not like to use the function name as a *provider*, +please prefix it with ``__``, so we know it's not a function name. + +The remaining parameters, up to 10, can be variables, pointers, etc., that +might be of interest to capture at this point in the code. Note that the +provided variables can cause the compiler to be less effective in optimizing. + + + +.. _perf : https://developers.redhat.com/blog/2020/05/29/debugging-vhost-user-tx-contention-in-open-vswitch# +.. _bpftrace : https://github.com/iovisor/bpftrace +.. _bcc : https://github.com/iovisor/bcc +.. _Trace Compass : https://www.eclipse.org/tracecompass/ diff --git a/NEWS b/NEWS index 2a4856c1a..903fd2dc1 100644 --- a/NEWS +++ b/NEWS @@ -23,6 +23,7 @@ Post-v2.16.0 * Default selection method for select groups with up to 256 buckets is now dp_hash. Previously this was limited to 64 buckets. This change is mainly for the benefit of OVN load balancing configurations. + - Add User Statically-Defined Tracing (USDT) probe framework support. v2.16.0 - 16 Aug 2021 diff --git a/utilities/automake.mk b/utilities/automake.mk index e2e22c39a..55c7b0022 100644 --- a/utilities/automake.mk +++ b/utilities/automake.mk @@ -56,12 +56,13 @@ EXTRA_DIST += \ utilities/ovs-vlan-test.in \ utilities/ovs-vsctl-bashcomp.bash \ utilities/checkpatch.py \ - utilities/docker/Makefile \ - utilities/docker/ovs-override.conf \ - utilities/docker/start-ovs \ - utilities/docker/create_ovs_db.sh \ - utilities/docker/debian/Dockerfile \ - utilities/docker/debian/build-kernel-modules.sh + utilities/docker/Makefile \ + utilities/docker/ovs-override.conf \ + utilities/docker/start-ovs \ + utilities/docker/create_ovs_db.sh \ + utilities/docker/debian/Dockerfile \ + utilities/docker/debian/build-kernel-modules.sh \ + utilities/usdt_scripts/bridge_loop.bt MAN_ROOTS += \ utilities/ovs-testcontroller.8.in \ utilities/ovs-dpctl.8.in \ diff --git a/utilities/usdt_scripts/bridge_loop.bt b/utilities/usdt_scripts/bridge_loop.bt new file mode 100755 index 000000000..011626947 --- /dev/null +++ b/utilities/usdt_scripts/bridge_loop.bt @@ -0,0 +1,120 @@ +#!/usr/bin/env bpftrace +/* + * Copyright (c) 2021 Red Hat, Inc. + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at: + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + * + * + * Script information: + * ------------------- + * bridge_loop.bt uses the USDTs in the ovs-vswitchd's main bridge loop + * to report how long it spends running a single loop, as well as the time + * it spends waiting in the poll_loop(). Once done, it will also print a + * histogram for the time spent. + * + * The following is an example of how to use the script on the running + * ovs-vswitchd process: + * + * $ bridge_loop.bt -p `pgrep -n ovs-vswitchd` + * Attaching 4 probes... + * -------------------------------------------------------------- + * Tracing ovs-vswitchd's main() loop... Hit Ctrl-C to end. + * -------------------------------------------------------------- + * - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000230706 + * - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.501854292 + * - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000266445 + * - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.499750288 + * - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000254856 + * - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.499944280 + * - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000267390 + * - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.093566288 + * - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000316774 + * - [886467@ovs-vswitchd] poll_block() wait time: 0:00:00.406697754 + * - [886467@ovs-vswitchd] bridge run loop time : 0:00:00.000264505 + * ^C + * -------------------------------------------------------------- + * Showing run time histograms in micro seconds: + * -------------------------------------------------------------- + * + * @bridge_run_time: + * [0, 1000) 6 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| + * + * + * @poll_block_wait_time: + * [90000, 120000) 1 |@@@@@@@@@@@@@@@@@ | + * [120000, 150000) 0 | | + * [150000, 180000) 0 | | + * [180000, 210000) 0 | | + * [210000, 240000) 0 | | + * [240000, 270000) 0 | | + * [270000, 300000) 0 | | + * [300000, 330000) 0 | | + * [330000, 360000) 0 | | + * [360000, 390000) 0 | | + * [390000, 420000) 1 |@@@@@@@@@@@@@@@@@ | + * [420000, 450000) 0 | | + * [450000, 480000) 0 | | + * [480000, 510000) 3 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| + */ + + +BEGIN +{ + printf("--------------------------------------------------------------\n"); + printf("Tracing ovs-vswitchd's main() loop... Hit Ctrl-C to end.\n"); + printf("--------------------------------------------------------------\n"); +} + + +usdt::main:poll_block +{ + @pb_start[tid] = nsecs; + if (@rs_start[tid] != 0) { + $delta = nsecs - @rs_start[tid]; + printf("- [%d@%s] bridge run loop time : %u:%2.2u:%2.2u.%9.9u\n", + tid, comm, + $delta / 3600 / 1000000000, + $delta / 60 / 1000000000 % 60, + $delta / 1000000000 % 60, + $delta % 1000000000); + @bridge_run_time = lhist($delta / 1000, 0, 1000000, 1000); + } +} + + +usdt::main:run_start +{ + @rs_start[tid] = nsecs; + if (@pb_start[tid] != 0) { + $delta = nsecs - @pb_start[tid]; + printf("- [%d@%s] poll_block() wait time: %u:%2.2u:%2.2u.%9.9u\n", + tid, comm, + $delta / 3600 / 1000000000, + $delta / 60 / 1000000000 % 60, + $delta / 1000000000 % 60, + $delta % 1000000000); + @poll_block_wait_time = lhist($delta / 1000, 0, 30000000, 30000); + } +} + + +END +{ + clear(@rs_start); + clear(@pb_start); + + printf("\n"); + printf("--------------------------------------------------------------\n"); + printf("Showing run time histograms in micro seconds:\n"); + printf("--------------------------------------------------------------"); +} From patchwork Wed Dec 22 09:18:23 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eelco Chaudron X-Patchwork-Id: 1572100 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=gyrpRoqq; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=140.211.166.133; helo=smtp2.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp2.osuosl.org (smtp2.osuosl.org [140.211.166.133]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JJnnN2S0Xz9sPC for ; Wed, 22 Dec 2021 20:18:48 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp2.osuosl.org (Postfix) with ESMTP id 887CE40A8D; Wed, 22 Dec 2021 09:18:46 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp2.osuosl.org ([127.0.0.1]) by localhost (smtp2.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id WyCsXK_fY-Pz; Wed, 22 Dec 2021 09:18:44 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [140.211.9.56]) by smtp2.osuosl.org (Postfix) with ESMTPS id 400A640291; Wed, 22 Dec 2021 09:18:43 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 0BEB9C001E; Wed, 22 Dec 2021 09:18:43 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [IPv6:2605:bc80:3010::138]) by lists.linuxfoundation.org (Postfix) with ESMTP id EC09FC0012 for ; Wed, 22 Dec 2021 09:18:40 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id DAE3F83083 for ; Wed, 22 Dec 2021 09:18:40 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp1.osuosl.org (amavisd-new); dkim=pass (1024-bit key) header.d=redhat.com Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Tp2QjKx56n5A for ; Wed, 22 Dec 2021 09:18:39 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id 2E90F813CF for ; Wed, 22 Dec 2021 09:18:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1640164718; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Vrvvo/Av6MtnoEyfP5xtN3JWyNU8nIqD8glpoDrtnzg=; b=gyrpRoqq5UW20WvzgTKwkZ8ywv7dFTg5JxvoVSsSMpy/8AcfYYd1QvUr8Cl+RrsN7vJu4g B7MbYjADj4BQ57lSebpk9DuVUqrxzzFds3vTVq0QqMsAYpGIdYFpwix6/YI/cw+3E9zeBH Q/WBq5Mo1vAKZKBfeEyOO7crmiggSUQ= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-368-TCs6MFTsOuCUzWP-BOlOVg-1; Wed, 22 Dec 2021 04:18:35 -0500 X-MC-Unique: TCs6MFTsOuCUzWP-BOlOVg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 06A0E5F9C5 for ; Wed, 22 Dec 2021 09:18:35 +0000 (UTC) Received: from localhost.localdomain (unknown [10.39.194.186]) by smtp.corp.redhat.com (Postfix) with ESMTP id 0D2973AA2; Wed, 22 Dec 2021 09:18:33 +0000 (UTC) From: Eelco Chaudron To: dev@openvswitch.org Date: Wed, 22 Dec 2021 10:18:23 +0100 Message-Id: <164016468106.146460.15037328983859324774.stgit@ebuild> In-Reply-To: <164016453900.146460.16964767898519268258.stgit@ebuild> References: <164016453900.146460.16964767898519268258.stgit@ebuild> User-Agent: StGit/1.1 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=echaudro@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [PATCH v2 4/5] utilities: add upcall USDT probe and associated script X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" Added the dpif_recv:recv_upcall USDT probe, which is used by the included upcall_monitor.py script. This script receives all upcall packets sent by the kernel to ovs-vswitchd. By default, it will show all upcall events, which looks something like this: TIME CPU COMM PID DPIF_NAME TYPE PKT_LEN FLOW_KEY_LEN 5952147.003848809 2 handler4 1381158 system@ovs-system 0 98 132 5952147.003879643 2 handler4 1381158 system@ovs-system 0 70 160 5952147.003914924 2 handler4 1381158 system@ovs-system 0 98 152 It can also dump the packet and NetLink content, and if required, the packets can also be written to a pcap file. Signed-off-by: Eelco Chaudron Acked-by: Paolo Valerio --- v2: - Fixed path naming - Added --buffer-page-count option and event miss counter - Fixes some spelling problems Documentation/topics/usdt-probes.rst | 26 + lib/dpif.c | 23 + utilities/automake.mk | 6 utilities/usdt_scripts/upcall_monitor.py | 533 ++++++++++++++++++++++++++++++ 4 files changed, 578 insertions(+), 10 deletions(-) create mode 100755 utilities/usdt_scripts/upcall_monitor.py diff --git a/Documentation/topics/usdt-probes.rst b/Documentation/topics/usdt-probes.rst index 345f078dd..bf43bfcd0 100644 --- a/Documentation/topics/usdt-probes.rst +++ b/Documentation/topics/usdt-probes.rst @@ -200,10 +200,36 @@ used naming convention. Available probes in ``ovs_vswitchd``: +- dpif_recv:recv_upcall - main:poll_block - main:run_start +probe dpif_recv:recv_upcall +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Description**: + +This probe gets triggered when the datapath independent layer gets notified +that a packet needs to be processed by userspace. This allows the probe to +intercept all packets sent by the kernel to ``ovs-vswitchd``. The +``upcall_monitor.py`` script uses this probe to display and capture all packets +sent to ``ovs-vswitchd``. + +**Arguments**: + +- *arg0*: ``(struct dpif *)->full_name`` +- *arg1*: ``(struct dpif_upcall *)->type`` +- *arg2*: ``dp_packet_data((struct dpif_upcall *)->packet)`` +- *arg3*: ``dp_packet_size((struct dpif_upcall *)->packet)`` +- *arg4*: ``(struct dpif_upcall *)->key`` +- *arg5*: ``(struct dpif_upcall *)->key_len`` + +**Script references**: + +- ``utilities/usdt_scripts/upcall_monitor.py`` + + probe main:run_start ~~~~~~~~~~~~~~~~~~~~ diff --git a/lib/dpif.c b/lib/dpif.c index 38bcb47cb..ab75b3c17 100644 --- a/lib/dpif.c +++ b/lib/dpif.c @@ -15,7 +15,6 @@ */ #include -#include "dpif-provider.h" #include #include @@ -24,22 +23,19 @@ #include #include "coverage.h" -#include "dpctl.h" #include "dp-packet.h" +#include "dpctl.h" #include "dpif-netdev.h" -#include "openvswitch/dynamic-string.h" +#include "dpif-provider.h" #include "flow.h" +#include "netdev-provider.h" #include "netdev.h" #include "netlink.h" #include "odp-execute.h" #include "odp-util.h" -#include "openvswitch/ofp-print.h" -#include "openvswitch/ofpbuf.h" #include "packets.h" -#include "openvswitch/poll-loop.h" #include "route-table.h" #include "seq.h" -#include "openvswitch/shash.h" #include "sset.h" #include "timeval.h" #include "tnl-neigh-cache.h" @@ -47,9 +43,14 @@ #include "util.h" #include "uuid.h" #include "valgrind.h" +#include "openvswitch/dynamic-string.h" #include "openvswitch/ofp-errors.h" +#include "openvswitch/ofp-print.h" +#include "openvswitch/ofpbuf.h" +#include "openvswitch/poll-loop.h" +#include "openvswitch/shash.h" +#include "openvswitch/usdt_probes.h" #include "openvswitch/vlog.h" -#include "lib/netdev-provider.h" VLOG_DEFINE_THIS_MODULE(dpif); @@ -1602,6 +1603,12 @@ dpif_recv(struct dpif *dpif, uint32_t handler_id, struct dpif_upcall *upcall, if (dpif->dpif_class->recv) { error = dpif->dpif_class->recv(dpif, handler_id, upcall, buf); if (!error) { + OVS_USDT_PROBE(dpif_recv, recv_upcall, dpif->full_name, + upcall->type, + dp_packet_data(&upcall->packet), + dp_packet_size(&upcall->packet), + upcall->key, upcall->key_len); + dpif_print_packet(dpif, upcall); } else if (error != EAGAIN) { log_operation(dpif, "recv", error); diff --git a/utilities/automake.mk b/utilities/automake.mk index 55c7b0022..382f8e789 100644 --- a/utilities/automake.mk +++ b/utilities/automake.mk @@ -62,7 +62,8 @@ EXTRA_DIST += \ utilities/docker/create_ovs_db.sh \ utilities/docker/debian/Dockerfile \ utilities/docker/debian/build-kernel-modules.sh \ - utilities/usdt_scripts/bridge_loop.bt + utilities/usdt_scripts/bridge_loop.bt \ + utilities/usdt_scripts/upcall_monitor.py MAN_ROOTS += \ utilities/ovs-testcontroller.8.in \ utilities/ovs-dpctl.8.in \ @@ -129,6 +130,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \ utilities/checkpatch.py utilities/ovs-dev.py \ utilities/ovs-check-dead-ifs.in \ utilities/ovs-tcpdump.in \ - utilities/ovs-pipegen.py + utilities/ovs-pipegen.py \ + utilities/usdt_scripts/upcall_monitor.py include utilities/bugtool/automake.mk diff --git a/utilities/usdt_scripts/upcall_monitor.py b/utilities/usdt_scripts/upcall_monitor.py new file mode 100755 index 000000000..d856547f0 --- /dev/null +++ b/utilities/usdt_scripts/upcall_monitor.py @@ -0,0 +1,533 @@ +#!/usr/bin/env python3 +# +# Copyright (c) 2021 Red Hat, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at: +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Script information: +# ------------------- +# upcall_monitor.py uses the dpif_recv:recv_upcall USDT to receive all upcall +# packets sent by the kernel to ovs-vswitchd. By default, it will show all +# upcall events, which looks something like this: +# +# TIME CPU COMM PID DPIF_NAME TYPE PKT_LEN... +# 5952147.003848809 2 handler4 1381158 system@ovs-system 0 98 132 +# 5952147.003879643 2 handler4 1381158 system@ovs-system 0 70 160 +# 5952147.003914924 2 handler4 1381158 system@ovs-system 0 98 152 +# +# In addition, the packet and flow key data can be dumped. This can be done +# using the --packet-decode and --flow-key decode options (see below). +# +# Note that by default only 64 bytes of the packet and flow key are retrieved. +# If you would like to capture all or more of the packet and/or flow key data, +# the ----packet-size and --flow-key-size options can be used. +# +# If required, the received packets can also be stored in a pcap file using the +# --pcap option. +# +# The following are the available options: +# +# usage: upcall_monitor.py [-h] [-D [DEBUG]] [-d {none,hex,decode}] +# [-f [64-2048]] [-k {none,hex,nlraw}] +# [-p VSWITCHD_PID] [-s [64-2048]] [-w PCAP_FILE] +# +# optional arguments: +# -h, --help show this help message and exit +# -D [DEBUG], --debug [DEBUG] +# Enable eBPF debugging +# -d {none,hex,decode}, --packet-decode {none,hex,decode} +# Display packet content in selected mode, +# default none +# -f [64-2048], --flow-key-size [64-2048] +# Set maximum flow key size to capture, default 64 +# -k {none,hex,nlraw}, --flow-key-decode {none,hex,nlraw} +# Display flow-key content in selected mode, default +# none +# -p VSWITCHD_PID, --pid VSWITCHD_PID +# ovs-vswitch's PID +# -s [64-2048], --packet-size [64-2048] +# Set maximum packet size to capture, default 64 +# -w PCAP_FILE, --pcap PCAP_FILE +# Write upcall packets to specified pcap file. +# +# The following is an example of how to use the script on the running +# ovs-vswitchd process with a packet and flow key dump enabled: +# +# $ ./upcall_monitor.py --packet-decode decode --flow-key-decode nlraw \ +# --packet-size 128 --flow-key-size 256 +# TIME CPU COMM PID DPIF_NAME ... +# 5953013.333214231 2 handler4 1381158 system@ovs-system ... +# Flow key size 132 bytes, size captured 132 bytes. +# nla_len 8, nla_type OVS_KEY_ATTR_RECIRC_ID[20], data: 00 00 00 00 +# nla_len 8, nla_type OVS_KEY_ATTR_DP_HASH[19], data: 00 00 00 00 +# nla_len 8, nla_type OVS_KEY_ATTR_PRIORITY[2], data: 00 00 00 00 +# nla_len 8, nla_type OVS_KEY_ATTR_IN_PORT[3], data: 02 00 00 00 +# nla_len 8, nla_type OVS_KEY_ATTR_SKB_MARK[15], data: 00 00 00 00 +# nla_len 8, nla_type OVS_KEY_ATTR_CT_STATE[22], data: 00 00 00 00 +# nla_len 6, nla_type OVS_KEY_ATTR_CT_ZONE[23], data: 00 00 +# nla_len 8, nla_type OVS_KEY_ATTR_CT_MARK[24], data: 00 00 00 00 +# nla_len 20, nla_type OVS_KEY_ATTR_CT_LABELS[25], data: 00 00 00 00 ... +# nla_len 16, nla_type OVS_KEY_ATTR_ETHERNET[4], data: 04 f4 bc 28 57 ... +# nla_len 6, nla_type OVS_KEY_ATTR_ETHERTYPE[6], data: 08 00 +# nla_len 16, nla_type OVS_KEY_ATTR_IPV4[7], data: 01 01 01 64 01 01 ... +# nla_len 6, nla_type OVS_KEY_ATTR_ICMP[11], data: 00 00 +# 1: Receive dp_port 2, packet size 98 bytes, size captured 98 bytes. +# ###[ Ethernet ]### +# dst = 3c:fd:fe:9e:7f:68 +# src = 04:f4:bc:28:57:01 +# type = IPv4 +# ###[ IP ]### +# version = 4 +# ihl = 5 +# tos = 0x0 +# len = 84 +# id = 41404 +# flags = DF +# frag = 0 +# ttl = 64 +# proto = icmp +# chksum = 0x940c +# src = 1.1.1.100 +# dst = 1.1.1.123 +# \options \ +# ###[ ICMP ]### +# type = echo-reply +# code = 0 +# chksum = 0x2f55 +# id = 0x90e6 +# seq = 0x1 +# ###[ Raw ]### +# load = 'GBTa\x00\x00\x00\x00\xd8L\r\x00\x00\x00\x00\... +# + +from bcc import BPF, USDT, USDTException +from os.path import exists +from scapy.all import hexdump, wrpcap +from scapy.layers.l2 import Ether + +import argparse +import psutil +import re +import struct +import sys +import time + +# +# Actual eBPF source code +# +ebpf_source = """ +#include + +#define MAX_PACKET +#define MAX_KEY + +struct event_t { + u32 cpu; + u32 pid; + u32 upcall_type; + u64 ts; + u32 pkt_size; + u64 key_size; + char comm[TASK_COMM_LEN]; + char dpif_name[32]; + unsigned char pkt[MAX_PACKET]; + unsigned char key[MAX_KEY]; +}; +BPF_RINGBUF_OUTPUT(events, ); +BPF_TABLE("percpu_array", uint32_t, uint64_t, dropcnt, 1); + +int do_trace(struct pt_regs *ctx) { + uint64_t addr; + uint64_t size; + + struct event_t *event = events.ringbuf_reserve(sizeof(struct event_t)); + if (!event) { + uint32_t type = 0; + uint64_t *value = dropcnt.lookup(&type); + if (value) + __sync_fetch_and_add(value, 1); + + return 1; + } + + event->ts = bpf_ktime_get_ns(); + event->cpu = bpf_get_smp_processor_id(); + event->pid = bpf_get_current_pid_tgid(); + bpf_get_current_comm(&event->comm, sizeof(event->comm)); + + bpf_usdt_readarg(1, ctx, &addr); + bpf_probe_read_str(&event->dpif_name, sizeof(event->dpif_name), + (void *)addr); + + bpf_usdt_readarg(2, ctx, &event->upcall_type); + bpf_usdt_readarg(4, ctx, &event->pkt_size); + bpf_usdt_readarg(6, ctx, &event->key_size); + + if (event->pkt_size > MAX_PACKET) + size = MAX_PACKET; + else + size = event->pkt_size; + bpf_usdt_readarg(3, ctx, &addr); + bpf_probe_read(&event->pkt, size, (void *)addr); + + if (event->key_size > MAX_KEY) + size = MAX_KEY; + else + size = event->key_size; + bpf_usdt_readarg(5, ctx, &addr); + bpf_probe_read(&event->key, size, (void *)addr); + + events.ringbuf_submit(event, 0); + return 0; +}; +""" + + +# +# print_event() +# +def print_event(ctx, data, size): + event = b['events'].event(data) + print("{:<18.9f} {:<4} {:<16} {:<10} {:<32} {:<4} {:<10} {:<10}". + format(event.ts / 1000000000, + event.cpu, + event.comm.decode("utf-8"), + event.pid, + event.dpif_name.decode("utf-8"), + event.upcall_type, + event.pkt_size, + event.key_size)) + + # + # Dump flow key information + # + if event.key_size < options.flow_key_size: + key_len = event.key_size + else: + key_len = options.flow_key_size + + if options.flow_key_decode != 'none': + print(" Flow key size {} bytes, size captured {} bytes.". + format(event.key_size, key_len)) + + if options.flow_key_decode == 'hex': + # + # Abuse scapy's hex dump to dump flow key + # + print(re.sub('^', ' ' * 4, hexdump(Ether(bytes(event.key)[:key_len]), + dump=True), + flags=re.MULTILINE)) + + if options.flow_key_decode == 'nlraw': + nla = decode_nlm(bytes(event.key)[:key_len]) + else: + nla = decode_nlm(bytes(event.key)[:key_len], dump=False) + + if "OVS_KEY_ATTR_IN_PORT" in nla: + port = struct.unpack("=I", nla["OVS_KEY_ATTR_IN_PORT"])[0] + else: + port = "Unknown" + + # + # Decode packet only if there is data + # + if event.pkt_size <= 0: + return + + pkt_id = get_pkt_id() + + if event.pkt_size < options.packet_size: + pkt_len = event.pkt_size + pkt_data = bytes(event.pkt)[:event.pkt_size] + else: + pkt_len = options.packet_size + pkt_data = bytes(event.pkt) + + if options.packet_decode != 'none' or options.pcap is not None: + print(" {}: Receive dp_port {}, packet size {} bytes, size " + "captured {} bytes.".format(pkt_id, port, event.pkt_size, + pkt_len)) + + if options.packet_decode == 'hex': + print(re.sub('^', ' ' * 4, hexdump(pkt_data, dump=True), + flags=re.MULTILINE)) + + packet = Ether(pkt_data) + packet.wirelen = event.pkt_size + + if options.packet_decode == 'decode': + print(re.sub('^', ' ' * 4, packet.show(dump=True), flags=re.MULTILINE)) + + if options.pcap is not None: + wrpcap(options.pcap, packet, append=True, snaplen=options.packet_size) + + +# +# decode_nlm() +# +def decode_nlm(msg, indent=4, dump=True): + bytes_left = len(msg) + result = {} + + while bytes_left: + if bytes_left < 4: + if dump: + print("{}WARN: decode truncated; can't read header".format( + ' ' * indent)) + break + + nla_len, nla_type = struct.unpack("=HH", msg[:4]) + + if nla_len < 4: + if dump: + print("{}WARN: decode truncated; nla_len < 4".format( + ' ' * indent)) + break + + nla_data = msg[4:nla_len] + trunc = "" + + if nla_len > bytes_left: + trunc = "..." + nla_data = nla_data[:(bytes_left - 4)] + else: + result[get_ovs_key_attr_str(nla_type)] = nla_data + + if dump: + print("{}nla_len {}, nla_type {}[{}], data: {}{}".format( + ' ' * indent, nla_len, get_ovs_key_attr_str(nla_type), + nla_type, + "".join("{:02x} ".format(b) for b in nla_data), trunc)) + + if trunc != "": + if dump: + print("{}WARN: decode truncated; nla_len > msg_len[{}] ". + format(' ' * indent, bytes_left)) + break + + # update next offset, but make sure it's aligned correctly + next_offset = (nla_len + 3) & ~(3) + msg = msg[next_offset:] + bytes_left -= next_offset + + return result + + +# +# get_ovs_key_attr_str() +# +def get_ovs_key_attr_str(attr): + ovs_key_attr = ["OVS_KEY_ATTR_UNSPEC", + "OVS_KEY_ATTR_ENCAP", + "OVS_KEY_ATTR_PRIORITY", + "OVS_KEY_ATTR_IN_PORT", + "OVS_KEY_ATTR_ETHERNET", + "OVS_KEY_ATTR_VLAN", + "OVS_KEY_ATTR_ETHERTYPE", + "OVS_KEY_ATTR_IPV4", + "OVS_KEY_ATTR_IPV6", + "OVS_KEY_ATTR_TCP", + "OVS_KEY_ATTR_UDP", + "OVS_KEY_ATTR_ICMP", + "OVS_KEY_ATTR_ICMPV6", + "OVS_KEY_ATTR_ARP", + "OVS_KEY_ATTR_ND", + "OVS_KEY_ATTR_SKB_MARK", + "OVS_KEY_ATTR_TUNNEL", + "OVS_KEY_ATTR_SCTP", + "OVS_KEY_ATTR_TCP_FLAGS", + "OVS_KEY_ATTR_DP_HASH", + "OVS_KEY_ATTR_RECIRC_ID", + "OVS_KEY_ATTR_MPLS", + "OVS_KEY_ATTR_CT_STATE", + "OVS_KEY_ATTR_CT_ZONE", + "OVS_KEY_ATTR_CT_MARK", + "OVS_KEY_ATTR_CT_LABELS", + "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4", + "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6", + "OVS_KEY_ATTR_NSH"] + + if attr < 0 or attr > len(ovs_key_attr): + return "" + + return ovs_key_attr[attr] + + +# +# get_pkt_id() +# +def get_pkt_id(): + if not hasattr(get_pkt_id, "counter"): + get_pkt_id.counter = 0 + get_pkt_id.counter += 1 + return get_pkt_id.counter + + +# +# buffer_size_type() +# +def buffer_size_type(astr, min=64, max=2048): + value = int(astr) + if min <= value <= max: + return value + else: + raise argparse.ArgumentTypeError( + 'value not in range {}-{}'.format(min, max)) + + +# +# next_power_of_two() +# +def next_power_of_two(val): + np = 1 + while np < val: + np *= 2 + return np + + +# +# main() +# +def main(): + # + # Don't like these globals, but ctx passing does not seem to work with the + # existing open_ring_buffer() API :( + # + global b + global options + + # + # Argument parsing + # + parser = argparse.ArgumentParser() + + parser.add_argument("--buffer-page-count", + help="Number of BPF ring buffer pages, default 1024", + type=int, default=1024, metavar="NUMBER") + parser.add_argument("-D", "--debug", + help="Enable eBPF debugging", + type=int, const=0x3f, default=0, nargs='?') + parser.add_argument('-d', '--packet-decode', + help='Display packet content in selected mode, ' + 'default none', + choices=['none', 'hex', 'decode'], default='none') + parser.add_argument("-f", "--flow-key-size", + help="Set maximum flow key size to capture, " + "default 64", type=buffer_size_type, default=64, + metavar="[64-2048]") + parser.add_argument('-k', '--flow-key-decode', + help='Display flow-key content in selected mode, ' + 'default none', + choices=['none', 'hex', 'nlraw'], default='none') + parser.add_argument("-p", "--pid", metavar="VSWITCHD_PID", + help="ovs-vswitch's PID", + type=int, default=None) + parser.add_argument("-s", "--packet-size", + help="Set maximum packet size to capture, " + "default 64", type=buffer_size_type, default=64, + metavar="[64-2048]") + parser.add_argument("-w", "--pcap", metavar="PCAP_FILE", + help="Write upcall packets to specified pcap file.", + type=str, default=None) + + options = parser.parse_args() + + # + # Find the PID of the ovs-vswitchd daemon if not specified. + # + if options.pid is None: + for proc in psutil.process_iter(): + if 'ovs-vswitchd' in proc.name(): + if options.pid is not None: + print("ERROR: Multiple ovs-vswitchd daemons running, " + "use the -p option!") + sys.exit(-1) + + options.pid = proc.pid + + # + # Error checking on input parameters + # + if options.pid is None: + print("ERROR: Failed to find ovs-vswitchd's PID!") + sys.exit(-1) + + if options.pcap is not None: + if exists(options.pcap): + print("ERROR: Destination capture file \"{}\" already exists!". + format(options.pcap)) + sys.exit(-1) + + options.buffer_page_count = next_power_of_two(options.buffer_page_count) + + # + # Attach the usdt probe + # + u = USDT(pid=int(options.pid)) + try: + u.enable_probe(probe="recv_upcall", fn_name="do_trace") + except USDTException as e: + print("ERROR: {}" + "ovs-vswitchd!".format( + (re.sub('^', ' ' * 7, str(e), flags=re.MULTILINE)).strip(). + replace("--with-dtrace or --enable-dtrace", + "--enable-usdt-probes"))) + sys.exit(-1) + + # + # Uncomment to see how arguments are decoded. + # print(u.get_text()) + # + + # + # Attach probe to running process + # + source = ebpf_source.replace("", str(options.packet_size)) + source = source.replace("", str(options.flow_key_size)) + source = source.replace("", + str(options.buffer_page_count)) + + b = BPF(text=source, usdt_contexts=[u], debug=options.debug) + + # + # Print header + # + print("{:<18} {:<4} {:<16} {:<10} {:<32} {:<4} {:<10} {:<10}".format( + "TIME", "CPU", "COMM", "PID", "DPIF_NAME", "TYPE", "PKT_LEN", + "FLOW_KEY_LEN")) + + # + # Dump out all events + # + b['events'].open_ring_buffer(print_event) + while 1: + try: + b.ring_buffer_poll() + time.sleep(0.5) + except KeyboardInterrupt: + break + + dropcnt = b.get_table("dropcnt") + for k in dropcnt.keys(): + count = dropcnt.sum(k).value + if k.value == 0 and count > 0: + print("\nWARNING: Not all upcalls were captured, {} were dropped!" + "\n Increase the BPF ring buffer size with the " + "--buffer-page-count option.".format(count)) + + +# +# Start main() as the default entry point... +# +if __name__ == '__main__': + main() From patchwork Wed Dec 22 09:19:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Eelco Chaudron X-Patchwork-Id: 1572102 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: bilbo.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=redhat.com header.i=@redhat.com header.a=rsa-sha256 header.s=mimecast20190719 header.b=GeHjdYgH; dkim-atps=neutral Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=openvswitch.org (client-ip=2605:bc80:3010::136; helo=smtp3.osuosl.org; envelope-from=ovs-dev-bounces@openvswitch.org; receiver=) Received: from smtp3.osuosl.org (smtp3.osuosl.org [IPv6:2605:bc80:3010::136]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by bilbo.ozlabs.org (Postfix) with ESMTPS id 4JJnpG2XM7z9s1l for ; Wed, 22 Dec 2021 20:19:34 +1100 (AEDT) Received: from localhost (localhost [127.0.0.1]) by smtp3.osuosl.org (Postfix) with ESMTP id E90EF60F9A; Wed, 22 Dec 2021 09:19:31 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Received: from smtp3.osuosl.org ([127.0.0.1]) by localhost (smtp3.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id IFgAABXxfxYA; Wed, 22 Dec 2021 09:19:29 +0000 (UTC) Received: from lists.linuxfoundation.org (lf-lists.osuosl.org [IPv6:2605:bc80:3010:104::8cd3:938]) by smtp3.osuosl.org (Postfix) with ESMTPS id 6F07360593; Wed, 22 Dec 2021 09:19:28 +0000 (UTC) Received: from lf-lists.osuosl.org (localhost [127.0.0.1]) by lists.linuxfoundation.org (Postfix) with ESMTP id 45AC2C001E; Wed, 22 Dec 2021 09:19:28 +0000 (UTC) X-Original-To: dev@openvswitch.org Delivered-To: ovs-dev@lists.linuxfoundation.org Received: from smtp1.osuosl.org (smtp1.osuosl.org [140.211.166.138]) by lists.linuxfoundation.org (Postfix) with ESMTP id 36DFEC0012 for ; Wed, 22 Dec 2021 09:19:27 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by smtp1.osuosl.org (Postfix) with ESMTP id 0D7DE83077 for ; Wed, 22 Dec 2021 09:19:27 +0000 (UTC) X-Virus-Scanned: amavisd-new at osuosl.org Authentication-Results: smtp1.osuosl.org (amavisd-new); dkim=pass (1024-bit key) header.d=redhat.com Received: from smtp1.osuosl.org ([127.0.0.1]) by localhost (smtp1.osuosl.org [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id oRqdhnmEPkEo for ; Wed, 22 Dec 2021 09:19:24 +0000 (UTC) X-Greylist: domain auto-whitelisted by SQLgrey-1.8.0 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by smtp1.osuosl.org (Postfix) with ESMTPS id 5FD6C812BF for ; Wed, 22 Dec 2021 09:19:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1640164763; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CQIi0c1o4/hfwTFI58ZMGP5279Lzy6kogNMX8IVfdRE=; b=GeHjdYgH5+xXABaxeSPM2WTETbkOxuFklS3pWlPne0AWLAWkAKABaJ4Ae8usykIu12Gkhu AOIbohTeA4g2TosyRQJtX12iMatxj1QbagfQXeO8oq6qAX+AYh6UbPEFQIc5cy9pchwOml 2XUYDBiWaGeNDrQpVpKLBVSRB1DX4CU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-614-9nA8RduhPgGF3xJv_aVqPQ-1; Wed, 22 Dec 2021 04:19:20 -0500 X-MC-Unique: 9nA8RduhPgGF3xJv_aVqPQ-1 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id D508F102CB73 for ; Wed, 22 Dec 2021 09:19:19 +0000 (UTC) Received: from localhost.localdomain (unknown [10.39.194.186]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9AAF16AB8B; Wed, 22 Dec 2021 09:19:18 +0000 (UTC) From: Eelco Chaudron To: dev@openvswitch.org Date: Wed, 22 Dec 2021 10:19:02 +0100 Message-Id: <164016472025.146460.11968502759020410592.stgit@ebuild> In-Reply-To: <164016453900.146460.16964767898519268258.stgit@ebuild> References: <164016453900.146460.16964767898519268258.stgit@ebuild> User-Agent: StGit/1.1 MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=echaudro@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Subject: [ovs-dev] [PATCH v2 5/5] utilities: add netlink flow operation USDT probes and upcall_cost script X-BeenThere: ovs-dev@openvswitch.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ovs-dev-bounces@openvswitch.org Sender: "dev" This patch adds a series of NetLink flow operation USDT probes. These probes are in turn used in the upcall_cost Python script, which in addition of some kernel tracepoints, give an insight into the time spent on processing upcall. Signed-off-by: Eelco Chaudron Acked-by: Paolo Valerio --- v2: - Fixed path naming - Fixed some typos - Updated script header Documentation/topics/usdt-probes.rst | 86 ++ lib/dpif-netlink.c | 16 utilities/automake.mk | 4 utilities/usdt_scripts/upcall_cost.py | 1787 +++++++++++++++++++++++++++++++++ 4 files changed, 1892 insertions(+), 1 deletion(-) create mode 100755 utilities/usdt_scripts/upcall_cost.py diff --git a/Documentation/topics/usdt-probes.rst b/Documentation/topics/usdt-probes.rst index bf43bfcd0..a053651ff 100644 --- a/Documentation/topics/usdt-probes.rst +++ b/Documentation/topics/usdt-probes.rst @@ -200,11 +200,96 @@ used naming convention. Available probes in ``ovs_vswitchd``: +- dpif_netlink_operate\_\_:op_flow_del +- dpif_netlink_operate\_\_:op_flow_execute +- dpif_netlink_operate\_\_:op_flow_get +- dpif_netlink_operate\_\_:op_flow_put - dpif_recv:recv_upcall - main:poll_block - main:run_start +dpif_netlink_operate\_\_:op_flow_del +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Description**: + +This probe gets triggered when the Netlink datapath is about to execute the +DPIF_OP_FLOW_DEL operation as part of the dpif ``operate()`` callback. + +**Arguments**: + +- *arg0*: ``(struct dpif_netlink *) dpif`` +- *arg1*: ``(struct dpif_flow_del *) del`` +- *arg2*: ``(struct dpif_netlink_flow *) flow`` +- *arg3*: ``(struct ofpbuf *) aux->request`` + +**Script references**: + +- *None* + + +dpif_netlink_operate\_\_:op_flow_execute +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Description**: + +This probe gets triggered when the Netlink datapath is about to execute the +DPIF_OP_FLOW_EXECUTE operation as part of the dpif ``operate()`` callback. + +**Arguments**: + +- *arg0*: ``(struct dpif_netlink *) dpif`` +- *arg1*: ``(struct dpif_execute *) op->execute`` +- *arg2*: ``dp_packet_data(op->execute.packet)`` +- *arg3*: ``dp_packet_size(op->execute.packet)`` +- *arg4*: ``(struct ofpbuf *) aux->request`` + +**Script references**: + +- ``utilities/usdt_scripts/upcall_cost.py`` + + +dpif_netlink_operate\_\_:op_flow_get +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Description**: + +This probe gets triggered when the Netlink datapath is about to execute the +DPIF_OP_FLOW_GET operation as part of the dpif ``operate()`` callback. + +**Arguments**: + +- *arg0*: ``(struct dpif_netlink *) dpif`` +- *arg1*: ``(struct dpif_flow_get *) get`` +- *arg2*: ``(struct dpif_netlink_flow *) flow`` +- *arg3*: ``(struct ofpbuf *) aux->request`` + +**Script references**: + +- *None* + + +dpif_netlink_operate\_\_:op_flow_put +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +**Description**: + +This probe gets triggered when the Netlink datapath is about to execute the +DPIF_OP_FLOW_PUT operation as part of the dpif ``operate()`` callback. + +**Arguments**: + +- *arg0*: ``(struct dpif_netlink *) dpif`` +- *arg1*: ``(struct dpif_flow_put *) put`` +- *arg2*: ``(struct dpif_netlink_flow *) flow`` +- *arg3*: ``(struct ofpbuf *) aux->request`` + +**Script references**: + +- ``utilities/usdt_scripts/upcall_cost.py`` + + probe dpif_recv:recv_upcall ~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -227,6 +312,7 @@ sent to ``ovs-vswitchd``. **Script references**: +- ``utilities/usdt_scripts/upcall_cost.py`` - ``utilities/usdt_scripts/upcall_monitor.py`` diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c index 424a28401..52c229568 100644 --- a/lib/dpif-netlink.c +++ b/lib/dpif-netlink.c @@ -56,6 +56,7 @@ #include "openvswitch/poll-loop.h" #include "openvswitch/shash.h" #include "openvswitch/thread.h" +#include "openvswitch/usdt_probes.h" #include "openvswitch/vlog.h" #include "packets.h" #include "random.h" @@ -2052,6 +2053,9 @@ dpif_netlink_operate__(struct dpif_netlink *dpif, aux->txn.reply = &aux->reply; } dpif_netlink_flow_to_ofpbuf(&flow, &aux->request); + + OVS_USDT_PROBE(dpif_netlink_operate__, op_flow_put, + dpif, put, &flow, &aux->request); break; case DPIF_OP_FLOW_DEL: @@ -2062,6 +2066,9 @@ dpif_netlink_operate__(struct dpif_netlink *dpif, aux->txn.reply = &aux->reply; } dpif_netlink_flow_to_ofpbuf(&flow, &aux->request); + + OVS_USDT_PROBE(dpif_netlink_operate__, op_flow_del, + dpif, del, &flow, &aux->request); break; case DPIF_OP_EXECUTE: @@ -2082,6 +2089,12 @@ dpif_netlink_operate__(struct dpif_netlink *dpif, } else { dpif_netlink_encode_execute(dpif->dp_ifindex, &op->execute, &aux->request); + + OVS_USDT_PROBE(dpif_netlink_operate__, op_flow_execute, + dpif, &op->execute, + dp_packet_data(op->execute.packet), + dp_packet_size(op->execute.packet), + &aux->request); } break; @@ -2090,6 +2103,9 @@ dpif_netlink_operate__(struct dpif_netlink *dpif, dpif_netlink_init_flow_get(dpif, get, &flow); aux->txn.reply = get->buffer; dpif_netlink_flow_to_ofpbuf(&flow, &aux->request); + + OVS_USDT_PROBE(dpif_netlink_operate__, op_flow_get, + dpif, get, &flow, &aux->request); break; default: diff --git a/utilities/automake.mk b/utilities/automake.mk index 382f8e789..7d0bddcd8 100644 --- a/utilities/automake.mk +++ b/utilities/automake.mk @@ -63,6 +63,7 @@ EXTRA_DIST += \ utilities/docker/debian/Dockerfile \ utilities/docker/debian/build-kernel-modules.sh \ utilities/usdt_scripts/bridge_loop.bt \ + utilities/usdt_scripts/upcall_cost.py \ utilities/usdt_scripts/upcall_monitor.py MAN_ROOTS += \ utilities/ovs-testcontroller.8.in \ @@ -131,6 +132,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \ utilities/ovs-check-dead-ifs.in \ utilities/ovs-tcpdump.in \ utilities/ovs-pipegen.py \ - utilities/usdt_scripts/upcall_monitor.py + utilities/usdt_scripts/upcall_monitor.py \ + utilities/usdt_scripts/upcall_cost.py include utilities/bugtool/automake.mk diff --git a/utilities/usdt_scripts/upcall_cost.py b/utilities/usdt_scripts/upcall_cost.py new file mode 100755 index 000000000..765669585 --- /dev/null +++ b/utilities/usdt_scripts/upcall_cost.py @@ -0,0 +1,1787 @@ +#!/usr/bin/env python3 +# +# Copyright (c) 2021 Red Hat, Inc. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at: +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Script information: +# ------------------- +# upcall_cost.py uses various user space and kernel space probes to determine +# the costs (in time) for handling the first packet in user space. It +# calculates the following costs: +# +# - Time it takes from the kernel sending the upcall till it's received by the +# ovs-vswitchd process. +# - Time it takes from ovs-vswitchd sending the execute actions command till +# the kernel receives it. +# - The total time it takes from the kernel to sent the upcall until it +# receives the packet execute command. +# - The total time of the above, minus the time it takes for the actual lookup. +# +# In addition, it will also report the number of packets batched, as OVS will +# first try to read UPCALL_MAX_BATCH(64) packets from kernel space and then +# does the flow lookups and execution. So the smaller the batch size, the more +# realistic are the cost estimates. +# +# The script does not need any options to attach to a running instance of +# ovs-vswitchd. However, it's recommended always run the script with the +# --write-events option. This way, if something does go wrong, the collected +# data is saved. Use the --help option to see all the available options. +# +# Note: In addition to the bcc tools for your specific setup, you need the +# following Python packages: +# pip install alive-progress halo psutil scapy strenum text_histogram3 +# + +try: + from bcc import BPF, USDT, USDTException +except ModuleNotFoundError: + print("WARNING: Can't find the BPF Compiler Collection (BCC) tools!") + print(" This is NOT problem if you analyzing previously collected" + " data.\n") +from alive_progress import alive_bar +from collections import namedtuple +from halo import Halo +from scapy.all import TCP, UDP +from scapy.layers.l2 import Ether +from strenum import StrEnum +from text_histogram3 import histogram +from time import process_time + +import argparse +import ast +import psutil +import re +import struct +import subprocess +import sys +import time + +# +# Global definitions +# +DP_TUNNEL_PORT = -1 + + +# +# Actual eBPF source code +# +ebpf_source = """ +#include +#include +#include + +#define MAX_PACKET +#define MAX_KEY + +enum { + EVENT_RECV_UPCALL = 0, + EVENT_DP_UPCALL, + EVENT_OP_FLOW_PUT, + EVENT_OP_FLOW_EXECUTE, + EVENT_OVS_PKT_EXEC, + _EVENT_MAX_EVENT +}; + +#define barrier_var(var) asm volatile("" : "=r"(var) : "0"(var)) + +struct event_t { + u32 event; + u32 cpu; + u32 pid; + u32 upcall_type; + u64 ts; + u32 pkt_frag_size; + u32 pkt_size; + u64 key_size; + char comm[TASK_COMM_LEN]; + char dpif_name[32]; + char dev_name[16]; + unsigned char pkt[MAX_PACKET]; + unsigned char key[MAX_KEY]; +}; + +BPF_RINGBUF_OUTPUT(events, ); +BPF_TABLE("percpu_array", uint32_t, uint64_t, dropcnt, _EVENT_MAX_EVENT); + +static struct event_t *init_event(u32 type) +{ + struct event_t *event = events.ringbuf_reserve(sizeof(struct event_t)); + + if (!event) { + uint64_t *value = dropcnt.lookup(&type); + if (value) + __sync_fetch_and_add(value, 1); + + return NULL; + } + + event->event = type; + event->ts = bpf_ktime_get_ns(); + event->cpu = bpf_get_smp_processor_id(); + event->pid = bpf_get_current_pid_tgid(); + bpf_get_current_comm(&event->comm, sizeof(event->comm)); + + return event; +} + +int trace__recv_upcall(struct pt_regs *ctx) { + uint32_t upcall_type; + uint64_t addr; + uint64_t size; + + bpf_usdt_readarg(2, ctx, &upcall_type); + if (upcall_type != 0) + return 0; + + struct event_t *event = init_event(EVENT_RECV_UPCALL); + if (!event) + return 1; + + bpf_usdt_readarg(1, ctx, &addr); + bpf_probe_read_str(&event->dpif_name, sizeof(event->dpif_name), + (void *)addr); + + event->upcall_type = upcall_type; + bpf_usdt_readarg(4, ctx, &event->pkt_size); + bpf_usdt_readarg(6, ctx, &event->key_size); + + if (event->pkt_size > MAX_PACKET) + size = MAX_PACKET; + else + size = event->pkt_size; + bpf_usdt_readarg(3, ctx, &addr); + bpf_probe_read(&event->pkt, size, (void *)addr); + + if (event->key_size > MAX_KEY) + size = MAX_KEY; + else + size = event->key_size; + bpf_usdt_readarg(5, ctx, &addr); + bpf_probe_read(&event->key, size, (void *)addr); + + events.ringbuf_submit(event, 0); + return 0; +}; + + +int trace__op_flow_put(struct pt_regs *ctx) { + uint64_t addr; + uint64_t size; + + struct event_t *event = init_event(EVENT_OP_FLOW_PUT); + if (!event) { + return 1; + } + + events.ringbuf_submit(event, 0); + return 0; +}; + + +int trace__op_flow_execute(struct pt_regs *ctx) { + uint64_t addr; + uint64_t size; + + struct event_t *event = init_event(EVENT_OP_FLOW_EXECUTE); + if (!event) { + return 1; + } + + bpf_usdt_readarg(4, ctx, &event->pkt_size); + + if (event->pkt_size > MAX_PACKET) + size = MAX_PACKET; + else + size = event->pkt_size; + bpf_usdt_readarg(3, ctx, &addr); + bpf_probe_read(&event->pkt, size, (void *)addr); + + events.ringbuf_submit(event, 0); + return 0; +}; + + +TRACEPOINT_PROBE(openvswitch, ovs_dp_upcall) { + uint64_t size; + struct sk_buff *skb = args->skbaddr; + + if (args->upcall_cmd != 1 || skb == NULL || skb->data == NULL) + return 0; + + struct event_t *event = init_event(EVENT_DP_UPCALL); + if (!event) { + return 1; + } + + event->upcall_type = args->upcall_cmd; + event->pkt_size = args->len; + + TP_DATA_LOC_READ_CONST(&event->dpif_name, dp_name, + sizeof(event->dpif_name)); + TP_DATA_LOC_READ_CONST(&event->dev_name, dev_name, + sizeof(event->dev_name)); + + if (skb->data_len != 0) { + event->pkt_frag_size = (skb->len - skb->data_len) & 0xfffffff; + size = event->pkt_frag_size; + } else { + event->pkt_frag_size = 0; + size = event->pkt_size; + } + + /* Prevent clang from using register mirroring (or any optimization) on + * the 'size' variable. */ + barrier_var(size); + + if (size > MAX_PACKET) + size = MAX_PACKET; + bpf_probe_read_kernel(event->pkt, size, skb->data); + + events.ringbuf_submit(event, 0); + return 0; +} + +int kprobe__ovs_packet_cmd_execute(struct pt_regs *ctx, struct sk_buff *skb) +{ + uint64_t size; + + if (skb == NULL || skb->data == NULL) + return 0; + + struct event_t *event = init_event(EVENT_OVS_PKT_EXEC); + if (!event) { + return 1; + } + + events.ringbuf_submit(event, 0); + return 0; +} +""" + + +# +# Event types +# +class EventType(StrEnum): + RECV_UPCALL = 'dpif_recv__recv_upcall' + DP_UPCALL = 'openvswitch__dp_upcall' + OP_FLOW_PUT = 'dpif_netlink_operate__op_flow_put' + OP_FLOW_EXECUTE = 'dpif_netlink_operate__op_flow_execute' + OVS_PKT_EXEC = 'ktrace__ovs_packet_cmd_execute' + + def short_name(name, length=22): + if len(name) < length: + return name + + return '..' + name[-(length - 2):] + + def from_trace(trace_event): + if trace_event == 0: + return EventType.RECV_UPCALL + elif trace_event == 1: + return EventType.DP_UPCALL + elif trace_event == 2: + return EventType.OP_FLOW_PUT + elif trace_event == 3: + return EventType.OP_FLOW_EXECUTE + elif trace_event == 4: + return EventType.OVS_PKT_EXEC + + raise ValueError + + +# +# Simple event class +# +class Event(object): + def __init__(self, ts, pid, comm, cpu, event_type): + self.ts = ts + self.pid = pid + self.comm = comm + self.cpu = cpu + self.event_type = event_type + + def __str__(self): + return "[{:<22}] {:<16} {:8} [{:03}] {:18.9f}".format( + EventType.short_name(self.event_type), + self.comm, + self.pid, + self.cpu, + self.ts / 1000000000) + + def __repr__(self): + more = "" + if self.__class__.__name__ != "Event": + more = ", ..." + + return "{}({}, {}, {}, {}, {}{})".format(self.__class__.__name__, + self.ts, self.pid, + self.comm, self.cpu, + self.event_type, more) + + def handle_event(event): + event = Event(event.ts, event.pid, event.comm.decode("utf-8"), + event.cpu, EventType.from_trace(event.event)) + + if not options.quiet: + print(event) + + return event + + def get_event_header_str(): + return "{:<24} {:<16} {:>8} {:<3} {:<18} {}".format( + "EVENT", "COMM", "PID", "CPU", "TIME", + "EVENT DATA[dpif_name/dp_port/pkt_len/pkt_frag_len]") + + +# +# dp_upcall event class +# +class DpUpcall(Event): + def __init__(self, ts, pid, comm, cpu, dpif_name, port, pkt, pkt_len, + pkt_frag_len): + super(DpUpcall, self).__init__(ts, pid, comm, cpu, EventType.DP_UPCALL) + self.dpif_name = dpif_name + self.dp_port = get_dp_mapping(dpif_name, port) + if self.dp_port is None: + # + # As we only identify interfaces at startup, new interfaces could + # have been added, causing the lookup to fail. Just something to + # keep in mind when running this in a dynamic environment. + # + raise LookupError("Can't find datapath port mapping!") + self.pkt = pkt + self.pkt_len = pkt_len + self.pkt_frag_len = pkt_frag_len + + def __str__(self): + return "[{:<22}] {:<16} {:8} [{:03}] {:18.9f}: " \ + "{:<17} {:4} {:4} {:4}".format(self.event_type, + self.comm, + self.pid, + self.cpu, + self.ts / 1000000000, + self.dpif_name, + self.dp_port, + self.pkt_len, + self.pkt_frag_len) + + def handle_event(event): + if event.pkt_size < options.packet_size: + pkt_len = event.pkt_size + else: + pkt_len = options.packet_size + + pkt_data = bytes(event.pkt)[:pkt_len] + + if len(pkt_data) <= 0 or event.pkt_size == 0: + return + + try: + event = DpUpcall(event.ts, event.pid, event.comm.decode("utf-8"), + event.cpu, event.dpif_name.decode("utf-8"), + event.dev_name.decode("utf-8"), + pkt_data, + event.pkt_size, + event.pkt_frag_size) + except LookupError: + # + # If we can't do the port lookup, ignore this event. + # + return None + + if not options.quiet: + print(event) + + return event + + +# +# recv_upcall event class +# +class RecvUpcall(Event): + def __init__(self, ts, pid, comm, cpu, dpif_name, key, pkt, pkt_len): + super(RecvUpcall, self).__init__(ts, pid, comm, cpu, + EventType.RECV_UPCALL) + + if dpif_name.startswith("system@"): + dpif_name = dpif_name[len("system@"):] + self.dpif_name = dpif_name + + nla = RecvUpcall.decode_nlm(key, dump=False) + if "OVS_KEY_ATTR_IN_PORT" in nla: + self.dp_port = struct.unpack('=L', nla["OVS_KEY_ATTR_IN_PORT"])[0] + elif "OVS_KEY_ATTR_TUNNEL" in nla: + self.dp_port = DP_TUNNEL_PORT + else: + self.dp_port = RecvUpcall.get_system_dp_port(self.dpif_name) + + if self.dp_port is None: + raise LookupError("Can't find RecvUpcall dp port mapping!") + + self.pkt = pkt + self.pkt_len = pkt_len + + def __str__(self): + return "[{:<22}] {:<16} {:8} [{:03}] {:18.9f}: {:<17} {:4} {:4}". \ + format( + self.event_type, + self.comm, + self.pid, + self.cpu, + self.ts / 1000000000, + self.dpif_name, + self.dp_port, + self.pkt_len) + + def get_system_dp_port(dpif_name): + dp_map = get_dp_mapping(dpif_name, "ovs-system", return_map=True) + if dpif_name not in dp_map: + return None + try: + return dp_map[dpif_name]["ovs-system"] + except KeyError: + return None + + def decode_nlm(msg, indent=4, dump=True): + bytes_left = len(msg) + result = {} + + while bytes_left: + if bytes_left < 4: + if dump: + print("{}WARN: decode truncated; can't read header".format( + ' ' * indent)) + break + + nla_len, nla_type = struct.unpack("=HH", msg[:4]) + + if nla_len < 4: + if dump: + print("{}WARN: decode truncated; nla_len < 4".format( + ' ' * indent)) + break + + nla_data = msg[4:nla_len] + trunc = "" + + if nla_len > bytes_left: + trunc = "..." + nla_data = nla_data[:(bytes_left - 4)] + if RecvUpcall.get_ovs_key_attr_str(nla_type) == \ + "OVS_KEY_ATTR_TUNNEL": + # + # If we have truncated tunnel information, we still would + # like to know. This is due to the special tunnel handling + # needed for port matching. + # + result[RecvUpcall.get_ovs_key_attr_str(nla_type)] = bytes() + else: + result[RecvUpcall.get_ovs_key_attr_str(nla_type)] = nla_data + + if dump: + print("{}nla_len {}, nla_type {}[{}], data: {}{}".format( + ' ' * indent, nla_len, + RecvUpcall.get_ovs_key_attr_str(nla_type), + nla_type, + "".join("{:02x} ".format(b) for b in nla_data), trunc)) + + if trunc != "": + if dump: + print("{}WARN: decode truncated; nla_len > msg_len[{}] ". + format(' ' * indent, bytes_left)) + break + + # Update next offset, but make sure it's aligned correctly. + next_offset = (nla_len + 3) & ~(3) + msg = msg[next_offset:] + bytes_left -= next_offset + + return result + + def get_ovs_key_attr_str(attr): + ovs_key_attr = ["OVS_KEY_ATTR_UNSPEC", + "OVS_KEY_ATTR_ENCAP", + "OVS_KEY_ATTR_PRIORITY", + "OVS_KEY_ATTR_IN_PORT", + "OVS_KEY_ATTR_ETHERNET", + "OVS_KEY_ATTR_VLAN", + "OVS_KEY_ATTR_ETHERTYPE", + "OVS_KEY_ATTR_IPV4", + "OVS_KEY_ATTR_IPV6", + "OVS_KEY_ATTR_TCP", + "OVS_KEY_ATTR_UDP", + "OVS_KEY_ATTR_ICMP", + "OVS_KEY_ATTR_ICMPV6", + "OVS_KEY_ATTR_ARP", + "OVS_KEY_ATTR_ND", + "OVS_KEY_ATTR_SKB_MARK", + "OVS_KEY_ATTR_TUNNEL", + "OVS_KEY_ATTR_SCTP", + "OVS_KEY_ATTR_TCP_FLAGS", + "OVS_KEY_ATTR_DP_HASH", + "OVS_KEY_ATTR_RECIRC_ID", + "OVS_KEY_ATTR_MPLS", + "OVS_KEY_ATTR_CT_STATE", + "OVS_KEY_ATTR_CT_ZONE", + "OVS_KEY_ATTR_CT_MARK", + "OVS_KEY_ATTR_CT_LABELS", + "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4", + "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6", + "OVS_KEY_ATTR_NSH"] + + if attr < 0 or attr > len(ovs_key_attr): + return "" + + return ovs_key_attr[attr] + + def handle_event(event): + # + # For us, only upcalls with a packet, flow_key, and upcall_type + # DPIF_UC_MISS are of interest. + # + if event.pkt_size <= 0 or event.key_size <= 0 or \ + event.upcall_type != 0: + return + + if event.key_size < options.flow_key_size: + key_len = event.key_size + else: + key_len = options.flow_key_size + + if event.pkt_size < options.packet_size: + pkt_len = event.pkt_size + else: + pkt_len = options.packet_size + + try: + event = RecvUpcall(event.ts, event.pid, event.comm.decode("utf-8"), + event.cpu, event.dpif_name.decode("utf-8"), + bytes(event.key)[:key_len], + bytes(event.pkt)[:pkt_len], + event.pkt_size) + except LookupError: + return None + + if not options.quiet: + print(event) + + return event + + +# +# op_flow_execute event class +# +class OpFlowExecute(Event): + def __init__(self, ts, pid, comm, cpu, pkt, pkt_len): + super(OpFlowExecute, self).__init__(ts, pid, comm, cpu, + EventType.OP_FLOW_EXECUTE) + self.pkt = pkt + self.pkt_len = pkt_len + + def __str__(self): + return "[{:<22}] {:<16} {:8} [{:03}] {:18.9f}: " \ + "{:<17} {:4} {:4}".format(EventType.short_name(self.event_type), + self.comm, + self.pid, + self.cpu, + self.ts / 1000000000, + "", + "", + self.pkt_len) + + def handle_event(event): + if event.pkt_size < options.packet_size: + pkt_len = event.pkt_size + else: + pkt_len = options.packet_size + + pkt_data = bytes(event.pkt)[:pkt_len] + + if len(pkt_data) <= 0 or event.pkt_size == 0: + return + + event = OpFlowExecute(event.ts, event.pid, event.comm.decode("utf-8"), + event.cpu, pkt_data, event.pkt_size) + + if not options.quiet: + print(event) + + return event + + +# +# get_dp_mapping() +# +def get_dp_mapping(dp, port, return_map=False, dp_map=None): + if options.unit_test: + return port + + if dp_map is not None: + get_dp_mapping.dp_port_map_cache = dp_map + + # + # Build a cache, so we do not have to execue the ovs command each time. + # + if not hasattr(get_dp_mapping, "dp_port_map_cache"): + try: + output = subprocess.check_output(['ovs-appctl', 'dpctl/show'], + encoding='utf8').split("\n") + except subprocess.CalledProcessError: + output = "" + pass + + current_dp = None + get_dp_mapping.dp_port_map_cache = {} + + for line in output: + match = re.match("^system@(.*):$", line) + if match is not None: + current_dp = match.group(1) + + match = re.match("^ port ([0-9]+): ([^ /]*)", line) + if match is not None and current_dp is not None: + try: + get_dp_mapping.dp_port_map_cache[ + current_dp][match.group(2)] = int(match.group(1)) + except KeyError: + get_dp_mapping.dp_port_map_cache[current_dp] = \ + {match.group(2): int(match.group(1))} + + if return_map: + return get_dp_mapping.dp_port_map_cache + + if dp not in get_dp_mapping.dp_port_map_cache or \ + port not in get_dp_mapping.dp_port_map_cache[dp]: + return None + + return get_dp_mapping.dp_port_map_cache[dp][port] + + +# +# event_to_dict() +# +def event_to_dict(event): + event_dict = {} + + for field, _ in event._fields_: + if isinstance(getattr(event, field), (int, bytes)): + event_dict[field] = getattr(event, field) + else: + if (field == "key" and event.key_size == 0) or \ + (field == "pkt" and event.pkt_size == 0): + data = bytes() + else: + data = bytes(getattr(event, field)) + + event_dict[field] = data + + return event_dict + + +# +# receive_event_bcc() +# +def receive_event_bcc(ctx, data, size): + global events_received + events_received += 1 + + event = b['events'].event(data) + + if export_file is not None: + export_file.write("event = {}\n".format(event_to_dict(event))) + + receive_event(event) + + +# +# receive_event() +# +def receive_event(event): + global event_count + + if event.event == 0: + trace_event = RecvUpcall.handle_event(event) + elif event.event == 1: + trace_event = DpUpcall.handle_event(event) + elif event.event == 2: + trace_event = Event.handle_event(event) + elif event.event == 3: + trace_event = OpFlowExecute.handle_event(event) + elif event.event == 4: + trace_event = Event.handle_event(event) + + try: + event_count['total'][EventType.from_trace(event.event)] += 1 + except KeyError: + event_count['total'][EventType.from_trace(event.event)] = 1 + event_count['valid'][EventType.from_trace(event.event)] = 0 + + if trace_event is not None: + event_count['valid'][EventType.from_trace(event.event)] += 1 + trace_data.append(trace_event) + + +# +# collect_event_sets() +# +def collect_event_sets(events, collect_stats=False, profile=False, + spinner=False): + t1_time = 0 + + def t1_start(): + nonlocal t1_time + t1_time = process_time() + + def t1_stop(description): + print("* PROFILING: {:<50}: {:.06f} seconds".format( + description, process_time() - t1_time)) + + warn_parcial_match = False + warn_frag = False + + if profile: + t1_start() + # + # First let's create a dict of per handler thread events. + # + threads = {} + threads_result = {} + for idx, event in enumerate(events): + if event.event_type == EventType.DP_UPCALL: + continue + if event.pid not in threads: + threads[event.pid] = [] + threads[event.pid].append([idx, event]) + + if profile: + t1_stop("Creating per thread dictionary") + t1_start() + # + # Now spit them in per upcall sets, but remember that + # RecvUpcall event can be batched. + # + batch_stats = [] + for thread, items in threads.items(): + thread_set = [] + batch = [] + ovs_pkt_exec_set = [] + batching = True + collecting = 0 + has_flow_put = False + has_flow_exec = False + + def next_batch(): + nonlocal batching, batch, collecting, has_flow_put, has_flow_exec + nonlocal ovs_pkt_exec_set, thread_set + + if len(batch) > 0: + # + # If we are done with the batch, see if we need to match up + # any batched OVS_PKT_EXEC events. + # + for event in batch: + if len(ovs_pkt_exec_set) <= 0: + break + if any(isinstance(item, + OpFlowExecute) for item in event[2]): + event[2].append(ovs_pkt_exec_set.pop(0)) + # + # Append the batch to the thread-specific set. + # + thread_set = thread_set + batch + if collect_stats: + batch_stats.append(len(batch)) + batching = True + batch = [] + ovs_pkt_exec_set = [] + has_flow_put = False + has_flow_exec = False + collecting = 0 + + def next_batch_set(): + nonlocal has_flow_put, has_flow_exec, collecting + has_flow_put = False + has_flow_exec = False + collecting += 1 + + for item in items: + idx, event = item + + if batching: + if event.event_type == EventType.RECV_UPCALL: + batch.append(item + [[]]) + elif len(batch) > 0: + batching = False + collecting = 0 + else: + continue + + if not batching: + if event.event_type == EventType.RECV_UPCALL: + next_batch() + batch.append(item + [[]]) + else: + if event.event_type == EventType.OP_FLOW_PUT: + if has_flow_put: + next_batch_set() + if collecting >= len(batch): + next_batch() + continue + + batch[collecting][2].append(item[1]) + has_flow_put = True + + elif event.event_type == EventType.OP_FLOW_EXECUTE: + if has_flow_exec: + next_batch_set() + if collecting >= len(batch): + next_batch() + continue + + if (event.pkt_len == batch[collecting][1].pkt_len + and event.pkt == batch[collecting][1].pkt): + batch[collecting][2].append(item[1]) + has_flow_put = True + has_flow_exec = True + else: + # + # If we end up here it could be that an upcall in a + # batch did not generate an EXECUTE and we are out + # of sync. Try to match it to the next batch entry. + # + next_idx = collecting + 1 + while True: + if next_idx >= len(batch): + next_batch() + break + if (event.pkt_len == batch[next_idx][1].pkt_len + and event.pkt == batch[next_idx][1].pkt): + + batch[next_idx][2] = batch[collecting][2] + batch[collecting][2] = [] + collecting = next_idx + batch[collecting][2].append(item[1]) + has_flow_put = True + has_flow_exec = True + break + + next_idx += 1 + + elif event.event_type == EventType.OVS_PKT_EXEC: + # + # The OVS_PKT_EXEC might also be batched, so we keep + # them in a separate list and assign them to the + # correct set when completing the set. + # + ovs_pkt_exec_set.append(item[1]) + continue + + if collecting >= len(batch): + next_batch() + next_batch() + threads_result[thread] = thread_set + + if profile: + t1_stop("Creating upcall sets") + t1_start() + + # + # Move thread results from list to dictionary + # + thread_stats = {} + for thread, sets in threads_result.items(): + if len(sets) > 0: + thread_stats[sets[0][1].comm] = len(sets) + + threads_result[thread] = {} + for upcall in sets: + threads_result[thread][upcall[0]] = [upcall[1]] + upcall[2] + + if profile: + t1_stop("Moving upcall list to dictionary") + t1_start() + + if options.debug & 0x4000000 != 0: + print() + for thread, sets in threads_result.items(): + for idx, idx_set in sets.items(): + print("DBG: {}".format(idx_set)) + + # + # Create two lists on with DP_UPCALLs and RECV_UPCALLs + # + dp_upcall_list = [] + recv_upcall_list = [] + for idx, event in enumerate(events): + if event.event_type == EventType.DP_UPCALL: + dp_upcall_list.append([idx, event]) + elif event.event_type == EventType.RECV_UPCALL: + recv_upcall_list.append([idx, event]) + + if profile: + t1_stop("Creating DP_UPCALL and RECV_UPCALL lists") + t1_start() + + if options.debug & 0x4000000 != 0: + print() + for dp_upcall in dp_upcall_list: + print("DBG: {}".format(dp_upcall)) + print() + for recv_upcall in recv_upcall_list: + print("DBG: {}".format(recv_upcall)) + + # + # Now find the matching DP_UPCALL and RECV_UPCALL events + # + event_sets = [] + if spinner: + print() + with alive_bar(len(dp_upcall_list), + title="- Matching DP_UPCALLs to RECV_UPCALLs", + spinner=None, disable=not spinner) as bar: + + for (idx, event) in dp_upcall_list: + remove_indexes = [] + this_set = None + # + # TODO: This part needs some optimization, as it's slow in the + # PVP test scenario. This is because a lot of DP_UPCALLS + # will not have a matching RECV_UPCALL leading to walking + # the entire recv_upcall_list list. + # + # Probably some dictionary, but in the PVP scenario packets + # come from a limited set of ports, and the length is all the + # same. So we do need the key to be recv.dport + + # len(recv.pkt) + recv.pkt, however, the recv.pkt compare + # needs to happen on min(len(event.pkt), len(recv.pkt)). + # + for idx_in_list, (idx_recv, recv) in enumerate(recv_upcall_list): + match = False + + if idx_recv < idx: + remove_indexes.append(idx_in_list) + continue + # + # If the RecvUpcall is a tunnel port, we can not map it to + # the correct tunnel. For now, we assume the first matching + # packet is the correct one. For more details see the OVS + # ukey_to_flow_netdev() function. + # + if (event.dp_port == recv.dp_port or + recv.dp_port == DP_TUNNEL_PORT) \ + and event.pkt_len == recv.pkt_len: + + compare_len = min(len(event.pkt), len(recv.pkt)) + + if len(event.pkt) != len(recv.pkt) \ + and event.pkt_frag_len == 0: + warn_parcial_match = True + elif event.pkt_frag_len != 0: + warn_frag = True + compare_len = min(compare_len, event.pkt_frag_len) + + if event.pkt[:compare_len] == recv.pkt[:compare_len]: + match = True + else: + # + # There are still some corner cases due to the fact + # the kernel dp_upcall tracepoint is hit before the + # packet is prepared/modified for upcall pass on. + # Example cases are packet checksum update, VLAN + # insertion, etc., etc. For now, we try to handle the + # checksum part, but we might need to add more + # exceptions in the future. + # + diff_bytes = sum(i != j for i, j in zip( + event.pkt[:compare_len], recv.pkt[:compare_len])) + + if diff_bytes <= 2 and compare_len > 56: + # This could be a TCP or UDP checksum + event_pkt = Ether(bytes(event.pkt)[:compare_len]) + recv_pkt = Ether(bytes(recv.pkt)[:compare_len]) + if (event_pkt.haslayer(TCP) and + recv_pkt.haslayer(TCP)) or ( + event_pkt.haslayer(UDP) and + recv_pkt.haslayer(UDP)): + + if event_pkt.haslayer(TCP): + event_chksum = event_pkt[TCP].chksum + recv_chksum = recv_pkt[TCP].chksum + else: + event_chksum = event_pkt[UDP].chksum + recv_chksum = recv_pkt[UDP].chksum + + if event_chksum & 0xff != recv_chksum & 0xff: + diff_bytes -= 1 + if event_chksum & 0xff00 != \ + recv_chksum & 0xff00: + diff_bytes -= 1 + + if diff_bytes == 0: + match = True + + if match: + this_set = {event.event_type: event} + for sevent in threads_result[recv.pid][idx_recv]: + this_set[sevent.event_type] = sevent + event_sets.append(this_set) + remove_indexes.append(idx_in_list) + + if options.debug & 0x4000000 != 0: + print("DBG: Matched DpUpcall({:6}) => " + "RecvUpcall({:6})".format(idx, idx_recv)) + + break + + elif options.debug & 0x8000000 != 0: + print("DBG: COMPARE DpUpcall({:6}) != " + "RecvUpcall({:6})".format(idx, idx_recv)) + event_pkt = Ether(bytes(event.pkt)[:compare_len]) + recv_pkt = Ether(bytes(recv.pkt)[:compare_len]) + print(re.sub('^', 'DBG:' + ' ' * 4, + event_pkt.show(dump=True), + flags=re.MULTILINE)) + print(re.sub('^', 'DBG:' + ' ' * 4, + recv_pkt.show(dump=True), + flags=re.MULTILINE)) + + elif options.debug & 0x8000000 != 0: + print("DBG: COMPATE DpUpcall({:6}) != " + "RecvUpcall({:6}) -> port {}, {} -> " + "len = {}, {}".format(idx, idx_recv, + event.dp_port, + recv.dp_port, + event.pkt_len, + recv.pkt_len)) + + bar() + for remove_idx in sorted(remove_indexes, reverse=True): + del recv_upcall_list[remove_idx] + + if profile: + t1_stop("Matching DP_UPCALLs to a set") + + if warn_parcial_match: + print("WARNING: Packets not fully captured for matching!\n " + "Increase the packet buffer with the '--packet-size' option.") + if warn_frag: + print("WARNING: SKB from kernel had fragments, we could only copy/" + "compare the first part!") + + if collect_stats: + return event_sets, batch_stats, thread_stats + + return event_sets + + +# +# unit_test() +# +def unit_test(): + pkt1 = b'\x01\x02\x03\x04\x05' + pkt2 = b'\x01\x02\x03\x04\x06' + pkt3 = b'\x01\x02\x03\x04\x07' + key = b'\x08\x00\x03\x00\x01\x00\x00\x00' # Port 1 + # + # Basic test with all events in line + # + t1_events = [DpUpcall(1, 100, "ping", 1, "system", 1, pkt1, len(pkt1), 0), + RecvUpcall(2, 1, "hndl", 1, "systen", key, pkt1, len(pkt1)), + Event(3, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(4, 1, "hndl", 1, pkt1, len(pkt1)), + Event(5, 1, "hndl", 1, EventType.OVS_PKT_EXEC)] + t1_result = [{EventType.DP_UPCALL: t1_events[0], + EventType.RECV_UPCALL: t1_events[1], + EventType.OP_FLOW_PUT: t1_events[2], + EventType.OP_FLOW_EXECUTE: t1_events[3], + EventType.OVS_PKT_EXEC: t1_events[4]}] + # + # Basic test with missing flow put + # + t2_events = [DpUpcall(1, 100, "ping", 1, "system", 1, pkt1, len(pkt1), 0), + RecvUpcall(2, 1, "hndl", 1, "systen", key, pkt1, len(pkt1)), + OpFlowExecute(4, 1, "hndl", 1, pkt1, len(pkt1)), + Event(5, 1, "hndl", 1, EventType.OVS_PKT_EXEC)] + t2_result = [{EventType.DP_UPCALL: t2_events[0], + EventType.RECV_UPCALL: t2_events[1], + EventType.OP_FLOW_EXECUTE: t2_events[2], + EventType.OVS_PKT_EXEC: t2_events[3]}] + # + # Test with RecvUpcall's being batched + # + t3_events = [DpUpcall(1, 101, "ping", 1, "system", 1, pkt1, len(pkt1), 0), + DpUpcall(2, 102, "ping", 2, "system", 1, pkt2, len(pkt2), 0), + DpUpcall(3, 101, "ping", 3, "system", 1, pkt3, len(pkt3), 0), + RecvUpcall(4, 1, "hndl", 1, "systen", key, pkt1, len(pkt1)), + RecvUpcall(5, 1, "hndl", 1, "systen", key, pkt3, len(pkt3)), + RecvUpcall(6, 1, "hndl", 1, "systen", key, pkt2, len(pkt2)), + Event(7, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(8, 1, "hndl", 1, pkt1, len(pkt1)), + Event(9, 1, "hndl", 1, EventType.OVS_PKT_EXEC), + Event(10, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(11, 1, "hndl", 1, pkt3, len(pkt3)), + Event(12, 1, "hndl", 1, EventType.OVS_PKT_EXEC), + Event(13, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(14, 1, "hndl", 1, pkt2, len(pkt2)), + Event(15, 1, "hndl", 1, EventType.OVS_PKT_EXEC)] + t3_result = [{EventType.DP_UPCALL: t3_events[0], + EventType.RECV_UPCALL: t3_events[3], + EventType.OP_FLOW_PUT: t3_events[6], + EventType.OP_FLOW_EXECUTE: t3_events[7], + EventType.OVS_PKT_EXEC: t3_events[8]}, + {EventType.DP_UPCALL: t3_events[1], + EventType.RECV_UPCALL: t3_events[5], + EventType.OP_FLOW_PUT: t3_events[12], + EventType.OP_FLOW_EXECUTE: t3_events[13], + EventType.OVS_PKT_EXEC: t3_events[14]}, + {EventType.DP_UPCALL: t3_events[2], + EventType.RECV_UPCALL: t3_events[4], + EventType.OP_FLOW_PUT: t3_events[9], + EventType.OP_FLOW_EXECUTE: t3_events[10], + EventType.OVS_PKT_EXEC: t3_events[11]}] + # + # Test with RecvUpcall's single + batch + # + t4_events = [DpUpcall(1, 100, "ping", 1, "system", 1, pkt1, len(pkt1), 0), + RecvUpcall(2, 1, "hndl", 1, "systen", key, pkt1, len(pkt1)), + Event(3, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(4, 1, "hndl", 1, pkt1, len(pkt1)), + Event(5, 1, "hndl", 1, EventType.OVS_PKT_EXEC), + DpUpcall(6, 101, "ping", 1, "system", 1, pkt1, len(pkt1), 0), + DpUpcall(7, 102, "ping", 2, "system", 1, pkt2, len(pkt2), 0), + DpUpcall(8, 101, "ping", 3, "system", 1, pkt3, len(pkt3), 0), + RecvUpcall(9, 1, "hndl", 1, "systen", key, pkt1, len(pkt1)), + RecvUpcall(10, 1, "hndl", 1, "systen", key, pkt3, len(pkt3)), + RecvUpcall(11, 1, "hndl", 1, "systen", key, pkt2, len(pkt2)), + Event(12, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(13, 1, "hndl", 1, pkt1, len(pkt1)), + Event(14, 1, "hndl", 1, EventType.OVS_PKT_EXEC), + Event(15, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(16, 1, "hndl", 1, pkt3, len(pkt3)), + Event(17, 1, "hndl", 1, EventType.OVS_PKT_EXEC), + Event(18, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(14, 1, "hndl", 1, pkt2, len(pkt2)), + Event(19, 1, "hndl", 1, EventType.OVS_PKT_EXEC)] + t4_result = [{EventType.DP_UPCALL: t4_events[0], + EventType.RECV_UPCALL: t4_events[1], + EventType.OP_FLOW_PUT: t4_events[2], + EventType.OP_FLOW_EXECUTE: t4_events[3], + EventType.OVS_PKT_EXEC: t4_events[4]}, + {EventType.DP_UPCALL: t4_events[5], + EventType.RECV_UPCALL: t4_events[8], + EventType.OP_FLOW_PUT: t4_events[11], + EventType.OP_FLOW_EXECUTE: t4_events[12], + EventType.OVS_PKT_EXEC: t4_events[13]}, + {EventType.DP_UPCALL: t4_events[6], + EventType.RECV_UPCALL: t4_events[10], + EventType.OP_FLOW_PUT: t4_events[17], + EventType.OP_FLOW_EXECUTE: t4_events[18], + EventType.OVS_PKT_EXEC: t4_events[19]}, + {EventType.DP_UPCALL: t4_events[7], + EventType.RECV_UPCALL: t4_events[9], + EventType.OP_FLOW_PUT: t4_events[14], + EventType.OP_FLOW_EXECUTE: t4_events[15], + EventType.OVS_PKT_EXEC: t4_events[16]}] + # + # Test with two threads interleaved + # + t5_events = [DpUpcall(1, 100, "ping", 1, "system", 1, pkt1, len(pkt1), 0), + DpUpcall(2, 100, "ping", 1, "system", 1, pkt2, len(pkt2), 0), + RecvUpcall(3, 1, "hndl", 1, "systen", key, pkt1, len(pkt1)), + RecvUpcall(4, 2, "hndl", 2, "systen", key, pkt2, len(pkt2)), + Event(5, 2, "hndl", 2, EventType.OP_FLOW_PUT), + Event(6, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(7, 2, "hndl", 1, pkt2, len(pkt2)), + OpFlowExecute(8, 1, "hndl", 1, pkt1, len(pkt1)), + Event(9, 1, "hndl", 1, EventType.OVS_PKT_EXEC), + Event(10, 2, "hndl", 1, EventType.OVS_PKT_EXEC)] + t5_result = [{EventType.DP_UPCALL: t5_events[0], + EventType.RECV_UPCALL: t5_events[2], + EventType.OP_FLOW_PUT: t5_events[5], + EventType.OP_FLOW_EXECUTE: t5_events[7], + EventType.OVS_PKT_EXEC: t5_events[8]}, + {EventType.DP_UPCALL: t5_events[1], + EventType.RECV_UPCALL: t5_events[3], + EventType.OP_FLOW_PUT: t5_events[4], + EventType.OP_FLOW_EXECUTE: t5_events[6], + EventType.OVS_PKT_EXEC: t5_events[9]}] + # + # Test batch with missing events + # + t6_events = [DpUpcall(1, 101, "ping", 1, "system", 1, pkt1, len(pkt1), 0), + DpUpcall(2, 102, "ping", 2, "system", 1, pkt2, len(pkt2), 0), + RecvUpcall(3, 1, "hndl", 1, "systen", key, pkt1, len(pkt1)), + RecvUpcall(4, 1, "hndl", 1, "systen", key, pkt2, len(pkt2)), + Event(5, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(6, 1, "hndl", 1, pkt2, len(pkt2)), + Event(7, 1, "hndl", 1, EventType.OVS_PKT_EXEC)] + t6_result = [{EventType.DP_UPCALL: t6_events[0], + EventType.RECV_UPCALL: t6_events[2]}, + {EventType.DP_UPCALL: t6_events[1], + EventType.RECV_UPCALL: t6_events[3], + EventType.OP_FLOW_PUT: t6_events[4], + EventType.OP_FLOW_EXECUTE: t6_events[5], + EventType.OVS_PKT_EXEC: t6_events[6]}] + # + # Test with RecvUpcall's and OVS_PKT_EXEC being batched + # + t7_events = [DpUpcall(1, 101, "ping", 1, "system", 1, pkt1, len(pkt1), 0), + DpUpcall(2, 102, "ping", 2, "system", 1, pkt2, len(pkt2), 0), + DpUpcall(3, 101, "ping", 3, "system", 1, pkt3, len(pkt3), 0), + RecvUpcall(4, 1, "hndl", 1, "systen", key, pkt1, len(pkt1)), + RecvUpcall(5, 1, "hndl", 1, "systen", key, pkt2, len(pkt2)), + RecvUpcall(6, 1, "hndl", 1, "systen", key, pkt3, len(pkt3)), + Event(7, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(8, 1, "hndl", 1, pkt1, len(pkt1)), + Event(9, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(10, 1, "hndl", 1, pkt2, len(pkt2)), + Event(11, 1, "hndl", 1, EventType.OP_FLOW_PUT), + OpFlowExecute(12, 1, "hndl", 1, pkt3, len(pkt3)), + Event(13, 1, "hndl", 1, EventType.OVS_PKT_EXEC), + Event(14, 1, "hndl", 1, EventType.OVS_PKT_EXEC), + Event(15, 1, "hndl", 1, EventType.OVS_PKT_EXEC)] + t7_result = [{EventType.DP_UPCALL: t7_events[0], + EventType.RECV_UPCALL: t7_events[3], + EventType.OP_FLOW_PUT: t7_events[6], + EventType.OP_FLOW_EXECUTE: t7_events[7], + EventType.OVS_PKT_EXEC: t7_events[12]}, + {EventType.DP_UPCALL: t7_events[1], + EventType.RECV_UPCALL: t7_events[4], + EventType.OP_FLOW_PUT: t7_events[8], + EventType.OP_FLOW_EXECUTE: t7_events[9], + EventType.OVS_PKT_EXEC: t7_events[13]}, + {EventType.DP_UPCALL: t7_events[2], + EventType.RECV_UPCALL: t7_events[5], + EventType.OP_FLOW_PUT: t7_events[10], + EventType.OP_FLOW_EXECUTE: t7_events[11], + EventType.OVS_PKT_EXEC: t7_events[14]}] + # + # Actual test sets + # + test_set = [["Simple single event", t1_events, t1_result], + ["Single event, missing flow_put", t2_events, t2_result], + ["Batched events", t3_events, t3_result], + ["Single + batched events", t4_events, t4_result], + ["Two sets, different threads", t5_events, t5_result], + ["Batch with missing exec", t6_events, t6_result], + ["Batched events including exec", t7_events, t7_result]] + + print("Running some simple unit tests:") + + for test in test_set: + print("- {:<32} ".format(test[0]), end="") + result = collect_event_sets(test[1][:]) + if result == test[2]: + print("PASS") + else: + print("FAIL") + print(" OUTPUT :") + for event_set in result: + hdr = " - " + for event_type, event in event_set.items(): + print("{} {:<16}: {}".format(hdr, event_type.name, event)) + hdr = " " + print(" EXPECTED:") + for event_set in test[2]: + hdr = " - " + for event_type, event in event_set.items(): + print("{} {:<16}: {}".format(hdr, event_type.name, event)) + hdr = " " + + +# +# show_key_value() +# +def show_key_value(data_set, description=None): + if description is not None: + print("\n=> {}:".format(description)) + + for k, v in data_set.items(): + print(" {:36}: {:>10}".format(str(k), str(v))) + + +# +# show_batch_histogram() +# +def show_batch_histogram(data_set, description=None): + nr_of_buckets = 64 + + if description is not None: + print("\n=> {}:".format(description)) + + if len(data_set) == 0: + print("# NumSamples = 0") + return + + min_val = nr_of_buckets + max_val = 0 + entries = 0 + high_buckets = 0 + buckets = [0] * nr_of_buckets + + for entry in data_set: + min_val = min(min_val, entry) + max_val = max(max_val, entry) + if entry == 0: + continue + elif entry > nr_of_buckets: + high_buckets += 1 + else: + buckets[entry - 1] += 1 + + entries += 1 + + if max(buckets + [high_buckets]) > 4: + scale = int(max(buckets + [high_buckets]) / 4) + else: + scale = 1 + + print("# NumSamples = {}; Min = {}; Max = {}".format(entries, min_val, + max_val)) + print("# each ∎ represents a count of {}".format(scale)) + + for idx in range(int(nr_of_buckets / 2)): + idx_2nd = idx + int(nr_of_buckets / 2) + print("{:5} [{:8}]: {:22} {:5} [{:8}]: {:22}".format( + idx + 1, buckets[idx], "∎" * int(buckets[idx] / scale), + idx_2nd + 1, buckets[idx_2nd], + "∎" * int(buckets[idx_2nd] / scale))) + + if high_buckets > 0: + print("{:>5} [{:8}]: {:22}".format(">" + str(nr_of_buckets), + high_buckets, + "∎" * int(high_buckets / scale))) + + +# +# show_histogram() +# +def show_histogram(data_set, description=None, options=None, + minimum=None, maximum=None, buckets=None, custbuckets=None): + if description is not None: + print("\n=> {}:".format(description)) + + if options is not None: + if buckets is None: + buckets = options.histogram_buckets + if options is not None and options.sets: + print(data_set) + + if len(data_set) == 0: + print("# NumSamples = 0") + elif len(data_set) == 1: + print("# NumSamples = 1; Min = {0:.4f}; Max = {0:.4f}". + format(data_set[0])) + elif len(set(data_set)) == 1 and maximum is None and minimum is None and \ + custbuckets is None: + histogram(data_set, buckets=buckets, minimum=list(set(data_set))[0], + maximum=list(set(data_set))[0] + 1) + else: + histogram(data_set, buckets=buckets, + minimum=minimum, maximum=maximum, custbuckets=custbuckets) + + +# +# buffer_size_type() +# +def buffer_size_type(astr, min=64, max=2048): + value = int(astr) + if min <= value <= max: + return value + else: + raise argparse.ArgumentTypeError( + 'value not in range {}-{}'.format(min, max)) + + +# +# next_power_of_two() +# +def next_power_of_two(val): + np = 1 + while np < val: + np *= 2 + return np + + +# +# main() +# +def main(): + # + # Don't like these globals, but ctx passing does not seem to work with the + # existing open_ring_buffer() API :( + # + global b + global options + global trace_data + global events_received + global event_count + global export_file + + # + # Argument parsing + # + parser = argparse.ArgumentParser() + + parser.add_argument("-b", "--histogram-buckets", + help="Number of buckets per histogram, default 20", + type=int, default=20, metavar="BUCKETS") + parser.add_argument("--buffer-page-count", + help="Number of BPF ring buffer pages, default 1024", + type=int, default=1024, metavar="NUMBER") + parser.add_argument("-D", "--debug", + help="Enable eBPF debugging", + type=lambda x: int(x, 0), const=0x3f, default=0, + nargs='?') + parser.add_argument("-f", "--flow-key-size", + help="Set maximum flow key size to capture, " + "default 64", type=buffer_size_type, default=64, + metavar="[64-2048]") + parser.add_argument("--handler-filter", + help="Post processing handler thread filter", + type=str, default=None, metavar="HANDLERS") + parser.add_argument("-P", "--packet-size", + help="Set maximum packet size to capture, " + "default 256", type=buffer_size_type, default=256, + metavar="[64-2048]") + parser.add_argument("-p", "--pid", metavar="VSWITCHD_PID", + help="ovs-vswitch's PID", + type=int, default=None) + parser.add_argument("-q", "--quiet", action="store_true", + help="Do not show individual events") + parser.add_argument("-r", "--read-events", + help="Read events from FILE instead of installing " + "tracepoints", type=str, default=None, metavar="FILE") + parser.add_argument("--sets", action="store_true", + help="Dump content of data sets") + parser.add_argument("-s", "--stop", + help="Stop after receiving EVENTS number of trace " + "events", + type=int, default=0, metavar="EVENTS") + parser.add_argument("--unit-test", action="store_true", + help=argparse.SUPPRESS) + parser.add_argument("-w", "--write-events", + help="Write events to FILE", + type=str, default=None, metavar="FILE") + + options = parser.parse_args() + + if options.unit_test: + unit_test() + sys.exit(0) + + # + # Find the PID of the ovs-vswitchd daemon if not specified. + # + if options.pid is None and options.read_events is None: + for proc in psutil.process_iter(): + if 'ovs-vswitchd' in proc.name(): + if options.pid is not None: + print("ERROR: Multiple ovs-vswitchd daemons running, " + "use the -p option!") + sys.exit(-1) + + options.pid = proc.pid + + # + # Error checking on input parameters. + # + if options.pid is None and options.read_events is None: + print("ERROR: Failed to find ovs-vswitchd's PID!") + sys.exit(-1) + + if options.read_events is not None and options.write_events is not None: + print("ERROR: Either supply the read or write events option, " + "not both!") + sys.exit(-1) + + if options.handler_filter is not None and options.read_events is None: + print("ERROR: The --handler-filter option is only valid with the " + "--read-events option!") + sys.exit(-1) + + options.buffer_page_count = next_power_of_two(options.buffer_page_count) + + # + # Open write handle if needed. + # + if options.write_events is not None: + try: + export_file = open(options.write_events, "w") + except (FileNotFoundError, IOError, PermissionError) as e: + print("ERROR: Can't create export file \"{}\": {}".format( + options.write_events, e.strerror)) + sys.exit(-1) + else: + export_file = None + + trace_data = [] + event_count = {'total': {}, 'valid': {}, 'miss': {}} + if options.read_events is None: + # + # Call get_dp_mapping() to prepare the cache + # + dp_port_map = get_dp_mapping("ovs-system", "eth0", return_map=True) + if export_file is not None: + export_file.write("dp_port_map = {}\n".format(dp_port_map)) + + # + # Attach the usdt probe + # + u = USDT(pid=int(options.pid)) + try: + u.enable_probe(probe="recv_upcall", fn_name="trace__recv_upcall") + u.enable_probe(probe="op_flow_put", fn_name="trace__op_flow_put") + u.enable_probe(probe="op_flow_execute", + fn_name="trace__op_flow_execute") + except USDTException as e: + print("ERROR: {}" + "ovs-vswitchd!".format( + (re.sub('^', ' ' * 7, str(e), + flags=re.MULTILINE)).strip(). + replace("--with-dtrace or --enable-dtrace", + "--enable-usdt-probes"))) + sys.exit(-1) + + # + # Uncomment to see how arguments are decoded. + # print(u.get_text()) + # + print("- Compiling eBPF programs...") + + # + # Attach probes to the running process + # + source = ebpf_source.replace("", + str(options.packet_size)) + source = source.replace("", str(options.flow_key_size)) + source = source.replace("", + str(options.buffer_page_count)) + + b = BPF(text=source, usdt_contexts=[u], debug=options.debug & 0xffffff) + + # + # Dump out all events + # + print("- Capturing events [Press ^C to stop]...") + events_received = 0 + + if not options.quiet: + print("\n" + Event.get_event_header_str()) + + b['events'].open_ring_buffer(receive_event_bcc) + while 1: + try: + b.ring_buffer_poll() + if options.stop != 0 and events_received >= options.stop: + break + time.sleep(0.5) + except KeyboardInterrupt: + break + + dropcnt = b.get_table("dropcnt") + export_misses = {} + for k in dropcnt.keys(): + event = EventType.from_trace(k.value) + count = dropcnt.sum(k).value + if count > 0: + if event not in event_count['total']: + event_count['total'][event] = 0 + event_count['valid'][event] = 0 + event_count['miss'][event] = count + export_misses[k.value] = count + + if options.write_events is not None: + if sum(event_count['miss'].values()) > 0: + export_file.write("event_miss = {}\n".format(export_misses)) + + export_file.close() + + print() + else: + # + # Here we are requested to read event from an event export + # + thread_filter = None + if options.handler_filter is not None: + thread_filter = options.handler_filter.split(',') + + try: + dp_port_mapping_valid = False + with open(options.read_events, 'r') as fd: + events_received = 0 + + if options.quiet: + spinner = Halo(spinner="dots", color="cyan", + text="Reading events from \"{}\"...".format( + options.read_events)) + spinner.start() + else: + print("- Reading events from \"{}\"...".format( + options.read_events)) + + if not options.quiet: + print("\n" + Event.get_event_header_str()) + + for entry in fd: + if options.stop != 0 and events_received >= options.stop: + break + + entry.rstrip() + if entry.startswith('dp_port_map = {'): + if not dp_port_mapping_valid: + dp_port_mapping_valid = True + get_dp_mapping("", "", + dp_map=ast.literal_eval(entry[14:])) + elif (entry.startswith('event = {') and + dp_port_mapping_valid): + event = ast.literal_eval(entry[8:]) + event = namedtuple("EventObject", + event.keys())(*event.values()) + + if thread_filter is not None \ + and EventType.from_trace(event.event) != \ + EventType.DP_UPCALL \ + and event.comm.decode("utf-8") not in thread_filter: + # Skip none filtered threads + continue + + if len(event.pkt) > 0: + options.packet_size = len(event.pkt) + if len(event.key) > 0: + options.flow_key_size = len(event.key) + receive_event(event) + events_received += 1 + elif entry.startswith('event_miss = {'): + misses = ast.literal_eval(entry[13:]) + for e, count in misses.items(): + event = EventType.from_trace(e) + if count > 0: + if event not in event_count['total']: + event_count['total'][event] = 0 + event_count['valid'][event] = 0 + event_count['miss'][event] = count + + if options.quiet: + spinner.stop() + print("- Reading events from \"{}\"...".format( + options.read_events)) + + except (FileNotFoundError, PermissionError): + print("ERROR: Can't open file \"{}\" for reading!".format( + options.read_events)) + sys.exit(-1) + + # + # Start analyzing the data + # + print("- Analyzing results ({} events)...".format(len(trace_data))) + + if events_received > 0: + if sum(event_count['miss'].values()) > 0: + print("\nWARNING: Not all events were captured!\n " + "Increase the BPF ring buffer size with the " + "--buffer-page-count option.") + + print("\n=> Events received per type (usable/total) [missed events]:") + for event, total in sorted(event_count['total'].items()): + miss = event_count['miss'][event] if event in event_count['miss'] \ + else 0 + print(" {:36}: {:10}/{:10} [{:10}]".format( + event, event_count['valid'][event], total, miss)) + + collection, batch_stats, thread_stats = collect_event_sets( + trace_data, collect_stats=True, spinner=True) + + if len(collection) <= 0: + print("No upcall data sets where found!!") + sys.exit(0) + + print("\n- Analyzing {} event sets...".format(len(collection))) + + if options.debug & 0x1000000 != 0: + for upcall in collection: + print("DBG: {}{}{}{}{}".format( + "U" if EventType.DP_UPCALL in upcall else "-", + "u" if EventType.RECV_UPCALL in upcall else "-", + "p" if EventType.OP_FLOW_PUT in upcall else "-", + "e" if EventType.OP_FLOW_EXECUTE in upcall else "-", + "E" if EventType.OVS_PKT_EXEC in upcall else "-")) + if options.debug & 0x2000000 != 0: + try: + print("DBG: - {}".format(upcall[EventType.DP_UPCALL])) + print("DBG: - {}".format(upcall[EventType.RECV_UPCALL])) + print("DBG: - {}".format(upcall[EventType.OP_FLOW_PUT])) + print("DBG: - {}".format( + upcall[EventType.OP_FLOW_EXECUTE])) + print("DBG: - {}".format(upcall[EventType.OVS_PKT_EXEC])) + except LookupError: + continue + + show_key_value(thread_stats, description="Upcalls handled per thread") + show_batch_histogram(batch_stats, + description="Histogram of upcalls per batch") + + kernel_to_vswitchd = [] + kernel_to_kernel_exec = [] + vswitchd_to_kernel = [] + time_minus_lookup = [] + for upcall in collection: + kernel_to_vswitchd.append((upcall[EventType.RECV_UPCALL].ts - + upcall[EventType.DP_UPCALL].ts) / + 1000) + + if EventType.OP_FLOW_PUT in upcall and \ + EventType.OVS_PKT_EXEC in upcall: + time_minus_lookup.append( + ((upcall[EventType.OVS_PKT_EXEC].ts - + upcall[EventType.DP_UPCALL].ts) - + (upcall[EventType.OP_FLOW_PUT].ts - + upcall[EventType.RECV_UPCALL].ts)) / 1000) + + if EventType.OP_FLOW_EXECUTE in upcall and \ + EventType.OVS_PKT_EXEC in upcall: + vswitchd_to_kernel.append((upcall[EventType.OVS_PKT_EXEC].ts + - upcall[EventType.OP_FLOW_EXECUTE].ts) + / 1000) + + if EventType.OVS_PKT_EXEC in upcall: + kernel_to_kernel_exec.append((upcall[EventType.OVS_PKT_EXEC].ts - + upcall[EventType.DP_UPCALL].ts) / + 1000) + + show_histogram(kernel_to_vswitchd, + description="Kernel upcall action to vswitchd receive " + "(microseconds)", + options=options) + show_histogram(vswitchd_to_kernel, + description="vswitchd execute to kernel receive " + "(microseconds)", + options=options) + show_histogram(time_minus_lookup, + description="Upcall overhead (total time minus lookup) " + "(microseconds)", + options=options) + show_histogram(kernel_to_kernel_exec, + description="Kernel upcall to kernel packet execute " + "(microseconds)", + options=options) + + +# +# Start main() as the default entry point... +# +if __name__ == '__main__': + main()