From patchwork Wed Feb 22 17:07:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Aaron Conole
X-Patchwork-Id: 1746385
From: Aaron Conole
To: dev@openvswitch.org
Date: Wed, 22 Feb 2023 12:07:28 -0500
Message-Id: <20230222170728.1698916-1-aconole@redhat.com>
Cc: Eli Britstein, Gaetan Rivet, Ilya Maximets, Maxime Coquelin,
 Jason Gunthorpe, Majd Dibbiny, David Marchand
Subject: [ovs-dev] [RFC] dpdk: Allow retaining cap_sys_rawio privileges

Open vSwitch generally tries to let the underlying operating system
manage the low-level details of hardware, for example DMA mapping,
bus arbitration, etc. However, when using DPDK, the underlying
operating system yields control of many of these details to userspace
for management.

In the case of some DPDK port drivers, configuring rte_flow or even
allocating resources may require access to iopl/ioperm calls, which
are guarded by the CAP_SYS_RAWIO privilege on Linux systems. These
calls are dangerous, and can allow a process to completely compromise
a system.
However, they are needed in the case of some userspace driver code
which manages the hardware (for example, the mlx implementation of
backend support for rte_flow).

Here, we create an opt-in flag passed to the command line to allow
this access. We need to do this before ever accessing the database,
because we want to drop all privileges as soon as possible, and cannot
wait for a connection to the database to be established and functional
before dropping. There may be distribution-specific ways to do
capability management as well (using, for example, systemd), but they
are not as universal to the vswitchd as a flag.

Signed-off-by: Aaron Conole
---
 NEWS                           |  4 ++++
 lib/daemon-unix.c              | 31 ++++++++++++++++++++++---------
 lib/daemon.c                   |  2 +-
 lib/daemon.h                   |  4 ++--
 ovsdb/ovsdb-client.c           |  6 +++---
 ovsdb/ovsdb-server.c           |  4 ++--
 tests/test-netflow.c           |  2 +-
 tests/test-sflow.c             |  2 +-
 tests/test-unixctl.c           |  2 +-
 utilities/ovs-ofctl.c          |  4 ++--
 utilities/ovs-testcontroller.c |  4 ++--
 vswitchd/ovs-vswitchd.8.in     |  8 ++++++++
 vswitchd/ovs-vswitchd.c        | 11 ++++++++++-
 13 files changed, 59 insertions(+), 25 deletions(-)

diff --git a/NEWS b/NEWS
index 85b3496214..65f35dcdd5 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,10 @@ Post-v3.1.0
        in order to create OVSDB sockets with access mode of 0770.
    - QoS:
      * Added new configuration option 'jitter' for a linux-netem QoS type.
+   - DPDK:
+     * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started
+       with the --hw-rawio-access command line option. This allows the
+       process extra privileges when mapping physical interconnect memory.
 
 
 v3.1.0 - 16 Feb 2023
diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
index 1a7ba427d7..8b895a48de 100644
--- a/lib/daemon-unix.c
+++ b/lib/daemon-unix.c
@@ -88,7 +88,8 @@ static bool switch_user = false;
 static uid_t uid;
 static gid_t gid;
 static char *user = NULL;
-static void daemon_become_new_user__(bool access_datapath);
+static void daemon_become_new_user__(bool access_datapath,
+                                     bool access_hardware_ports);
 
 static void check_already_running(void);
 static int lock_pidfile(FILE *, int command);
@@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid)
  * daemonize_complete()) or that it failed to start up (by exiting with a
  * nonzero exit code). */
 void
-daemonize_start(bool access_datapath)
+daemonize_start(bool access_datapath, bool access_hardware_ports)
 {
     assert_single_threaded();
     daemonize_fd = -1;
 
     if (switch_user) {
-        daemon_become_new_user__(access_datapath);
+        daemon_become_new_user__(access_datapath, access_hardware_ports);
         switch_user = false;
     }
 
@@ -807,7 +808,8 @@ daemon_become_new_user_unix(void)
 /* Linux specific implementation of daemon_become_new_user()
  * using libcap-ng. */
 static void
-daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
+daemon_become_new_user_linux(bool access_datapath OVS_UNUSED,
+                             bool access_hardware_ports OVS_UNUSED)
 {
 #if defined __linux__ && HAVE_LIBCAPNG
     int ret;
@@ -826,7 +828,17 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
             if (access_datapath && !ret) {
                 ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN)
                       || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW)
-                      || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST);
+                      || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST)
+#ifdef DPDK_NETDEV
+                      || (access_hardware_ports &&
+                          capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO))
+#else
+                      ;
+                if (access_hardware_ports) {
+                    VLOG_WARN("hw port access requested, but no userspace ioport support. Dropping.");
+                }
+#endif
+                      ;
             }
         } else {
             ret = -1;
@@ -854,7 +866,7 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED)
 }
 
 static void
-daemon_become_new_user__(bool access_datapath)
+daemon_become_new_user__(bool access_datapath, bool access_hardware_ports)
 {
     /* If vlog file has been created, change its owner to the non-root user
      * as specifed by the --user option. */
@@ -862,7 +874,8 @@ daemon_become_new_user__(bool access_datapath)
 
     if (LINUX) {
         if (LIBCAPNG) {
-            daemon_become_new_user_linux(access_datapath);
+            daemon_become_new_user_linux(access_datapath,
+                                         access_hardware_ports);
         } else {
             VLOG_FATAL("%s: fail to downgrade user using libcap-ng. "
                        "(libcap-ng is not configured at compile time), "
@@ -877,11 +890,11 @@ daemon_become_new_user__(bool access_datapath)
  * However, there in case the user switch needs to be done
  * before daemonize_start(), the following API can be used. */
 void
-daemon_become_new_user(bool access_datapath)
+daemon_become_new_user(bool access_datapath, bool access_hardware_ports)
 {
     assert_single_threaded();
     if (switch_user) {
-        daemon_become_new_user__(access_datapath);
+        daemon_become_new_user__(access_datapath, access_hardware_ports);
         /* daemonize_start() should not switch user again. */
         switch_user = false;
     }
diff --git a/lib/daemon.c b/lib/daemon.c
index 3249c5ab4b..1e1c019eb1 100644
--- a/lib/daemon.c
+++ b/lib/daemon.c
@@ -48,7 +48,7 @@ get_detach(void)
 void
 daemonize(void)
 {
-    daemonize_start(false);
+    daemonize_start(false, false);
     daemonize_complete();
 }
 
diff --git a/lib/daemon.h b/lib/daemon.h
index 0941574963..42372d1463 100644
--- a/lib/daemon.h
+++ b/lib/daemon.h
@@ -167,10 +167,10 @@ void set_detach(void);
 bool get_detach(void);
 void daemon_save_fd(int fd);
 void daemonize(void);
-void daemonize_start(bool access_datapath);
+void daemonize_start(bool access_datapath, bool access_hardware_ports);
 void daemonize_complete(void);
 void daemon_set_new_user(const char * user_spec);
-void daemon_become_new_user(bool access_datapath);
+void daemon_become_new_user(bool access_datapath, bool access_hardware_ports);
 void daemon_usage(void);
 void daemon_disable_self_confinement(void);
 bool daemon_should_self_confine(void);
diff --git a/ovsdb/ovsdb-client.c b/ovsdb/ovsdb-client.c
index f1b8d64910..bae2c5f041 100644
--- a/ovsdb/ovsdb-client.c
+++ b/ovsdb/ovsdb-client.c
@@ -250,7 +250,7 @@ main(int argc, char *argv[])
     parse_options(argc, argv);
     fatal_ignore_sigpipe();
 
-    daemon_become_new_user(false);
+    daemon_become_new_user(false, false);
     if (optind >= argc) {
         ovs_fatal(0, "missing command name; use --help for help");
     }
@@ -1392,7 +1392,7 @@ do_monitor__(struct jsonrpc *rpc, const char *database,
 
     daemon_save_fd(STDOUT_FILENO);
     daemon_save_fd(STDERR_FILENO);
-    daemonize_start(false);
+    daemonize_start(false, false);
     if (get_detach()) {
         int error;
 
@@ -2276,7 +2276,7 @@ do_lock(struct jsonrpc *rpc, const char *method, const char *lock)
        getting a reply of the previous request. */
 
     daemon_save_fd(STDOUT_FILENO);
-    daemonize_start(false);
+    daemonize_start(false, false);
     lock_req_init(&lock_req, method, lock);
 
     if (get_detach()) {
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index 33ca4910d7..4fea2dbda7 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -341,7 +341,7 @@ main(int argc, char *argv[])
                   &run_command, &sync_from, &sync_exclude, &active);
     is_backup = sync_from && !active;
 
-    daemon_become_new_user(false);
+    daemon_become_new_user(false, false);
 
     /* Create and initialize 'config_tmpfile' as a temporary file to hold
      * ovsdb-server's most basic configuration, and then save our initial
@@ -359,7 +359,7 @@ main(int argc, char *argv[])
     save_config__(config_tmpfile, &remotes, &db_filenames, sync_from,
                   sync_exclude, is_backup);
 
-    daemonize_start(false);
+    daemonize_start(false, false);
 
     /* Load the saved config. */
     load_config(config_tmpfile, &remotes, &db_filenames, &sync_from,
diff --git a/tests/test-netflow.c b/tests/test-netflow.c
index d2322d4509..7f89cfcae0 100644
--- a/tests/test-netflow.c
+++ b/tests/test-netflow.c
@@ -195,7 +195,7 @@ test_netflow_main(int argc, char *argv[])
     }
 
     daemon_save_fd(STDOUT_FILENO);
-    daemonize_start(false);
+    daemonize_start(false, false);
 
     error = unixctl_server_create(NULL, &server);
     if (error) {
diff --git a/tests/test-sflow.c b/tests/test-sflow.c
index 460d4d6c54..3c617bdd16 100644
--- a/tests/test-sflow.c
+++ b/tests/test-sflow.c
@@ -709,7 +709,7 @@ test_sflow_main(int argc, char *argv[])
     }
 
     daemon_save_fd(STDOUT_FILENO);
-    daemonize_start(false);
+    daemonize_start(false, false);
 
     error = unixctl_server_create(NULL, &server);
     if (error) {
diff --git a/tests/test-unixctl.c b/tests/test-unixctl.c
index 3eadf54cd9..9e89827895 100644
--- a/tests/test-unixctl.c
+++ b/tests/test-unixctl.c
@@ -83,7 +83,7 @@ test_unixctl_main(int argc, char *argv[])
     fatal_ignore_sigpipe();
     parse_options(&argc, &argv, &unixctl_path);
 
-    daemonize_start(false);
+    daemonize_start(false, false);
     int retval = unixctl_server_create(unixctl_path, &unixctl);
     if (retval) {
         exit(EXIT_FAILURE);
diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c
index eabec18a36..f81f5f759a 100644
--- a/utilities/ovs-ofctl.c
+++ b/utilities/ovs-ofctl.c
@@ -173,7 +173,7 @@ main(int argc, char *argv[])
     ctx.argc = argc - optind;
     ctx.argv = argv + optind;
 
-    daemon_become_new_user(false);
+    daemon_become_new_user(false, false);
     if (read_only) {
         ovs_cmdl_run_command_read_only(&ctx, get_all_commands());
     } else {
@@ -2127,7 +2127,7 @@ monitor_vconn(struct vconn *vconn, bool reply_to_echo_requests,
     int error;
 
     daemon_save_fd(STDERR_FILENO);
-    daemonize_start(false);
+    daemonize_start(false, false);
     error = unixctl_server_create(unixctl_path, &server);
     if (error) {
         ovs_fatal(error, "failed to create unixctl server");
diff --git a/utilities/ovs-testcontroller.c b/utilities/ovs-testcontroller.c
index b489ff5fc7..9f2fbfdf51 100644
--- a/utilities/ovs-testcontroller.c
+++ b/utilities/ovs-testcontroller.c
@@ -109,7 +109,7 @@ main(int argc, char *argv[])
     parse_options(argc, argv);
     fatal_ignore_sigpipe();
 
-    daemon_become_new_user(false);
+    daemon_become_new_user(false, false);
 
     if (argc - optind < 1) {
         ovs_fatal(0, "at least one vconn argument required; "
@@ -148,7 +148,7 @@ main(int argc, char *argv[])
         ovs_fatal(0, "no active or passive switch connections");
     }
 
-    daemonize_start(false);
+    daemonize_start(false, false);
 
     retval = unixctl_server_create(unixctl_path, &unixctl);
     if (retval) {
diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
index 9569265fcb..a6a4a24606 100644
--- a/vswitchd/ovs-vswitchd.8.in
+++ b/vswitchd/ovs-vswitchd.8.in
@@ -81,6 +81,14 @@ unavailable or unsuccessful.
 .SS "DPDK Options"
 For details on initializing \fBovs\-vswitchd\fR to use DPDK ports,
 refer to the documentation or \fBovs\-vswitchd.conf.db\fR(5).
+.SS "DPDK HW Access Options"
+.IP "\fB\-\-hw\-rawio\-access\fR"
+Tells \fBovs\-vswitchd\fR to retain the \fBCAP_SYS_RAWIO\fR capability,
+to allow userspace drivers access to raw hardware memory. This will
+also allow the \fBovs\-vswitchd\fR daemon to call \fBiopl()\fR and
+\fBioperm()\fR functions to set port access. This is a \fBvery\fR
+powerful capability, so generally only enable as needed for specific
+hardware.
 .SS "Daemon Options"
 .ds DD \
 \fBovs\-vswitchd\fR detaches only after it has connected to the \
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index 407bfc60eb..f62d1ad751 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -60,6 +60,9 @@ VLOG_DEFINE_THIS_MODULE(vswitchd);
  * the kernel from paging any of its memory to disk. */
 static bool want_mlockall;
 
+/* --hw-rawio-access: If set, retains CAP_SYS_RAWIO privileges. */
+static bool hw_access;
+
 static unixctl_cb_func ovs_vswitchd_exit;
 
 static char *parse_options(int argc, char *argv[], char **unixctl_path);
@@ -89,7 +92,7 @@ main(int argc, char *argv[])
     remote = parse_options(argc, argv, &unixctl_path);
     fatal_ignore_sigpipe();
 
-    daemonize_start(true);
+    daemonize_start(true, hw_access);
 
     if (want_mlockall) {
 #ifdef HAVE_MLOCKALL
@@ -169,6 +172,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
         OPT_DPDK,
         SSL_OPTION_ENUMS,
         OPT_DUMMY_NUMA,
+        OPT_HW_ACCESS,
     };
     static const struct option long_options[] = {
         {"help", no_argument, NULL, 'h'},
@@ -185,6 +189,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
         {"disable-system-route", no_argument, NULL, OPT_DISABLE_SYSTEM_ROUTE},
         {"dpdk", optional_argument, NULL, OPT_DPDK},
         {"dummy-numa", required_argument, NULL, OPT_DUMMY_NUMA},
+        {"hw-rawio-access", no_argument, NULL, OPT_HW_ACCESS},
         {NULL, 0, NULL, 0},
     };
     char *short_options = ovs_cmdl_long_options_to_short_options(long_options);
@@ -249,6 +254,10 @@ parse_options(int argc, char *argv[], char **unixctl_pathp)
             ovs_numa_set_dummy(optarg);
             break;
 
+        case OPT_HW_ACCESS:
+            hw_access = true;
+            break;
+
         default:
             abort();
         }
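
For readers tracing how the opt-in flag is meant to travel from the command
line to the capability decision, the following is a minimal standalone sketch
in C. It is not code from this patch: parse_hw_rawio_access(),
would_retain_rawio(), and OPT_HW_RAWIO_ACCESS are hypothetical names used only
for illustration. The second helper merely models the condition in
daemon_become_new_user_linux(), where CAP_SYS_RAWIO is added only when both
access_datapath and access_hardware_ports are true; it does not call
libcap-ng.

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option identifier, placed above the range of plain
 * characters as is conventional for long-only getopt_long() options. */
enum { OPT_HW_RAWIO_ACCESS = 256 };

/* Returns true only if --hw-rawio-access appears in 'argv'.
 * Every other option is ignored by this sketch. */
static bool
parse_hw_rawio_access(int argc, char *argv[])
{
    static const struct option long_options[] = {
        {"hw-rawio-access", no_argument, NULL, OPT_HW_RAWIO_ACCESS},
        {NULL, 0, NULL, 0},
    };
    bool hw_rawio_access = false;
    int c;

    optind = 1;  /* Restart scanning so the helper can be called again. */
    opterr = 0;  /* Do not print diagnostics for unknown options. */
    while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
        if (c == OPT_HW_RAWIO_ACCESS) {
            hw_rawio_access = true;
        }
    }
    return hw_rawio_access;
}

/* Models (as an assumption, without libcap-ng) the decision made when
 * dropping privileges: the raw I/O capability is kept only for a daemon
 * that accesses the datapath and explicitly asked for hardware access. */
static bool
would_retain_rawio(bool access_datapath, bool access_hardware_ports)
{
    return access_datapath && access_hardware_ports;
}
```

A caller in the style of ovs-vswitchd would pass the parsed value as the second
argument, for example would_retain_rawio(true, parse_hw_rawio_access(argc,
argv)), so that omitting the flag leaves the capability dropped by default.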