Message ID | 20230307130613.67743-1-aconole@redhat.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev,v2] dpdk: Allow retaining CAP_SYS_RAWIO privileges | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/intel-ovs-compilation | success | test: success |
On Tue, Mar 7, 2023 at 2:06 PM Aaron Conole <aconole@redhat.com> wrote: > > Open vSwitch generally tries to let the underlying operating system > managed the low level details of hardware, for example DMA mapping, > bus arbitration, etc. However, when using DPDK, the underlying > operating system yields control of many of these details to userspace > for management. > > In the case of some DPDK port drivers, configuring rte_flow or even > allocating resources may require access to iopl/ioperm calls, which > are guarded by the CAP_SYS_RAWIO privilege on linux systems. These > calls are dangerous, and can allow a process to completely compromise > a system. However, they are needed in the case of some userspace > driver code which manages the hardware (for example, the mlx > implementation of backend support for rte_flow). > > Here, we create an opt-in flag passed to the command line to allow > this access. We need to do this before ever accessing the database, > because we want to drop all privileges asap, and cannot wait for > a connection to the database to be established and functional before > dropping. There may be distribution specific ways to do capability > management as well (using for example, systemd), but they are not > as universal to the vswitchd as a flag. > > Reviewed-by: Simon Horman <simon.horman@corigine.com> > Signed-off-by: Aaron Conole <aconole@redhat.com> > --- > NEWS | 4 ++++ > lib/daemon-unix.c | 29 +++++++++++++++++++++-------- > lib/daemon-windows.c | 3 ++- > lib/daemon.c | 2 +- > lib/daemon.h | 4 ++-- > ovsdb/ovsdb-client.c | 6 +++--- > ovsdb/ovsdb-server.c | 4 ++-- > tests/test-netflow.c | 2 +- > tests/test-sflow.c | 2 +- > tests/test-unixctl.c | 2 +- > utilities/ovs-ofctl.c | 4 ++-- > utilities/ovs-testcontroller.c | 4 ++-- > vswitchd/ovs-vswitchd.8.in | 8 ++++++++ > vswitchd/ovs-vswitchd.c | 11 ++++++++++- > 14 files changed, 60 insertions(+), 25 deletions(-) > > diff --git a/NEWS b/NEWS > index 85b3496214..65f35dcdd5 100644 > --- a/NEWS > +++ b/NEWS > @@ -10,6 +10,10 @@ Post-v3.1.0 > in order to create OVSDB sockets with access mode of 0770. > - QoS: > * Added new configuration option 'jitter' for a linux-netem QoS type. > + - DPDK: > + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started > + with the --hw-rawio-access command line option. This allows the > + process extra privileges when mapping physical interconnect memory. > > > v3.1.0 - 16 Feb 2023 > diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c > index 1a7ba427d7..a080facddc 100644 > --- a/lib/daemon-unix.c > +++ b/lib/daemon-unix.c > @@ -88,7 +88,8 @@ static bool switch_user = false; > static uid_t uid; > static gid_t gid; > static char *user = NULL; > -static void daemon_become_new_user__(bool access_datapath); > +static void daemon_become_new_user__(bool access_datapath, > + bool access_hardware_ports); > > static void check_already_running(void); > static int lock_pidfile(FILE *, int command); > @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid) > * daemonize_complete()) or that it failed to start up (by exiting with a > * nonzero exit code). */ > void > -daemonize_start(bool access_datapath) > +daemonize_start(bool access_datapath, bool access_hardware_ports) > { > assert_single_threaded(); > daemonize_fd = -1; > > if (switch_user) { > - daemon_become_new_user__(access_datapath); > + daemon_become_new_user__(access_datapath, access_hardware_ports); > switch_user = false; > } > > @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void) > /* Linux specific implementation of daemon_become_new_user() > * using libcap-ng. */ > static void > -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) > +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED, > + bool access_hardware_ports OVS_UNUSED) > { > #if defined __linux__ && HAVE_LIBCAPNG > int ret; > @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) > ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN) > || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW) > || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST); > +#ifdef DPDK_NETDEV > + if (access_hardware_ports && !ret) { > + ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO); > + VLOG_INFO("CAP_SYS_RAWIO enabled."); > + } > +#else > + if (access_hardware_ports) { > + VLOG_WARN("Dropped CAP_SYS_RAWIO request (no drivers)."); Does it mean we drop the capability? or do we drop the drop? I don't really have a better wording but the current log had me reading it a few times. > + } > +#endif > } > } else { > ret = -1; > @@ -854,7 +866,7 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) > } > > static void > -daemon_become_new_user__(bool access_datapath) > +daemon_become_new_user__(bool access_datapath, bool access_hardware_ports) > { > /* If vlog file has been created, change its owner to the non-root user > * as specifed by the --user option. */ > @@ -862,7 +874,8 @@ daemon_become_new_user__(bool access_datapath) > > if (LINUX) { > if (LIBCAPNG) { > - daemon_become_new_user_linux(access_datapath); > + daemon_become_new_user_linux(access_datapath, > + access_hardware_ports); > } else { > VLOG_FATAL("%s: fail to downgrade user using libcap-ng. " > "(libcap-ng is not configured at compile time), " > @@ -877,11 +890,11 @@ daemon_become_new_user__(bool access_datapath) > * However, there in case the user switch needs to be done > * before daemonize_start(), the following API can be used. */ > void > -daemon_become_new_user(bool access_datapath) > +daemon_become_new_user(bool access_datapath, bool access_hardware_ports) > { > assert_single_threaded(); > if (switch_user) { > - daemon_become_new_user__(access_datapath); > + daemon_become_new_user__(access_datapath, access_hardware_ports); > /* daemonize_start() should not switch user again. */ > switch_user = false; > } > diff --git a/lib/daemon-windows.c b/lib/daemon-windows.c > index 7e5f264f5b..f8055197eb 100644 > --- a/lib/daemon-windows.c > +++ b/lib/daemon-windows.c > @@ -526,7 +526,8 @@ daemonize_complete(void) > } > > void > -daemon_become_new_user(bool access_datapath OVS_UNUSED) > +daemon_become_new_user(bool access_datapath OVS_UNUSED, > + bool access_hardware_ports OVS_UNUSED) > { > } > I think there is something missing for Windows. lib/daemon-windows.c:daemonize_start(bool access_datapath OVS_UNUSED) The CI will confirm this :-). > diff --git a/lib/daemon.c b/lib/daemon.c > index 3249c5ab4b..1e1c019eb1 100644 > --- a/lib/daemon.c > +++ b/lib/daemon.c > @@ -48,7 +48,7 @@ get_detach(void) > void > daemonize(void) > { > - daemonize_start(false); > + daemonize_start(false, false); > daemonize_complete(); > } > > diff --git a/lib/daemon.h b/lib/daemon.h > index 0941574963..42372d1463 100644 > --- a/lib/daemon.h > +++ b/lib/daemon.h > @@ -167,10 +167,10 @@ void set_detach(void); > bool get_detach(void); > void daemon_save_fd(int fd); > void daemonize(void); > -void daemonize_start(bool access_datapath); > +void daemonize_start(bool access_datapath, bool access_hardware_ports); > void daemonize_complete(void); > void daemon_set_new_user(const char * user_spec); > -void daemon_become_new_user(bool access_datapath); > +void daemon_become_new_user(bool access_datapath, bool access_hardware_ports); > void daemon_usage(void); > void daemon_disable_self_confinement(void); > bool daemon_should_self_confine(void); > diff --git a/ovsdb/ovsdb-client.c b/ovsdb/ovsdb-client.c > index f1b8d64910..bae2c5f041 100644 > --- a/ovsdb/ovsdb-client.c > +++ b/ovsdb/ovsdb-client.c > @@ -250,7 +250,7 @@ main(int argc, char *argv[]) > parse_options(argc, argv); > fatal_ignore_sigpipe(); > > - daemon_become_new_user(false); > + daemon_become_new_user(false, false); > if (optind >= argc) { > ovs_fatal(0, "missing command name; use --help for help"); > } > @@ -1392,7 +1392,7 @@ do_monitor__(struct jsonrpc *rpc, const char *database, > > daemon_save_fd(STDOUT_FILENO); > daemon_save_fd(STDERR_FILENO); > - daemonize_start(false); > + daemonize_start(false, false); > if (get_detach()) { > int error; > > @@ -2276,7 +2276,7 @@ do_lock(struct jsonrpc *rpc, const char *method, const char *lock) > getting a reply of the previous > request. */ > daemon_save_fd(STDOUT_FILENO); > - daemonize_start(false); > + daemonize_start(false, false); > lock_req_init(&lock_req, method, lock); > > if (get_detach()) { > diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c > index 33ca4910d7..4fea2dbda7 100644 > --- a/ovsdb/ovsdb-server.c > +++ b/ovsdb/ovsdb-server.c > @@ -341,7 +341,7 @@ main(int argc, char *argv[]) > &run_command, &sync_from, &sync_exclude, &active); > is_backup = sync_from && !active; > > - daemon_become_new_user(false); > + daemon_become_new_user(false, false); > > /* Create and initialize 'config_tmpfile' as a temporary file to hold > * ovsdb-server's most basic configuration, and then save our initial > @@ -359,7 +359,7 @@ main(int argc, char *argv[]) > save_config__(config_tmpfile, &remotes, &db_filenames, sync_from, > sync_exclude, is_backup); > > - daemonize_start(false); > + daemonize_start(false, false); > > /* Load the saved config. */ > load_config(config_tmpfile, &remotes, &db_filenames, &sync_from, > diff --git a/tests/test-netflow.c b/tests/test-netflow.c > index d2322d4509..7f89cfcae0 100644 > --- a/tests/test-netflow.c > +++ b/tests/test-netflow.c > @@ -195,7 +195,7 @@ test_netflow_main(int argc, char *argv[]) > } > > daemon_save_fd(STDOUT_FILENO); > - daemonize_start(false); > + daemonize_start(false, false); > > error = unixctl_server_create(NULL, &server); > if (error) { > diff --git a/tests/test-sflow.c b/tests/test-sflow.c > index 460d4d6c54..3c617bdd16 100644 > --- a/tests/test-sflow.c > +++ b/tests/test-sflow.c > @@ -709,7 +709,7 @@ test_sflow_main(int argc, char *argv[]) > } > > daemon_save_fd(STDOUT_FILENO); > - daemonize_start(false); > + daemonize_start(false, false); > > error = unixctl_server_create(NULL, &server); > if (error) { > diff --git a/tests/test-unixctl.c b/tests/test-unixctl.c > index 3eadf54cd9..9e89827895 100644 > --- a/tests/test-unixctl.c > +++ b/tests/test-unixctl.c > @@ -83,7 +83,7 @@ test_unixctl_main(int argc, char *argv[]) > fatal_ignore_sigpipe(); > parse_options(&argc, &argv, &unixctl_path); > > - daemonize_start(false); > + daemonize_start(false, false); > int retval = unixctl_server_create(unixctl_path, &unixctl); > if (retval) { > exit(EXIT_FAILURE); > diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c > index eabec18a36..f81f5f759a 100644 > --- a/utilities/ovs-ofctl.c > +++ b/utilities/ovs-ofctl.c > @@ -173,7 +173,7 @@ main(int argc, char *argv[]) > ctx.argc = argc - optind; > ctx.argv = argv + optind; > > - daemon_become_new_user(false); > + daemon_become_new_user(false, false); > if (read_only) { > ovs_cmdl_run_command_read_only(&ctx, get_all_commands()); > } else { > @@ -2127,7 +2127,7 @@ monitor_vconn(struct vconn *vconn, bool reply_to_echo_requests, > int error; > > daemon_save_fd(STDERR_FILENO); > - daemonize_start(false); > + daemonize_start(false, false); > error = unixctl_server_create(unixctl_path, &server); > if (error) { > ovs_fatal(error, "failed to create unixctl server"); > diff --git a/utilities/ovs-testcontroller.c b/utilities/ovs-testcontroller.c > index b489ff5fc7..9f2fbfdf51 100644 > --- a/utilities/ovs-testcontroller.c > +++ b/utilities/ovs-testcontroller.c > @@ -109,7 +109,7 @@ main(int argc, char *argv[]) > parse_options(argc, argv); > fatal_ignore_sigpipe(); > > - daemon_become_new_user(false); > + daemon_become_new_user(false, false); > > if (argc - optind < 1) { > ovs_fatal(0, "at least one vconn argument required; " > @@ -148,7 +148,7 @@ main(int argc, char *argv[]) > ovs_fatal(0, "no active or passive switch connections"); > } > > - daemonize_start(false); > + daemonize_start(false, false); > > retval = unixctl_server_create(unixctl_path, &unixctl); > if (retval) { > diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in > index 9569265fcb..a6a4a24606 100644 > --- a/vswitchd/ovs-vswitchd.8.in > +++ b/vswitchd/ovs-vswitchd.8.in > @@ -81,6 +81,14 @@ unavailable or unsuccessful. > .SS "DPDK Options" > For details on initializing \fBovs\-vswitchd\fR to use DPDK ports, > refer to the documentation or \fBovs\-vswitchd.conf.db\fR(5). > +.SS "DPDK HW Access Options" > +.IP "\fB\-\-hw\-rawio\-access\fR" > +Tells \fBovs\-vswitchd\fR to retain the \fBCAP_SYS_RAWIO\fR capability, > +to allow userspace drivers access to raw hardware memory. This will > +also allow the \fBovs\-vswitchd\fR daemon to call \fBiopl()\fR and > +\fBioperm()\fR functions to set port access. This is a \fBvery\fR The main usecase for this OVS option would be mlx5 + full hwoffload in DPDK, so if giving an example in the manual, I'd rather mention it. On the other hand, iopl() is (was?) used in DPDK for x86 + virtio pmd with virtio pre v1. And here, I don't expect many people plugging OVS on top of this. Do you have another case in mind for iopl()/ioperm()? The rest lgtm. Thanks for working on this Aaron. > +powerful capability, so generally only enable as needed for specific > +hardware. > .SS "Daemon Options" > .ds DD \ > \fBovs\-vswitchd\fR detaches only after it has connected to the \
On Tue, Mar 07, 2023 at 02:32:04PM +0100, David Marchand wrote: > On Tue, Mar 7, 2023 at 2:06 PM Aaron Conole <aconole@redhat.com> wrote: > > > > Open vSwitch generally tries to let the underlying operating system > > managed the low level details of hardware, for example DMA mapping, > > bus arbitration, etc. However, when using DPDK, the underlying > > operating system yields control of many of these details to userspace > > for management. > > > > In the case of some DPDK port drivers, configuring rte_flow or even > > allocating resources may require access to iopl/ioperm calls, which > > are guarded by the CAP_SYS_RAWIO privilege on linux systems. These > > calls are dangerous, and can allow a process to completely compromise > > a system. However, they are needed in the case of some userspace > > driver code which manages the hardware (for example, the mlx > > implementation of backend support for rte_flow). > > > > Here, we create an opt-in flag passed to the command line to allow > > this access. We need to do this before ever accessing the database, > > because we want to drop all privileges asap, and cannot wait for > > a connection to the database to be established and functional before > > dropping. There may be distribution specific ways to do capability > > management as well (using for example, systemd), but they are not > > as universal to the vswitchd as a flag. > > > > Reviewed-by: Simon Horman <simon.horman@corigine.com> > > Signed-off-by: Aaron Conole <aconole@redhat.com> > > --- > > NEWS | 4 ++++ > > lib/daemon-unix.c | 29 +++++++++++++++++++++-------- > > lib/daemon-windows.c | 3 ++- > > lib/daemon.c | 2 +- > > lib/daemon.h | 4 ++-- > > ovsdb/ovsdb-client.c | 6 +++--- > > ovsdb/ovsdb-server.c | 4 ++-- > > tests/test-netflow.c | 2 +- > > tests/test-sflow.c | 2 +- > > tests/test-unixctl.c | 2 +- > > utilities/ovs-ofctl.c | 4 ++-- > > utilities/ovs-testcontroller.c | 4 ++-- > > vswitchd/ovs-vswitchd.8.in | 8 ++++++++ > > vswitchd/ovs-vswitchd.c | 11 ++++++++++- > > 14 files changed, 60 insertions(+), 25 deletions(-) > > > > diff --git a/NEWS b/NEWS > > index 85b3496214..65f35dcdd5 100644 > > --- a/NEWS > > +++ b/NEWS > > @@ -10,6 +10,10 @@ Post-v3.1.0 > > in order to create OVSDB sockets with access mode of 0770. > > - QoS: > > * Added new configuration option 'jitter' for a linux-netem QoS type. > > + - DPDK: > > + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started > > + with the --hw-rawio-access command line option. This allows the > > + process extra privileges when mapping physical interconnect memory. > > > > > > v3.1.0 - 16 Feb 2023 > > diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c > > index 1a7ba427d7..a080facddc 100644 > > --- a/lib/daemon-unix.c > > +++ b/lib/daemon-unix.c > > @@ -88,7 +88,8 @@ static bool switch_user = false; > > static uid_t uid; > > static gid_t gid; > > static char *user = NULL; > > -static void daemon_become_new_user__(bool access_datapath); > > +static void daemon_become_new_user__(bool access_datapath, > > + bool access_hardware_ports); > > > > static void check_already_running(void); > > static int lock_pidfile(FILE *, int command); > > @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid) > > * daemonize_complete()) or that it failed to start up (by exiting with a > > * nonzero exit code). */ > > void > > -daemonize_start(bool access_datapath) > > +daemonize_start(bool access_datapath, bool access_hardware_ports) > > { > > assert_single_threaded(); > > daemonize_fd = -1; > > > > if (switch_user) { > > - daemon_become_new_user__(access_datapath); > > + daemon_become_new_user__(access_datapath, access_hardware_ports); > > switch_user = false; > > } > > > > @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void) > > /* Linux specific implementation of daemon_become_new_user() > > * using libcap-ng. */ > > static void > > -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) > > +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED, > > + bool access_hardware_ports OVS_UNUSED) > > { > > #if defined __linux__ && HAVE_LIBCAPNG > > int ret; > > @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) > > ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN) > > || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW) > > || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST); > > +#ifdef DPDK_NETDEV > > + if (access_hardware_ports && !ret) { > > + ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO); > > + VLOG_INFO("CAP_SYS_RAWIO enabled."); Perhaps "The Linux capability CAP_SYS_RAWIO is enabled." > > + } > > +#else > > + if (access_hardware_ports) { > > + VLOG_WARN("Dropped CAP_SYS_RAWIO request (no drivers)."); > > Does it mean we drop the capability? or do we drop the drop? > I don't really have a better wording but the current log had me > reading it a few times. Same here. Perhaps: "No driver requires Linux capability CAP_SYS_RAWIO, disabling it." my $0.02 fbl
On Tue, Mar 07, 2023 at 12:24:37PM -0300, Flavio Leitner wrote: > On Tue, Mar 07, 2023 at 02:32:04PM +0100, David Marchand wrote: > > On Tue, Mar 7, 2023 at 2:06 PM Aaron Conole <aconole@redhat.com> wrote: > > > > > > Open vSwitch generally tries to let the underlying operating system > > > managed the low level details of hardware, for example DMA mapping, > > > bus arbitration, etc. However, when using DPDK, the underlying > > > operating system yields control of many of these details to userspace > > > for management. > > > > > > In the case of some DPDK port drivers, configuring rte_flow or even > > > allocating resources may require access to iopl/ioperm calls, which > > > are guarded by the CAP_SYS_RAWIO privilege on linux systems. These > > > calls are dangerous, and can allow a process to completely compromise > > > a system. However, they are needed in the case of some userspace > > > driver code which manages the hardware (for example, the mlx > > > implementation of backend support for rte_flow). > > > > > > Here, we create an opt-in flag passed to the command line to allow > > > this access. We need to do this before ever accessing the database, > > > because we want to drop all privileges asap, and cannot wait for > > > a connection to the database to be established and functional before > > > dropping. There may be distribution specific ways to do capability > > > management as well (using for example, systemd), but they are not > > > as universal to the vswitchd as a flag. > > > > > > Reviewed-by: Simon Horman <simon.horman@corigine.com> > > > Signed-off-by: Aaron Conole <aconole@redhat.com> > > > --- > > > NEWS | 4 ++++ > > > lib/daemon-unix.c | 29 +++++++++++++++++++++-------- > > > lib/daemon-windows.c | 3 ++- > > > lib/daemon.c | 2 +- > > > lib/daemon.h | 4 ++-- > > > ovsdb/ovsdb-client.c | 6 +++--- > > > ovsdb/ovsdb-server.c | 4 ++-- > > > tests/test-netflow.c | 2 +- > > > tests/test-sflow.c | 2 +- > > > tests/test-unixctl.c | 2 +- > > > utilities/ovs-ofctl.c | 4 ++-- > > > utilities/ovs-testcontroller.c | 4 ++-- > > > vswitchd/ovs-vswitchd.8.in | 8 ++++++++ > > > vswitchd/ovs-vswitchd.c | 11 ++++++++++- > > > 14 files changed, 60 insertions(+), 25 deletions(-) > > > > > > diff --git a/NEWS b/NEWS > > > index 85b3496214..65f35dcdd5 100644 > > > --- a/NEWS > > > +++ b/NEWS > > > @@ -10,6 +10,10 @@ Post-v3.1.0 > > > in order to create OVSDB sockets with access mode of 0770. > > > - QoS: > > > * Added new configuration option 'jitter' for a linux-netem QoS type. > > > + - DPDK: > > > + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started > > > + with the --hw-rawio-access command line option. This allows the > > > + process extra privileges when mapping physical interconnect memory. > > > > > > > > > v3.1.0 - 16 Feb 2023 > > > diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c > > > index 1a7ba427d7..a080facddc 100644 > > > --- a/lib/daemon-unix.c > > > +++ b/lib/daemon-unix.c > > > @@ -88,7 +88,8 @@ static bool switch_user = false; > > > static uid_t uid; > > > static gid_t gid; > > > static char *user = NULL; > > > -static void daemon_become_new_user__(bool access_datapath); > > > +static void daemon_become_new_user__(bool access_datapath, > > > + bool access_hardware_ports); > > > > > > static void check_already_running(void); > > > static int lock_pidfile(FILE *, int command); > > > @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid) > > > * daemonize_complete()) or that it failed to start up (by exiting with a > > > * nonzero exit code). */ > > > void > > > -daemonize_start(bool access_datapath) > > > +daemonize_start(bool access_datapath, bool access_hardware_ports) > > > { > > > assert_single_threaded(); > > > daemonize_fd = -1; > > > > > > if (switch_user) { > > > - daemon_become_new_user__(access_datapath); > > > + daemon_become_new_user__(access_datapath, access_hardware_ports); > > > switch_user = false; > > > } > > > > > > @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void) > > > /* Linux specific implementation of daemon_become_new_user() > > > * using libcap-ng. */ > > > static void > > > -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) > > > +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED, > > > + bool access_hardware_ports OVS_UNUSED) > > > { > > > #if defined __linux__ && HAVE_LIBCAPNG > > > int ret; > > > @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) > > > ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN) > > > || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW) > > > || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST); > > > +#ifdef DPDK_NETDEV > > > + if (access_hardware_ports && !ret) { > > > + ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO); > > > + VLOG_INFO("CAP_SYS_RAWIO enabled."); > > Perhaps "The Linux capability CAP_SYS_RAWIO is enabled." Actually, it logs the message regardless of the call results. Not sure the chances of failures there, but maybe we should check if it succeed first. #ifdef DPDK_NETDEV if (access_hardware_ports && !ret) { ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO); if (!ret) { VLOG_INFO(""The Linux capability CAP_SYS_RAWIO is enabled."); } fbl > > > > + } > > > +#else > > > + if (access_hardware_ports) { > > > + VLOG_WARN("Dropped CAP_SYS_RAWIO request (no drivers)."); > > > > Does it mean we drop the capability? or do we drop the drop? > > I don't really have a better wording but the current log had me > > reading it a few times. > > Same here. Perhaps: > "No driver requires Linux capability CAP_SYS_RAWIO, disabling it." > > my $0.02 > fbl > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
David Marchand <david.marchand@redhat.com> writes: > On Tue, Mar 7, 2023 at 2:06 PM Aaron Conole <aconole@redhat.com> wrote: >> >> Open vSwitch generally tries to let the underlying operating system >> managed the low level details of hardware, for example DMA mapping, >> bus arbitration, etc. However, when using DPDK, the underlying >> operating system yields control of many of these details to userspace >> for management. >> >> In the case of some DPDK port drivers, configuring rte_flow or even >> allocating resources may require access to iopl/ioperm calls, which >> are guarded by the CAP_SYS_RAWIO privilege on linux systems. These >> calls are dangerous, and can allow a process to completely compromise >> a system. However, they are needed in the case of some userspace >> driver code which manages the hardware (for example, the mlx >> implementation of backend support for rte_flow). >> >> Here, we create an opt-in flag passed to the command line to allow >> this access. We need to do this before ever accessing the database, >> because we want to drop all privileges asap, and cannot wait for >> a connection to the database to be established and functional before >> dropping. There may be distribution specific ways to do capability >> management as well (using for example, systemd), but they are not >> as universal to the vswitchd as a flag. >> >> Reviewed-by: Simon Horman <simon.horman@corigine.com> >> Signed-off-by: Aaron Conole <aconole@redhat.com> >> --- >> NEWS | 4 ++++ >> lib/daemon-unix.c | 29 +++++++++++++++++++++-------- >> lib/daemon-windows.c | 3 ++- >> lib/daemon.c | 2 +- >> lib/daemon.h | 4 ++-- >> ovsdb/ovsdb-client.c | 6 +++--- >> ovsdb/ovsdb-server.c | 4 ++-- >> tests/test-netflow.c | 2 +- >> tests/test-sflow.c | 2 +- >> tests/test-unixctl.c | 2 +- >> utilities/ovs-ofctl.c | 4 ++-- >> utilities/ovs-testcontroller.c | 4 ++-- >> vswitchd/ovs-vswitchd.8.in | 8 ++++++++ >> vswitchd/ovs-vswitchd.c | 11 ++++++++++- >> 14 files changed, 60 insertions(+), 25 deletions(-) >> >> diff --git a/NEWS b/NEWS >> index 85b3496214..65f35dcdd5 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -10,6 +10,10 @@ Post-v3.1.0 >> in order to create OVSDB sockets with access mode of 0770. >> - QoS: >> * Added new configuration option 'jitter' for a linux-netem QoS type. >> + - DPDK: >> + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started >> + with the --hw-rawio-access command line option. This allows the >> + process extra privileges when mapping physical interconnect memory. >> >> >> v3.1.0 - 16 Feb 2023 >> diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c >> index 1a7ba427d7..a080facddc 100644 >> --- a/lib/daemon-unix.c >> +++ b/lib/daemon-unix.c >> @@ -88,7 +88,8 @@ static bool switch_user = false; >> static uid_t uid; >> static gid_t gid; >> static char *user = NULL; >> -static void daemon_become_new_user__(bool access_datapath); >> +static void daemon_become_new_user__(bool access_datapath, >> + bool access_hardware_ports); >> >> static void check_already_running(void); >> static int lock_pidfile(FILE *, int command); >> @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid) >> * daemonize_complete()) or that it failed to start up (by exiting with a >> * nonzero exit code). */ >> void >> -daemonize_start(bool access_datapath) >> +daemonize_start(bool access_datapath, bool access_hardware_ports) >> { >> assert_single_threaded(); >> daemonize_fd = -1; >> >> if (switch_user) { >> - daemon_become_new_user__(access_datapath); >> + daemon_become_new_user__(access_datapath, access_hardware_ports); >> switch_user = false; >> } >> >> @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void) >> /* Linux specific implementation of daemon_become_new_user() >> * using libcap-ng. */ >> static void >> -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) >> +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED, >> + bool access_hardware_ports OVS_UNUSED) >> { >> #if defined __linux__ && HAVE_LIBCAPNG >> int ret; >> @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) >> ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN) >> || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW) >> || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST); >> +#ifdef DPDK_NETDEV >> + if (access_hardware_ports && !ret) { >> + ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO); >> + VLOG_INFO("CAP_SYS_RAWIO enabled."); >> + } >> +#else >> + if (access_hardware_ports) { >> + VLOG_WARN("Dropped CAP_SYS_RAWIO request (no drivers)."); > > Does it mean we drop the capability? or do we drop the drop? > I don't really have a better wording but the current log had me > reading it a few times. I'll rewrite the log using Flavio's wording. >> + } >> +#endif >> } >> } else { >> ret = -1; >> @@ -854,7 +866,7 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) >> } >> >> static void >> -daemon_become_new_user__(bool access_datapath) >> +daemon_become_new_user__(bool access_datapath, bool access_hardware_ports) >> { >> /* If vlog file has been created, change its owner to the non-root user >> * as specifed by the --user option. */ >> @@ -862,7 +874,8 @@ daemon_become_new_user__(bool access_datapath) >> >> if (LINUX) { >> if (LIBCAPNG) { >> - daemon_become_new_user_linux(access_datapath); >> + daemon_become_new_user_linux(access_datapath, >> + access_hardware_ports); >> } else { >> VLOG_FATAL("%s: fail to downgrade user using libcap-ng. " >> "(libcap-ng is not configured at compile time), " >> @@ -877,11 +890,11 @@ daemon_become_new_user__(bool access_datapath) >> * However, there in case the user switch needs to be done >> * before daemonize_start(), the following API can be used. */ >> void >> -daemon_become_new_user(bool access_datapath) >> +daemon_become_new_user(bool access_datapath, bool access_hardware_ports) >> { >> assert_single_threaded(); >> if (switch_user) { >> - daemon_become_new_user__(access_datapath); >> + daemon_become_new_user__(access_datapath, access_hardware_ports); >> /* daemonize_start() should not switch user again. */ >> switch_user = false; >> } >> diff --git a/lib/daemon-windows.c b/lib/daemon-windows.c >> index 7e5f264f5b..f8055197eb 100644 >> --- a/lib/daemon-windows.c >> +++ b/lib/daemon-windows.c >> @@ -526,7 +526,8 @@ daemonize_complete(void) >> } >> >> void >> -daemon_become_new_user(bool access_datapath OVS_UNUSED) >> +daemon_become_new_user(bool access_datapath OVS_UNUSED, >> + bool access_hardware_ports OVS_UNUSED) >> { >> } >> > > I think there is something missing for Windows. > lib/daemon-windows.c:daemonize_start(bool access_datapath OVS_UNUSED) > > The CI will confirm this :-). I wish it would give error instead of warning. :) It passed just fine, but yes, does give the following: "declared formal parameter list different from definition" but not the symbol name. Thanks for catching it in review. >> diff --git a/lib/daemon.c b/lib/daemon.c >> index 3249c5ab4b..1e1c019eb1 100644 >> --- a/lib/daemon.c >> +++ b/lib/daemon.c >> @@ -48,7 +48,7 @@ get_detach(void) >> void >> daemonize(void) >> { >> - daemonize_start(false); >> + daemonize_start(false, false); >> daemonize_complete(); >> } >> >> diff --git a/lib/daemon.h b/lib/daemon.h >> index 0941574963..42372d1463 100644 >> --- a/lib/daemon.h >> +++ b/lib/daemon.h >> @@ -167,10 +167,10 @@ void set_detach(void); >> bool get_detach(void); >> void daemon_save_fd(int fd); >> void daemonize(void); >> -void daemonize_start(bool access_datapath); >> +void daemonize_start(bool access_datapath, bool access_hardware_ports); >> void daemonize_complete(void); >> void daemon_set_new_user(const char * user_spec); >> -void daemon_become_new_user(bool access_datapath); >> +void daemon_become_new_user(bool access_datapath, bool access_hardware_ports); >> void daemon_usage(void); >> void daemon_disable_self_confinement(void); >> bool daemon_should_self_confine(void); >> diff --git a/ovsdb/ovsdb-client.c b/ovsdb/ovsdb-client.c >> index f1b8d64910..bae2c5f041 100644 >> --- a/ovsdb/ovsdb-client.c >> +++ b/ovsdb/ovsdb-client.c >> @@ -250,7 +250,7 @@ main(int argc, char *argv[]) >> parse_options(argc, argv); >> fatal_ignore_sigpipe(); >> >> - daemon_become_new_user(false); >> + daemon_become_new_user(false, false); >> if (optind >= argc) { >> ovs_fatal(0, "missing command name; use --help for help"); >> } >> @@ -1392,7 +1392,7 @@ do_monitor__(struct jsonrpc *rpc, const char *database, >> >> daemon_save_fd(STDOUT_FILENO); >> daemon_save_fd(STDERR_FILENO); >> - daemonize_start(false); >> + daemonize_start(false, false); >> if (get_detach()) { >> int error; >> >> @@ -2276,7 +2276,7 @@ do_lock(struct jsonrpc *rpc, const char *method, const char *lock) >> getting a reply of the previous >> request. */ >> daemon_save_fd(STDOUT_FILENO); >> - daemonize_start(false); >> + daemonize_start(false, false); >> lock_req_init(&lock_req, method, lock); >> >> if (get_detach()) { >> diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c >> index 33ca4910d7..4fea2dbda7 100644 >> --- a/ovsdb/ovsdb-server.c >> +++ b/ovsdb/ovsdb-server.c >> @@ -341,7 +341,7 @@ main(int argc, char *argv[]) >> &run_command, &sync_from, &sync_exclude, &active); >> is_backup = sync_from && !active; >> >> - daemon_become_new_user(false); >> + daemon_become_new_user(false, false); >> >> /* Create and initialize 'config_tmpfile' as a temporary file to hold >> * ovsdb-server's most basic configuration, and then save our initial >> @@ -359,7 +359,7 @@ main(int argc, char *argv[]) >> save_config__(config_tmpfile, &remotes, &db_filenames, sync_from, >> sync_exclude, is_backup); >> >> - daemonize_start(false); >> + daemonize_start(false, false); >> >> /* Load the saved config. */ >> load_config(config_tmpfile, &remotes, &db_filenames, &sync_from, >> diff --git a/tests/test-netflow.c b/tests/test-netflow.c >> index d2322d4509..7f89cfcae0 100644 >> --- a/tests/test-netflow.c >> +++ b/tests/test-netflow.c >> @@ -195,7 +195,7 @@ test_netflow_main(int argc, char *argv[]) >> } >> >> daemon_save_fd(STDOUT_FILENO); >> - daemonize_start(false); >> + daemonize_start(false, false); >> >> error = unixctl_server_create(NULL, &server); >> if (error) { >> diff --git a/tests/test-sflow.c b/tests/test-sflow.c >> index 460d4d6c54..3c617bdd16 100644 >> --- a/tests/test-sflow.c >> +++ b/tests/test-sflow.c >> @@ -709,7 +709,7 @@ test_sflow_main(int argc, char *argv[]) >> } >> >> daemon_save_fd(STDOUT_FILENO); >> - daemonize_start(false); >> + daemonize_start(false, false); >> >> error = unixctl_server_create(NULL, &server); >> if (error) { >> diff --git a/tests/test-unixctl.c b/tests/test-unixctl.c >> index 3eadf54cd9..9e89827895 100644 >> --- a/tests/test-unixctl.c >> +++ b/tests/test-unixctl.c >> @@ -83,7 +83,7 @@ test_unixctl_main(int argc, char *argv[]) >> fatal_ignore_sigpipe(); >> parse_options(&argc, &argv, &unixctl_path); >> >> - daemonize_start(false); >> + daemonize_start(false, false); >> int retval = unixctl_server_create(unixctl_path, &unixctl); >> if (retval) { >> exit(EXIT_FAILURE); >> diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c >> index eabec18a36..f81f5f759a 100644 >> --- a/utilities/ovs-ofctl.c >> +++ b/utilities/ovs-ofctl.c >> @@ -173,7 +173,7 @@ main(int argc, char *argv[]) >> ctx.argc = argc - optind; >> ctx.argv = argv + optind; >> >> - daemon_become_new_user(false); >> + daemon_become_new_user(false, false); >> if (read_only) { >> ovs_cmdl_run_command_read_only(&ctx, get_all_commands()); >> } else { >> @@ -2127,7 +2127,7 @@ monitor_vconn(struct vconn *vconn, bool reply_to_echo_requests, >> int error; >> >> daemon_save_fd(STDERR_FILENO); >> - daemonize_start(false); >> + daemonize_start(false, false); >> error = unixctl_server_create(unixctl_path, &server); >> if (error) { >> ovs_fatal(error, "failed to create unixctl server"); >> diff --git a/utilities/ovs-testcontroller.c b/utilities/ovs-testcontroller.c >> index b489ff5fc7..9f2fbfdf51 100644 >> --- a/utilities/ovs-testcontroller.c >> +++ b/utilities/ovs-testcontroller.c >> @@ -109,7 +109,7 @@ main(int argc, char *argv[]) >> parse_options(argc, argv); >> fatal_ignore_sigpipe(); >> >> - daemon_become_new_user(false); >> + daemon_become_new_user(false, false); >> >> if (argc - optind < 1) { >> ovs_fatal(0, "at least one vconn argument required; " >> @@ -148,7 +148,7 @@ main(int argc, char *argv[]) >> ovs_fatal(0, "no active or passive switch connections"); >> } >> >> - daemonize_start(false); >> + daemonize_start(false, false); >> >> retval = unixctl_server_create(unixctl_path, &unixctl); >> if (retval) { >> diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in >> index 9569265fcb..a6a4a24606 100644 >> --- a/vswitchd/ovs-vswitchd.8.in >> +++ b/vswitchd/ovs-vswitchd.8.in >> @@ -81,6 +81,14 @@ unavailable or unsuccessful. >> .SS "DPDK Options" >> For details on initializing \fBovs\-vswitchd\fR to use DPDK ports, >> refer to the documentation or \fBovs\-vswitchd.conf.db\fR(5). >> +.SS "DPDK HW Access Options" >> +.IP "\fB\-\-hw\-rawio\-access\fR" >> +Tells \fBovs\-vswitchd\fR to retain the \fBCAP_SYS_RAWIO\fR capability, >> +to allow userspace drivers access to raw hardware memory. This will >> +also allow the \fBovs\-vswitchd\fR daemon to call \fBiopl()\fR and >> +\fBioperm()\fR functions to set port access. This is a \fBvery\fR > > The main usecase for this OVS option would be mlx5 + full hwoffload in > DPDK, so if giving an example in the manual, I'd rather mention it. > > On the other hand, iopl() is (was?) used in DPDK for x86 + virtio pmd > with virtio pre v1. > And here, I don't expect many people plugging OVS on top of this. > > Do you have another case in mind for iopl()/ioperm()? No other case. The main one I know of is mlx5 + full offload, and that does call ioperm/iopl, iiuc. > The rest lgtm. > > Thanks for working on this Aaron. > >> +powerful capability, so generally only enable as needed for specific >> +hardware. >> .SS "Daemon Options" >> .ds DD \ >> \fBovs\-vswitchd\fR detaches only after it has connected to the \
Flavio Leitner <fbl@sysclose.org> writes: > On Tue, Mar 07, 2023 at 02:32:04PM +0100, David Marchand wrote: >> On Tue, Mar 7, 2023 at 2:06 PM Aaron Conole <aconole@redhat.com> wrote: >> > >> > Open vSwitch generally tries to let the underlying operating system >> > managed the low level details of hardware, for example DMA mapping, >> > bus arbitration, etc. However, when using DPDK, the underlying >> > operating system yields control of many of these details to userspace >> > for management. >> > >> > In the case of some DPDK port drivers, configuring rte_flow or even >> > allocating resources may require access to iopl/ioperm calls, which >> > are guarded by the CAP_SYS_RAWIO privilege on linux systems. These >> > calls are dangerous, and can allow a process to completely compromise >> > a system. However, they are needed in the case of some userspace >> > driver code which manages the hardware (for example, the mlx >> > implementation of backend support for rte_flow). >> > >> > Here, we create an opt-in flag passed to the command line to allow >> > this access. We need to do this before ever accessing the database, >> > because we want to drop all privileges asap, and cannot wait for >> > a connection to the database to be established and functional before >> > dropping. There may be distribution specific ways to do capability >> > management as well (using for example, systemd), but they are not >> > as universal to the vswitchd as a flag. >> > >> > Reviewed-by: Simon Horman <simon.horman@corigine.com> >> > Signed-off-by: Aaron Conole <aconole@redhat.com> >> > --- >> > NEWS | 4 ++++ >> > lib/daemon-unix.c | 29 +++++++++++++++++++++-------- >> > lib/daemon-windows.c | 3 ++- >> > lib/daemon.c | 2 +- >> > lib/daemon.h | 4 ++-- >> > ovsdb/ovsdb-client.c | 6 +++--- >> > ovsdb/ovsdb-server.c | 4 ++-- >> > tests/test-netflow.c | 2 +- >> > tests/test-sflow.c | 2 +- >> > tests/test-unixctl.c | 2 +- >> > utilities/ovs-ofctl.c | 4 ++-- >> > utilities/ovs-testcontroller.c | 4 ++-- >> > vswitchd/ovs-vswitchd.8.in | 8 ++++++++ >> > vswitchd/ovs-vswitchd.c | 11 ++++++++++- >> > 14 files changed, 60 insertions(+), 25 deletions(-) >> > >> > diff --git a/NEWS b/NEWS >> > index 85b3496214..65f35dcdd5 100644 >> > --- a/NEWS >> > +++ b/NEWS >> > @@ -10,6 +10,10 @@ Post-v3.1.0 >> > in order to create OVSDB sockets with access mode of 0770. >> > - QoS: >> > * Added new configuration option 'jitter' for a linux-netem QoS type. >> > + - DPDK: >> > + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started >> > + with the --hw-rawio-access command line option. This allows the >> > + process extra privileges when mapping physical interconnect memory. >> > >> > >> > v3.1.0 - 16 Feb 2023 >> > diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c >> > index 1a7ba427d7..a080facddc 100644 >> > --- a/lib/daemon-unix.c >> > +++ b/lib/daemon-unix.c >> > @@ -88,7 +88,8 @@ static bool switch_user = false; >> > static uid_t uid; >> > static gid_t gid; >> > static char *user = NULL; >> > -static void daemon_become_new_user__(bool access_datapath); >> > +static void daemon_become_new_user__(bool access_datapath, >> > + bool access_hardware_ports); >> > >> > static void check_already_running(void); >> > static int lock_pidfile(FILE *, int command); >> > @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid) >> > * daemonize_complete()) or that it failed to start up (by exiting with a >> > * nonzero exit code). */ >> > void >> > -daemonize_start(bool access_datapath) >> > +daemonize_start(bool access_datapath, bool access_hardware_ports) >> > { >> > assert_single_threaded(); >> > daemonize_fd = -1; >> > >> > if (switch_user) { >> > - daemon_become_new_user__(access_datapath); >> > + daemon_become_new_user__(access_datapath, access_hardware_ports); >> > switch_user = false; >> > } >> > >> > @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void) >> > /* Linux specific implementation of daemon_become_new_user() >> > * using libcap-ng. */ >> > static void >> > -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) >> > +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED, >> > + bool access_hardware_ports OVS_UNUSED) >> > { >> > #if defined __linux__ && HAVE_LIBCAPNG >> > int ret; >> > @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) >> > ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN) >> > || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW) >> > || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST); >> > +#ifdef DPDK_NETDEV >> > + if (access_hardware_ports && !ret) { >> > + ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO); >> > + VLOG_INFO("CAP_SYS_RAWIO enabled."); > > Perhaps "The Linux capability CAP_SYS_RAWIO is enabled." > >> > + } >> > +#else >> > + if (access_hardware_ports) { >> > + VLOG_WARN("Dropped CAP_SYS_RAWIO request (no drivers)."); >> >> Does it mean we drop the capability? or do we drop the drop? >> I don't really have a better wording but the current log had me >> reading it a few times. > > Same here. Perhaps: > "No driver requires Linux capability CAP_SYS_RAWIO, disabling it." Okay - I'll use these log messages. > my $0.02 > fbl
diff --git a/NEWS b/NEWS index 85b3496214..65f35dcdd5 100644 --- a/NEWS +++ b/NEWS @@ -10,6 +10,10 @@ Post-v3.1.0 in order to create OVSDB sockets with access mode of 0770. - QoS: * Added new configuration option 'jitter' for a linux-netem QoS type. + - DPDK: + * ovs-vswitchd will keep the CAP_SYS_RAWIO capability when started + with the --hw-rawio-access command line option. This allows the + process extra privileges when mapping physical interconnect memory. v3.1.0 - 16 Feb 2023 diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c index 1a7ba427d7..a080facddc 100644 --- a/lib/daemon-unix.c +++ b/lib/daemon-unix.c @@ -88,7 +88,8 @@ static bool switch_user = false; static uid_t uid; static gid_t gid; static char *user = NULL; -static void daemon_become_new_user__(bool access_datapath); +static void daemon_become_new_user__(bool access_datapath, + bool access_hardware_ports); static void check_already_running(void); static int lock_pidfile(FILE *, int command); @@ -443,13 +444,13 @@ monitor_daemon(pid_t daemon_pid) * daemonize_complete()) or that it failed to start up (by exiting with a * nonzero exit code). */ void -daemonize_start(bool access_datapath) +daemonize_start(bool access_datapath, bool access_hardware_ports) { assert_single_threaded(); daemonize_fd = -1; if (switch_user) { - daemon_become_new_user__(access_datapath); + daemon_become_new_user__(access_datapath, access_hardware_ports); switch_user = false; } @@ -807,7 +808,8 @@ daemon_become_new_user_unix(void) /* Linux specific implementation of daemon_become_new_user() * using libcap-ng. */ static void -daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) +daemon_become_new_user_linux(bool access_datapath OVS_UNUSED, + bool access_hardware_ports OVS_UNUSED) { #if defined __linux__ && HAVE_LIBCAPNG int ret; @@ -827,6 +829,16 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) ret = capng_update(CAPNG_ADD, cap_sets, CAP_NET_ADMIN) || capng_update(CAPNG_ADD, cap_sets, CAP_NET_RAW) || capng_update(CAPNG_ADD, cap_sets, CAP_NET_BROADCAST); +#ifdef DPDK_NETDEV + if (access_hardware_ports && !ret) { + ret = capng_update(CAPNG_ADD, cap_sets, CAP_SYS_RAWIO); + VLOG_INFO("CAP_SYS_RAWIO enabled."); + } +#else + if (access_hardware_ports) { + VLOG_WARN("Dropped CAP_SYS_RAWIO request (no drivers)."); + } +#endif } } else { ret = -1; @@ -854,7 +866,7 @@ daemon_become_new_user_linux(bool access_datapath OVS_UNUSED) } static void -daemon_become_new_user__(bool access_datapath) +daemon_become_new_user__(bool access_datapath, bool access_hardware_ports) { /* If vlog file has been created, change its owner to the non-root user * as specifed by the --user option. */ @@ -862,7 +874,8 @@ daemon_become_new_user__(bool access_datapath) if (LINUX) { if (LIBCAPNG) { - daemon_become_new_user_linux(access_datapath); + daemon_become_new_user_linux(access_datapath, + access_hardware_ports); } else { VLOG_FATAL("%s: fail to downgrade user using libcap-ng. " "(libcap-ng is not configured at compile time), " @@ -877,11 +890,11 @@ daemon_become_new_user__(bool access_datapath) * However, there in case the user switch needs to be done * before daemonize_start(), the following API can be used. */ void -daemon_become_new_user(bool access_datapath) +daemon_become_new_user(bool access_datapath, bool access_hardware_ports) { assert_single_threaded(); if (switch_user) { - daemon_become_new_user__(access_datapath); + daemon_become_new_user__(access_datapath, access_hardware_ports); /* daemonize_start() should not switch user again. */ switch_user = false; } diff --git a/lib/daemon-windows.c b/lib/daemon-windows.c index 7e5f264f5b..f8055197eb 100644 --- a/lib/daemon-windows.c +++ b/lib/daemon-windows.c @@ -526,7 +526,8 @@ daemonize_complete(void) } void -daemon_become_new_user(bool access_datapath OVS_UNUSED) +daemon_become_new_user(bool access_datapath OVS_UNUSED, + bool access_hardware_ports OVS_UNUSED) { } diff --git a/lib/daemon.c b/lib/daemon.c index 3249c5ab4b..1e1c019eb1 100644 --- a/lib/daemon.c +++ b/lib/daemon.c @@ -48,7 +48,7 @@ get_detach(void) void daemonize(void) { - daemonize_start(false); + daemonize_start(false, false); daemonize_complete(); } diff --git a/lib/daemon.h b/lib/daemon.h index 0941574963..42372d1463 100644 --- a/lib/daemon.h +++ b/lib/daemon.h @@ -167,10 +167,10 @@ void set_detach(void); bool get_detach(void); void daemon_save_fd(int fd); void daemonize(void); -void daemonize_start(bool access_datapath); +void daemonize_start(bool access_datapath, bool access_hardware_ports); void daemonize_complete(void); void daemon_set_new_user(const char * user_spec); -void daemon_become_new_user(bool access_datapath); +void daemon_become_new_user(bool access_datapath, bool access_hardware_ports); void daemon_usage(void); void daemon_disable_self_confinement(void); bool daemon_should_self_confine(void); diff --git a/ovsdb/ovsdb-client.c b/ovsdb/ovsdb-client.c index f1b8d64910..bae2c5f041 100644 --- a/ovsdb/ovsdb-client.c +++ b/ovsdb/ovsdb-client.c @@ -250,7 +250,7 @@ main(int argc, char *argv[]) parse_options(argc, argv); fatal_ignore_sigpipe(); - daemon_become_new_user(false); + daemon_become_new_user(false, false); if (optind >= argc) { ovs_fatal(0, "missing command name; use --help for help"); } @@ -1392,7 +1392,7 @@ do_monitor__(struct jsonrpc *rpc, const char *database, daemon_save_fd(STDOUT_FILENO); daemon_save_fd(STDERR_FILENO); - daemonize_start(false); + daemonize_start(false, false); if (get_detach()) { int error; @@ -2276,7 +2276,7 @@ do_lock(struct jsonrpc *rpc, const char *method, const char *lock) getting a reply of the previous request. */ daemon_save_fd(STDOUT_FILENO); - daemonize_start(false); + daemonize_start(false, false); lock_req_init(&lock_req, method, lock); if (get_detach()) { diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c index 33ca4910d7..4fea2dbda7 100644 --- a/ovsdb/ovsdb-server.c +++ b/ovsdb/ovsdb-server.c @@ -341,7 +341,7 @@ main(int argc, char *argv[]) &run_command, &sync_from, &sync_exclude, &active); is_backup = sync_from && !active; - daemon_become_new_user(false); + daemon_become_new_user(false, false); /* Create and initialize 'config_tmpfile' as a temporary file to hold * ovsdb-server's most basic configuration, and then save our initial @@ -359,7 +359,7 @@ main(int argc, char *argv[]) save_config__(config_tmpfile, &remotes, &db_filenames, sync_from, sync_exclude, is_backup); - daemonize_start(false); + daemonize_start(false, false); /* Load the saved config. */ load_config(config_tmpfile, &remotes, &db_filenames, &sync_from, diff --git a/tests/test-netflow.c b/tests/test-netflow.c index d2322d4509..7f89cfcae0 100644 --- a/tests/test-netflow.c +++ b/tests/test-netflow.c @@ -195,7 +195,7 @@ test_netflow_main(int argc, char *argv[]) } daemon_save_fd(STDOUT_FILENO); - daemonize_start(false); + daemonize_start(false, false); error = unixctl_server_create(NULL, &server); if (error) { diff --git a/tests/test-sflow.c b/tests/test-sflow.c index 460d4d6c54..3c617bdd16 100644 --- a/tests/test-sflow.c +++ b/tests/test-sflow.c @@ -709,7 +709,7 @@ test_sflow_main(int argc, char *argv[]) } daemon_save_fd(STDOUT_FILENO); - daemonize_start(false); + daemonize_start(false, false); error = unixctl_server_create(NULL, &server); if (error) { diff --git a/tests/test-unixctl.c b/tests/test-unixctl.c index 3eadf54cd9..9e89827895 100644 --- a/tests/test-unixctl.c +++ b/tests/test-unixctl.c @@ -83,7 +83,7 @@ test_unixctl_main(int argc, char *argv[]) fatal_ignore_sigpipe(); parse_options(&argc, &argv, &unixctl_path); - daemonize_start(false); + daemonize_start(false, false); int retval = unixctl_server_create(unixctl_path, &unixctl); if (retval) { exit(EXIT_FAILURE); diff --git a/utilities/ovs-ofctl.c b/utilities/ovs-ofctl.c index eabec18a36..f81f5f759a 100644 --- a/utilities/ovs-ofctl.c +++ b/utilities/ovs-ofctl.c @@ -173,7 +173,7 @@ main(int argc, char *argv[]) ctx.argc = argc - optind; ctx.argv = argv + optind; - daemon_become_new_user(false); + daemon_become_new_user(false, false); if (read_only) { ovs_cmdl_run_command_read_only(&ctx, get_all_commands()); } else { @@ -2127,7 +2127,7 @@ monitor_vconn(struct vconn *vconn, bool reply_to_echo_requests, int error; daemon_save_fd(STDERR_FILENO); - daemonize_start(false); + daemonize_start(false, false); error = unixctl_server_create(unixctl_path, &server); if (error) { ovs_fatal(error, "failed to create unixctl server"); diff --git a/utilities/ovs-testcontroller.c b/utilities/ovs-testcontroller.c index b489ff5fc7..9f2fbfdf51 100644 --- a/utilities/ovs-testcontroller.c +++ b/utilities/ovs-testcontroller.c @@ -109,7 +109,7 @@ main(int argc, char *argv[]) parse_options(argc, argv); fatal_ignore_sigpipe(); - daemon_become_new_user(false); + daemon_become_new_user(false, false); if (argc - optind < 1) { ovs_fatal(0, "at least one vconn argument required; " @@ -148,7 +148,7 @@ main(int argc, char *argv[]) ovs_fatal(0, "no active or passive switch connections"); } - daemonize_start(false); + daemonize_start(false, false); retval = unixctl_server_create(unixctl_path, &unixctl); if (retval) { diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in index 9569265fcb..a6a4a24606 100644 --- a/vswitchd/ovs-vswitchd.8.in +++ b/vswitchd/ovs-vswitchd.8.in @@ -81,6 +81,14 @@ unavailable or unsuccessful. .SS "DPDK Options" For details on initializing \fBovs\-vswitchd\fR to use DPDK ports, refer to the documentation or \fBovs\-vswitchd.conf.db\fR(5). +.SS "DPDK HW Access Options" +.IP "\fB\-\-hw\-rawio\-access\fR" +Tells \fBovs\-vswitchd\fR to retain the \fBCAP_SYS_RAWIO\fR capability, +to allow userspace drivers access to raw hardware memory. This will +also allow the \fBovs\-vswitchd\fR daemon to call \fBiopl()\fR and +\fBioperm()\fR functions to set port access. This is a \fBvery\fR +powerful capability, so generally only enable as needed for specific +hardware. .SS "Daemon Options" .ds DD \ \fBovs\-vswitchd\fR detaches only after it has connected to the \ diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c index 407bfc60eb..a244d2f709 100644 --- a/vswitchd/ovs-vswitchd.c +++ b/vswitchd/ovs-vswitchd.c @@ -60,6 +60,9 @@ VLOG_DEFINE_THIS_MODULE(vswitchd); * the kernel from paging any of its memory to disk. */ static bool want_mlockall; +/* --hw-rawio-access: If set, retains CAP_SYS_RAWIO privileges. */ +static bool hw_rawio_access; + static unixctl_cb_func ovs_vswitchd_exit; static char *parse_options(int argc, char *argv[], char **unixctl_path); @@ -89,7 +92,7 @@ main(int argc, char *argv[]) remote = parse_options(argc, argv, &unixctl_path); fatal_ignore_sigpipe(); - daemonize_start(true); + daemonize_start(true, hw_rawio_access); if (want_mlockall) { #ifdef HAVE_MLOCKALL @@ -169,6 +172,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp) OPT_DPDK, SSL_OPTION_ENUMS, OPT_DUMMY_NUMA, + OPT_HW_RAWIO_ACCESS, }; static const struct option long_options[] = { {"help", no_argument, NULL, 'h'}, @@ -185,6 +189,7 @@ parse_options(int argc, char *argv[], char **unixctl_pathp) {"disable-system-route", no_argument, NULL, OPT_DISABLE_SYSTEM_ROUTE}, {"dpdk", optional_argument, NULL, OPT_DPDK}, {"dummy-numa", required_argument, NULL, OPT_DUMMY_NUMA}, + {"hw-rawio-access", no_argument, NULL, OPT_HW_RAWIO_ACCESS}, {NULL, 0, NULL, 0}, }; char *short_options = ovs_cmdl_long_options_to_short_options(long_options); @@ -249,6 +254,10 @@ parse_options(int argc, char *argv[], char **unixctl_pathp) ovs_numa_set_dummy(optarg); break; + case OPT_HW_RAWIO_ACCESS: + hw_rawio_access = true; + break; + default: abort(); }