Message ID | 20190418150528.10284-1-aconole@redhat.com |
---|---|
State | Rejected |
Headers | show |
Series | [ovs-dev] rhel: update udev rules to allow vfio access | expand |
On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote: > On some systems, it's possible that the initialization of the misc chardev > associated with /dev/vfio/vfio is delayed. This happens on machines with > large numbers of cores (at least 88+). If this delay exceeds the time > required for ovs-vswitchd to call the dpdk initialization routine, the > permissions won't be updated and the open call will return EACCES. > > To fix this, we explicitly allow global open. This means any user may > open() the vfio device before the vfio modules are initialized, thus > triggering a module load. The applications (including ovs-vswitchd) > would be pended while the module loads. This should be safe, as any user > may open the vfio device once the module is loaded anyway, since the. > module rewrites the permissions as 0666. > > Signed-off-by: Aaron Conole <aconole@redhat.com> > --- > rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules > index 8e34b2a2b..c0504ab5a 100644 > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules > @@ -1 +1,2 @@ > ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660" > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666" Wouldn't be better to fix this at the module to fix the permissions before whatever is taking long to initialize? fbl
On Thu, 18 Apr 2019 13:23:54 -0300 Flavio Leitner <fbl@sysclose.org> wrote: > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote: > > On some systems, it's possible that the initialization of the misc chardev > > associated with /dev/vfio/vfio is delayed. This happens on machines with > > large numbers of cores (at least 88+). If this delay exceeds the time > > required for ovs-vswitchd to call the dpdk initialization routine, the > > permissions won't be updated and the open call will return EACCES. > > > > To fix this, we explicitly allow global open. This means any user may > > open() the vfio device before the vfio modules are initialized, thus > > triggering a module load. The applications (including ovs-vswitchd) > > would be pended while the module loads. This should be safe, as any user > > may open the vfio device once the module is loaded anyway, since the. > > module rewrites the permissions as 0666. > > > > Signed-off-by: Aaron Conole <aconole@redhat.com> > > --- > > rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > index 8e34b2a2b..c0504ab5a 100644 > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > @@ -1 +1,2 @@ > > ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660" > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666" > > Wouldn't be better to fix this at the module to fix the permissions > before whatever is taking long to initialize? The stronger security policy is to not allow unprivileged users to trigger an operation which loads a module, so no, upper level system policy should relax that as necessary, but not the module. VFIO is unavailable until enabled through privileged entities, so there's some degree that ovs is "jumping the gun" in making use of it, afaict. Thanks, Alex
On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote: > On Thu, 18 Apr 2019 13:23:54 -0300 > Flavio Leitner <fbl@sysclose.org> wrote: > > > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote: > > > On some systems, it's possible that the initialization of the misc chardev > > > associated with /dev/vfio/vfio is delayed. This happens on machines with > > > large numbers of cores (at least 88+). If this delay exceeds the time > > > required for ovs-vswitchd to call the dpdk initialization routine, the > > > permissions won't be updated and the open call will return EACCES. > > > > > > To fix this, we explicitly allow global open. This means any user may > > > open() the vfio device before the vfio modules are initialized, thus > > > triggering a module load. The applications (including ovs-vswitchd) > > > would be pended while the module loads. This should be safe, as any user > > > may open the vfio device once the module is loaded anyway, since the. > > > module rewrites the permissions as 0666. > > > > > > Signed-off-by: Aaron Conole <aconole@redhat.com> > > > --- > > > rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > index 8e34b2a2b..c0504ab5a 100644 > > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > @@ -1 +1,2 @@ > > > ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660" > > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666" > > > > Wouldn't be better to fix this at the module to fix the permissions > > before whatever is taking long to initialize? > > The stronger security policy is to not allow unprivileged users to > trigger an operation which loads a module, so no, upper level system > policy should relax that as necessary, but not the module. VFIO is > unavailable until enabled through privileged entities, so there's some > degree that ovs is "jumping the gun" in making use of it, afaict. Ok, but if the machine has few cores, then the permission is set to 0666 and I assumed that it was done by the module itself. Assuming that it's true, then why can't the module fix the permission before? Another thing is that when the module is ready and the event is sent out, what holds OVS for not trying to open and get EACCESS before udev is triggered to fix the device permission? fbl
Alex Williamson <alex.williamson@redhat.com> writes: > On Thu, 18 Apr 2019 13:23:54 -0300 > Flavio Leitner <fbl@sysclose.org> wrote: > >> On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote: >> > On some systems, it's possible that the initialization of the misc chardev >> > associated with /dev/vfio/vfio is delayed. This happens on machines with >> > large numbers of cores (at least 88+). If this delay exceeds the time >> > required for ovs-vswitchd to call the dpdk initialization routine, the >> > permissions won't be updated and the open call will return EACCES. >> > >> > To fix this, we explicitly allow global open. This means any user may >> > open() the vfio device before the vfio modules are initialized, thus >> > triggering a module load. The applications (including ovs-vswitchd) >> > would be pended while the module loads. This should be safe, as any user >> > may open the vfio device once the module is loaded anyway, since the. >> > module rewrites the permissions as 0666. >> > >> > Signed-off-by: Aaron Conole <aconole@redhat.com> >> > --- >> > rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + >> > 1 file changed, 1 insertion(+) >> > >> > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules >> > b/rhel/usr_lib_udev_rules.d_91-vfio.rules >> > index 8e34b2a2b..c0504ab5a 100644 >> > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules >> > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules >> > @@ -1 +1,2 @@ >> > ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660" >> > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666" >> >> Wouldn't be better to fix this at the module to fix the permissions >> before whatever is taking long to initialize? It's a bit of chicken and egg. The module needs to be ready before we can open it, and we open it before it's ready because of the way DPDK is designed. There might be a better way of rewriting the DPDK initialization; it's a longer path, and I'm not sure it's really worth it but maybe it's better? > The stronger security policy is to not allow unprivileged users to > trigger an operation which loads a module, so no, upper level system > policy should relax that as necessary, but not the module. VFIO is > unavailable until enabled through privileged entities, so there's some > degree that ovs is "jumping the gun" in making use of it, afaict. Thanks. I agree, system policy is the right place to relax the requirement. I also agree, the way DPDK gets initialized it would be prematurely asking for vfio (since the module isn't even finished loading). I guess another approach could be to patch where the open() call happens, to retry on EACCES for a specific amount of time (delaying startup but allowing the module load to complete). It's a different sort of hack. Maybe it's an acceptable compromise? > Thanks, > > Alex
Flavio Leitner <fbl@sysclose.org> writes: > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote: >> On Thu, 18 Apr 2019 13:23:54 -0300 >> Flavio Leitner <fbl@sysclose.org> wrote: >> >> > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote: >> > > On some systems, it's possible that the initialization of the misc chardev >> > > associated with /dev/vfio/vfio is delayed. This happens on machines with >> > > large numbers of cores (at least 88+). If this delay exceeds the time >> > > required for ovs-vswitchd to call the dpdk initialization routine, the >> > > permissions won't be updated and the open call will return EACCES. >> > > >> > > To fix this, we explicitly allow global open. This means any user may >> > > open() the vfio device before the vfio modules are initialized, thus >> > > triggering a module load. The applications (including ovs-vswitchd) >> > > would be pended while the module loads. This should be safe, as any user >> > > may open the vfio device once the module is loaded anyway, since the. >> > > module rewrites the permissions as 0666. >> > > >> > > Signed-off-by: Aaron Conole <aconole@redhat.com> >> > > --- >> > > rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + >> > > 1 file changed, 1 insertion(+) >> > > >> > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules >> > > b/rhel/usr_lib_udev_rules.d_91-vfio.rules >> > > index 8e34b2a2b..c0504ab5a 100644 >> > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules >> > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules >> > > @@ -1 +1,2 @@ >> > > ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660" >> > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666" >> > >> > Wouldn't be better to fix this at the module to fix the permissions >> > before whatever is taking long to initialize? >> >> The stronger security policy is to not allow unprivileged users to >> trigger an operation which loads a module, so no, upper level system >> policy should relax that as necessary, but not the module. VFIO is >> unavailable until enabled through privileged entities, so there's some >> degree that ovs is "jumping the gun" in making use of it, afaict. > > Ok, but if the machine has few cores, then the permission is set to > 0666 and I assumed that it was done by the module itself. Assuming > that it's true, then why can't the module fix the permission before? > > Another thing is that when the module is ready and the event is sent > out, what holds OVS for not trying to open and get EACCESS before > udev is triggered to fix the device permission? The device is there from the start because of the MODULE_ALIAS and MODULE_ALIAS_MISCDEV properties; it gets created on the devfs before it's loaded. Then, on the first open() call, if it isn't loaded the misc chardev driver will block while loading the module. So from the start it's in existence. The VFS layer is what returns the EACCES, and that prevents the module load / blocking behavior we want. If OvS is running as root, because the device is 0600, then the module will get loaded. > fbl
On Thu, 18 Apr 2019 13:56:23 -0300 Flavio Leitner <fbl@sysclose.org> wrote: > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote: > > On Thu, 18 Apr 2019 13:23:54 -0300 > > Flavio Leitner <fbl@sysclose.org> wrote: > > > > > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote: > > > > On some systems, it's possible that the initialization of the misc chardev > > > > associated with /dev/vfio/vfio is delayed. This happens on machines with > > > > large numbers of cores (at least 88+). If this delay exceeds the time > > > > required for ovs-vswitchd to call the dpdk initialization routine, the > > > > permissions won't be updated and the open call will return EACCES. > > > > > > > > To fix this, we explicitly allow global open. This means any user may > > > > open() the vfio device before the vfio modules are initialized, thus > > > > triggering a module load. The applications (including ovs-vswitchd) > > > > would be pended while the module loads. This should be safe, as any user > > > > may open the vfio device once the module is loaded anyway, since the. > > > > module rewrites the permissions as 0666. > > > > > > > > Signed-off-by: Aaron Conole <aconole@redhat.com> > > > > --- > > > > rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > > index 8e34b2a2b..c0504ab5a 100644 > > > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > > @@ -1 +1,2 @@ > > > > ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660" > > > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666" > > > > > > Wouldn't be better to fix this at the module to fix the permissions > > > before whatever is taking long to initialize? > > > > The stronger security policy is to not allow unprivileged users to > > trigger an operation which loads a module, so no, upper level system > > policy should relax that as necessary, but not the module. VFIO is > > unavailable until enabled through privileged entities, so there's some > > degree that ovs is "jumping the gun" in making use of it, afaict. > > Ok, but if the machine has few cores, then the permission is set to > 0666 and I assumed that it was done by the module itself. Assuming > that it's true, then why can't the module fix the permission before? Because if the file were initially mode 0666 then an unprivileged user accessing the file will load the module, which as discussed above is a weaker security policy. Any case of the permissions being "fixed" by the module load, directed by a privileged entity, asynchronous to ovs using this interface is a race, no matter how early the module manages to fix those permissions. > Another thing is that when the module is ready and the event is sent > out, what holds OVS for not trying to open and get EACCESS before > udev is triggered to fix the device permission? If there were a race, could ovs ever run before udev on system startup? Probably not. Ideally perhaps a cleaner solution might be an explicit dependency on the vfio module specific to ovs startup rather than changing a system policy, but it really depends on the context and use cases. Thanks, Alex
On Thu, Apr 18, 2019 at 12:06:57PM -0600, Alex Williamson wrote: > On Thu, 18 Apr 2019 13:56:23 -0300 > Flavio Leitner <fbl@sysclose.org> wrote: > > > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote: > > > On Thu, 18 Apr 2019 13:23:54 -0300 > > > Flavio Leitner <fbl@sysclose.org> wrote: > > > > > > > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote: > > > > > On some systems, it's possible that the initialization of the misc chardev > > > > > associated with /dev/vfio/vfio is delayed. This happens on machines with > > > > > large numbers of cores (at least 88+). If this delay exceeds the time > > > > > required for ovs-vswitchd to call the dpdk initialization routine, the > > > > > permissions won't be updated and the open call will return EACCES. > > > > > > > > > > To fix this, we explicitly allow global open. This means any user may > > > > > open() the vfio device before the vfio modules are initialized, thus > > > > > triggering a module load. The applications (including ovs-vswitchd) > > > > > would be pended while the module loads. This should be safe, as any user > > > > > may open the vfio device once the module is loaded anyway, since the. > > > > > module rewrites the permissions as 0666. > > > > > > > > > > Signed-off-by: Aaron Conole <aconole@redhat.com> > > > > > --- > > > > > rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > > > index 8e34b2a2b..c0504ab5a 100644 > > > > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > > > @@ -1 +1,2 @@ > > > > > ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660" > > > > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666" > > > > > > > > Wouldn't be better to fix this at the module to fix the permissions > > > > before whatever is taking long to initialize? > > > > > > The stronger security policy is to not allow unprivileged users to > > > trigger an operation which loads a module, so no, upper level system > > > policy should relax that as necessary, but not the module. VFIO is > > > unavailable until enabled through privileged entities, so there's some > > > degree that ovs is "jumping the gun" in making use of it, afaict. > > > > Ok, but if the machine has few cores, then the permission is set to > > 0666 and I assumed that it was done by the module itself. Assuming > > that it's true, then why can't the module fix the permission before? > > Because if the file were initially mode 0666 then an unprivileged user > accessing the file will load the module, which as discussed above is a > weaker security policy. Agreed, but I was referring to after the proper trigger. > Any case of the permissions being "fixed" by > the module load, directed by a privileged entity, asynchronous to ovs > using this interface is a race, no matter how early the module manages > to fix those permissions. True. > > Another thing is that when the module is ready and the event is sent > > out, what holds OVS for not trying to open and get EACCESS before > > udev is triggered to fix the device permission? > > If there were a race, could ovs ever run before udev on system > startup? Probably not. It does wait, but only for the udev to settle, which means if the module has not triggered an event until that time, OVS will not wait and we still have a race. > Ideally perhaps a cleaner solution might be an > explicit dependency on the vfio module specific to ovs startup rather > than changing a system policy, but it really depends on the context and > use cases. Thanks, It does have. The driverctl will bind the devices to vfio-pci but the problem is that which signal we should rely on to know when the vfio module is still initializing, or failed or finished. BTW, this is not specific to OVS because any DPDK based app using VFIO would face the issue. fbl
On Thu, 18 Apr 2019 15:50:43 -0300 Flavio Leitner <fbl@sysclose.org> wrote: > On Thu, Apr 18, 2019 at 12:06:57PM -0600, Alex Williamson wrote: > > On Thu, 18 Apr 2019 13:56:23 -0300 > > Flavio Leitner <fbl@sysclose.org> wrote: > > > > > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote: > > > > On Thu, 18 Apr 2019 13:23:54 -0300 > > > > Flavio Leitner <fbl@sysclose.org> wrote: > > > > > > > > > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote: > > > > > > On some systems, it's possible that the initialization of the misc chardev > > > > > > associated with /dev/vfio/vfio is delayed. This happens on machines with > > > > > > large numbers of cores (at least 88+). If this delay exceeds the time > > > > > > required for ovs-vswitchd to call the dpdk initialization routine, the > > > > > > permissions won't be updated and the open call will return EACCES. > > > > > > > > > > > > To fix this, we explicitly allow global open. This means any user may > > > > > > open() the vfio device before the vfio modules are initialized, thus > > > > > > triggering a module load. The applications (including ovs-vswitchd) > > > > > > would be pended while the module loads. This should be safe, as any user > > > > > > may open the vfio device once the module is loaded anyway, since the. > > > > > > module rewrites the permissions as 0666. > > > > > > > > > > > > Signed-off-by: Aaron Conole <aconole@redhat.com> > > > > > > --- > > > > > > rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + > > > > > > 1 file changed, 1 insertion(+) > > > > > > > > > > > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > > > > index 8e34b2a2b..c0504ab5a 100644 > > > > > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > > > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules > > > > > > @@ -1 +1,2 @@ > > > > > > ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660" > > > > > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666" > > > > > > > > > > Wouldn't be better to fix this at the module to fix the permissions > > > > > before whatever is taking long to initialize? > > > > > > > > The stronger security policy is to not allow unprivileged users to > > > > trigger an operation which loads a module, so no, upper level system > > > > policy should relax that as necessary, but not the module. VFIO is > > > > unavailable until enabled through privileged entities, so there's some > > > > degree that ovs is "jumping the gun" in making use of it, afaict. > > > > > > Ok, but if the machine has few cores, then the permission is set to > > > 0666 and I assumed that it was done by the module itself. Assuming > > > that it's true, then why can't the module fix the permission before? > > > > Because if the file were initially mode 0666 then an unprivileged user > > accessing the file will load the module, which as discussed above is a > > weaker security policy. > > Agreed, but I was referring to after the proper trigger. > > > Any case of the permissions being "fixed" by > > the module load, directed by a privileged entity, asynchronous to ovs > > using this interface is a race, no matter how early the module manages > > to fix those permissions. > > True. > > > > Another thing is that when the module is ready and the event is sent > > > out, what holds OVS for not trying to open and get EACCESS before > > > udev is triggered to fix the device permission? > > > > If there were a race, could ovs ever run before udev on system > > startup? Probably not. > > It does wait, but only for the udev to settle, which means if the > module has not triggered an event until that time, OVS will not wait > and we still have a race. But udev isn't waiting on the module to trigger an event, the module contains a MODULE_ALIAS, so I believe it's just the static processing of the modules.alias that triggers the event. > > Ideally perhaps a cleaner solution might be an > > explicit dependency on the vfio module specific to ovs startup rather > > than changing a system policy, but it really depends on the context and > > use cases. Thanks, > > It does have. The driverctl will bind the devices to vfio-pci but > the problem is that which signal we should rely on to know when > the vfio module is still initializing, or failed or finished. What signal/mechanism is being used currently? If driverctl is asked to set a driver override it does: 1) if module is not loaded, modprobe 2) unbinds device from existing driver, if any 3) sets driver_override 4) triggers drivers_probe 5) tests if device is bound to a driver, any driver There are certainly some deficiencies here, unbinding the device before setting the driver_override leaves the device open to getting bound by the wrong driver, and the verification in the last step could be more specific in testing for binding to the correct driver, but step #1 is the modprobe of the driver, which should be a synchronous operation. We shouldn't be able to complete a 'driverctl set-override $DEV vfio-pci' without vfio being initialized, afaict. Thanks, Alex
On Thu, Apr 18, 2019 at 01:46:22PM -0600, Alex Williamson wrote: > On Thu, 18 Apr 2019 15:50:43 -0300 > Flavio Leitner <fbl@sysclose.org> wrote: > > > On Thu, Apr 18, 2019 at 12:06:57PM -0600, Alex Williamson wrote: > > > On Thu, 18 Apr 2019 13:56:23 -0300 > > > Flavio Leitner <fbl@sysclose.org> wrote: > > > > > > > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote: > > > > > On Thu, 18 Apr 2019 13:23:54 -0300 > > > > > Flavio Leitner <fbl@sysclose.org> wrote: > > > > Another thing is that when the module is ready and the event is sent > > > > out, what holds OVS for not trying to open and get EACCESS before > > > > udev is triggered to fix the device permission? > > > > > > If there were a race, could ovs ever run before udev on system > > > startup? Probably not. > > > > It does wait, but only for the udev to settle, which means if the > > module has not triggered an event until that time, OVS will not wait > > and we still have a race. > > But udev isn't waiting on the module to trigger an event, the module > contains a MODULE_ALIAS, so I believe it's just the static processing > of the modules.alias that triggers the event. What I am saying is that driverctl will trigger load the module and bind the device, later on systemd will trigger OVS service which waits udev to settle, but none of that guarantees that the permissions are updated when OVS is initializing, see below. > > > Ideally perhaps a cleaner solution might be an > > > explicit dependency on the vfio module specific to ovs startup rather > > > than changing a system policy, but it really depends on the context and > > > use cases. Thanks, > > > > It does have. The driverctl will bind the devices to vfio-pci but > > the problem is that which signal we should rely on to know when > > the vfio module is still initializing, or failed or finished. > > What signal/mechanism is being used currently? If driverctl is asked > to set a driver override it does: > > 1) if module is not loaded, modprobe > 2) unbinds device from existing driver, if any > 3) sets driver_override > 4) triggers drivers_probe > 5) tests if device is bound to a driver, any driver > > There are certainly some deficiencies here, unbinding the device before > setting the driver_override leaves the device open to getting bound by > the wrong driver, and the verification in the last step could be more > specific in testing for binding to the correct driver, but step #1 is > the modprobe of the driver, which should be a synchronous operation. > We shouldn't be able to complete a 'driverctl set-override $DEV > vfio-pci' without vfio being initialized, afaict. Thanks, Right, sounds like systemd is starting openvswitch service before the driverctl is done with the devices. fbl
Flavio Leitner <fbl@sysclose.org> writes: > On Thu, Apr 18, 2019 at 01:46:22PM -0600, Alex Williamson wrote: >> On Thu, 18 Apr 2019 15:50:43 -0300 >> Flavio Leitner <fbl@sysclose.org> wrote: >> >> > On Thu, Apr 18, 2019 at 12:06:57PM -0600, Alex Williamson wrote: >> > > On Thu, 18 Apr 2019 13:56:23 -0300 >> > > Flavio Leitner <fbl@sysclose.org> wrote: >> > > >> > > > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote: >> > > > > On Thu, 18 Apr 2019 13:23:54 -0300 >> > > > > Flavio Leitner <fbl@sysclose.org> wrote: >> > > > Another thing is that when the module is ready and the event is sent >> > > > out, what holds OVS for not trying to open and get EACCESS before >> > > > udev is triggered to fix the device permission? >> > > >> > > If there were a race, could ovs ever run before udev on system >> > > startup? Probably not. >> > >> > It does wait, but only for the udev to settle, which means if the >> > module has not triggered an event until that time, OVS will not wait >> > and we still have a race. >> >> But udev isn't waiting on the module to trigger an event, the module >> contains a MODULE_ALIAS, so I believe it's just the static processing >> of the modules.alias that triggers the event. > > What I am saying is that driverctl will trigger load the module and > bind the device, later on systemd will trigger OVS service which > waits udev to settle, but none of that guarantees that the permissions > are updated when OVS is initializing, see below. > >> > > Ideally perhaps a cleaner solution might be an >> > > explicit dependency on the vfio module specific to ovs startup rather >> > > than changing a system policy, but it really depends on the context and >> > > use cases. Thanks, >> > >> > It does have. The driverctl will bind the devices to vfio-pci but >> > the problem is that which signal we should rely on to know when >> > the vfio module is still initializing, or failed or finished. >> >> What signal/mechanism is being used currently? If driverctl is asked >> to set a driver override it does: >> >> 1) if module is not loaded, modprobe >> 2) unbinds device from existing driver, if any >> 3) sets driver_override >> 4) triggers drivers_probe >> 5) tests if device is bound to a driver, any driver >> >> There are certainly some deficiencies here, unbinding the device before >> setting the driver_override leaves the device open to getting bound by >> the wrong driver, and the verification in the last step could be more >> specific in testing for binding to the correct driver, but step #1 is >> the modprobe of the driver, which should be a synchronous operation. >> We shouldn't be able to complete a 'driverctl set-override $DEV >> vfio-pci' without vfio being initialized, afaict. Thanks, > > Right, sounds like systemd is starting openvswitch service before > the driverctl is done with the devices. I'm not sure. The ordering could be a problem. Perhaps we could try adding: After=basic.target for the ovs-vswitchd.service if we have a machine that exhibits this behavior, but I don't know if it will resolve the race. There is some kind of strange ordering looking at: https://www.freedesktop.org/software/systemd/man/systemd.special.html and https://www.freedesktop.org/software/systemd/man/bootup.html# I can't find how network.target dependency really works w.r.t. ordering and the driverctl+basic.target services. > fbl
diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules index 8e34b2a2b..c0504ab5a 100644 --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules @@ -1 +1,2 @@ ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660" +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"
On some systems, it's possible that the initialization of the misc chardev associated with /dev/vfio/vfio is delayed. This happens on machines with large numbers of cores (at least 88+). If this delay exceeds the time required for ovs-vswitchd to call the dpdk initialization routine, the permissions won't be updated and the open call will return EACCES. To fix this, we explicitly allow global open. This means any user may open() the vfio device before the vfio modules are initialized, thus triggering a module load. The applications (including ovs-vswitchd) would be pended while the module loads. This should be safe, as any user may open the vfio device once the module is loaded anyway, since the. module rewrites the permissions as 0666. Signed-off-by: Aaron Conole <aconole@redhat.com> --- rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 + 1 file changed, 1 insertion(+)