diff mbox series

[ovs-dev] rhel: update udev rules to allow vfio access

Message ID 20190418150528.10284-1-aconole@redhat.com
State Rejected
Headers show
Series [ovs-dev] rhel: update udev rules to allow vfio access | expand

Commit Message

Aaron Conole April 18, 2019, 3:05 p.m. UTC
On some systems, it's possible that the initialization of the misc chardev
associated with /dev/vfio/vfio is delayed.  This happens on machines with
large numbers of cores (at least 88+).  If this delay exceeds the time
required for ovs-vswitchd to call the dpdk initialization routine, the
permissions won't be updated and the open call will return EACCES.

To fix this, we explicitly allow global open.  This means any user may
open() the vfio device before the vfio modules are initialized, thus
triggering a module load.  The applications (including ovs-vswitchd)
would be pended while the module loads.  This should be safe, as any user
may open the vfio device once the module is loaded anyway, since the.
module rewrites the permissions as 0666.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
 1 file changed, 1 insertion(+)

Comments

Li,Rongqing via dev April 18, 2019, 4:23 p.m. UTC | #1
On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote:
> On some systems, it's possible that the initialization of the misc chardev
> associated with /dev/vfio/vfio is delayed.  This happens on machines with
> large numbers of cores (at least 88+).  If this delay exceeds the time
> required for ovs-vswitchd to call the dpdk initialization routine, the
> permissions won't be updated and the open call will return EACCES.
> 
> To fix this, we explicitly allow global open.  This means any user may
> open() the vfio device before the vfio modules are initialized, thus
> triggering a module load.  The applications (including ovs-vswitchd)
> would be pended while the module loads.  This should be safe, as any user
> may open the vfio device once the module is loaded anyway, since the.
> module rewrites the permissions as 0666.
> 
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> ---
>  rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> index 8e34b2a2b..c0504ab5a 100644
> --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules
> +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> @@ -1 +1,2 @@
>  ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660"
> +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"

Wouldn't be better to fix this at the module to fix the permissions
before whatever is taking long to initialize?

fbl
Alex Williamson April 18, 2019, 4:43 p.m. UTC | #2
On Thu, 18 Apr 2019 13:23:54 -0300
Flavio Leitner <fbl@sysclose.org> wrote:

> On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote:
> > On some systems, it's possible that the initialization of the misc chardev
> > associated with /dev/vfio/vfio is delayed.  This happens on machines with
> > large numbers of cores (at least 88+).  If this delay exceeds the time
> > required for ovs-vswitchd to call the dpdk initialization routine, the
> > permissions won't be updated and the open call will return EACCES.
> > 
> > To fix this, we explicitly allow global open.  This means any user may
> > open() the vfio device before the vfio modules are initialized, thus
> > triggering a module load.  The applications (including ovs-vswitchd)
> > would be pended while the module loads.  This should be safe, as any user
> > may open the vfio device once the module is loaded anyway, since the.
> > module rewrites the permissions as 0666.
> > 
> > Signed-off-by: Aaron Conole <aconole@redhat.com>
> > ---
> >  rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > index 8e34b2a2b..c0504ab5a 100644
> > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > @@ -1 +1,2 @@
> >  ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660"
> > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"  
> 
> Wouldn't be better to fix this at the module to fix the permissions
> before whatever is taking long to initialize?

The stronger security policy is to not allow unprivileged users to
trigger an operation which loads a module, so no, upper level system
policy should relax that as necessary, but not the module.  VFIO is
unavailable until enabled through privileged entities, so there's some
degree that ovs is "jumping the gun" in making use of it, afaict.
Thanks,

Alex
Li,Rongqing via dev April 18, 2019, 4:56 p.m. UTC | #3
On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote:
> On Thu, 18 Apr 2019 13:23:54 -0300
> Flavio Leitner <fbl@sysclose.org> wrote:
> 
> > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote:
> > > On some systems, it's possible that the initialization of the misc chardev
> > > associated with /dev/vfio/vfio is delayed.  This happens on machines with
> > > large numbers of cores (at least 88+).  If this delay exceeds the time
> > > required for ovs-vswitchd to call the dpdk initialization routine, the
> > > permissions won't be updated and the open call will return EACCES.
> > > 
> > > To fix this, we explicitly allow global open.  This means any user may
> > > open() the vfio device before the vfio modules are initialized, thus
> > > triggering a module load.  The applications (including ovs-vswitchd)
> > > would be pended while the module loads.  This should be safe, as any user
> > > may open the vfio device once the module is loaded anyway, since the.
> > > module rewrites the permissions as 0666.
> > > 
> > > Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > ---
> > >  rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > index 8e34b2a2b..c0504ab5a 100644
> > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > @@ -1 +1,2 @@
> > >  ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660"
> > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"  
> > 
> > Wouldn't be better to fix this at the module to fix the permissions
> > before whatever is taking long to initialize?
> 
> The stronger security policy is to not allow unprivileged users to
> trigger an operation which loads a module, so no, upper level system
> policy should relax that as necessary, but not the module.  VFIO is
> unavailable until enabled through privileged entities, so there's some
> degree that ovs is "jumping the gun" in making use of it, afaict.

Ok, but if the machine has few cores, then the permission is set to
0666 and I assumed that it was done by the module itself. Assuming
that it's true, then why can't the module fix the permission before?

Another thing is that when the module is ready and the event is sent
out, what holds OVS for not trying to open and get EACCESS before
udev is triggered to fix the device permission?

fbl
Aaron Conole April 18, 2019, 4:57 p.m. UTC | #4
Alex Williamson <alex.williamson@redhat.com> writes:

> On Thu, 18 Apr 2019 13:23:54 -0300
> Flavio Leitner <fbl@sysclose.org> wrote:
>
>> On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote:
>> > On some systems, it's possible that the initialization of the misc chardev
>> > associated with /dev/vfio/vfio is delayed.  This happens on machines with
>> > large numbers of cores (at least 88+).  If this delay exceeds the time
>> > required for ovs-vswitchd to call the dpdk initialization routine, the
>> > permissions won't be updated and the open call will return EACCES.
>> > 
>> > To fix this, we explicitly allow global open.  This means any user may
>> > open() the vfio device before the vfio modules are initialized, thus
>> > triggering a module load.  The applications (including ovs-vswitchd)
>> > would be pended while the module loads.  This should be safe, as any user
>> > may open the vfio device once the module is loaded anyway, since the.
>> > module rewrites the permissions as 0666.
>> > 
>> > Signed-off-by: Aaron Conole <aconole@redhat.com>
>> > ---
>> >  rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
>> >  1 file changed, 1 insertion(+)
>> > 
>> > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules
>> > b/rhel/usr_lib_udev_rules.d_91-vfio.rules
>> > index 8e34b2a2b..c0504ab5a 100644
>> > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules
>> > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules
>> > @@ -1 +1,2 @@
>> >  ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660"
>> > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"  
>> 
>> Wouldn't be better to fix this at the module to fix the permissions
>> before whatever is taking long to initialize?

It's a bit of chicken and egg.  The module needs to be ready before we can
open it, and we open it before it's ready because of the way DPDK is
designed.

There might be a better way of rewriting the DPDK initialization; it's a
longer path, and I'm not sure it's really worth it but maybe it's better?

> The stronger security policy is to not allow unprivileged users to
> trigger an operation which loads a module, so no, upper level system
> policy should relax that as necessary, but not the module.  VFIO is
> unavailable until enabled through privileged entities, so there's some
> degree that ovs is "jumping the gun" in making use of it, afaict.

Thanks.  I agree, system policy is the right place to relax the
requirement.  I also agree, the way DPDK gets initialized it would be
prematurely asking for vfio (since the module isn't even finished
loading).

I guess another approach could be to patch where the open() call
happens, to retry on EACCES for a specific amount of time (delaying
startup but allowing the module load to complete).  It's a different
sort of hack.  Maybe it's an acceptable compromise?

> Thanks,
>
> Alex
Aaron Conole April 18, 2019, 5:17 p.m. UTC | #5
Flavio Leitner <fbl@sysclose.org> writes:

> On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote:
>> On Thu, 18 Apr 2019 13:23:54 -0300
>> Flavio Leitner <fbl@sysclose.org> wrote:
>> 
>> > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote:
>> > > On some systems, it's possible that the initialization of the misc chardev
>> > > associated with /dev/vfio/vfio is delayed.  This happens on machines with
>> > > large numbers of cores (at least 88+).  If this delay exceeds the time
>> > > required for ovs-vswitchd to call the dpdk initialization routine, the
>> > > permissions won't be updated and the open call will return EACCES.
>> > > 
>> > > To fix this, we explicitly allow global open.  This means any user may
>> > > open() the vfio device before the vfio modules are initialized, thus
>> > > triggering a module load.  The applications (including ovs-vswitchd)
>> > > would be pended while the module loads.  This should be safe, as any user
>> > > may open the vfio device once the module is loaded anyway, since the.
>> > > module rewrites the permissions as 0666.
>> > > 
>> > > Signed-off-by: Aaron Conole <aconole@redhat.com>
>> > > ---
>> > >  rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
>> > >  1 file changed, 1 insertion(+)
>> > > 
>> > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules
>> > > b/rhel/usr_lib_udev_rules.d_91-vfio.rules
>> > > index 8e34b2a2b..c0504ab5a 100644
>> > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules
>> > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules
>> > > @@ -1 +1,2 @@
>> > >  ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660"
>> > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"  
>> > 
>> > Wouldn't be better to fix this at the module to fix the permissions
>> > before whatever is taking long to initialize?
>> 
>> The stronger security policy is to not allow unprivileged users to
>> trigger an operation which loads a module, so no, upper level system
>> policy should relax that as necessary, but not the module.  VFIO is
>> unavailable until enabled through privileged entities, so there's some
>> degree that ovs is "jumping the gun" in making use of it, afaict.
>
> Ok, but if the machine has few cores, then the permission is set to
> 0666 and I assumed that it was done by the module itself. Assuming
> that it's true, then why can't the module fix the permission before?
>
> Another thing is that when the module is ready and the event is sent
> out, what holds OVS for not trying to open and get EACCESS before
> udev is triggered to fix the device permission?

The device is there from the start because of the
MODULE_ALIAS and MODULE_ALIAS_MISCDEV properties; it gets created on the
devfs before it's loaded.  Then, on the first open() call, if it isn't
loaded the misc chardev driver will block while loading the module.  So
from the start it's in existence.  The VFS layer is what returns the
EACCES, and that prevents the module load / blocking behavior we want.

If OvS is running as root, because the device is 0600, then the module
will get loaded.

> fbl
Alex Williamson April 18, 2019, 6:06 p.m. UTC | #6
On Thu, 18 Apr 2019 13:56:23 -0300
Flavio Leitner <fbl@sysclose.org> wrote:

> On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote:
> > On Thu, 18 Apr 2019 13:23:54 -0300
> > Flavio Leitner <fbl@sysclose.org> wrote:
> >   
> > > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote:  
> > > > On some systems, it's possible that the initialization of the misc chardev
> > > > associated with /dev/vfio/vfio is delayed.  This happens on machines with
> > > > large numbers of cores (at least 88+).  If this delay exceeds the time
> > > > required for ovs-vswitchd to call the dpdk initialization routine, the
> > > > permissions won't be updated and the open call will return EACCES.
> > > > 
> > > > To fix this, we explicitly allow global open.  This means any user may
> > > > open() the vfio device before the vfio modules are initialized, thus
> > > > triggering a module load.  The applications (including ovs-vswitchd)
> > > > would be pended while the module loads.  This should be safe, as any user
> > > > may open the vfio device once the module is loaded anyway, since the.
> > > > module rewrites the permissions as 0666.
> > > > 
> > > > Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > > ---
> > > >  rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > > index 8e34b2a2b..c0504ab5a 100644
> > > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > > @@ -1 +1,2 @@
> > > >  ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660"
> > > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"    
> > > 
> > > Wouldn't be better to fix this at the module to fix the permissions
> > > before whatever is taking long to initialize?  
> > 
> > The stronger security policy is to not allow unprivileged users to
> > trigger an operation which loads a module, so no, upper level system
> > policy should relax that as necessary, but not the module.  VFIO is
> > unavailable until enabled through privileged entities, so there's some
> > degree that ovs is "jumping the gun" in making use of it, afaict.  
> 
> Ok, but if the machine has few cores, then the permission is set to
> 0666 and I assumed that it was done by the module itself. Assuming
> that it's true, then why can't the module fix the permission before?

Because if the file were initially mode 0666 then an unprivileged user
accessing the file will load the module, which as discussed above is a
weaker security policy.  Any case of the permissions being "fixed" by
the module load, directed by a privileged entity, asynchronous to ovs
using this interface is a race, no matter how early the module manages
to fix those permissions.
 
> Another thing is that when the module is ready and the event is sent
> out, what holds OVS for not trying to open and get EACCESS before
> udev is triggered to fix the device permission?

If there were a race, could ovs ever run before udev on system
startup?  Probably not.  Ideally perhaps a cleaner solution might be an
explicit dependency on the vfio module specific to ovs startup rather
than changing a system policy, but it really depends on the context and
use cases.  Thanks,

Alex
Li,Rongqing via dev April 18, 2019, 6:50 p.m. UTC | #7
On Thu, Apr 18, 2019 at 12:06:57PM -0600, Alex Williamson wrote:
> On Thu, 18 Apr 2019 13:56:23 -0300
> Flavio Leitner <fbl@sysclose.org> wrote:
> 
> > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote:
> > > On Thu, 18 Apr 2019 13:23:54 -0300
> > > Flavio Leitner <fbl@sysclose.org> wrote:
> > >   
> > > > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote:  
> > > > > On some systems, it's possible that the initialization of the misc chardev
> > > > > associated with /dev/vfio/vfio is delayed.  This happens on machines with
> > > > > large numbers of cores (at least 88+).  If this delay exceeds the time
> > > > > required for ovs-vswitchd to call the dpdk initialization routine, the
> > > > > permissions won't be updated and the open call will return EACCES.
> > > > > 
> > > > > To fix this, we explicitly allow global open.  This means any user may
> > > > > open() the vfio device before the vfio modules are initialized, thus
> > > > > triggering a module load.  The applications (including ovs-vswitchd)
> > > > > would be pended while the module loads.  This should be safe, as any user
> > > > > may open the vfio device once the module is loaded anyway, since the.
> > > > > module rewrites the permissions as 0666.
> > > > > 
> > > > > Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > > > ---
> > > > >  rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > > > index 8e34b2a2b..c0504ab5a 100644
> > > > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > > > @@ -1 +1,2 @@
> > > > >  ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660"
> > > > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"    
> > > > 
> > > > Wouldn't be better to fix this at the module to fix the permissions
> > > > before whatever is taking long to initialize?  
> > > 
> > > The stronger security policy is to not allow unprivileged users to
> > > trigger an operation which loads a module, so no, upper level system
> > > policy should relax that as necessary, but not the module.  VFIO is
> > > unavailable until enabled through privileged entities, so there's some
> > > degree that ovs is "jumping the gun" in making use of it, afaict.  
> > 
> > Ok, but if the machine has few cores, then the permission is set to
> > 0666 and I assumed that it was done by the module itself. Assuming
> > that it's true, then why can't the module fix the permission before?
> 
> Because if the file were initially mode 0666 then an unprivileged user
> accessing the file will load the module, which as discussed above is a
> weaker security policy. 

Agreed, but I was referring to after the proper trigger.

> Any case of the permissions being "fixed" by
> the module load, directed by a privileged entity, asynchronous to ovs
> using this interface is a race, no matter how early the module manages
> to fix those permissions.

True.

> > Another thing is that when the module is ready and the event is sent
> > out, what holds OVS for not trying to open and get EACCESS before
> > udev is triggered to fix the device permission?
> 
> If there were a race, could ovs ever run before udev on system
> startup?  Probably not.

It does wait, but only for the udev to settle, which means if the
module has not triggered an event until that time, OVS will not wait
and we still have a race.

>  Ideally perhaps a cleaner solution might be an
> explicit dependency on the vfio module specific to ovs startup rather
> than changing a system policy, but it really depends on the context and
> use cases.  Thanks,

It does have. The driverctl will bind the devices to vfio-pci but
the problem is that which signal we should rely on to know when
the vfio module is still initializing, or failed or finished.

BTW, this is not specific to OVS because any DPDK based app using
VFIO would face the issue.

fbl
Alex Williamson April 18, 2019, 7:46 p.m. UTC | #8
On Thu, 18 Apr 2019 15:50:43 -0300
Flavio Leitner <fbl@sysclose.org> wrote:

> On Thu, Apr 18, 2019 at 12:06:57PM -0600, Alex Williamson wrote:
> > On Thu, 18 Apr 2019 13:56:23 -0300
> > Flavio Leitner <fbl@sysclose.org> wrote:
> >   
> > > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote:  
> > > > On Thu, 18 Apr 2019 13:23:54 -0300
> > > > Flavio Leitner <fbl@sysclose.org> wrote:
> > > >     
> > > > > On Thu, Apr 18, 2019 at 11:05:28AM -0400, Aaron Conole wrote:    
> > > > > > On some systems, it's possible that the initialization of the misc chardev
> > > > > > associated with /dev/vfio/vfio is delayed.  This happens on machines with
> > > > > > large numbers of cores (at least 88+).  If this delay exceeds the time
> > > > > > required for ovs-vswitchd to call the dpdk initialization routine, the
> > > > > > permissions won't be updated and the open call will return EACCES.
> > > > > > 
> > > > > > To fix this, we explicitly allow global open.  This means any user may
> > > > > > open() the vfio device before the vfio modules are initialized, thus
> > > > > > triggering a module load.  The applications (including ovs-vswitchd)
> > > > > > would be pended while the module loads.  This should be safe, as any user
> > > > > > may open the vfio device once the module is loaded anyway, since the.
> > > > > > module rewrites the permissions as 0666.
> > > > > > 
> > > > > > Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > > > > ---
> > > > > >  rhel/usr_lib_udev_rules.d_91-vfio.rules | 1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > > 
> > > > > > diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > > > > index 8e34b2a2b..c0504ab5a 100644
> > > > > > --- a/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > > > > +++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules
> > > > > > @@ -1 +1,2 @@
> > > > > >  ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660"
> > > > > > +ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"      
> > > > > 
> > > > > Wouldn't be better to fix this at the module to fix the permissions
> > > > > before whatever is taking long to initialize?    
> > > > 
> > > > The stronger security policy is to not allow unprivileged users to
> > > > trigger an operation which loads a module, so no, upper level system
> > > > policy should relax that as necessary, but not the module.  VFIO is
> > > > unavailable until enabled through privileged entities, so there's some
> > > > degree that ovs is "jumping the gun" in making use of it, afaict.    
> > > 
> > > Ok, but if the machine has few cores, then the permission is set to
> > > 0666 and I assumed that it was done by the module itself. Assuming
> > > that it's true, then why can't the module fix the permission before?  
> > 
> > Because if the file were initially mode 0666 then an unprivileged user
> > accessing the file will load the module, which as discussed above is a
> > weaker security policy.   
> 
> Agreed, but I was referring to after the proper trigger.
> 
> > Any case of the permissions being "fixed" by
> > the module load, directed by a privileged entity, asynchronous to ovs
> > using this interface is a race, no matter how early the module manages
> > to fix those permissions.  
> 
> True.
> 
> > > Another thing is that when the module is ready and the event is sent
> > > out, what holds OVS for not trying to open and get EACCESS before
> > > udev is triggered to fix the device permission?  
> > 
> > If there were a race, could ovs ever run before udev on system
> > startup?  Probably not.  
> 
> It does wait, but only for the udev to settle, which means if the
> module has not triggered an event until that time, OVS will not wait
> and we still have a race.

But udev isn't waiting on the module to trigger an event, the module
contains a MODULE_ALIAS, so I believe it's just the static processing
of the modules.alias that triggers the event.

> >  Ideally perhaps a cleaner solution might be an
> > explicit dependency on the vfio module specific to ovs startup rather
> > than changing a system policy, but it really depends on the context and
> > use cases.  Thanks,  
> 
> It does have. The driverctl will bind the devices to vfio-pci but
> the problem is that which signal we should rely on to know when
> the vfio module is still initializing, or failed or finished.

What signal/mechanism is being used currently?  If driverctl is asked
to set a driver override it does:

 1) if module is not loaded, modprobe
 2) unbinds device from existing driver, if any
 3) sets driver_override
 4) triggers drivers_probe
 5) tests if device is bound to a driver, any driver

There are certainly some deficiencies here, unbinding the device before
setting the driver_override leaves the device open to getting bound by
the wrong driver, and the verification in the last step could be more
specific in testing for binding to the correct driver, but step #1 is
the modprobe of the driver, which should be a synchronous operation.
We shouldn't be able to complete a 'driverctl set-override $DEV
vfio-pci' without vfio being initialized, afaict.  Thanks,

Alex
Li,Rongqing via dev April 18, 2019, 9:38 p.m. UTC | #9
On Thu, Apr 18, 2019 at 01:46:22PM -0600, Alex Williamson wrote:
> On Thu, 18 Apr 2019 15:50:43 -0300
> Flavio Leitner <fbl@sysclose.org> wrote:
> 
> > On Thu, Apr 18, 2019 at 12:06:57PM -0600, Alex Williamson wrote:
> > > On Thu, 18 Apr 2019 13:56:23 -0300
> > > Flavio Leitner <fbl@sysclose.org> wrote:
> > >   
> > > > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote:  
> > > > > On Thu, 18 Apr 2019 13:23:54 -0300
> > > > > Flavio Leitner <fbl@sysclose.org> wrote:
> > > > Another thing is that when the module is ready and the event is sent
> > > > out, what holds OVS for not trying to open and get EACCESS before
> > > > udev is triggered to fix the device permission?  
> > > 
> > > If there were a race, could ovs ever run before udev on system
> > > startup?  Probably not.  
> > 
> > It does wait, but only for the udev to settle, which means if the
> > module has not triggered an event until that time, OVS will not wait
> > and we still have a race.
> 
> But udev isn't waiting on the module to trigger an event, the module
> contains a MODULE_ALIAS, so I believe it's just the static processing
> of the modules.alias that triggers the event.

What I am saying is that driverctl will trigger load the module and
bind the device, later on systemd will trigger OVS service which
waits udev to settle, but none of that guarantees that the permissions
are updated when OVS is initializing, see below.

> > >  Ideally perhaps a cleaner solution might be an
> > > explicit dependency on the vfio module specific to ovs startup rather
> > > than changing a system policy, but it really depends on the context and
> > > use cases.  Thanks,  
> > 
> > It does have. The driverctl will bind the devices to vfio-pci but
> > the problem is that which signal we should rely on to know when
> > the vfio module is still initializing, or failed or finished.
> 
> What signal/mechanism is being used currently?  If driverctl is asked
> to set a driver override it does:
> 
>  1) if module is not loaded, modprobe
>  2) unbinds device from existing driver, if any
>  3) sets driver_override
>  4) triggers drivers_probe
>  5) tests if device is bound to a driver, any driver
> 
> There are certainly some deficiencies here, unbinding the device before
> setting the driver_override leaves the device open to getting bound by
> the wrong driver, and the verification in the last step could be more
> specific in testing for binding to the correct driver, but step #1 is
> the modprobe of the driver, which should be a synchronous operation.
> We shouldn't be able to complete a 'driverctl set-override $DEV
> vfio-pci' without vfio being initialized, afaict.  Thanks,

Right, sounds like systemd is starting openvswitch service before
the driverctl is done with the devices.

fbl
Aaron Conole April 29, 2019, 8:22 p.m. UTC | #10
Flavio Leitner <fbl@sysclose.org> writes:

> On Thu, Apr 18, 2019 at 01:46:22PM -0600, Alex Williamson wrote:
>> On Thu, 18 Apr 2019 15:50:43 -0300
>> Flavio Leitner <fbl@sysclose.org> wrote:
>> 
>> > On Thu, Apr 18, 2019 at 12:06:57PM -0600, Alex Williamson wrote:
>> > > On Thu, 18 Apr 2019 13:56:23 -0300
>> > > Flavio Leitner <fbl@sysclose.org> wrote:
>> > >   
>> > > > On Thu, Apr 18, 2019 at 10:43:11AM -0600, Alex Williamson wrote:  
>> > > > > On Thu, 18 Apr 2019 13:23:54 -0300
>> > > > > Flavio Leitner <fbl@sysclose.org> wrote:
>> > > > Another thing is that when the module is ready and the event is sent
>> > > > out, what holds OVS for not trying to open and get EACCESS before
>> > > > udev is triggered to fix the device permission?  
>> > > 
>> > > If there were a race, could ovs ever run before udev on system
>> > > startup?  Probably not.  
>> > 
>> > It does wait, but only for the udev to settle, which means if the
>> > module has not triggered an event until that time, OVS will not wait
>> > and we still have a race.
>> 
>> But udev isn't waiting on the module to trigger an event, the module
>> contains a MODULE_ALIAS, so I believe it's just the static processing
>> of the modules.alias that triggers the event.
>
> What I am saying is that driverctl will trigger load the module and
> bind the device, later on systemd will trigger OVS service which
> waits udev to settle, but none of that guarantees that the permissions
> are updated when OVS is initializing, see below.
>
>> > >  Ideally perhaps a cleaner solution might be an
>> > > explicit dependency on the vfio module specific to ovs startup rather
>> > > than changing a system policy, but it really depends on the context and
>> > > use cases.  Thanks,  
>> > 
>> > It does have. The driverctl will bind the devices to vfio-pci but
>> > the problem is that which signal we should rely on to know when
>> > the vfio module is still initializing, or failed or finished.
>> 
>> What signal/mechanism is being used currently?  If driverctl is asked
>> to set a driver override it does:
>> 
>>  1) if module is not loaded, modprobe
>>  2) unbinds device from existing driver, if any
>>  3) sets driver_override
>>  4) triggers drivers_probe
>>  5) tests if device is bound to a driver, any driver
>> 
>> There are certainly some deficiencies here, unbinding the device before
>> setting the driver_override leaves the device open to getting bound by
>> the wrong driver, and the verification in the last step could be more
>> specific in testing for binding to the correct driver, but step #1 is
>> the modprobe of the driver, which should be a synchronous operation.
>> We shouldn't be able to complete a 'driverctl set-override $DEV
>> vfio-pci' without vfio being initialized, afaict.  Thanks,
>
> Right, sounds like systemd is starting openvswitch service before
> the driverctl is done with the devices.

I'm not sure.  The ordering could be a problem.

Perhaps we could try adding:

  After=basic.target

for the ovs-vswitchd.service if we have a machine that exhibits this
behavior, but I don't know if it will resolve the race.  There is some
kind of strange ordering looking at:

https://www.freedesktop.org/software/systemd/man/systemd.special.html
and
https://www.freedesktop.org/software/systemd/man/bootup.html#

I can't find how network.target dependency really works w.r.t. ordering
and the driverctl+basic.target services.

> fbl
diff mbox series

Patch

diff --git a/rhel/usr_lib_udev_rules.d_91-vfio.rules b/rhel/usr_lib_udev_rules.d_91-vfio.rules
index 8e34b2a2b..c0504ab5a 100644
--- a/rhel/usr_lib_udev_rules.d_91-vfio.rules
+++ b/rhel/usr_lib_udev_rules.d_91-vfio.rules
@@ -1 +1,2 @@ 
 ACTION=="add", SUBSYSTEM=="vfio*", GROUP="hugetlbfs", MODE="0660"
+ACTION=="add", SUBSYSTEM=="misc", KERNEL=="vfio", MODE="0666"