diff mbox

[ovs-dev,v2,2/2] netdev-dpdk: Support user-defined socket attribs

Message ID 39F64FC4-A666-400B-8630-1DB4D3374BEC@vmware.com
State Not Applicable
Headers show

Commit Message

Daniele Di Proietto June 10, 2016, 8:50 p.m. UTC
On 10/06/2016 10:51, "Aaron Conole" <aconole@redhat.com> wrote:

>Aaron Conole <aconole@redhat.com> writes:

>

>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:

>>

>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole@redhat.com> wrote:

>>>

>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:

>>>>

>>>> > Hi Aaron,

>>>> >

>>>> > I'm still a little bit nervous about calling chown on a (partially)

>>>> > user controlled file name.

>>>>

>>>> I agree, that always seems scary.

>>>>

>>>> > Before moving forward I wanted to discuss a couple of other options:

>>>> >

>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way

>>>> > qemu can open the socket as root and drop privileges before starting

>>>> > guest execution.

>>>>

>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron

>>>> plugin.  I also don't know if it would be an acceptable workaround for

>>>> users.  Additionally, I recall there being something of a "don't even

>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)

>>>> can expound on it.

>>>>

>>>

>>> Hi,

>>> IIRC we kind of agree that long term a proper MAC will be much better but

>>> most involved people needed something to get it working like "now".

>>> Since they are complementary (other than the fix removing a bit of the

>>> urgency for more MAC) it was kind of the least bad option.

>>>

>>> You have to be aware that I brought up the discussion on dev@dpdk.org - see

>>> [1] and [2]:

>>> But this will take time and eventually still be the applications task to

>>> "do something" - no matter if via API or via the chmod's right now.

>>> So Aaron is trying to get something that works now until the long term

>>> things are in place, which I appreciate.

>>>

>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get

>>> this in time I run with [3] for now.

>>> I never intended to suggest that, but with the discussion in place, one

>>> could ask if you (Aaron) want to pick up that instead.

>>> That would keep OVS free for now until DPDK made up the API (see [2]) for

>>> socket ownership control and this then could be implemented in OVS?

>>>

>>> (I hope) In some months/years we will all be happy to drop this bunch of

>>> interim solutions, never the less we need it for now.

>>>

>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/

>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html

>>> [3]:

>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7

>>>

>>> [...]

>>>

>>>

>>>> I think originally we quickly discussed 4 possible solutions (and

>>>> hopefully I captured them correctly):

>>>>

>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs

>>>>    group.  I don't actually like this solution because kvm could then

>>>>    pollute the ovs database.

>>>>

>>>> 2. OVS runs as some user and sets the user/group ownership of the socket

>>>>    via chown/chmod where permissions come from the database (the

>>>>    original context had ovs running as root - but as I described above

>>>>    it doesn't need to be root provided ovs+DPDK can start without root).

>>>>

>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and

>>>>    downgrades.  IIRC, this doesn't actually work, or it may have

>>>>    implications on other projects.  I don't remember exactly what was

>>>>    not as great about this solution, TBH.

>>>>

>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the

>>>>    layering between them.

>>>>

>>>> I think solution 2 and solution 4 don't actually interfere with each

>>>> other, and can be used to a complementary effect (if implemented

>>>> properly) so that the MAC layer enforces access, but even without MAC,

>>>> the DAC layer can provide appropriate whitelisting behavior.

>>>>

>>>

>>> I also remember several complex changes needed for the #1 and #3 that

>>> always would end up with huge effort and a high risk not being accepted.

>>> Probably that is what you refer to with "implications on other projects".

>>>

>>> Also keep in mind the position of dpdk out of the last few discussions

>>> which I'd like to summarize as "dpdk got this path from an app, so this app

>>> OWNS that path".

>>

>> I'd like to continue on, but I am not sure what the concerns are right

>> now.  Is it possible to enumerate them point by point so that I can

>> understand them?  I think there are two outstanding concerns right now:

>>

>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)

>>

>> 2. the proposed approach would be better implemented in the utility

>>    that wants access to the sockets (vis-a-vis the libvirt discussion)

>>

>> Am I understanding the concerns correctly?

>

>Ping?


I found another theoretical problem with the chmod approach, let me try to
explain:

There's an extremely small race window between the socket creation and the
chmod which could theoretically be exploited to change the owner of a socket
(e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:

1. There's no southbound database running, because it's not yet been
   started or because it's being restarted.
2. The user creates a vhost port, naming it ovnsb_db.sock.
   rte_vhost_driver_register() succeeds and creates a socket in the file
   system.
3. The southbound database is started, it removes ovnsb_db.sock and recreates
   it.
4. Now OVS changes the owner and the permission of what it thinks is a
   vhost-user socket.

If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
window, and it's unlikely that an attacker can control when the southbound
database is restarted.

I feel like I'm nitpicking, but I'm not sure how serious is the security
impact of what I'm describing.

I suggested an alternative approach, and I've tried implementing a quick
POC on top on your patch:

---8<---

---8<---

Compared to the chmod approach this has some limitation:

1. It doesn't support changing permissions, only the owner.  This
   could be done with umask, but I couldn't find any system call
   to change the umask for a single thread.
2. Unless vhost-sock-dir is owned by the target owner, the socket
   cannot be created.  I'm not sure whether this is a reasonable
   limitation for the use cases you have in mind.
3. setfsuid() is Linux specific and somehow deprecated according
   to the manpage:
   
   "Thus, setfsuid() is nowadays unneeded and should be avoided
   in new applications"

   I haven't used seteuid, because it changes the euid of the whole
   process and that may interfere with other operations on OVS.

If these limitations are unacceptable, I can see how we can use
chmod.  After all, as you point out, it's probably better to do it
in OVS than in some script.

Thanks for your patience in solving this problem,

Daniele

Comments

Aaron Conole June 13, 2016, 9:36 p.m. UTC | #1
Daniele Di Proietto <diproiettod@vmware.com> writes:

> On 10/06/2016 10:51, "Aaron Conole" <aconole@redhat.com> wrote:
>
>>Aaron Conole <aconole@redhat.com> writes:
>>
>>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>>>
>>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole@redhat.com> wrote:
>>>>
>>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:
>>>>>
>>>>> > Hi Aaron,
>>>>> >
>>>>> > I'm still a little bit nervous about calling chown on a (partially)
>>>>> > user controlled file name.
>>>>>
>>>>> I agree, that always seems scary.
>>>>>
>>>>> > Before moving forward I wanted to discuss a couple of other options:
>>>>> >
>>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
>>>>> > qemu can open the socket as root and drop privileges before starting
>>>>> > guest execution.
>>>>>
>>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron
>>>>> plugin.  I also don't know if it would be an acceptable workaround for
>>>>> users.  Additionally, I recall there being something of a "don't even
>>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
>>>>> can expound on it.
>>>>>
>>>>
>>>> Hi,
>>>> IIRC we kind of agree that long term a proper MAC will be much better but
>>>> most involved people needed something to get it working like "now".
>>>> Since they are complementary (other than the fix removing a bit of the
>>>> urgency for more MAC) it was kind of the least bad option.
>>>>
>>>> You have to be aware that I brought up the discussion on dev@dpdk.org - see
>>>> [1] and [2]:
>>>> But this will take time and eventually still be the applications task to
>>>> "do something" - no matter if via API or via the chmod's right now.
>>>> So Aaron is trying to get something that works now until the long term
>>>> things are in place, which I appreciate.
>>>>
>>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get
>>>> this in time I run with [3] for now.
>>>> I never intended to suggest that, but with the discussion in place, one
>>>> could ask if you (Aaron) want to pick up that instead.
>>>> That would keep OVS free for now until DPDK made up the API (see [2]) for
>>>> socket ownership control and this then could be implemented in OVS?
>>>>
>>>> (I hope) In some months/years we will all be happy to drop this bunch of
>>>> interim solutions, never the less we need it for now.
>>>>
>>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
>>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
>>>> [3]:
>>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
>>>>
>>>> [...]
>>>>
>>>>
>>>>> I think originally we quickly discussed 4 possible solutions (and
>>>>> hopefully I captured them correctly):
>>>>>
>>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
>>>>>    group.  I don't actually like this solution because kvm could then
>>>>>    pollute the ovs database.
>>>>>
>>>>> 2. OVS runs as some user and sets the user/group ownership of the socket
>>>>>    via chown/chmod where permissions come from the database (the
>>>>>    original context had ovs running as root - but as I described above
>>>>>    it doesn't need to be root provided ovs+DPDK can start without root).
>>>>>
>>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
>>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
>>>>>    implications on other projects.  I don't remember exactly what was
>>>>>    not as great about this solution, TBH.
>>>>>
>>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
>>>>>    layering between them.
>>>>>
>>>>> I think solution 2 and solution 4 don't actually interfere with each
>>>>> other, and can be used to a complementary effect (if implemented
>>>>> properly) so that the MAC layer enforces access, but even without MAC,
>>>>> the DAC layer can provide appropriate whitelisting behavior.
>>>>>
>>>>
>>>> I also remember several complex changes needed for the #1 and #3 that
>>>> always would end up with huge effort and a high risk not being accepted.
>>>> Probably that is what you refer to with "implications on other projects".
>>>>
>>>> Also keep in mind the position of dpdk out of the last few discussions
>>>> which I'd like to summarize as "dpdk got this path from an app, so this app
>>>> OWNS that path".
>>>
>>> I'd like to continue on, but I am not sure what the concerns are right
>>> now.  Is it possible to enumerate them point by point so that I can
>>> understand them?  I think there are two outstanding concerns right now:
>>>
>>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
>>>
>>> 2. the proposed approach would be better implemented in the utility
>>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
>>>
>>> Am I understanding the concerns correctly?
>>
>>Ping?
>
> I found another theoretical problem with the chmod approach, let me try to
> explain:
>
> There's an extremely small race window between the socket creation and the
> chmod which could theoretically be exploited to change the owner of a socket
> (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:
>
> 1. There's no southbound database running, because it's not yet been
>    started or because it's being restarted.
> 2. The user creates a vhost port, naming it ovnsb_db.sock.
>    rte_vhost_driver_register() succeeds and creates a socket in the file
>    system.
> 3. The southbound database is started, it removes ovnsb_db.sock and recreates
>    it.
> 4. Now OVS changes the owner and the permission of what it thinks is a
>    vhost-user socket.
>
> If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
> window, and it's unlikely that an attacker can control when the southbound
> database is restarted.
>
> I feel like I'm nitpicking, but I'm not sure how serious is the security
> impact of what I'm describing.
>
> I suggested an alternative approach, and I've tried implementing a quick
> POC on top on your patch:
>
> ---8<---
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 24ebb41..d7adc66 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -30,6 +30,7 @@
>  #include <sys/types.h>
>  #include <sys/stat.h>
>  #include <getopt.h>
> +#include <sys/fsuid.h>
>  
>  #include "chutil.h"
>  #include "dirs.h"
> @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>       */
>      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>               vhost_sock_dir, name);
> +    uid_t orig_u = geteuid();
> +    gid_t orig_g = getegid();
> +    if (vhost_sock_def_owner) {
> +        uid_t u;
> +        gid_t g;
> +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
> +            VLOG_INFO("UID: %d GID: %d", u, g);
> +            setfsuid(u);
> +            setfsgid(g);
> +        }
> +    }
>  
>      err = rte_vhost_driver_register(dev->vhost_id);
>      if (err) {
> @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>          err = vhost_construct_helper(netdev);
>      }
>  
> -    ovs_mutex_unlock(&dpdk_mutex);
> -    if (!err && vhost_sock_def_owner &&
> -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
> -        VLOG_ERR("vhost-user socket device ownership change failed.");
> +    if (vhost_sock_def_owner) {
> +        setfsuid(orig_u);
> +        setfsgid(orig_g);
>      }
>  
> -    if (!err && vhost_sock_def_perms &&
> -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
> -        VLOG_ERR("vhost-user socket device permission change failed.");
> -    }
> +    ovs_mutex_unlock(&dpdk_mutex);
>  
>      return err;
>  }
>
> ---8<---
>
> Compared to the chmod approach this has some limitation:
>
> 1. It doesn't support changing permissions, only the owner.  This
>    could be done with umask, but I couldn't find any system call
>    to change the umask for a single thread.
> 2. Unless vhost-sock-dir is owned by the target owner, the socket
>    cannot be created.  I'm not sure whether this is a reasonable
>    limitation for the use cases you have in mind.
> 3. setfsuid() is Linux specific and somehow deprecated according
>    to the manpage:
>    
>    "Thus, setfsuid() is nowadays unneeded and should be avoided
>    in new applications"
>
>    I haven't used seteuid, because it changes the euid of the whole
>    process and that may interfere with other operations on OVS.

Thanks for this PoC, and explanation.  I agree, there is a race, and I'd
like to work on trying to solve the problem.

> If these limitations are unacceptable, I can see how we can use
> chmod.  After all, as you point out, it's probably better to do it
> in OVS than in some script.

I think fchmod and fchown may actually be the correct calls to have, and
will refactor these chown/chmod utils functions as such, which (I
believe) avoids the race as you describe.

> Thanks for your patience in solving this problem,

Thanks for your reviews!

> Daniele

Here's an elephant of a question, though.  Would it make sense to try
and work towards some kind of scheme whereby OvS is aware of the various
unix sockets it creates, and allows setting the permissions, ownership,
etc. in a common way?  I'm not committed to finding / solving that
problem, but would it even be acceptable / appropriate?

-Aaron
Ansis June 14, 2016, 2:46 a.m. UTC | #2
On 13 June 2016 at 14:36, Aaron Conole <aconole@redhat.com> wrote:

> Daniele Di Proietto <diproiettod@vmware.com> writes:
>
> > On 10/06/2016 10:51, "Aaron Conole" <aconole@redhat.com> wrote:
> >
> >>Aaron Conole <aconole@redhat.com> writes:
> >>
> >>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
> >>>
> >>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole@redhat.com>
> wrote:
> >>>>
> >>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:
> >>>>>
> >>>>> > Hi Aaron,
> >>>>> >
> >>>>> > I'm still a little bit nervous about calling chown on a (partially)
> >>>>> > user controlled file name.
> >>>>>
> >>>>> I agree, that always seems scary.
> >>>>>
> >>>>> > Before moving forward I wanted to discuss a couple of other
> options:
> >>>>> >
> >>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
> >>>>> > qemu can open the socket as root and drop privileges before
> starting
> >>>>> > guest execution.
> >>>>>
> >>>>> I'm not sure how to do this with libvirt, or via the OpenStack
> Neutron
> >>>>> plugin.  I also don't know if it would be an acceptable workaround
> for
> >>>>> users.  Additionally, I recall there being something of a "don't even
> >>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
> >>>>> can expound on it.
> >>>>>
> >>>>
> >>>> Hi,
> >>>> IIRC we kind of agree that long term a proper MAC will be much better
> but
> >>>> most involved people needed something to get it working like "now".
> >>>> Since they are complementary (other than the fix removing a bit of the
> >>>> urgency for more MAC) it was kind of the least bad option.
> >>>>
> >>>> You have to be aware that I brought up the discussion on dev@dpdk.org
> - see
> >>>> [1] and [2]:
> >>>> But this will take time and eventually still be the applications task
> to
> >>>> "do something" - no matter if via API or via the chmod's right now.
> >>>> So Aaron is trying to get something that works now until the long term
> >>>> things are in place, which I appreciate.
> >>>>
> >>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't
> get
> >>>> this in time I run with [3] for now.
> >>>> I never intended to suggest that, but with the discussion in place,
> one
> >>>> could ask if you (Aaron) want to pick up that instead.
> >>>> That would keep OVS free for now until DPDK made up the API (see [2])
> for
> >>>> socket ownership control and this then could be implemented in OVS?
> >>>>
> >>>> (I hope) In some months/years we will all be happy to drop this bunch
> of
> >>>> interim solutions, never the less we need it for now.
> >>>>
> >>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
> >>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
> >>>> [3]:
> >>>>
> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
> >>>>
> >>>> [...]
> >>>>
> >>>>
> >>>>> I think originally we quickly discussed 4 possible solutions (and
> >>>>> hopefully I captured them correctly):
> >>>>>
> >>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
> >>>>>    group.  I don't actually like this solution because kvm could then
> >>>>>    pollute the ovs database.
> >>>>>
> >>>>> 2. OVS runs as some user and sets the user/group ownership of the
> socket
> >>>>>    via chown/chmod where permissions come from the database (the
> >>>>>    original context had ovs running as root - but as I described
> above
> >>>>>    it doesn't need to be root provided ovs+DPDK can start without
> root).
> >>>>>
> >>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
> >>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
> >>>>>    implications on other projects.  I don't remember exactly what was
> >>>>>    not as great about this solution, TBH.
> >>>>>
> >>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
> >>>>>    layering between them.
> >>>>>
> >>>>> I think solution 2 and solution 4 don't actually interfere with each
> >>>>> other, and can be used to a complementary effect (if implemented
> >>>>> properly) so that the MAC layer enforces access, but even without
> MAC,
> >>>>> the DAC layer can provide appropriate whitelisting behavior.
> >>>>>
> >>>>
> >>>> I also remember several complex changes needed for the #1 and #3 that
> >>>> always would end up with huge effort and a high risk not being
> accepted.
> >>>> Probably that is what you refer to with "implications on other
> projects".
> >>>>
> >>>> Also keep in mind the position of dpdk out of the last few discussions
> >>>> which I'd like to summarize as "dpdk got this path from an app, so
> this app
> >>>> OWNS that path".
> >>>
> >>> I'd like to continue on, but I am not sure what the concerns are right
> >>> now.  Is it possible to enumerate them point by point so that I can
> >>> understand them?  I think there are two outstanding concerns right now:
> >>>
> >>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
> >>>
> >>> 2. the proposed approach would be better implemented in the utility
> >>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
> >>>
> >>> Am I understanding the concerns correctly?
> >>
> >>Ping?
> >
> > I found another theoretical problem with the chmod approach, let me try
> to
> > explain:
> >
> > There's an extremely small race window between the socket creation and
> the
> > chmod which could theoretically be exploited to change the owner of a
> socket
> > (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:
> >
> > 1. There's no southbound database running, because it's not yet been
> >    started or because it's being restarted.
> > 2. The user creates a vhost port, naming it ovnsb_db.sock.
> >    rte_vhost_driver_register() succeeds and creates a socket in the file
> >    system.
> > 3. The southbound database is started, it removes ovnsb_db.sock and
> recreates
> >    it.
> > 4. Now OVS changes the owner and the permission of what it thinks is a
> >    vhost-user socket.
> >
> > If 3 manages to get between 2 and 4, we have a problem. It's a pretty
> small
> > window, and it's unlikely that an attacker can control when the
> southbound
> > database is restarted.
> >
> > I feel like I'm nitpicking, but I'm not sure how serious is the security
> > impact of what I'm describing.
> >
> > I suggested an alternative approach, and I've tried implementing a quick
> > POC on top on your patch:
> >
> > ---8<---
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 24ebb41..d7adc66 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -30,6 +30,7 @@
> >  #include <sys/types.h>
> >  #include <sys/stat.h>
> >  #include <getopt.h>
> > +#include <sys/fsuid.h>
> >
> >  #include "chutil.h"
> >  #include "dirs.h"
> > @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev
> *netdev)
> >       */
> >      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
> >               vhost_sock_dir, name);
> > +    uid_t orig_u = geteuid();
> > +    gid_t orig_g = getegid();
> > +    if (vhost_sock_def_owner) {
> > +        uid_t u;
> > +        gid_t g;
> > +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
> > +            VLOG_INFO("UID: %d GID: %d", u, g);
> > +            setfsuid(u);
> > +            setfsgid(g);
> > +        }
> > +    }
> >
> >      err = rte_vhost_driver_register(dev->vhost_id);
> >      if (err) {
> > @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev
> *netdev)
> >          err = vhost_construct_helper(netdev);
> >      }
> >
> > -    ovs_mutex_unlock(&dpdk_mutex);
> > -    if (!err && vhost_sock_def_owner &&
> > -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
> > -        VLOG_ERR("vhost-user socket device ownership change failed.");
> > +    if (vhost_sock_def_owner) {
> > +        setfsuid(orig_u);
> > +        setfsgid(orig_g);
> >      }
> >
> > -    if (!err && vhost_sock_def_perms &&
> > -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
> > -        VLOG_ERR("vhost-user socket device permission change failed.");
> > -    }
> > +    ovs_mutex_unlock(&dpdk_mutex);
> >
> >      return err;
> >  }
> >
> > ---8<---
> >
> > Compared to the chmod approach this has some limitation:
> >
> > 1. It doesn't support changing permissions, only the owner.  This
> >    could be done with umask, but I couldn't find any system call
> >    to change the umask for a single thread.
> > 2. Unless vhost-sock-dir is owned by the target owner, the socket
> >    cannot be created.  I'm not sure whether this is a reasonable
> >    limitation for the use cases you have in mind.
> > 3. setfsuid() is Linux specific and somehow deprecated according
> >    to the manpage:
> >
> >    "Thus, setfsuid() is nowadays unneeded and should be avoided
> >    in new applications"
> >
> >    I haven't used seteuid, because it changes the euid of the whole
> >    process and that may interfere with other operations on OVS.
>
> Thanks for this PoC, and explanation.  I agree, there is a race, and I'd
> like to work on trying to solve the problem.
>
> > If these limitations are unacceptable, I can see how we can use
> > chmod.  After all, as you point out, it's probably better to do it
> > in OVS than in some script.
>
> I think fchmod and fchown may actually be the correct calls to have, and
> will refactor these chown/chmod utils functions as such, which (I
> believe) avoids the race as you describe.
>
> > Thanks for your patience in solving this problem,
>
> Thanks for your reviews!
>
> > Daniele
>
> Here's an elephant of a question, though.  Would it make sense to try
> and work towards some kind of scheme whereby OvS is aware of the various
> unix sockets it creates, and allows setting the permissions, ownership,
> etc. in a common way?  I'm not committed to finding / solving that
> problem, but would it even be acceptable / appropriate?
>

Just my 2 cents... Besides QEMU that runs under non-root user and wants to
connect to the DPDK server sockets created by OVS, you might also have
OpenFlow controller that runs locally under non-root user and would want to
connect to a OpenFlow server socket created by OVS. So, yes, I think it may
be good to generalize this feature in a common way instead of making it
DPDK-specific (or at least try to design it to be extensible in the future).

However, OVSDB socket is slightly different from OpenFlow or DPDK user
socket because having access to a OVSDB socket would effectively allow
"write-ups". What this mean is that processes intended to be running at
lower security level (i.e. OVSDB client in this case) could grant
themselves access to at least some unauthorized resources by tricking OVS
to chown() a socket that it was not supposed to. Though, IMHO, if properly
documented, then also this feature would still add some value.









>
> -Aaron
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
Ben Pfaff June 15, 2016, 2:27 p.m. UTC | #3
On Mon, Jun 13, 2016 at 05:36:34PM -0400, Aaron Conole wrote:
> > If these limitations are unacceptable, I can see how we can use
> > chmod.  After all, as you point out, it's probably better to do it
> > in OVS than in some script.
> 
> I think fchmod and fchown may actually be the correct calls to have, and
> will refactor these chown/chmod utils functions as such, which (I
> believe) avoids the race as you describe.

There are some pitfalls with fchmod() on Unix domain sockets, especially
on non-Linux systems.  Please refer to bind_unix_socket() in
lib/socket-util-unix.c:

/* Binds Unix domain socket 'fd' to a file with permissions 0700. */
static int bind_unix_socket(int fd, struct sockaddr *sun, socklen_t sun_len)
{
    const mode_t mode = 0770;    /* Allow both user and group access. */

    if (LINUX) {
        /* On Linux, the fd's permissions become the file's permissions.
         * fchmod() does not affect other files, like umask() does. */
        if (fchmod(fd, mode)) {
            return errno;
        }

        /* Must be after fchmod(). */
        if (bind(fd, sun, sun_len)) {
            return errno;
        }
        return 0;
    } else {
        /* On FreeBSD and NetBSD, only the umask affects permissions.  The
         * umask is process-wide rather than thread-specific, so we have to use
         * a subprocess for safety. */
        pid_t pid = fork();

        if (!pid) {
            umask(mode ^ 0777);
            _exit(bind(fd, sun, sun_len) ? errno : 0);
        } else if (pid > 0) {
            int status;
            int error;

            do {
                error = waitpid(pid, &status, 0) < 0 ? errno : 0;
            } while (error == EINTR);

            return (error ? error
                    : WIFEXITED(status) ? WEXITSTATUS(status)
                    : WIFSIGNALED(status) ? EINTR
                    : ECHILD /* WTF? */);
        } else {
            return errno;
        }
    }
}

I do not know whether the same pitfalls apply to fchown().
Aaron Conole June 15, 2016, 9:35 p.m. UTC | #4
Ben Pfaff <blp@ovn.org> writes:

> On Mon, Jun 13, 2016 at 05:36:34PM -0400, Aaron Conole wrote:
>> > If these limitations are unacceptable, I can see how we can use
>> > chmod.  After all, as you point out, it's probably better to do it
>> > in OVS than in some script.
>> 
>> I think fchmod and fchown may actually be the correct calls to have, and
>> will refactor these chown/chmod utils functions as such, which (I
>> believe) avoids the race as you describe.

I've done quite a bit of illuminating reading on the subject.  The best
I've seen is a usenix paper from 08[1] which describes a specific type of
TOCTTOU mitigation that is still not 100% effective.  This is a rather
complicated subject.  Whoops!

> There are some pitfalls with fchmod() on Unix domain sockets, especially
> on non-Linux systems.  Please refer to bind_unix_socket() in
> ...
> I do not know whether the same pitfalls apply to fchown().

After much testing, it appears yes the same pitfalls apply.  However,
the downgrade with dpdk may not work correctly - I'm currently devising
some test cases to sort this out.

[1]:
https://www.usenix.org/legacy/event/fast08/tech/full_papers/tsafrir/tsafrir_html/index.html


Thanks both of you for your keen insights!

-Aaron
Aaron Conole July 6, 2016, 2:24 p.m. UTC | #5
Aaron Conole <aconole@redhat.com> writes:

> Daniele Di Proietto <diproiettod@vmware.com> writes:
>
>> On 10/06/2016 10:51, "Aaron Conole" <aconole@redhat.com> wrote:
>>
>>>Aaron Conole <aconole@redhat.com> writes:
>>>
>>>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>>>>
>>>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole@redhat.com> wrote:
>>>>>
>>>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:
>>>>>>
>>>>>> > Hi Aaron,
>>>>>> >
>>>>>> > I'm still a little bit nervous about calling chown on a (partially)
>>>>>> > user controlled file name.
>>>>>>
>>>>>> I agree, that always seems scary.
>>>>>>
>>>>>> > Before moving forward I wanted to discuss a couple of other options:
>>>>>> >
>>>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
>>>>>> > qemu can open the socket as root and drop privileges before starting
>>>>>> > guest execution.
>>>>>>
>>>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron
>>>>>> plugin.  I also don't know if it would be an acceptable workaround for
>>>>>> users.  Additionally, I recall there being something of a "don't even
>>>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
>>>>>> can expound on it.
>>>>>>
>>>>>
>>>>> Hi,
>>>>> IIRC we kind of agree that long term a proper MAC will be much better but
>>>>> most involved people needed something to get it working like "now".
>>>>> Since they are complementary (other than the fix removing a bit of the
>>>>> urgency for more MAC) it was kind of the least bad option.
>>>>>
>>>>> You have to be aware that I brought up the discussion on dev@dpdk.org - see
>>>>> [1] and [2]:
>>>>> But this will take time and eventually still be the applications task to
>>>>> "do something" - no matter if via API or via the chmod's right now.
>>>>> So Aaron is trying to get something that works now until the long term
>>>>> things are in place, which I appreciate.
>>>>>
>>>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get
>>>>> this in time I run with [3] for now.
>>>>> I never intended to suggest that, but with the discussion in place, one
>>>>> could ask if you (Aaron) want to pick up that instead.
>>>>> That would keep OVS free for now until DPDK made up the API (see [2]) for
>>>>> socket ownership control and this then could be implemented in OVS?
>>>>>
>>>>> (I hope) In some months/years we will all be happy to drop this bunch of
>>>>> interim solutions, never the less we need it for now.
>>>>>
>>>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
>>>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
>>>>> [3]:
>>>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
>>>>>
>>>>> [...]
>>>>>
>>>>>
>>>>>> I think originally we quickly discussed 4 possible solutions (and
>>>>>> hopefully I captured them correctly):
>>>>>>
>>>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
>>>>>>    group.  I don't actually like this solution because kvm could then
>>>>>>    pollute the ovs database.
>>>>>>
>>>>>> 2. OVS runs as some user and sets the user/group ownership of the socket
>>>>>>    via chown/chmod where permissions come from the database (the
>>>>>>    original context had ovs running as root - but as I described above
>>>>>>    it doesn't need to be root provided ovs+DPDK can start without root).
>>>>>>
>>>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
>>>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
>>>>>>    implications on other projects.  I don't remember exactly what was
>>>>>>    not as great about this solution, TBH.
>>>>>>
>>>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
>>>>>>    layering between them.
>>>>>>
>>>>>> I think solution 2 and solution 4 don't actually interfere with each
>>>>>> other, and can be used to a complementary effect (if implemented
>>>>>> properly) so that the MAC layer enforces access, but even without MAC,
>>>>>> the DAC layer can provide appropriate whitelisting behavior.
>>>>>>
>>>>>
>>>>> I also remember several complex changes needed for the #1 and #3 that
>>>>> always would end up with huge effort and a high risk not being accepted.
>>>>> Probably that is what you refer to with "implications on other projects".
>>>>>
>>>>> Also keep in mind the position of dpdk out of the last few discussions
>>>>> which I'd like to summarize as "dpdk got this path from an app, so this app
>>>>> OWNS that path".
>>>>
>>>> I'd like to continue on, but I am not sure what the concerns are right
>>>> now.  Is it possible to enumerate them point by point so that I can
>>>> understand them?  I think there are two outstanding concerns right now:
>>>>
>>>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
>>>>
>>>> 2. the proposed approach would be better implemented in the utility
>>>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
>>>>
>>>> Am I understanding the concerns correctly?
>>>
>>>Ping?
>>
>> I found another theoretical problem with the chmod approach, let me try to
>> explain:
>>
>> There's an extremely small race window between the socket creation and the
>> chmod which could theoretically be exploited to change the owner of a socket
>> (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:
>>
>> 1. There's no southbound database running, because it's not yet been
>>    started or because it's being restarted.
>> 2. The user creates a vhost port, naming it ovnsb_db.sock.
>>    rte_vhost_driver_register() succeeds and creates a socket in the file
>>    system.
>> 3. The southbound database is started, it removes ovnsb_db.sock and recreates
>>    it.
>> 4. Now OVS changes the owner and the permission of what it thinks is a
>>    vhost-user socket.
>>
>> If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
>> window, and it's unlikely that an attacker can control when the southbound
>> database is restarted.
>>
>> I feel like I'm nitpicking, but I'm not sure how serious is the security
>> impact of what I'm describing.
>>
>> I suggested an alternative approach, and I've tried implementing a quick
>> POC on top on your patch:
>>
>> ---8<---
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index 24ebb41..d7adc66 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -30,6 +30,7 @@
>>  #include <sys/types.h>
>>  #include <sys/stat.h>
>>  #include <getopt.h>
>> +#include <sys/fsuid.h>
>>  
>>  #include "chutil.h"
>>  #include "dirs.h"
>> @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>       */
>>      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>>               vhost_sock_dir, name);
>> +    uid_t orig_u = geteuid();
>> +    gid_t orig_g = getegid();
>> +    if (vhost_sock_def_owner) {
>> +        uid_t u;
>> +        gid_t g;
>> +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
>> +            VLOG_INFO("UID: %d GID: %d", u, g);
>> +            setfsuid(u);
>> +            setfsgid(g);
>> +        }
>> +    }
>>  
>>      err = rte_vhost_driver_register(dev->vhost_id);
>>      if (err) {
>> @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>          err = vhost_construct_helper(netdev);
>>      }
>>  
>> -    ovs_mutex_unlock(&dpdk_mutex);
>> -    if (!err && vhost_sock_def_owner &&
>> -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
>> -        VLOG_ERR("vhost-user socket device ownership change failed.");
>> +    if (vhost_sock_def_owner) {
>> +        setfsuid(orig_u);
>> +        setfsgid(orig_g);
>>      }
>>  
>> -    if (!err && vhost_sock_def_perms &&
>> -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
>> -        VLOG_ERR("vhost-user socket device permission change failed.");
>> -    }
>> +    ovs_mutex_unlock(&dpdk_mutex);
>>  
>>      return err;
>>  }
>>
>> ---8<---
>>
>> Compared to the chmod approach this has some limitation:
>>
>> 1. It doesn't support changing permissions, only the owner.  This
>>    could be done with umask, but I couldn't find any system call
>>    to change the umask for a single thread.
>> 2. Unless vhost-sock-dir is owned by the target owner, the socket
>>    cannot be created.  I'm not sure whether this is a reasonable
>>    limitation for the use cases you have in mind.
>> 3. setfsuid() is Linux specific and somehow deprecated according
>>    to the manpage:
>>    
>>    "Thus, setfsuid() is nowadays unneeded and should be avoided
>>    in new applications"
>>
>>    I haven't used seteuid, because it changes the euid of the whole
>>    process and that may interfere with other operations on OVS.
>
> Thanks for this PoC, and explanation.  I agree, there is a race, and I'd
> like to work on trying to solve the problem.
>
>> If these limitations are unacceptable, I can see how we can use
>> chmod.  After all, as you point out, it's probably better to do it
>> in OVS than in some script.
>
> I think fchmod and fchown may actually be the correct calls to have, and
> will refactor these chown/chmod utils functions as such, which (I
> believe) avoids the race as you describe.
>
>> Thanks for your patience in solving this problem,
>
> Thanks for your reviews!
>
>> Daniele

I've been trying to find a way to solve this in a portable fashion, but
it appears to either require changes in DPDK to let the application
create the unix server socket, or a linux-only guarantee with a
non-linux racy fallback.  My attempt at the former change was NAK'd by
upstream DPDK.  If the latter is unacceptable, then we can drop these
patches and hope that client mode will be a workable solution.

Thoughts?
Ansis Atteka July 6, 2016, 5:47 p.m. UTC | #6
On Wed, Jul 6, 2016 at 7:24 AM, Aaron Conole <aconole@redhat.com> wrote:
> Aaron Conole <aconole@redhat.com> writes:
>
>> Daniele Di Proietto <diproiettod@vmware.com> writes:
>>
>>> On 10/06/2016 10:51, "Aaron Conole" <aconole@redhat.com> wrote:
>>>
>>>>Aaron Conole <aconole@redhat.com> writes:
>>>>
>>>>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>>>>>
>>>>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole@redhat.com> wrote:
>>>>>>
>>>>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:
>>>>>>>
>>>>>>> > Hi Aaron,
>>>>>>> >
>>>>>>> > I'm still a little bit nervous about calling chown on a (partially)
>>>>>>> > user controlled file name.
>>>>>>>
>>>>>>> I agree, that always seems scary.
>>>>>>>
>>>>>>> > Before moving forward I wanted to discuss a couple of other options:
>>>>>>> >
>>>>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
>>>>>>> > qemu can open the socket as root and drop privileges before starting
>>>>>>> > guest execution.
>>>>>>>
>>>>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron
>>>>>>> plugin.  I also don't know if it would be an acceptable workaround for
>>>>>>> users.  Additionally, I recall there being something of a "don't even
>>>>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
>>>>>>> can expound on it.
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> IIRC we kind of agree that long term a proper MAC will be much better but
>>>>>> most involved people needed something to get it working like "now".
>>>>>> Since they are complementary (other than the fix removing a bit of the
>>>>>> urgency for more MAC) it was kind of the least bad option.
>>>>>>
>>>>>> You have to be aware that I brought up the discussion on dev@dpdk.org - see
>>>>>> [1] and [2]:
>>>>>> But this will take time and eventually still be the applications task to
>>>>>> "do something" - no matter if via API or via the chmod's right now.
>>>>>> So Aaron is trying to get something that works now until the long term
>>>>>> things are in place, which I appreciate.
>>>>>>
>>>>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get
>>>>>> this in time I run with [3] for now.
>>>>>> I never intended to suggest that, but with the discussion in place, one
>>>>>> could ask if you (Aaron) want to pick up that instead.
>>>>>> That would keep OVS free for now until DPDK made up the API (see [2]) for
>>>>>> socket ownership control and this then could be implemented in OVS?
>>>>>>
>>>>>> (I hope) In some months/years we will all be happy to drop this bunch of
>>>>>> interim solutions, never the less we need it for now.
>>>>>>
>>>>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
>>>>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
>>>>>> [3]:
>>>>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
>>>>>>
>>>>>> [...]
>>>>>>
>>>>>>
>>>>>>> I think originally we quickly discussed 4 possible solutions (and
>>>>>>> hopefully I captured them correctly):
>>>>>>>
>>>>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
>>>>>>>    group.  I don't actually like this solution because kvm could then
>>>>>>>    pollute the ovs database.
>>>>>>>
>>>>>>> 2. OVS runs as some user and sets the user/group ownership of the socket
>>>>>>>    via chown/chmod where permissions come from the database (the
>>>>>>>    original context had ovs running as root - but as I described above
>>>>>>>    it doesn't need to be root provided ovs+DPDK can start without root).
>>>>>>>
>>>>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
>>>>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
>>>>>>>    implications on other projects.  I don't remember exactly what was
>>>>>>>    not as great about this solution, TBH.
>>>>>>>
>>>>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
>>>>>>>    layering between them.
>>>>>>>
>>>>>>> I think solution 2 and solution 4 don't actually interfere with each
>>>>>>> other, and can be used to a complementary effect (if implemented
>>>>>>> properly) so that the MAC layer enforces access, but even without MAC,
>>>>>>> the DAC layer can provide appropriate whitelisting behavior.
>>>>>>>
>>>>>>
>>>>>> I also remember several complex changes needed for the #1 and #3 that
>>>>>> always would end up with huge effort and a high risk not being accepted.
>>>>>> Probably that is what you refer to with "implications on other projects".
>>>>>>
>>>>>> Also keep in mind the position of dpdk out of the last few discussions
>>>>>> which I'd like to summarize as "dpdk got this path from an app, so this app
>>>>>> OWNS that path".
>>>>>
>>>>> I'd like to continue on, but I am not sure what the concerns are right
>>>>> now.  Is it possible to enumerate them point by point so that I can
>>>>> understand them?  I think there are two outstanding concerns right now:
>>>>>
>>>>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
>>>>>
>>>>> 2. the proposed approach would be better implemented in the utility
>>>>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
>>>>>
>>>>> Am I understanding the concerns correctly?
>>>>
>>>>Ping?
>>>
>>> I found another theoretical problem with the chmod approach, let me try to
>>> explain:
>>>
>>> There's an extremely small race window between the socket creation and the
>>> chmod which could theoretically be exploited to change the owner of a socket
>>> (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:
>>>
>>> 1. There's no southbound database running, because it's not yet been
>>>    started or because it's being restarted.
>>> 2. The user creates a vhost port, naming it ovnsb_db.sock.
>>>    rte_vhost_driver_register() succeeds and creates a socket in the file
>>>    system.
>>> 3. The southbound database is started, it removes ovnsb_db.sock and recreates
>>>    it.
>>> 4. Now OVS changes the owner and the permission of what it thinks is a
>>>    vhost-user socket.
>>>
>>> If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
>>> window, and it's unlikely that an attacker can control when the southbound
>>> database is restarted.
>>>
>>> I feel like I'm nitpicking, but I'm not sure how serious is the security
>>> impact of what I'm describing.
>>>
>>> I suggested an alternative approach, and I've tried implementing a quick
>>> POC on top on your patch:
>>>
>>> ---8<---
>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>> index 24ebb41..d7adc66 100644
>>> --- a/lib/netdev-dpdk.c
>>> +++ b/lib/netdev-dpdk.c
>>> @@ -30,6 +30,7 @@
>>>  #include <sys/types.h>
>>>  #include <sys/stat.h>
>>>  #include <getopt.h>
>>> +#include <sys/fsuid.h>
>>>
>>>  #include "chutil.h"
>>>  #include "dirs.h"
>>> @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>       */
>>>      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>>>               vhost_sock_dir, name);
>>> +    uid_t orig_u = geteuid();
>>> +    gid_t orig_g = getegid();
>>> +    if (vhost_sock_def_owner) {
>>> +        uid_t u;
>>> +        gid_t g;
>>> +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
>>> +            VLOG_INFO("UID: %d GID: %d", u, g);
>>> +            setfsuid(u);
>>> +            setfsgid(g);
>>> +        }
>>> +    }
>>>
>>>      err = rte_vhost_driver_register(dev->vhost_id);
>>>      if (err) {
>>> @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>          err = vhost_construct_helper(netdev);
>>>      }
>>>
>>> -    ovs_mutex_unlock(&dpdk_mutex);
>>> -    if (!err && vhost_sock_def_owner &&
>>> -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
>>> -        VLOG_ERR("vhost-user socket device ownership change failed.");
>>> +    if (vhost_sock_def_owner) {
>>> +        setfsuid(orig_u);
>>> +        setfsgid(orig_g);
>>>      }
>>>
>>> -    if (!err && vhost_sock_def_perms &&
>>> -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
>>> -        VLOG_ERR("vhost-user socket device permission change failed.");
>>> -    }
>>> +    ovs_mutex_unlock(&dpdk_mutex);
>>>
>>>      return err;
>>>  }
>>>
>>> ---8<---
>>>
>>> Compared to the chmod approach this has some limitation:
>>>
>>> 1. It doesn't support changing permissions, only the owner.  This
>>>    could be done with umask, but I couldn't find any system call
>>>    to change the umask for a single thread.
>>> 2. Unless vhost-sock-dir is owned by the target owner, the socket
>>>    cannot be created.  I'm not sure whether this is a reasonable
>>>    limitation for the use cases you have in mind.
>>> 3. setfsuid() is Linux specific and somehow deprecated according
>>>    to the manpage:
>>>
>>>    "Thus, setfsuid() is nowadays unneeded and should be avoided
>>>    in new applications"
>>>
>>>    I haven't used seteuid, because it changes the euid of the whole
>>>    process and that may interfere with other operations on OVS.
>>
>> Thanks for this PoC, and explanation.  I agree, there is a race, and I'd
>> like to work on trying to solve the problem.
>>
>>> If these limitations are unacceptable, I can see how we can use
>>> chmod.  After all, as you point out, it's probably better to do it
>>> in OVS than in some script.
>>
>> I think fchmod and fchown may actually be the correct calls to have, and
>> will refactor these chown/chmod utils functions as such, which (I
>> believe) avoids the race as you describe.
>>
>>> Thanks for your patience in solving this problem,
>>
>> Thanks for your reviews!
>>
>>> Daniele
>
> I've been trying to find a way to solve this in a portable fashion, but
> it appears to either require changes in DPDK to let the application
> create the unix server socket, or a linux-only guarantee with a
> non-linux racy fallback.  My attempt at the former change was NAK'd by
> upstream DPDK.  If the latter is unacceptable, then we can drop these
> patches and hope that client mode will be a workable solution.
>
> Thoughts?
Do you mean as if Open vSwitch would be the one connecting to DPDK
socket (the client and server roles swapped)?

If so, then I sent a similar patch, but for OpenFlow controller
sockets - http://openvswitch.org/pipermail/dev/2016-June/073733.html.
Basically --no-self-confinement flag makes OVS to enter in mode where
it can connect to sockets outside the /var/run/openvswitch directory.
Aaron Conole July 6, 2016, 6:26 p.m. UTC | #7
Ansis Atteka <aatteka@nicira.com> writes:

> On Wed, Jul 6, 2016 at 7:24 AM, Aaron Conole <aconole@redhat.com> wrote:
>> Aaron Conole <aconole@redhat.com> writes:
>>
>>> Daniele Di Proietto <diproiettod@vmware.com> writes:
>>>
>>>> On 10/06/2016 10:51, "Aaron Conole" <aconole@redhat.com> wrote:
>>>>
>>>>>Aaron Conole <aconole@redhat.com> writes:
>>>>>
>>>>>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>>>>>>
>>>>>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole@redhat.com> wrote:
>>>>>>>
>>>>>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:
>>>>>>>>
>>>>>>>> > Hi Aaron,
>>>>>>>> >
>>>>>>>> > I'm still a little bit nervous about calling chown on a (partially)
>>>>>>>> > user controlled file name.
>>>>>>>>
>>>>>>>> I agree, that always seems scary.
>>>>>>>>
>>>>>>>> > Before moving forward I wanted to discuss a couple of other options:
>>>>>>>> >
>>>>>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
>>>>>>>> > qemu can open the socket as root and drop privileges before starting
>>>>>>>> > guest execution.
>>>>>>>>
>>>>>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron
>>>>>>>> plugin.  I also don't know if it would be an acceptable workaround for
>>>>>>>> users.  Additionally, I recall there being something of a "don't even
>>>>>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
>>>>>>>> can expound on it.
>>>>>>>>
>>>>>>>
>>>>>>> Hi,
>>>>>>> IIRC we kind of agree that long term a proper MAC will be much better but
>>>>>>> most involved people needed something to get it working like "now".
>>>>>>> Since they are complementary (other than the fix removing a bit of the
>>>>>>> urgency for more MAC) it was kind of the least bad option.
>>>>>>>
>>>>>>> You have to be aware that I brought up the discussion on dev@dpdk.org - see
>>>>>>> [1] and [2]:
>>>>>>> But this will take time and eventually still be the applications task to
>>>>>>> "do something" - no matter if via API or via the chmod's right now.
>>>>>>> So Aaron is trying to get something that works now until the long term
>>>>>>> things are in place, which I appreciate.
>>>>>>>
>>>>>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get
>>>>>>> this in time I run with [3] for now.
>>>>>>> I never intended to suggest that, but with the discussion in place, one
>>>>>>> could ask if you (Aaron) want to pick up that instead.
>>>>>>> That would keep OVS free for now until DPDK made up the API (see [2]) for
>>>>>>> socket ownership control and this then could be implemented in OVS?
>>>>>>>
>>>>>>> (I hope) In some months/years we will all be happy to drop this bunch of
>>>>>>> interim solutions, never the less we need it for now.
>>>>>>>
>>>>>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
>>>>>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
>>>>>>> [3]:
>>>>>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>
>>>>>>>> I think originally we quickly discussed 4 possible solutions (and
>>>>>>>> hopefully I captured them correctly):
>>>>>>>>
>>>>>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
>>>>>>>>    group.  I don't actually like this solution because kvm could then
>>>>>>>>    pollute the ovs database.
>>>>>>>>
>>>>>>>> 2. OVS runs as some user and sets the user/group ownership of the socket
>>>>>>>>    via chown/chmod where permissions come from the database (the
>>>>>>>>    original context had ovs running as root - but as I described above
>>>>>>>>    it doesn't need to be root provided ovs+DPDK can start without root).
>>>>>>>>
>>>>>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
>>>>>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
>>>>>>>>    implications on other projects.  I don't remember exactly what was
>>>>>>>>    not as great about this solution, TBH.
>>>>>>>>
>>>>>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
>>>>>>>>    layering between them.
>>>>>>>>
>>>>>>>> I think solution 2 and solution 4 don't actually interfere with each
>>>>>>>> other, and can be used to a complementary effect (if implemented
>>>>>>>> properly) so that the MAC layer enforces access, but even without MAC,
>>>>>>>> the DAC layer can provide appropriate whitelisting behavior.
>>>>>>>>
>>>>>>>
>>>>>>> I also remember several complex changes needed for the #1 and #3 that
>>>>>>> always would end up with huge effort and a high risk not being accepted.
>>>>>>> Probably that is what you refer to with "implications on other projects".
>>>>>>>
>>>>>>> Also keep in mind the position of dpdk out of the last few discussions
>>>>>>> which I'd like to summarize as "dpdk got this path from an app, so this app
>>>>>>> OWNS that path".
>>>>>>
>>>>>> I'd like to continue on, but I am not sure what the concerns are right
>>>>>> now.  Is it possible to enumerate them point by point so that I can
>>>>>> understand them?  I think there are two outstanding concerns right now:
>>>>>>
>>>>>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
>>>>>>
>>>>>> 2. the proposed approach would be better implemented in the utility
>>>>>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
>>>>>>
>>>>>> Am I understanding the concerns correctly?
>>>>>
>>>>>Ping?
>>>>
>>>> I found another theoretical problem with the chmod approach, let me try to
>>>> explain:
>>>>
>>>> There's an extremely small race window between the socket creation and the
>>>> chmod which could theoretically be exploited to change the owner of a socket
>>>> (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:
>>>>
>>>> 1. There's no southbound database running, because it's not yet been
>>>>    started or because it's being restarted.
>>>> 2. The user creates a vhost port, naming it ovnsb_db.sock.
>>>>    rte_vhost_driver_register() succeeds and creates a socket in the file
>>>>    system.
>>>> 3. The southbound database is started, it removes ovnsb_db.sock and recreates
>>>>    it.
>>>> 4. Now OVS changes the owner and the permission of what it thinks is a
>>>>    vhost-user socket.
>>>>
>>>> If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
>>>> window, and it's unlikely that an attacker can control when the southbound
>>>> database is restarted.
>>>>
>>>> I feel like I'm nitpicking, but I'm not sure how serious is the security
>>>> impact of what I'm describing.
>>>>
>>>> I suggested an alternative approach, and I've tried implementing a quick
>>>> POC on top on your patch:
>>>>
>>>> ---8<---
>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>> index 24ebb41..d7adc66 100644
>>>> --- a/lib/netdev-dpdk.c
>>>> +++ b/lib/netdev-dpdk.c
>>>> @@ -30,6 +30,7 @@
>>>>  #include <sys/types.h>
>>>>  #include <sys/stat.h>
>>>>  #include <getopt.h>
>>>> +#include <sys/fsuid.h>
>>>>
>>>>  #include "chutil.h"
>>>>  #include "dirs.h"
>>>> @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>>       */
>>>>      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>>>>               vhost_sock_dir, name);
>>>> +    uid_t orig_u = geteuid();
>>>> +    gid_t orig_g = getegid();
>>>> +    if (vhost_sock_def_owner) {
>>>> +        uid_t u;
>>>> +        gid_t g;
>>>> +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
>>>> +            VLOG_INFO("UID: %d GID: %d", u, g);
>>>> +            setfsuid(u);
>>>> +            setfsgid(g);
>>>> +        }
>>>> +    }
>>>>
>>>>      err = rte_vhost_driver_register(dev->vhost_id);
>>>>      if (err) {
>>>> @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>>          err = vhost_construct_helper(netdev);
>>>>      }
>>>>
>>>> -    ovs_mutex_unlock(&dpdk_mutex);
>>>> -    if (!err && vhost_sock_def_owner &&
>>>> -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
>>>> -        VLOG_ERR("vhost-user socket device ownership change failed.");
>>>> +    if (vhost_sock_def_owner) {
>>>> +        setfsuid(orig_u);
>>>> +        setfsgid(orig_g);
>>>>      }
>>>>
>>>> -    if (!err && vhost_sock_def_perms &&
>>>> -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
>>>> -        VLOG_ERR("vhost-user socket device permission change failed.");
>>>> -    }
>>>> +    ovs_mutex_unlock(&dpdk_mutex);
>>>>
>>>>      return err;
>>>>  }
>>>>
>>>> ---8<---
>>>>
>>>> Compared to the chmod approach this has some limitation:
>>>>
>>>> 1. It doesn't support changing permissions, only the owner.  This
>>>>    could be done with umask, but I couldn't find any system call
>>>>    to change the umask for a single thread.
>>>> 2. Unless vhost-sock-dir is owned by the target owner, the socket
>>>>    cannot be created.  I'm not sure whether this is a reasonable
>>>>    limitation for the use cases you have in mind.
>>>> 3. setfsuid() is Linux specific and somehow deprecated according
>>>>    to the manpage:
>>>>
>>>>    "Thus, setfsuid() is nowadays unneeded and should be avoided
>>>>    in new applications"
>>>>
>>>>    I haven't used seteuid, because it changes the euid of the whole
>>>>    process and that may interfere with other operations on OVS.
>>>
>>> Thanks for this PoC, and explanation.  I agree, there is a race, and I'd
>>> like to work on trying to solve the problem.
>>>
>>>> If these limitations are unacceptable, I can see how we can use
>>>> chmod.  After all, as you point out, it's probably better to do it
>>>> in OVS than in some script.
>>>
>>> I think fchmod and fchown may actually be the correct calls to have, and
>>> will refactor these chown/chmod utils functions as such, which (I
>>> believe) avoids the race as you describe.
>>>
>>>> Thanks for your patience in solving this problem,
>>>
>>> Thanks for your reviews!
>>>
>>>> Daniele
>>
>> I've been trying to find a way to solve this in a portable fashion, but
>> it appears to either require changes in DPDK to let the application
>> create the unix server socket, or a linux-only guarantee with a
>> non-linux racy fallback.  My attempt at the former change was NAK'd by
>> upstream DPDK.  If the latter is unacceptable, then we can drop these
>> patches and hope that client mode will be a workable solution.
>>
>> Thoughts?
> Do you mean as if Open vSwitch would be the one connecting to DPDK
> socket (the client and server roles swapped)?

Exactly that.  The DPDK library calls from open vswitch would be for a
vhost-user client socket to connect to qemu (or whatever is serving the
vhost user socket).  The problem has to do with chmod/chown, and race
conditions therein; fchmod/fchown only work due to linux specific
behavior (and I've confirmed this).  There are a number of methods to
try and catch when directories have been modified, but the issue is
there's no way to close pandora's box (once you've modified the
permissions, you have no clue which inode you touched).  Since there's
no way, that I can tell, to build a transaction for this, it seems to me
either do a linux specific hack, or don't do anything.  I think Daniele
would prefer the don't do anything approach, but I'd like to get
confirmation.

> If so, then I sent a similar patch, but for OpenFlow controller
> sockets - http://openvswitch.org/pipermail/dev/2016-June/073733.html.
> Basically --no-self-confinement flag makes OVS to enter in mode where
> it can connect to sockets outside the /var/run/openvswitch directory.

That looks like a good check for using with vhost user client sockets,
so I'll help integrate it when such a library becomes available.
Daniele Di Proietto July 7, 2016, 1:42 a.m. UTC | #8
On 06/07/2016 11:26, "Aaron Conole" <aconole@redhat.com> wrote:

>Ansis Atteka <aatteka@nicira.com> writes:

>

>> On Wed, Jul 6, 2016 at 7:24 AM, Aaron Conole <aconole@redhat.com> wrote:

>>> Aaron Conole <aconole@redhat.com> writes:

>>>

>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:

>>>>

>>>>> On 10/06/2016 10:51, "Aaron Conole" <aconole@redhat.com> wrote:

>>>>>

>>>>>>Aaron Conole <aconole@redhat.com> writes:

>>>>>>

>>>>>>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:

>>>>>>>

>>>>>>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole <aconole@redhat.com> wrote:

>>>>>>>>

>>>>>>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:

>>>>>>>>>

>>>>>>>>> > Hi Aaron,

>>>>>>>>> >

>>>>>>>>> > I'm still a little bit nervous about calling chown on a (partially)

>>>>>>>>> > user controlled file name.

>>>>>>>>>

>>>>>>>>> I agree, that always seems scary.

>>>>>>>>>

>>>>>>>>> > Before moving forward I wanted to discuss a couple of other options:

>>>>>>>>> >

>>>>>>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way

>>>>>>>>> > qemu can open the socket as root and drop privileges before starting

>>>>>>>>> > guest execution.

>>>>>>>>>

>>>>>>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron

>>>>>>>>> plugin.  I also don't know if it would be an acceptable workaround for

>>>>>>>>> users.  Additionally, I recall there being something of a "don't even

>>>>>>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)

>>>>>>>>> can expound on it.

>>>>>>>>>

>>>>>>>>

>>>>>>>> Hi,

>>>>>>>> IIRC we kind of agree that long term a proper MAC will be much better but

>>>>>>>> most involved people needed something to get it working like "now".

>>>>>>>> Since they are complementary (other than the fix removing a bit of the

>>>>>>>> urgency for more MAC) it was kind of the least bad option.

>>>>>>>>

>>>>>>>> You have to be aware that I brought up the discussion on dev@dpdk.org - see

>>>>>>>> [1] and [2]:

>>>>>>>> But this will take time and eventually still be the applications task to

>>>>>>>> "do something" - no matter if via API or via the chmod's right now.

>>>>>>>> So Aaron is trying to get something that works now until the long term

>>>>>>>> things are in place, which I appreciate.

>>>>>>>>

>>>>>>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get

>>>>>>>> this in time I run with [3] for now.

>>>>>>>> I never intended to suggest that, but with the discussion in place, one

>>>>>>>> could ask if you (Aaron) want to pick up that instead.

>>>>>>>> That would keep OVS free for now until DPDK made up the API (see [2]) for

>>>>>>>> socket ownership control and this then could be implemented in OVS?

>>>>>>>>

>>>>>>>> (I hope) In some months/years we will all be happy to drop this bunch of

>>>>>>>> interim solutions, never the less we need it for now.

>>>>>>>>

>>>>>>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/

>>>>>>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html

>>>>>>>> [3]:

>>>>>>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7

>>>>>>>>

>>>>>>>> [...]

>>>>>>>>

>>>>>>>>

>>>>>>>>> I think originally we quickly discussed 4 possible solutions (and

>>>>>>>>> hopefully I captured them correctly):

>>>>>>>>>

>>>>>>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs

>>>>>>>>>    group.  I don't actually like this solution because kvm could then

>>>>>>>>>    pollute the ovs database.

>>>>>>>>>

>>>>>>>>> 2. OVS runs as some user and sets the user/group ownership of the socket

>>>>>>>>>    via chown/chmod where permissions come from the database (the

>>>>>>>>>    original context had ovs running as root - but as I described above

>>>>>>>>>    it doesn't need to be root provided ovs+DPDK can start without root).

>>>>>>>>>

>>>>>>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and

>>>>>>>>>    downgrades.  IIRC, this doesn't actually work, or it may have

>>>>>>>>>    implications on other projects.  I don't remember exactly what was

>>>>>>>>>    not as great about this solution, TBH.

>>>>>>>>>

>>>>>>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the

>>>>>>>>>    layering between them.

>>>>>>>>>

>>>>>>>>> I think solution 2 and solution 4 don't actually interfere with each

>>>>>>>>> other, and can be used to a complementary effect (if implemented

>>>>>>>>> properly) so that the MAC layer enforces access, but even without MAC,

>>>>>>>>> the DAC layer can provide appropriate whitelisting behavior.

>>>>>>>>>

>>>>>>>>

>>>>>>>> I also remember several complex changes needed for the #1 and #3 that

>>>>>>>> always would end up with huge effort and a high risk not being accepted.

>>>>>>>> Probably that is what you refer to with "implications on other projects".

>>>>>>>>

>>>>>>>> Also keep in mind the position of dpdk out of the last few discussions

>>>>>>>> which I'd like to summarize as "dpdk got this path from an app, so this app

>>>>>>>> OWNS that path".

>>>>>>>

>>>>>>> I'd like to continue on, but I am not sure what the concerns are right

>>>>>>> now.  Is it possible to enumerate them point by point so that I can

>>>>>>> understand them?  I think there are two outstanding concerns right now:

>>>>>>>

>>>>>>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)

>>>>>>>

>>>>>>> 2. the proposed approach would be better implemented in the utility

>>>>>>>    that wants access to the sockets (vis-a-vis the libvirt discussion)

>>>>>>>

>>>>>>> Am I understanding the concerns correctly?

>>>>>>

>>>>>>Ping?

>>>>>

>>>>> I found another theoretical problem with the chmod approach, let me try to

>>>>> explain:

>>>>>

>>>>> There's an extremely small race window between the socket creation and the

>>>>> chmod which could theoretically be exploited to change the owner of a socket

>>>>> (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:

>>>>>

>>>>> 1. There's no southbound database running, because it's not yet been

>>>>>    started or because it's being restarted.

>>>>> 2. The user creates a vhost port, naming it ovnsb_db.sock.

>>>>>    rte_vhost_driver_register() succeeds and creates a socket in the file

>>>>>    system.

>>>>> 3. The southbound database is started, it removes ovnsb_db.sock and recreates

>>>>>    it.

>>>>> 4. Now OVS changes the owner and the permission of what it thinks is a

>>>>>    vhost-user socket.

>>>>>

>>>>> If 3 manages to get between 2 and 4, we have a problem. It's a pretty small

>>>>> window, and it's unlikely that an attacker can control when the southbound

>>>>> database is restarted.

>>>>>

>>>>> I feel like I'm nitpicking, but I'm not sure how serious is the security

>>>>> impact of what I'm describing.

>>>>>

>>>>> I suggested an alternative approach, and I've tried implementing a quick

>>>>> POC on top on your patch:

>>>>>

>>>>> ---8<---

>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c

>>>>> index 24ebb41..d7adc66 100644

>>>>> --- a/lib/netdev-dpdk.c

>>>>> +++ b/lib/netdev-dpdk.c

>>>>> @@ -30,6 +30,7 @@

>>>>>  #include <sys/types.h>

>>>>>  #include <sys/stat.h>

>>>>>  #include <getopt.h>

>>>>> +#include <sys/fsuid.h>

>>>>>

>>>>>  #include "chutil.h"

>>>>>  #include "dirs.h"

>>>>> @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)

>>>>>       */

>>>>>      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",

>>>>>               vhost_sock_dir, name);

>>>>> +    uid_t orig_u = geteuid();

>>>>> +    gid_t orig_g = getegid();

>>>>> +    if (vhost_sock_def_owner) {

>>>>> +        uid_t u;

>>>>> +        gid_t g;

>>>>> +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {

>>>>> +            VLOG_INFO("UID: %d GID: %d", u, g);

>>>>> +            setfsuid(u);

>>>>> +            setfsgid(g);

>>>>> +        }

>>>>> +    }

>>>>>

>>>>>      err = rte_vhost_driver_register(dev->vhost_id);

>>>>>      if (err) {

>>>>> @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)

>>>>>          err = vhost_construct_helper(netdev);

>>>>>      }

>>>>>

>>>>> -    ovs_mutex_unlock(&dpdk_mutex);

>>>>> -    if (!err && vhost_sock_def_owner &&

>>>>> -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {

>>>>> -        VLOG_ERR("vhost-user socket device ownership change failed.");

>>>>> +    if (vhost_sock_def_owner) {

>>>>> +        setfsuid(orig_u);

>>>>> +        setfsgid(orig_g);

>>>>>      }

>>>>>

>>>>> -    if (!err && vhost_sock_def_perms &&

>>>>> -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {

>>>>> -        VLOG_ERR("vhost-user socket device permission change failed.");

>>>>> -    }

>>>>> +    ovs_mutex_unlock(&dpdk_mutex);

>>>>>

>>>>>      return err;

>>>>>  }

>>>>>

>>>>> ---8<---

>>>>>

>>>>> Compared to the chmod approach this has some limitation:

>>>>>

>>>>> 1. It doesn't support changing permissions, only the owner.  This

>>>>>    could be done with umask, but I couldn't find any system call

>>>>>    to change the umask for a single thread.

>>>>> 2. Unless vhost-sock-dir is owned by the target owner, the socket

>>>>>    cannot be created.  I'm not sure whether this is a reasonable

>>>>>    limitation for the use cases you have in mind.

>>>>> 3. setfsuid() is Linux specific and somehow deprecated according

>>>>>    to the manpage:

>>>>>

>>>>>    "Thus, setfsuid() is nowadays unneeded and should be avoided

>>>>>    in new applications"

>>>>>

>>>>>    I haven't used seteuid, because it changes the euid of the whole

>>>>>    process and that may interfere with other operations on OVS.

>>>>

>>>> Thanks for this PoC, and explanation.  I agree, there is a race, and I'd

>>>> like to work on trying to solve the problem.

>>>>

>>>>> If these limitations are unacceptable, I can see how we can use

>>>>> chmod.  After all, as you point out, it's probably better to do it

>>>>> in OVS than in some script.

>>>>

>>>> I think fchmod and fchown may actually be the correct calls to have, and

>>>> will refactor these chown/chmod utils functions as such, which (I

>>>> believe) avoids the race as you describe.

>>>>

>>>>> Thanks for your patience in solving this problem,

>>>>

>>>> Thanks for your reviews!

>>>>

>>>>> Daniele

>>>

>>> I've been trying to find a way to solve this in a portable fashion, but

>>> it appears to either require changes in DPDK to let the application

>>> create the unix server socket, or a linux-only guarantee with a

>>> non-linux racy fallback.  My attempt at the former change was NAK'd by

>>> upstream DPDK.  If the latter is unacceptable, then we can drop these

>>> patches and hope that client mode will be a workable solution.

>>>

>>> Thoughts?

>> Do you mean as if Open vSwitch would be the one connecting to DPDK

>> socket (the client and server roles swapped)?

>

>Exactly that.  The DPDK library calls from open vswitch would be for a

>vhost-user client socket to connect to qemu (or whatever is serving the

>vhost user socket).  The problem has to do with chmod/chown, and race

>conditions therein; fchmod/fchown only work due to linux specific

>behavior (and I've confirmed this).  There are a number of methods to

>try and catch when directories have been modified, but the issue is

>there's no way to close pandora's box (once you've modified the

>permissions, you have no clue which inode you touched).  Since there's

>no way, that I can tell, to build a transaction for this, it seems to me

>either do a linux specific hack, or don't do anything.  I think Daniele

>would prefer the don't do anything approach, but I'd like to get

>confirmation.


The code in netdev-dpdk is already Linux specific, I wouldn't object to
a Linux specific solution.

How would that work?  Does the latest DPDK expose the file descriptor?

Thanks,

Daniele
Aaron Conole July 7, 2016, 2:59 a.m. UTC | #9
Daniele Di Proietto <diproiettod@vmware.com> writes:

> On 06/07/2016 11:26, "Aaron Conole" <aconole@redhat.com> wrote:
>
>>Ansis Atteka <aatteka@nicira.com> writes:
>>
>>> On Wed, Jul 6, 2016 at 7:24 AM, Aaron Conole <aconole@redhat.com> wrote:
>>>> Aaron Conole <aconole@redhat.com> writes:
>>>>
>>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:
>>>>>
>>>>>> On 10/06/2016 10:51, "Aaron Conole" <aconole@redhat.com> wrote:
>>>>>>
>>>>>>>Aaron Conole <aconole@redhat.com> writes:
>>>>>>>
>>>>>>>> Christian Ehrhardt <christian.ehrhardt@canonical.com> writes:
>>>>>>>>
>>>>>>>>> On Tue, May 24, 2016 at 4:10 PM, Aaron Conole
>>>>>>>>> <aconole@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Daniele Di Proietto <diproiettod@vmware.com> writes:
>>>>>>>>>>
>>>>>>>>>> > Hi Aaron,
>>>>>>>>>> >
>>>>>>>>>> > I'm still a little bit nervous about calling chown on a (partially)
>>>>>>>>>> > user controlled file name.
>>>>>>>>>>
>>>>>>>>>> I agree, that always seems scary.
>>>>>>>>>>
>>>>>>>>>> > Before moving forward I wanted to discuss a couple of other options:
>>>>>>>>>> >
>>>>>>>>>> > * Ansis (in CC) suggested using -runas parameter in qemu.  This way
>>>>>>>>>> > qemu can open the socket as root and drop privileges before starting
>>>>>>>>>> > guest execution.
>>>>>>>>>>
>>>>>>>>>> I'm not sure how to do this with libvirt, or via the OpenStack Neutron
>>>>>>>>>> plugin.  I also don't know if it would be an acceptable workaround for
>>>>>>>>>> users.  Additionally, I recall there being something of a "don't even
>>>>>>>>>> know if this works" around it.  Maybe Christian or Ansis (both in CC)
>>>>>>>>>> can expound on it.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>> IIRC we kind of agree that long term a proper MAC will be
>>>>>>>>> much better but
>>>>>>>>> most involved people needed something to get it working like "now".
>>>>>>>>> Since they are complementary (other than the fix removing a bit of the
>>>>>>>>> urgency for more MAC) it was kind of the least bad option.
>>>>>>>>>
>>>>>>>>> You have to be aware that I brought up the discussion on
>>>>>>>>> dev@dpdk.org - see
>>>>>>>>> [1] and [2]:
>>>>>>>>> But this will take time and eventually still be the
>>>>>>>>> applications task to
>>>>>>>>> "do something" - no matter if via API or via the chmod's right now.
>>>>>>>>> So Aaron is trying to get something that works now until the long term
>>>>>>>>> things are in place, which I appreciate.
>>>>>>>>>
>>>>>>>>> FYI - I was even more in a hurry as it was clear that OVS-2.5 won't get
>>>>>>>>> this in time I run with [3] for now.
>>>>>>>>> I never intended to suggest that, but with the discussion in place, one
>>>>>>>>> could ask if you (Aaron) want to pick up that instead.
>>>>>>>>> That would keep OVS free for now until DPDK made up the API
>>>>>>>>> (see [2]) for
>>>>>>>>> socket ownership control and this then could be implemented in OVS?
>>>>>>>>>
>>>>>>>>> (I hope) In some months/years we will all be happy to drop
>>>>>>>>> this bunch of
>>>>>>>>> interim solutions, never the less we need it for now.
>>>>>>>>>
>>>>>>>>> [1]: http://dpdk.org/dev/patchwork/patch/12222/
>>>>>>>>> [2]: http://dpdk.org/ml/archives/dev/2015-December/030326.html
>>>>>>>>> [3]:
>>>>>>>>> https://git.launchpad.net/~ubuntu-server/dpdk/commit/?h=ubuntu-xenial-to-dpdk2.2&id=f3c7aa1b2ddea8e092ad4a89e41a0e19d01ed4e7
>>>>>>>>>
>>>>>>>>> [...]
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I think originally we quickly discussed 4 possible solutions (and
>>>>>>>>>> hopefully I captured them correctly):
>>>>>>>>>>
>>>>>>>>>> 1. OVS downgrades to the ovs user, and kvm runs under the ovs
>>>>>>>>>>    group.  I don't actually like this solution because kvm could then
>>>>>>>>>>    pollute the ovs database.
>>>>>>>>>>
>>>>>>>>>> 2. OVS runs as some user and sets the user/group ownership
>>>>>>>>>> of the socket
>>>>>>>>>>    via chown/chmod where permissions come from the database (the
>>>>>>>>>>    original context had ovs running as root - but as I described above
>>>>>>>>>>    it doesn't need to be root provided ovs+DPDK can start
>>>>>>>>>> without root).
>>>>>>>>>>
>>>>>>>>>> 3. OVS runs as some user, kvm starts as root, opens the socket and
>>>>>>>>>>    downgrades.  IIRC, this doesn't actually work, or it may have
>>>>>>>>>>    implications on other projects.  I don't remember exactly what was
>>>>>>>>>>    not as great about this solution, TBH.
>>>>>>>>>>
>>>>>>>>>> 4. OVS and KVM run as whatever users; MAC is used to enforce the
>>>>>>>>>>    layering between them.
>>>>>>>>>>
>>>>>>>>>> I think solution 2 and solution 4 don't actually interfere with each
>>>>>>>>>> other, and can be used to a complementary effect (if implemented
>>>>>>>>>> properly) so that the MAC layer enforces access, but even without MAC,
>>>>>>>>>> the DAC layer can provide appropriate whitelisting behavior.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I also remember several complex changes needed for the #1 and #3 that
>>>>>>>>> always would end up with huge effort and a high risk not
>>>>>>>>> being accepted.
>>>>>>>>> Probably that is what you refer to with "implications on
>>>>>>>>> other projects".
>>>>>>>>>
>>>>>>>>> Also keep in mind the position of dpdk out of the last few discussions
>>>>>>>>> which I'd like to summarize as "dpdk got this path from an
>>>>>>>>> app, so this app
>>>>>>>>> OWNS that path".
>>>>>>>>
>>>>>>>> I'd like to continue on, but I am not sure what the concerns are right
>>>>>>>> now.  Is it possible to enumerate them point by point so that I can
>>>>>>>> understand them?  I think there are two outstanding concerns right now:
>>>>>>>>
>>>>>>>> 1. the proposed approach is not good enough (vis-a-vis DAC vs. MAC)
>>>>>>>>
>>>>>>>> 2. the proposed approach would be better implemented in the utility
>>>>>>>>    that wants access to the sockets (vis-a-vis the libvirt discussion)
>>>>>>>>
>>>>>>>> Am I understanding the concerns correctly?
>>>>>>>
>>>>>>>Ping?
>>>>>>
>>>>>> I found another theoretical problem with the chmod approach, let me try to
>>>>>> explain:
>>>>>>
>>>>>> There's an extremely small race window between the socket creation and the
>>>>>> chmod which could theoretically be exploited to change the owner of a socket
>>>>>> (e.g. ovnsb_db.sock) in ovs rundir, by controlling the name of the port:
>>>>>>
>>>>>> 1. There's no southbound database running, because it's not yet been
>>>>>>    started or because it's being restarted.
>>>>>> 2. The user creates a vhost port, naming it ovnsb_db.sock.
>>>>>>    rte_vhost_driver_register() succeeds and creates a socket in the file
>>>>>>    system.
>>>>>> 3. The southbound database is started, it removes ovnsb_db.sock and recreates
>>>>>>    it.
>>>>>> 4. Now OVS changes the owner and the permission of what it thinks is a
>>>>>>    vhost-user socket.
>>>>>>
>>>>>> If 3 manages to get between 2 and 4, we have a problem. It's a pretty small
>>>>>> window, and it's unlikely that an attacker can control when the southbound
>>>>>> database is restarted.
>>>>>>
>>>>>> I feel like I'm nitpicking, but I'm not sure how serious is the security
>>>>>> impact of what I'm describing.
>>>>>>
>>>>>> I suggested an alternative approach, and I've tried implementing a quick
>>>>>> POC on top on your patch:
>>>>>>
>>>>>> ---8<---
>>>>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>>>>> index 24ebb41..d7adc66 100644
>>>>>> --- a/lib/netdev-dpdk.c
>>>>>> +++ b/lib/netdev-dpdk.c
>>>>>> @@ -30,6 +30,7 @@
>>>>>>  #include <sys/types.h>
>>>>>>  #include <sys/stat.h>
>>>>>>  #include <getopt.h>
>>>>>> +#include <sys/fsuid.h>
>>>>>>
>>>>>>  #include "chutil.h"
>>>>>>  #include "dirs.h"
>>>>>> @@ -891,6 +892,17 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>>>>       */
>>>>>>      snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
>>>>>>               vhost_sock_dir, name);
>>>>>> +    uid_t orig_u = geteuid();
>>>>>> +    gid_t orig_g = getegid();
>>>>>> +    if (vhost_sock_def_owner) {
>>>>>> +        uid_t u;
>>>>>> +        gid_t g;
>>>>>> +        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
>>>>>> +            VLOG_INFO("UID: %d GID: %d", u, g);
>>>>>> +            setfsuid(u);
>>>>>> +            setfsgid(g);
>>>>>> +        }
>>>>>> +    }
>>>>>>
>>>>>>      err = rte_vhost_driver_register(dev->vhost_id);
>>>>>>      if (err) {
>>>>>> @@ -903,16 +915,12 @@ netdev_dpdk_vhost_user_construct(struct netdev *netdev)
>>>>>>          err = vhost_construct_helper(netdev);
>>>>>>      }
>>>>>>
>>>>>> -    ovs_mutex_unlock(&dpdk_mutex);
>>>>>> -    if (!err && vhost_sock_def_owner &&
>>>>>> -        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
>>>>>> -        VLOG_ERR("vhost-user socket device ownership change failed.");
>>>>>> +    if (vhost_sock_def_owner) {
>>>>>> +        setfsuid(orig_u);
>>>>>> +        setfsgid(orig_g);
>>>>>>      }
>>>>>>
>>>>>> -    if (!err && vhost_sock_def_perms &&
>>>>>> -        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
>>>>>> -        VLOG_ERR("vhost-user socket device permission change failed.");
>>>>>> -    }
>>>>>> +    ovs_mutex_unlock(&dpdk_mutex);
>>>>>>
>>>>>>      return err;
>>>>>>  }
>>>>>>
>>>>>> ---8<---
>>>>>>
>>>>>> Compared to the chmod approach this has some limitation:
>>>>>>
>>>>>> 1. It doesn't support changing permissions, only the owner.  This
>>>>>>    could be done with umask, but I couldn't find any system call
>>>>>>    to change the umask for a single thread.
>>>>>> 2. Unless vhost-sock-dir is owned by the target owner, the socket
>>>>>>    cannot be created.  I'm not sure whether this is a reasonable
>>>>>>    limitation for the use cases you have in mind.
>>>>>> 3. setfsuid() is Linux specific and somehow deprecated according
>>>>>>    to the manpage:
>>>>>>
>>>>>>    "Thus, setfsuid() is nowadays unneeded and should be avoided
>>>>>>    in new applications"
>>>>>>
>>>>>>    I haven't used seteuid, because it changes the euid of the whole
>>>>>>    process and that may interfere with other operations on OVS.
>>>>>
>>>>> Thanks for this PoC, and explanation.  I agree, there is a race, and I'd
>>>>> like to work on trying to solve the problem.
>>>>>
>>>>>> If these limitations are unacceptable, I can see how we can use
>>>>>> chmod.  After all, as you point out, it's probably better to do it
>>>>>> in OVS than in some script.
>>>>>
>>>>> I think fchmod and fchown may actually be the correct calls to have, and
>>>>> will refactor these chown/chmod utils functions as such, which (I
>>>>> believe) avoids the race as you describe.
>>>>>
>>>>>> Thanks for your patience in solving this problem,
>>>>>
>>>>> Thanks for your reviews!
>>>>>
>>>>>> Daniele
>>>>
>>>> I've been trying to find a way to solve this in a portable fashion, but
>>>> it appears to either require changes in DPDK to let the application
>>>> create the unix server socket, or a linux-only guarantee with a
>>>> non-linux racy fallback.  My attempt at the former change was NAK'd by
>>>> upstream DPDK.  If the latter is unacceptable, then we can drop these
>>>> patches and hope that client mode will be a workable solution.
>>>>
>>>> Thoughts?
>>> Do you mean as if Open vSwitch would be the one connecting to DPDK
>>> socket (the client and server roles swapped)?
>>
>>Exactly that.  The DPDK library calls from open vswitch would be for a
>>vhost-user client socket to connect to qemu (or whatever is serving the
>>vhost user socket).  The problem has to do with chmod/chown, and race
>>conditions therein; fchmod/fchown only work due to linux specific
>>behavior (and I've confirmed this).  There are a number of methods to
>>try and catch when directories have been modified, but the issue is
>>there's no way to close pandora's box (once you've modified the
>>permissions, you have no clue which inode you touched).  Since there's
>>no way, that I can tell, to build a transaction for this, it seems to me
>>either do a linux specific hack, or don't do anything.  I think Daniele
>>would prefer the don't do anything approach, but I'd like to get
>>confirmation.
>
> The code in netdev-dpdk is already Linux specific, I wouldn't object to
> a Linux specific solution.

Okay, that's good to know.

> How would that work?  Does the latest DPDK expose the file descriptor?

Well, my plan was to get DPDK to expose it.  Thinking more, if they
won't let me pass the descriptor, I'm doubtful they will give it back to
me.

Without the file descriptor, the best I can do is walk the directory
path using openat() and make sure no symlinks were put in the way.  Then
I can fchownat() with AT_SYMLINK_NOFOLLOW.  It doesn't prevent hardlinks
being used to beat the race, but it will prevent _symlink_
shenanigans in a cross-platform way, and it seemed to work under FreeBSD
(although I need to test again to make sure I didn't misremember).

Would that solution work?  It's a bit heavier than the others.

> Thanks,
>
> Daniele
diff mbox

Patch

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 24ebb41..d7adc66 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -30,6 +30,7 @@ 
 #include <sys/types.h>
 #include <sys/stat.h>
 #include <getopt.h>
+#include <sys/fsuid.h>
 
 #include "chutil.h"
 #include "dirs.h"
@@ -891,6 +892,17 @@  netdev_dpdk_vhost_user_construct(struct netdev *netdev)
      */
     snprintf(dev->vhost_id, sizeof(dev->vhost_id), "%s/%s",
              vhost_sock_dir, name);
+    uid_t orig_u = geteuid();
+    gid_t orig_g = getegid();
+    if (vhost_sock_def_owner) {
+        uid_t u;
+        gid_t g;
+        if (!ovs_strtousr(vhost_sock_def_owner, &u, NULL, &g, false)) {
+            VLOG_INFO("UID: %d GID: %d", u, g);
+            setfsuid(u);
+            setfsgid(g);
+        }
+    }
 
     err = rte_vhost_driver_register(dev->vhost_id);
     if (err) {
@@ -903,16 +915,12 @@  netdev_dpdk_vhost_user_construct(struct netdev *netdev)
         err = vhost_construct_helper(netdev);
     }
 
-    ovs_mutex_unlock(&dpdk_mutex);
-    if (!err && vhost_sock_def_owner &&
-        (err = ovs_chown(dev->vhost_id, vhost_sock_def_owner))) {
-        VLOG_ERR("vhost-user socket device ownership change failed.");
+    if (vhost_sock_def_owner) {
+        setfsuid(orig_u);
+        setfsgid(orig_g);
     }
 
-    if (!err && vhost_sock_def_perms &&
-        (err = ovs_chmod(dev->vhost_id, vhost_sock_def_perms))) {
-        VLOG_ERR("vhost-user socket device permission change failed.");
-    }
+    ovs_mutex_unlock(&dpdk_mutex);
 
     return err;
 }