[ovs-dev,v2,1/8] netlink: provide network namespace id from a msg.

Message ID 20171109173107.26256-2-fbl@redhat.com
State Changes Requested
Headers show
Series
  • Add minimum network namespace support.
Related show

Commit Message

Flavio Leitner Nov. 9, 2017, 5:31 p.m.
The netlink notification's ancillary data contains the network
namespace id (netnsid) needed to identify the device correctly.
(ifindex and netnsid).

Signed-off-by: Flavio Leitner <fbl@redhat.com>
---

ChangeLog:
* V2
  - report and close unexpected file descriptors.


 configure.ac           |   3 +-
 lib/automake.mk        |   1 +
 lib/dpif-netlink.c     |   6 +--
 lib/netdev-linux.c     |   2 +-
 lib/netlink-notifier.c |   2 +-
 lib/netlink-socket.c   |  53 +++++++++++++++++++---
 lib/netlink-socket.h   |   4 +-
 lib/netns.h            | 119 +++++++++++++++++++++++++++++++++++++++++++++++++
 utilities/nlmon.c      |   2 +-
 9 files changed, 178 insertions(+), 14 deletions(-)
 create mode 100644 lib/netns.h

Comments

Ben Pfaff Dec. 1, 2017, 9:14 p.m. | #1
On Thu, Nov 09, 2017 at 03:31:00PM -0200, Flavio Leitner wrote:
> The netlink notification's ancillary data contains the network
> namespace id (netnsid) needed to identify the device correctly.
> (ifindex and netnsid).
> 
> Signed-off-by: Flavio Leitner <fbl@redhat.com>
> ---
> 
> ChangeLog:
> * V2
>   - report and close unexpected file descriptors.

Thanks for working on improving OVS support for network namespaces.
This work is long overdue (Jiri talked about it over two years ago at
ovscon) and I am happy to see progress.

netns.h uses a struct to represent a network namespace.  I am not sure
yet whether that is necessary.  It seems like a network namespace is a
lot like a process ID: I think that only 0...INT_MAX is actually used.
If so, then I believe that we could use just an "int", with -1 and -2 to
represent the "local" and "invalid" values.  Then, wouldn't many of the
operations be simply natural C operators?  I admit that I don't
understand the special cases for NETNS_INVALID; maybe the code should
document the intended semantics for NETNS_INVALID in the comments.

I think that the 'ns' parameter to nl_sock_recv() receives the network
namespace from which the message was sent.  It might be worth saying
that in the function comment; just saying "the network namespace
information" leaves the reader to guess.

Thanks,

Ben.
Flavio Leitner Dec. 4, 2017, 1:44 p.m. | #2
On Fri, 1 Dec 2017 13:14:07 -0800
Ben Pfaff <blp@ovn.org> wrote:

> On Thu, Nov 09, 2017 at 03:31:00PM -0200, Flavio Leitner wrote:
> > The netlink notification's ancillary data contains the network
> > namespace id (netnsid) needed to identify the device correctly.
> > (ifindex and netnsid).
> > 
> > Signed-off-by: Flavio Leitner <fbl@redhat.com>
> > ---
> > 
> > ChangeLog:
> > * V2
> >   - report and close unexpected file descriptors.  
> 
> Thanks for working on improving OVS support for network namespaces.
> This work is long overdue (Jiri talked about it over two years ago at
> ovscon) and I am happy to see progress.
> 
> netns.h uses a struct to represent a network namespace.  I am not sure
> yet whether that is necessary.  It seems like a network namespace is a
> lot like a process ID: I think that only 0...INT_MAX is actually used.
> If so, then I believe that we could use just an "int", with -1 and -2 to

There was no consensus regarding to the network namespace ID range,
unfortunately.  The only thing we have is NETNSA_NSID_NOT_ASSIGNED.

It is only positive numbers if you review the kernel sources today,
but the API is exposing signed 32bit, so not sure what can happen
in the future if the kernel decides to expand the range.

> represent the "local" and "invalid" values.  Then, wouldn't many of the
> operations be simply natural C operators?  I admit that I don't

Yes, if we assume the ID range is positive only. 
Looks like iproute would break in that case, so maybe we can follow it.

> understand the special cases for NETNS_INVALID; maybe the code should
> document the intended semantics for NETNS_INVALID in the comments.

That represents a temporary state where the network namespace information
can't be used because it's out-of-sync. When it's in that state, it will
query the kernel before do anything that would require netnsid.

I will add more details.

> I think that the 'ns' parameter to nl_sock_recv() receives the network
> namespace from which the message was sent.  It might be worth saying
> that in the function comment; just saying "the network namespace
> information" leaves the reader to guess.

That's a good idea.

Thanks!
Jiri Benc Dec. 4, 2017, 2:08 p.m. | #3
On Mon, 4 Dec 2017 11:44:19 -0200, Flavio Leitner wrote:
> It is only positive numbers if you review the kernel sources today,
> but the API is exposing signed 32bit, so not sure what can happen
> in the future if the kernel decides to expand the range.

I think you can assume that it will always be a positive number.

 Jiri

Patch

diff --git a/configure.ac b/configure.ac
index 9e0081832..ad036fc67 100644
--- a/configure.ac
+++ b/configure.ac
@@ -113,7 +113,8 @@  AC_CHECK_MEMBERS([struct sockaddr_in6.sin6_scope_id], [], [],
   [[#include <sys/socket.h>
 #include <netinet/in.h>]])
 AC_CHECK_FUNCS([mlockall strnlen getloadavg statvfs getmntent_r sendmmsg clock_gettime])
-AC_CHECK_HEADERS([mntent.h sys/statvfs.h linux/types.h linux/if_ether.h stdatomic.h])
+AC_CHECK_HEADERS([mntent.h sys/statvfs.h linux/types.h linux/if_ether.h])
+AC_CHECK_HEADERS([linux/net_namespace.h stdatomic.h])
 AC_CHECK_HEADERS([net/if_mib.h], [], [], [[#include <sys/types.h>
 #include <net/if.h>]])
 
diff --git a/lib/automake.mk b/lib/automake.mk
index effe5b5c2..6b89716ae 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -140,6 +140,7 @@  lib_libopenvswitch_la_SOURCES = \
 	lib/netflow.h \
 	lib/netlink.c \
 	lib/netlink.h \
+	lib/netns.h \
 	lib/nx-match.c \
 	lib/nx-match.h \
 	lib/object-collection.c \
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index c265909f8..fd333094d 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -1287,7 +1287,7 @@  dpif_netlink_port_poll(const struct dpif *dpif_, char **devnamep)
         int error;
 
         ofpbuf_use_stub(&buf, buf_stub, sizeof buf_stub);
-        error = nl_sock_recv(dpif->port_notifier, &buf, false);
+        error = nl_sock_recv(dpif->port_notifier, &buf, NULL, false);
         if (!error) {
             struct dpif_netlink_vport vport;
 
@@ -2621,7 +2621,7 @@  dpif_netlink_recv_windows(struct dpif_netlink *dpif, uint32_t handler_id,
                 return EAGAIN;
             }
 
-            error = nl_sock_recv(sock_pool[i].nl_sock, buf, false);
+            error = nl_sock_recv(sock_pool[i].nl_sock, buf, NULL, false);
             if (error == ENOBUFS) {
                 /* ENOBUFS typically means that we've received so many
                  * packets that the buffer overflowed.  Try again
@@ -2696,7 +2696,7 @@  dpif_netlink_recv__(struct dpif_netlink *dpif, uint32_t handler_id,
                 return EAGAIN;
             }
 
-            error = nl_sock_recv(ch->sock, buf, false);
+            error = nl_sock_recv(ch->sock, buf, NULL, false);
             if (error == ENOBUFS) {
                 /* ENOBUFS typically means that we've received so many
                  * packets that the buffer overflowed.  Try again
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index fbbbb7205..184822816 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -638,7 +638,7 @@  netdev_linux_run(const struct netdev_class *netdev_class OVS_UNUSED)
         struct ofpbuf buf;
 
         ofpbuf_use_stub(&buf, buf_stub, sizeof buf_stub);
-        error = nl_sock_recv(sock, &buf, false);
+        error = nl_sock_recv(sock, &buf, NULL, false);
         if (!error) {
             struct rtnetlink_change change;
 
diff --git a/lib/netlink-notifier.c b/lib/netlink-notifier.c
index 3acded418..d33904658 100644
--- a/lib/netlink-notifier.c
+++ b/lib/netlink-notifier.c
@@ -187,7 +187,7 @@  nln_run(struct nln *nln)
         int error;
 
         ofpbuf_use_stub(&buf, buf_stub, sizeof buf_stub);
-        error = nl_sock_recv(nln->notify_sock, &buf, false);
+        error = nl_sock_recv(nln->notify_sock, &buf, NULL, false);
         if (!error) {
             int group = nln->parse(&buf, nln->change);
 
diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index 317bf907f..4e0317fbe 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -19,6 +19,7 @@ 
 #include <errno.h>
 #include <inttypes.h>
 #include <stdlib.h>
+#include <sys/socket.h>
 #include <sys/types.h>
 #include <sys/uio.h>
 #include <unistd.h>
@@ -607,7 +608,8 @@  nl_sock_send_seq(struct nl_sock *sock, const struct ofpbuf *msg,
 }
 
 static int
-nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, bool wait)
+nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, struct netns *ns,
+               bool wait)
 {
     /* We can't accurately predict the size of the data to be received.  The
      * caller is supposed to have allocated enough space in 'buf' to handle the
@@ -618,7 +620,10 @@  nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, bool wait)
     uint8_t tail[65536];
     struct iovec iov[2];
     struct msghdr msg;
+    uint8_t msgctrl[64];
+    struct cmsghdr *cmsg;
     ssize_t retval;
+    int *ptr;
     int error;
 
     ovs_assert(buf->allocated >= sizeof *nlmsghdr);
@@ -632,6 +637,8 @@  nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, bool wait)
     memset(&msg, 0, sizeof msg);
     msg.msg_iov = iov;
     msg.msg_iovlen = 2;
+    msg.msg_control = msgctrl;
+    msg.msg_controllen = sizeof msgctrl;
 
     /* Receive a Netlink message from the kernel.
      *
@@ -706,6 +713,38 @@  nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, bool wait)
     }
 #endif
 
+    if (ns) {
+        /* The network namespace id comes as ancillary data. For older
+         * kernels, this data is either not available or it might be -1,
+         * so it falls back to local network namespace (no id). Latest
+         * kernels return a valid ID only if available or nothing. */
+        netns_set_local(ns);
+        cmsg = CMSG_FIRSTHDR(&msg);
+        while (cmsg != NULL) {
+            if (cmsg->cmsg_level == SOL_NETLINK
+                && cmsg->cmsg_type == NETLINK_LISTEN_ALL_NSID) {
+                ptr = ALIGNED_CAST(int *, CMSG_DATA(cmsg));
+                netns_set_id(ns, *ptr);
+            }
+            if (cmsg->cmsg_level == SOL_SOCKET
+                && cmsg->cmsg_type == SCM_RIGHTS) {
+                /* This is unexpected and unwanted, close all fds */
+                int nfds;
+                int i;
+                nfds = (cmsg->cmsg_len - CMSG_ALIGN(sizeof(struct cmsghdr)))
+                       / sizeof(int);
+                ptr = ALIGNED_CAST(int *, CMSG_DATA(cmsg));
+                for (i = 0; i < nfds; i++) {
+                    VLOG_ERR_RL(&rl, "closing unexpected received fd (%d).",
+                                ptr[i]);
+                    close(ptr[i]);
+                }
+            }
+
+            cmsg = CMSG_NXTHDR(&msg, cmsg);
+        }
+    }
+
     log_nlmsg(__func__, 0, buf->data, buf->size, sock->protocol);
     COVERAGE_INC(netlink_received);
 
@@ -714,7 +753,8 @@  nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, bool wait)
 
 /* Tries to receive a Netlink message from the kernel on 'sock' into 'buf'.  If
  * 'wait' is true, waits for a message to be ready.  Otherwise, fails with
- * EAGAIN if the 'sock' receive buffer is empty.
+ * EAGAIN if the 'sock' receive buffer is empty.  If 'ns' is provided, the
+ * network namespace information will be provided.
  *
  * The caller must have initialized 'buf' with an allocation of at least
  * NLMSG_HDRLEN bytes.  For best performance, the caller should allocate enough
@@ -730,9 +770,10 @@  nl_sock_recv__(struct nl_sock *sock, struct ofpbuf *buf, bool wait)
  * Regardless of success or failure, this function resets 'buf''s headroom to
  * 0. */
 int
-nl_sock_recv(struct nl_sock *sock, struct ofpbuf *buf, bool wait)
+nl_sock_recv(struct nl_sock *sock, struct ofpbuf *buf, struct netns *ns,
+             bool wait)
 {
-    return nl_sock_recv__(sock, buf, wait);
+    return nl_sock_recv__(sock, buf, ns, wait);
 }
 
 static void
@@ -821,7 +862,7 @@  nl_sock_transact_multiple__(struct nl_sock *sock,
         }
 
         /* Receive a reply. */
-        error = nl_sock_recv__(sock, buf_txn->reply, false);
+        error = nl_sock_recv__(sock, buf_txn->reply, NULL, false);
         if (error) {
             if (error == EAGAIN) {
                 nl_sock_record_errors__(transactions, n, 0);
@@ -1101,7 +1142,7 @@  nl_dump_refill(struct nl_dump *dump, struct ofpbuf *buffer)
     int error;
 
     while (!buffer->size) {
-        error = nl_sock_recv__(dump->sock, buffer, false);
+        error = nl_sock_recv__(dump->sock, buffer, NULL, false);
         if (error) {
             /* The kernel never blocks providing the results of a dump, so
              * error == EAGAIN means that we've read the whole thing, and
diff --git a/lib/netlink-socket.h b/lib/netlink-socket.h
index d3cc64288..348483fad 100644
--- a/lib/netlink-socket.h
+++ b/lib/netlink-socket.h
@@ -193,6 +193,7 @@ 
 #include <stdbool.h>
 #include <stddef.h>
 #include <stdint.h>
+#include "netns.h"
 #include "openvswitch/ofpbuf.h"
 #include "ovs-atomic.h"
 #include "ovs-thread.h"
@@ -221,7 +222,8 @@  int nl_sock_unsubscribe_packets(struct nl_sock *sock);
 int nl_sock_send(struct nl_sock *, const struct ofpbuf *, bool wait);
 int nl_sock_send_seq(struct nl_sock *, const struct ofpbuf *,
                      uint32_t nlmsg_seq, bool wait);
-int nl_sock_recv(struct nl_sock *, struct ofpbuf *, bool wait);
+int nl_sock_recv(struct nl_sock *, struct ofpbuf *, struct netns *,
+                 bool wait);
 
 int nl_sock_drain(struct nl_sock *);
 
diff --git a/lib/netns.h b/lib/netns.h
new file mode 100644
index 000000000..2e8bd8b0c
--- /dev/null
+++ b/lib/netns.h
@@ -0,0 +1,119 @@ 
+/*
+ * Copyright (c) 2017 Red Hat Inc.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at:
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef NETNS_H
+#define NETNS_H 1
+
+#include <stdbool.h>
+
+#ifdef HAVE_LINUX_NET_NAMESPACE_H
+#include <linux/net_namespace.h>
+#define NETNS_NOT_ASSIGNED NETNSA_NSID_NOT_ASSIGNED
+#else
+#define NETNS_NOT_ASSIGNED -1
+#endif
+
+enum netns_state {
+    NETNS_INVALID,      /* not initialized yet */
+    NETNS_LOCAL,        /* local or not supported on older kernels */
+    NETNS_REMOTE        /* on another network namespace with valid ID */
+};
+
+struct netns {
+    enum netns_state state;
+    int id;
+};
+
+/* Prototypes */
+static inline void netns_set_id(struct netns *ns, int id);
+static inline void netns_set_invalid(struct netns *ns);
+static inline bool netns_is_invalid(struct netns *ns);
+static inline void netns_set_local(struct netns *ns);
+static inline bool netns_is_local(struct netns *ns);
+static inline bool netns_is_remote(struct netns *ns);
+static inline bool netns_eq(const struct netns *a, const struct netns *b);
+static inline void netns_copy(struct netns *dst, const struct netns *src);
+
+/* Functions */
+static inline void
+netns_set_id(struct netns *ns, int id)
+{
+    if (!ns) {
+        return;
+    }
+
+    if (id == NETNS_NOT_ASSIGNED) {
+        ns->state = NETNS_LOCAL;
+    } else {
+        ns->state = NETNS_REMOTE;
+        ns->id = id;
+    }
+}
+
+static inline void
+netns_set_invalid(struct netns *ns)
+{
+    ns->state = NETNS_INVALID;
+}
+
+static inline bool
+netns_is_invalid(struct netns *ns)
+{
+    return ns->state == NETNS_INVALID;
+}
+
+static inline void
+netns_set_local(struct netns *ns)
+{
+    ns->state = NETNS_LOCAL;
+}
+
+static inline bool
+netns_is_local(struct netns *ns)
+{
+    return (ns->state == NETNS_LOCAL);
+}
+
+static inline bool
+netns_is_remote(struct netns *ns)
+{
+    return (ns->state == NETNS_REMOTE);
+}
+
+static inline void
+netns_copy(struct netns *dst, const struct netns *src)
+{
+    if (src->state == NETNS_LOCAL || src->state == NETNS_REMOTE) {
+        *dst = *src;
+    }
+}
+
+static inline bool
+netns_eq(const struct netns *a, const struct netns *b)
+{
+    if (a->state == NETNS_LOCAL && b->state == NETNS_LOCAL) {
+        return true;
+    }
+
+    if (a->state == NETNS_REMOTE && b->state == NETNS_REMOTE &&
+        a->id == b->id) {
+        return true;
+    }
+
+    return false;
+}
+
+#endif
diff --git a/utilities/nlmon.c b/utilities/nlmon.c
index b91fa09b3..d38a70b6f 100644
--- a/utilities/nlmon.c
+++ b/utilities/nlmon.c
@@ -59,7 +59,7 @@  main(int argc OVS_UNUSED, char *argv[])
 
     ofpbuf_use_stub(&buf, buf_stub, sizeof buf_stub);
     for (;;) {
-        error = nl_sock_recv(sock, &buf, false);
+        error = nl_sock_recv(sock, &buf, NULL, false);
         if (error == EAGAIN) {
             /* Nothing to do. */
         } else if (error == ENOBUFS) {