(linux) opensock: AF_NETLINK is a better choice than AF_UNIX
diff mbox

Message ID 20170822142915.5yokngypphdb4mv4@cs.unibo.it
State New
Headers show

Commit Message

Renzo Davoli Aug. 22, 2017, 2:29 p.m. UTC
In sysdeps/unix/sysv/linux/opensock.c the function __opensock
opens a "generic" socket.
This function tries to open sockets of different families and returns
as soon as it succeeds to open a socket. It remembers the kind of socket
which succeeded to avoid the scan in subsequent calls.

The sequence of families for the attempts are:
AF_UNIX, AF_INET, AF_INET6, etc.

This function is used in:
./sysdeps/unix/sysv/linux/if_index.c
./sysdeps/unix/sysv/linux/ifreq.c (for getifaddrs)

All the calling functions are related to the TCP-IP (v4 or v6) stack.
It is conceptually inconsistent to use AF_UNIX sockets (although
the Linux kernel accepts any kind of sockets).
AF_UNIX has nothing to do with networking: it is just an InterProcess 
Communication means.

This choice makes harder to implement system call partial virtualization 
https://sourceforge.net/projects/view-os/

User-space TCP-IP stacks as umnet modules virtualize the address families
AF_INET, AF_INET6 and AF_NETLINK (iproute for net configuration).
From the practical point of view using (for example) a umnetlwipv6
stack, /bin/ip does not work while the busybox implementation of ip
succeeds (it uses a AF_INET socket for SIOCGIFINDEX).
$ ip addr add 10.0.0.1/24 dev vd0
Cannot find device "vd0"
$ busybox ip addr add 10.0.0.1/24 dev vd0
$ 

Partial virtualization using the current glibc code is harder:
all the ioctl requests should be captured and virtualized if the fd 
is a socket and the request # is in a specific range.
It is not possible to decide whether a "socket(AF_UNIX..)" system call 
returns a fd that needs virtualization or not.
If it gets connected to "/tmp/.X11-unix/X0" (for example) it is not
virtualized by the net module while if it is used in a 
SIOCG* ioctl call it must be virtualized.

The patch here attached changes the sequence of socket for opensock's
attempts to:
AF_NETLINK, AF_INET, AF_INET6, AF_UNIX, etc.

Netlink is more appropriate for ioctl calls like "SIOGCIFINDEX".
I suppose AF_UNIX was chosen as its data structures are lighter than
those of AF_INET(6). AF_NETLINK should be as light as AF_UNIX
and it would make the life easier for partial virtualization.
A sequence like AF_NETLINK, AF_UNIX, AF_INET, AF_INET6,...
would fit anyway.

Looking around in other libC implementations
(paths are relative to their source trees):

* NetBSD uses AF_INET see:
http://bxr.su/DragonFly/lib/libc/net/if_nametoindex.c 

* dietlibc uses AF_INET6 see: lib/if_nametoindex.c

* ulibc: uses AF_INET6 or AF_INET if AF_INET6 fails:
  see: libc/inet/opensock.c

* newlibc: uses AF_INET in libc/sys/linux/net/ifname.c
  and a __opensock similar to the one in glibc in
	libc/sys/linux/net/ifreq.c

* eglibc seems to share the same opensock code of glibc.

Thank you
	renzo

2018-08-20 Renzo Davoli <renzo@cs.unibo.it>

Comments

Adhemerval Zanella Aug. 22, 2017, 7:21 p.m. UTC | #1
On 22/08/2017 11:29, Renzo Davoli wrote:
> In sysdeps/unix/sysv/linux/opensock.c the function __opensock
> opens a "generic" socket.
> This function tries to open sockets of different families and returns
> as soon as it succeeds to open a socket. It remembers the kind of socket
> which succeeded to avoid the scan in subsequent calls.
> 
> The sequence of families for the attempts are:
> AF_UNIX, AF_INET, AF_INET6, etc.
> 
> This function is used in:
> ./sysdeps/unix/sysv/linux/if_index.c
> ./sysdeps/unix/sysv/linux/ifreq.c (for getifaddrs)
> 
> All the calling functions are related to the TCP-IP (v4 or v6) stack.
> It is conceptually inconsistent to use AF_UNIX sockets (although
> the Linux kernel accepts any kind of sockets).
> AF_UNIX has nothing to do with networking: it is just an InterProcess 
> Communication means.
> 
> This choice makes harder to implement system call partial virtualization 
> https://sourceforge.net/projects/view-os/
> 
> User-space TCP-IP stacks as umnet modules virtualize the address families
> AF_INET, AF_INET6 and AF_NETLINK (iproute for net configuration).
> From the practical point of view using (for example) a umnetlwipv6
> stack, /bin/ip does not work while the busybox implementation of ip
> succeeds (it uses a AF_INET socket for SIOCGIFINDEX).
> $ ip addr add 10.0.0.1/24 dev vd0
> Cannot find device "vd0"
> $ busybox ip addr add 10.0.0.1/24 dev vd0
> $ 
> 
> Partial virtualization using the current glibc code is harder:
> all the ioctl requests should be captured and virtualized if the fd 
> is a socket and the request # is in a specific range.
> It is not possible to decide whether a "socket(AF_UNIX..)" system call 
> returns a fd that needs virtualization or not.
> If it gets connected to "/tmp/.X11-unix/X0" (for example) it is not
> virtualized by the net module while if it is used in a 
> SIOCG* ioctl call it must be virtualized.
> 
> The patch here attached changes the sequence of socket for opensock's
> attempts to:
> AF_NETLINK, AF_INET, AF_INET6, AF_UNIX, etc.
> 
> Netlink is more appropriate for ioctl calls like "SIOGCIFINDEX".
> I suppose AF_UNIX was chosen as its data structures are lighter than
> those of AF_INET(6). AF_NETLINK should be as light as AF_UNIX
> and it would make the life easier for partial virtualization.
> A sequence like AF_NETLINK, AF_UNIX, AF_INET, AF_INET6,...
> would fit anyway.
> 
> Looking around in other libC implementations
> (paths are relative to their source trees):
> 
> * NetBSD uses AF_INET see:
> http://bxr.su/DragonFly/lib/libc/net/if_nametoindex.c 
> 
> * dietlibc uses AF_INET6 see: lib/if_nametoindex.c
> 
> * ulibc: uses AF_INET6 or AF_INET if AF_INET6 fails:
>   see: libc/inet/opensock.c
> 
> * newlibc: uses AF_INET in libc/sys/linux/net/ifname.c
>   and a __opensock similar to the one in glibc in
> 	libc/sys/linux/net/ifreq.c
> 
> * eglibc seems to share the same opensock code of glibc.
> 
> Thank you
> 	renzo

I am in favour in using AF_NETLINK for getifaddrs, but not really due
the constraints you exposed from your application.  IMHO it is non
issue for glibc since this module seems to cross application and kernel
API boundaries (since using SIOCGIFINDEX with AF_INET is fully supported
on current Linux versions).

Also, all the current possible advantages of AF_NETLINK are not applicable
on current GLIBC usage: the kernel information using ioctl (and thus is 
synchronous), it does not use multicast, and it uses unnamed AF_UNIX (so no
credential leak when using a named file). The only advantage, which I 
am not sure if it really true, is some link advocates that AF_NETLINK
has less overhead than UDP sockets.

Anyway, this patch also has some serious issues you will need to check
out before possible inclusion:

  * It requires a proper ChangeLog entry [1]

  * You need to actually check if this changes does not generate any
    regression by at least running a make check and if is the case
    also providing a proper testcase.  I am seeing this failures with
    your patch:

    FAIL: inet/bug-if1
    FAIL: inet/test_ifindex
    FAIL: inet/tst-inet6_scopeid_pton

    So you will need to first check why these tests are failing and
    either fixing them with a proper rationale or adjust your patch.

  * IMHO the application constraint for this modification is not a
    strong one, as I said is cross application API boundaries and
    usually this is not a good reason to change internal glibc behaviour.
    I can't really see any real good reason to change it, but I will
    also won't oppose if the new change do not add regressions. So
    in future submissions you can say it helps your application to
    make easier to syscall handling, but also states that the change
    should not make any substantial advantage.

[1] https://sourceware.org/glibc/wiki/Contribution%20checklist#Properly_Formatted_GNU_ChangeLog


> 
> 2018-08-20 Renzo Davoli <renzo@cs.unibo.it>
> 
> diff --git a/sysdeps/unix/sysv/linux/opensock.c b/sysdeps/unix/sysv/linux/opensock.c
> index 21508cdc75..236c49a42f 100644
> --- a/sysdeps/unix/sysv/linux/opensock.c
> +++ b/sysdeps/unix/sysv/linux/opensock.c
> @@ -35,9 +35,10 @@ __opensock (void)
>      const char procname[15];
>    } afs[] =
>      {
> -      { AF_UNIX, "net/unix" },
> +      { AF_NETLINK, "net/netlink" },
>        { AF_INET, "" },
>        { AF_INET6, "net/if_inet6" },
> +      { AF_UNIX, "net/unix" },
>        { AF_AX25, "net/ax25" },
>        { AF_NETROM, "net/nr" },
>        { AF_ROSE, "net/rose" },
>

Patch
diff mbox

diff --git a/sysdeps/unix/sysv/linux/opensock.c b/sysdeps/unix/sysv/linux/opensock.c
index 21508cdc75..236c49a42f 100644
--- a/sysdeps/unix/sysv/linux/opensock.c
+++ b/sysdeps/unix/sysv/linux/opensock.c
@@ -35,9 +35,10 @@  __opensock (void)
     const char procname[15];
   } afs[] =
     {
-      { AF_UNIX, "net/unix" },
+      { AF_NETLINK, "net/netlink" },
       { AF_INET, "" },
       { AF_INET6, "net/if_inet6" },
+      { AF_UNIX, "net/unix" },
       { AF_AX25, "net/ax25" },
       { AF_NETROM, "net/nr" },
       { AF_ROSE, "net/rose" },