diff mbox

Network namespace bugs in L2TP

Message ID 20121212155105.GB2790@raven
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Tom Parkin Dec. 12, 2012, 3:51 p.m. UTC
Hi Eric,

I'm following up on this thread from later October in which you
pointed out some network namespace bugs in L2TP:

http://www.spinics.net/lists/netdev/msg214776.html

I use L2TP, and I'd like to help fix these bugs.  But I'm not very
conversant with network namespaces, and so I'm struggling to fully
appreciate the issues you pointed out previously.  Could you give me a
hand getting to grips with this?

So far I've tested L2TP within network namespaces, using both iproute2
to create sessions between two namespaces on the same host, and an
L2TP daemon running in a namespace to create sessions between two
hosts.  In both cases I've done a bit of trivial ping and iperf
testing using Ethernet pseudowires.

To make this work I've had to add a couple of trivial patches (see
below).

There are two things I'm uncertain about:

 1. Why do we need to change the namespace of the socket created in
    l2tp_tunnel_sock_create?  So far as I can tell, sock_create
    defaults to the namespace of the calling process.  Is the issue
    here that this code may run from a work queue or similar?

 2. You mentioned the need to keep track of sockets allocated within a
    namespace in order to be able to clean them up when the namespace
    is deleted.  Should we be keeping a list of sockets we create and
    then destroying them in the namespace pernet_ops exit function?

Thanks,
Tom

From b9c095fdf32c895b79a5954020c4745fe5518141 Mon Sep 17 00:00:00 2001
From: Tom Parkin <tparkin@katalix.com>
Date: Tue, 11 Dec 2012 13:03:48 +0000
Subject: [PATCH 1/2] l2tp: set netnsok flag for netlink messages

The L2TP netlink code can run in namespaces.  Set the netnsok flag in
genl_family to true to reflect that fact.
---
 net/l2tp/l2tp_netlink.c |    1 +
 1 file changed, 1 insertion(+)

Comments

Eric W. Biederman Dec. 12, 2012, 7:44 p.m. UTC | #1
Tom Parkin <tparkin@katalix.com> writes:

> Hi Eric,
>
> I'm following up on this thread from later October in which you
> pointed out some network namespace bugs in L2TP:
>
> http://www.spinics.net/lists/netdev/msg214776.html
>
> I use L2TP, and I'd like to help fix these bugs.  But I'm not very
> conversant with network namespaces, and so I'm struggling to fully
> appreciate the issues you pointed out previously.  Could you give me a
> hand getting to grips with this?
>
> So far I've tested L2TP within network namespaces, using both iproute2
> to create sessions between two namespaces on the same host, and an
> L2TP daemon running in a namespace to create sessions between two
> hosts.  In both cases I've done a bit of trivial ping and iperf
> testing using Ethernet pseudowires.
>
> To make this work I've had to add a couple of trivial patches (see
> below).
>
> There are two things I'm uncertain about:
>
>  1. Why do we need to change the namespace of the socket created in
>     l2tp_tunnel_sock_create?  So far as I can tell, sock_create
>     defaults to the namespace of the calling process.  Is the issue
>     here that this code may run from a work queue or similar?

Something similar.  At the very least l2tp_tunnel_create which calls
l2tp_tunnel_sock_create gets called from netlink.  The network namespace
of a socket is not necessarily the same as the network namespace of the
process that uses that socket.

So since current is not necessarily the right network namespace we need
push the desired network namespace of the socket down into
l2tp_tunnel_sock_create and use that when creating the socket.

>  2. You mentioned the need to keep track of sockets allocated within a
>     namespace in order to be able to clean them up when the namespace
>     is deleted.  Should we be keeping a list of sockets we create and
>     then destroying them in the namespace pernet_ops exit function?

I think the issue that I was referring to and certainly the issue I am
thinking about is the issue where normal sockets hold a reference to a
network namespace and keep the network namespace alive.  Today l2tp uses
sock_create when creating a socket, and as such I think it pins it
current network namespace.  So I believe we can effectively have a
reference counting loop with l2tp sockets pinning the network namespace
and the network namespace keeping the l2tp device alive which keeps the
l2tp socket alive.

I don't remeber the specifics of l2tp as it creates some sockets, and
has other sockets passed in, and as such has rules that are not at all
normal.

Eric
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Parkin Dec. 13, 2012, 4:56 p.m. UTC | #2
On Wed, Dec 12, 2012 at 11:44:36AM -0800, Eric W. Biederman wrote:
> Tom Parkin <tparkin@katalix.com> writes:
> >  1. Why do we need to change the namespace of the socket created in
> >     l2tp_tunnel_sock_create?  So far as I can tell, sock_create
> >     defaults to the namespace of the calling process.  Is the issue
> >     here that this code may run from a work queue or similar?
> 
> Something similar.  At the very least l2tp_tunnel_create which calls
> l2tp_tunnel_sock_create gets called from netlink.  The network namespace
> of a socket is not necessarily the same as the network namespace of the
> process that uses that socket.
> 
> So since current is not necessarily the right network namespace we need
> push the desired network namespace of the socket down into
> l2tp_tunnel_sock_create and use that when creating the socket.

Ah, I see.  I hadn't appreciated that a process might swap between
namespaces.

I think that raises a question in the case of the L2TP tunnel sockets,
though.  Currently l2tp_tunnel_sock_create uses the namespace of the
current process for the socket.  The alternative is to pass in the
desired namespace from l2tp_tunnel_create -- and this makes sense, I
think.

However, when l2tp_tunnel_create is called from the netlink code, the
namespace passed is that of the netlink socket.  At the risk of sounding
silly, what's the benefit of using the netlink socket namespace over the
process namespace in this case?

> >  2. You mentioned the need to keep track of sockets allocated within a
> >     namespace in order to be able to clean them up when the namespace
> >     is deleted.  Should we be keeping a list of sockets we create and
> >     then destroying them in the namespace pernet_ops exit function?
> 
> I think the issue that I was referring to and certainly the issue I am
> thinking about is the issue where normal sockets hold a reference to a
> network namespace and keep the network namespace alive.  Today l2tp uses
> sock_create when creating a socket, and as such I think it pins it
> current network namespace.  So I believe we can effectively have a
> reference counting loop with l2tp sockets pinning the network namespace
> and the network namespace keeping the l2tp device alive which keeps the
> l2tp socket alive.

OK, so presumably the way this would usually work is that a process
creates sockets, and when the process exits those sockets go away.
When all the processes in the namespace have exited, the namespace
can close because there are no sockets holding it open.  Is that
right?

If that's correct, then I suppose the issue with the L2TP tunnel socket
for an unmanaged tunnel is that it isn't owned by a process, per-se.
So there's no obvious way to get rid of it, apart from sending a
netlink message to tell the kernel to tear it down.

But that doesn't seem too unreasonable.  A user would have to take
explicit action to create an L2TP tunnel socket, and it might seem
reasonable for that socket to keep the namespace alive until the user
explicitly tears it down again.

> I don't remeber the specifics of l2tp as it creates some sockets, and
> has other sockets passed in, and as such has rules that are not at all
> normal.

Ack.  Sockets are created in the kernel code for "unmanaged" tunnels,
which don't run the control protocol over the top -- they're just for
data encapsulation/de-encapsulation.  "Managed" tunnels have a
userspace process looking after all the L2TP configuration and
control/keepalive protocol, and in this case the daemon handles the
creation of the tunnel socket.

Thanks,
Tom
Eric W. Biederman Dec. 13, 2012, 7:31 p.m. UTC | #3
Tom Parkin <tparkin@katalix.com> writes:

> On Wed, Dec 12, 2012 at 11:44:36AM -0800, Eric W. Biederman wrote:
>> Tom Parkin <tparkin@katalix.com> writes:
>> >  1. Why do we need to change the namespace of the socket created in
>> >     l2tp_tunnel_sock_create?  So far as I can tell, sock_create
>> >     defaults to the namespace of the calling process.  Is the issue
>> >     here that this code may run from a work queue or similar?
>> 
>> Something similar.  At the very least l2tp_tunnel_create which calls
>> l2tp_tunnel_sock_create gets called from netlink.  The network namespace
>> of a socket is not necessarily the same as the network namespace of the
>> process that uses that socket.
>> 
>> So since current is not necessarily the right network namespace we need
>> push the desired network namespace of the socket down into
>> l2tp_tunnel_sock_create and use that when creating the socket.
>
> Ah, I see.  I hadn't appreciated that a process might swap between
> namespaces.
>
> I think that raises a question in the case of the L2TP tunnel sockets,
> though.  Currently l2tp_tunnel_sock_create uses the namespace of the
> current process for the socket.  The alternative is to pass in the
> desired namespace from l2tp_tunnel_create -- and this makes sense, I
> think.
>
> However, when l2tp_tunnel_create is called from the netlink code, the
> namespace passed is that of the netlink socket.  At the risk of sounding
> silly, what's the benefit of using the netlink socket namespace over the
> process namespace in this case?

Using the netlink socket namespace ensure that if the netlink socket is
passed between processes the semantics of sending messages down the
netlink socket don't change.

There is another thread on netdev discussing another variant of this
right now.  For some cases it is just a waste of resources to have one
copy of a daemon per network namespace.  In which case a controlling
daemon will open one netlink socket per network namespace and send
commands down the appropriate socket for the network namespace the
daemon wishes to control.

>> >  2. You mentioned the need to keep track of sockets allocated within a
>> >     namespace in order to be able to clean them up when the namespace
>> >     is deleted.  Should we be keeping a list of sockets we create and
>> >     then destroying them in the namespace pernet_ops exit function?
>> 
>> I think the issue that I was referring to and certainly the issue I am
>> thinking about is the issue where normal sockets hold a reference to a
>> network namespace and keep the network namespace alive.  Today l2tp uses
>> sock_create when creating a socket, and as such I think it pins it
>> current network namespace.  So I believe we can effectively have a
>> reference counting loop with l2tp sockets pinning the network namespace
>> and the network namespace keeping the l2tp device alive which keeps the
>> l2tp socket alive.
>
> OK, so presumably the way this would usually work is that a process
> creates sockets, and when the process exits those sockets go away.
> When all the processes in the namespace have exited, the namespace
> can close because there are no sockets holding it open.  Is that
> right?
>
> If that's correct, then I suppose the issue with the L2TP tunnel socket
> for an unmanaged tunnel is that it isn't owned by a process, per-se.
> So there's no obvious way to get rid of it, apart from sending a
> netlink message to tell the kernel to tear it down.
>
> But that doesn't seem too unreasonable.  A user would have to take
> explicit action to create an L2TP tunnel socket, and it might seem
> reasonable for that socket to keep the namespace alive until the user
> explicitly tears it down again.

Sending a netlink message to tear down the socket is not unreasonable.

Having a reference counting loop such that it is possible to close all
other sockets and all other references to a network namespace and not
have the network namespace go away because the L2TP tunnel socket holds
a reference to the unreachable and unuusable network namespace is
unreasonable.

We handle this with arp and icmp control sockets by not creating a
reference count.  And having a pernet cleanup routing clean up those
sockets.  Assuming I am right about the reference counting loop being
possible this is something to look at.

>> I don't remeber the specifics of l2tp as it creates some sockets, and
>> has other sockets passed in, and as such has rules that are not at all
>> normal.
>
> Ack.  Sockets are created in the kernel code for "unmanaged" tunnels,
> which don't run the control protocol over the top -- they're just for
> data encapsulation/de-encapsulation.  "Managed" tunnels have a
> userspace process looking after all the L2TP configuration and
> control/keepalive protocol, and in this case the daemon handles the
> creation of the tunnel socket.

Eric

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Parkin Dec. 20, 2012, 1:52 p.m. UTC | #4
Hi Eric,

On Thu, Dec 13, 2012 at 11:31:12AM -0800, Eric W. Biederman wrote:
> Tom Parkin <tparkin@katalix.com> writes:
> 
> > On Wed, Dec 12, 2012 at 11:44:36AM -0800, Eric W. Biederman wrote:
> >> Tom Parkin <tparkin@katalix.com> writes:
> > I think that raises a question in the case of the L2TP tunnel sockets,
> > though.  Currently l2tp_tunnel_sock_create uses the namespace of the
> > current process for the socket.  The alternative is to pass in the
> > desired namespace from l2tp_tunnel_create -- and this makes sense, I
> > think.
> >
> > However, when l2tp_tunnel_create is called from the netlink code, the
> > namespace passed is that of the netlink socket.  At the risk of sounding
> > silly, what's the benefit of using the netlink socket namespace over the
> > process namespace in this case?
> 
> Using the netlink socket namespace ensure that if the netlink socket is
> passed between processes the semantics of sending messages down the
> netlink socket don't change.
> 
> There is another thread on netdev discussing another variant of this
> right now.  For some cases it is just a waste of resources to have one
> copy of a daemon per network namespace.  In which case a controlling
> daemon will open one netlink socket per network namespace and send
> commands down the appropriate socket for the network namespace the
> daemon wishes to control.

Yes, I saw that other thread.  Thanks for the clarification on this
point.

> > But that doesn't seem too unreasonable.  A user would have to take
> > explicit action to create an L2TP tunnel socket, and it might seem
> > reasonable for that socket to keep the namespace alive until the user
> > explicitly tears it down again.
> 
> Sending a netlink message to tear down the socket is not unreasonable.
> 
> Having a reference counting loop such that it is possible to close all
> other sockets and all other references to a network namespace and not
> have the network namespace go away because the L2TP tunnel socket holds
> a reference to the unreachable and unuusable network namespace is
> unreasonable.
> 
> We handle this with arp and icmp control sockets by not creating a
> reference count.  And having a pernet cleanup routing clean up those
> sockets.  Assuming I am right about the reference counting loop being
> possible this is something to look at.

Yep, OK.  I hadn't appreciated the namespace could become inaccessible!

I've done some digging and I believe there is an issue with the
reference counting for the unmanaged tunnel sockets -- certainly I am
able to leak netns resources here.

I've been working on a patchset which I hope will address these issues
in l2tp_core.  I'm stress testing it now and hope to post to netdev
soon for review.

Thanks again for your help.

Tom
diff mbox

Patch

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index bbba3a1..c1bab22 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -37,6 +37,7 @@  static struct genl_family l2tp_nl_family = {
 	.version	= L2TP_GENL_VERSION,
 	.hdrsize	= 0,
 	.maxattr	= L2TP_ATTR_MAX,
+	.netnsok	= true,
 };
 
 /* Accessed under genl lock */
-- 
1.7.9.5

From 13e9b0ddc48a16b384ffbf5ff64e6413cfa612f5 Mon Sep 17 00:00:00 2001
From: Tom Parkin <tparkin@katalix.com>
Date: Wed, 12 Dec 2012 12:50:54 +0000
Subject: [PATCH 2/2] l2tp: prevent tunnel creation on netns mismatch

l2tp_tunnel_create is passed a pointer to the network namespace for the
tunnel, along with an optional file descriptor for the tunnel which may
be passed in from userspace via. netlink.

In the case where the file descriptor is defined, ensure that the namespace
associated with that socket matches the namespace explicitly passed to
l2tp_tunnel_create.
---
 net/l2tp/l2tp_core.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 1a9f372..f8d200b 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1528,6 +1528,13 @@  int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id, u32
 			       tunnel_id, fd, err);
 			goto err;
 		}
+
+		/* Reject namespace mismatches */
+		if (!net_eq(sock_net(sock->sk), net)) {
+			pr_err("tunl %hu: netns mismatch\n", tunnel_id);
+			err = -EBADF; /* TODO -- what value? */
+			goto err;
+		}
 	}
 
 	sk = sock->sk;