diff mbox

[v3,10/12] l2tp: Add L2TP ethernet pseudowire support

Message ID 20100330161819.9628.10853.stgit@bert.katalix.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

James Chapman March 30, 2010, 4:18 p.m. UTC
This driver presents a regular net_device for each L2TP ethernet
pseudowire instance. These interfaces are named l2tpethN by default,
though userspace can specify an alternative name when the L2TP
session is created, if preferred. When the pseudowire is established,
regular Linux networking utilities may be used to configure the
interface, i.e. give it IP address info or add it to a bridge. Any
data passed over the interface is carried over an L2TP tunnel.

Signed-off-by: James Chapman <jchapman@katalix.com>
Reviewed-by: Randy Dunlap <randy.dunlap@oracle.com>
---
 net/l2tp/Kconfig    |   24 +++
 net/l2tp/Makefile   |    1 
 net/l2tp/l2tp_eth.c |  371 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 396 insertions(+), 0 deletions(-)
 create mode 100644 net/l2tp/l2tp_eth.c


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

stephen hemminger March 30, 2010, 4:30 p.m. UTC | #1
On Tue, 30 Mar 2010 17:18:19 +0100
James Chapman <jchapman@katalix.com> wrote:

> static inline struct l2tp_eth_net *l2tp_eth_pernet(struct net *net)
> +{
> +	BUG_ON(!net);
> +
> +	return net_generic(net, l2tp_eth_net_id);
> +}

Don't test for NULL with BUG_ON. It is redundant, the null
dereference will cause the same kind of backtrace failure.
stephen hemminger March 30, 2010, 4:30 p.m. UTC | #2
On Tue, 30 Mar 2010 17:18:19 +0100
James Chapman <jchapman@katalix.com> wrote:

> +
> +	/* Derive a MAC address for the new interface. We use the L2TP
> +	 * session's session-id to guarantee a system-wide unique
> +	 * address. This MAC is only visible within the L2TP session.
> +	 */
> +	dev->dev_addr[0] = 0x02; /* IEEE 802 local */
> +	dev->dev_addr[1] = 'L';
> +	memcpy(&dev->dev_addr[2], &sid, 4);

Why not random ether address?
stephen hemminger March 30, 2010, 4:32 p.m. UTC | #3
On Tue, 30 Mar 2010 17:18:19 +0100
James Chapman <jchapman@katalix.com> wrote:

> +struct l2tp_eth_net {
> +	struct list_head l2tp_eth_dev_list;
> +	rwlock_t l2tp_eth_lock;
> +};

Reader/write locks are discouraged because they are slower than
spin locks.  If you have lots of readers use RCU, if reading
is infrequent just use a spin lock.
James Chapman March 31, 2010, 9:35 a.m. UTC | #4
Stephen Hemminger wrote:
> On Tue, 30 Mar 2010 17:18:19 +0100
> James Chapman <jchapman@katalix.com> wrote:
> 
>> +struct l2tp_eth_net {
>> +	struct list_head l2tp_eth_dev_list;
>> +	rwlock_t l2tp_eth_lock;
>> +};
> 
> Reader/write locks are discouraged because they are slower than
> spin locks.  If you have lots of readers use RCU, if reading
> is infrequent just use a spin lock.

Ok. In doing the conversion of the rwlocks in l2tp_core.c, I'm finding
that some list access primitives don't have rcu equivalents, namely
list_is_last(), list_for_each_entry_safe(). Is this intentional? Should
I add the missing ones in a separate patch?
Paul E. McKenney March 31, 2010, 5:38 p.m. UTC | #5
On Wed, Mar 31, 2010 at 10:35:29AM +0100, James Chapman wrote:
> Stephen Hemminger wrote:
> > On Tue, 30 Mar 2010 17:18:19 +0100
> > James Chapman <jchapman@katalix.com> wrote:
> > 
> >> +struct l2tp_eth_net {
> >> +	struct list_head l2tp_eth_dev_list;
> >> +	rwlock_t l2tp_eth_lock;
> >> +};
> > 
> > Reader/write locks are discouraged because they are slower than
> > spin locks.  If you have lots of readers use RCU, if reading
> > is infrequent just use a spin lock.
> 
> Ok. In doing the conversion of the rwlocks in l2tp_core.c, I'm finding
> that some list access primitives don't have rcu equivalents, namely
> list_is_last(), list_for_each_entry_safe(). Is this intentional? Should
> I add the missing ones in a separate patch?

The list_is_last() is RCU-safe already, since the ->next pointer is
only compared, never dereferenced.  I suggest adding a comment to its
header stating that it is RCU-safe.

Feel free to create a list_for_each_entry_safe_rcu(), and I will be
happy to review it.

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney April 2, 2010, 3:24 p.m. UTC | #6
On Wed, Mar 31, 2010 at 10:38:36AM -0700, Paul E. McKenney wrote:
> On Wed, Mar 31, 2010 at 10:35:29AM +0100, James Chapman wrote:
> > Stephen Hemminger wrote:
> > > On Tue, 30 Mar 2010 17:18:19 +0100
> > > James Chapman <jchapman@katalix.com> wrote:
> > > 
> > >> +struct l2tp_eth_net {
> > >> +	struct list_head l2tp_eth_dev_list;
> > >> +	rwlock_t l2tp_eth_lock;
> > >> +};
> > > 
> > > Reader/write locks are discouraged because they are slower than
> > > spin locks.  If you have lots of readers use RCU, if reading
> > > is infrequent just use a spin lock.
> > 
> > Ok. In doing the conversion of the rwlocks in l2tp_core.c, I'm finding
> > that some list access primitives don't have rcu equivalents, namely
> > list_is_last(), list_for_each_entry_safe(). Is this intentional? Should
> > I add the missing ones in a separate patch?
> 
> The list_is_last() is RCU-safe already, since the ->next pointer is
> only compared, never dereferenced.  I suggest adding a comment to its
> header stating that it is RCU-safe.
> 
> Feel free to create a list_for_each_entry_safe_rcu(), and I will be
> happy to review it.

Right.  When using RCU, list_for_each_entry_safe_rcu() is not needed,
because destruction of the element you removed doesn't happen until the
end of a subsequent grace period.  By which time you will be done with
your RCU read-side critical section.

So you can just use list_for_each_entry_rcu(), you do not need a new
list_for_each_entry_safe_rcu().

But you did have me going for a bit!!!  ;-)

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet April 2, 2010, 3:58 p.m. UTC | #7
Le vendredi 02 avril 2010 à 08:24 -0700, Paul E. McKenney a écrit :
> On Wed, Mar 31, 2010 at 10:38:36AM -0700, Paul E. McKenney wrote:

> > The list_is_last() is RCU-safe already, since the ->next pointer is
> > only compared, never dereferenced.  I suggest adding a comment to its
> > header stating that it is RCU-safe.
> > 
> > Feel free to create a list_for_each_entry_safe_rcu(), and I will be
> > happy to review it.
> 
> Right.  When using RCU, list_for_each_entry_safe_rcu() is not needed,
> because destruction of the element you removed doesn't happen until the
> end of a subsequent grace period.  By which time you will be done with
> your RCU read-side critical section.
> 
> So you can just use list_for_each_entry_rcu(), you do not need a new
> list_for_each_entry_safe_rcu().
> 
> But you did have me going for a bit!!!  ;-)

Just wondering Paul if you need to rest :)

list_for_each_entry_safe_rcu() is not needed not because of various (and
very intersting) reasons you gave !!!

But because list_for_each_entry_safe() is OK, since if we are in a
destroy phase, we are a writer, and can use regular list primitive.

Remember _rcu() versions of list iterators are for readers, and readers
are NOT supposed to delete an item from the list !

Eric



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul E. McKenney April 2, 2010, 4:21 p.m. UTC | #8
On Fri, Apr 02, 2010 at 05:58:51PM +0200, Eric Dumazet wrote:
> Le vendredi 02 avril 2010 à 08:24 -0700, Paul E. McKenney a écrit :
> > On Wed, Mar 31, 2010 at 10:38:36AM -0700, Paul E. McKenney wrote:
> 
> > > The list_is_last() is RCU-safe already, since the ->next pointer is
> > > only compared, never dereferenced.  I suggest adding a comment to its
> > > header stating that it is RCU-safe.
> > > 
> > > Feel free to create a list_for_each_entry_safe_rcu(), and I will be
> > > happy to review it.
> > 
> > Right.  When using RCU, list_for_each_entry_safe_rcu() is not needed,
> > because destruction of the element you removed doesn't happen until the
> > end of a subsequent grace period.  By which time you will be done with
> > your RCU read-side critical section.
> > 
> > So you can just use list_for_each_entry_rcu(), you do not need a new
> > list_for_each_entry_safe_rcu().
> > 
> > But you did have me going for a bit!!!  ;-)
> 
> Just wondering Paul if you need to rest :)

In this case, I most certainly needed to have engaged my brain more
fully before hitting "send".  :-/

> list_for_each_entry_safe_rcu() is not needed not because of various (and
> very intersting) reasons you gave !!!
> 
> But because list_for_each_entry_safe() is OK, since if we are in a
> destroy phase, we are a writer, and can use regular list primitive.
> 
> Remember _rcu() versions of list iterators are for readers, and readers
> are NOT supposed to delete an item from the list !

Yes, RCU readers had better have acquired the update-side lock before
modifying the list.  And then there would be some debate as to whether
they were still readers.  For example, while reading one list under RCU
protection, you might be modifying a subordinate list while holding the
corresponding lock.

And in the networking code, you would also need to convince the usual
suspects, including Eric, that upgrading to writer in the middle of an
RCU read-side critical section was the right thing to be doing in the
first place!

							Thanx, Paul
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/l2tp/Kconfig b/net/l2tp/Kconfig
index 0a11ccf..a292270 100644
--- a/net/l2tp/Kconfig
+++ b/net/l2tp/Kconfig
@@ -68,3 +68,27 @@  config L2TP_IP
 
 	  To compile this driver as a module, choose M here. The module
 	  will be called l2tp_ip.
+
+config L2TP_ETH
+	tristate "L2TP ethernet pseudowire support for L2TPv3"
+	depends on L2TP_V3
+	help
+	  Support for carrying raw ethernet frames over L2TPv3.
+
+	  From RFC 4719 <http://www.ietf.org/rfc/rfc4719.txt>.
+
+	  The Layer 2 Tunneling Protocol, Version 3 (L2TPv3) can be
+	  used as a control protocol and for data encapsulation to set
+	  up Pseudowires for transporting layer 2 Packet Data Units
+	  across an IP network [RFC3931].
+
+	  This driver provides an ethernet virtual interface for each
+	  L2TP ethernet pseudowire instance. Standard Linux tools may
+	  be used to assign an IP address to the local virtual
+	  interface, or add the interface to a bridge.
+
+	  If you are using L2TPv3, you will almost certainly want to
+	  enable this option.
+
+	  To compile this driver as a module, choose M here. The module
+	  will be called l2tp_eth.
diff --git a/net/l2tp/Makefile b/net/l2tp/Makefile
index 2c4a14b..bddbf04 100644
--- a/net/l2tp/Makefile
+++ b/net/l2tp/Makefile
@@ -8,3 +8,4 @@  obj-$(CONFIG_L2TP) += l2tp_core.o
 obj-$(subst y,$(CONFIG_L2TP),$(CONFIG_PPPOL2TP)) += l2tp_ppp.o
 obj-$(subst y,$(CONFIG_L2TP),$(CONFIG_L2TP_IP)) += l2tp_ip.o
 obj-$(subst y,$(CONFIG_L2TP),$(CONFIG_L2TP_V3)) += l2tp_netlink.o
+obj-$(subst y,$(CONFIG_L2TP),$(CONFIG_L2TP_ETH)) += l2tp_eth.o
diff --git a/net/l2tp/l2tp_eth.c b/net/l2tp/l2tp_eth.c
new file mode 100644
index 0000000..3c4f254
--- /dev/null
+++ b/net/l2tp/l2tp_eth.c
@@ -0,0 +1,371 @@ 
+/*
+ * L2TPv3 ethernet pseudowire driver
+ *
+ * Copyright (c) 2008,2009,2010 Katalix Systems Ltd
+ *
+ *	This program is free software; you can redistribute it and/or
+ *	modify it under the terms of the GNU General Public License
+ *	as published by the Free Software Foundation; either version
+ *	2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/skbuff.h>
+#include <linux/socket.h>
+#include <linux/hash.h>
+#include <linux/l2tp.h>
+#include <linux/in.h>
+#include <linux/etherdevice.h>
+#include <net/sock.h>
+#include <net/ip.h>
+#include <net/icmp.h>
+#include <net/udp.h>
+#include <net/inet_common.h>
+#include <net/inet_hashtables.h>
+#include <net/tcp_states.h>
+#include <net/protocol.h>
+#include <net/xfrm.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+
+#include "l2tp_core.h"
+
+/* Default device name. May be overridden by name specified by user */
+#define L2TP_ETH_DEV_NAME	"l2tpeth%d"
+
+/* via netdev_priv() */
+struct l2tp_eth {
+	struct net_device	*dev;
+	struct sock		*tunnel_sock;
+	struct l2tp_session	*session;
+	struct list_head	list;
+};
+
+/* via l2tp_session_priv() */
+struct l2tp_eth_sess {
+	struct net_device	*dev;
+};
+
+/* per-net private data for this module */
+static unsigned int l2tp_eth_net_id;
+struct l2tp_eth_net {
+	struct list_head l2tp_eth_dev_list;
+	rwlock_t l2tp_eth_lock;
+};
+
+static inline struct l2tp_eth_net *l2tp_eth_pernet(struct net *net)
+{
+	BUG_ON(!net);
+
+	return net_generic(net, l2tp_eth_net_id);
+}
+
+static int l2tp_eth_dev_init(struct net_device *dev)
+{
+	struct l2tp_eth *priv = netdev_priv(dev);
+	struct l2tp_session *session = priv->session;
+	__be32 sid = htonl(session->session_id);
+
+	priv->dev = dev;
+
+	/* Derive a MAC address for the new interface. We use the L2TP
+	 * session's session-id to guarantee a system-wide unique
+	 * address. This MAC is only visible within the L2TP session.
+	 */
+	dev->dev_addr[0] = 0x02; /* IEEE 802 local */
+	dev->dev_addr[1] = 'L';
+	memcpy(&dev->dev_addr[2], &sid, 4);
+	memset(&dev->broadcast[0], 0xff, 6);
+
+	return 0;
+}
+
+static void l2tp_eth_dev_uninit(struct net_device *dev)
+{
+	struct l2tp_eth *priv = netdev_priv(dev);
+	struct l2tp_eth_net *pn = l2tp_eth_pernet(dev_net(dev));
+
+	write_lock(&pn->l2tp_eth_lock);
+	list_del_init(&priv->list);
+	write_unlock(&pn->l2tp_eth_lock);
+	dev_put(dev);
+}
+
+static int l2tp_eth_dev_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+	struct l2tp_eth *priv = netdev_priv(dev);
+	struct l2tp_session *session = priv->session;
+
+	l2tp_xmit_skb(session, skb, session->hdr_len);
+
+	dev->stats.tx_bytes += skb->len;
+	dev->stats.tx_packets++;
+
+	return 0;
+}
+
+static struct net_device_ops l2tp_eth_netdev_ops = {
+	.ndo_init		= l2tp_eth_dev_init,
+	.ndo_uninit		= l2tp_eth_dev_uninit,
+	.ndo_start_xmit		= l2tp_eth_dev_xmit,
+};
+
+static void l2tp_eth_dev_setup(struct net_device *dev)
+{
+	ether_setup(dev);
+
+	dev->netdev_ops		= &l2tp_eth_netdev_ops;
+	dev->destructor		= free_netdev;
+}
+
+static void l2tp_eth_dev_recv(struct l2tp_session *session, struct sk_buff *skb, int data_len)
+{
+	struct l2tp_eth_sess *spriv = l2tp_session_priv(session);
+	struct net_device *dev = spriv->dev;
+
+	if (session->debug & L2TP_MSG_DATA) {
+		unsigned int length;
+		int offset;
+		u8 *ptr = skb->data;
+
+		length = min(32u, skb->len);
+		if (!pskb_may_pull(skb, length))
+			goto error;
+
+		printk(KERN_DEBUG "%s: eth recv: ", session->name);
+
+		offset = 0;
+		do {
+			printk(" %02X", ptr[offset]);
+		} while (++offset < length);
+
+		printk("\n");
+	}
+
+	if (data_len < ETH_HLEN)
+		goto error;
+
+	secpath_reset(skb);
+
+	/* checksums verified by L2TP */
+	skb->ip_summed = CHECKSUM_NONE;
+
+	skb_dst_drop(skb);
+	nf_reset(skb);
+
+	if (dev_forward_skb(dev, skb) == NET_RX_SUCCESS) {
+		dev->last_rx = jiffies;
+		dev->stats.rx_packets++;
+		dev->stats.rx_bytes += data_len;
+	} else
+		dev->stats.rx_errors++;
+
+	return;
+
+error:
+	dev->stats.rx_errors++;
+	kfree_skb(skb);
+}
+
+static void l2tp_eth_delete(struct l2tp_session *session)
+{
+	struct l2tp_eth_sess *spriv;
+	struct net_device *dev;
+
+	if (session) {
+		spriv = l2tp_session_priv(session);
+		dev = spriv->dev;
+		if (dev) {
+			unregister_netdev(dev);
+			spriv->dev = NULL;
+		}
+	}
+}
+
+#ifdef CONFIG_PROC_FS
+static void l2tp_eth_show(struct seq_file *m, void *arg)
+{
+	struct l2tp_session *session = arg;
+	struct l2tp_eth_sess *spriv = l2tp_session_priv(session);
+	struct net_device *dev = spriv->dev;
+
+	seq_printf(m, "   interface %s\n", dev->name);
+}
+#endif
+
+static int l2tp_eth_create(struct net *net, u32 tunnel_id, u32 session_id, u32 peer_session_id, struct l2tp_session_cfg *cfg)
+{
+	struct net_device *dev;
+	char name[IFNAMSIZ];
+	struct l2tp_tunnel *tunnel;
+	struct l2tp_session *session;
+	struct l2tp_eth *priv;
+	struct l2tp_eth_sess *spriv;
+	int rc;
+	struct l2tp_eth_net *pn;
+
+	tunnel = l2tp_tunnel_find(net, tunnel_id);
+	if (!tunnel) {
+		rc = -ENODEV;
+		goto out;
+	}
+
+	session = l2tp_session_find(net, tunnel, session_id);
+	if (session) {
+		rc = -EEXIST;
+		goto out;
+	}
+
+	if (cfg->ifname) {
+		dev = dev_get_by_name(net, cfg->ifname);
+		if (dev) {
+			dev_put(dev);
+			rc = -EEXIST;
+			goto out;
+		}
+		strlcpy(name, cfg->ifname, IFNAMSIZ);
+	} else
+		strcpy(name, L2TP_ETH_DEV_NAME);
+
+	session = l2tp_session_create(sizeof(*spriv), tunnel, session_id,
+				      peer_session_id, cfg);
+	if (!session) {
+		rc = -ENOMEM;
+		goto out;
+	}
+
+	dev = alloc_netdev(sizeof(*priv), name, l2tp_eth_dev_setup);
+	if (!dev) {
+		rc = -ENOMEM;
+		goto out_del_session;
+	}
+
+	dev_net_set(dev, net);
+	if (session->mtu == 0)
+		session->mtu = dev->mtu - session->hdr_len;
+	dev->mtu = session->mtu;
+	dev->needed_headroom += session->hdr_len;
+
+	priv = netdev_priv(dev);
+	priv->dev = dev;
+	priv->session = session;
+	INIT_LIST_HEAD(&priv->list);
+
+	priv->tunnel_sock = tunnel->sock;
+	session->recv_skb = l2tp_eth_dev_recv;
+	session->session_close = l2tp_eth_delete;
+#ifdef CONFIG_PROC_FS
+	session->show = l2tp_eth_show;
+#endif
+
+	spriv = l2tp_session_priv(session);
+	spriv->dev = dev;
+
+	rc = register_netdev(dev);
+	if (rc < 0)
+		goto out_del_dev;
+
+	/* Must be done after register_netdev() */
+	strlcpy(session->ifname, dev->name, IFNAMSIZ);
+
+	dev_hold(dev);
+	pn = l2tp_eth_pernet(dev_net(dev));
+	write_lock(&pn->l2tp_eth_lock);
+	list_add(&priv->list, &pn->l2tp_eth_dev_list);
+	write_unlock(&pn->l2tp_eth_lock);
+
+	return 0;
+
+out_del_dev:
+	free_netdev(dev);
+out_del_session:
+	l2tp_session_delete(session);
+out:
+	return rc;
+}
+
+static __net_init int l2tp_eth_init_net(struct net *net)
+{
+	struct l2tp_eth_net *pn;
+	int err;
+
+	pn = kzalloc(sizeof(*pn), GFP_KERNEL);
+	if (!pn)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&pn->l2tp_eth_dev_list);
+	rwlock_init(&pn->l2tp_eth_lock);
+
+	err = net_assign_generic(net, l2tp_eth_net_id, pn);
+	if (err)
+		goto out;
+
+	return 0;
+
+out:
+	kfree(pn);
+	return err;
+}
+
+static __net_exit void l2tp_eth_exit_net(struct net *net)
+{
+	struct l2tp_eth_net *pn;
+
+	pn = net_generic(net, l2tp_eth_net_id);
+	/*
+	 * if someone has cached our net then
+	 * further net_generic call will return NULL
+	 */
+	net_assign_generic(net, l2tp_eth_net_id, NULL);
+	kfree(pn);
+}
+
+static __net_initdata struct pernet_operations l2tp_eth_net_ops = {
+	.init = l2tp_eth_init_net,
+	.exit = l2tp_eth_exit_net,
+	.id   = &l2tp_eth_net_id,
+	.size = sizeof(struct l2tp_eth_net),
+};
+
+
+static const struct l2tp_nl_cmd_ops l2tp_eth_nl_cmd_ops = {
+	.session_create	= l2tp_eth_create,
+	.session_delete	= l2tp_session_delete,
+};
+
+
+static int __init l2tp_eth_init(void)
+{
+	int err = 0;
+
+	err = l2tp_nl_register_ops(L2TP_PWTYPE_ETH, &l2tp_eth_nl_cmd_ops);
+	if (err)
+		goto out;
+
+	err = register_pernet_device(&l2tp_eth_net_ops);
+	if (err)
+		goto out_unreg;
+
+	printk(KERN_INFO "L2TP ethernet pseudowire support (L2TPv3)\n");
+
+	return 0;
+
+out_unreg:
+	l2tp_nl_unregister_ops(L2TP_PWTYPE_ETH);
+out:
+	return err;
+}
+
+static void __exit l2tp_eth_exit(void)
+{
+	unregister_pernet_device(&l2tp_eth_net_ops);
+	l2tp_nl_unregister_ops(L2TP_PWTYPE_ETH);
+}
+
+module_init(l2tp_eth_init);
+module_exit(l2tp_eth_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("James Chapman <jchapman@katalix.com>");
+MODULE_DESCRIPTION("L2TP ethernet pseudowire driver");
+MODULE_VERSION("1.0");