diff mbox

[RFC,net-next] ipip: Add room for custom tunnel header

Message ID 1375969927-22235-1-git-send-email-kristian.evensen@gmail.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Kristian Evensen Aug. 8, 2013, 1:52 p.m. UTC
Hello,

A project I recently worked on required me to tunnel traffic between devices
placed in different locations. The aggregated, average traffic from each
location is ~150Mb/s, which saturated the router CPU when using a TUN-interface
for tunneling (IP in UDP). Actually, the router was not able to tunnel more than
~60Mbit/s. 

I therefore decided to look into the different in-kernel tunneling options. As
the traffic is encrypted by the devices, ip-in-ip tunneling would be sufficient.
However, each site is connected using a public internet connection and placed
behind a NAT, so pure ip-in-ip tunneling would not work.

This patch enables users to specify the size of a header that will be added
between the two IP-headers. The header can then be filled in by for example an
xtables-module (avoiding a memmove). In addition to solving my problem (I
inserted a UDP header to fool the NAT-boxes), this patch can be used to ease
development and deployment of new tunneling protocols. Instead of integrating
them in the kernel, protocols can be built on top of the existing IP-in-IP
tunnels and get the advantages of the ip_tunnels-framework.

The patch works as expected and provided me with a nice performance boost
(roughly x4). However, it currently has a problem. As I extend the ip_tunnel_parm
struct, applications like ip needs to be recompiled in order to work properly.
Is there a better way to pass the hlen-value to the kernel (user rtnl-ops and
add as parameter?), or to detect number of bytes waiting to be copied from user
space?

Also, is this something that would be considered useful and potentially added to
the kernel, or will it be viewed as a protocol hack? One other way I thought of
doing this, was to clone ipip.c and add a new tunneling type. However, that
seems a bit overkill for a five line change.

-Kristian

Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
---
 include/uapi/linux/if_tunnel.h |    1 +
 net/ipv4/ipip.c                |    5 ++++-
 2 files changed, 5 insertions(+), 1 deletion(-)

Comments

Eric Dumazet Aug. 8, 2013, 4 p.m. UTC | #1
On Thu, 2013-08-08 at 15:52 +0200, Kristian Evensen wrote:
> Hello,

...

> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com>
> ---
>  include/uapi/linux/if_tunnel.h |    1 +
>  net/ipv4/ipip.c                |    5 ++++-
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index aee73d0..039bbc3 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -35,6 +35,7 @@ struct ip_tunnel_parm {
>  	__be32			i_key;
>  	__be32			o_key;
>  	struct iphdr		iph;
> +	int                     hlen;
>  };
>  
>  enum {
> diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
> index 51fc2a1..9705aa1 100644
> --- a/net/ipv4/ipip.c
> +++ b/net/ipv4/ipip.c
> @@ -226,6 +226,9 @@ static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
>  		skb->encapsulation = 1;
>  	}
>  
> +	if (tunnel->hlen > 0)
> +		skb_push(skb, tunnel->hlen);


What happens if hlen is bigger than available headroom ?

hlen comes from userspace and there is no safety check, right ?

What guarantee do we have _something_ will fill the bytes ?
(We do not want to leak prior content of those bytes to the wire)



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kristian Evensen Aug. 16, 2013, 1:11 p.m. UTC | #2
Hi,

Thank you very much for your feedback and sorry for not replying
earlier. I modified the patch and implemented setting a custom header
length as a netlink message attribute instead
(http://patchwork.ozlabs.org/patch/266456/). It was initially rejected
on the basis that I needed to provide some use-cases, which I did, so
lets see what happens. However, your comments apply to both patches,
so it would be interesting to discuss this.

> What happens if hlen is bigger than available headroom ?
>
> hlen comes from userspace and there is no safety check, right ?

Based on my understanding of the code, hlen will not be bigger than
available headroom. dev->needed_headroom is set to the tunnel header
length in ip_tunnel_bind_dev(). hlen is only read when adding a new
link, so this assignment cannot be skipped. However, I see that a
check along the lines of "hlen > 0 && hlen < (ETH_DATA_LEN -
LL_MAX_HDR - sizeof(iphdr)" should be added.

One potential side-effect is that one can overflow the mtu by
providing a large hlen. Will that be considered a security risk or
misconfiguration? Since you have to have the same rights to create a
tunnel as to create a link, a similar behavior can be accomplished by
creating a new link with a large mtu.

> What guarantee do we have _something_ will fill the bytes ?
> (We do not want to leak prior content of those bytes to the wire)

In the current patch, none. I assumed this was ok as inserting the
custom header does not make sense without another module filling in
the content. I do agree though, a memset should be added to zero out
the custom header.

Thanks again,
Kristian
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
index aee73d0..039bbc3 100644
--- a/include/uapi/linux/if_tunnel.h
+++ b/include/uapi/linux/if_tunnel.h
@@ -35,6 +35,7 @@  struct ip_tunnel_parm {
 	__be32			i_key;
 	__be32			o_key;
 	struct iphdr		iph;
+	int                     hlen;
 };
 
 enum {
diff --git a/net/ipv4/ipip.c b/net/ipv4/ipip.c
index 51fc2a1..9705aa1 100644
--- a/net/ipv4/ipip.c
+++ b/net/ipv4/ipip.c
@@ -226,6 +226,9 @@  static netdev_tx_t ipip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
 		skb->encapsulation = 1;
 	}
 
+	if (tunnel->hlen > 0)
+		skb_push(skb, tunnel->hlen);
+
 	ip_tunnel_xmit(skb, dev, tiph, tiph->protocol);
 	return NETDEV_TX_OK;
 
@@ -302,7 +305,7 @@  static int ipip_tunnel_init(struct net_device *dev)
 	memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
 	memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
 
-	tunnel->hlen = 0;
+	tunnel->hlen = tunnel->parms.hlen;
 	tunnel->parms.iph.protocol = IPPROTO_IPIP;
 	return ip_tunnel_init(dev);
 }