[v2,net-next,2/2] sit: add support of x-netns

Message ID 1372170295-4717-3-git-send-email-nicolas.dichtel@6wind.com
State Rejected, archived
Delegated to: David Miller

Commit Message

Nicolas Dichtel June 25, 2013, 2:24 p.m. UTC
This patch allows switching the netns when a packet is encapsulated or
decapsulated. In other words, the encapsulated packet is received in one netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injected into the corresponding interface, which
belongs to another netns.

When one of the two netns is removed, the tunnel is destroyed.
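The split described above can be pictured with a toy userspace model. All names are hypothetical and this is not kernel code: the tunnel keeps the namespace it was created in (the role of the new `tunnel->net` field in the patch) and uses it for outer route lookups, while the decapsulated packet is handed to the device in whatever namespace the device now lives in. The addresses reuse the sit configuration example from the cover letter in this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model, not kernel types: each namespace has a single
 * "route", i.e. the only outer destination it can reach. */
struct netns_model {
	const char *name;
	const char *reachable_dst;
};

struct tunnel_model {
	const struct netns_model *link_net; /* role of tunnel->net */
	const struct netns_model *dev_net;  /* role of dev_net(dev) */
	const char *remote;                 /* outer destination address */
};

/* Stand-in for a route lookup performed inside one namespace. */
static int route_ok(const struct netns_model *ns, const char *dst)
{
	return strcmp(ns->reachable_dst, dst) == 0;
}

/* Encapsulation: the outer packet is routed in the link netns, as the
 * patch does by passing tunnel->net instead of dev_net(dev). */
int tunnel_xmit_model(const struct tunnel_model *t)
{
	return route_ok(t->link_net, t->remote);
}

/* Decapsulation: the tunnel lookup happens where the outer packet
 * arrived; the inner packet is then delivered in the device's netns. */
const struct netns_model *tunnel_rcv_model(const struct tunnel_model *t,
					   const struct netns_model *arrived_in)
{
	assert(arrived_in == t->link_net);
	return t->dev_net;
}
```

Routing in `dev_net(dev)` instead (the pre-patch behavior) would look for the outer destination in netns1, which in this model cannot reach it.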

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/ip_tunnels.h |  1 +
 net/ipv4/ip_tunnel.c     |  6 +++++-
 net/ipv6/sit.c           | 40 ++++++++++++++++++++++++++++++----------
 3 files changed, 36 insertions(+), 11 deletions(-)

Comments

David Miller June 25, 2013, 11:56 p.m. UTC | #1
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Tue, 25 Jun 2013 16:24:55 +0200

> @@ -453,6 +454,8 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
>  	tstats->rx_bytes += skb->len;
>  	u64_stats_update_end(&tstats->syncp);
>  
> +	skb_scrub_packet(skb);
> +
>  	if (tunnel->dev->type == ARPHRD_ETHER) {
>  		skb->protocol = eth_type_trans(skb, tunnel->dev);
>  		skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);

I can't see how this can be ok.

If something in netfilter depends upon the state you are clearing out
here, someone's packet filtering setup is going to break.

I'm not applying these patches, sorry.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric W. Biederman June 26, 2013, 1:35 a.m. UTC | #2
David Miller <davem@davemloft.net> writes:

> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Tue, 25 Jun 2013 16:24:55 +0200
>
>> @@ -453,6 +454,8 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
>>  	tstats->rx_bytes += skb->len;
>>  	u64_stats_update_end(&tstats->syncp);
>>  
>> +	skb_scrub_packet(skb);
>> +
>>  	if (tunnel->dev->type == ARPHRD_ETHER) {
>>  		skb->protocol = eth_type_trans(skb, tunnel->dev);
>>  		skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
>
> I can't see how this can be ok.
>
> If something in netfilter depends upon the state you are clearing out
> here, someone's packet filtering setup is going to break.
>
> I'm not applying these patches, sorry.

How can netfilter depend on the state of a packet inside of a tunnel?

How can it even make sense?

Or is your concern that we unintentionally allowed this in the past so
to avoid breaking binary compatibility we should continue in case
someone somewhere cares?

I really can't see how this could possibly be an intentional feature.

Eric



--
David Miller June 26, 2013, 5:48 a.m. UTC | #3
From: ebiederm@xmission.com (Eric W. Biederman)
Date: Tue, 25 Jun 2013 18:35:30 -0700

> David Miller <davem@davemloft.net> writes:
> 
>> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>> Date: Tue, 25 Jun 2013 16:24:55 +0200
>>
>>> @@ -453,6 +454,8 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
>>>  	tstats->rx_bytes += skb->len;
>>>  	u64_stats_update_end(&tstats->syncp);
>>>  
>>> +	skb_scrub_packet(skb);
>>> +
>>>  	if (tunnel->dev->type == ARPHRD_ETHER) {
>>>  		skb->protocol = eth_type_trans(skb, tunnel->dev);
>>>  		skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
>>
>> I can't see how this can be ok.
>>
>> If something in netfilter depends upon the state you are clearing out
>> here, someone's packet filtering setup is going to break.
>>
>> I'm not applying these patches, sorry.
> 
> How can netfilter depend on the state of a packet inside of a tunnel?
> 
> How can it even make sense?
> 
> Or is your concern that we unintentionally allowed this in the past so
> to avoid breaking binary compatibility we should continue in case
> someone somewhere cares?
> 
> I really can't see how this could possibly be an intentional feature.

You can make all of these issues go away by only clearing the SKB
meta state when namespaces are actually changing as we go through
the tunnel.
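A minimal userspace sketch of the conditional form suggested here, assuming the namespace test is a plain pointer comparison (which is what the kernel's `net_eq()`, mentioned later in this thread, reduces to when `CONFIG_NET_NS` is enabled). The struct and function names are hypothetical, and only a mark and a netfilter flag stand in for the full skb metadata.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's struct net or struct sk_buff. */
struct net_model { int id; };

struct skb_meta_model {
	unsigned int mark; /* like skb->mark */
	bool nf_state;     /* like an attached conntrack entry */
};

/* With CONFIG_NET_NS, net_eq() compares the two pointers. */
static bool net_eq_model(const struct net_model *a, const struct net_model *b)
{
	return a == b;
}

/* Decapsulation path: clear metadata only when the packet moves from the
 * tunnel's link netns into a different netns that owns the device. */
void tunnel_rcv_scrub_model(const struct net_model *link_net,
			    const struct net_model *dev_net,
			    struct skb_meta_model *skb)
{
	if (!net_eq_model(link_net, dev_net)) {
		skb->mark = 0;
		skb->nf_state = false;
	}
}
```

With this form, a tunnel whose two ends stay in one namespace keeps its existing behavior, so existing single-netns filtering setups are unaffected.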
--
Eric W. Biederman June 26, 2013, 10:03 a.m. UTC | #4
David Miller <davem@davemloft.net> writes:

> From: ebiederm@xmission.com (Eric W. Biederman)
> Date: Tue, 25 Jun 2013 18:35:30 -0700
>
>> David Miller <davem@davemloft.net> writes:
>> 
>>> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
>>> Date: Tue, 25 Jun 2013 16:24:55 +0200
>>>
>>>> @@ -453,6 +454,8 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
>>>>  	tstats->rx_bytes += skb->len;
>>>>  	u64_stats_update_end(&tstats->syncp);
>>>>  
>>>> +	skb_scrub_packet(skb);
>>>> +
>>>>  	if (tunnel->dev->type == ARPHRD_ETHER) {
>>>>  		skb->protocol = eth_type_trans(skb, tunnel->dev);
>>>>  		skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
>>>
>>> I can't see how this can be ok.
>>>
>>> If something in netfilter depends upon the state you are clearing out
>>> here, someone's packet filtering setup is going to break.
>>>
>>> I'm not applying these patches, sorry.
>> 
>> How can netfilter depend on the state of a packet inside of a tunnel?
>> 
>> How can it even make sense?
>> 
>> Or is your concern that we unintentionally allowed this in the past so
>> to avoid breaking binary compatibility we should continue in case
>> someone somewhere cares?
>> 
>> I really can't see how this could possibly be an intentional feature.
>
> You can make all of these issues go away by only clearing the SKB
> meta state when namespaces are actually changing as we go through
> the tunnel.

I have spent some time thinking about the cases where I have had an
opportunity to use the marks on packets, and it turns out that if I had
been using a tunnel with any of those configurations, leaving the marks
on would have either broken my configuration or, at the very least,
required me to make certain I changed those marks.

So I really think this is a bug fix for a long-standing bug in a rare
corner case of kernel behavior that people just haven't noticed, which
is why I suggested to Nicolas Dichtel that he remove the test to see if
we were changing network namespaces before scrubbing the packet.

That said, I won't object if Nicolas Dichtel resends his patches with
that test put back in. I just think it is silly, and when someone
finally gets bitten by the bug and complains we will have to go through
and remove the test.

Eric
--
Eric Dumazet June 26, 2013, 10:22 a.m. UTC | #5
On Wed, 2013-06-26 at 03:03 -0700, Eric W. Biederman wrote:


> That said, I won't object if Nicolas Dichtel resends his patches with
> that test put back in. I just think it is silly, and when someone
> finally gets bitten by the bug and complains we will have to go through
> and remove the test.

Well, what is the reason skb_orphan() must be called in a tunnel xmit
path?

This patch changes more things than what is advertised in the changelog :(
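One plausible reading of the skb_orphan() concern, shown as a hedged userspace model with hypothetical names. The real `skb_orphan()` runs `skb->destructor` and clears `skb->sk`, and the usual write destructor returns the packet's truesize to the sender's send-buffer accounting. Orphaning in the tunnel xmit path therefore hands the sender its buffer space back before the packet has actually left, so the sending socket's flow control no longer counts the bytes still queued below the tunnel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of send-buffer accounting; the field names echo
 * sk_wmem_alloc / skb->destructor but these are not the kernel types. */
struct sock_model {
	unsigned int wmem_alloc; /* bytes currently charged to the socket */
	unsigned int sndbuf;     /* send-buffer limit */
};

struct skb_model {
	struct sock_model *sk;
	unsigned int truesize;
	void (*destructor)(struct skb_model *);
};

/* Simplified write destructor: uncharge the packet from its sender. */
static void wfree_model(struct skb_model *skb)
{
	skb->sk->wmem_alloc -= skb->truesize;
}

/* Charge a packet to its sending socket. */
void skb_set_owner_w_model(struct skb_model *skb, struct sock_model *sk)
{
	skb->sk = sk;
	skb->destructor = wfree_model;
	sk->wmem_alloc += skb->truesize;
}

/* Like skb_orphan(): run the destructor now and detach the socket. */
void skb_orphan_model(struct skb_model *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}

/* Would another packet of size `next` fit in the send buffer? */
int sock_writable_model(const struct sock_model *sk, unsigned int next)
{
	return sk->wmem_alloc + next <= sk->sndbuf;
}
```

In this model the sender is blocked while the packet is charged, and becomes writable again as soon as the packet is orphaned, whether or not it has been transmitted.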



--
Nicolas Dichtel June 26, 2013, 12:15 p.m. UTC | #6
On 26/06/2013 12:22, Eric Dumazet wrote:
> On Wed, 2013-06-26 at 03:03 -0700, Eric W. Biederman wrote:
>
>
>> That said, I won't object if Nicolas Dichtel resends his patches with
>> that test put back in. I just think it is silly, and when someone
>> finally gets bitten by the bug and complains we will have to go through
>> and remove the test.
>
> Well, what is the reason skb_orphan() must be called in a tunnel xmit
> path?
>
> This patch changes more things than what is advertised in the changelog :(

In fact, this is true. If we eventually find that skb_scrub_packet() is needed in
all cases (not only when changing namespace), that will be another patch.

I will resend the series with the test put back.
--
Nicolas Dichtel June 26, 2013, 1:49 p.m. UTC | #7
On 26/06/2013 01:56, David Miller wrote:
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Tue, 25 Jun 2013 16:24:55 +0200
>
>> @@ -453,6 +454,8 @@ int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
>>   	tstats->rx_bytes += skb->len;
>>   	u64_stats_update_end(&tstats->syncp);
>>
>> +	skb_scrub_packet(skb);
>> +
>>   	if (tunnel->dev->type == ARPHRD_ETHER) {
>>   		skb->protocol = eth_type_trans(skb, tunnel->dev);
>>   		skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
>
> I can't see how this can be ok.
>
> If something in netfilter depends upon the state you are clearing out
> here, someone's packet filtering setup is going to break.
Just for the record, note that nf_reset() is already called in
iptunnel_pull_header() and iptunnel_xmit().
Hence 4in4 (ipip and sit) and gre tunnels already reset netfilter state.
6in4 (sit) does it only in the xmit path, and Xin6 (ip6_tunnel) never does.

We can also note that nf_reset() was added to the rx path by commit 3d7b46cd20e3
"ip_tunnel: push generic protocol handling to ip_tunnel module." (net-next only).

The nf_reset() in the xmit path of 4in4 (ipip) has been there for years (since at least 2.6.12).
For gre, it was added by c54419321455 "GRE: Refactor GRE tunneling code."
(v3.10-rc1).

It seems that the behavior differs depending on the type of tunnel. If we
leave out skb_orphan() (and maybe some other step, to be done only when changing
namespace), it would be good to have a common function so that every tunnel
behaves the same way.

Maybe something like:
void skb_scrub_packet(struct sk_buff *skb, bool netnschange)
{
	/* Drop socket ownership only when the packet changes netns. */
	if (netnschange)
		skb_orphan(skb);
	skb->tstamp.tv64 = 0;
	skb->pkt_type = PACKET_HOST;
	skb->skb_iif = 0;
	skb_dst_drop(skb);
	skb->mark = 0;
	secpath_reset(skb);
	nf_reset(skb);
	nf_reset_trace(skb);
}

What's your opinion?
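To make the proposed helper concrete, here is a userspace model of its effect. `struct skb_model` is hypothetical, with one field per statement of the proposal; it is not the kernel's `struct sk_buff`. Socket ownership is dropped only when `netnschange` is true, while the timestamp, packet type, input interface, route, mark, IPsec path and netfilter state are cleared in both cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as PACKET_HOST in <linux/if_packet.h>. */
#define PKT_HOST_MODEL 0

/* Hypothetical stand-in for the sk_buff fields the proposal touches. */
struct skb_model {
	void *sk;          /* owning socket; NULL once orphaned */
	long long tstamp;  /* skb->tstamp */
	int pkt_type;      /* skb->pkt_type */
	int skb_iif;       /* input interface index */
	void *dst;         /* cached route (skb_dst_drop) */
	unsigned int mark; /* skb->mark */
	void *secpath;     /* IPsec state (secpath_reset) */
	void *nfct;        /* conntrack reference (nf_reset) */
	bool nf_trace;     /* netfilter trace flag (nf_reset_trace) */
};

void skb_scrub_packet_model(struct skb_model *skb, bool netnschange)
{
	/* Only forget the sending socket when crossing namespaces. */
	if (netnschange)
		skb->sk = NULL;
	skb->tstamp = 0;
	skb->pkt_type = PKT_HOST_MODEL;
	skb->skb_iif = 0;
	skb->dst = NULL;
	skb->mark = 0;
	skb->secpath = NULL;
	skb->nfct = NULL;
	skb->nf_trace = false;
}
```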
--
Nicolas Dichtel June 26, 2013, 2:11 p.m. UTC | #8
This series is a follow-up to the thread "switching network namespace midway":
http://marc.info/?t=135101459500004&r=1&w=2

The goal of this series is to add x-netns support to the sit module, i.e. the
encapsulation addresses and the network device are not owned by the same
namespace.

Example to configure a tunnel:

modprobe sit
ip netns add netns1
ip link add sit1 type sit remote 10.16.0.121 local 10.16.0.249
ip l s sit1 netns netns1
ip netns exec netns1 ip l s lo up
ip netns exec netns1 ip l s sit1 up
ip netns exec netns1 ip a a dev sit1 192.168.2.123 remote 192.168.2.121
ip netns exec netns1 ip -6 a a dev sit1 2001:1234::123 remote 2001:1234::121

Once this series is approved, I will add the same feature to the ipip and
ip6_tunnel modules.

v3:
  put back the test on netns before calling skb_scrub_packet()
  add a missing skb_scrub_packet() call in ip_tunnel_xmit()

v2:
  rename dev_cleanup_skb to skb_scrub_packet
  move skb_scrub_packet to skbuff.c
  fix netns cleanup
  remove string comparison in netns cleanup
  add a comment about FB device
  call skb_scrub_packet() unconditionally
  remove 'RFC'

 include/linux/skbuff.h   |  1 +
 include/net/ip_tunnels.h |  1 +
 net/core/dev.c           | 11 +----------
 net/core/skbuff.c        | 23 +++++++++++++++++++++++
 net/ipv4/ip_tunnel.c     | 10 +++++++++-
 net/ipv6/sit.c           | 42 ++++++++++++++++++++++++++++++++----------
 6 files changed, 67 insertions(+), 21 deletions(-)

Comments are welcome.

Regards,
Nicolas
--
David Miller June 28, 2013, 5:36 a.m. UTC | #9
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 26 Jun 2013 16:11:26 +0200

> This series is a follow-up to the thread "switching network namespace midway":
> http://marc.info/?t=135101459500004&r=1&w=2
> 
> The goal of this series is to add x-netns support to the sit module, i.e. the
> encapsulation addresses and the network device are not owned by the same
> namespace.

Ok, applied.

And yes, I agree we should look into making tunnels behave consistently
wrt. SKB orphaning, cleaning netfilter state, etc.
--
Nicolas Dichtel July 3, 2013, 3 p.m. UTC | #10
This series is a follow-up to the previous series, which added this functionality
for sit tunnels.

The goal is to add x-netns support to the ipip and ip6_tunnel modules, i.e. the
encapsulation addresses and the network device are not owned by the same
namespace.

Note that the first patch is a fix for the previous series.

Example to configure an ipip tunnel:

modprobe ipip
ip netns add netns1
ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249
ip l s ipip1 netns netns1
ip netns exec netns1 ip l s lo up
ip netns exec netns1 ip l s ipip1 up
ip netns exec netns1 ip a a dev ipip1 192.168.2.123 remote 192.168.2.121

or an ip6_tunnel:

modprobe ip6_tunnel
ip netns add netns1
ip link add ip6tnl1 type ip6tnl remote 2001:660:3008:c1c3::121 local 2001:660:3008:c1c3::123
ip l s ip6tnl1 netns netns1
ip netns exec netns1 ip l s lo up
ip netns exec netns1 ip l s ip6tnl1 up
ip netns exec netns1 ip a a dev ip6tnl1 192.168.1.123 remote 192.168.1.121
ip netns exec netns1 ip -6 a a dev ip6tnl1 2001:1235::123 remote 2001:1235::121

 include/net/ip6_tunnel.h |  1 +
 include/net/ip_tunnels.h |  2 +-
 net/ipv4/ip_gre.c        |  4 ++--
 net/ipv4/ip_tunnel.c     | 42 +++++++++++++++++++++++++++---------------
 net/ipv4/ipip.c          |  3 +--
 net/ipv6/ip6_gre.c       |  5 +++++
 net/ipv6/ip6_tunnel.c    | 41 +++++++++++++++++++++++++++++++----------
 net/ipv6/sit.c           |  4 ++--
 8 files changed, 70 insertions(+), 32 deletions(-)

Comments are welcome.

Regards,
Nicolas
--
David Miller July 4, 2013, 9:56 p.m. UTC | #11
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed,  3 Jul 2013 17:00:33 +0200

> 
> This series is a follow-up to the previous series, which added this functionality
> for sit tunnels.
> 
> The goal is to add x-netns support to the ipip and ip6_tunnel modules, i.e. the
> encapsulation addresses and the network device are not owned by the same
> namespace.
> 
> Note that the first patch is a fix for the previous series.

The first patch, as it is a bug fix, is fine and is applied.

The rest will have to wait until the net-next tree opens again,
sorry.
--
Nicolas Dichtel Aug. 13, 2013, 3:51 p.m. UTC | #12
This series is a follow-up to the previous series, which added this functionality
for sit tunnels.

The goal is to add x-netns support to the ipip and ip6_tunnel modules, i.e. the
encapsulation addresses and the network device are not owned by the same
namespace.

Note that the first two patches are cleanups.

Example to configure an ipip tunnel:

modprobe ipip
ip netns add netns1
ip link add ipip1 type ipip remote 10.16.0.121 local 10.16.0.249
ip l s ipip1 netns netns1
ip netns exec netns1 ip l s lo up
ip netns exec netns1 ip l s ipip1 up
ip netns exec netns1 ip a a dev ipip1 192.168.2.123 remote 192.168.2.121

or an ip6_tunnel:

modprobe ip6_tunnel
ip netns add netns1
ip link add ip6tnl1 type ip6tnl remote 2001:660:3008:c1c3::121 local 2001:660:3008:c1c3::123
ip l s ip6tnl1 netns netns1
ip netns exec netns1 ip l s lo up
ip netns exec netns1 ip l s ip6tnl1 up
ip netns exec netns1 ip a a dev ip6tnl1 192.168.1.123 remote 192.168.1.121
ip netns exec netns1 ip -6 a a dev ip6tnl1 2001:1235::123 remote 2001:1235::121

v2: remove patch 1/3 of the v1 series (already included)
    use net_eq()
    add patch 1/4 and 2/4

 include/net/ip6_tunnel.h |  1 +
 include/net/ip_tunnels.h |  2 +-
 net/core/dev.c           |  6 +++---
 net/ipv4/ip_gre.c        |  4 ++--
 net/ipv4/ip_tunnel.c     | 52 ++++++++++++++++++++++++++++++------------------
 net/ipv4/ip_vti.c        |  2 +-
 net/ipv4/ipip.c          |  3 +--
 net/ipv6/ip6_gre.c       |  5 +++++
 net/ipv6/ip6_tunnel.c    | 41 ++++++++++++++++++++++++++++----------
 net/ipv6/sit.c           |  6 +++---
 10 files changed, 81 insertions(+), 41 deletions(-)

Comments are welcome.

Regards,
Nicolas
--
David Miller Aug. 15, 2013, 8:01 a.m. UTC | #13
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Tue, 13 Aug 2013 17:51:08 +0200

> 
> This series is a follow-up to the previous series, which added this functionality
> for sit tunnels.
> 
> The goal is to add x-netns support to the ipip and ip6_tunnel modules, i.e. the
> encapsulation addresses and the network device are not owned by the same
> namespace.
> 
> Note that the first two patches are cleanups.

Looks good, series applied, thanks.
--
Patch

diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index b0d9824..781b3cf 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -42,6 +42,7 @@  struct ip_tunnel {
 	struct ip_tunnel __rcu	*next;
 	struct hlist_node hash_node;
 	struct net_device	*dev;
+	struct net		*net;	/* netns for packet i/o */
 
 	int		err_count;	/* Number of arrived ICMP errors */
 	unsigned long	err_time;	/* Time when the last ICMP error
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index bd227e5..d375e4d 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -304,6 +304,7 @@  static struct net_device *__ip_tunnel_create(struct net *net,
 
 	tunnel = netdev_priv(dev);
 	tunnel->parms = *parms;
+	tunnel->net = net;
 
 	err = register_netdevice(dev);
 	if (err)
@@ -453,6 +454,8 @@  int ip_tunnel_rcv(struct ip_tunnel *tunnel, struct sk_buff *skb,
 	tstats->rx_bytes += skb->len;
 	u64_stats_update_end(&tstats->syncp);
 
+	skb_scrub_packet(skb);
+
 	if (tunnel->dev->type == ARPHRD_ETHER) {
 		skb->protocol = eth_type_trans(skb, tunnel->dev);
 		skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
@@ -541,7 +544,7 @@  void ip_tunnel_xmit(struct sk_buff *skb, struct net_device *dev,
 			tos = ipv6_get_dsfield((const struct ipv6hdr *)inner_iph);
 	}
 
-	rt = ip_route_output_tunnel(dev_net(dev), &fl4,
+	rt = ip_route_output_tunnel(tunnel->net, &fl4,
 				    tunnel->parms.iph.protocol,
 				    dst, tnl_params->saddr,
 				    tunnel->parms.o_key,
@@ -888,6 +891,7 @@  int ip_tunnel_newlink(struct net_device *dev, struct nlattr *tb[],
 	if (ip_tunnel_find(itn, p, dev->type))
 		return -EEXIST;
 
+	nt->net = net;
 	nt->parms = *p;
 	err = register_netdevice(dev);
 	if (err)
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index f639866..8765f4e 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -466,14 +466,14 @@  isatap_chksrc(struct sk_buff *skb, const struct iphdr *iph, struct ip_tunnel *t)
 
 static void ipip6_tunnel_uninit(struct net_device *dev)
 {
-	struct net *net = dev_net(dev);
-	struct sit_net *sitn = net_generic(net, sit_net_id);
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+	struct sit_net *sitn = net_generic(tunnel->net, sit_net_id);
 
 	if (dev == sitn->fb_tunnel_dev) {
 		RCU_INIT_POINTER(sitn->tunnels_wc[0], NULL);
 	} else {
-		ipip6_tunnel_unlink(sitn, netdev_priv(dev));
-		ipip6_tunnel_del_prl(netdev_priv(dev), NULL);
+		ipip6_tunnel_unlink(sitn, tunnel);
+		ipip6_tunnel_del_prl(tunnel, NULL);
 	}
 	dev_put(dev);
 }
@@ -621,6 +621,7 @@  static int ipip6_rcv(struct sk_buff *skb)
 		tstats->rx_packets++;
 		tstats->rx_bytes += skb->len;
 
+		skb_scrub_packet(skb);
 		netif_rx(skb);
 
 		return 0;
@@ -803,7 +804,7 @@  static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 			goto tx_error;
 	}
 
-	rt = ip_route_output_ports(dev_net(dev), &fl4, NULL,
+	rt = ip_route_output_ports(tunnel->net, &fl4, NULL,
 				   dst, tiph->saddr,
 				   0, 0,
 				   IPPROTO_IPV6, RT_TOS(tos),
@@ -858,6 +859,8 @@  static netdev_tx_t ipip6_tunnel_xmit(struct sk_buff *skb,
 			tunnel->err_count = 0;
 	}
 
+	skb_scrub_packet(skb);
+
 	/*
 	 * Okay, now see if we can stuff it in the buffer as-is.
 	 */
@@ -944,7 +947,8 @@  static void ipip6_tunnel_bind_dev(struct net_device *dev)
 	iph = &tunnel->parms.iph;
 
 	if (iph->daddr) {
-		struct rtable *rt = ip_route_output_ports(dev_net(dev), &fl4, NULL,
+		struct rtable *rt = ip_route_output_ports(tunnel->net, &fl4,
+							  NULL,
 							  iph->daddr, iph->saddr,
 							  0, 0,
 							  IPPROTO_IPV6,
@@ -959,7 +963,7 @@  static void ipip6_tunnel_bind_dev(struct net_device *dev)
 	}
 
 	if (!tdev && tunnel->parms.link)
-		tdev = __dev_get_by_index(dev_net(dev), tunnel->parms.link);
+		tdev = __dev_get_by_index(tunnel->net, tunnel->parms.link);
 
 	if (tdev) {
 		dev->hard_header_len = tdev->hard_header_len + sizeof(struct iphdr);
@@ -972,7 +976,7 @@  static void ipip6_tunnel_bind_dev(struct net_device *dev)
 
 static void ipip6_tunnel_update(struct ip_tunnel *t, struct ip_tunnel_parm *p)
 {
-	struct net *net = dev_net(t->dev);
+	struct net *net = t->net;
 	struct sit_net *sitn = net_generic(net, sit_net_id);
 
 	ipip6_tunnel_unlink(sitn, t);
@@ -1248,7 +1252,6 @@  static void ipip6_tunnel_setup(struct net_device *dev)
 	dev->priv_flags	       &= ~IFF_XMIT_DST_RELEASE;
 	dev->iflink		= 0;
 	dev->addr_len		= 4;
-	dev->features		|= NETIF_F_NETNS_LOCAL;
 	dev->features		|= NETIF_F_LLTX;
 }
 
@@ -1257,6 +1260,7 @@  static int ipip6_tunnel_init(struct net_device *dev)
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 
 	tunnel->dev = dev;
+	tunnel->net = dev_net(dev);
 
 	memcpy(dev->dev_addr, &tunnel->parms.iph.saddr, 4);
 	memcpy(dev->broadcast, &tunnel->parms.iph.daddr, 4);
@@ -1277,6 +1281,7 @@  static int __net_init ipip6_fb_tunnel_init(struct net_device *dev)
 	struct sit_net *sitn = net_generic(net, sit_net_id);
 
 	tunnel->dev = dev;
+	tunnel->net = dev_net(dev);
 	strcpy(tunnel->parms.name, dev->name);
 
 	iph->version		= 4;
@@ -1564,8 +1569,14 @@  static struct xfrm_tunnel ipip_handler __read_mostly = {
 
 static void __net_exit sit_destroy_tunnels(struct sit_net *sitn, struct list_head *head)
 {
+	struct net *net = dev_net(sitn->fb_tunnel_dev);
+	struct net_device *dev, *aux;
 	int prio;
 
+	for_each_netdev_safe(net, dev, aux)
+		if (dev->rtnl_link_ops == &sit_link_ops)
+			unregister_netdevice_queue(dev, head);
+
 	for (prio = 1; prio < 4; prio++) {
 		int h;
 		for (h = 0; h < HASH_SIZE; h++) {
@@ -1573,7 +1584,12 @@  static void __net_exit sit_destroy_tunnels(struct sit_net *sitn, struct list_hea
 
 			t = rtnl_dereference(sitn->tunnels[prio][h]);
 			while (t != NULL) {
-				unregister_netdevice_queue(t->dev, head);
+				/* If dev is in the same netns, it has already
+				 * been added to the list by the previous loop.
+				 */
+				if (dev_net(t->dev) != net)
+					unregister_netdevice_queue(t->dev,
+								   head);
 				t = rtnl_dereference(t->next);
 			}
 		}
@@ -1598,6 +1614,10 @@  static int __net_init sit_init_net(struct net *net)
 		goto err_alloc_dev;
 	}
 	dev_net_set(sitn->fb_tunnel_dev, net);
+	/* FB netdevice is special: we have one, and only one per netns.
+	 * Allowing to move it to another netns is clearly unsafe.
+	 */
+	sitn->fb_tunnel_dev->features |= NETIF_F_NETNS_LOCAL;
 
 	err = ipip6_fb_tunnel_init(sitn->fb_tunnel_dev);
 	if (err)
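The two-pass cleanup in `sit_destroy_tunnels()` in the patch, which is how a tunnel gets destroyed when either of its two namespaces goes away, can be sketched as a userspace model with hypothetical types. Pass 1 queues every sit device living in the dying netns; pass 2 walks the tunnels hashed in that netns and queues only those whose device was moved to another netns, so no device is queued twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a device records the netns it lives in, whether it
 * is a sit device, and how many times it was queued for unregistration. */
struct dev_model {
	int netns;
	int is_sit;
	int queued;
};

void destroy_tunnels_model(int dying_netns,
			   struct dev_model *all_devs, size_t ndevs,
			   struct dev_model **hashed, size_t nhashed)
{
	/* Pass 1: like for_each_netdev_safe() + the rtnl_link_ops check. */
	for (size_t i = 0; i < ndevs; i++)
		if (all_devs[i].netns == dying_netns && all_devs[i].is_sit)
			all_devs[i].queued++;

	/* Pass 2: tunnels hashed in the dying netns; skip devices that
	 * pass 1 already queued because they live in the same netns. */
	for (size_t i = 0; i < nhashed; i++)
		if (hashed[i]->netns != dying_netns)
			hashed[i]->queued++;
}
```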