diff mbox

[RFC] xfrm{4,6}: only report errors back to local sockets if we don't cross address family

Message ID 20130729145017.GD2490@order.stressinduktion.org
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Hannes Frederic Sowa July 29, 2013, 2:50 p.m. UTC
xfrm6_local_error/xfrm4_tunnel_check_size report mtu errors back to a
socket in case it is locally generated. If the packet first traversed
a 6in4/4in6 tunnel before passing the xfrm layer, we could get a panic
because of address family type mismatch in the error reporting functions.

This bug fixes only the panic part of the excellent bug report
<https://bugzilla.kernel.org/show_bug.cgi?id=58691> by vi0oss.

It is unclear to me how to let these mtu change events travel up the
stack. A match on (skb->sk && (skb->dev | tunnel_types)) could be made
first and we forcefully decrease the mtu of that interface with a warning?
This seems to be fragile to me.

Reported-by: <vi0oss@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
---
 net/ipv4/xfrm4_output.c | 4 ++--
 net/ipv6/xfrm6_output.c | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

Comments

Steffen Klassert July 30, 2013, 8:21 a.m. UTC | #1
On Mon, Jul 29, 2013 at 04:50:17PM +0200, Hannes Frederic Sowa wrote:
> xfrm6_local_error/xfrm4_tunnel_check_size report mtu errors back to a
> socket in case it is locally generated. If the packet first traversed
> a 6in4/4in6 tunnel before passing the xfrm layer, we could get a panic
> because of address family type mismatch in the error reporting functions.
> 

So the skb is still owned by a socket of the inner address family.
Is this intentional? Maybe the ndo_start_xmit() function of the
tunnel device should orphan the skb if we tunnel the packet
through a different address family.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa July 30, 2013, 8:30 a.m. UTC | #2
On Tue, Jul 30, 2013 at 10:21:18AM +0200, Steffen Klassert wrote:
> On Mon, Jul 29, 2013 at 04:50:17PM +0200, Hannes Frederic Sowa wrote:
> > xfrm6_local_error/xfrm4_tunnel_check_size report mtu errors back to a
> > socket in case it is locally generated. If the packet first traversed
> > a 6in4/4in6 tunnel before passing the xfrm layer, we could get a panic
> > because of address family type mismatch in the error reporting functions.
> > 
> 
> So the skb is still owned by a socket of the inner address family.
> Is this intentional? Maybe the ndo_start_xmit() function of the
> tunnel device should orphan the skb if we tunnel the packet
> through a different address family.

I thought about this, too. But then we would stop accounting the data
to the socket while it is travelling the stack. I don't know about the
possible problems resulting from this.

Greetings,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert July 30, 2013, 10:26 a.m. UTC | #3
On Tue, Jul 30, 2013 at 10:30:17AM +0200, Hannes Frederic Sowa wrote:
> On Tue, Jul 30, 2013 at 10:21:18AM +0200, Steffen Klassert wrote:
> > On Mon, Jul 29, 2013 at 04:50:17PM +0200, Hannes Frederic Sowa wrote:
> > > xfrm6_local_error/xfrm4_tunnel_check_size report mtu errors back to a
> > > socket in case it is locally generated. If the packet first traversed
> > > a 6in4/4in6 tunnel before passing the xfrm layer, we could get a panic
> > > because of address family type mismatch in the error reporting functions.
> > > 
> > 
> > So the skb is still owned by a socket of the inner address family.
> > Is this intentional? Maybe the ndo_start_xmit() function of the
> > tunnel device should orphan the skb if we tunnel the packet
> > through a different address family.
> 
> I thought about this, too. But then we would stop accounting the data
> to the socket while it is travelling the stack. I don't know about the
> possible problems resulting from this.
> 

I'm also not absolutely sure, but we reinsert the packet to
the ipv4/ipv6 output path which is also used to output forwarded
packets. So the code should be prepared for handling a skb without
socket context.

There are already situations where we orphan the skb in some
tunnel xmit functions. For example if we tunnel through
another namespace.

Lets see if anyone else has an opinion to that. I'll go
and look what we can do if we have to fix it in the IPsec
code in the meantime.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa July 30, 2013, 10:40 a.m. UTC | #4
On Tue, Jul 30, 2013 at 12:26:11PM +0200, Steffen Klassert wrote:
> On Tue, Jul 30, 2013 at 10:30:17AM +0200, Hannes Frederic Sowa wrote:
> > On Tue, Jul 30, 2013 at 10:21:18AM +0200, Steffen Klassert wrote:
> > > On Mon, Jul 29, 2013 at 04:50:17PM +0200, Hannes Frederic Sowa wrote:
> > > > xfrm6_local_error/xfrm4_tunnel_check_size report mtu errors back to a
> > > > socket in case it is locally generated. If the packet first traversed
> > > > a 6in4/4in6 tunnel before passing the xfrm layer, we could get a panic
> > > > because of address family type mismatch in the error reporting functions.
> > > > 
> > > 
> > > So the skb is still owned by a socket of the inner address family.
> > > Is this intentional? Maybe the ndo_start_xmit() function of the
> > > tunnel device should orphan the skb if we tunnel the packet
> > > through a different address family.
> > 
> > I thought about this, too. But then we would stop accounting the data
> > to the socket while it is travelling the stack. I don't know about the
> > possible problems resulting from this.
> > 
> 
> I'm also not absolutely sure, but we reinsert the packet to
> the ipv4/ipv6 output path which is also used to output forwarded
> packets. So the code should be prepared for handling a skb without
> socket context.
> 
> There are already situations where we orphan the skb in some
> tunnel xmit functions. For example if we tunnel through
> another namespace.

Somehow this seems the way to go.

Even if we get a matching address family socket we would still call
ipv6_local_error with the wrong fl6.daddr for that socket. I do think
IPv4 has the same issue, but I have not checked.

Because skbs could circulate multiple times through devices, a check as
(skb->dev->type | tunnel_types) seems to be not enough, too.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller July 31, 2013, 12:01 a.m. UTC | #5
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Tue, 30 Jul 2013 12:40:38 +0200

> On Tue, Jul 30, 2013 at 12:26:11PM +0200, Steffen Klassert wrote:
>> On Tue, Jul 30, 2013 at 10:30:17AM +0200, Hannes Frederic Sowa wrote:
>> > On Tue, Jul 30, 2013 at 10:21:18AM +0200, Steffen Klassert wrote:
>> > > On Mon, Jul 29, 2013 at 04:50:17PM +0200, Hannes Frederic Sowa wrote:
>> > > > xfrm6_local_error/xfrm4_tunnel_check_size report mtu errors back to a
>> > > > socket in case it is locally generated. If the packet first traversed
>> > > > a 6in4/4in6 tunnel before passing the xfrm layer, we could get a panic
>> > > > because of address family type mismatch in the error reporting functions.
>> > > > 
>> > > 
>> > > So the skb is still owned by a socket of the inner address family.
>> > > Is this intentional? Maybe the ndo_start_xmit() function of the
>> > > tunnel device should orphan the skb if we tunnel the packet
>> > > through a different address family.
>> > 
>> > I thought about this, too. But then we would stop accounting the data
>> > to the socket while it is travelling the stack. I don't know about the
>> > possible problems resulting from this.
>> > 
>> 
>> I'm also not absolutely sure, but we reinsert the packet to
>> the ipv4/ipv6 output path which is also used to output forwarded
>> packets. So the code should be prepared for handling a skb without
>> socket context.
>> 
>> There are already situations where we orphan the skb in some
>> tunnel xmit functions. For example if we tunnel through
>> another namespace.
> 
> Somehow this seems the way to go.

Agreed, I think we should just orphan the SKB.

There was talk about this when the skb_scrub_packet() interface was
added, maybe we should do it unconditionally after all.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Hannes Frederic Sowa Aug. 1, 2013, 8:11 a.m. UTC | #6
On Tue, Jul 30, 2013 at 12:26:11PM +0200, Steffen Klassert wrote:
> Lets see if anyone else has an opinion to that. I'll go
> and look what we can do if we have to fix it in the IPsec
> code in the meantime.

If you have not yet done so, I would try to find a solution over the weekend.

Thanks,

  Hannes

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steffen Klassert Aug. 1, 2013, 10:05 a.m. UTC | #7
On Thu, Aug 01, 2013 at 10:11:50AM +0200, Hannes Frederic Sowa wrote:
> 
> If you have not yet done so, I would try to find a solution over the weekend.
> 

I have not yet done so, please go ahead.

Thanks a lot!

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv4/xfrm4_output.c b/net/ipv4/xfrm4_output.c
index 327a617..61f6cae 100644
--- a/net/ipv4/xfrm4_output.c
+++ b/net/ipv4/xfrm4_output.c
@@ -32,10 +32,10 @@  static int xfrm4_tunnel_check_size(struct sk_buff *skb)
 	dst = skb_dst(skb);
 	mtu = dst_mtu(dst);
 	if (skb->len > mtu) {
-		if (skb->sk)
+		if (skb->sk && skb->sk->sk_family == AF_INET)
 			ip_local_error(skb->sk, EMSGSIZE, ip_hdr(skb)->daddr,
 				       inet_sk(skb->sk)->inet_dport, mtu);
-		else
+		else if (!skb->sk)
 			icmp_send(skb, ICMP_DEST_UNREACH,
 				  ICMP_FRAG_NEEDED, htonl(mtu));
 		ret = -EMSGSIZE;
diff --git a/net/ipv6/xfrm6_output.c b/net/ipv6/xfrm6_output.c
index 8755a30..29f6db7 100644
--- a/net/ipv6/xfrm6_output.c
+++ b/net/ipv6/xfrm6_output.c
@@ -59,6 +59,9 @@  static void xfrm6_local_error(struct sk_buff *skb, u32 mtu)
 	struct flowi6 fl6;
 	struct sock *sk = skb->sk;
 
+	if (sk->sk_family != AF_INET6)
+		return;
+
 	fl6.fl6_dport = inet_sk(sk)->inet_dport;
 	fl6.daddr = ipv6_hdr(skb)->daddr;