diff mbox

IPv6 routing/fragmentation panic

Message ID 20150916132733.GQ24810@breakpoint.cc
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Florian Westphal Sept. 16, 2015, 1:27 p.m. UTC
David Woodhouse <dwmw2@infradead.org> wrote:
> On Wed, 2015-09-16 at 01:48 +0200, Florian Westphal wrote:
> > 
> > What I don't understand is why you see this with fragmented ipv6 
> > packets only (and not with all ipv6 forwarded skbs).
> > 
> > Something like this copy-pastry from ip_finish_output2 should fix it:
> 
> That works; thanks.
> 
> Tested-by: David Woodhouse <David.Woodhouse@intel.com>
> 
> A little extra debugging output shows that the offending fragments were
> arriving here with skb_headroom(skb)==10. Which is reasonable, being
> the Solos ADSL card's header of 8 bytes followed by 2 bytes of PPP
> frame type.
> 
> The non-fragmented packets, on the other hand, are arriving with a
> headroom of 42 bytes. Could something else already have reallocated
> them before they get that far?

Yep.  I missed

        if (skb_cow(skb, dst->dev->hard_header_len)) {

call in ip6_forward().

Problem is of course that we only expand headroom of the skb
and not of the fragment(s) stored in that skbs frag list.

So we have several options for a fix.

- expand headroom in ip6_finish_output2, like we do for ipv4
- expand headroom in ip6_fragment
- defer to slowpath if frags don't have enough headroom.

The latter is the smallest patch and would not add test for locally
generated, non-fragmented skbs.

(not even compile tested)
David, could you test this?  I'd do an official patch submission then.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Woodhouse Sept. 16, 2015, 1:34 p.m. UTC | #1
On Wed, 2015-09-16 at 15:27 +0200, Florian Westphal wrote:
> 
> David, could you test this?  I'd do an official patch submission
> then.

Compiles. Doesn't fix the problem.
David Woodhouse Sept. 16, 2015, 2 p.m. UTC | #2
On Wed, 2015-09-16 at 15:27 +0200, Florian Westphal wrote:
> @@ -599,7 +600,7 @@ int ip6_fragment(struct sock *sk, struct sk_buff
> *skb,
>                         /* Correct geometry. */
>                         if (frag->len > mtu ||
>                             ((frag->len & 7) && frag->next) ||
> -                           skb_headroom(frag) < hlen)
> +                           skb_headroom(frag) < (hlen + hroom))
>                                 goto slow_path_clean;
>  
>                         /* Partially cloned skb? */

My test is 'ping -s 2000', and I end up with a fragment of 1280 bytes
followed by a fragment of 776 bytes.

The test cited above is only actually running on the latter fragment
(which for some reason is fine and has headroom of 58 bytes).

The first, larger, fragment isn't being checked. And that's the one
with only 10 bytes of headroom.

[   62.027984] has frag list                                                    
[   62.030616] line 604 check frag ddc5fcc0 len 776 headroom 58 (hlen 40 hroom 16)             
[   62.036720] line 678 send skb ded050c0 len 1280 headroom 10                                
[   62.041096] skbuff: skb_under_panic: text:c125f9ca len:1294 put:14 head:dec89
000 data:dec88ffc tail:0xdec8950a end:0xdec89f50 dev:br-lan
Florian Westphal Sept. 16, 2015, 3:30 p.m. UTC | #3
David Woodhouse <dwmw2@infradead.org> wrote:
> >                         if (frag->len > mtu ||
> >                             ((frag->len & 7) && frag->next) ||
> > -                           skb_headroom(frag) < hlen)
> > +                           skb_headroom(frag) < (hlen + hroom))
> >                                 goto slow_path_clean;
> >  
> >                         /* Partially cloned skb? */
> 
> My test is 'ping -s 2000', and I end up with a fragment of 1280 bytes
> followed by a fragment of 776 bytes.
> 
> The test cited above is only actually running on the latter fragment
> (which for some reason is fine and has headroom of 58 bytes).
> 
> The first, larger, fragment isn't being checked. And that's the one
> with only 10 bytes of headroom.

Thanks for this detailed analysis.
I've sent a patch that should address all of these issues.

Turns out that all tests are wrong in your case.

ip6_fragment doesn't expand headroom, since this skb had the ipv6
fragment header pulled, so that part thinks there are 18 bytes
available (we later push the frag header back when sending fragments).

The 'skb_headroom(frag) < hlen))' is wrong since it neither accounts for
device header length nor the fragment header that we need to push.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -586,6 +586,7 @@  int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 	frag_id = ipv6_select_ident(net, &ipv6_hdr(skb)->daddr,
 				    &ipv6_hdr(skb)->saddr);
 
+	hroom = LL_RESERVED_SPACE(rt->dst.dev);
 	if (skb_has_frag_list(skb)) {
 		int first_len = skb_pagelen(skb);
 		struct sk_buff *frag2;
@@ -599,7 +600,7 @@  int ip6_fragment(struct sock *sk, struct sk_buff *skb,
 			/* Correct geometry. */
 			if (frag->len > mtu ||
 			    ((frag->len & 7) && frag->next) ||
-			    skb_headroom(frag) < hlen)
+			    skb_headroom(frag) < (hlen + hroom))
 				goto slow_path_clean;
 
 			/* Partially cloned skb? */
@@ -724,7 +725,6 @@  slow_path:
 	 */
 
 	*prevhdr = NEXTHDR_FRAGMENT;
-	hroom = LL_RESERVED_SPACE(rt->dst.dev);
 	troom = rt->dst.dev->needed_tailroom;
 
 	/*