Message ID | 1318604266.2223.29.camel@edumazet-HP-Compaq-6005-Pro-SFF-PC |
---|---|
State | Accepted, archived |
Delegated to: | David Miller |
Headers | show |
2011/10/14 Eric Dumazet <eric.dumazet@gmail.com>: > Please try following patch : > > [PATCH] ip_gre: dont increase dev->needed_headroom on a live device > > It seems ip_gre is able to change dev->needed_headroom on the fly. > > Its is not legal unfortunately and triggers a BUG in raw_sendmsg() > > skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) > > < another cpu change dev->needed_headromm (making it bigger) > > ... > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); > > We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE() > -> we crash later because skb head is exhausted. > > Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route > header_len in max_headroom calculation) > > Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > CC: Timo Teräs <timo.teras@iki.fi> > CC: Herbert Xu <herbert@gondor.apana.org.au> > --- > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > index 8871067..1505dcf 100644 > --- a/net/ipv4/ip_gre.c > +++ b/net/ipv4/ip_gre.c > @@ -835,8 +835,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev > if (skb_headroom(skb) < max_headroom || skb_shared(skb)|| > (skb_cloned(skb) && !skb_clone_writable(skb, 0))) { > struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom); > - if (max_headroom > dev->needed_headroom) > - dev->needed_headroom = max_headroom; > if (!new_skb) { > ip_rt_put(rt); > dev->stats.tx_dropped++; Hello I tried this patch and I was not able anymore to reproduce the kernel oops. So the patch solved the bug. Thank you very much! Would it be possible to add the patch to the long term kernel 2.6.35 as well? Because this is the one I use at the moment in production. And sorry for posting to the wrong mailing list (linux-kernel). Best regards Elmar -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le lundi 17 octobre 2011 à 09:16 +0200, Elmar Vonlanthen a écrit : > 2011/10/14 Eric Dumazet <eric.dumazet@gmail.com>: > > Please try following patch : > > > > [PATCH] ip_gre: dont increase dev->needed_headroom on a live device > > > > It seems ip_gre is able to change dev->needed_headroom on the fly. > > > > Its is not legal unfortunately and triggers a BUG in raw_sendmsg() > > > > skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) > > > > < another cpu change dev->needed_headromm (making it bigger) > > > > ... > > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); > > > > We end with LL_RESERVED_SPACE() being bigger than LL_ALLOCATED_SPACE() > > -> we crash later because skb head is exhausted. > > > > Bug introduced in commit 243aad83 in 2.6.34 (ip_gre: include route > > header_len in max_headroom calculation) > > > > Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> > > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > CC: Timo Teräs <timo.teras@iki.fi> > > CC: Herbert Xu <herbert@gondor.apana.org.au> > > --- > > diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c > > index 8871067..1505dcf 100644 > > --- a/net/ipv4/ip_gre.c > > +++ b/net/ipv4/ip_gre.c > > @@ -835,8 +835,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev > > if (skb_headroom(skb) < max_headroom || skb_shared(skb)|| > > (skb_cloned(skb) && !skb_clone_writable(skb, 0))) { > > struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom); > > - if (max_headroom > dev->needed_headroom) > > - dev->needed_headroom = max_headroom; > > if (!new_skb) { > > ip_rt_put(rt); > > dev->stats.tx_dropped++; > > Hello > > I tried this patch and I was not able anymore to reproduce the kernel > oops. So the patch solved the bug. > Thank you very much! > > Would it be possible to add the patch to the long term kernel 2.6.35 > as well? Because this is the one I use at the moment in production. > Thanks for testing. If David/Herbert/Timo agree, then patch should find its way into current kernel, then to stable trees as well. Thanks -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 18, 2011 at 04:30:32AM +0200, Eric Dumazet wrote: > > If David/Herbert/Timo agree, then patch should find its way into current > kernel, then to stable trees as well. Actually, I think we should instead fix the users of needed_headroom to not read it twice which is causing problems here. GRE tunnels by their nature do not have a fixed value for needed_headroom. As the underlying routes change the necessary headroom may need to be adjusted due to further encapsulation such as IPsec. Keeping it constant from tunnel creation may result in suboptimal performance due to unnecessary header reallocations. However, until we audit the stack to see if there are further instances of double-readings such as the one causing the crash here, I'm fine with your patch making it constant. Once we're sure that all of the double-readings are gone we can revert to a dynamic needed_headroom. Thanks,
Le mardi 18 octobre 2011 à 11:34 +0200, Herbert Xu a écrit : > On Tue, Oct 18, 2011 at 04:30:32AM +0200, Eric Dumazet wrote: > > > > If David/Herbert/Timo agree, then patch should find its way into current > > kernel, then to stable trees as well. > > Actually, I think we should instead fix the users of needed_headroom > to not read it twice which is causing problems here. > > GRE tunnels by their nature do not have a fixed value for > needed_headroom. As the underlying routes change the necessary > headroom may need to be adjusted due to further encapsulation such > as IPsec. > > Keeping it constant from tunnel creation may result in suboptimal > performance due to unnecessary header reallocations. > > However, until we audit the stack to see if there are further > instances of double-readings such as the one causing the crash > here, I'm fine with your patch making it constant. > > Once we're sure that all of the double-readings are gone we > can revert to a dynamic needed_headroom. > Sure, we can work on this path for future kernels. Adding an RCU protected structure to hold hard_header_len / needed_headroom / needed_tailroom should be possible, but this adds yet another pointer dereference... Thanks ! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 18, 2011 at 12:01:33PM +0200, Eric Dumazet wrote: > > Adding an RCU protected structure to hold hard_header_len / > needed_headroom / needed_tailroom should be possible, but this adds yet > another pointer dereference... I don't think we need RCU here since the problem is simply that we're using two different values for skb allocations and skb_reserve. As long as we use one and the same value it should work. The value will rarely be incorrect and when it is, automatic reallocation will occur. Cheers,
Le mardi 18 octobre 2011 à 12:05 +0200, Herbert Xu a écrit : > On Tue, Oct 18, 2011 at 12:01:33PM +0200, Eric Dumazet wrote: > > > > Adding an RCU protected structure to hold hard_header_len / > > needed_headroom / needed_tailroom should be possible, but this adds yet > > another pointer dereference... > > I don't think we need RCU here since the problem is simply that > we're using two different values for skb allocations and skb_reserve. > > As long as we use one and the same value it should work. The value > will rarely be incorrect and when it is, automatic reallocation will > occur. > You're right, if reallocations are OK in all paths. We'll need to change LL_RESERVED_SPACE() / LL_RESERVED_SPACE_EXTRA() / LL_ALLOCATED_SPACE() macros and provide the [read once] values, instead of a [read once] pointer to values. Thats a bit complex change, but doable. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 18, 2011 at 12:23:43PM +0200, Eric Dumazet wrote: > > You're right, if reallocations are OK in all paths. If it wasn't OK then making needed_headroom constant won't work anyway. > We'll need to change LL_RESERVED_SPACE() / LL_RESERVED_SPACE_EXTRA() / > LL_ALLOCATED_SPACE() macros and provide the [read once] values, instead > of a [read once] pointer to values. I'm not sure what you mean here. I don't see any need to change these macros. All we need is to save the value in a local variable: hh_len = LL_RESERVED_SPACE(dev); skb = alloc_skb(hh_len + len); skb_reserve(skb, hh_len); Cheers,
Le mardi 18 octobre 2011 à 12:45 +0200, Herbert Xu a écrit : > On Tue, Oct 18, 2011 at 12:23:43PM +0200, Eric Dumazet wrote: > > > > You're right, if reallocations are OK in all paths. > > If it wasn't OK then making needed_headroom constant won't work > anyway. > > > We'll need to change LL_RESERVED_SPACE() / LL_RESERVED_SPACE_EXTRA() / > > LL_ALLOCATED_SPACE() macros and provide the [read once] values, instead > > of a [read once] pointer to values. > > I'm not sure what you mean here. I don't see any need to change > these macros. All we need is to save the value in a local variable: > > hh_len = LL_RESERVED_SPACE(dev); > > skb = alloc_skb(hh_len + len); > skb_reserve(skb, hh_len); > Not really Herbert. Please read again my patch changelog. In the bug we try to fix, we have : skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) ... < increase of dev->needed_headroom by another cpu/task > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); skb_put() -> crash because we reserved too much space So we really want LL_ALLOCATED_SPACE() and LL_RESERVED_SPACE() use the same needed_headroom, or else you can have LL_RESERVED_SPACE() > LL_ALLOCATED_SPACE(). There are several way to fix this, but this kind of code assumed the dev->needed... values were consistent for the whole block. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 18, 2011 at 01:37:58PM +0200, Eric Dumazet wrote: > > In the bug we try to fix, we have : > > skb = sock_alloc_send_skb(sk, ... + LL_ALLOCATED_SPACE(rt->dst.dev) > > ... < increase of dev->needed_headroom by another cpu/task > > > skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); OK, in that case one fix would be to replace LL_ALLOCATED_SPACE with its two constiuents so that they may be stored in local variables for later use. hlen = LL_HEADROOM(skb); tlen = LL_TAILROOM(skb); skb_alloc_send_skb(sk, ... + LL_ALIGN(hlen + tlen)); skb_reserve(skb, LL_ALIGN(hlen)); Cheers,
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 8871067..1505dcf 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -835,8 +835,6 @@ static netdev_tx_t ipgre_tunnel_xmit(struct sk_buff *skb, struct net_device *dev if (skb_headroom(skb) < max_headroom || skb_shared(skb)|| (skb_cloned(skb) && !skb_clone_writable(skb, 0))) { struct sk_buff *new_skb = skb_realloc_headroom(skb, max_headroom); - if (max_headroom > dev->needed_headroom) - dev->needed_headroom = max_headroom; if (!new_skb) { ip_rt_put(rt); dev->stats.tx_dropped++;