Message ID | 20111019.030914.218521668377472155.davem@davemloft.net |
---|---|
State | Deferred, archived |
Delegated to: | David Miller |
Headers | show |
Le mercredi 19 octobre 2011 à 03:09 -0400, David Miller a écrit : > From: Herbert Xu <herbert@gondor.hengli.com.au> > Date: Tue, 18 Oct 2011 15:45:37 +0200 > > > On Tue, Oct 18, 2011 at 02:56:00PM +0200, Eric Dumazet wrote: > >> > >> I am ok by this way, but we might hit another similar problem elsewhere. > >> > >> (igmp.c ip6_output, ...) > >> > >> We effectively want to remove LL_ALLOCATED_SPACE() usage and obfuscate > >> code... > > > > Here's another idea, provide a helper to do the skb allocation > > and the skb_reserve in one go. That way this ugliness would only > > need to be done once. > > Someone please test this: > > -------------------- > net: Fix crashes on devices which dynamically change needed headroom. > > One such device is IP_GRE. > > The problem is that we evaluate the device characteristics twice, once > to determine the allocation size, and once to do the skb_reserve(). > > Combine these into one operation using a helper function. > > With help from Eric Dumazet and Herbert Xu. > > Reported-by: Reported-by: Elmar Vonlanthen <evonlanthen@gmail.com> > Signed-off-by: David S. Miller <davem@davemloft.net> > Seems fine (Maybe do the +15 in caller site ?), but we also have other problematic cases, using alloc_skb() only... -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 19 Oct 2011 09:18:33 +0200 > Seems fine (Maybe do the +15 in caller site ?), but we also have other > problematic cases, using alloc_skb() only... Ok, which ones operate over these problematic GRE tunnels? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le mercredi 19 octobre 2011 à 03:30 -0400, David Miller a écrit : > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 19 Oct 2011 09:18:33 +0200 > > > Seems fine (Maybe do the +15 in caller site ?), but we also have other > > problematic cases, using alloc_skb() only... > > Ok, which ones operate over these problematic GRE tunnels? > I was more thinking of general idea to allow any device to change its needed headroom. For GRE tunnels, I dont think IPv6 fragmentation could be relevant, but maybe IGMP could trigger a problem ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Eric Dumazet <eric.dumazet@gmail.com> Date: Wed, 19 Oct 2011 09:52:19 +0200 > For GRE tunnels, I dont think IPv6 fragmentation could be relevant, > but maybe IGMP could trigger a problem ? Funny... icmpv6 uses ip6_append_data() which uses hh_len in a local variable, which seems to suggest that it's immune to this problem. IPV4 side seems identical in this regard. So, as far as I can see, my patch is sufficient to cover IP_GRE reasonably in the 'net' tree. Agreed? If so, someone please test this thing :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Oct 19, 2011 at 03:30:52AM -0400, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Wed, 19 Oct 2011 09:18:33 +0200 > > > Seems fine (Maybe do the +15 in caller site ?), but we also have other > > problematic cases, using alloc_skb() only... > > Ok, which ones operate over these problematic GRE tunnels? Potentially all of them since we now support Ethernet-over-GRE. I think Eric's initial patch is probably the safest bet for rc10. We can then work on the proper fix for the next release. As to the latter, I've just done a grep over net and it seems that all users of LL_ALLOCATED_SPACE fall into two cases, alloc_skb users or sock_alloc_send_skb (including pskb) users. For alloc_skb we could add a new helper. While for the other case we could either create a new helper or just add an extra dev argument that may be NULL for those that don't care about LL_ALLOCATED_SPACE. Cheers,
From: Herbert Xu <herbert@gondor.hengli.com.au> Date: Wed, 19 Oct 2011 10:08:07 +0200 > I think Eric's initial patch is probably the safest bet for rc10. > We can then work on the proper fix for the next release. There are two "initial patch", I wonder which one you mean. There's his really first patch, which remoevs the lines in IP_GRE which change dev->needed_headroom. I was under the impression we were against doing that. The other patch he posted duplicates the device attribute variable caching in two functions. My patch is just a tweak so that we only do this sequence in one place, the new sock_alloc_send_skb_reserve() helper, instead of in both the ipv4 and ipv6 RAW code. So I'm a little confused what your suggestion for rc10 really is :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 20, 2011 at 05:30:50AM -0400, David Miller wrote: > > So I'm a little confused what your suggestion for rc10 really > is :-) I meant his first initial patch :) While it is suboptimal in the sense that should the value of needed_headroom increase we'll end up constantly reallocating skbs, I believe that it is at least semantically correct. In the time being I'll look more closely at all the users of needed_headroom to see if there's anything we've missed. Thanks,
From: Herbert Xu <herbert@gondor.hengli.com.au> Date: Thu, 20 Oct 2011 11:35:41 +0200 > On Thu, Oct 20, 2011 at 05:30:50AM -0400, David Miller wrote: >> >> So I'm a little confused what your suggestion for rc10 really >> is :-) > > I meant his first initial patch :) > > While it is suboptimal in the sense that should the value of > needed_headroom increase we'll end up constantly reallocating > skbs, I believe that it is at least semantically correct. Ok, I applied Eric's patch which removes the dynamic changing of the needed_headroom in IP_GRE. Thanks everyone! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Oct 20, 2011 at 11:35:41AM +0200, Herbert Xu wrote: > > In the time being I'll look more closely at all the users of > needed_headroom to see if there's anything we've missed. OK I've reviewed all the users of needed_headroom and I haven't found anything other than the cases that we have enumerated. One thing I noticed is that the macro LL_ALLOCATED_SPACE is completely bogus and buggy. It applies the alignment to the sum of headroom and tailroom. However, the alignment is then applied separately to the headroom when reserving, meaning that we may end up with insufficient tailroom. So I'm going to get rid of LL_ALLOCATED_SPACE completely and replace it with explicit references to the tailroom as it doesn't need the alignment anyway (The headroom needs alignment since we use it to ensure the head is aligned). Cheers,
From: Herbert Xu <herbert@gondor.hengli.com.au> Date: Tue, 25 Oct 2011 13:54:25 +0200 > So I'm going to get rid of LL_ALLOCATED_SPACE completely and > replace it with explicit references to the tailroom as it doesn't > need the alignment anyway (The headroom needs alignment since > we use it to ensure the head is aligned). Ok. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Oct 25, 2011 at 11:12:04PM -0400, David Miller wrote: > From: Herbert Xu <herbert@gondor.hengli.com.au> > Date: Tue, 25 Oct 2011 13:54:25 +0200 > > > So I'm going to get rid of LL_ALLOCATED_SPACE completely and > > replace it with explicit references to the tailroom as it doesn't > > need the alignment anyway (The headroom needs alignment since > > we use it to ensure the head is aligned). > > Ok. Here are the patches that do this. I also picked up one spot that should have used LL_ALLOCATED_SPACE but did not (see 2nd last patch). Cheers,
From: Herbert Xu <herbert@gondor.hengli.com.au> Date: Fri, 18 Nov 2011 20:18:32 +0800 > On Tue, Oct 25, 2011 at 11:12:04PM -0400, David Miller wrote: >> From: Herbert Xu <herbert@gondor.hengli.com.au> >> Date: Tue, 25 Oct 2011 13:54:25 +0200 >> >> > So I'm going to get rid of LL_ALLOCATED_SPACE completely and >> > replace it with explicit references to the tailroom as it doesn't >> > need the alignment anyway (The headroom needs alignment since >> > we use it to ensure the head is aligned). >> >> Ok. > > Here are the patches that do this. I also picked up one spot > that should have used LL_ALLOCATED_SPACE but did not (see 2nd > last patch). This all looks good to me, applied to net-next, thanks! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ddee79b..d4abb400 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -276,12 +276,22 @@ struct hh_cache { * LL_ALLOCATED_SPACE also takes into account the tailroom the device * may need. */ +#define LL_ALIGN(__len) (((__len)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + +#define __LL_RESERVED_SPACE(__hard_hdr_len, __needed_headroom) \ + LL_ALIGN((__hard_hdr_len) + (__needed_headroom)) + #define LL_RESERVED_SPACE(dev) \ - ((((dev)->hard_header_len+(dev)->needed_headroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + __LL_RESERVED_SPACE((dev)->hard_header_len, (dev)->needed_headroom)) + #define LL_RESERVED_SPACE_EXTRA(dev,extra) \ - ((((dev)->hard_header_len+(dev)->needed_headroom+(extra))&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + LL_ALIGN((dev)->hard_header_len + (dev)->needed_headroom + (extra)) + +#define __LL_ALLOCATED_SPACE(__hard_hdr_len, __needed_headroom, __needed_tailroom) \ + LL_ALIGN((__hard_hdr_len) + (__needed_headroom) + (__needed_tailroom)) + #define LL_ALLOCATED_SPACE(dev) \ - ((((dev)->hard_header_len+(dev)->needed_headroom+(dev)->needed_tailroom)&~(HH_DATA_MOD - 1)) + HH_DATA_MOD) + __LL_ALLOCATED_SPACE((dev)->hard_header_len, (dev)->needed_headroom, (dev)->needed_tailroom) struct header_ops { int (*create) (struct sk_buff *skb, struct net_device *dev, diff --git a/include/net/sock.h b/include/net/sock.h index 8e4062f..9d96995 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -1103,6 +1103,11 @@ extern struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size, int noblock, int *errcode); +extern struct sk_buff *sock_alloc_send_skb_reserve(struct sock *sk, + struct net_device *dev, + unsigned long base_size, + int noblock, + int *errcode); extern struct sk_buff *sock_alloc_send_pskb(struct sock *sk, unsigned long header_len, unsigned long data_len, diff --git a/net/core/sock.c b/net/core/sock.c index bc745d0..9cefb72 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -1586,6 +1586,26 @@ struct sk_buff *sock_alloc_send_skb(struct sock *sk, unsigned long size, } EXPORT_SYMBOL(sock_alloc_send_skb); +struct sk_buff *sock_alloc_send_skb_reserve(struct sock *sk, struct net_device *dev, + unsigned long base_size, int noblock, + int *errcode) +{ + unsigned int hh_len = dev->hard_header_len; + unsigned int needed_hroom = dev->needed_headroom; + unsigned int needed_troom = dev->needed_tailroom; + unsigned int reserve, lls; + struct sk_buff *skb; + + lls = __LL_ALLOCATED_SPACE(hh_len, needed_hroom, needed_troom); + skb = sock_alloc_send_skb(sk, (base_size + lls + 15), noblock, errcode); + if (skb) { + reserve = __LL_RESERVED_SPACE(hh_len, needed_hroom); + skb_reserve(skb, reserve); + } + return skb; +} +EXPORT_SYMBOL(sock_alloc_send_skb_reserve); + static void __lock_sock(struct sock *sk) __releases(&sk->sk_lock.slock) __acquires(&sk->sk_lock.slock) diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 61714bd..dcecb97 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -335,12 +335,11 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, if (flags&MSG_PROBE) goto out; - skb = sock_alloc_send_skb(sk, - length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15, - flags & MSG_DONTWAIT, &err); + skb = sock_alloc_send_skb_reserve(sk, rt->dst.dev, length, + flags & MSG_DONTWAIT, + &err); if (skb == NULL) goto error; - skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); skb->priority = sk->sk_priority; skb->mark = sk->sk_mark; diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 343852e..5a8e141 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -618,12 +618,11 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, int length, if (flags&MSG_PROBE) goto out; - skb = sock_alloc_send_skb(sk, - length + LL_ALLOCATED_SPACE(rt->dst.dev) + 15, - flags & MSG_DONTWAIT, &err); + skb = sock_alloc_send_skb_reserve(sk, rt->dst.dev, length, + flags & MSG_DONTWAIT, + &err); if (skb == NULL) goto error; - skb_reserve(skb, LL_RESERVED_SPACE(rt->dst.dev)); skb->priority = sk->sk_priority; skb->mark = sk->sk_mark;