Message ID: 20130424154800.16883.4797.stgit@dragon
State: Changes Requested, archived
Delegated to: David Miller
On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> This reverts commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055.
>
> The problem with commit 5a3da1fe (inet: limit length of fragment queue
> hash table bucket lists) is that, once we hit the hash depth limit (of
> 128), we *keep* the existing frag queues, not allowing new frag queues
> to be created. Thus, an attacker can effectively block handling of
> fragments for 30 sec (as each frag queue has a timeout of 30 sec).

I do not think it is good to revert this patch. It was a step in the
right direction.

Limiting chain length to 128 is good.

An attacker can attack a defrag unit no matter what strategy you use;
that is why fragments are doomed. The only way to resist an attack is
to have enough memory in the defrag unit to accumulate 30 seconds worth
of traffic. That is 30 GB of memory if you receive 1 GB per second, not
counting the overhead.

If the attacker knows the IP address of the user of your defrag unit,
you are doomed, because IP ids are 16 bits.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 2013-04-24 at 17:00 -0700, Eric Dumazet wrote:
> On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> > This reverts commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055.
> >
> > The problem with commit 5a3da1fe (inet: limit length of fragment
> > queue hash table bucket lists) is that, once we hit the hash depth
> > limit (of 128), we *keep* the existing frag queues, not allowing
> > new frag queues to be created. Thus, an attacker can effectively
> > block handling of fragments for 30 sec (as each frag queue has a
> > timeout of 30 sec).
>
> I do not think it is good to revert this patch. It was a step in the
> right direction.

But in its current state I consider this patch dangerous.

> Limiting chain length to 128 is good.

Yes, it is good to have a limit on the hash depth, but with the current
mem limit per netns this creates an unfortunate side-effect. There is a
disconnect between the netns mem limits and the global hash table.

Even with a hash size of 1024, this just postpones the problem. We now
need 35 netns instances to reach the point where we block all fragments
for 30 sec. Given a min frag size of 1108 bytes:

 1024*128*1108 = 145227776
 145227776/(4*1024*1024) = 34.625

The reason this is inevitable is that the attacker's invalid fragments
will never finish (timeout 30 sec), while valid fragments will complete
and "exit" the queue; thus the end result is that the hash buckets are
filled with the attacker's invalid/incomplete fragments. IMHO this is a
very dangerous "feature" to support.

--Jesper
> The reason this is inevitable is that the attacker's invalid fragments
> will never finish (timeout 30 sec), while valid fragments will
> complete and "exit" the queue; thus the end result is that the hash
> buckets are filled with the attacker's invalid/incomplete fragments.
> IMHO this is a very dangerous "feature" to support.

If you are being attacked, the hash buckets will immediately be filled
with invalid packets; no valid packets will enter.

If you delete old entries to add new ones, then any valid packet will
soon be flushed out by invalid ones. The only time you'll manage to
assemble anything is if the valid fragments arrive 'back to back'.

The only hope is to part-decode initial fragments and decide whether
they are valid.

	David
On Wed, 24 Apr 2013 17:00:30 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> > This reverts commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055.
> >
> > The problem with commit 5a3da1fe (inet: limit length of fragment
> > queue hash table bucket lists) is that, once we hit the hash depth
> > limit (of 128), we *keep* the existing frag queues, not allowing
> > new frag queues to be created. Thus, an attacker can effectively
> > block handling of fragments for 30 sec (as each frag queue has a
> > timeout of 30 sec).
>
> I do not think it is good to revert this patch. It was a step in the
> right direction.

We need a revert because we are too close to the merge window, and
cannot complete the needed "steps" to make this patch safe, sorry.

--Jesper
On Thu, 2013-05-02 at 09:59 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 24 Apr 2013 17:00:30 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> > > This reverts commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055.
> > >
> > > The problem with commit 5a3da1fe (inet: limit length of fragment
> > > queue hash table bucket lists) is that, once we hit the hash depth
> > > limit (of 128), we *keep* the existing frag queues, not allowing
> > > new frag queues to be created. Thus, an attacker can effectively
> > > block handling of fragments for 30 sec (as each frag queue has a
> > > timeout of 30 sec).
> >
> > I do not think it is good to revert this patch. It was a step in the
> > right direction.
>
> We need a revert because we are too close to the merge window, and
> cannot complete the needed "steps" to make this patch safe, sorry.

Again, a limit of 128 is totally OK. It is in fact too big: 128 cache
misses consume 5 us.

Allowing a chain to be unlimited is a more severe bug. Reverting will
allow an attacker to consume all your cpu cycles.

We changed INETFRAGS_HASHSZ to 1024, so 128*1024 max frags is already a
very big limit.

No matter what we do, we need to limit both:
- Memory consumption
- Cpu consumption

For people willing to allow more memory to be used, the only way is to
resize the hash table, or to use a bigger INETFRAGS_HASHSZ.

I do not think there is a hurry; the current defrag code is already
better than what we had years ago.
On Thu, 02 May 2013 08:16:41 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2013-05-02 at 09:59 +0200, Jesper Dangaard Brouer wrote:
> > On Wed, 24 Apr 2013 17:00:30 -0700
> > Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > > On Wed, 2013-04-24 at 17:48 +0200, Jesper Dangaard Brouer wrote:
> > > > This reverts commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055.
> > > >
> > > > The problem with commit 5a3da1fe (inet: limit length of fragment
> > > > queue hash table bucket lists) is that, once we hit the hash
> > > > depth limit (of 128), we *keep* the existing frag queues, not
> > > > allowing new frag queues to be created. Thus, an attacker can
> > > > effectively block handling of fragments for 30 sec (as each frag
> > > > queue has a timeout of 30 sec).
> > >
> > > I do not think it is good to revert this patch. It was a step in
> > > the right direction.
> >
> > We need a revert because we are too close to the merge window, and
> > cannot complete the needed "steps" to make this patch safe, sorry.
>
> [...]
>
> For people willing to allow more memory to be used, the only way is to
> resize the hash table, or to use a bigger INETFRAGS_HASHSZ.
>
> I do not think there is a hurry; the current defrag code is already
> better than what we had years ago.

Eric, I think we agree that:
 1) we need resizing of the hash table based on the mem limit
 2) the mem limit per netns "blocks" the hash resize patch

Without these two patches/changes, the static 128 depth limit
introduces an undocumented limit on the max mem limit setting
(/proc/sys/net/ipv4/ipfrag_high_thresh).

I think we only disagree on the order of the patches. But let's keep
this, because after we have increased the hash size (INETFRAGS_HASHSZ)
to 1024, we have pushed the "undocumented limit" so far that it is very
unlikely to be hit. We would have to start >36 netns instances, all
being overloaded with small incomplete fragments at the same time
(30 sec time window).

--Jesper
diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index 6f41b45..eb1d6ee 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -43,13 +43,6 @@ struct inet_frag_queue {

 #define INETFRAGS_HASHSZ	64

-/* averaged:
- * max_depth = default ipfrag_high_thresh / INETFRAGS_HASHSZ /
- *	       rounded up (SKB_TRUELEN(0) + sizeof(struct ipq or
- *	       struct frag_queue))
- */
-#define INETFRAGS_MAXDEPTH	128
-
 struct inet_frag_bucket {
 	struct hlist_head	chain;
 	spinlock_t		chain_lock;
@@ -89,8 +82,6 @@ int inet_frag_evictor(struct netns_frags *nf, struct inet_frags *f, bool force);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 		struct inet_frags *f, void *key, unsigned int hash)
 	__releases(&f->lock);
-void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
-				   const char *prefix);

 static inline void inet_frag_put(struct inet_frag_queue *q, struct inet_frags *f)
 {
diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c
index e97d66a..cabe3d7 100644
--- a/net/ipv4/inet_fragment.c
+++ b/net/ipv4/inet_fragment.c
@@ -21,7 +21,6 @@
 #include <linux/rtnetlink.h>
 #include <linux/slab.h>

-#include <net/sock.h>
 #include <net/inet_frag.h>
 #include <net/inet_ecn.h>

@@ -327,7 +326,6 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 {
 	struct inet_frag_bucket *hb;
 	struct inet_frag_queue *q;
-	int depth = 0;

 	hb = &f->hash[hash];

@@ -339,26 +337,10 @@ struct inet_frag_queue *inet_frag_find(struct netns_frags *nf,
 			read_unlock(&f->lock);
 			return q;
 		}
-		depth++;
 	}
 	spin_unlock(&hb->chain_lock);
 	read_unlock(&f->lock);

-	if (depth <= INETFRAGS_MAXDEPTH)
-		return inet_frag_create(nf, f, key);
-	else
-		return ERR_PTR(-ENOBUFS);
+	return inet_frag_create(nf, f, key);
 }
 EXPORT_SYMBOL(inet_frag_find);
-
-void inet_frag_maybe_warn_overflow(struct inet_frag_queue *q,
-				   const char *prefix)
-{
-	static const char msg[] = "inet_frag_find: Fragment hash bucket"
-		" list length grew over limit " __stringify(INETFRAGS_MAXDEPTH)
-		". Dropping fragment.\n";
-
-	if (PTR_ERR(q) == -ENOBUFS)
-		LIMIT_NETDEBUG(KERN_WARNING "%s%s", prefix, msg);
-}
-EXPORT_SYMBOL(inet_frag_maybe_warn_overflow);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 9385206..cda5514 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -263,11 +263,14 @@ static inline struct ipq *ip_find(struct net *net, struct iphdr *iph, u32 user)

 	hash = ipqhashfn(iph->id, iph->saddr, iph->daddr, iph->protocol);
 	q = inet_frag_find(&net->ipv4.frags, &ip4_frags, &arg, hash);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
-		return NULL;
-	}
+	if (q == NULL)
+		goto out_nomem;
+
 	return container_of(q, struct ipq, q);
+
+out_nomem:
+	LIMIT_NETDEBUG(KERN_ERR pr_fmt("ip_frag_create: no memory left !\n"));
+	return NULL;
 }

 /* Is the fragment too far ahead to be part of ipq? */
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c
index dffdc1a..7cfa829 100644
--- a/net/ipv6/netfilter/nf_conntrack_reasm.c
+++ b/net/ipv6/netfilter/nf_conntrack_reasm.c
@@ -14,8 +14,6 @@
  *	2 of the License, or (at your option) any later version.
  */

-#define pr_fmt(fmt) "IPv6-nf: " fmt
-
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/string.h>
@@ -189,11 +187,13 @@ static inline struct frag_queue *fq_find(struct net *net, __be32 id,
 	q = inet_frag_find(&net->nf_frag.frags, &nf_frags, &arg, hash);
 	local_bh_enable();
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
-		return NULL;
-	}
+	if (q == NULL)
+		goto oom;
+
 	return container_of(q, struct frag_queue, q);
+
+oom:
+	return NULL;
 }

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index e6e44ce..74505c5 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -26,9 +26,6 @@
  *	YOSHIFUJI,H. @USAGI	Always remove fragment header to
  *				calculate ICV correctly.
  */
-
-#define pr_fmt(fmt) "IPv6: " fmt
-
 #include <linux/errno.h>
 #include <linux/types.h>
 #include <linux/string.h>
@@ -196,10 +193,9 @@ fq_find(struct net *net, __be32 id, const struct in6_addr *src,
 	hash = inet6_hash_frag(id, src, dst, ip6_frags.rnd);

 	q = inet_frag_find(&net->ipv6.frags, &ip6_frags, &arg, hash);
-	if (IS_ERR_OR_NULL(q)) {
-		inet_frag_maybe_warn_overflow(q, pr_fmt());
+	if (q == NULL)
 		return NULL;
-	}
+
 	return container_of(q, struct frag_queue, q);
 }
This reverts commit 5a3da1fe9561828d0ca7eca664b16ec2b9bf0055.

The problem with commit 5a3da1fe (inet: limit length of fragment queue
hash table bucket lists) is that, once we hit the hash depth limit (of
128), we *keep* the existing frag queues, not allowing new frag queues
to be created. Thus, an attacker can effectively block handling of
fragments for 30 sec (as each frag queue has a timeout of 30 sec).

For this situation to occur, the mem limit needs to increase (from the
default 4 MB per netns). This can happen either 1) by creating more
netns (network namespaces) or 2) by manually increasing the mem limits
via the proc files:

 /proc/sys/net/ipv4/ipfrag_high_thresh
 /proc/sys/net/ipv4/ipfrag_low_thresh

To be exact, the situation occurs when increasing the thresh to
something allowing 128 elements in each bucket, which is not that high
given the hash array size of 64 (64*128=8192), e.g.
 big MTU frags (2944(truesize)+208(ipq))*8192(max elems)=25755648
 small frags   ( 896(truesize)+208(ipq))*8192(max elems)=9043968

Thus, with small frags we only need to start >=3 netns instances for
the situation to be possible.

The reason this is inevitable is that the attacker's invalid fragments
will never finish (timeout 30 sec), while valid fragments will complete
and "exit" the queue; thus the end result is that the hash buckets are
filled with the attacker's invalid/incomplete fragments.

Fixed conflicts in:
	include/net/inet_frag.h

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/inet_frag.h                 |    9 ---------
 net/ipv4/inet_fragment.c                |   20 +-------------------
 net/ipv4/ip_fragment.c                  |   11 +++++++----
 net/ipv6/netfilter/nf_conntrack_reasm.c |   12 ++++++------
 net/ipv6/reassembly.c                   |    8 ++------
 5 files changed, 16 insertions(+), 44 deletions(-)