Message ID | u0QFePiYSfxBeUsNVFRhPjsGViwg-pXLIApJaVLdUICuvLTQg5y5-rdNhh9lPcDsyO24c7wXxy5m6b6dK0aB6kqR0ypk8X9ekiLe3NQ3ICY=@protonmail.com |
---|---|
State | Changes Requested |
Delegated to: | David Miller |
Headers | show |
Series | fragment: Improved handling of incorrect IP fragments | expand |
On Tue, 31 Dec 2019 01:19:27 +0000 Ttttabcd <ttttabcd@protonmail.com> wrote: > @@ -267,8 +278,6 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb, > payload_len = ((skb->data - skb_network_header(skb)) - > sizeof(struct ipv6hdr) + fq->q.len - > sizeof(struct frag_hdr)); > - if (payload_len > IPV6_MAXPLEN) > - goto out_oversize; You can not safely drop this check. With recursive fragmentation it is possible that the initial payload ends up exceeding the maximum packet length.
> You can not safely drop this check. > With recursive fragmentation it is possible that the initial payload ends > up exceeding the maximum packet length. Can you give an example? What is "recursive fragmentation"? In my previous tests, all fragment packets with a payload length exceeding 65535 will be in the ip6_frag_queue if ((unsigned int) end> IPV6_MAXPLEN) Was discarded.
On Fri, 03 Jan 2020 00:44:30 +0000 Ttttabcd <ttttabcd@protonmail.com> wrote: > > You can not safely drop this check. > > With recursive fragmentation it is possible that the initial payload ends > > up exceeding the maximum packet length. > > Can you give an example? What is "recursive fragmentation"? > > In my previous tests, all fragment packets with a payload length exceeding 65535 will be in the ip6_frag_queue > > if ((unsigned int) end> IPV6_MAXPLEN) > > Was discarded. > > I get wary of any changes to fragmentation code. It has a long history of bugs and is complex. See recent FragSmack for some backstory. You need to split IPv4 and IPv6 parts into two different patches. In the IPv4 part, you dropped the test for oversize IPv4 packet. With raw packet tools it is possible to generate a packet that reassembles into a packet larger than 64K. An example is: $ tshark -r oversize-ipv4.pcap 1 0.000000 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=0, ID=9b39) 2 0.001615 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=1440, ID=9b39) 3 0.004115 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=2920, ID=9b39) 4 0.006502 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=4400, ID=9b39) 5 0.008819 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=5880, ID=9b39) 6 0.011178 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=7360, ID=9b39) 7 0.013465 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=8840, ID=9b39) 8 0.016040 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=10320, ID=9b39) 9 0.018369 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=11800, ID=9b39) 10 0.020679 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=13280, ID=9b39) 11 0.022965 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=14760, ID=9b39) 12 0.025186 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=16240, ID=9b39) 13 0.027277 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=17720, ID=9b39) 14 0.028917 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=19200, ID=9b39) 15 0.030832 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=20680, ID=9b39) 16 0.032232 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=22160, ID=9b39) 17 0.033742 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=23640, ID=9b39) 18 0.035106 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=25120, ID=9b39) 19 0.036736 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=26600, ID=9b39) 20 0.037728 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=28080, ID=9b39) 21 0.038983 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=29560, ID=9b39) 22 0.040007 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=31040, ID=9b39) 23 0.041459 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=32520, ID=9b39) 24 0.042833 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=34000, ID=9b39) 25 0.044030 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=35480, ID=9b39) 26 0.044909 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=36960, ID=9b39) 27 0.045921 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=38440, ID=9b39) 28 0.046767 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=39920, ID=9b39) 29 0.047581 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=41400, ID=9b39) 30 0.048610 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=42880, ID=9b39) 31 0.049323 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=44360, ID=9b39) 32 0.050102 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=45840, ID=9b39) 33 0.051014 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=47320, ID=9b39) 34 0.051787 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=48800, ID=9b39) 35 0.052576 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=50280, ID=9b39) 36 0.053448 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=51760, ID=9b39) 37 0.054260 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=53240, ID=9b39) 38 0.055036 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=54720, ID=9b39) 39 0.055823 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=56200, ID=9b39) 40 0.056614 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=57680, ID=9b39) 41 0.057512 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=59160, ID=9b39) 42 0.058313 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=60640, ID=9b39) 43 0.059073 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=62120, ID=9b39) 44 0.059945 127.0.0.1 → 127.0.0.1 IPv4 1514 Fragmented IP protocol (proto=TCP 6, off=63600, ID=9b39) 45 0.060705 127.0.0.1 → 127.0.0.1 TCP 469 16705 → 16705 [FIN, ECN, NS] Seq=1 Win=16705, bogus TCP header length (16, must be at least 20) With current (correct) Linux kernel code this gets reassembled and dropped. As seen in dmesg log and statistics. With your Ipv4 patch the oversize packet gets passed on up the stack. Testing this stuff is hard, it requires packet hacker tools.
> With current (correct) Linux kernel code this gets reassembled and dropped. > As seen in dmesg log and statistics. > > With your Ipv4 patch the oversize packet gets passed on up the stack. > > Testing this stuff is hard, it requires packet hacker tools. I know what you mean. What you are talking about is a "ping of death" attack. The use of fragments to construct packets longer than 65535 made the system buffer overflow and crash. This situation has been considered in my code. In the original logic of IPv6, the judgment of data packets exceeding 65535 is duplicated, and the judgment in IPv4 is too late. I have improved this situation, you can see my explanation of the patch at the beginning. > In both ip6_frag_queue and ip6_frag_reasm, it is checked whether it is an > Oversized IPv6 packet, which is duplicated. The original code logic will > only be processed in ip6_frag_queue. The code of ip6_frag_reasm will not > be executed. Now change it to only process in ip6_frag_queue and output > the prompt information. > I also made similar changes in IPv4 fragmentation processing. > > It is not good to use 65535 values directly, > I added the IPV4_MAX_TOT_LEN macro. > > The oversized check in IPv4 fragment processing is in the ip_frag_reasm > of the reassembly fragment. This is too late. The incorrect IP fragment > has been inserted into the fragment queue. I modified it in ip_frag_queue. > I changed the original net_info_ratelimited to net_dbg_ratelimited to make > the debugging information more controllable. @@ -300,6 +300,12 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) end = offset + skb->len - skb_network_offset(skb) - ihl; err = -EINVAL; + if ((unsigned int)end + ihl > IPV4_MAX_TOT_LEN) { + net_dbg_ratelimited("ip_frag_queue: Oversized IP packet from %pI4, end = %d\n", + &qp->q.key.v4.saddr, end); + goto discard_qp; + } + @@ -121,11 +121,10 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, ((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1))); if ((unsigned int)end > IPV6_MAXPLEN) { - *prob_offset = (u8 *)&fhdr->frag_off - skb_network_header(skb); - /* note that if prob_offset is set, the skb is freed elsewhere, - * we do not free it here. - */ - return -1; + prob_offset = (u8 *)&fhdr->frag_off - skb_network_header(skb); + net_dbg_ratelimited("ip6_frag_queue: Oversized IPv6 packet from %pI6c, end = %d\n", + &fq->q.key.v6.saddr, end); + goto send_param_prob; } As long as the IP fragment length exceeds 65535, I will discard the entire fragment queue.
> You need to split IPv4 and IPv6 parts into two different patches.
Forgot to ask, is it necessary to divide this patch into two?
diff --git a/include/net/ip.h b/include/net/ip.h index 5b317c9f4470..43e9dc51852b 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -34,6 +34,8 @@ #define IPV4_MAX_PMTU 65535U /* RFC 2675, Section 5.1 */ #define IPV4_MIN_MTU 68 /* RFC 791 */ +#define IPV4_MAX_TOT_LEN 65535 + extern unsigned int sysctl_fib_sync_mem; extern unsigned int sysctl_fib_sync_mem_min; extern unsigned int sysctl_fib_sync_mem_max; diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c index cfeb8890f94e..baee3383d0ac 100644 --- a/net/ipv4/ip_fragment.c +++ b/net/ipv4/ip_fragment.c @@ -300,6 +300,12 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) end = offset + skb->len - skb_network_offset(skb) - ihl; err = -EINVAL; + if ((unsigned int)end + ihl > IPV4_MAX_TOT_LEN) { + net_dbg_ratelimited("ip_frag_queue: Oversized IP packet from %pI4, end = %d\n", + &qp->q.key.v4.saddr, end); + goto discard_qp; + } + /* Is this the final fragment? */ if ((flags & IP_MF) == 0) { /* If we already have some bits beyond end @@ -311,11 +317,10 @@ static int ip_frag_queue(struct ipq *qp, struct sk_buff *skb) qp->q.flags |= INET_FRAG_LAST_IN; qp->q.len = end; } else { - if (end&7) { - end &= ~7; - if (skb->ip_summed != CHECKSUM_UNNECESSARY) - skb->ip_summed = CHECKSUM_NONE; - } + /* Check if the fragment is rounded to 8 bytes. */ + if (end & 0x7) + goto discard_qp; + if (end > qp->q.len) { /* Some bits beyond end -> corruption. */ if (qp->q.flags & INET_FRAG_LAST_IN) @@ -423,8 +428,6 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb, len = ip_hdrlen(skb) + qp->q.len; err = -E2BIG; - if (len > 65535) - goto out_oversize; inet_frag_reasm_finish(&qp->q, skb, reasm_data, ip_frag_coalesce_ok(qp)); @@ -462,9 +465,6 @@ static int ip_frag_reasm(struct ipq *qp, struct sk_buff *skb, out_nomem: net_dbg_ratelimited("queue_glue: no memory for gluing queue %p\n", qp); err = -ENOMEM; - goto out_fail; -out_oversize: - net_info_ratelimited("Oversized IP packet from %pI4\n", &qp->q.key.v4.saddr); out_fail: __IP_INC_STATS(net, IPSTATS_MIB_REASMFAILS); return err; diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c index 1f5d4d196dcc..302bf6c26c45 100644 --- a/net/ipv6/reassembly.c +++ b/net/ipv6/reassembly.c @@ -102,13 +102,13 @@ fq_find(struct net *net, __be32 id, const struct ipv6hdr *hdr, int iif) } static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, - struct frag_hdr *fhdr, int nhoff, - u32 *prob_offset) + struct frag_hdr *fhdr, int nhoff) { struct net *net = dev_net(skb_dst(skb)->dev); int offset, end, fragsize; struct sk_buff *prev_tail; struct net_device *dev; + u32 prob_offset = 0; int err = -ENOENT; u8 ecn; @@ -121,11 +121,10 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, ((u8 *)(fhdr + 1) - (u8 *)(ipv6_hdr(skb) + 1))); if ((unsigned int)end > IPV6_MAXPLEN) { - *prob_offset = (u8 *)&fhdr->frag_off - skb_network_header(skb); - /* note that if prob_offset is set, the skb is freed elsewhere, - * we do not free it here. - */ - return -1; + prob_offset = (u8 *)&fhdr->frag_off - skb_network_header(skb); + net_dbg_ratelimited("ip6_frag_queue: Oversized IPv6 packet from %pI6c, end = %d\n", + &fq->q.key.v6.saddr, end); + goto send_param_prob; } ecn = ip6_frag_ecn(ipv6_hdr(skb)); @@ -155,8 +154,8 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, /* RFC2460 says always send parameter problem in * this case. -DaveM */ - *prob_offset = offsetof(struct ipv6hdr, payload_len); - return -1; + prob_offset = offsetof(struct ipv6hdr, payload_len); + goto send_param_prob; } if (end > fq->q.len) { /* Some bits beyond end -> corruption. */ @@ -236,6 +235,18 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb, err: kfree_skb(skb); return err; +send_param_prob: + __IP6_INC_STATS(net, __in6_dev_get_safely(skb->dev), + IPSTATS_MIB_INHDRERRORS); + /* icmpv6_param_prob() calls kfree_skb(skb), + * and the label err also calls kfree_skb(skb), + * so skb_get(skb) here increases the reference count + * to avoid duplicate release + */ + skb_get(skb); + icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, prob_offset); + err = -1; + goto discard_fq; } /* @@ -267,8 +278,6 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb, payload_len = ((skb->data - skb_network_header(skb)) - sizeof(struct ipv6hdr) + fq->q.len - sizeof(struct frag_hdr)); - if (payload_len > IPV6_MAXPLEN) - goto out_oversize; /* We have to remove fragment header from datagram and to relocate * header in order to calculate ICV correctly. */ @@ -303,9 +312,6 @@ static int ip6_frag_reasm(struct frag_queue *fq, struct sk_buff *skb, fq->q.last_run_head = NULL; return 1; -out_oversize: - net_dbg_ratelimited("ip6_frag_reasm: payload len = %d\n", payload_len); - goto out_fail; out_oom: net_dbg_ratelimited("ip6_frag_reasm: no memory for reassembly\n"); out_fail: @@ -354,23 +360,14 @@ static int ipv6_frag_rcv(struct sk_buff *skb) iif = skb->dev ? skb->dev->ifindex : 0; fq = fq_find(net, fhdr->identification, hdr, iif); if (fq) { - u32 prob_offset = 0; int ret; - spin_lock(&fq->q.lock); fq->iif = iif; - ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff, - &prob_offset); + ret = ip6_frag_queue(fq, skb, fhdr, IP6CB(skb)->nhoff); spin_unlock(&fq->q.lock); inet_frag_put(&fq->q); - if (prob_offset) { - __IP6_INC_STATS(net, __in6_dev_get_safely(skb->dev), - IPSTATS_MIB_INHDRERRORS); - /* icmpv6_param_prob() calls kfree_skb(skb) */ - icmpv6_param_prob(skb, ICMPV6_HDR_FIELD, prob_offset); - } return ret; }
In both ip6_frag_queue and ip6_frag_reasm, it is checked whether it is an Oversized IPv6 packet, which is duplicated. The original code logic will only be processed in ip6_frag_queue. The code of ip6_frag_reasm will not be executed. Now change it to only process in ip6_frag_queue and output the prompt information. In ip6_frag_queue, set the prob_offset pointer to notify ipv6_frag_rcv to send the ICMPv6 parameter problem message. The logic is not concise, I merged the part that sent ICMPv6 messages into ip6_frag_queue. In the original logic, receiving oversized IPv6 fragments and receiving IPv6 fragments whose end is not an integral multiple of 8 bytes both returns -1. This is inconsistent with other incorrect IPv6 fragmentation processing. For example, in other logic, end == offset will goto discard_fq directly. I think that receiving any incorrect IPv6 fragment means that the fragment processing has failed as a whole, and the data carried in the fragments is likely to be wrong. I think we should also execute goto discard_fq to release this fragments queue. Goto discard_fq at the end of the label send_param_prob, release the fragment queue. Since icmpv6_param_prob will call kfree_skb, it will also be kfree_skb in the label err, which will cause repeated free, so I added skb_get. I also made similar changes in IPv4 fragmentation processing. It is not good to use 65535 values directly, I added the IPV4_MAX_TOT_LEN macro. The oversized check in IPv4 fragment processing is in the ip_frag_reasm of the reassembly fragment. This is too late. The incorrect IP fragment has been inserted into the fragment queue. I modified it in ip_frag_queue. I changed the original net_info_ratelimited to net_dbg_ratelimited to make the debugging information more controllable. I also modified goto discard_qp directly when the end is not an integer multiple of 8 bytes. Signed-off-by: AK Deng <ttttabcd@protonmail.com> --- include/net/ip.h | 2 ++ net/ipv4/ip_fragment.c | 20 +++++++++---------- net/ipv6/reassembly.c | 45 ++++++++++++++++++++---------------------- 3 files changed, 33 insertions(+), 34 deletions(-) -- 2.24.0