From patchwork Fri Aug 10 16:22:45 2018
X-Patchwork-Submitter: Peter Oskolkov
X-Patchwork-Id: 956332
X-Patchwork-Delegate: davem@davemloft.net
Date: Fri, 10 Aug 2018 16:22:45 +0000
Message-Id: <20180810162246.46805-1-posk@google.com>
Subject: [PATCH net-next 1/2] ip: add helpers to process in-order fragments faster.
From: Peter Oskolkov
To: David Miller, netdev@vger.kernel.org
Cc: Peter Oskolkov, Eric Dumazet, Florian Westphal
X-Mailing-List: netdev@vger.kernel.org

This patch introduces several helper functions/macros that will be used
in the follow-up patch. No runtime changes yet.
The new logic (fully implemented in the second patch) is as follows:

* Nodes in the rb-tree will now contain not single fragments, but lists
  of consecutive fragments ("runs").

* At each point in time, the current "active" run at the tail is
  maintained/tracked. Fragments that arrive in-order, adjacent to the
  previous tail fragment, are added to this tail run without triggering
  the re-balancing of the rb-tree.

* If a fragment arrives out of order with the offset _before_ the tail
  run, it is inserted into the rb-tree as a single fragment.

* If a fragment arrives after the current tail fragment (with a gap),
  it starts a new "tail" run, and is inserted into the rb-tree at the
  end as the head of the new run. skb->cb is used to store additional
  information needed here (suggested by Eric Dumazet).

Reported-by: Willem de Bruijn
Cc: Eric Dumazet
Cc: Florian Westphal
---
 include/net/inet_frag.h |  6 ++++
 net/ipv4/ip_fragment.c  | 73 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/include/net/inet_frag.h b/include/net/inet_frag.h
index b86d14528188..1662cbc0b46b 100644
--- a/include/net/inet_frag.h
+++ b/include/net/inet_frag.h
@@ -57,7 +57,9 @@ struct frag_v6_compare_key {
  * @lock: spinlock protecting this frag
  * @refcnt: reference count of the queue
  * @fragments: received fragments head
+ * @rb_fragments: received fragments rb-tree root
  * @fragments_tail: received fragments tail
+ * @last_run_head: the head of the last "run". see ip_fragment.c
  * @stamp: timestamp of the last received fragment
  * @len: total length of the original datagram
  * @meat: length of received fragments so far
@@ -78,6 +80,7 @@ struct inet_frag_queue {
 	struct sk_buff		*fragments;  /* Used in IPv6. */
 	struct rb_root		rb_fragments; /* Used in IPv4. */
 	struct sk_buff		*fragments_tail;
+	struct sk_buff		*last_run_head;
 	ktime_t			stamp;
 	int			len;
 	int			meat;
@@ -113,6 +116,9 @@ void inet_frag_kill(struct inet_frag_queue *q);
 void inet_frag_destroy(struct inet_frag_queue *q);
 struct inet_frag_queue *inet_frag_find(struct netns_frags *nf, void *key);
 
+/* Free all skbs in the queue; return the sum of their truesizes. */
+unsigned int inet_frag_rbtree_purge(struct rb_root *root);
+
 static inline void inet_frag_put(struct inet_frag_queue *q)
 {
 	if (refcount_dec_and_test(&q->refcnt))
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 7cb7ed761d8c..26ace9d2d976 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -57,6 +57,57 @@
  */
 static const char ip_frag_cache_name[] = "ip4-frags";
 
+/* Use skb->cb to track consecutive/adjacent fragments coming at
+ * the end of the queue. Nodes in the rb-tree queue will
+ * contain "runs" of one or more adjacent fragments.
+ *
+ * Invariants:
+ * - next_frag is NULL at the tail of a "run";
+ * - the head of a "run" has the sum of all fragment lengths in
+ *   frag_run_len.
+ */
+struct ipfrag_skb_cb {
+	struct inet_skb_parm	h;
+	struct sk_buff		*next_frag;
+	int			frag_run_len;
+};
+
+#define FRAG_CB(skb)		((struct ipfrag_skb_cb *)((skb)->cb))
+
+static void ip4_frag_init_run(struct sk_buff *skb)
+{
+	BUILD_BUG_ON(sizeof(struct ipfrag_skb_cb) > sizeof(skb->cb));
+
+	FRAG_CB(skb)->next_frag = NULL;
+	FRAG_CB(skb)->frag_run_len = skb->len;
+}
+
+/* Append skb to the last "run". */
+static void ip4_frag_append_to_last_run(struct inet_frag_queue *q,
+					struct sk_buff *skb)
+{
+	RB_CLEAR_NODE(&skb->rbnode);
+	FRAG_CB(skb)->next_frag = NULL;
+
+	FRAG_CB(q->last_run_head)->frag_run_len += skb->len;
+	FRAG_CB(q->fragments_tail)->next_frag = skb;
+	q->fragments_tail = skb;
+}
+
+/* Create a new "run" with the skb. */
+static void ip4_frag_create_run(struct inet_frag_queue *q, struct sk_buff *skb)
+{
+	if (q->last_run_head)
+		rb_link_node(&skb->rbnode, &q->last_run_head->rbnode,
+			     &q->last_run_head->rbnode.rb_right);
+	else
+		rb_link_node(&skb->rbnode, NULL, &q->rb_fragments.rb_node);
+	rb_insert_color(&skb->rbnode, &q->rb_fragments);
+
+	ip4_frag_init_run(skb);
+	q->fragments_tail = skb;
+	q->last_run_head = skb;
+}
+
 /* Describe an entry in the "incomplete datagrams" queue. */
 struct ipq {
 	struct inet_frag_queue q;
@@ -654,6 +705,28 @@ struct sk_buff *ip_check_defrag(struct net *net, struct sk_buff *skb, u32 user)
 }
 EXPORT_SYMBOL(ip_check_defrag);
 
+unsigned int inet_frag_rbtree_purge(struct rb_root *root)
+{
+	struct rb_node *p = rb_first(root);
+	unsigned int sum = 0;
+
+	while (p) {
+		struct sk_buff *skb = rb_entry(p, struct sk_buff, rbnode);
+
+		p = rb_next(p);
+		rb_erase(&skb->rbnode, root);
+		while (skb) {
+			struct sk_buff *next = FRAG_CB(skb)->next_frag;
+
+			sum += skb->truesize;
+			kfree_skb(skb);
+			skb = next;
+		}
+	}
+	return sum;
+}
+EXPORT_SYMBOL(inet_frag_rbtree_purge);
+
 #ifdef CONFIG_SYSCTL
 static int dist_min;
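
For illustration only (not part of the patch): below is a minimal, stand-alone
user-space sketch of the "run" bookkeeping described in the commit message and
implemented by ip4_frag_init_run()/ip4_frag_append_to_last_run()/
ip4_frag_create_run() above. The names struct frag, create_run(),
append_to_last_run() and queue_in_order() are illustrative, not the kernel's;
the rb-tree that holds the run heads and the out-of-order insertion path are
omitted.

/* Toy model of fragment "runs": a run is a singly linked list of adjacent
 * fragments. Only the head carries the total length (frag_run_len), and
 * next_frag is NULL at the tail of a run.
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct frag {
	struct frag *next_frag;	/* NULL at the tail of a run */
	int offset;		/* offset of this fragment in the datagram */
	int len;		/* payload length of this fragment */
	int frag_run_len;	/* sum of lengths; valid only at the run head */
};

static struct frag *last_run_head;	/* head of the current tail run */
static struct frag *fragments_tail;	/* last fragment queued in order */

/* Start a new run with @f (in the kernel this also links an rb-tree node). */
static void create_run(struct frag *f)
{
	f->next_frag = NULL;
	f->frag_run_len = f->len;
	last_run_head = f;
	fragments_tail = f;
}

/* Extend the current tail run; no rb-tree work is needed on this path. */
static void append_to_last_run(struct frag *f)
{
	f->next_frag = NULL;
	last_run_head->frag_run_len += f->len;	/* only the head tracks the sum */
	fragments_tail->next_frag = f;
	fragments_tail = f;
}

/* In-order fast path from the commit message: a fragment adjacent to the
 * previous tail extends the tail run; a gap starts a new run.
 */
static void queue_in_order(struct frag *f)
{
	if (fragments_tail &&
	    f->offset == fragments_tail->offset + fragments_tail->len)
		append_to_last_run(f);
	else
		create_run(f);
}

int main(void)
{
	struct frag a = { .offset = 0,    .len = 1200 };
	struct frag b = { .offset = 1200, .len = 1200 };
	struct frag c = { .offset = 2400, .len = 400  };
	struct frag d = { .offset = 4000, .len = 400  };	/* gap: new run */

	queue_in_order(&a);
	queue_in_order(&b);
	queue_in_order(&c);
	assert(a.frag_run_len == 2800);		/* head holds the run total */

	queue_in_order(&d);
	assert(last_run_head == &d && d.frag_run_len == 400);

	printf("first run: %d bytes, tail run: %d bytes\n",
	       a.frag_run_len, last_run_head->frag_run_len);
	return 0;
}

The point of this fast path is that in-order fragments only touch the tail
pointer and the run head's counter; the rb-tree is re-balanced only when a
new run has to be created.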