From patchwork Thu Sep 21 22:43:29 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Craig Gallek X-Patchwork-Id: 817197 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3xys8m09qnz9s06 for ; Fri, 22 Sep 2017 08:43:35 +1000 (AEST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751860AbdIUWnd (ORCPT ); Thu, 21 Sep 2017 18:43:33 -0400 Received: from mail-qk0-f173.google.com ([209.85.220.173]:53280 "EHLO mail-qk0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751728AbdIUWnc (ORCPT ); Thu, 21 Sep 2017 18:43:32 -0400 Received: by mail-qk0-f173.google.com with SMTP id t184so7225598qke.10 for ; Thu, 21 Sep 2017 15:43:32 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=C4IZbu+Je4iQRj70CSOZE7rXqqQxCJ2oVFmcVrXL1l4=; b=WejN2HIVbRxfsUGWNFA3fJ4BanoDIZARID77FtGGhUkrp9eKLzfjcEuPAwCIbH+bny dTYJXMB3CsTMKXj0oW06sYzO79GlrVVBORJ8qujfCZknerIijdCnzjqySeHMjFQtdMHb qgIq5J3/pLRzbUp2PZQ6OnkDtceFlhuyFjyvyQYP4Ni/ddUkiO9drf3BbIkdYyBcoUb7 GDTBu82O2kUArNIGi1gXcqiwdVVK4Qs2raQv9Zp7/mbRYyoXd8Rs9QE2U2FG2IviQVLu tCFy720tmbjoJOfNUJDXatxXa9+XeZST42AYKoostXosFqyDLAJbfuTmm6/pMncCTNot yJYg== X-Gm-Message-State: AHPjjUgCwRXqghIC1nzGMPwjbZ8xCG6fVUgvbWomczQ3Cvr2MgHCPuix xpEvSC1KOTKDAliBQAbJ3vup5A== X-Google-Smtp-Source: AOwi7QDYlMkWMoWfeP9j0EpjFdIitmdY0uLFt1gfv4WESkrFEfKrCUpQtQy6mnMuelfZeC05M+FAKg== X-Received: by 10.55.114.2 with SMTP id n2mr5264455qkc.129.1506033811220; Thu, 21 Sep 2017 15:43:31 -0700 (PDT) Received: from monkey.nyc.corp.google.com ([100.101.213.10]) by smtp.gmail.com with ESMTPSA id g132sm1725161qke.11.2017.09.21.15.43.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-SHA bits=128/128); Thu, 21 Sep 2017 15:43:30 -0700 (PDT) From: Craig Gallek To: Daniel Mack , Alexei Starovoitov , Daniel Borkmann , "David S . Miller" Cc: netdev@vger.kernel.org Subject: [PATCH net-next v2] bpf: Optimize lpm trie delete Date: Thu, 21 Sep 2017 18:43:29 -0400 Message-Id: <20170921224329.101928-1-kraigatgoog@gmail.com> X-Mailer: git-send-email 2.14.1.821.g8fa685d3b7-goog Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Craig Gallek Before the delete operator was added, this datastructure maintained an invariant that intermediate nodes were only present when necessary to build the tree. This patch updates the delete operation to reinstate that invariant by removing unnecessary intermediate nodes after a node is removed and thus keeping the tree structure at a minimal size. Suggested-by: Daniel Mack Signed-off-by: Craig Gallek --- v2: I really failed the interview on this one. v1 did not collapse all extra intermediate node possibilities. I believe this one does by additionally tracking node parent information. See comments. kernel/bpf/lpm_trie.c | 71 +++++++++++++++++++++++++++++++-------------------- 1 file changed, 43 insertions(+), 28 deletions(-) diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c index 9d58a576b2ae..34d8a690ea05 100644 --- a/kernel/bpf/lpm_trie.c +++ b/kernel/bpf/lpm_trie.c @@ -394,8 +394,8 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) { struct lpm_trie *trie = container_of(map, struct lpm_trie, map); struct bpf_lpm_trie_key *key = _key; - struct lpm_trie_node __rcu **trim; - struct lpm_trie_node *node; + struct lpm_trie_node __rcu **trim, **trim2; + struct lpm_trie_node *node, *parent; unsigned long irq_flags; unsigned int next_bit; size_t matchlen = 0; @@ -407,31 +407,26 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) raw_spin_lock_irqsave(&trie->lock, irq_flags); /* Walk the tree looking for an exact key/length match and keeping - * track of where we could begin trimming the tree. The trim-point - * is the sub-tree along the walk consisting of only single-child - * intermediate nodes and ending at a leaf node that we want to - * remove. + * track of the path we traverse. We will need to know the node + * we wish to delete, and the slot that points to the node we want + * to delete. We may also need to know the nodes parent and the + * slot that contains it. */ trim = &trie->root; - node = rcu_dereference_protected( - trie->root, lockdep_is_held(&trie->lock)); - while (node) { + trim2 = trim; + parent = NULL; + while ((node = rcu_dereference_protected( + *trim, lockdep_is_held(&trie->lock)))) { matchlen = longest_prefix_match(trie, node, key); if (node->prefixlen != matchlen || node->prefixlen == key->prefixlen) break; + parent = node; + trim2 = trim; next_bit = extract_bit(key->data, node->prefixlen); - /* If we hit a node that has more than one child or is a valid - * prefix itself, do not remove it. Reset the root of the trim - * path to its descendant on our path. - */ - if (!(node->flags & LPM_TREE_NODE_FLAG_IM) || - (node->child[0] && node->child[1])) - trim = &node->child[next_bit]; - node = rcu_dereference_protected( - node->child[next_bit], lockdep_is_held(&trie->lock)); + trim = &node->child[next_bit]; } if (!node || node->prefixlen != key->prefixlen || @@ -442,27 +437,47 @@ static int trie_delete_elem(struct bpf_map *map, void *_key) trie->n_entries--; - /* If the node we are removing is not a leaf node, simply mark it + /* If the node we are removing has two children, simply mark it * as intermediate and we are done. */ - if (rcu_access_pointer(node->child[0]) || + if (rcu_access_pointer(node->child[0]) && rcu_access_pointer(node->child[1])) { node->flags |= LPM_TREE_NODE_FLAG_IM; goto out; } - /* trim should now point to the slot holding the start of a path from - * zero or more intermediate nodes to our leaf node for deletion. + /* If the parent of the node we are about to delete is an intermediate + * node, and the deleted node doesn't have any children, we can delete + * the intermediate parent as well and promote its other child + * up the tree. Doing this maintains the invariant that all + * intermediate nodes have exactly 2 children and that there are no + * unnecessary intermediate nodes in the tree. */ - while ((node = rcu_dereference_protected( - *trim, lockdep_is_held(&trie->lock)))) { - RCU_INIT_POINTER(*trim, NULL); - trim = rcu_access_pointer(node->child[0]) ? - &node->child[0] : - &node->child[1]; + if (parent && (parent->flags & LPM_TREE_NODE_FLAG_IM) && + !node->child[0] && !node->child[1]) { + if (node == rcu_access_pointer(parent->child[0])) + rcu_assign_pointer( + *trim2, rcu_access_pointer(parent->child[1])); + else + rcu_assign_pointer( + *trim2, rcu_access_pointer(parent->child[0])); + kfree_rcu(parent, rcu); kfree_rcu(node, rcu); + goto out; } + /* The node we are removing has either zero or one child. If there + * is a child, move it into the removed node's slot then delete + * the node. Otherwise just clear the slot and delete the node. + */ + if (node->child[0]) + rcu_assign_pointer(*trim, rcu_access_pointer(node->child[0])); + else if (node->child[1]) + rcu_assign_pointer(*trim, rcu_access_pointer(node->child[1])); + else + RCU_INIT_POINTER(*trim, NULL); + kfree_rcu(node, rcu); + out: raw_spin_unlock_irqrestore(&trie->lock, irq_flags);