From patchwork Sun Feb 25 19:47:10 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Ahern
X-Patchwork-Id: 877605
From: David Ahern
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, idosch@idosch.org, roopa@cumulusnetworks.com,
 eric.dumazet@gmail.com, weiwan@google.com, kafai@fb.com,
 yoshfuji@linux-ipv6.org, David Ahern
Subject: [PATCH RFC net-next 00/20] net/ipv6: Separate data structures for
 FIB and data path
Date: Sun, 25 Feb 2018 11:47:10 -0800
Message-Id: <20180225194730.30063-1-dsahern@gmail.com>
X-Mailer: git-send-email 2.11.0
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

IPv6 uses the same data struct for both control plane (FIB entries) and
data path (dst entries). This struct has elements needed for both paths,
adding memory overhead and complexity (taking a dst hold in most places
but an additional reference on rt6i_ref in a few). Furthermore, because
of the dst_alloc tie, all FIB entries are allocated with GFP_ATOMIC.

This patch set separates FIB entries from dst entries, better aligning
IPv6 code with IPv4, simplifying the reference counting and allowing
FIB entries added by userspace (not autoconf) to use GFP_KERNEL. It is
the first step to a number of performance and scalability changes.

The end result of this patch set:
  - FIB entries (fib6_info):
        /* size: 208, cachelines: 4, members: 25 */
        /* sum members: 207, holes: 1, sum holes: 1 */

  - dst entries (rt6_info)
        /* size: 240, cachelines: 4, members: 12 */

Versus the single rt6_info struct today for both paths:
        /* size: 320, cachelines: 5, members: 28 */

This amounts to a 35% reduction in memory use for FIB entries and a
25% reduction for dst entries.

With respect to locking, FIB entries use RCU and a single atomic
counter with fib6_info_hold and fib6_info_release helpers to manage
the reference counting. dst entries use only the traditional dst
refcounts with dst_hold and dst_release.

FIB entries for host routes are referenced by inet6_ifaddr and
ifacaddr6. In both cases, additional holds are taken -- similar to
what is done for devices.

This set is the first of many changes to improve the scalability of the
IPv6 code. Follow-on changes include:
- consolidating duplicate fib6_info references like IPv4 does with
  duplicate fib_info

- moving fib6_info into a slab cache to avoid allocation roundups to
  power of 2 (the 208 size becomes a 256 actual allocation)

- allowing FIB lookups without generating a dst (e.g., most rt6_lookup
  users just want to verify the egress device). This means moving dst
  allocation to the other side of fib6_rule_lookup, which again aligns
  with IPv4 behavior

- using separate standalone nexthop objects, which have performance
  benefits beyond fib_info consolidation

At this point I am not seeing any refcount leaks or underflows, no
oops or bug_ons, or warnings from kasan, so I think it is ready for
others to beat up on it, finding errors in code paths I have missed.
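
The size figures quoted above can be sanity-checked with a few lines of
plain userspace C. This is only an illustrative sketch, not kernel code:
the helper names percent_saved and roundup_pow2 are hypothetical, and the
power-of-two rounding is a simplified stand-in for the allocation roundup
mentioned in the follow-on list, not a model of the real allocator.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: whole-percent saving when a struct shrinks
 * from `before` bytes to `after` bytes (integer division, truncating).
 */
static unsigned int percent_saved(size_t before, size_t after)
{
	assert(before != 0 && after <= before);
	return (unsigned int)(((before - after) * 100) / before);
}

/*
 * Hypothetical helper: smallest power of two that is >= n.
 * Assumption: used here only to illustrate the "208 becomes 256"
 * remark; real allocator size classes are not modeled.
 */
static size_t roundup_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

With the sizes from the cover letter, percent_saved(320, 208) gives 35,
percent_saved(320, 240) gives 25, and roundup_pow2(208) gives 256.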

David Ahern (20):
  net: Move fib_convert_metrics to dst core
  vrf: Move fib6_table into net_vrf
  net/ipv6: Pass net to fib6_update_sernum
  net/ipv6: Pass net namespace to route functions
  net/ipv6: Move support functions up in route.c
  net/ipv6: Save route type in rt6_info flags
  net/ipv6: Move nexthop data to fib6_nh
  net/ipv6: Defer initialization of dst to data path
  net/ipv6: move metrics from dst to rt6_info
  net/ipv6: move expires into rt6_info
  net/ipv6: Add fib6_null_entry
  net/ipv6: Add rt6_info create function for ip6_pol_route_lookup
  net/ipv6: Move dst flags to booleans in fib entries
  net/ipv6: Create a neigh_lookup for FIB entries
  net/ipv6: Add gfp_flags to route add functions
  net/ipv6: Cleanup exception route handling
  net/ipv6: introduce fib6_info struct and helpers
  net/ipv6: separate handling of FIB entries from dst based routes
  net/ipv6: Flip FIB entries to fib6_info
  net/ipv6: Remove unused code and variables for rt6_info

 .../net/ethernet/mellanox/mlxsw/spectrum_router.c |   96 +-
 drivers/net/vrf.c                                 |   25 +-
 include/net/dst.h                                 |    3 +
 include/net/if_inet6.h                            |    4 +-
 include/net/ip6_fib.h                             |  146 ++-
 include/net/ip6_route.h                           |   46 +-
 include/net/netns/ipv6.h                          |    3 +-
 net/core/dst.c                                    |   49 +
 net/ipv4/fib_semantics.c                          |   43 +-
 net/ipv6/addrconf.c                               |  131 +-
 net/ipv6/anycast.c                                |   21 +-
 net/ipv6/ip6_fib.c                                |  344 +++--
 net/ipv6/ip6_output.c                             |    3 +-
 net/ipv6/ndisc.c                                  |   35 +-
 net/ipv6/route.c                                  | 1361 ++++++++++----------
 net/ipv6/xfrm6_policy.c                           |    2 -
 16 files changed, 1183 insertions(+), 1129 deletions(-)