From patchwork Mon Nov 18 11:01:29 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Abeni
X-Patchwork-Id: 1196684
From: Paolo Abeni
To: netdev@vger.kernel.org
Cc: "David S. Miller", Willem de Bruijn, Edward Cree
Subject: [PATCH net-next v2 1/2] ipv6: introduce and use route lookup hints for list input
Date: Mon, 18 Nov 2019 12:01:29 +0100
Message-Id: <643f2b258e275e915fa96ef0c635f9c5ff804c9d.1574071944.git.pabeni@redhat.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

When doing RX batch packet processing, we currently repeat the route
lookup for each ingress packet. If policy routing is not configured,
and CONFIG_IPV6_SUBTREES is disabled at build time, we know that
packets with the same destination address will use the same dst.

This change tries to avoid the per-packet route lookup by caching the
destination address and dst of the latest successful lookup, and
reusing that dst for the next packet when the above conditions hold.
Ingress traffic for most servers should fit.

The measured performance delta under UDP flood vs a recvmmsg receiver
is as follows:

vanilla		patched		delta
Kpps		Kpps		%
1431		1664		+16

In the worst-case scenario, where each packet has a different
destination address, the performance delta is within the noise range.

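For illustration only, the batching idea described above can be modelled
in user space. This is a minimal sketch, not kernel code: struct addr6,
struct route_hint, slow_route_lookup(), resolve_dst() and process_batch()
are hypothetical names, the "dst" is a plain integer, and can_cache stands
in for the result of ip6_can_cache_route_hint().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
struct addr6 {
	unsigned char b[16];
};

struct route_hint {
	struct addr6 daddr;	/* destination of the last full lookup */
	int dst;		/* fake dst id returned by that lookup */
};

static unsigned int slow_lookups;

/* Placeholder for the full route lookup: counts calls, fakes a dst id. */
static int slow_route_lookup(const struct addr6 *daddr)
{
	slow_lookups++;
	return daddr->b[15];
}

/* Per-packet step: reuse the hint only when the destination matches. */
static int resolve_dst(const struct addr6 *daddr, const struct route_hint *hint)
{
	if (hint && memcmp(&hint->daddr, daddr, sizeof(*daddr)) == 0)
		return hint->dst;
	return slow_route_lookup(daddr);
}

/*
 * Walk a batch and return how many full lookups were needed.
 * can_cache models the result of ip6_can_cache_route_hint().
 */
unsigned int process_batch(const struct addr6 *pkts, size_t n,
			   bool can_cache, int *out)
{
	struct route_hint _hint, *hint = NULL;
	bool have_curr = false;
	int curr_dst = 0;
	size_t i;

	assert(n == 0 || (pkts && out));
	slow_lookups = 0;

	for (i = 0; i < n; i++) {
		int dst = resolve_dst(&pkts[i], hint);

		out[i] = dst;
		if (!have_curr || dst != curr_dst) {
			/* dst changed: refresh or invalidate the hint */
			if (can_cache) {
				_hint.daddr = pkts[i];
				_hint.dst = dst;
				hint = &_hint;
			} else {
				hint = NULL;
			}
			curr_dst = dst;
			have_curr = true;
		}
	}
	return slow_lookups;
}
```

In this model a batch whose packets share one destination needs a single
full lookup, while a batch that alternates between destinations needs one
lookup per packet, which corresponds to the worst case mentioned above.
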
v1 -> v2:
 - fix build issue with !CONFIG_IPV6_MULTIPLE_TABLES
 - fix potential race when fib6_has_custom_rules is set
   while processing a packet batch

Signed-off-by: Paolo Abeni
---
 net/ipv6/ip6_input.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index ef7f707d9ae3..f559ad6b09ef 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -44,10 +44,16 @@
 #include
 #include
 
+struct ip6_route_input_hint {
+	unsigned long	refdst;
+	struct in6_addr daddr;
+};
+
 INDIRECT_CALLABLE_DECLARE(void udp_v6_early_demux(struct sk_buff *));
 INDIRECT_CALLABLE_DECLARE(void tcp_v6_early_demux(struct sk_buff *));
 static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
-				struct sk_buff *skb)
+				struct sk_buff *skb,
+				struct ip6_route_input_hint *hint)
 {
 	void (*edemux)(struct sk_buff *skb);
 
@@ -59,7 +65,13 @@ static void ip6_rcv_finish_core(struct net *net, struct sock *sk,
 			INDIRECT_CALL_2(edemux, tcp_v6_early_demux,
 					udp_v6_early_demux, skb);
 	}
-	if (!skb_valid_dst(skb))
+
+	if (skb_valid_dst(skb))
+		return;
+
+	if (hint && ipv6_addr_equal(&hint->daddr, &ipv6_hdr(skb)->daddr))
+		__skb_dst_copy(skb, hint->refdst);
+	else
 		ip6_route_input(skb);
 }
 
@@ -71,7 +83,7 @@ int ip6_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	skb = l3mdev_ip6_rcv(skb);
 	if (!skb)
 		return NET_RX_SUCCESS;
-	ip6_rcv_finish_core(net, sk, skb);
+	ip6_rcv_finish_core(net, sk, skb, NULL);
 	return dst_input(skb);
 }
 
@@ -86,9 +98,20 @@ static void ip6_sublist_rcv_finish(struct list_head *head)
 	}
 }
 
+static bool ip6_can_cache_route_hint(struct net *net)
+{
+	return !IS_ENABLED(CONFIG_IPV6_SUBTREES) &&
+#ifdef CONFIG_IPV6_MULTIPLE_TABLES
+		!net->ipv6.fib6_has_custom_rules;
+#else
+		1;
+#endif
+}
+
 static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
 				struct list_head *head)
 {
+	struct ip6_route_input_hint _hint, *hint = NULL;
 	struct dst_entry *curr_dst = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
@@ -104,9 +127,18 @@ static void ip6_list_rcv_finish(struct net *net, struct sock *sk,
 		skb = l3mdev_ip6_rcv(skb);
 		if (!skb)
 			continue;
-		ip6_rcv_finish_core(net, sk, skb);
+		ip6_rcv_finish_core(net, sk, skb, hint);
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
+			if (ip6_can_cache_route_hint(net)) {
+				_hint.refdst = skb->_skb_refdst;
+				memcpy(&_hint.daddr, &ipv6_hdr(skb)->daddr,
+				       sizeof(_hint.daddr));
+				hint = &_hint;
+			} else {
+				hint = NULL;
+			}
+
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
 				ip6_sublist_rcv_finish(&sublist);

From patchwork Mon Nov 18 11:01:30 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Paolo Abeni
X-Patchwork-Id: 1196683
From: Paolo Abeni
To: netdev@vger.kernel.org
Cc: "David S. Miller", Willem de Bruijn, Edward Cree
Subject: [PATCH net-next v2 2/2] ipv4: use dst hint for ipv4 list receive
Date: Mon, 18 Nov 2019 12:01:30 +0100
Message-Id: <592c763828171c414e8927878b1a22027e33dee7.1574071944.git.pabeni@redhat.com>
Sender: netdev-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org

This is similar to the previous change, with some additional
IPv4-specific quirks. Even when using the route hint, we still have to
perform additional per-packet checks on source address validity: a new
helper is added to wrap them.

Moreover, the IPv4 route lookup, even in the absence of policy routing,
may depend on the packet's ToS, so we cache that value, too.

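As a rough user-space model of the two points above (the hint is keyed on
both destination address and ToS, and source address checks still run for
every packet), consider the sketch below. All names here (struct hint4,
saddr_is_martian(), slow_lookup(), use_hint(), route_input()) are
hypothetical, addresses are kept in host byte order for readability, and
the loopback and fib_validate_source() checks are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a cached route; not the kernel structure. */
struct hint4 {
	uint32_t daddr;	/* host byte order in this sketch */
	uint8_t tos;	/* unsigned, so it compares equal to the header byte */
	int dst;	/* fake dst id */
};

/* Simplified source checks: multicast, limited broadcast, 0.0.0.0/8. */
static bool saddr_is_martian(uint32_t saddr)
{
	if ((saddr & 0xf0000000u) == 0xe0000000u)
		return true;
	if (saddr == 0xffffffffu)
		return true;
	if ((saddr & 0xff000000u) == 0)
		return true;
	return false;
}

/* Placeholder for the full lookup; returns a fake dst id or -1. */
static int slow_lookup(uint32_t saddr, uint32_t daddr, uint8_t tos)
{
	(void)tos;
	if (saddr_is_martian(saddr))
		return -1;
	return 1000 + (int)(daddr & 0xffu);
}

/* Hint path: no lookup, but the source address is still validated. */
static int use_hint(const struct hint4 *hint, uint32_t saddr)
{
	assert(hint != NULL);
	if (saddr_is_martian(saddr))
		return -1;
	return hint->dst;
}

/* Take the hint path only when both daddr and ToS match. */
int route_input(const struct hint4 *hint, uint32_t saddr,
		uint32_t daddr, uint8_t tos)
{
	if (hint && hint->daddr == daddr && hint->tos == tos)
		return use_hint(hint, saddr);
	return slow_lookup(saddr, daddr, tos);
}
```

A packet with a matching destination but a different ToS falls back to
the full lookup, and a packet with a martian source is rejected even when
the hint matches.
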
Explicitly avoid hints for local broadcast: this simplifies the code,
and broadcasts are a slower path anyway.

UDP flood performance vs recvmmsg() receiver:

vanilla		patched		delta
Kpps		Kpps		%
1683		1833		+8

In the worst-case scenario, where each packet has a different
destination address, the performance delta is within the noise range.

v1 -> v2:
 - fix build issue with !CONFIG_IP_MULTIPLE_TABLES

Signed-off-by: Paolo Abeni
---
 include/net/route.h | 11 +++++++++++
 net/ipv4/ip_input.c | 38 +++++++++++++++++++++++++++++++++-----
 net/ipv4/route.c    | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 5 deletions(-)

diff --git a/include/net/route.h b/include/net/route.h
index 6c516840380d..f7a8a52318cd 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -185,6 +185,17 @@ int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
 		       u8 tos, struct net_device *devin,
 		       struct fib_result *res);
 
+struct ip_route_input_hint {
+	unsigned long	refdst;
+	__be32		daddr;
+	u8		tos;
+	bool		local;
+};
+
+int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
+		      u8 tos, struct net_device *devin,
+		      struct ip_route_input_hint *hint);
+
 static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
 				 u8 tos, struct net_device *devin)
 {
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index 24a95126e698..25f6fcc65380 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -305,7 +305,8 @@ static inline bool ip_rcv_options(struct sk_buff *skb, struct net_device *dev)
 INDIRECT_CALLABLE_DECLARE(int udp_v4_early_demux(struct sk_buff *));
 INDIRECT_CALLABLE_DECLARE(int tcp_v4_early_demux(struct sk_buff *));
 static int ip_rcv_finish_core(struct net *net, struct sock *sk,
-			      struct sk_buff *skb, struct net_device *dev)
+			      struct sk_buff *skb, struct net_device *dev,
+			      struct ip_route_input_hint *hint)
 {
 	const struct iphdr *iph = ip_hdr(skb);
 	int (*edemux)(struct sk_buff *skb);
@@ -335,8 +336,12 @@ static int ip_rcv_finish_core(struct net *net, struct sock *sk,
 	 *	how the packet travels inside Linux networking.
 	 */
 	if (!skb_valid_dst(skb)) {
-		err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
-					   iph->tos, dev);
+		if (hint && hint->daddr == iph->daddr && hint->tos == iph->tos)
+			err = ip_route_use_hint(skb, iph->daddr, iph->saddr,
+						iph->tos, dev, hint);
+		else
+			err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+						   iph->tos, dev);
 		if (unlikely(err))
 			goto drop_error;
 	}
@@ -408,7 +413,7 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
 	if (!skb)
 		return NET_RX_SUCCESS;
 
-	ret = ip_rcv_finish_core(net, sk, skb, dev);
+	ret = ip_rcv_finish_core(net, sk, skb, dev, NULL);
 	if (ret != NET_RX_DROP)
 		ret = dst_input(skb);
 	return ret;
@@ -535,9 +540,20 @@ static void ip_sublist_rcv_finish(struct list_head *head)
 	}
 }
 
+static bool ip_can_cache_route_hint(struct net *net, struct rtable *rt)
+{
+	return rt->rt_type != RTN_BROADCAST &&
+#ifdef CONFIG_IP_MULTIPLE_TABLES
+		!net->ipv4.fib_has_custom_rules;
+#else
+		1;
+#endif
+}
+
 static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 			       struct list_head *head)
 {
+	struct ip_route_input_hint _hint, *hint = NULL;
 	struct dst_entry *curr_dst = NULL;
 	struct sk_buff *skb, *next;
 	struct list_head sublist;
@@ -554,11 +570,23 @@ static void ip_list_rcv_finish(struct net *net, struct sock *sk,
 		skb = l3mdev_ip_rcv(skb);
 		if (!skb)
 			continue;
-		if (ip_rcv_finish_core(net, sk, skb, dev) == NET_RX_DROP)
+		if (ip_rcv_finish_core(net, sk, skb, dev, hint) == NET_RX_DROP)
 			continue;
 
 		dst = skb_dst(skb);
 		if (curr_dst != dst) {
+			struct rtable *rt = (struct rtable *)dst;
+
+			if (ip_can_cache_route_hint(net, rt)) {
+				_hint.refdst = skb->_skb_refdst;
+				_hint.daddr = ip_hdr(skb)->daddr;
+				_hint.tos = ip_hdr(skb)->tos;
+				_hint.local = rt->rt_type == RTN_LOCAL;
+				hint = &_hint;
+			} else {
+				hint = NULL;
+			}
+
 			/* dispatch old sublist */
 			if (!list_empty(&sublist))
 				ip_sublist_rcv_finish(&sublist);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index dcc4fa10138d..b0ddff17db80 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2019,6 +2019,44 @@ static int ip_mkroute_input(struct sk_buff *skb,
 	return __mkroute_input(skb, res, in_dev, daddr, saddr, tos);
 }
 
+/* Implements all the saddr-related checks as ip_route_input_slow(),
+ * assuming daddr is valid and this is not a local broadcast.
+ * Uses the provided hint instead of performing a route lookup.
+ */
+int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+		      u8 tos, struct net_device *dev,
+		      struct ip_route_input_hint *hint)
+{
+	struct in_device *in_dev = __in_dev_get_rcu(dev);
+	struct net *net = dev_net(dev);
+	int err = -EINVAL;
+	u32 itag = 0;
+
+	if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
+		goto martian_source;
+
+	if (ipv4_is_zeronet(saddr))
+		goto martian_source;
+
+	if (ipv4_is_loopback(saddr) && !IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
+		goto martian_source;
+
+	if (hint->local) {
+		err = fib_validate_source(skb, saddr, daddr, tos, 0, dev,
+					  in_dev, &itag);
+		if (err < 0)
+			goto martian_source;
+	}
+
+	err = 0;
+	__skb_dst_copy(skb, hint->refdst);
+	return err;
+
+martian_source:
+	ip_handle_martian_source(dev, in_dev, skb, daddr, saddr);
+	return err;
+}
+
 /*
  *	NOTE. We drop all the packets that has local source
  *	addresses, because every properly looped back packet