From patchwork Fri Oct 6 12:57:46 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 822393 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3y7qTP0c6Pz9sRq for ; Fri, 6 Oct 2017 23:59:05 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752254AbdJFM6T (ORCPT ); Fri, 6 Oct 2017 08:58:19 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44070 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752213AbdJFM6P (ORCPT ); Fri, 6 Oct 2017 08:58:15 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 3D6AA4ACB3; Fri, 6 Oct 2017 12:58:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 3D6AA4ACB3 Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=pabeni@redhat.com Received: from dhcppc0.redhat.com (ovpn-116-241.ams2.redhat.com [10.36.116.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id 08E4360603; Fri, 6 Oct 2017 12:58:12 +0000 (UTC) From: Paolo Abeni To: linux-kernel@vger.kernel.org Cc: "Paul E. McKenney" , Josh Triplett , Steven Rostedt , "David S. Miller" , Eric Dumazet , Hannes Frederic Sowa , netdev@vger.kernel.org Subject: [PATCH 1/4] rcu: introduce noref debug Date: Fri, 6 Oct 2017 14:57:46 +0200 Message-Id: In-Reply-To: References: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Fri, 06 Oct 2017 12:58:15 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org We currently lack a debugging infrastructure to ensure that long-lived noref dst are properly handled - namely dropped or converted to a refcounted version before escaping the current RCU section. This changeset implements such infra, tracking the noref pointer on a per CPU store and checking that all noref tracked at any given RCU nesting level are cleared before leaving such RCU section. Each noref entity is identified using the entity pointer and an additional, optional, key/opaque pointer. This is needed to cope with usage case scenario where the same noref entity is stored in different places (e.g. noref dst on skb_clone()). To keep the patch/implementation simple RCU_NOREF_DEBUG depends on PREEMPT_RCU=n in Kconfig. Signed-off-by: Paolo Abeni --- include/linux/rcupdate.h | 11 ++++++ kernel/rcu/Kconfig.debug | 15 ++++++++ kernel/rcu/Makefile | 1 + kernel/rcu/noref_debug.c | 89 ++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 116 insertions(+) create mode 100644 kernel/rcu/noref_debug.c diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index de50d8a4cf41..20c1ce08e3eb 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -60,6 +60,15 @@ void call_rcu_sched(struct rcu_head *head, rcu_callback_t func); void synchronize_sched(void); void rcu_barrier_tasks(void); +#ifdef CONFIG_RCU_NOREF_DEBUG +void rcu_track_noref(void *key, void *noref, bool track); +void __rcu_check_noref(void); + +#else +static inline void rcu_track_noref(void *key, void *noref, bool add) { } +static inline void __rcu_check_noref(void) { } +#endif + #ifdef CONFIG_PREEMPT_RCU void __rcu_read_lock(void); @@ -85,6 +94,7 @@ static inline void __rcu_read_lock(void) static inline void __rcu_read_unlock(void) { + __rcu_check_noref(); if (IS_ENABLED(CONFIG_PREEMPT_COUNT)) preempt_enable(); } @@ -723,6 +733,7 @@ static inline void rcu_read_unlock_bh(void) "rcu_read_unlock_bh() used illegally while idle"); rcu_lock_release(&rcu_bh_lock_map); __release(RCU_BH); + __rcu_check_noref(); local_bh_enable(); } diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug index 0ec7d1d33a14..6c7f52a3e809 100644 --- a/kernel/rcu/Kconfig.debug +++ b/kernel/rcu/Kconfig.debug @@ -68,6 +68,21 @@ config RCU_TRACE Say Y here if you want to enable RCU tracing Say N if you are unsure. +config RCU_NOREF_DEBUG + bool "Debugging for RCU-protected noref entries" + depends on PREEMPT_RCU=n + select PREEMPT_COUNT + default n + help + In case of long lasting rcu_read_lock sections this debug + feature enables one to ensure that no rcu managed dereferenced + data leaves the locked section. One use case is the tracking + of dst_entries in struct sk_buff ->dst, which is used to pass + the dst_entry around in the whole stack while under RCU lock. + + Say Y here if you want to enable noref RCU debugging + Say N if you are unsure. + config RCU_EQS_DEBUG bool "Provide debugging asserts for adding NO_HZ support to an arch" depends on DEBUG_KERNEL diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile index 13c0fc852767..c67d7c65c582 100644 --- a/kernel/rcu/Makefile +++ b/kernel/rcu/Makefile @@ -11,3 +11,4 @@ obj-$(CONFIG_TREE_RCU) += tree.o obj-$(CONFIG_PREEMPT_RCU) += tree.o obj-$(CONFIG_TINY_RCU) += tiny.o obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o +obj-$(CONFIG_RCU_NOREF_DEBUG) += noref_debug.o \ No newline at end of file diff --git a/kernel/rcu/noref_debug.c b/kernel/rcu/noref_debug.c new file mode 100644 index 000000000000..ae2e104b11d6 --- /dev/null +++ b/kernel/rcu/noref_debug.c @@ -0,0 +1,89 @@ +#include +#include +#include +#include + +#define NOREF_MAX 7 +struct noref_entry { + void *noref; + void *key; + int nesting; +}; + +struct noref_cache { + struct noref_entry store[NOREF_MAX]; +}; + +static DEFINE_PER_CPU(struct noref_cache, per_cpu_noref_cache); + +static int noref_cache_lookup(struct noref_cache *cache, void *noref, void *key) +{ + int i; + + for (i = 0; i < NOREF_MAX; ++i) + if (cache->store[i].noref == noref && + cache->store[i].key == key) + return i; + + return -1; +} + +void rcu_track_noref(void *key, void *noref, bool track) +{ + struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache); + int i; + + if (track) { + /* find the first empty slot */ + i = noref_cache_lookup(cache, NULL, 0); + if (i < 0) { + WARN_ONCE(1, "can't find empty an slot to track a noref" + " noref tracking will be inaccurate"); + return; + } + + cache->store[i].noref = noref; + cache->store[i].key = key; + cache->store[i].nesting = preempt_count(); + return; + } + + i = noref_cache_lookup(cache, noref, key); + if (i == -1) { + WARN_ONCE(1, "to-be-untracked noref entity %p not found in " + "cache\n", noref); + return; + } + + cache->store[i].noref = NULL; + cache->store[i].key = NULL; + cache->store[i].nesting = 0; +} +EXPORT_SYMBOL_GPL(rcu_track_noref); + +void __rcu_check_noref(void) +{ + struct noref_cache *cache = this_cpu_ptr(&per_cpu_noref_cache); + char *cur, buf[strlen("0xffffffffffffffff") * NOREF_MAX + 1]; + int nesting = preempt_count(); + int i, ret, cnt = 0; + + cur = buf; + for (i = 0; i < NOREF_MAX; ++i) { + if (!cache->store[i].noref || + cache->store[i].nesting != nesting) + continue; + + cnt++; + ret = sprintf(cur, " %p", cache->store[i].noref); + if (ret >= 0) + cur += ret; + cache->store[i].noref = NULL; + cache->store[i].key = NULL; + cache->store[i].nesting = 0; + } + + WARN_ONCE(cnt, "%d noref entities escaped an RCU section, " + "nesting %d, leaked noref list %s", cnt, nesting, buf); +} +EXPORT_SYMBOL_GPL(__rcu_check_noref); From patchwork Fri Oct 6 12:57:47 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 822394 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3y7qVB4bWlz9sRq for ; Fri, 6 Oct 2017 23:59:46 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752458AbdJFM7b (ORCPT ); Fri, 6 Oct 2017 08:59:31 -0400 Received: from mx1.redhat.com ([209.132.183.28]:39072 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752226AbdJFM6S (ORCPT ); Fri, 6 Oct 2017 08:58:18 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 625A1356F0; Fri, 6 Oct 2017 12:58:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 625A1356F0 Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx06.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=pabeni@redhat.com Received: from dhcppc0.redhat.com (ovpn-116-241.ams2.redhat.com [10.36.116.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9F6B570930; Fri, 6 Oct 2017 12:58:15 +0000 (UTC) From: Paolo Abeni To: linux-kernel@vger.kernel.org Cc: "Paul E. McKenney" , Josh Triplett , Steven Rostedt , "David S. Miller" , Eric Dumazet , Hannes Frederic Sowa , netdev@vger.kernel.org Subject: [PATCH 2/4] net: use RCU noref infrastructure to track dst noref Date: Fri, 6 Oct 2017 14:57:47 +0200 Message-Id: <7c7d6183314d8d4cafb9689ce498ec8759c440b0.1507294365.git.pabeni@redhat.com> In-Reply-To: References: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.30]); Fri, 06 Oct 2017 12:58:18 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org We can now ensure noref dsts do no escape the relevant RCU section. This does not introduce any functional changes, it adds the relevant debug code enabled by CONFIG_RCU_NOREF_DEBUG. Signed-off-by: Paolo Abeni --- include/linux/skbuff.h | 1 + include/net/dst.h | 5 +++++ 2 files changed, 6 insertions(+) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 72299ef00061..5df77c8a0319 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -905,6 +905,7 @@ static inline void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) static inline void skb_dst_set_noref(struct sk_buff *skb, struct dst_entry *dst) { WARN_ON(!rcu_read_lock_held() && !rcu_read_lock_bh_held()); + rcu_track_noref(skb, dst, true); skb->_skb_refdst = (unsigned long)dst | SKB_DST_NOREF; } diff --git a/include/net/dst.h b/include/net/dst.h index 06a6765da074..6bc6fefe178e 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -294,6 +294,8 @@ static inline void refdst_drop(unsigned long refdst) static inline void skb_dst_drop(struct sk_buff *skb) { if (skb->_skb_refdst) { + if (skb->_skb_refdst & SKB_DST_NOREF) + rcu_track_noref(skb, skb_dst(skb), false); refdst_drop(skb->_skb_refdst); skb->_skb_refdst = 0UL; } @@ -304,6 +306,8 @@ static inline void __skb_dst_copy(struct sk_buff *nskb, unsigned long refdst) nskb->_skb_refdst = refdst; if (!(nskb->_skb_refdst & SKB_DST_NOREF)) dst_clone(skb_dst(nskb)); + else + rcu_track_noref(nskb, skb_dst(nskb), true); } static inline void skb_dst_copy(struct sk_buff *nskb, const struct sk_buff *oskb) @@ -335,6 +339,7 @@ static inline void skb_dst_force(struct sk_buff *skb) struct dst_entry *dst = skb_dst(skb); WARN_ON(!rcu_read_lock_held()); + rcu_track_noref(skb, dst, false); if (!dst_hold_safe(dst)) dst = NULL; From patchwork Fri Oct 6 12:57:48 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 822392 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3y7qT56TFHz9sRq for ; Fri, 6 Oct 2017 23:58:49 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752296AbdJFM6X (ORCPT ); Fri, 6 Oct 2017 08:58:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44474 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752213AbdJFM6V (ORCPT ); Fri, 6 Oct 2017 08:58:21 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 05776D7124; Fri, 6 Oct 2017 12:58:21 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 05776D7124 Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx09.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=pabeni@redhat.com Received: from dhcppc0.redhat.com (ovpn-116-241.ams2.redhat.com [10.36.116.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id C413B6F112; Fri, 6 Oct 2017 12:58:18 +0000 (UTC) From: Paolo Abeni To: linux-kernel@vger.kernel.org Cc: "Paul E. McKenney" , Josh Triplett , Steven Rostedt , "David S. Miller" , Eric Dumazet , Hannes Frederic Sowa , netdev@vger.kernel.org Subject: [PATCH 3/4] ipv4: drop unneeded and misleading RCU lock in ip_route_input_noref() Date: Fri, 6 Oct 2017 14:57:48 +0200 Message-Id: In-Reply-To: References: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.38]); Fri, 06 Oct 2017 12:58:21 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat on the first ingress IPv4 packet: 1 noref entities escaped an RCU section, nesting 259, leaked noref list ffff8edcefb1dc00 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 0 at kernel/rcu/noref_debug.c:87 __rcu_check_noref+0xf8/0x100 Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd mei_me ipmi_ssif sg mei mxm_wmi iTCO_wdt iTCO_vendor_support dcdbas lpc_ich ipmi_si pcspkr ipmi_devintf ipmi_msghandler shpchp wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe mdio ahci ptp crc32c_intel i2c_algo_bit libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.0-rc1.noref_3+ #1609 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 task: ffffffffae019500 task.stack: ffffffffae000000 RIP: 0010:__rcu_check_noref+0x7b/0xd0 RSP: 0018:ffff900afbe03b30 EFLAGS: 00010246 RAX: 0000000000000034 RBX: ffff900afbfd2500 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000202 RBP: ffff900afbe03b48 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 00000000c388883e R12: 0000000000000103 R13: 000000001d24100a R14: ffff900af13e0000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff900afbe00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005589ce876398 CR3: 0000001ff52f9005 CR4: 00000000003606f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: ip_route_input_noref+0xa6/0x150 ? ip_route_input_noref+0x5/0x150 ip_rcv_finish+0x78/0x5e0 ip_rcv+0x2a7/0x540 ? packet_rcv+0x52/0x450 __netif_receive_skb_core+0x3b9/0xe10 ? netif_receive_skb_internal+0x40/0x390 __netif_receive_skb+0x18/0x60 netif_receive_skb_internal+0x8d/0x390 ? netif_receive_skb_internal+0x40/0x390 napi_gro_receive+0x15c/0x1f0 igb_clean_rx_irq+0x36d/0x7f0 [igb] igb_poll+0x303/0x780 [igb] ? save_stack_trace+0x1b/0x20 ? __lock_acquire+0xcf2/0x11c0 ? net_rx_action+0xb4/0x520 net_rx_action+0x27d/0x520 __do_softirq+0xd1/0x4f5 irq_exit+0xfb/0x110 do_IRQ+0x67/0x120 common_interrupt+0xa7/0xa7 RIP: 0010:cpuidle_enter_state+0xd0/0x360 RSP: 0018:ffffffffae003df8 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff79 RAX: ffffffffae019500 RBX: ffffd7d67c604400 RCX: 0000000000000000 RDX: ffffffffae019500 RSI: 0000000000000001 RDI: ffffffffae019500 RBP: ffffffffae003e30 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000018 R12: 0000000000000004 R13: 0000000000000000 R14: ffffd7d67c604400 R15: 00000009c8eafd50 ? cpuidle_enter_state+0xc9/0x360 cpuidle_enter+0x17/0x20 call_cpuidle+0x23/0x40 do_idle+0x183/0x200 cpu_startup_entry+0x73/0x80 rest_init+0xc3/0xd0 start_kernel+0x4f7/0x518 ? set_init_arg+0x5a/0x5a x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x6f/0x72 secondary_startup_64+0xa5/0xa5 Code: f6 75 07 5b 41 5c 41 5d 5d c3 80 3d eb e4 ff 00 00 75 1a 44 89 e2 48 c7 c7 88 54 e7 ad 31 c0 c6 05 d6 e4 ff 00 01 e8 28 af fe ff <0f> ff 41 bd 07 00 00 00 48 8b 33 48 85 f6 74 06 44 39 63 10 74 The rcu protection in ip_route_input_noref() is unneeded and misleading: the caller still needs to acquire and retain the rcu lock until the skb - carrying a noref dst on successful return - is either dropped or the relevant dst is forced to a ref-counted version. This change just drops the unneeded lock. Signed-off-by: Paolo Abeni --- net/ipv4/route.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 94d4cd2d5ea4..5a6ca1f16d3f 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -2069,14 +2069,9 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr, u8 tos, struct net_device *dev) { struct fib_result res; - int err; tos &= IPTOS_RT_MASK; - rcu_read_lock(); - err = ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res); - rcu_read_unlock(); - - return err; + return ip_route_input_rcu(skb, daddr, saddr, tos, dev, &res); } EXPORT_SYMBOL(ip_route_input_noref); From patchwork Fri Oct 6 12:57:49 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 822391 X-Patchwork-Delegate: davem@davemloft.net Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67; helo=vger.kernel.org; envelope-from=netdev-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3y7qSv0czhz9sRq for ; Fri, 6 Oct 2017 23:58:39 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752339AbdJFM62 (ORCPT ); Fri, 6 Oct 2017 08:58:28 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35344 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752309AbdJFM6X (ORCPT ); Fri, 6 Oct 2017 08:58:23 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 998A779705; Fri, 6 Oct 2017 12:58:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 998A779705 Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx01.extmail.prod.ext.phx2.redhat.com; spf=fail smtp.mailfrom=pabeni@redhat.com Received: from dhcppc0.redhat.com (ovpn-116-241.ams2.redhat.com [10.36.116.241]) by smtp.corp.redhat.com (Postfix) with ESMTP id 66FB260603; Fri, 6 Oct 2017 12:58:21 +0000 (UTC) From: Paolo Abeni To: linux-kernel@vger.kernel.org Cc: "Paul E. McKenney" , Josh Triplett , Steven Rostedt , "David S. Miller" , Eric Dumazet , Hannes Frederic Sowa , netdev@vger.kernel.org Subject: [PATCH 4/4] tcp: avoid noref dst leak on input path Date: Fri, 6 Oct 2017 14:57:49 +0200 Message-Id: <77fd3b021d5a2f6dd4ad6b08e64f0643e3af95f4.1507294365.git.pabeni@redhat.com> In-Reply-To: References: X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.25]); Fri, 06 Oct 2017 12:58:23 +0000 (UTC) Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Enabling CONFIG_RCU_NOREF_DEBUG gives the following splat when processing tcp packets: to-be-untracked noref entity ffff942cb71ea300 not found in cache ------------[ cut here ]------------ WARNING: CPU: 24 PID: 178 at kernel/rcu/noref_debug.c:54 rcu_track_noref+0xa4/0xf0 Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt ipmi_ssif mei_me iTCO_vendor_support mei dcdbas lpc_ich ipmi_si mxm_wmi sg pcspkr ipmi_devintf ipmi_msghandler acpi_power_meter shpchp wmi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm igb drm ixgbe mdio crc32c_intel ahci ptp i2c_algo_bit libahci pps_core i2c_core libata dca dm_mirror dm_region_hash dm_log dm_mod CPU: 24 PID: 178 Comm: ksoftirqd/24 Not tainted 4.14.0-rc1.noref_route+ #1610 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017 task: ffff940e48300000 task.stack: ffffaec406a20000 RIP: 0010:rcu_track_noref+0xa4/0xf0 RSP: 0018:ffffaec406a238e0 EFLAGS: 00010246 RAX: 0000000000000040 RBX: 0000000000000000 RCX: 0000000000000002 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000292 RBP: ffffaec406a238e0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000003 R12: 0000000000000000 R13: ffff942cb5110000 R14: 000000000000fe88 R15: ffff942cb1a20200 FS: 0000000000000000(0000) GS:ffff942cbee00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007febc072d140 CR3: 0000001feebd6002 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcp_data_queue+0x82a/0xce0 tcp_rcv_established+0x283/0x570 tcp_v4_do_rcv+0x102/0x1e0 tcp_v4_rcv+0xa9e/0xc10 ip_local_deliver_finish+0x128/0x380 ? ip_local_deliver_finish+0x41/0x380 ip_local_deliver+0x74/0x230 ip_rcv_finish+0x105/0x5e0 ip_rcv+0x2a7/0x540 __netif_receive_skb_core+0x3b9/0xe10 ? netif_receive_skb_internal+0x40/0x390 __netif_receive_skb+0x18/0x60 netif_receive_skb_internal+0x8d/0x390 ? netif_receive_skb_internal+0x40/0x390 napi_gro_complete+0x127/0x1d0 ? napi_gro_complete+0x2a/0x1d0 napi_gro_flush+0x5f/0x80 napi_complete_done+0x96/0x100 ixgbe_poll+0x5f8/0x7c0 [ixgbe] net_rx_action+0x27d/0x520 __do_softirq+0xd1/0x4f5 run_ksoftirqd+0x25/0x70 smpboot_thread_fn+0x11a/0x1f0 kthread+0x155/0x190 ? sort_range+0x30/0x30 ? kthread_create_on_node+0x70/0x70 ret_from_fork+0x2a/0x40 Code: e9 83 c2 01 48 83 c0 18 83 fa 07 75 ef 80 3d b0 e5 ff 00 00 75 d2 48 c7 c7 50 54 07 9d 31 c0 c6 05 9e e5 ff 00 01 e8 ef af fe ff <0f> ff 5d c3 80 3d 8f e5 ff 00 00 75 b0 48 c7 c7 00 54 07 9d 31 we must clear the skb dst before enqueuing the skb somewhere, but currently the dst entry for packets delivered via tcp_rcv_established() -> tcp_queue_rcv() is not cleared. Fix it by adding the explicit drop in tcp_queue_rcv() and moving the current skb_dst_drop() just before the other enqueuing operation, do avoid unneeded double skb_dst_drop() for some path. The leak itself is not harmful, because the tcp recvmsg() code should not access such info. Signed-off-by: Paolo Abeni --- net/ipv4/tcp_input.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c5d7656beeee..bf4e17edfe7a 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4422,6 +4422,9 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb) return; } + /* drop the -possibly noref - dst before delivery the skb to ofo tree */ + skb_dst_drop(skb); + /* Stash tstamp to avoid being stomped on by rbnode */ if (TCP_SKB_CB(skb)->has_rxtstamp) TCP_SKB_CB(skb)->swtstamp = skb->tstamp; @@ -4560,6 +4563,7 @@ static int __must_check tcp_queue_rcv(struct sock *sk, struct sk_buff *skb, int skb, fragstolen)) ? 1 : 0; tcp_rcv_nxt_update(tcp_sk(sk), TCP_SKB_CB(skb)->end_seq); if (!eaten) { + skb_dst_drop(skb); __skb_queue_tail(&sk->sk_receive_queue, skb); skb_set_owner_r(skb, sk); } @@ -4626,7 +4630,6 @@ static void tcp_data_queue(struct sock *sk, struct sk_buff *skb) __kfree_skb(skb); return; } - skb_dst_drop(skb); __skb_pull(skb, tcp_hdr(skb)->doff * 4); tcp_ecn_accept_cwr(tp, skb);