From patchwork Mon Mar 6 13:14:03 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michal Hocko
X-Patchwork-Id: 735698
From: Michal Hocko
To: Andrew Morton
Subject: [PATCH 2/7] lockdep: allow to disable reclaim lockup detection
Date: Mon, 6 Mar 2017 14:14:03 +0100
Message-Id: <20170306131408.9828-3-mhocko@kernel.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170306131408.9828-1-mhocko@kernel.org>
References: <20170306131408.9828-1-mhocko@kernel.org>
List-Id: Linux MTD discussion mailing list
Cc: Michal Hocko, Jan Kara, "Peter Zijlstra (Intel)", djwong@kernel.org,
 Dave Chinner, David Sterba, Chris Mason, linux-mtd@lists.infradead.org,
 logfs@logfs.org, linux-afs@lists.infradead.org, cluster-devel@redhat.com,
 linux-ext4@vger.kernel.org, reiserfs-devel@vger.kernel.org,
 ceph-devel@vger.kernel.org, Vlastimil Babka, linux-nfs@vger.kernel.org,
 Theodore Ts'o, linux-mm@kvack.org, linux-ntfs-dev@lists.sourceforge.net,
 LKML, linux-f2fs-devel@lists.sourceforge.net, linux-xfs@vger.kernel.org,
 linux-btrfs@vger.kernel.org

From: Michal Hocko

The current implementation of the reclaim lockup detection can produce
false positives. These do happen in practice, and the usual response is
to tweak the code to silence lockdep by using GFP_NOFS even though the
context can use __GFP_FS just fine.
See http://lkml.kernel.org/r/20160512080321.GA18496@dastard as an example.

=================================
[ INFO: inconsistent lock state ]
4.5.0-rc2+ #4 Tainted: G O
---------------------------------
inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:
 (&xfs_nondir_ilock_class){++++-+}, at: [] xfs_ilock+0x177/0x200 [xfs]
{RECLAIM_FS-ON-R} state was registered at:
  [] mark_held_locks+0x79/0xa0
  [] lockdep_trace_alloc+0xb3/0x100
  [] kmem_cache_alloc+0x33/0x230
  [] kmem_zone_alloc+0x81/0x120 [xfs]
  [] xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
  [] __xfs_refcount_find_shared+0x75/0x580 [xfs]
  [] xfs_refcount_find_shared+0x84/0xb0 [xfs]
  [] xfs_getbmap+0x608/0x8c0 [xfs]
  [] xfs_vn_fiemap+0xab/0xc0 [xfs]
  [] do_vfs_ioctl+0x498/0x670
  [] SyS_ioctl+0x79/0x90
  [] entry_SYSCALL_64_fastpath+0x12/0x6f

       CPU0
       ----
  lock(&xfs_nondir_ilock_class);
    lock(&xfs_nondir_ilock_class);

 *** DEADLOCK ***

3 locks held by kswapd0/543:

stack backtrace:
CPU: 0 PID: 543 Comm: kswapd0 Tainted: G O 4.5.0-rc2+ #4
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
 ffffffff82a34f10 ffff88003aa078d0 ffffffff813a14f9 ffff88003d8551c0
 ffff88003aa07920 ffffffff8110ec65 0000000000000000 0000000000000001
 ffff880000000001 000000000000000b 0000000000000008 ffff88003d855aa0
Call Trace:
 [] dump_stack+0x4b/0x72
 [] print_usage_bug+0x215/0x240
 [] mark_lock+0x1f5/0x660
 [] ? print_shortest_lock_dependencies+0x1a0/0x1a0
 [] __lock_acquire+0xa80/0x1e50
 [] ? kmem_cache_alloc+0x15e/0x230
 [] ? kmem_zone_alloc+0x81/0x120 [xfs]
 [] lock_acquire+0xd8/0x1e0
 [] ? xfs_ilock+0x177/0x200 [xfs]
 [] ? xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
 [] down_write_nested+0x5e/0xc0
 [] ? xfs_ilock+0x177/0x200 [xfs]
 [] xfs_ilock+0x177/0x200 [xfs]
 [] xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
 [] xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
 [] evict+0xc5/0x190
 [] dispose_list+0x39/0x60
 [] prune_icache_sb+0x4b/0x60
 [] super_cache_scan+0x14f/0x1a0
 [] shrink_slab.part.63.constprop.79+0x1e9/0x4e0
 [] shrink_zone+0x15e/0x170
 [] kswapd+0x4f1/0xa80
 [] ? zone_reclaim+0x230/0x230
 [] kthread+0xf2/0x110
 [] ? kthread_create_on_node+0x220/0x220
 [] ret_from_fork+0x3f/0x70
 [] ? kthread_create_on_node+0x220/0x220

To quote Dave:
"
Ignoring whether reflink should be doing anything or not, that's a
"xfs_refcountbt_init_cursor() gets called both outside and inside
transactions" lockdep false positive case. The problem here is
lockdep has seen this allocation from within a transaction, hence a
GFP_NOFS allocation, and now it's seeing it in a GFP_KERNEL context.
Also note that we have an active reference to this inode.

So, because the reclaim annotations overload the interrupt level
detections and it's seen the inode ilock been taken in reclaim
("interrupt") context, this triggers a reclaim context warning where
it thinks it is unsafe to do this allocation in GFP_KERNEL context
holding the inode ilock...
"

This sounds like a fundamental problem of the reclaim lock detection.
It is really impossible to annotate such a special usecase IMHO unless
the reclaim lockup detection is reworked completely. Until then it is
much better to provide an "I know what I am doing" flag and mark the
problematic places. This would prevent abuse of the GFP_NOFS flag, which
has a runtime effect even on configurations which have lockdep disabled.

Introduce the __GFP_NOLOCKDEP flag, which tells the lockdep gfp tracking
to skip the current allocation request.

While we are at it, also make sure that the radix tree doesn't
accidentally override tags stored in the upper part of the gfp_mask.
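
For readers who want to see the two mechanisms in isolation, here is a
minimal userspace sketch. It is not kernel code: every MODEL_* name and
both model_* functions are hypothetical, MODEL_GFP_FS assumes the
___GFP_FS value from gfp.h of that era, MODEL_RADIX_TREE_MAX_TAGS assumes
a value of 3, and only the two checks discussed in this message are
modelled (the real __lockdep_trace_alloc() tests more conditions).

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Hypothetical stand-in for CONFIG_LOCKDEP; set to 0 to model a build without it. */
#define MODEL_CONFIG_LOCKDEP 1

/* Assumed ___GFP_FS value (not shown in the patch); NOLOCKDEP is copied from it. */
#define MODEL_GFP_FS 0x80u
#if MODEL_CONFIG_LOCKDEP
#define MODEL_GFP_NOLOCKDEP 0x4000000u
#else
#define MODEL_GFP_NOLOCKDEP 0u
#endif

/* Same arithmetic as the patched __GFP_BITS_SHIFT and __GFP_BITS_MASK. */
#define MODEL_GFP_BITS_SHIFT (25 + MODEL_CONFIG_LOCKDEP)
#define MODEL_GFP_BITS_MASK ((gfp_t)((1u << MODEL_GFP_BITS_SHIFT) - 1))

/* Assumed value of RADIX_TREE_MAX_TAGS; check include/linux/radix-tree.h. */
#define MODEL_RADIX_TREE_MAX_TAGS 3

/* Userspace counterpart of the BUILD_BUG_ON() added to radix_tree_init(). */
static_assert(MODEL_RADIX_TREE_MAX_TAGS + MODEL_GFP_BITS_SHIFT <= 32,
	      "radix tree tags and gfp bits must fit in 32 bits");

/*
 * Returns true when the model would go on to mark the held locks, i.e. the
 * point where the kernel calls mark_held_locks(curr, RECLAIM_FS).
 */
static bool model_tracks_allocation(gfp_t gfp_mask)
{
	/* A NOFS-style request cannot recurse into the filesystem. */
	if (!(gfp_mask & MODEL_GFP_FS))
		return false;

	/* The opt-out added by the patch; constant-false when the flag is 0. */
	if (gfp_mask & MODEL_GFP_NOLOCKDEP)
		return false;

	return true;
}

/* True when the opt-out bit lies inside the bits covered by the mask. */
static bool model_nolockdep_inside_mask(void)
{
	return (MODEL_GFP_NOLOCKDEP & MODEL_GFP_BITS_MASK) != 0;
}
```

Calling model_tracks_allocation() with MODEL_GFP_FS alone returns true,
while adding MODEL_GFP_NOLOCKDEP makes it return false without dropping
the FS bit, which is the difference from silencing lockdep with GFP_NOFS.
With the values as posted, model_nolockdep_inside_mask() returns false:
0x4000000u is bit 26, whereas a shift of 26 gives a mask covering bits 0
through 25. That may be worth double-checking against the tree the patch
applies to.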

Suggested-by: Peter Zijlstra
Acked-by: Peter Zijlstra (Intel)
Acked-by: Vlastimil Babka
Signed-off-by: Michal Hocko
---
 include/linux/gfp.h      | 10 +++++++++-
 kernel/locking/lockdep.c |  4 ++++
 lib/radix-tree.c         |  2 ++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index db373b9d3223..978232a3b4ae 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -40,6 +40,11 @@ struct vm_area_struct;
 #define ___GFP_DIRECT_RECLAIM	0x400000u
 #define ___GFP_WRITE		0x800000u
 #define ___GFP_KSWAPD_RECLAIM	0x1000000u
+#ifdef CONFIG_LOCKDEP
+#define ___GFP_NOLOCKDEP	0x4000000u
+#else
+#define ___GFP_NOLOCKDEP	0
+#endif
 /* If the above are modified, __GFP_BITS_SHIFT may need updating */
 
 /*
@@ -179,8 +184,11 @@ struct vm_area_struct;
 #define __GFP_NOTRACK	((__force gfp_t)___GFP_NOTRACK)
 #define __GFP_NOTRACK_FALSE_POSITIVE (__GFP_NOTRACK)
 
+/* Disable lockdep for GFP context tracking */
+#define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
+
 /* Room for N __GFP_FOO bits */
-#define __GFP_BITS_SHIFT 25
+#define __GFP_BITS_SHIFT (25 + IS_ENABLED(CONFIG_LOCKDEP))
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
 
 /*
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 25c33dcd86d7..b169339541f5 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2884,6 +2884,10 @@ static void __lockdep_trace_alloc(gfp_t gfp_mask, unsigned long flags)
 	if (DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)))
 		return;
 
+	/* Disable lockdep if explicitly requested */
+	if (gfp_mask & __GFP_NOLOCKDEP)
+		return;
+
 	mark_held_locks(curr, RECLAIM_FS);
 }
 
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 5ed506d648c4..526142afcf8c 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -2284,6 +2284,8 @@ static int radix_tree_cpu_dead(unsigned int cpu)
 void __init radix_tree_init(void)
 {
 	int ret;
+
+	BUILD_BUG_ON(RADIX_TREE_MAX_TAGS + __GFP_BITS_SHIFT > 32);
 	radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
 			sizeof(struct radix_tree_node), 0,
 			SLAB_PANIC | SLAB_RECLAIM_ACCOUNT,