From patchwork Mon Jun 17 17:31:23 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Kamal Mostafa X-Patchwork-Id: 252015 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id C37BE2C0085 for ; Tue, 18 Jun 2013 03:38:47 +1000 (EST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1UodO3-0005Az-9u; Mon, 17 Jun 2013 17:38:39 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtp (Exim 4.76) (envelope-from ) id 1UodNu-0005A8-FQ for kernel-team@lists.ubuntu.com; Mon, 17 Jun 2013 17:38:30 +0000 Received: from c-67-160-231-42.hsd1.ca.comcast.net ([67.160.231.42] helo=fourier) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1UodHN-0007FS-0D; Mon, 17 Jun 2013 17:31:45 +0000 Received: from kamal by fourier with local (Exim 4.80) (envelope-from ) id 1UodHK-00069m-VA; Mon, 17 Jun 2013 10:31:42 -0700 From: Kamal Mostafa To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, kernel-team@lists.ubuntu.com Subject: [PATCH 79/84] swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion Date: Mon, 17 Jun 2013 10:31:23 -0700 Message-Id: <1371490288-23167-80-git-send-email-kamal@canonical.com> X-Mailer: git-send-email 1.8.1.2 In-Reply-To: <1371490288-23167-1-git-send-email-kamal@canonical.com> References: <1371490288-23167-1-git-send-email-kamal@canonical.com> X-Extended-Stable: 3.8 Cc: Kamal Mostafa , Linus Torvalds , Andrew Morton , Shaohua Li , Rafael Aquini X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.14 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: kernel-team-bounces@lists.ubuntu.com 3.8.13.3 -stable review patch. If anyone has any objections, please let me know. ------------------ From: Rafael Aquini commit cbab0e4eec299e9059199ebe6daf48730be46d2b upstream. read_swap_cache_async() can race against get_swap_page(), and stumble across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought into the swapcache yet. This transient swap_map state is expected to be transitory, but the actual placement of discard at scan_swap_map() inserts a wait for I/O completion thus making the thread at read_swap_cache_async() to loop around its -EEXIST case, while the other end at get_swap_page() is scheduled away at scan_swap_map(). This can leave the system deadlocked if the I/O completion happens to be waiting on the CPU waitqueue where read_swap_cache_async() is busy looping and !CONFIG_PREEMPT. This patch introduces a cond_resched() call to make the aforementioned read_swap_cache_async() busy loop condition to bail out when necessary, thus avoiding the subtle race window. Signed-off-by: Rafael Aquini Acked-by: Johannes Weiner Acked-by: KOSAKI Motohiro Acked-by: Hugh Dickins Cc: Shaohua Li Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Kamal Mostafa --- mm/swap_state.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/mm/swap_state.c b/mm/swap_state.c index 0cb36fb..f854fbd 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -314,8 +314,24 @@ struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, * Swap entry may have been freed since our caller observed it. */ err = swapcache_prepare(entry); - if (err == -EEXIST) { /* seems racy */ + if (err == -EEXIST) { radix_tree_preload_end(); + /* + * We might race against get_swap_page() and stumble + * across a SWAP_HAS_CACHE swap_map entry whose page + * has not been brought into the swapcache yet, while + * the other end is scheduled away waiting on discard + * I/O completion at scan_swap_map(). + * + * In order to avoid turning this transitory state + * into a permanent loop around this -EEXIST case + * if !CONFIG_PREEMPT and the I/O completion happens + * to be waiting on the CPU waitqueue where we are now + * busy looping, we just conditionally invoke the + * scheduler here, if there are some more important + * tasks to run. + */ + cond_resched(); continue; } if (err) { /* swp entry is obsolete ? */