From patchwork Thu Feb 19 00:28:33 2015
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Kamal Mostafa
X-Patchwork-Id: 441392
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19])
	by ozlabs.org (Postfix) with ESMTP id 4675C1400F1;
	Thu, 19 Feb 2015 11:33:41 +1100 (AEDT)
Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com)
	by huckleberry.canonical.com with esmtp (Exim 4.76)
	(envelope-from ) id 1YOF3h-0005qU-Ml;
	Thu, 19 Feb 2015 00:33:37 +0000
Received: from youngberry.canonical.com ([91.189.89.112])
	by huckleberry.canonical.com with esmtp (Exim 4.76)
	(envelope-from ) id 1YOEyq-0002ud-JW
	for kernel-team@lists.ubuntu.com; Thu, 19 Feb 2015 00:28:36 +0000
Received: from 1.general.kamal.us.vpn ([10.172.68.52] helo=fourier)
	by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from ) id 1YOEyq-0004BP-1w;
	Thu, 19 Feb 2015 00:28:36 +0000
Received: from kamal by fourier with local (Exim 4.82)
	(envelope-from ) id 1YOEyn-0003oe-K9;
	Wed, 18 Feb 2015 16:28:33 -0800
From: Kamal Mostafa
To: Johannes Weiner
Subject: [3.13.y-ckt stable] Patch "mm: protect set_page_dirty() from ongoing truncation" has been added to staging queue
Date: Wed, 18 Feb 2015 16:28:33 -0800
Message-Id: <1424305713-14638-1-git-send-email-kamal@canonical.com>
X-Mailer: git-send-email 1.9.1
X-Extended-Stable: 3.13
Cc: Jan Kara, Kamal Mostafa, kernel-team@lists.ubuntu.com, Tejun Heo, Andrew Morton, Linus Torvalds, "Kirill A. Shutemov"
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.14
Precedence: list
List-Id: Kernel team discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Errors-To: kernel-team-bounces@lists.ubuntu.com
Sender: kernel-team-bounces@lists.ubuntu.com

This is a note to let you know that I have just added a patch titled

    mm: protect set_page_dirty() from ongoing truncation

to the linux-3.13.y-queue branch of the 3.13.y-ckt extended stable tree
which can be found at:

    http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.13.y-queue

This patch is scheduled to be released in version 3.13.11-ckt16.

If you, or anyone else, feels it should not be added to this tree, please
reply to this email.

For more information about the 3.13.y-ckt tree, see
https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable

Thanks.
-Kamal

------

From 5edf42b80a22097f05e162fcd28d66dea6dcb5a8 Mon Sep 17 00:00:00 2001
From: Johannes Weiner
Date: Thu, 8 Jan 2015 14:32:18 -0800
Subject: mm: protect set_page_dirty() from ongoing truncation

commit 2d6d7f98284648c5ed113fe22a132148950b140f upstream.

Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:

__set_page_dirty_nobuffers()       __delete_from_page_cache()
  if (TestSetPageDirty(page))
                                     page->mapping = NULL
                                     if (PageDirty())
                                       dec_zone_page_state(page, NR_FILE_DIRTY);
                                       dec_bdi_stat(mapping->backing_dev_info,
                                                    BDI_RECLAIMABLE);
    if (page->mapping)
      account_page_dirtied(page)
        __inc_zone_page_state(page, NR_FILE_DIRTY);
        __inc_bdi_stat(mapping->backing_dev_info,
                       BDI_RECLAIMABLE);

which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.

Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held.
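The interleaving above can be made concrete with a small stand-alone
sketch. This is deliberately not kernel code: the struct, the single
counter and the helper names below are invented for illustration, with
one int standing in for both NR_FILE_DIRTY and BDI_RECLAIMABLE.

#include <stdio.h>
#include <stdbool.h>

struct page {
	bool dirty;
	void *mapping;		/* non-NULL while the page is in the cache */
};

static int nr_file_dirty;	/* stands in for NR_FILE_DIRTY/BDI_RECLAIMABLE */

static bool test_set_dirty(struct page *page)
{
	bool was_dirty = page->dirty;

	page->dirty = true;
	return was_dirty;
}

/* the dirtier, split in two so the race window can be scripted */
static void dirty_step1(struct page *page)
{
	test_set_dirty(page);		/* page is now dirty */
}

static void dirty_step2(struct page *page)
{
	if (page->mapping)		/* skipped: truncation already ran */
		nr_file_dirty++;	/* the "account_page_dirtied()" side */
}

/* the truncation side */
static void truncate_page(struct page *page)
{
	page->mapping = NULL;
	if (page->dirty)
		nr_file_dirty--;	/* the "dec_zone_page_state()" side */
}

int main(void)
{
	struct page page = { .dirty = false, .mapping = (void *)1 };

	dirty_step1(&page);	/* dirtier: test-and-set succeeds        */
	truncate_page(&page);	/* truncation: mapping = NULL, counter-- */
	dirty_step2(&page);	/* dirtier: mapping is NULL, no counter++ */

	printf("NR_FILE_DIRTY ends up at %d (should be 0)\n", nr_file_dirty);
	return 0;
}

Compiled and run, it prints -1: a decrement with no matching increment,
which is exactly the imbalance described above.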
The notable exception to this rule, though, is do_wp_page(), for which
this race exists.  However, do_wp_page() already waits for a locked
page to unlock before setting the dirty bit, in order to prevent a race
where clear_page_dirty() misses the page bit in the presence of dirty
ptes.  Upgrade that wait to a fully locked set_page_dirty() to also
cover the situation explained above.

Afterwards, the code in set_page_dirty() dealing with a truncation race
is no longer needed.  Remove it.

Reported-by: Tejun Heo
Signed-off-by: Johannes Weiner
Acked-by: Kirill A. Shutemov
Reviewed-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
[ kamal: backport to 3.13-stable: no VM_BUG_ON_PAGE ]
Signed-off-by: Kamal Mostafa
---
 include/linux/writeback.h |  1 -
 mm/memory.c               | 27 +++++++++++++++++----------
 mm/page-writeback.c       | 43 ++++++++++++-------------------------------
 3 files changed, 29 insertions(+), 42 deletions(-)

--
1.9.1

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 021b8a3..0aa44e2 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -178,7 +178,6 @@ int write_cache_pages(struct address_space *mapping,
 		      struct writeback_control *wbc, writepage_t writepage,
 		      void *data);
 int do_writepages(struct address_space *mapping, struct writeback_control *wbc);
-void set_page_dirty_balance(struct page *page, int page_mkwrite);
 void writeback_set_ratelimit(void);
 void tag_pages_for_writeback(struct address_space *mapping,
 			     pgoff_t start, pgoff_t end);
diff --git a/mm/memory.c b/mm/memory.c
index a5017dd..b24ac03 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2744,17 +2744,24 @@ reuse:
 		if (!dirty_page)
 			return ret;
 
-		/*
-		 * Yes, Virginia, this is actually required to prevent a race
-		 * with clear_page_dirty_for_io() from clearing the page dirty
-		 * bit after it clear all dirty ptes, but before a racing
-		 * do_wp_page installs a dirty pte.
-		 *
-		 * __do_fault is protected similarly.
-		 */
 		if (!page_mkwrite) {
-			wait_on_page_locked(dirty_page);
-			set_page_dirty_balance(dirty_page, page_mkwrite);
+			struct address_space *mapping;
+			int dirtied;
+
+			lock_page(dirty_page);
+			dirtied = set_page_dirty(dirty_page);
+			VM_BUG_ON(PageAnon(dirty_page));
+			mapping = dirty_page->mapping;
+			unlock_page(dirty_page);
+
+			if (dirtied && mapping) {
+				/*
+				 * Some device drivers do not set page.mapping
+				 * but still dirty their pages
+				 */
+				balance_dirty_pages_ratelimited(mapping);
+			}
+
 			/* file_update_time outside page_lock */
 			if (vma->vm_file)
 				file_update_time(vma->vm_file);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 9f45f87..145044b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1562,16 +1562,6 @@ pause:
 		bdi_start_background_writeback(bdi);
 }
 
-void set_page_dirty_balance(struct page *page, int page_mkwrite)
-{
-	if (set_page_dirty(page) || page_mkwrite) {
-		struct address_space *mapping = page_mapping(page);
-
-		if (mapping)
-			balance_dirty_pages_ratelimited(mapping);
-	}
-}
-
 static DEFINE_PER_CPU(int, bdp_ratelimits);
 
 /*
@@ -2161,32 +2151,25 @@ EXPORT_SYMBOL(account_page_writeback);
  * page dirty in that case, but not all the buffers.  This is a "bottom-up"
  * dirtying, whereas __set_page_dirty_buffers() is a "top-down" dirtying.
  *
- * Most callers have locked the page, which pins the address_space in memory.
- * But zap_pte_range() does not lock the page, however in that case the
- * mapping is pinned by the vma's ->vm_file reference.
- *
- * We take care to handle the case where the page was truncated from the
- * mapping by re-checking page_mapping() inside tree_lock.
+ * The caller must ensure this doesn't race with truncation.  Most will simply
+ * hold the page lock, but e.g. zap_pte_range() calls with the page mapped and
+ * the pte lock held, which also locks out truncation.
  */
 int __set_page_dirty_nobuffers(struct page *page)
 {
 	if (!TestSetPageDirty(page)) {
 		struct address_space *mapping = page_mapping(page);
-		struct address_space *mapping2;
 		unsigned long flags;
 
 		if (!mapping)
 			return 1;
 
 		spin_lock_irqsave(&mapping->tree_lock, flags);
-		mapping2 = page_mapping(page);
-		if (mapping2) { /* Race with truncate? */
-			BUG_ON(mapping2 != mapping);
-			WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
-			account_page_dirtied(page, mapping);
-			radix_tree_tag_set(&mapping->page_tree,
-				page_index(page), PAGECACHE_TAG_DIRTY);
-		}
+		BUG_ON(page_mapping(page) != mapping);
+		WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
+		account_page_dirtied(page, mapping);
+		radix_tree_tag_set(&mapping->page_tree, page_index(page),
+				   PAGECACHE_TAG_DIRTY);
 		spin_unlock_irqrestore(&mapping->tree_lock, flags);
 		if (mapping->host) {
 			/* !PageAnon && !swapper_space */
@@ -2343,12 +2326,10 @@ int clear_page_dirty_for_io(struct page *page)
 		/*
 		 * We carefully synchronise fault handlers against
 		 * installing a dirty pte and marking the page dirty
-		 * at this point. We do this by having them hold the
-		 * page lock at some point after installing their
-		 * pte, but before marking the page dirty.
-		 * Pages are always locked coming in here, so we get
-		 * the desired exclusion. See mm/memory.c:do_wp_page()
-		 * for more comments.
+		 * at this point.  We do this by having them hold the
+		 * page lock while dirtying the page, and pages are
+		 * always locked coming in here, so we get the desired
+		 * exclusion.
 		 */
 		if (TestClearPageDirty(page)) {
 			dec_zone_page_state(page, NR_FILE_DIRTY);
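As a companion to the sketch given after the race diagram above, the
locking discipline that the do_wp_page() hunk adopts can be illustrated
the same way. Again this is not kernel code; the pthread mutex merely
stands in for the page lock, and all names are invented for the example.

/* build: cc -pthread sketch.c */
#include <stdio.h>
#include <stdbool.h>
#include <pthread.h>

struct page {
	pthread_mutex_t lock;	/* stands in for the page lock */
	bool dirty;
	void *mapping;
};

static int nr_file_dirty;

/* the dirtier: set the dirty bit and account under the "page lock" */
static void dirty_locked(struct page *page)
{
	pthread_mutex_lock(&page->lock);
	if (!page->dirty) {
		page->dirty = true;
		if (page->mapping)	/* mapping sampled under the lock */
			nr_file_dirty++;
	}
	pthread_mutex_unlock(&page->lock);
}

/* truncation takes the same lock, so it runs entirely before or after */
static void truncate_locked(struct page *page)
{
	pthread_mutex_lock(&page->lock);
	page->mapping = NULL;
	if (page->dirty)
		nr_file_dirty--;
	pthread_mutex_unlock(&page->lock);
}

int main(void)
{
	struct page page = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.dirty = false,
		.mapping = (void *)1,
	};

	dirty_locked(&page);	/* counter++ happens under the lock */
	truncate_locked(&page);	/* counter-- pairs with it          */

	printf("NR_FILE_DIRTY ends up at %d (balanced)\n", nr_file_dirty);
	return 0;
}

Because the dirty bit is set and page->mapping is sampled under the same
lock that truncation takes, the increment and decrement always pair up
and the counter stays balanced, regardless of which side runs first.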