From patchwork Wed Mar 20 01:29:19 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Theodore Ts'o X-Patchwork-Id: 229239 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id DC0932C00A2 for ; Wed, 20 Mar 2013 12:29:25 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757449Ab3CTB3Z (ORCPT ); Tue, 19 Mar 2013 21:29:25 -0400 Received: from li9-11.members.linode.com ([67.18.176.11]:54447 "EHLO imap.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754829Ab3CTB3Y (ORCPT ); Tue, 19 Mar 2013 21:29:24 -0400 Received: from root (helo=closure.thunk.org) by imap.thunk.org with local-esmtp (Exim 4.80) (envelope-from ) id 1UI7qC-0004pm-CT; Wed, 20 Mar 2013 01:29:20 +0000 Received: by closure.thunk.org (Postfix, from userid 15806) id 201953608F7; Tue, 19 Mar 2013 21:29:19 -0400 (EDT) From: Theodore Ts'o To: Ext4 Developers List Cc: Theodore Ts'o , Jan Kara Subject: [PATCH] ext4: fix ext4_evict_inode() racing against workqueue processing code Date: Tue, 19 Mar 2013 21:29:19 -0400 Message-Id: <1363742959-12815-1-git-send-email-tytso@mit.edu> X-Mailer: git-send-email 1.7.12.rc0.22.gcdd159b X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: tytso@thunk.org X-SA-Exim-Scanned: No (on imap.thunk.org); SAEximRunCond expanded to false Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Commit 84c17543ab56 (ext4: move work from io_end to inode) triggered a regression when running xfstest #270 when the file system is mounted with dioread_nolock. The problem is that after ext4_evict_inode() calls ext4_ioend_wait(), this guarantees that last io_end structure has been freed, but it does not guarantee that the workqueue structure, which was moved into the inode by commit 84c17543ab56, is actually finished. Once ext4_flush_completed_IO() calls ext4_free_io_end() on CPU #1, this will allow ext4_ioend_wait() to return on CPU #2, at which point the evict_inode() codepath can race against the workqueue code on CPU #1 accessing EXT4_I(inode)->i_unwritten_work to find the next item of work to do. Fix this by calling flush_work() if the work structure is still pending in ext$_ioend_wait(). Also, move the call to ext4_ioend_wait() until after truncate_inode_pages() and filemap_write_and_wait() are called, to make sure all dirty pages have been written back and flushed from the page cache first. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] cwq_activate_delayed_work+0x3b/0x7e *pdpt = 0000000030bc3001 *pde = 0000000000000000 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: Pid: 6, comm: kworker/u:0 Not tainted 3.8.0-rc3-00013-g84c1754-dirty #91 Bochs Bochs EIP: 0060:[] EFLAGS: 00010046 CPU: 0 EIP is at cwq_activate_delayed_work+0x3b/0x7e EAX: 00000000 EBX: 00000000 ECX: f505fe54 EDX: 00000000 ESI: ed5b697c EDI: 00000006 EBP: f64b7e8c ESP: f64b7e84 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 30bc2000 CR4: 000006f0 DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 DR6: ffff0ff0 DR7: 00000400 Process kworker/u:0 (pid: 6, ti=f64b6000 task=f64b4160 task.ti=f64b6000) Stack: f505fe00 00000006 f64b7e9c c01de3d7 f6435540 00000003 f64b7efc c01def1d f6435540 00000002 00000000 0000008a c16d0808 c040a10b c16d07d8 c16d08b0 f505fe00 c16d0780 00000000 00000000 ee153df4 c1ce4a30 c17d0e30 00000000 Call Trace: [] cwq_dec_nr_in_flight+0x71/0xfb [] process_one_work+0x5d8/0x637 [] ? ext4_end_bio+0x300/0x300 [] worker_thread+0x249/0x3ef [] kthread+0xd8/0xeb [] ? manage_workers+0x4bb/0x4bb [] ? trace_hardirqs_on+0x27/0x37 [] ret_from_kernel_thread+0x1b/0x28 [] ? __init_kthread_worker+0x71/0x71 Code: 01 83 15 ac ff 6c c1 00 31 db 89 c6 8b 00 a8 04 74 12 89 c3 30 db 83 05 b0 ff 6c c1 01 83 15 b4 ff 6c c1 00 89 f0 e8 42 ff ff ff <8b> 13 89 f0 83 05 b8 ff 6c c1 6c c1 00 31 c9 83 EIP: [] cwq_activate_delayed_work+0x3b/0x7e SS:ESP 0068:f64b7e84 CR2: 0000000000000000 ---[ end trace a1923229da53d8a4 ]--- Signed-off-by: "Theodore Ts'o" Cc: Jan Kara --- fs/ext4/inode.c | 4 ++-- fs/ext4/page-io.c | 10 ++++++++++ 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 65bbc93..15e0da1 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -185,8 +185,6 @@ void ext4_evict_inode(struct inode *inode) trace_ext4_evict_inode(inode); - ext4_ioend_wait(inode); - if (inode->i_nlink) { /* * When journalling data dirty buffers are tracked only in the @@ -216,6 +214,7 @@ void ext4_evict_inode(struct inode *inode) filemap_write_and_wait(&inode->i_data); } truncate_inode_pages(&inode->i_data, 0); + ext4_ioend_wait(inode); goto no_delete; } @@ -225,6 +224,7 @@ void ext4_evict_inode(struct inode *inode) if (ext4_should_order_data(inode)) ext4_begin_ordered_truncate(inode, 0); truncate_inode_pages(&inode->i_data, 0); + ext4_ioend_wait(inode); if (is_bad_inode(inode)) goto no_delete; diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c index 809b310..8a004a4 100644 --- a/fs/ext4/page-io.c +++ b/fs/ext4/page-io.c @@ -50,11 +50,21 @@ void ext4_exit_pageio(void) kmem_cache_destroy(io_page_cachep); } +/* + * Called by ext4_evict_inode() to make sure there are no pending I/O + * completion work left to do. + */ void ext4_ioend_wait(struct inode *inode) { wait_queue_head_t *wq = ext4_ioend_wq(inode); wait_event(*wq, (atomic_read(&EXT4_I(inode)->i_ioend_count) == 0)); + /* + * We need to make sure the work structure is finished before + * we let the inode get destroyed by ext4_evict_inode() + */ + if (work_pending(&EXT4_I(inode)->i_unwritten_work)) + flush_work(&EXT4_I(inode)->i_unwritten_work); } static void put_io_page(struct ext4_io_page *io_page)