Patchwork [Bug,25832] kernel crashes when a mounted ext3/4 file system is physically removed

login
register
mail settings
Submitter Theodore Ts'o
Date Sept. 9, 2011, 7:13 p.m.
Message ID <20110909191354.GC3818@thunk.org>
Download mbox | patch
Permalink /patch/114122/
State New
Headers show

Comments

Theodore Ts'o - Sept. 9, 2011, 7:13 p.m.
Bugzilla.kernel.org is down, so apologies to people who have
subscribed to this bug but which I didn't cc explicitly...

Rocko, Alan, could you try this patch and see what happens.  It may be
that we'll crash somewhere else; the problem is that Linux that the
low-level generic hd routines doesn't have a formal way of informing
the VFS and layers below that the disk has disappeared.  It just yanks
it out from under the file system, and we've been manually patching
around kernel crashes....

     	   	 	   	       	    - Ted

commit 6e478d46e58181ec4814f25a2fd91c6323e16ad4
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Fri Sep 9 15:02:54 2011 -0400

    ext4: add ext4-specific kludge to avoid an oops after the disk disappears
    
    The del_gendisk() function uninitializes the disk-specific data
    structures, including the bdi structure, without telling anyone
    else.  Once this happens, any attempt to call mark_buffer_dirty()
    (for example, by ext4_commit_super), will cause a kernel OOPS.
    
    Fix this for now until we can fix things in an architecturally correct
    way.
    
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Sept. 9, 2011, 10:10 p.m.
On Fri, 9 Sep 2011, Ted Ts'o wrote:

> Rocko, Alan, could you try this patch and see what happens.  It may be
> that we'll crash somewhere else; the problem is that Linux that the
> low-level generic hd routines doesn't have a formal way of informing
> the VFS and layers below that the disk has disappeared.  It just yanks
> it out from under the file system, and we've been manually patching
> around kernel crashes....

I confirm that this patch fixes the issue on with my test script.  The 
unmounts occurred with no apparent problems.

However you probably should make the same change to the ext3 driver, 
because it has exactly the same issue and some people may still be 
using it.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - Sept. 10, 2011, 2:06 p.m.
On Sat, Sep 10, 2011 at 01:49:44AM +0000, Rocko Requin wrote:
> 
> The patch does stop my console-only test script from crashing the
> kernel (thanks for figuring this patch out, Ted!), but if I try it
> from a desktop, the desktop still freezes after two or three
> bind/unbind iterations. So I guess there must be another way to try
> and access the missing file system that also need patching.

Can you get stack traces or register information?  Via sysrq-t /
sysrq-p?  This might require setting up serial console on your desktop
if you don't have kernel mode switching set up so you can switch away
from the X server.

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Sept. 10, 2011, 6:07 p.m.
On Fri, 9 Sep 2011, Ted Ts'o wrote:

> commit 6e478d46e58181ec4814f25a2fd91c6323e16ad4
> Author: Theodore Ts'o <tytso@mit.edu>
> Date:   Fri Sep 9 15:02:54 2011 -0400
> 
>     ext4: add ext4-specific kludge to avoid an oops after the disk disappears
>     
>     The del_gendisk() function uninitializes the disk-specific data
>     structures, including the bdi structure, without telling anyone
>     else.  Once this happens, any attempt to call mark_buffer_dirty()
>     (for example, by ext4_commit_super), will cause a kernel OOPS.
>     
>     Fix this for now until we can fix things in an architecturally correct
>     way.
>     
>     Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

Further testing revealed the following problem.  I changed the test 
script so that after the USB device is unbound, the script tries to 
write a file before unmounting the ext4 filesystem.

There was no drastic failure; the unregistered bdi structure wasn't
accessed.  But lockdep complained.  This is what I got:

[  166.932194] end_request: I/O error, dev uba, sector 136
[  166.940903] EXT4-fs error (device uba): ext4_find_entry:934: inode #2: comm sh: reading directory lblock 0
[  166.949284] end_request: I/O error, dev uba, sector 164
[  166.952084] EXT4-fs error (device uba): ext4_read_inode_bitmap:161: comm sh: Cannot read inode bitmap - block_group = 0, inode_bitmap = 82
[  166.952906] EXT4-fs error (device uba) in ext4_new_inode:1073: IO failure
[  166.953357] 
[  166.953381] =============================================
[  166.953624] [ INFO: possible recursive locking detected ]
[  166.953958] 3.1.0-rc4 #34
[  166.954099] ---------------------------------------------
[  166.954295] sh/819 is trying to acquire lock:
[  166.954613]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<c1101290>] ext4_evict_inode+0x17/0x288
[  166.955947] 
[  166.955969] but task is already holding lock:
[  166.956281]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<c10aeb45>] do_last+0x165/0x4ff
[  166.956586] 
[  166.956586] other info that might help us debug this:
[  166.956586]  Possible unsafe locking scenario:
[  166.956586] 
[  166.956586]        CPU0
[  166.956586]        ----
[  166.956586]   lock(&sb->s_type->i_mutex_key);
[  166.956586]   lock(&sb->s_type->i_mutex_key);
[  166.956586] 
[  166.956586]  *** DEADLOCK ***
[  166.956586] 
[  166.956586]  May be due to missing lock nesting notation
[  166.956586] 
[  166.956586] 2 locks held by sh/819:
[  166.956586]  #0:  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<c10aeb45>] do_last+0x165/0x4ff
[  166.956586]  #1:  (jbd2_handle){+.+...}, at: [<c112469f>] start_this_handle+0x3c2/0x41e
[  166.956586] 
[  166.956586] stack backtrace:
[  166.956586] Pid: 819, comm: sh Not tainted 3.1.0-rc4 #34
[  166.956586] Call Trace:
[  166.956586]  [<c135f26e>] ? printk+0xf/0x11
[  166.956586]  [<c105223c>] __lock_acquire+0x875/0xbe7
[  166.956586]  [<c1361600>] ? _raw_spin_unlock_irq+0x2d/0x30
[  166.956586]  [<c105183a>] ? mark_lock+0x26/0x1b3
[  166.956586]  [<c105183a>] ? mark_lock+0x26/0x1b3
[  166.956586]  [<c1052944>] lock_acquire+0x59/0x70
[  166.956586]  [<c1101290>] ? ext4_evict_inode+0x17/0x288
[  166.956586]  [<c13601f9>] __mutex_lock_common+0x38/0x2d4
[  166.956586]  [<c1101290>] ? ext4_evict_inode+0x17/0x288
[  166.956586]  [<c1360573>] mutex_lock_nested+0x32/0x3b
[  166.956586]  [<c1101290>] ? ext4_evict_inode+0x17/0x288
[  166.956586]  [<c1101290>] ext4_evict_inode+0x17/0x288
[  166.956586]  [<c10b5f63>] evict+0x7b/0x11c
[  166.956586]  [<c10b6136>] iput+0x132/0x137
[  166.956586]  [<c10fc467>] ext4_new_inode+0xa53/0xa92
[  166.956586]  [<c1108942>] ? ext4_journal_start_sb+0xdd/0xec
[  166.956586]  [<c10b4afb>] ? d_splice_alias+0xa9/0xb1
[  166.956586]  [<c11045ec>] ext4_create+0xa6/0x10b
[  166.956586]  [<c10ae2d7>] vfs_create+0x61/0x7b
[  166.956586]  [<c10aebd7>] do_last+0x1f7/0x4ff
[  166.956586]  [<c10aefa1>] path_openat+0x9d/0x2b7
[  166.956586]  [<c1052636>] ? lock_release_non_nested+0x88/0x1f7
[  166.956586]  [<c10af1f3>] do_filp_open+0x21/0x5d
[  166.956586]  [<c1361666>] ? _raw_spin_unlock+0x1d/0x2a
[  166.956586]  [<c10b78b1>] ? alloc_fd+0xc0/0xcb
[  166.956586]  [<c10a4207>] do_sys_open+0x54/0xcd
[  166.956586]  [<c10a429e>] sys_open+0x1e/0x26
[  166.956586]  [<c1361820>] syscall_call+0x7/0xb
[  167.175766] end_request: I/O error, dev uba, sector 16534
[  167.177204] Aborting journal on device uba-8.
[  167.179255] end_request: I/O error, dev uba, sector 16516
[  167.179768] Buffer I/O error on device uba, logical block 8258
[  167.179983] lost page write due to I/O error on uba
[  167.180866] JBD2: I/O error detected when updating journal superblock for uba-8.
[  167.181956] journal commit I/O error
[  167.195334] EXT4-fs error (device uba): ext4_put_super:817: Couldn't clean up the journal
[  167.195777] EXT4-fs (uba): Remounting filesystem read-only

It appears to be an unrelated error, but worth looking at.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alan Stern - Sept. 12, 2011, 1:58 a.m.
Ted:

You also ought to look at this bug report and the follow-up thread.  
The symptoms are similar, although not exactly the same.

	http://marc.info/?l=linux-kernel&m=131504588401397&w=2
	([BUG] D state process after unplug and umount usb disk)

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index ee2f74a..48cb615 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -414,6 +414,22 @@  static void save_error_info(struct super_block *sb, const char *func,
 	ext4_commit_super(sb, 1);
 }
 
+/*
+ * The del_gendisk() function deactivates the inode and deactivates
+ * the bdi without telling the file system.  Once this happens, any
+ * attempt to call mark_buffer_dirty() (for example, by
+ * ext4_commit_super), will cause a kernel OOPS.  This is a kludge to
+ * prevent these oops until we can put in a proper hook in
+ * del_gendisk() to inform the VFS and file system layers.
+ */
+static int block_device_ejected(struct super_block *sb)
+{
+	struct inode *bd_inode = sb->s_bdev->bd_inode;
+	struct backing_dev_info *bdi = bd_inode->i_mapping->backing_dev_info;
+
+	return bdi->dev == NULL;
+}
+
 
 /* Deal with the reporting of failure conditions on a filesystem such as
  * inconsistencies detected or read IO failures.
@@ -4072,7 +4088,7 @@  static int ext4_commit_super(struct super_block *sb, int sync)
 	struct buffer_head *sbh = EXT4_SB(sb)->s_sbh;
 	int error = 0;
 
-	if (!sbh)
+	if (!sbh || block_device_ejected(sb))
 		return error;
 	if (buffer_write_io_error(sbh)) {
 		/*