diff mbox

[1/1] ext4: fix hole punch failure when depth is greater than 0

Message ID 1338727489-16867-1-git-send-email-ashish.sangwan2@gmail.com
State Superseded, archived
Headers show

Commit Message

Ashish Sangwan June 3, 2012, 12:44 p.m. UTC
Currently, hole punch will not succeed in case the depth of extent tree is 
greater than 0 and the last block to be removed is part of the extent which belongs
to any but the last extent index.

The current implementation of hole punch uses the same code path as for truncate in which 
the block removal starts from the end of the file and continues backwards removing 
extents and in the process decreases the count of valid entries in the extent header of extent index.
If all the extents under current extent index are removed than this index is also removed 
and the number of valid entries in the "depth -1" extent header is decreased by 1.

Whether to continue removing extents or not is decided by the return value of function
ext4_ext_more_to_rm() which checks 2 conditions a) if the number of entries are decreased 
in the header of "depth -1" or b) if there are no more indexes to process.

In case of hole punch, if the last block to be removed is not part of the last extent index
than this index will not be deleted, hence the number of valid entries in the extent
header of "depth - 1" will remain as it is and ext4_ext_more_to_rm will return 0 and the control
would come out of ext4_ext_remove_space although the required blocks are not yet removed.

This patch fixes the above mentioned problem as instead of starting removing the extents from the 
end of file, it starts removing the blocks from the particular extent from which removing blocks is
actually required and continues backward until done.

Signed-off-by: Ashish Sangwan <ashish.sangwan2@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
---
 fs/ext4/extents.c |   36 +++++++++++++++++++++---------------
 1 files changed, 21 insertions(+), 15 deletions(-)

Comments

Theodore Ts'o June 8, 2012, 2:52 a.m. UTC | #1
I tried applying this patch, and it causes fsstress to die in xfstests
#13.  I took a quick look, but the truncate/punch codepaths are one of
the more obscure parts of ext4, and it's not obvious what causing the
null dereference, which at first glance looks like is coming from
path->b_bh->b_data.

Also, please note that commit descriptions should be wrapped around
70-72 columns, so that "git log" is reasonable to look at on 80 column
terminals.

						- Ted

013	[   62.356981] BUG: unable to handle kernel NULL pointer dereference at 00000019
[   62.357880] IP: [<c029413f>] __ext4_ext_dirty+0x1c/0x52
[   62.358836] *pdpt = 00000000352d2001 *pde = 0000000000000000 
[   62.359528] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[   62.360046] Modules linked in:
[   62.360046] 
[   62.360046] Pid: 7178, comm: fsstress Not tainted 3.5.0-rc1-00003-g69e292a-dirty #123 Bochs Bochs
[   62.360046] EIP: 0060:[<c029413f>] EFLAGS: 00010202 CPU: 0
[   62.360046] EIP is at __ext4_ext_dirty+0x1c/0x52
[   62.360046] EAX: 00000001 EBX: 000009c9 ECX: f603b948 EDX: 000009c9
[   62.360046] ESI: c07234c4 EDI: f6bc05a0 EBP: efd7ddd4 ESP: efd7ddc4
[   62.360046]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[   62.360046] CR0: 8005003b CR2: 00000019 CR3: 352d1000 CR4: 000006f0
[   62.360046] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   62.360046] DR6: ffff0ff0 DR7: 00000400
[   62.360046] Process fsstress (pid: 7178, ti=efd7c000 task=f69214b0 task.ti=efd7c000)
[   62.360046] Stack:
[   62.360046]  f627f1ec 00000050 f627f1ec 00000007 efd7de8c c02962b9 f627f2e0 f6bc05a0
[   62.360046]  00000008 ffffffff ffffffff 80504000 f603b948 f62f1150 00000000 00000fe3
[   62.360046]  00010c15 00000000 f603b948 c071edb4 f5da6000 00000001 f627f1ec 00000012
[   62.360046] Call Trace:
[   62.360046]  [<c02962b9>] ext4_ext_remove_space+0x6c2/0xaeb
[   62.360046]  [<c0278998>] ? ext4_reserve_inode_write+0x38/0x6a
[   62.360046]  [<c0296bd3>] ext4_ext_truncate+0x127/0x199
[   62.360046]  [<c027a165>] ext4_truncate+0x7d/0xac
[   62.360046]  [<c027a483>] ext4_setattr+0x2ef/0x377
[   62.360046]  [<c0222636>] notify_change+0x1a6/0x27a
[   62.360046]  [<c020e69d>] do_truncate+0x69/0x82
[   62.360046]  [<c035094b>] ? security_inode_permission+0x1c/0x22
[   62.360046]  [<c020e8f7>] do_sys_truncate+0x12f/0x135
[   62.360046]  [<c020e913>] sys_truncate64+0x16/0x18
[   62.360046]  [<c06ec7c5>] syscall_call+0x7/0xb
[   62.360046]  [<c06e0000>] ? pci_acpi_scan_root+0x12a/0x258
[   62.360046] Code: 89 d1 ba 8b 00 00 00 e8 3b 4e 00 00 5a c9 c3 55 89 e5 57 56 53 83 ec 04 3e 8d 74 26 00 8b 7d 0c 89 c6 89 d3 8b 47 18 85 c0 74 24 <8b> 50 18 8b 45 08 89 4d f0 e8 6d ff ff ff 8b 4d f0 89 da 89 f0 
[   62.360046] EIP: [<c029413f>] __ext4_ext_dirty+0x1c/0x52 SS:ESP 0068:efd7ddc4
[   62.360046] CR2: 0000000000000019
[   62.395961] ---[ end trace d13d52b79a9cd4d5 ]---

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ashish Sangwan June 12, 2012, 5:47 a.m. UTC | #2
Hi Ted,

Thanks for checking the patch.
The reason for the crash is the "again" tag at the end of function
ext4_ext_remove_space() which restarts the truncate operation.
Setting path = NULL here, before doing truncate again solves the problem.
out:
 	ext4_ext_drop_refs(path);
 	kfree(path);
+	path = NULL;   <= Required fix
 	if (err == -EAGAIN)
 		goto again;

We will again submit proper patch for it.

On Fri, Jun 8, 2012 at 8:22 AM, Ted Ts'o <tytso@mit.edu> wrote:
> I tried applying this patch, and it causes fsstress to die in xfstests
> #13.  I took a quick look, but the truncate/punch codepaths are one of
> the more obscure parts of ext4, and it's not obvious what causing the
> null dereference, which at first glance looks like is coming from
> path->b_bh->b_data.
>
> Also, please note that commit descriptions should be wrapped around
> 70-72 columns, so that "git log" is reasonable to look at on 80 column
> terminals.
>
>                                                - Ted
>
> 013     [   62.356981] BUG: unable to handle kernel NULL pointer dereference at 00000019
> [   62.357880] IP: [<c029413f>] __ext4_ext_dirty+0x1c/0x52
> [   62.358836] *pdpt = 00000000352d2001 *pde = 0000000000000000
> [   62.359528] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> [   62.360046] Modules linked in:
> [   62.360046]
> [   62.360046] Pid: 7178, comm: fsstress Not tainted 3.5.0-rc1-00003-g69e292a-dirty #123 Bochs Bochs
> [   62.360046] EIP: 0060:[<c029413f>] EFLAGS: 00010202 CPU: 0
> [   62.360046] EIP is at __ext4_ext_dirty+0x1c/0x52
> [   62.360046] EAX: 00000001 EBX: 000009c9 ECX: f603b948 EDX: 000009c9
> [   62.360046] ESI: c07234c4 EDI: f6bc05a0 EBP: efd7ddd4 ESP: efd7ddc4
> [   62.360046]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> [   62.360046] CR0: 8005003b CR2: 00000019 CR3: 352d1000 CR4: 000006f0
> [   62.360046] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [   62.360046] DR6: ffff0ff0 DR7: 00000400
> [   62.360046] Process fsstress (pid: 7178, ti=efd7c000 task=f69214b0 task.ti=efd7c000)
> [   62.360046] Stack:
> [   62.360046]  f627f1ec 00000050 f627f1ec 00000007 efd7de8c c02962b9 f627f2e0 f6bc05a0
> [   62.360046]  00000008 ffffffff ffffffff 80504000 f603b948 f62f1150 00000000 00000fe3
> [   62.360046]  00010c15 00000000 f603b948 c071edb4 f5da6000 00000001 f627f1ec 00000012
> [   62.360046] Call Trace:
> [   62.360046]  [<c02962b9>] ext4_ext_remove_space+0x6c2/0xaeb
> [   62.360046]  [<c0278998>] ? ext4_reserve_inode_write+0x38/0x6a
> [   62.360046]  [<c0296bd3>] ext4_ext_truncate+0x127/0x199
> [   62.360046]  [<c027a165>] ext4_truncate+0x7d/0xac
> [   62.360046]  [<c027a483>] ext4_setattr+0x2ef/0x377
> [   62.360046]  [<c0222636>] notify_change+0x1a6/0x27a
> [   62.360046]  [<c020e69d>] do_truncate+0x69/0x82
> [   62.360046]  [<c035094b>] ? security_inode_permission+0x1c/0x22
> [   62.360046]  [<c020e8f7>] do_sys_truncate+0x12f/0x135
> [   62.360046]  [<c020e913>] sys_truncate64+0x16/0x18
> [   62.360046]  [<c06ec7c5>] syscall_call+0x7/0xb
> [   62.360046]  [<c06e0000>] ? pci_acpi_scan_root+0x12a/0x258
> [   62.360046] Code: 89 d1 ba 8b 00 00 00 e8 3b 4e 00 00 5a c9 c3 55 89 e5 57 56 53 83 ec 04 3e 8d 74 26 00 8b 7d 0c 89 c6 89 d3 8b 47 18 85 c0 74 24 <8b> 50 18 8b 45 08 89 4d f0 e8 6d ff ff ff 8b 4d f0 89 da 89 f0
> [   62.360046] EIP: [<c029413f>] __ext4_ext_dirty+0x1c/0x52 SS:ESP 0068:efd7ddc4
> [   62.360046] CR2: 0000000000000019
> [   62.395961] ---[ end trace d13d52b79a9cd4d5 ]---
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 91341ec..68e57ad 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2570,10 +2570,10 @@  static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
 {
 	struct super_block *sb = inode->i_sb;
 	int depth = ext_depth(inode);
-	struct ext4_ext_path *path;
+	struct ext4_ext_path *path = NULL;
 	ext4_fsblk_t partial_cluster = 0;
 	handle_t *handle;
-	int i, err;
+	int i = 0, err;
 
 	ext_debug("truncate since %u to %u\n", start, end);
 
@@ -2637,8 +2637,6 @@  again:
 			if (err < 0)
 				goto out;
 		}
-		ext4_ext_drop_refs(path);
-		kfree(path);
 	}
 cont:
 
@@ -2647,19 +2645,27 @@  cont:
 	 * after i_size and walking into the tree depth-wise.
 	 */
 	depth = ext_depth(inode);
-	path = kzalloc(sizeof(struct ext4_ext_path) * (depth + 1), GFP_NOFS);
-	if (path == NULL) {
-		ext4_journal_stop(handle);
-		return -ENOMEM;
-	}
-	path[0].p_depth = depth;
-	path[0].p_hdr = ext_inode_hdr(inode);
+	if (path) {
+		i = depth;
+		if (depth > 0)
+			path[depth - 1].p_block =
+			le16_to_cpu(path[depth - 1].p_hdr->eh_entries)+1;
+	} else {
+		path =
+		kzalloc(sizeof(struct ext4_ext_path) * (depth + 1), GFP_NOFS);
+		if (path == NULL) {
+			ext4_journal_stop(handle);
+			return -ENOMEM;
+		}
+		path[0].p_depth = depth;
+		path[0].p_hdr = ext_inode_hdr(inode);
 
-	if (ext4_ext_check(inode, path[0].p_hdr, depth)) {
-		err = -EIO;
-		goto out;
+		if (ext4_ext_check(inode, path[0].p_hdr, depth)) {
+			err = -EIO;
+			goto out;
+		}
 	}
-	i = err = 0;
+	err = 0;
 
 	while (i >= 0 && err == 0) {
 		if (i == depth) {