diff mbox

[2/3] ext4: remove incorrect check for inode journal mode in ext4_writepages()

Message ID 1447810474-14840-2-git-send-email-daeho.jeong@samsung.com
State New, archived
Headers show

Commit Message

Daeho Jeong Nov. 18, 2015, 1:34 a.m. UTC
Now, in ext4, there is only one writepages() function and it is shared
by all the inode modes. Therefore, BUG_ON() for checking journaled
inode mode in ext4_writepages() is not correct anymore because, if
per-file data journaling of a file is enabled while ext4_writepages()
is being executed, this BUG_ON() function can cause a kernel panic
unintentionally even on "nodelalloc" mode.

Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
---
 fs/ext4/inode.c |    6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Jan Kara Nov. 30, 2015, 2:08 p.m. UTC | #1
On Wed 18-11-15 10:34:33, Daeho Jeong wrote:
> Now, in ext4, there is only one writepages() function and it is shared
> by all the inode modes. Therefore, BUG_ON() for checking journaled
> inode mode in ext4_writepages() is not correct anymore because, if
> per-file data journaling of a file is enabled while ext4_writepages()
> is being executed, this BUG_ON() function can cause a kernel panic
> unintentionally even on "nodelalloc" mode.
> 
> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com>
> ---
>  fs/ext4/inode.c |    6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 1f9458e..db24348 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2480,13 +2480,11 @@ retry:
>  		}
>  
>  		/*
> -		 * We have two constraints: We find one extent to map and we
> +		 * We have a constraint: We find one extent to map and we
>  		 * must always write out whole page (makes a difference when
>  		 * blocksize < pagesize) so that we don't block on IO when we
> -		 * try to write out the rest of the page. Journalled mode is
> -		 * not supported by delalloc.
> +		 * try to write out the rest of the page.
>  		 */

Well, it is still true that journalled mode is not supported by delalloc so
I would not delete the comment.

Actually what you do only hides the real problem - that ext4_writepages()
in non-journalled mode can be still running for an inode which is already
switched to journalled mode. In theory if we manage to dirty some pages
after the switch, non-journalled writepages *can* see them an try to write
them back which will break spectacularly. So to fix this we need something
like a writeback barrier for the inode - make sure all ext4_writepages()
calls have completed before switching aops. Now I'd hate to grow struct
ext4_inode_info only for this extra rare case so we could probably
implement the barrier on per-filesystem basis - a fs-wide per-cpu rw
semaphore acquired for reading while ext4_writepages() runs and acquired
for writing when we switch aops for some inode.

								Honza
diff mbox

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1f9458e..db24348 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2480,13 +2480,11 @@  retry:
 		}
 
 		/*
-		 * We have two constraints: We find one extent to map and we
+		 * We have a constraint: We find one extent to map and we
 		 * must always write out whole page (makes a difference when
 		 * blocksize < pagesize) so that we don't block on IO when we
-		 * try to write out the rest of the page. Journalled mode is
-		 * not supported by delalloc.
+		 * try to write out the rest of the page.
 		 */
-		BUG_ON(ext4_should_journal_data(inode));
 		needed_blocks = ext4_da_writepages_trans_blocks(inode);
 
 		/* start a new transaction */