diff mbox

[2/2] ext4: optimize ext4_force_commit

Message ID 1364807037-21664-2-git-send-email-dmonakhov@openvz.org
State Superseded, archived
Headers show

Commit Message

Dmitry Monakhov April 1, 2013, 9:03 a.m. UTC
We do not have to use jbd2_journal_force_commit() because it explicitly
start and stop SYNC transaction. This is very suboptimal because the only
users of ext4_force_commit() are ext4_sync_file and ext4_write_inode().
Both functions just want to commit and wait any uncommitted transaction
similar to ext4_sync_fs().

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/super.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

Comments

Theodore Ts'o April 1, 2013, 6:58 p.m. UTC | #1
On Mon, Apr 01, 2013 at 01:03:57PM +0400, Dmitry Monakhov wrote:
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e3e6a06..280a918 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4436,13 +4436,14 @@ static void ext4_clear_journal_err(struct super_block *sb,
>   */
>  int ext4_force_commit(struct super_block *sb)
>  {
> -	journal_t *journal;
> +	tid_t target;
>  
>  	if (sb->s_flags & MS_RDONLY)
>  		return 0;
>  
> -	journal = EXT4_SB(sb)->s_journal;
> -	return ext4_journal_force_commit(journal);
> +	if (jbd2_journal_start_commit(EXT4_SB(sb)->s_journal, &target))
> +		return jbd2_log_wait_commit(EXT4_SB(sb)->s_journal, target);

EXT4_SB(sb)->s_journal is NULL in no-journal mode.  So you need to
check for this --- this is something which is done in
ext4_journal_force_commit().

Since this is the only user of ext4_journal_force_commit(), you might
as well get rid of ext4_journal_force_commit() as part of making this
change.

The other possibility is we perhaps we should just change
jbd2_journal_force_commit() to call jbd2_journal_start_commit() and
jbd2_log_wait_commit(); the only other caller of
jbd2_journal_force_commit() is ocfs2_sync_file(), and it would benefit
from this change as well.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dmitry Monakhov April 9, 2013, 2:06 p.m. UTC | #2
On Tue, 9 Apr 2013 17:42:33 +0400, Dmitry Monakhov <dmonakhovopenvz@gmail.com> wrote:
Non-text part: multipart/alternative
> ---------- Forwarded message ----------
> From: "Theodore Ts'o" <tytso@mit.edu>
> Date: Apr 9, 2013 5:31 PM
> Subject: Re: [PATCH 2/2] ext4: optimize ext4_force_commit
> To: "Dmitry Monakhov" <dmonakhovopenvz@gmail.com>
> Cc:
> 
> On Mon, Apr 01, 2013 at 11:10:59PM +0400, Dmitry Monakhov wrote:
> > Yes. I think so too.  Even more jbd2_force_commit_nested already does what
> > we want because we are not hold transaction. I'll send patch tomorrow
> 
> Hi Dmitry,
> 
> Did you send an update for this patch?  I can't seem to find it.  I'm
> wondering if we perhaps got distracted by the big endian patch, and we
> didn't get back to this commit.
Ohh It is appeared that it requires more deep analysis.
For example sync(2) is implicitly broken because ext4_sync_fs() may not
send a flush barrier. So following case is possible:

dd if=/dev/zero of=file oflag=direct bs=1M count=1
sync
POWER_FAILURE-> result in data lost. Because:

sys_write()
-> __generic_file_aio_write
   ->file_update_time
    ->update_time
      -> touch metadata -> ext4_update_inode_fsync_trans

  <# A lot of journal_start/journal_stop# >
 ->submit_bio  

 ->dio_complete

sys_sync()
 -> flush_inodes (may not start a journal)
 -> ext4_sync_fs
    if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
                if (wait)
                        jbd2_log_wait_commit(sbi->s_journal, target);

   But no one guarantee us that we start any transaction since dio was
   completed so barrier will not be send. This means we may lose
   our data on power failure.
At this moment I try to figure up sane data integrity model for
fsync/syncfs. Hope I'll send you patches at the end of the week.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara April 9, 2013, 2:36 p.m. UTC | #3
On Tue 09-04-13 18:06:58, Dmitry Monakhov wrote:
> On Tue, 9 Apr 2013 17:42:33 +0400, Dmitry Monakhov <dmonakhovopenvz@gmail.com> wrote:
> Non-text part: multipart/alternative
> > ---------- Forwarded message ----------
> > From: "Theodore Ts'o" <tytso@mit.edu>
> > Date: Apr 9, 2013 5:31 PM
> > Subject: Re: [PATCH 2/2] ext4: optimize ext4_force_commit
> > To: "Dmitry Monakhov" <dmonakhovopenvz@gmail.com>
> > Cc:
> > 
> > On Mon, Apr 01, 2013 at 11:10:59PM +0400, Dmitry Monakhov wrote:
> > > Yes. I think so too.  Even more jbd2_force_commit_nested already does what
> > > we want because we are not hold transaction. I'll send patch tomorrow
> > 
> > Hi Dmitry,
> > 
> > Did you send an update for this patch?  I can't seem to find it.  I'm
> > wondering if we perhaps got distracted by the big endian patch, and we
> > didn't get back to this commit.
> Ohh It is appeared that it requires more deep analysis.
> For example sync(2) is implicitly broken because ext4_sync_fs() may not
> send a flush barrier. So following case is possible:
> 
> dd if=/dev/zero of=file oflag=direct bs=1M count=1
> sync
> POWER_FAILURE-> result in data lost. Because:
> 
> sys_write()
> -> __generic_file_aio_write
>    ->file_update_time
>     ->update_time
>       -> touch metadata -> ext4_update_inode_fsync_trans
> 
>   <# A lot of journal_start/journal_stop# >
  If we are allocating blocks (as you would in the above example), then
we call ext4_update_inode_fsync_trans() for each allocation. But that's not
really important for your sync(2) example.

>  ->submit_bio  
> 
>  ->dio_complete
> 
> sys_sync()
>  -> flush_inodes (may not start a journal)
>  -> ext4_sync_fs
>     if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
>                 if (wait)
>                         jbd2_log_wait_commit(sbi->s_journal, target);
> 
>    But no one guarantee us that we start any transaction since dio was
>    completed so barrier will not be send. This means we may lose
>    our data on power failure.
  So ext4_sync_file() actually handles this fine - if transaction commit
doesn't send the flush we need, it will send it manually. But you are right
that ext4_sync_fs() needs a similar thing.

								Honza
diff mbox

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e3e6a06..280a918 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4436,13 +4436,14 @@  static void ext4_clear_journal_err(struct super_block *sb,
  */
 int ext4_force_commit(struct super_block *sb)
 {
-	journal_t *journal;
+	tid_t target;
 
 	if (sb->s_flags & MS_RDONLY)
 		return 0;
 
-	journal = EXT4_SB(sb)->s_journal;
-	return ext4_journal_force_commit(journal);
+	if (jbd2_journal_start_commit(EXT4_SB(sb)->s_journal, &target))
+		return jbd2_log_wait_commit(EXT4_SB(sb)->s_journal, target);
+	return 0;
 }
 
 static int ext4_sync_fs(struct super_block *sb, int wait)
@@ -4460,7 +4461,7 @@  static int ext4_sync_fs(struct super_block *sb, int wait)
 	dquot_writeback_dquots(sb, -1);
 	if (jbd2_journal_start_commit(sbi->s_journal, &target)) {
 		if (wait)
-			jbd2_log_wait_commit(sbi->s_journal, target);
+			ret = jbd2_log_wait_commit(sbi->s_journal, target);
 	}
 	return ret;
 }