Message ID | 1306098825-469-2-git-send-email-tytso@mit.edu |
---|---|
State | Accepted, archived |
Headers | show |
On Sun 22-05-11 17:13:45, Ted Tso wrote: > From: Jan Kara <jack@suse.cz> > > jbd2_log_start_commit() returns 1 only when we really start a > transaction. But we also need to wait for a transaction when the > commit is already running. Fix this problem by waiting for > transaction commit unconditionally (which is just a quick check if the > transaction is already committed). > > Also we have to be more careful with sending of a barrier because when > transaction is being committed in parallel to ext4_sync_file() > running, we cannot be sure that the barrier the journalling code sends > happens after we wrote all the data for fsync (note that not every > data writeout needs to trigger metadata changes thus commit of some > metadata changes can be running while other data is still written > out). So use jbd2_will_send_data_barrier() helper to detect the common > cases when we can be sure barrier will be issued by the commit code > and issue the barrier ourselves in the remaining cases. > > [ Modified by tytso so that the external journal cases are handled in > ext4_sync_file() to avoid needlessly issuing extra flush requests in > the data=ordered and data=journalled cases. ] Well, in data=journal case I agree your change will work (but that's your minor concern I guess). In data=ordered it's harder: a) The flush of j_fs_dev in jbd2_journal_commit_transaction() is issued earlier than we set T_COMMIT_RECORD but that's easy to handle. b) Whether we do or don't send the flush in jbd2_journal_commit_transaction() depends on whether t_flushed_data_blocks is set. We can't know in advance whether it gets set or not because it depends on whether some inode is in transaction's t_inode_list and inodes can get removed from there when flusher thread has written all the pages and inode has been reclaimed. OTOH this looks like a bug in the commit code anyway - I guess t_flushed_data_blocks (or better named equivalent) should be set in jbd2_journal_file_inode(). Then such variable will also become a reliable indicator whether the data flush is going to be sent or not. I'll update the patch set to reflect this... Honza > Signed-off-by: Jan Kara <jack@suse.cz> > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> > --- > fs/ext4/fsync.c | 31 ++++++++++++++++++++----------- > 1 files changed, 20 insertions(+), 11 deletions(-) > > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c > index 60fe572..b0e03fa 100644 > --- a/fs/ext4/fsync.c > +++ b/fs/ext4/fsync.c > @@ -172,6 +172,7 @@ int ext4_sync_file(struct file *file, int datasync) > journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; > int ret; > tid_t commit_tid; > + bool needs_barrier = false; > > J_ASSERT(ext4_journal_current_handle() == NULL); > > @@ -211,22 +212,30 @@ int ext4_sync_file(struct file *file, int datasync) > } > > commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid; > - if (jbd2_log_start_commit(journal, commit_tid)) { > + if (journal->j_flags & JBD2_BARRIER) { > /* > * When the journal is on a different device than the > - * fs data disk, we need to issue the barrier in > - * writeback mode. (In ordered mode, the jbd2 layer > - * will take care of issuing the barrier. In > + * fs data disk, when data=writeback, we need to issue > + * a barrier unconditionally. (In ordered mode, the > + * jbd2 layer will take care of issuing the barrier if > + * there were any writes associated with the inode; in > * data=journal, all of the data blocks are written to > * the journal device.) > */ > - if (ext4_should_writeback_data(inode) && > - (journal->j_fs_dev != journal->j_dev) && > - (journal->j_flags & JBD2_BARRIER)) > - blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, > - NULL); > - ret = jbd2_log_wait_commit(journal, commit_tid); > - } else if (journal->j_flags & JBD2_BARRIER) > + if ((journal->j_fs_dev != journal->j_dev) && > + ext4_should_writeback_data(inode)) > + needs_barrier = true; > + else if (!jbd2_trans_will_send_data_barrier(journal, > + commit_tid)) > + /* > + * If the journal layer isn't going to issue > + * the barrier, then we'd better. > + */ > + needs_barrier = true; > + } > + jbd2_log_start_commit(journal, commit_tid); > + ret = jbd2_log_wait_commit(journal, commit_tid); > + if (needs_barrier) > blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); > out: > trace_ext4_sync_file_exit(inode, ret); > -- > 1.7.3.1 >
On Mon, May 23, 2011 at 07:17:47PM +0200, Jan Kara wrote: > b) Whether we do or don't send the flush in > jbd2_journal_commit_transaction() depends on whether t_flushed_data_blocks > is set. We can't know in advance whether it gets set or not because it > depends on whether some inode is in transaction's t_inode_list and inodes > can get removed from there when flusher thread has written all the pages > and inode has been reclaimed. OTOH this looks like a bug in the commit code > anyway - I guess t_flushed_data_blocks (or better named equivalent) should > be set in jbd2_journal_file_inode(). Then such variable will also become > a reliable indicator whether the data flush is going to be sent or not. Um, I guess I don't see where an inode gets removed from t_inode_list after the writeback daemon is done with an inode? - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon 23-05-11 15:28:34, Ted Tso wrote: > On Mon, May 23, 2011 at 07:17:47PM +0200, Jan Kara wrote: > > b) Whether we do or don't send the flush in > > jbd2_journal_commit_transaction() depends on whether t_flushed_data_blocks > > is set. We can't know in advance whether it gets set or not because it > > depends on whether some inode is in transaction's t_inode_list and inodes > > can get removed from there when flusher thread has written all the pages > > and inode has been reclaimed. OTOH this looks like a bug in the commit code > > anyway - I guess t_flushed_data_blocks (or better named equivalent) should > > be set in jbd2_journal_file_inode(). Then such variable will also become > > a reliable indicator whether the data flush is going to be sent or not. > > Um, I guess I don't see where an inode gets removed from t_inode_list > after the writeback daemon is done with an inode? ext4_evict_inode()->ext4_clear_inode()->jbd2_journal_release_jbd_inode() removes the inode from transaction's list. Note that nothing prevents inode which is still part of the running transaction to be cleaned by a flusher thread and thus inode shrinker can reap it... Honza
diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c index 60fe572..b0e03fa 100644 --- a/fs/ext4/fsync.c +++ b/fs/ext4/fsync.c @@ -172,6 +172,7 @@ int ext4_sync_file(struct file *file, int datasync) journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; int ret; tid_t commit_tid; + bool needs_barrier = false; J_ASSERT(ext4_journal_current_handle() == NULL); @@ -211,22 +212,30 @@ int ext4_sync_file(struct file *file, int datasync) } commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid; - if (jbd2_log_start_commit(journal, commit_tid)) { + if (journal->j_flags & JBD2_BARRIER) { /* * When the journal is on a different device than the - * fs data disk, we need to issue the barrier in - * writeback mode. (In ordered mode, the jbd2 layer - * will take care of issuing the barrier. In + * fs data disk, when data=writeback, we need to issue + * a barrier unconditionally. (In ordered mode, the + * jbd2 layer will take care of issuing the barrier if + * there were any writes associated with the inode; in * data=journal, all of the data blocks are written to * the journal device.) */ - if (ext4_should_writeback_data(inode) && - (journal->j_fs_dev != journal->j_dev) && - (journal->j_flags & JBD2_BARRIER)) - blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, - NULL); - ret = jbd2_log_wait_commit(journal, commit_tid); - } else if (journal->j_flags & JBD2_BARRIER) + if ((journal->j_fs_dev != journal->j_dev) && + ext4_should_writeback_data(inode)) + needs_barrier = true; + else if (!jbd2_trans_will_send_data_barrier(journal, + commit_tid)) + /* + * If the journal layer isn't going to issue + * the barrier, then we'd better. + */ + needs_barrier = true; + } + jbd2_log_start_commit(journal, commit_tid); + ret = jbd2_log_wait_commit(journal, commit_tid); + if (needs_barrier) blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); out: trace_ext4_sync_file_exit(inode, ret);