diff mbox series

[RFC,03/11] ext4: data=journal: call ext4_force_commit() in ext4_writepages() for msync()

Message ID 20200423233705.5878-4-mfo@canonical.com
State Superseded
Headers show
Series ext4: data=journal: writeback mmap'ed pagecache | expand

Commit Message

Mauricio Faria de Oliveira April 23, 2020, 11:36 p.m. UTC
The data-integrity syscalls (not memory-cleansing writeback)
use file_write_and_wait_range() that wait_on_page_writeback()
for every page in the range after do_writepages().

If any of these pages is mmap'ed pagecache, i.e., goes into
__ext4_journalled_writepage(), with the last couple patches
end_page_writeback() will be done on (or, not be done until)
transaction commit, which can take seconds (commit interval,
max commit age), which delays msync().

Let's fix this so that msync() syscall should just return
quickly without a delay of up to a few seconds by default.

For data=journal the next thing these syscalls do anyway is
ext4_force_commit() (see ext4_sync_file()), which is needed
for the buffered pagecache, as __filemap_fdatawrite_range()
doesn't do anything: the buffers are clean, so it returns
early without calling do_writepages() / ext4_write_pages().
So it's not possible to just move/replace that call here.

(This is better/more correct than to use ext4_handle_sync()
for mmap'ed pagecache, which triggers one commit per page,
as synchronous transaction batching in jbd2 targets other,
concurrent tasks, but in this case one single task writes
all pages back serially.)

Now for memory-cleansing writeback, even though it is not
supposed to wait, we should not wait for seconds either,
as it could delay an upcoming data-integrity syscall in
write_cache_pages() on a call to wait_on_page_writeback().
(another fix is needed for such calls to ext4_writepage()).

So just do not check for wbc->sync_mode to cover it too.

Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
---
 fs/ext4/inode.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
diff mbox series

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d385a11ba31e..574a062b8bcd 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2709,7 +2709,37 @@  static int ext4_writepages(struct address_space *mapping,
 		goto out_writepages;
 
 	if (ext4_should_journal_data(inode)) {
+		journal_t *journal = sbi->s_journal;
+
 		ret = generic_writepages(mapping, wbc);
+		/*
+		 * On the data-integrity syscalls, file_write_and_wait_range()
+		 * will wait on page writeback after calling ext4_writepages().
+		 * For mmaped pagecache that only ends on transaction commit,
+		 * which may take up to commit interval (seconds!) to happen.
+		 *
+		 * So, ensure that ext4_force_commit() happens before return,
+		 * and after all pages in the range are set_page_writeback(),
+		 * but only if needed (i.e. check for datasync transaction
+		 * set in the inode by __ext4_journalled_writepage().)
+		 *
+		 * Do it for memory-cleasing writeback too, because it might
+		 * delay another data-integrity syscall in write_cache_pages()
+		 * on wait_on_page_writeback().
+		 */
+		if (!ret && journal) {
+			bool force_commit = false;
+
+			read_lock(&journal->j_state_lock);
+			if (journal->j_running_transaction &&
+			    journal->j_running_transaction->t_tid ==
+				EXT4_I(inode)->i_datasync_tid)
+				force_commit = true;
+			read_unlock(&journal->j_state_lock);
+
+			if (force_commit)
+				ret = ext4_force_commit(inode->i_sb);
+		}
 		goto out_writepages;
 	}