
fix jbd2 asynchronous commit

Message ID 14b201d10e33$fb91c390$f2b54ab0$@gmail.com
State Rejected, archived

Commit Message

Eunji Lee Oct. 24, 2015, 8:14 a.m. UTC
This patch fixes what appears to be a bug in the asynchronous commit process
(pinging with a patch).

In asynchronous commit mode, JBD2 issues a "FLUSH" request only once, after
issuing the commit record, if the file system and the journal reside on the
same device. This avoids redundant storage flushes by relying on the journal
checksum to detect a transaction whose journal blocks were not fully written.
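To make "relying on the checksum" concrete, here is a minimal sketch, not jbd2's code: the commit record stores a checksum over the transaction's journal blocks, and recovery replays the transaction only if the recomputed checksum matches. The block size, `struct commit_record`, and the use of a plain bitwise CRC-32 are assumptions for illustration; jbd2's real on-disk format and checksum algorithm depend on the journal's feature flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLK 16 /* hypothetical tiny block size for the sketch */

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320), a stand-in for the
 * checksum jbd2 actually uses. Chained calls checksum concatenated data. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n) {
    crc = ~crc;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical commit record: only the checksum of the transaction's blocks. */
struct commit_record {
    uint32_t checksum;
};

/* Checksum over nblocks consecutive journal blocks of BLK bytes each. */
static uint32_t checksum_blocks(const uint8_t *blocks, size_t nblocks) {
    assert(blocks != NULL || nblocks == 0);
    uint32_t crc = 0;
    for (size_t i = 0; i < nblocks; i++)
        crc = crc32_update(crc, blocks + i * BLK, BLK);
    return crc;
}

/* Recovery rule: replay only if the journal blocks found on media match the
 * checksum in the commit record. A transaction whose journal blocks were only
 * partly persisted is detected (with high probability) and discarded. */
static bool should_replay(const uint8_t *blocks, size_t nblocks,
                          const struct commit_record *cr) {
    return checksum_blocks(blocks, nblocks) == cr->checksum;
}
```

Note that the checksum covers only the journal blocks. Nothing in it records whether the ordered-mode data blocks in the file system reached stable media.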

However, this appears to produce an undesirable result in ordered mode.
Specifically, the system can crash after the metadata blocks and the commit
record have been written to the journal (i.e., with no checksum error), but
before the data blocks have been persisted to the file system. During
recovery, the metadata updates in the journal are then replayed because the
checksum is valid, even though the associated data blocks never reached the
file system. This violates the ordering guarantee of ordered mode: associated
data blocks must be written before the metadata updates that refer to them.

This patch flushes the storage device before issuing the commit record, even
when the journal and the file system share the same device, thereby
guaranteeing the correct ordering between data and metadata block updates.
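The crash window can be checked with a toy model. This is a hypothetical sketch, not kernel code: one device with a volatile write cache, three blocks (`DATA` in the file system, `JMETA` in the journal, `COMMIT` for the commit record), and the assumption that at a crash any subset of issued but unflushed writes may already be on media. It also assumes the data write has completed before any flush that is meant to cover it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: three blocks of one transaction on one device. */
enum blk { DATA, JMETA, COMMIT, NBLK };

struct dev {
    bool issued[NBLK];  /* write submitted (may still sit in the cache) */
    bool durable[NBLK]; /* guaranteed to be on stable media */
};

static void dev_write(struct dev *d, enum blk b) {
    assert(b < NBLK);
    d->issued[b] = true;
}

/* A cache flush makes every previously issued write durable. */
static void dev_flush(struct dev *d) {
    for (int b = 0; b < NBLK; b++)
        if (d->issued[b])
            d->durable[b] = true;
}

/* Enumerates every subset of cached writes that might have reached the media
 * before a crash. Returns true if some outcome lets recovery replay the
 * metadata (JMETA and COMMIT present, so the checksum matches) while the
 * DATA block is missing. */
static bool crash_can_break_ordering(const struct dev *d) {
    for (unsigned mask = 0; mask < (1u << NBLK); mask++) {
        bool on_media[NBLK];
        for (int b = 0; b < NBLK; b++)
            on_media[b] = d->durable[b] ||
                          (d->issued[b] && (mask & (1u << b)));
        bool replay = on_media[JMETA] && on_media[COMMIT];
        if (replay && !on_media[DATA])
            return true;
    }
    return false;
}
```

Issuing DATA, JMETA and COMMIT with no flush in between (the same-device asynchronous case before the final flush) makes `crash_can_break_ordering()` return true; calling `dev_flush()` before writing COMMIT makes it return false.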

Without asynchronous commit this is not an issue, because the commit record
is written with WRITE_FLUSH_FUA, which flushes the storage cache before the
commit record is written to the journal.
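The difference between a FUA write and a flush followed by a FUA write is what makes the synchronous path safe. A minimal sketch as a toy cache model, not block-layer code: FUA guarantees only that the write itself is on stable media when it completes, while a preceding cache flush (REQ_FLUSH in kernels of this era) also makes every previously completed write durable. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy cache: an earlier write and the commit record. */
enum { EARLIER, COMMIT_REC, NB };

struct cache {
    bool issued[NB];
    bool durable[NB];
};

/* Ordinary write: lands in the volatile cache only. */
static void write_plain(struct cache *c, int b) {
    assert(b >= 0 && b < NB);
    c->issued[b] = true;
}

/* Cache flush: everything issued so far becomes durable. */
static void flush_cache(struct cache *c) {
    for (int b = 0; b < NB; b++)
        if (c->issued[b])
            c->durable[b] = true;
}

/* FUA: only this write is guaranteed on stable media at completion. */
static void write_fua(struct cache *c, int b) {
    write_plain(c, b);
    c->durable[b] = true;
}

/* Flush + FUA: flush earlier writes first, then do a FUA write. */
static void write_flush_fua(struct cache *c, int b) {
    flush_cache(c);
    write_fua(c, b);
}
```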


Signed-off-by: Eunji Lee <alicia0729@gmail.com>

---
 fs/jbd2/commit.c |    1 -
 1 file changed, 1 deletion(-)


Comments

Jan Kara Oct. 24, 2015, 7:40 p.m. UTC | #1
On Sat 24-10-15 17:14:15, Eunji Lee wrote:

Thanks for your report and detailed analysis. However, this bug is already
fixed: parse_options() in fs/ext4/super.c already makes sure that
journal_async_commit cannot be enabled in data=ordered mode.

								Honza

Patch

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 362e5f6..118b377 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -792,7 +792,6 @@  start_journal_io:
     * the commit record
     */
    if (commit_transaction->t_need_data_flush &&
-       (journal->j_fs_dev != journal->j_dev) &&
        (journal->j_flags & JBD2_BARRIER))
        blkdev_issue_flush(journal->j_fs_dev, GFP_NOFS, NULL);