From patchwork Tue Feb 24 17:14:50 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Theodore Ts'o X-Patchwork-Id: 23627 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id 942A0DDE25 for ; Wed, 25 Feb 2009 04:15:15 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756738AbZBXRPO (ORCPT ); Tue, 24 Feb 2009 12:15:14 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756867AbZBXRPO (ORCPT ); Tue, 24 Feb 2009 12:15:14 -0500 Received: from thunk.org ([69.25.196.29]:39942 "EHLO thunker.thunk.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756738AbZBXRPM (ORCPT ); Tue, 24 Feb 2009 12:15:12 -0500 Received: from root (helo=closure.thunk.org) by thunker.thunk.org with local-esmtp (Exim 4.50 #1 (Debian)) id 1Lc0s0-0004a7-UT; Tue, 24 Feb 2009 12:15:01 -0500 Received: from tytso by closure.thunk.org with local (Exim 4.69) (envelope-from ) id 1Lc0ru-00027S-H9; Tue, 24 Feb 2009 12:14:54 -0500 From: Theodore Ts'o To: stable@kernel.org Cc: Jan Kara , "Theodore Ts'o" , linux-ext4@vger.kernel.org, ocfs2-devel@oss.oracle.com, mfasheh@suse.de, Dan Carpenter Subject: [STABLE, 2.6.27.y] jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() Date: Tue, 24 Feb 2009 12:14:50 -0500 Message-Id: <1235495694-8116-3-git-send-email-tytso@mit.edu> X-Mailer: git-send-email 1.5.6.3 In-Reply-To: <1235495694-8116-2-git-send-email-tytso@mit.edu> References: <1235495694-8116-1-git-send-email-tytso@mit.edu> <1235495694-8116-2-git-send-email-tytso@mit.edu> X-SA-Exim-Connect-IP: X-SA-Exim-Mail-From: tytso@mit.edu X-SA-Exim-Scanned: No (on thunker.thunk.org); SAEximRunCond expanded to false Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org From: Jan Kara If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper locking requires the journal pointer (to access journal->j_list_lock), which we don't have. So we have to change the prototype of the function so that filesystem passes us the journal pointer. Also add a more detailed comment about why the function jbd2_journal_begin_ordered_truncate() does what it does and how it should be used. Thanks to Dan Carpenter for pointing to the suspitious code. Signed-off-by: Jan Kara Signed-off-by: "Theodore Ts'o" Acked-by: Joel Becker CC: linux-ext4@vger.kernel.org CC: ocfs2-devel@oss.oracle.com CC: mfasheh@suse.de CC: Dan Carpenter (cherry picked from commit 7f5aa215088b817add9c71914b83650bdd49f8a9) --- fs/ext4/inode.c | 6 ++++-- fs/jbd2/transaction.c | 42 +++++++++++++++++++++++++++++++----------- fs/ocfs2/journal.h | 13 +++++++++++++ include/linux/jbd2.h | 3 ++- 4 files changed, 50 insertions(+), 14 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 6e7f085..7b063d4 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -46,8 +46,10 @@ static inline int ext4_begin_ordered_truncate(struct inode *inode, loff_t new_size) { - return jbd2_journal_begin_ordered_truncate(&EXT4_I(inode)->jinode, - new_size); + return jbd2_journal_begin_ordered_truncate( + EXT4_SB(inode->i_sb)->s_journal, + &EXT4_I(inode)->jinode, + new_size); } static void ext4_invalidatepage(struct page *page, unsigned long offset); diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c index e5d5405..92d77c9 100644 --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -2049,26 +2049,46 @@ done: } /* - * This function must be called when inode is journaled in ordered mode - * before truncation happens. It starts writeout of truncated part in - * case it is in the committing transaction so that we stand to ordered - * mode consistency guarantees. + * File truncate and transaction commit interact with each other in a + * non-trivial way. If a transaction writing data block A is + * committing, we cannot discard the data by truncate until we have + * written them. Otherwise if we crashed after the transaction with + * write has committed but before the transaction with truncate has + * committed, we could see stale data in block A. This function is a + * helper to solve this problem. It starts writeout of the truncated + * part in case it is in the committing transaction. + * + * Filesystem code must call this function when inode is journaled in + * ordered mode before truncation happens and after the inode has been + * placed on orphan list with the new inode size. The second condition + * avoids the race that someone writes new data and we start + * committing the transaction after this function has been called but + * before a transaction for truncate is started (and furthermore it + * allows us to optimize the case where the addition to orphan list + * happens in the same transaction as write --- we don't have to write + * any data in such case). */ -int jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode, +int jbd2_journal_begin_ordered_truncate(journal_t *journal, + struct jbd2_inode *jinode, loff_t new_size) { - journal_t *journal; - transaction_t *commit_trans; + transaction_t *inode_trans, *commit_trans; int ret = 0; - if (!inode->i_transaction && !inode->i_next_transaction) + /* This is a quick check to avoid locking if not necessary */ + if (!jinode->i_transaction) goto out; - journal = inode->i_transaction->t_journal; + /* Locks are here just to force reading of recent values, it is + * enough that the transaction was not committing before we started + * a transaction adding the inode to orphan list */ spin_lock(&journal->j_state_lock); commit_trans = journal->j_committing_transaction; spin_unlock(&journal->j_state_lock); - if (inode->i_transaction == commit_trans) { - ret = filemap_fdatawrite_range(inode->i_vfs_inode->i_mapping, + spin_lock(&journal->j_list_lock); + inode_trans = jinode->i_transaction; + spin_unlock(&journal->j_list_lock); + if (inode_trans == commit_trans) { + ret = filemap_fdatawrite_range(jinode->i_vfs_inode->i_mapping, new_size, LLONG_MAX); if (ret) jbd2_journal_abort(journal, ret); diff --git a/fs/ocfs2/journal.h b/fs/ocfs2/journal.h index 2178ebf..d4de187 100644 --- a/fs/ocfs2/journal.h +++ b/fs/ocfs2/journal.h @@ -415,4 +415,17 @@ static inline int ocfs2_calc_tree_trunc_credits(struct super_block *sb, return credits; } +static inline int ocfs2_jbd2_file_inode(handle_t *handle, struct inode *inode) +{ + return jbd2_journal_file_inode(handle, &OCFS2_I(inode)->ip_jinode); +} + +static inline int ocfs2_begin_ordered_truncate(struct inode *inode, + loff_t new_size) +{ + return jbd2_journal_begin_ordered_truncate( + OCFS2_SB(inode->i_sb)->journal->j_journal, + &OCFS2_I(inode)->ip_jinode, + new_size); +} #endif /* OCFS2_JOURNAL_H */ diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 0e1bd70..df4137e 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1075,7 +1075,8 @@ extern int jbd2_journal_clear_err (journal_t *); extern int jbd2_journal_bmap(journal_t *, unsigned long, unsigned long long *); extern int jbd2_journal_force_commit(journal_t *); extern int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *inode); -extern int jbd2_journal_begin_ordered_truncate(struct jbd2_inode *inode, loff_t new_size); +extern int jbd2_journal_begin_ordered_truncate(journal_t *journal, + struct jbd2_inode *inode, loff_t new_size); extern void jbd2_journal_init_jbd_inode(struct jbd2_inode *jinode, struct inode *inode); extern void jbd2_journal_release_jbd_inode(journal_t *journal, struct jbd2_inode *jinode);