Message ID | 1375710744-29329-5-git-send-email-jack@suse.cz |
---|---|
State | Accepted, archived |
On Mon, Aug 05, 2013 at 03:52:24PM +0200, Jan Kara wrote:
> The following race can lead to a loss of i_disksize update from truncate
> thus resulting in a wrong inode size if the inode size isn't updated
> again before inode is reclaimed:
> 
> ext4_setattr()                          mpage_map_and_submit_extent()
>   EXT4_I(inode)->i_disksize = attr->ia_size;
>   ...                                     ...
>                                           disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT
>                                           /* False because i_size isn't
>                                            * updated yet */
>                                           if (disksize > i_size_read(inode))
>                                           /* True, because i_disksize is
>                                            * already truncated */
>                                           if (disksize > EXT4_I(inode)->i_disksize)
>                                             /* Overwrite i_disksize
>                                              * update from truncate */
>                                             ext4_update_i_disksize()
>   i_size_write(inode, attr->ia_size);
> 
> For other places updating i_disksize such race cannot happen because
> i_mutex prevents these races. Writeback is the only place where we do
> not hold i_mutex and we cannot grab it there because of lock ordering.
> 
> We fix the race by doing both i_disksize and i_size update in truncate
> atomically under i_data_sem and in mpage_map_and_submit_extent() we move
> the check against i_size under i_data_sem as well.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>

Applied, thanks.

					- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 17, 2013 at 10:12:27AM -0400, Theodore Ts'o wrote:
> On Mon, Aug 05, 2013 at 03:52:24PM +0200, Jan Kara wrote:
> > The following race can lead to a loss of i_disksize update from truncate
> > thus resulting in a wrong inode size if the inode size isn't updated
> > again before inode is reclaimed:
> > 
> > ext4_setattr()                          mpage_map_and_submit_extent()
> >   EXT4_I(inode)->i_disksize = attr->ia_size;
> >   ...                                     ...
> >                                           disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT
> >                                           /* False because i_size isn't
> >                                            * updated yet */
> >                                           if (disksize > i_size_read(inode))
> >                                           /* True, because i_disksize is
> >                                            * already truncated */
> >                                           if (disksize > EXT4_I(inode)->i_disksize)
> >                                             /* Overwrite i_disksize
> >                                              * update from truncate */
> >                                             ext4_update_i_disksize()
> >   i_size_write(inode, attr->ia_size);
> > 
> > For other places updating i_disksize such race cannot happen because
> > i_mutex prevents these races. Writeback is the only place where we do
> > not hold i_mutex and we cannot grab it there because of lock ordering.
> > 
> > We fix the race by doing both i_disksize and i_size update in truncate
> > atomically under i_data_sem and in mpage_map_and_submit_extent() we move
> > the check against i_size under i_data_sem as well.
> > 
> > Signed-off-by: Jan Kara <jack@suse.cz>
> 
> Applied, thanks.

Is this queued for 3.11? 1k blocksize fs's are still broken in rc7.

	Dave
On Mon, Aug 26, 2013 at 03:01:48PM -0400, Dave Jones wrote:
> 
> Is this queued for 3.11 ? 1k blocksize fs's are still broken in rc7.

These patches fixed races that have been around for a while; it's not
a regression. Given that they are fairly involved, I was nervous
sending them to Linus for 3.11, given the late date.

They are queued for the next merge window, and I'll mark them cc:
stable@vger.kernel.org.

					- Ted
On Wed, Aug 28, 2013 at 06:55:04PM -0400, Theodore Ts'o wrote:
> On Mon, Aug 26, 2013 at 03:01:48PM -0400, Dave Jones wrote:
> > 
> > Is this queued for 3.11 ? 1k blocksize fs's are still broken in rc7.
> 
> These patches fixed races that have been around for a while; it's not
> a regression. Given that they are fairly involved, I was nervous
> sending them to Linus for 3.11, given the late date.

That's odd, because I can't reproduce the problem I'm seeing on 3.10.

> They are queued for the next merge window, and I'll mark them cc:
> stable@vger.kernel.org.

Fair enough.

	Dave
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index b577e45..648c5e6 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2416,16 +2416,32 @@ do {								\
 #define EXT4_FREECLUSTERS_WATERMARK 0
 #endif
 
+/* Update i_disksize. Requires i_mutex to avoid races with truncate */
 static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
 {
-	/*
-	 * XXX: replace with spinlock if seen contended -bzzz
-	 */
+	WARN_ON_ONCE(S_ISREG(inode->i_mode) &&
+		     !mutex_is_locked(&inode->i_mutex));
+	down_write(&EXT4_I(inode)->i_data_sem);
+	if (newsize > EXT4_I(inode)->i_disksize)
+		EXT4_I(inode)->i_disksize = newsize;
+	up_write(&EXT4_I(inode)->i_data_sem);
+}
+
+/*
+ * Update i_disksize after writeback has been started. Races with truncate
+ * are avoided by checking i_size under i_data_sem.
+ */
+static inline void ext4_wb_update_i_disksize(struct inode *inode, loff_t newsize)
+{
+	loff_t i_size;
+
 	down_write(&EXT4_I(inode)->i_data_sem);
+	i_size = i_size_read(inode);
+	if (newsize > i_size)
+		newsize = i_size;
 	if (newsize > EXT4_I(inode)->i_disksize)
 		EXT4_I(inode)->i_disksize = newsize;
 	up_write(&EXT4_I(inode)->i_data_sem);
-	return ;
 }
 
 struct ext4_group_info {
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e7d98d2..5d3706e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2240,12 +2240,10 @@ static int mpage_map_and_submit_extent(handle_t *handle,
 
 	/* Update on-disk size after IO is submitted */
 	disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT;
-	if (disksize > i_size_read(inode))
-		disksize = i_size_read(inode);
 	if (disksize > EXT4_I(inode)->i_disksize) {
 		int err2;
 
-		ext4_update_i_disksize(inode, disksize);
+		ext4_wb_update_i_disksize(inode, disksize);
 		err2 = ext4_mark_inode_dirty(handle, inode);
 		if (err2)
 			ext4_error(inode->i_sb,
@@ -4587,18 +4585,27 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr)
 				error = ext4_orphan_add(handle, inode);
 				orphan = 1;
 			}
+			down_write(&EXT4_I(inode)->i_data_sem);
 			EXT4_I(inode)->i_disksize = attr->ia_size;
 			rc = ext4_mark_inode_dirty(handle, inode);
 			if (!error)
 				error = rc;
+			/*
+			 * We have to update i_size under i_data_sem together
+			 * with i_disksize to avoid races with writeback code
+			 * running ext4_wb_update_i_disksize().
+			 */
+			if (!error)
+				i_size_write(inode, attr->ia_size);
+			up_write(&EXT4_I(inode)->i_data_sem);
 			ext4_journal_stop(handle);
 			if (error) {
 				ext4_orphan_del(NULL, inode);
 				goto err_out;
 			}
-		}
+		} else
+			i_size_write(inode, attr->ia_size);
 
-		i_size_write(inode, attr->ia_size);
 		/*
 		 * Blocks are going to be removed from the inode. Wait
 		 * for dio in flight. Temporarily disable
The following race can lead to a loss of i_disksize update from truncate
thus resulting in a wrong inode size if the inode size isn't updated
again before inode is reclaimed:

ext4_setattr()                          mpage_map_and_submit_extent()
  EXT4_I(inode)->i_disksize = attr->ia_size;
  ...                                     ...
                                          disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT
                                          /* False because i_size isn't
                                           * updated yet */
                                          if (disksize > i_size_read(inode))
                                          /* True, because i_disksize is
                                           * already truncated */
                                          if (disksize > EXT4_I(inode)->i_disksize)
                                            /* Overwrite i_disksize
                                             * update from truncate */
                                            ext4_update_i_disksize()
  i_size_write(inode, attr->ia_size);

For other places updating i_disksize such race cannot happen because
i_mutex prevents these races. Writeback is the only place where we do
not hold i_mutex and we cannot grab it there because of lock ordering.

We fix the race by doing both i_disksize and i_size update in truncate
atomically under i_data_sem and in mpage_map_and_submit_extent() we move
the check against i_size under i_data_sem as well.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4.h  | 24 ++++++++++++++++++++----
 fs/ext4/inode.c | 17 ++++++++++++-----
 2 files changed, 32 insertions(+), 9 deletions(-)
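
As a rough illustration of the race described in the changelog above, here is a
userspace sketch that replays the interleaving deterministically in a single
thread. It is not kernel code and makes several assumptions: struct toy_inode,
toy_lock(), toy_unlock(), replay_prepatch() and replay_postpatch() are
hypothetical names, a plain flag stands in for i_data_sem, and the sizes (a
truncate from 8192 to 4096 bytes while writeback has submitted pages up to
8192) are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two size fields of an ext4 inode. */
struct toy_inode {
	int sem_held;		/* flag playing the role of i_data_sem */
	int64_t i_size;		/* in-memory file size */
	int64_t i_disksize;	/* size that will be written to disk */
};

static void toy_lock(struct toy_inode *ino)
{
	assert(!ino->sem_held);
	ino->sem_held = 1;
}

static void toy_unlock(struct toy_inode *ino)
{
	assert(ino->sem_held);
	ino->sem_held = 0;
}

/* Pre-patch helper: grows i_disksize, no check against i_size. */
static void update_disksize(struct toy_inode *ino, int64_t newsize)
{
	toy_lock(ino);
	if (newsize > ino->i_disksize)
		ino->i_disksize = newsize;
	toy_unlock(ino);
}

/* Post-patch writeback helper: clamp to i_size inside the lock. */
static void wb_update_disksize(struct toy_inode *ino, int64_t newsize)
{
	toy_lock(ino);
	if (newsize > ino->i_size)
		newsize = ino->i_size;
	if (newsize > ino->i_disksize)
		ino->i_disksize = newsize;
	toy_unlock(ino);
}

/* Post-patch truncate: both fields change in one critical section. */
static void truncate_sizes(struct toy_inode *ino, int64_t newsize)
{
	toy_lock(ino);
	ino->i_disksize = newsize;
	ino->i_size = newsize;
	toy_unlock(ino);
}

/* Replay the lossy interleaving; returns the final i_disksize. */
int64_t replay_prepatch(void)
{
	struct toy_inode ino = { .i_size = 8192, .i_disksize = 8192 };
	int64_t disksize = 8192;	/* end of the range just submitted */

	ino.i_disksize = 4096;		/* truncate: first half */
	if (disksize > ino.i_size)	/* false, i_size is still stale */
		disksize = ino.i_size;
	if (disksize > ino.i_disksize)	/* true, already truncated */
		update_disksize(&ino, disksize);
	ino.i_size = 4096;		/* truncate: second half */
	return ino.i_disksize;
}

/* Replay with the fix; wb_first selects which critical section runs first. */
int64_t replay_postpatch(int wb_first)
{
	struct toy_inode ino = { .i_size = 8192, .i_disksize = 8192 };

	if (wb_first) {
		wb_update_disksize(&ino, 8192);
		truncate_sizes(&ino, 4096);
	} else {
		truncate_sizes(&ino, 4096);
		wb_update_disksize(&ino, 8192);
	}
	return ino.i_disksize;
}
```

With these made-up sizes, replay_prepatch() returns 8192, so the i_disksize
update from truncate is lost while i_size ends up at 4096. replay_postpatch()
returns 4096 for either ordering, because writeback can only observe i_size and
i_disksize before or after truncate has changed both.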