diff mbox

[-v2] ext4: move ext4_update_i_disksize() into mpage_map_and_submit_extent()

Message ID 1397310327-18147-1-git-send-email-tytso@mit.edu
State Accepted, archived
Headers show

Commit Message

Theodore Ts'o April 12, 2014, 1:45 p.m. UTC
The function ext4_update_i_disksize() is used in only one place, in
the function mpage_map_and_submit_extent().  Move there to simplify
the code paths, and also move the call to ext4_mark_inode_dirty() into
the i_data_sem's critical region, to be consistent with all of the
other places where we update i_disksize.  That way, we also keep the
raw_inode's i_disksize protected.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
---

Oops, the first version was missing the commit description

 fs/ext4/ext4.h  | 17 -----------------
 fs/ext4/inode.c | 16 +++++++++++++---
 2 files changed, 13 insertions(+), 20 deletions(-)

Comments

Jan Kara April 15, 2014, 4:26 p.m. UTC | #1
On Sat 12-04-14 09:45:27, Ted Tso wrote:
> The function ext4_update_i_disksize() is used in only one place, in
> the function mpage_map_and_submit_extent().  Move there to simplify
> the code paths, and also move the call to ext4_mark_inode_dirty() into
> the i_data_sem's critical region, to be consistent with all of the
> other places where we update i_disksize.  That way, we also keep the
> raw_inode's i_disksize protected.
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> Cc: stable@vger.kernel.org
  I agree that it makes sense to have all the places consistent and protect
raw disk inode i_disksize with i_data_sem. OTOH I don't see a way how this
can cause any real harm (but I guess you expect there might be something as
you CCed stable), so can you explain it please?

								Honza
> ---
> 
> Oops, the first version was missing the commit description
> 
>  fs/ext4/ext4.h  | 17 -----------------
>  fs/ext4/inode.c | 16 +++++++++++++---
>  2 files changed, 13 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index f1c65dc..66946aa 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2466,23 +2466,6 @@ static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
>  	up_write(&EXT4_I(inode)->i_data_sem);
>  }
>  
> -/*
> - * Update i_disksize after writeback has been started. Races with truncate
> - * are avoided by checking i_size under i_data_sem.
> - */
> -static inline void ext4_wb_update_i_disksize(struct inode *inode, loff_t newsize)
> -{
> -	loff_t i_size;
> -
> -	down_write(&EXT4_I(inode)->i_data_sem);
> -	i_size = i_size_read(inode);
> -	if (newsize > i_size)
> -		newsize = i_size;
> -	if (newsize > EXT4_I(inode)->i_disksize)
> -		EXT4_I(inode)->i_disksize = newsize;
> -	up_write(&EXT4_I(inode)->i_data_sem);
> -}
> -
>  struct ext4_group_info {
>  	unsigned long   bb_state;
>  	struct rb_root  bb_free_root;
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 7b93df9..f023f0c 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2247,13 +2247,23 @@ static int mpage_map_and_submit_extent(handle_t *handle,
>  			return err;
>  	} while (map->m_len);
>  
> -	/* Update on-disk size after IO is submitted */
> +	/*
> +	 * Update on-disk size after IO is submitted.  Races with
> +	 * truncate are avoided by checking i_size under i_data_sem.
> +	 */
>  	disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT;
>  	if (disksize > EXT4_I(inode)->i_disksize) {
>  		int err2;
> -
> -		ext4_wb_update_i_disksize(inode, disksize);
> +		loff_t i_size;
> +
> +		down_write(&EXT4_I(inode)->i_data_sem);
> +		i_size = i_size_read(inode);
> +		if (disksize > i_size)
> +			disksize = i_size;
> +		if (disksize > EXT4_I(inode)->i_disksize)
> +			EXT4_I(inode)->i_disksize = disksize;
>  		err2 = ext4_mark_inode_dirty(handle, inode);
> +		up_write(&EXT4_I(inode)->i_data_sem);
>  		if (err2)
>  			ext4_error(inode->i_sb,
>  				   "Failed to mark inode %lu dirty",
> -- 
> 1.9.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o April 15, 2014, 7:26 p.m. UTC | #2
On Tue, Apr 15, 2014 at 06:26:27PM +0200, Jan Kara wrote:
> On Sat 12-04-14 09:45:27, Ted Tso wrote:
> > The function ext4_update_i_disksize() is used in only one place, in
> > the function mpage_map_and_submit_extent().  Move there to simplify
> > the code paths, and also move the call to ext4_mark_inode_dirty() into
> > the i_data_sem's critical region, to be consistent with all of the
> > other places where we update i_disksize.  That way, we also keep the
> > raw_inode's i_disksize protected.
> > 
> > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> > Cc: stable@vger.kernel.org
>   I agree that it makes sense to have all the places consistent and protect
> raw disk inode i_disksize with i_data_sem. OTOH I don't see a way how this
> can cause any real harm (but I guess you expect there might be something as
> you CCed stable), so can you explain it please?

This was the case I was worried about:

        CPU #1                                 CPU #2

1.      down_write(&i_data_sem)
2.      Modify i_disk_size
4.      up_write(&i_data_sem)
5.                                           down_write(&i_data_sem)
6.                                           Modify i_disk_size
7.                                           Copy i_disk_size to on-disk inode
8.                                           up_write(&i_data_sem)
9.      Copy i_disk_size to on-disk inode


It's the standard data race; it might not be a problem on Intel CPU's,
but in general, cpu #1 might still have a stale copy of i_disk_size in
its cache, and hence it might copying the old, outdated value into the
on-disk inode.

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara April 15, 2014, 8:18 p.m. UTC | #3
On Tue 15-04-14 15:26:42, Ted Tso wrote:
> On Tue, Apr 15, 2014 at 06:26:27PM +0200, Jan Kara wrote:
> > On Sat 12-04-14 09:45:27, Ted Tso wrote:
> > > The function ext4_update_i_disksize() is used in only one place, in
> > > the function mpage_map_and_submit_extent().  Move there to simplify
> > > the code paths, and also move the call to ext4_mark_inode_dirty() into
> > > the i_data_sem's critical region, to be consistent with all of the
> > > other places where we update i_disksize.  That way, we also keep the
> > > raw_inode's i_disksize protected.
> > > 
> > > Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> > > Cc: stable@vger.kernel.org
> >   I agree that it makes sense to have all the places consistent and protect
> > raw disk inode i_disksize with i_data_sem. OTOH I don't see a way how this
> > can cause any real harm (but I guess you expect there might be something as
> > you CCed stable), so can you explain it please?
> 
> This was the case I was worried about:
> 
>         CPU #1                                 CPU #2
> 
> 1.      down_write(&i_data_sem)
> 2.      Modify i_disk_size
> 4.      up_write(&i_data_sem)
> 5.                                           down_write(&i_data_sem)
> 6.                                           Modify i_disk_size
> 7.                                           Copy i_disk_size to on-disk inode
> 8.                                           up_write(&i_data_sem)
> 9.      Copy i_disk_size to on-disk inode
> 
> 
> It's the standard data race; it might not be a problem on Intel CPU's,
> but in general, cpu #1 might still have a stale copy of i_disk_size in
> its cache, and hence it might copying the old, outdated value into the
> on-disk inode.
  Yes, that could be a problem even on Intel CPU - not because of cache
coherency but because old i_disk_size value might be speculatively
preloaded before CPU#2 updates its value. So feel free to add:
  Reviewed-by: Jan Kara <jack@suse.cz>

								Honza
diff mbox

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f1c65dc..66946aa 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2466,23 +2466,6 @@  static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
 	up_write(&EXT4_I(inode)->i_data_sem);
 }
 
-/*
- * Update i_disksize after writeback has been started. Races with truncate
- * are avoided by checking i_size under i_data_sem.
- */
-static inline void ext4_wb_update_i_disksize(struct inode *inode, loff_t newsize)
-{
-	loff_t i_size;
-
-	down_write(&EXT4_I(inode)->i_data_sem);
-	i_size = i_size_read(inode);
-	if (newsize > i_size)
-		newsize = i_size;
-	if (newsize > EXT4_I(inode)->i_disksize)
-		EXT4_I(inode)->i_disksize = newsize;
-	up_write(&EXT4_I(inode)->i_data_sem);
-}
-
 struct ext4_group_info {
 	unsigned long   bb_state;
 	struct rb_root  bb_free_root;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 7b93df9..f023f0c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2247,13 +2247,23 @@  static int mpage_map_and_submit_extent(handle_t *handle,
 			return err;
 	} while (map->m_len);
 
-	/* Update on-disk size after IO is submitted */
+	/*
+	 * Update on-disk size after IO is submitted.  Races with
+	 * truncate are avoided by checking i_size under i_data_sem.
+	 */
 	disksize = ((loff_t)mpd->first_page) << PAGE_CACHE_SHIFT;
 	if (disksize > EXT4_I(inode)->i_disksize) {
 		int err2;
-
-		ext4_wb_update_i_disksize(inode, disksize);
+		loff_t i_size;
+
+		down_write(&EXT4_I(inode)->i_data_sem);
+		i_size = i_size_read(inode);
+		if (disksize > i_size)
+			disksize = i_size;
+		if (disksize > EXT4_I(inode)->i_disksize)
+			EXT4_I(inode)->i_disksize = disksize;
 		err2 = ext4_mark_inode_dirty(handle, inode);
+		up_write(&EXT4_I(inode)->i_data_sem);
 		if (err2)
 			ext4_error(inode->i_sb,
 				   "Failed to mark inode %lu dirty",