ext4: avoid exposure of stale data in ext4_punch_hole() -v2

Submitted by Maxim Patlasov on Sept. 27, 2013, 3:54 p.m.

Details

Message ID 20130927155329.3272.64086.stgit@dhcp-10-30-17-2.sw.ru
State Accepted, archived

Commit Message

Maxim Patlasov Sept. 27, 2013, 3:54 p.m.
While handling punch-hole fallocate, it's useless to truncate page cache
before removing the range from extent tree (or block map in indirect case)
because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
read) immediately after truncating page cache, but before updating extent
tree (or block map). In that case the user will see stale data even after
fallocate is completed.
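
The user-visible expectation behind this paragraph is that, once a punch-hole fallocate(2) returns, reads of the punched range yield zeros. A minimal user-space sketch of that check follows. The helper name punch_and_check() and the /tmp template are assumptions made for illustration and are not part of the patch. A single-threaded check like this does not exercise the race itself; it only shows the contract that the race can violate.

```c
#define _GNU_SOURCE             /* exposes fallocate() in <fcntl.h> */
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>       /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): fill a temporary file with
 * non-zero bytes, punch a hole over the whole range, then read it back.
 *
 * Returns  1 if the punched range reads back as zeros,
 *          0 if non-zero (stale) bytes are still visible,
 *         -1 on any error, including a filesystem without punch-hole support.
 */
static int punch_and_check(size_t len)
{
    char path[] = "/tmp/punch-XXXXXX";   /* assumed scratch location */
    char *buf = NULL;
    int result = -1;
    int fd;

    assert(len > 0);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file disappears when fd is closed */

    buf = malloc(len);
    if (!buf)
        goto out;

    /* Write the data; this also leaves the pages in the page cache. */
    memset(buf, 0xAB, len);
    if (pwrite(fd, buf, len, 0) != (ssize_t)len)
        goto out;

    /* Punch the hole; KEEP_SIZE is mandatory together with PUNCH_HOLE. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  (off_t)0, (off_t)len) != 0)
        goto out;

    /* After fallocate() has returned, the range should read as zeros. */
    if (pread(fd, buf, len, 0) != (ssize_t)len)
        goto out;

    result = 1;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            result = 0;
            break;
        }
    }

out:
    free(buf);
    close(fd);
    return result;
}
```

Calling punch_and_check(8192) from a test program should return 1 on a filesystem that supports hole punching; a return of 0 would correspond to the stale-data exposure described above.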

Changed in v2 (Thanks to Jan Kara):
 - Until the problem of data corruption resulting from pages backed by
   already freed blocks is fully resolved, the simple thing we can do now
   is to add another truncation of pagecache after punch hole is done.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/ext4/inode.c |    6 ++++++
 1 file changed, 6 insertions(+)


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jan Kara Sept. 27, 2013, 4:05 p.m.
On Fri 27-09-13 19:54:03, Maxim Patlasov wrote:
> While handling punch-hole fallocate, it's useless to truncate page cache
> before removing the range from extent tree (or block map in indirect case)
> because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> read) immediately after truncating page cache, but before updating extent
> tree (or block map). In that case the user will see stale data even after
> fallocate is completed.
> 
> Changed in v2 (Thanks to Jan Kara):
>  - Until the problem of data corruption resulting from pages backed by
>    already freed blocks is fully resolved, the simple thing we can do now
>    is to add another truncation of pagecache after punch hole is done.
  The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
>  fs/ext4/inode.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 0d424d7..2984ddf 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3621,6 +3621,12 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
>  	up_write(&EXT4_I(inode)->i_data_sem);
>  	if (IS_SYNC(inode))
>  		ext4_handle_sync(handle);
> +
> +	/* Now release the pages again to reduce race window */
> +	if (last_block_offset > first_block_offset)
> +		truncate_pagecache_range(inode, first_block_offset,
> +					 last_block_offset);
> +
>  	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
>  	ext4_mark_inode_dirty(handle, inode);
>  out_stop:
>
Theodore Ts'o Feb. 21, 2014, 12:21 a.m.
On Fri, Sep 27, 2013 at 06:05:17PM +0200, Jan Kara wrote:
> On Fri 27-09-13 19:54:03, Maxim Patlasov wrote:
> > While handling punch-hole fallocate, it's useless to truncate page cache
> > before removing the range from extent tree (or block map in indirect case)
> > because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> > read) immediately after truncating page cache, but before updating extent
> > tree (or block map). In that case the user will see stale data even after
> > fallocate is completed.
> > 
> > Changed in v2 (Thanks to Jan Kara):
> >  - Until the problem of data corruption resulting from pages backed by
> >    already freed blocks is fully resolved, the simple thing we can do now
> >    is to add another truncation of pagecache after punch hole is done.
>   The patch looks good. You can add:
> Reviewed-by: Jan Kara <jack@suse.cz>

I was going through old patches, and it looks like this one got
dropped.  My apologies.

As far as I can tell, the underlying problem in the VFS/MM layer
hasn't been solved yet (Jan, can you confirm?), so I've queued this
patch for the next merge window.

					- Ted
Jan Kara Feb. 21, 2014, 9:45 a.m.
On Thu 20-02-14 19:21:07, Ted Tso wrote:
> On Fri, Sep 27, 2013 at 06:05:17PM +0200, Jan Kara wrote:
> > On Fri 27-09-13 19:54:03, Maxim Patlasov wrote:
> > > While handling punch-hole fallocate, it's useless to truncate page cache
> > > before removing the range from extent tree (or block map in indirect case)
> > > because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> > > read) immediately after truncating page cache, but before updating extent
> > > tree (or block map). In that case the user will see stale data even after
> > > fallocate is completed.
> > > 
> > > Changed in v2 (Thanks to Jan Kara):
> > >  - Until the problem of data corruption resulting from pages backed by
> > >    already freed blocks is fully resolved, the simple thing we can do now
> > >    is to add another truncation of pagecache after punch hole is done.
> >   The patch looks good. You can add:
> > Reviewed-by: Jan Kara <jack@suse.cz>
> 
> I was going through old patches, and it looks like this one got
> dropped.  My apologies.
> 
> As far as I can tell, the underlying problem in the VFS/MM layer
> hasn't been solved yet (Jan, can you confirm?), so I've queued this
> patch for the next merge window.
  Yes, we didn't solve it yet. Thanks for queueing the patch!

								Honza

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0d424d7..2984ddf 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3621,6 +3621,12 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
+
+	/* Now release the pages again to reduce race window */
+	if (last_block_offset > first_block_offset)
+		truncate_pagecache_range(inode, first_block_offset,
+					 last_block_offset);
+
 	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
 	ext4_mark_inode_dirty(handle, inode);
 out_stop:
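
The ordering that this hunk establishes can be illustrated with a toy, single-threaded model. This is only a sketch under stated assumptions: struct toy_file and every toy_* function are hypothetical stand-ins invented for illustration, not kernel code, and the "racing" reader is simulated by a read placed between the first cache truncation and the block-map update. The model shows why a page cached in that window survives unless the cache is truncated a second time. It does not model the separate problem of pages backed by already freed blocks, which the commit message notes is still unresolved.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4   /* logical blocks in the toy file */
#define BLKSZ   8   /* bytes per block, tiny for illustration */

/* Hypothetical stand-in for an inode: disk blocks, block map, page cache. */
struct toy_file {
    char disk[NBLOCKS][BLKSZ];    /* contents of the backing blocks */
    bool mapped[NBLOCKS];         /* stand-in for extent tree / block map */
    char cache[NBLOCKS][BLKSZ];   /* cached pages */
    bool cached[NBLOCKS];         /* is a cached page present? */
};

static void toy_init(struct toy_file *f)
{
    memset(f, 0, sizeof(*f));
    for (int b = 0; b < NBLOCKS; b++) {
        memset(f->disk[b], 'A' + b, BLKSZ);   /* non-zero "old" data */
        f->mapped[b] = true;
    }
}

/* Read through the cache: fill from disk if mapped, else zeros (a hole). */
static const char *toy_read(struct toy_file *f, int b)
{
    assert(b >= 0 && b < NBLOCKS);
    if (!f->cached[b]) {
        if (f->mapped[b])
            memcpy(f->cache[b], f->disk[b], BLKSZ);
        else
            memset(f->cache[b], 0, BLKSZ);
        f->cached[b] = true;
    }
    return f->cache[b];
}

/* Toy analogue of truncate_pagecache_range(): drop pages in [first, last]. */
static void toy_truncate_cache(struct toy_file *f, int first, int last)
{
    for (int b = first; b <= last; b++)
        f->cached[b] = false;
}

/* Toy analogue of removing the range from the extent tree / block map. */
static void toy_unmap(struct toy_file *f, int first, int last)
{
    for (int b = first; b <= last; b++)
        f->mapped[b] = false;
}

/*
 * Punch [first, last] while a reader slips in between the first cache
 * truncation and the block-map update. Returns true if a read issued after
 * the punch has completed sees zeros.
 */
static bool toy_punch_with_racing_read(struct toy_file *f, int first,
                                       int last, bool truncate_again)
{
    static const char zeros[BLKSZ];

    toy_truncate_cache(f, first, last);      /* truncation before the punch */
    (void)toy_read(f, first);                /* reader repopulates the cache */
    toy_unmap(f, first, last);               /* range leaves the block map */
    if (truncate_again)
        toy_truncate_cache(f, first, last);  /* second truncation, as added here */

    return memcmp(toy_read(f, first), zeros, BLKSZ) == 0;
}
```

With truncate_again set to false the final read returns the old block contents from the cache; with it set to true the stale page is dropped and the read sees a hole.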