Patchwork ext4: avoid exposure of stale data in ext4_punch_hole() -v2

Submitter: Maxim Patlasov
Date: Sept. 27, 2013, 3:54 p.m.
Message ID: <20130927155329.3272.64086.stgit@dhcp-10-30-17-2.sw.ru>
Permalink: /patch/278632/
State: Awaiting Upstream

Comments

Maxim Patlasov - Sept. 27, 2013, 3:54 p.m.
While handling punch-hole fallocate, it's useless to truncate page cache
before removing the range from extent tree (or block map in indirect case)
because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
read) immediately after truncating page cache, but before updating extent
tree (or block map). In that case the user will see stale data even after
fallocate is completed.

Changed in v2 (Thanks to Jan Kara):
 - Until the problem of data corruption resulting from pages backed by
   already freed blocks is fully resolved, the simple thing we can do now
   is to add another truncation of pagecache after punch hole is done.

Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
---
 fs/ext4/inode.c |    6 ++++++
 1 file changed, 6 insertions(+)


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
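
The interleaving described in the commit message above can be illustrated
with a toy userspace model. This is only a sketch under stated assumptions,
not kernel code: struct toy_file, toy_read() and toy_punch_hole() are
hypothetical names, a single flag stands in for the extent tree (or block
map), and a single buffer stands in for the page cache. It models exactly
one interleaving, a reader that runs between the first page cache truncation
and the removal of the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8

/* Hypothetical one-block file: on-disk data, a mapping flag, one cached page. */
struct toy_file {
	char disk[TOY_BLOCK_SIZE];	/* contents of the on-disk block */
	bool mapped;			/* block still present in the "extent tree" */
	char page[TOY_BLOCK_SIZE];	/* cached copy of the block */
	bool page_valid;		/* is the cached copy populated? */
};

/* Build a file whose single block holds non-zero data and is not cached. */
static struct toy_file toy_file_with_data(void)
{
	struct toy_file f = { .mapped = true, .page_valid = false };

	memset(f.disk, 'A', TOY_BLOCK_SIZE);
	return f;
}

/* Drop the cached page, as a page cache truncation would. */
static void toy_truncate_pagecache(struct toy_file *f)
{
	f->page_valid = false;
}

/* Read through the cache: on a miss, fill from disk if mapped, else zeros. */
static const char *toy_read(struct toy_file *f)
{
	assert(f != NULL);
	if (!f->page_valid) {
		if (f->mapped)
			memcpy(f->page, f->disk, TOY_BLOCK_SIZE);
		else
			memset(f->page, 0, TOY_BLOCK_SIZE);
		f->page_valid = true;
	}
	return f->page;
}

/*
 * Punch the block out while a reader races in between the first
 * truncation and the mapping removal. Returns true if a read issued
 * after the operation completes sees only zeros.
 */
static bool toy_punch_hole(struct toy_file *f, bool retruncate)
{
	static const char zeros[TOY_BLOCK_SIZE];

	toy_truncate_pagecache(f);	/* truncation before the mapping update */
	(void)toy_read(f);		/* racing reader re-populates the cache */
	f->mapped = false;		/* mapping removed only now */
	if (retruncate)
		toy_truncate_pagecache(f);	/* second truncation, as in v2 */

	return memcmp(toy_read(f), zeros, TOY_BLOCK_SIZE) == 0;
}
```

In this model, toy_punch_hole(&f, false) returns false because the page
re-populated by the racing reader survives the operation, while
toy_punch_hole(&f, true) returns true. The model says nothing about pages
backed by already freed blocks, which the commit message names as the part
that is not yet resolved.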
Jan Kara - Sept. 27, 2013, 4:05 p.m.
On Fri 27-09-13 19:54:03, Maxim Patlasov wrote:
> While handling punch-hole fallocate, it's useless to truncate page cache
> before removing the range from extent tree (or block map in indirect case)
> because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> read) immediately after truncating page cache, but before updating extent
> tree (or block map). In that case the user will see stale data even after
> fallocate is completed.
> 
> Changed in v2 (Thanks to Jan Kara):
>  - Until the problem of data corruption resulting from pages backed by
>    already freed blocks is fully resolved, the simple thing we can do now
>    is to add another truncation of pagecache after punch hole is done.
  The patch looks good. You can add:
Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> Signed-off-by: Maxim Patlasov <mpatlasov@parallels.com>
> ---
>  fs/ext4/inode.c |    6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 0d424d7..2984ddf 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -3621,6 +3621,12 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
>  	up_write(&EXT4_I(inode)->i_data_sem);
>  	if (IS_SYNC(inode))
>  		ext4_handle_sync(handle);
> +
> +	/* Now release the pages again to reduce race window */
> +	if (last_block_offset > first_block_offset)
> +		truncate_pagecache_range(inode, first_block_offset,
> +					 last_block_offset);
> +
>  	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
>  	ext4_mark_inode_dirty(handle, inode);
>  out_stop:
>
Theodore Ts'o - Feb. 21, 2014, 12:21 a.m.
On Fri, Sep 27, 2013 at 06:05:17PM +0200, Jan Kara wrote:
> On Fri 27-09-13 19:54:03, Maxim Patlasov wrote:
> > While handling punch-hole fallocate, it's useless to truncate page cache
> > before removing the range from extent tree (or block map in indirect case)
> > because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> > read) immediately after truncating page cache, but before updating extent
> > tree (or block map). In that case the user will see stale data even after
> > fallocate is completed.
> > 
> > Changed in v2 (Thanks to Jan Kara):
> >  - Until the problem of data corruption resulting from pages backed by
> >    already freed blocks is fully resolved, the simple thing we can do now
> >    is to add another truncation of pagecache after punch hole is done.
>   The patch looks good. You can add:
> Reviewed-by: Jan Kara <jack@suse.cz>

I was going through old patches, and it looks like this one got
dropped.  My apologies.

As far as I can tell, the underlying problem in the VFS/MM layer
hasn't been solved yet (Jan, can you confirm?), so I've queued this
patch for the next merge window.

					- Ted
Jan Kara - Feb. 21, 2014, 9:45 a.m.
On Thu 20-02-14 19:21:07, Ted Tso wrote:
> On Fri, Sep 27, 2013 at 06:05:17PM +0200, Jan Kara wrote:
> > On Fri 27-09-13 19:54:03, Maxim Patlasov wrote:
> > > While handling punch-hole fallocate, it's useless to truncate page cache
> > > before removing the range from extent tree (or block map in indirect case)
> > > because page cache can be re-populated (by read-ahead or read(2) or mmap-ed
> > > read) immediately after truncating page cache, but before updating extent
> > > tree (or block map). In that case the user will see stale data even after
> > > fallocate is completed.
> > > 
> > > Changed in v2 (Thanks to Jan Kara):
> > >  - Until the problem of data corruption resulting from pages backed by
> > >    already freed blocks is fully resolved, the simple thing we can do now
> > >    is to add another truncation of pagecache after punch hole is done.
> >   The patch looks good. You can add:
> > Reviewed-by: Jan Kara <jack@suse.cz>
> 
> I was going through old patches, and it looks like this one got
> dropped.  My apologies.
> 
> As far as I can tell, the underlying problem in the VFS/MM layer
> hasn't been solved yet (Jan, can you confirm?), so I've queued this
> patch for the next merge window.
  Yes, we didn't solve it yet. Thanks for queueing the patch!

								Honza

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 0d424d7..2984ddf 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3621,6 +3621,12 @@ int ext4_punch_hole(struct inode *inode, loff_t offset, loff_t length)
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (IS_SYNC(inode))
 		ext4_handle_sync(handle);
+
+	/* Now release the pages again to reduce race window */
+	if (last_block_offset > first_block_offset)
+		truncate_pagecache_range(inode, first_block_offset,
+					 last_block_offset);
+
 	inode->i_mtime = inode->i_ctime = ext4_current_time(inode);
 	ext4_mark_inode_dirty(handle, inode);
 out_stop:
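
From userspace, the guarantee this patch is concerned with is that a read
of a punched range returns zeros once fallocate has completed. A minimal
sketch of such a check follows, assuming Linux with glibc; the helper names
make_filled_temp_file() and punch_and_check() are hypothetical. It is
single-threaded, so it does not exercise the race with read-ahead, read(2)
or mmap-ed reads; reproducing that would need concurrent readers on the
same range.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* for fallocate() in <fcntl.h> */
#endif
#include <assert.h>
#include <fcntl.h>
#include <linux/falloc.h>	/* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a temporary file from a mkstemp() template and fill it with
 * 'size' bytes of 'A'. Returns an open descriptor, or -1 on error.
 */
static int make_filled_temp_file(char *path_template, size_t size)
{
	char buf[4096];
	size_t written = 0;
	int fd = mkstemp(path_template);

	if (fd < 0)
		return -1;
	memset(buf, 'A', sizeof(buf));
	while (written < size) {
		size_t want = size - written;
		ssize_t got;

		if (want > sizeof(buf))
			want = sizeof(buf);
		got = write(fd, buf, want);
		if (got < 0) {
			close(fd);
			return -1;
		}
		written += (size_t)got;
	}
	return fd;
}

/*
 * Punch a hole over [offset, offset + length) and read the range back.
 * Returns 1 if every byte read is zero, 0 if a non-zero byte is seen,
 * and -1 if fallocate() or pread() fails (errno is left set, for
 * example when the filesystem does not support hole punching).
 */
static int punch_and_check(int fd, off_t offset, off_t length)
{
	char buf[4096];
	off_t done = 0;

	assert(offset >= 0 && length > 0);
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      offset, length) != 0)
		return -1;

	while (done < length) {
		size_t want = sizeof(buf);
		ssize_t got, i;

		if ((off_t)want > length - done)
			want = (size_t)(length - done);
		got = pread(fd, buf, want, offset + done);
		if (got < 0)
			return -1;
		if (got == 0)
			break;		/* range extends past end of file */
		for (i = 0; i < got; i++)
			if (buf[i] != 0)
				return 0;
		done += got;
	}
	return 1;
}
```

A caller would create a file with make_filled_temp_file(), call
punch_and_check(fd, 4096, 4096), and treat a return value of 0 as exposure
of stale data.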