diff mbox

[5/5] ext2: convert to use the new truncate convention.

Message ID 20090710073231.195528273@suse.de
State Not Applicable, archived
Headers show

Commit Message

Nick Piggin July 10, 2009, 7:30 a.m. UTC
I have some questions, marked with XXX.

Cc: linux-ext4@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@suse.de>
---
 fs/ext2/ext2.h  |    1 
 fs/ext2/file.c  |    2 
 fs/ext2/inode.c |  138 +++++++++++++++++++++++++++++++++++++++++++++-----------
 3 files changed, 113 insertions(+), 28 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jan Kara Sept. 1, 2009, 6:29 p.m. UTC | #1
Hi Nick,

On Fri 10-07-09 17:30:33, npiggin@suse.de wrote:
> I have some questions, marked with XXX.
> 
> Cc: linux-ext4@vger.kernel.org
> Signed-off-by: Nick Piggin <npiggin@suse.de>
> ---
>  fs/ext2/ext2.h  |    1 
>  fs/ext2/file.c  |    2 
>  fs/ext2/inode.c |  138 +++++++++++++++++++++++++++++++++++++++++++++-----------
>  3 files changed, 113 insertions(+), 28 deletions(-)
> 
> Index: linux-2.6/fs/ext2/inode.c
> ===================================================================
> --- linux-2.6.orig/fs/ext2/inode.c
> +++ linux-2.6/fs/ext2/inode.c
...
> +static void ext2_truncate_blocks(struct inode *inode, loff_t offset)
> +{
> +	/*
> +	 * XXX: it seems like a bug here that we don't allow
> +	 * IS_APPEND inode to have blocks-past-i_size trimmed off.
> +	 * review and fix this.
> +	 */
> +	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> +	    S_ISLNK(inode->i_mode)))
> +		return -EINVAL;
> +	if (ext2_inode_is_fast_symlink(inode))
> +		return -EINVAL;
> +	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> +		return -EPERM;
> +	__ext2_truncate_blocks(inode, offset);
> +}
  Actually, looking again into this, I think you've introduced a subtle bug
into the code. When a write fails for some reason, we did vmtruncate()
previously which called block_truncate_page() which zeroed a tail of the
last block. Now, ext2_truncate_blocks() does not do this and I think it
could be a problem because at least in direct IO case, we could have written
some data into that block on disk.
  We really rely on the tail of the block being zeroed because otherwise
extending truncate will show those old data in the block. I plan to change
that in my mkwrite fixes but until then, we should preserve the old
assumptions.
  So I think that ext2_truncate_blocks() should do all that tail page magic
as well (although it's not needed in all cases).

									Honza
Nick Piggin Sept. 2, 2009, 9:14 a.m. UTC | #2
On Tue, Sep 01, 2009 at 08:29:29PM +0200, Jan Kara wrote:
>   Hi Nick,
> 
> On Fri 10-07-09 17:30:33, npiggin@suse.de wrote:
> > I have some questions, marked with XXX.
> > 
> > Cc: linux-ext4@vger.kernel.org
> > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > ---
> >  fs/ext2/ext2.h  |    1 
> >  fs/ext2/file.c  |    2 
> >  fs/ext2/inode.c |  138 +++++++++++++++++++++++++++++++++++++++++++++-----------
> >  3 files changed, 113 insertions(+), 28 deletions(-)
> > 
> > Index: linux-2.6/fs/ext2/inode.c
> > ===================================================================
> > --- linux-2.6.orig/fs/ext2/inode.c
> > +++ linux-2.6/fs/ext2/inode.c
> ...
> > +static void ext2_truncate_blocks(struct inode *inode, loff_t offset)
> > +{
> > +	/*
> > +	 * XXX: it seems like a bug here that we don't allow
> > +	 * IS_APPEND inode to have blocks-past-i_size trimmed off.
> > +	 * review and fix this.
> > +	 */
> > +	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> > +	    S_ISLNK(inode->i_mode)))
> > +		return -EINVAL;
> > +	if (ext2_inode_is_fast_symlink(inode))
> > +		return -EINVAL;
> > +	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> > +		return -EPERM;
> > +	__ext2_truncate_blocks(inode, offset);
> > +}
>   Actually, looking again into this, I think you've introduced a subtle bug
> into the code. When a write fails for some reason, we did vmtruncate()
> previously which called block_truncate_page() which zeroed a tail of the
> last block. Now, ext2_truncate_blocks() does not do this and I think it
> could be a problem because at least in direct IO case, we could have written
> some data into that block on disk.
>   We really rely on the tail of the block being zeroed because otherwise
> extending truncate will show those old data in the block. I plan to change
> that in my mkwrite fixes but until then, we should preserve the old
> assumptions.
>   So I think that ext2_truncate_blocks() should do all that tail page magic
> as well (although it's not needed in all cases).

Hi Jan,

Yeah I did think about this and yes we usually do need to zero out
the page but for these error cases with normal writes we shouldn't
write anything in there.  For direct IO... I don't see the problem
because we're not coherent with pagecache anyway...

Hmm, but possiby it is a good idea just to keep the same block_truncate_page
calls for this patchset and we can look at it again with your truncate
patches. I'll work on fixing these up.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Sept. 2, 2009, 11:14 a.m. UTC | #3
On Wed 02-09-09 11:14:26, Nick Piggin wrote:
> On Tue, Sep 01, 2009 at 08:29:29PM +0200, Jan Kara wrote:
> >   Hi Nick,
> > 
> > On Fri 10-07-09 17:30:33, npiggin@suse.de wrote:
> > > I have some questions, marked with XXX.
> > > 
> > > Cc: linux-ext4@vger.kernel.org
> > > Signed-off-by: Nick Piggin <npiggin@suse.de>
> > > ---
> > >  fs/ext2/ext2.h  |    1 
> > >  fs/ext2/file.c  |    2 
> > >  fs/ext2/inode.c |  138 +++++++++++++++++++++++++++++++++++++++++++++-----------
> > >  3 files changed, 113 insertions(+), 28 deletions(-)
> > > 
> > > Index: linux-2.6/fs/ext2/inode.c
> > > ===================================================================
> > > --- linux-2.6.orig/fs/ext2/inode.c
> > > +++ linux-2.6/fs/ext2/inode.c
> > ...
> > > +static void ext2_truncate_blocks(struct inode *inode, loff_t offset)
> > > +{
> > > +	/*
> > > +	 * XXX: it seems like a bug here that we don't allow
> > > +	 * IS_APPEND inode to have blocks-past-i_size trimmed off.
> > > +	 * review and fix this.
> > > +	 */
> > > +	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
> > > +	    S_ISLNK(inode->i_mode)))
> > > +		return -EINVAL;
> > > +	if (ext2_inode_is_fast_symlink(inode))
> > > +		return -EINVAL;
> > > +	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
> > > +		return -EPERM;
> > > +	__ext2_truncate_blocks(inode, offset);
> > > +}
> >   Actually, looking again into this, I think you've introduced a subtle bug
> > into the code. When a write fails for some reason, we did vmtruncate()
> > previously which called block_truncate_page() which zeroed a tail of the
> > last block. Now, ext2_truncate_blocks() does not do this and I think it
> > could be a problem because at least in direct IO case, we could have written
> > some data into that block on disk.
> >   We really rely on the tail of the block being zeroed because otherwise
> > extending truncate will show those old data in the block. I plan to change
> > that in my mkwrite fixes but until then, we should preserve the old
> > assumptions.
> >   So I think that ext2_truncate_blocks() should do all that tail page magic
> > as well (although it's not needed in all cases).
> 
> Hi Jan,
> 
> Yeah I did think about this and yes we usually do need to zero out
> the page but for these error cases with normal writes we shouldn't
> write anything in there.  For direct IO... I don't see the problem
> because we're not coherent with pagecache anyway...
  We are not coherent but that's irrelevant - if a failed direct write
is followed by an extending truncate and read, it will read the block
from disk and could see non-zeros where there should be zeros...

> Hmm, but possiby it is a good idea just to keep the same block_truncate_page
> calls for this patchset and we can look at it again with your truncate
> patches. I'll work on fixing these up.
  Yes, I think that keeping this change for a separate patch is definitely
better.

								Honza
diff mbox

Patch

Index: linux-2.6/fs/ext2/inode.c
===================================================================
--- linux-2.6.orig/fs/ext2/inode.c
+++ linux-2.6/fs/ext2/inode.c
@@ -53,6 +53,8 @@  static inline int ext2_inode_is_fast_sym
 		inode->i_blocks - ea_blocks == 0);
 }
 
+static void ext2_truncate_blocks(struct inode *inode, loff_t offset);
+
 /*
  * Called at the last iput() if i_nlink is zero.
  */
@@ -66,9 +68,10 @@  void ext2_delete_inode (struct inode * i
 	mark_inode_dirty(inode);
 	ext2_write_inode(inode, inode_needs_sync(inode));
 
+	/* XXX: where is truncate_inode_pages? */
 	inode->i_size = 0;
 	if (inode->i_blocks)
-		ext2_truncate (inode);
+		ext2_truncate_blocks(inode, 0);
 	ext2_free_inode (inode);
 
 	return;
@@ -761,8 +764,33 @@  ext2_write_begin(struct file *file, stru
 		loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, void **fsdata)
 {
+	int ret;
+
 	*pagep = NULL;
-	return __ext2_write_begin(file, mapping, pos, len, flags, pagep,fsdata);
+	ret = __ext2_write_begin(file, mapping, pos, len, flags, pagep,fsdata);
+	if (ret < 0) {
+		struct inode *inode = mapping->host;
+		loff_t isize = inode->i_size;
+		if (pos + len > isize)
+			ext2_truncate_blocks(inode, isize);
+	}
+	return ret;
+}
+
+static int ext2_write_end(struct file *file, struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned copied,
+			struct page *page, void *fsdata)
+{
+	int ret;
+
+	ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
+	if (ret < len) {
+		struct inode *inode = mapping->host;
+		loff_t isize = inode->i_size;
+		if (pos + len > isize)
+			ext2_truncate_blocks(inode, isize);
+	}
+	return ret;
 }
 
 static int
@@ -770,13 +798,22 @@  ext2_nobh_write_begin(struct file *file,
 		loff_t pos, unsigned len, unsigned flags,
 		struct page **pagep, void **fsdata)
 {
+	int ret;
+
 	/*
 	 * Dir-in-pagecache still uses ext2_write_begin. Would have to rework
 	 * directory handling code to pass around offsets rather than struct
 	 * pages in order to make this work easily.
 	 */
-	return nobh_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
+	ret = nobh_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
 							ext2_get_block);
+	if (ret < 0) {
+		struct inode *inode = mapping->host;
+		loff_t isize = inode->i_size;
+		if (pos + len > isize)
+			ext2_truncate_blocks(inode, isize);
+	}
+	return ret;
 }
 
 static int ext2_nobh_writepage(struct page *page,
@@ -796,9 +833,15 @@  ext2_direct_IO(int rw, struct kiocb *ioc
 {
 	struct file *file = iocb->ki_filp;
 	struct inode *inode = file->f_mapping->host;
+	ssize_t ret;
 
-	return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
+	ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov,
 				offset, nr_segs, ext2_get_block, NULL);
+	if (ret < 0 && (rw & WRITE)) {
+		loff_t isize = inode->i_size;
+		ext2_truncate_blocks(inode, isize);
+	}
+	return ret;
 }
 
 static int
@@ -813,7 +856,7 @@  const struct address_space_operations ex
 	.writepage		= ext2_writepage,
 	.sync_page		= block_sync_page,
 	.write_begin		= ext2_write_begin,
-	.write_end		= generic_write_end,
+	.write_end		= ext2_write_end,
 	.bmap			= ext2_bmap,
 	.direct_IO		= ext2_direct_IO,
 	.writepages		= ext2_writepages,
@@ -1020,7 +1063,7 @@  static void ext2_free_branches(struct in
 		ext2_free_data(inode, p, q);
 }
 
-void ext2_truncate(struct inode *inode)
+static void __ext2_truncate_blocks(struct inode *inode, loff_t offset)
 {
 	__le32 *i_data = EXT2_I(inode)->i_data;
 	struct ext2_inode_info *ei = EXT2_I(inode);
@@ -1032,27 +1075,8 @@  void ext2_truncate(struct inode *inode)
 	int n;
 	long iblock;
 	unsigned blocksize;
-
-	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
-	    S_ISLNK(inode->i_mode)))
-		return;
-	if (ext2_inode_is_fast_symlink(inode))
-		return;
-	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
-		return;
-
 	blocksize = inode->i_sb->s_blocksize;
-	iblock = (inode->i_size + blocksize-1)
-					>> EXT2_BLOCK_SIZE_BITS(inode->i_sb);
-
-	if (mapping_is_xip(inode->i_mapping))
-		xip_truncate_page(inode->i_mapping, inode->i_size);
-	else if (test_opt(inode->i_sb, NOBH))
-		nobh_truncate_page(inode->i_mapping,
-				inode->i_size, ext2_get_block);
-	else
-		block_truncate_page(inode->i_mapping,
-				inode->i_size, ext2_get_block);
+	iblock = (offset + blocksize-1) >> EXT2_BLOCK_SIZE_BITS(inode->i_sb);
 
 	n = ext2_block_to_path(inode, iblock, offsets, NULL);
 	if (n == 0)
@@ -1120,6 +1144,59 @@  do_indirects:
 	ext2_discard_reservation(inode);
 
 	mutex_unlock(&ei->truncate_mutex);
+}
+
+static void ext2_truncate_blocks(struct inode *inode, loff_t offset)
+{
+	/*
+	 * XXX: it seems like a bug here that we don't allow
+	 * IS_APPEND inode to have blocks-past-i_size trimmed off.
+	 * review and fix this.
+	 */
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+	    S_ISLNK(inode->i_mode)))
+		return -EINVAL;
+	if (ext2_inode_is_fast_symlink(inode))
+		return -EINVAL;
+	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+		return -EPERM;
+	__ext2_truncate_blocks(inode, offset);
+}
+
+int ext2_setsize(struct inode *inode, loff_t newsize)
+{
+	loff_t oldsize;
+	int error;
+
+	error = inode_newsize_ok(inode, newsize);
+	if (error)
+		return error;
+
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
+	    S_ISLNK(inode->i_mode)))
+		return -EINVAL;
+	if (ext2_inode_is_fast_symlink(inode))
+		return -EINVAL;
+	if (IS_APPEND(inode) || IS_IMMUTABLE(inode))
+		return -EPERM;
+
+	if (mapping_is_xip(inode->i_mapping))
+		error = xip_truncate_page(inode->i_mapping, newsize);
+	else if (test_opt(inode->i_sb, NOBH))
+		error = nobh_truncate_page(inode->i_mapping,
+				newsize, ext2_get_block);
+	else
+		error = block_truncate_page(inode->i_mapping,
+				newsize, ext2_get_block);
+	if (error)
+		return error;
+
+	oldsize = inode->i_size;
+	i_size_write(inode, newsize);
+	truncate_pagecache(inode, oldsize, newsize);
+
+	__ext2_truncate_blocks(inode, newsize);
+
 	inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC;
 	if (inode_needs_sync(inode)) {
 		sync_mapping_buffers(inode->i_mapping);
@@ -1127,6 +1204,8 @@  do_indirects:
 	} else {
 		mark_inode_dirty(inode);
 	}
+
+	return 0;
 }
 
 static struct ext2_inode *ext2_get_inode(struct super_block *sb, ino_t ino,
@@ -1459,6 +1538,13 @@  int ext2_setattr(struct dentry *dentry,
 		if (error)
 			return error;
 	}
+	if (iattr->ia_valid & ATTR_SIZE) {
+		error = ext2_setsize(inode, iattr->ia_size);
+		if (error)
+			return error;
+		iattr->ia_valid &= ~ATTR_SIZE;
+		/* strip ATTR_SIZE for inode_setattr */
+	}
 	error = inode_setattr(inode, iattr);
 	if (!error && (iattr->ia_valid & ATTR_MODE))
 		error = ext2_acl_chmod(inode);
Index: linux-2.6/fs/ext2/ext2.h
===================================================================
--- linux-2.6.orig/fs/ext2/ext2.h
+++ linux-2.6/fs/ext2/ext2.h
@@ -122,7 +122,6 @@  extern int ext2_write_inode (struct inod
 extern void ext2_delete_inode (struct inode *);
 extern int ext2_sync_inode (struct inode *);
 extern int ext2_get_block(struct inode *, sector_t, struct buffer_head *, int);
-extern void ext2_truncate (struct inode *);
 extern int ext2_setattr (struct dentry *, struct iattr *);
 extern void ext2_set_inode_flags(struct inode *inode);
 extern void ext2_get_inode_flags(struct ext2_inode_info *);
Index: linux-2.6/fs/ext2/file.c
===================================================================
--- linux-2.6.orig/fs/ext2/file.c
+++ linux-2.6/fs/ext2/file.c
@@ -77,13 +77,13 @@  const struct file_operations ext2_xip_fi
 #endif
 
 const struct inode_operations ext2_file_inode_operations = {
-	.truncate	= ext2_truncate,
 #ifdef CONFIG_EXT2_FS_XATTR
 	.setxattr	= generic_setxattr,
 	.getxattr	= generic_getxattr,
 	.listxattr	= ext2_listxattr,
 	.removexattr	= generic_removexattr,
 #endif
+	.new_truncate	= 1,
 	.setattr	= ext2_setattr,
 	.permission	= ext2_permission,
 	.fiemap		= ext2_fiemap,