Message ID | 20090710073231.195528273@suse.de |
---|---|
State | Not Applicable, archived |
Headers | show |
Hi Nick, On Fri 10-07-09 17:30:33, npiggin@suse.de wrote: > I have some questions, marked with XXX. > > Cc: linux-ext4@vger.kernel.org > Signed-off-by: Nick Piggin <npiggin@suse.de> > --- > fs/ext2/ext2.h | 1 > fs/ext2/file.c | 2 > fs/ext2/inode.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----------- > 3 files changed, 113 insertions(+), 28 deletions(-) > > Index: linux-2.6/fs/ext2/inode.c > =================================================================== > --- linux-2.6.orig/fs/ext2/inode.c > +++ linux-2.6/fs/ext2/inode.c ... > +static void ext2_truncate_blocks(struct inode *inode, loff_t offset) > +{ > + /* > + * XXX: it seems like a bug here that we don't allow > + * IS_APPEND inode to have blocks-past-i_size trimmed off. > + * review and fix this. > + */ > + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || > + S_ISLNK(inode->i_mode))) > + return -EINVAL; > + if (ext2_inode_is_fast_symlink(inode)) > + return -EINVAL; > + if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) > + return -EPERM; > + __ext2_truncate_blocks(inode, offset); > +} Actually, looking again into this, I think you've introduced a subtle bug into the code. When a write fails for some reason, we did vmtruncate() previously which called block_truncate_page() which zeroed a tail of the last block. Now, ext2_truncate_blocks() does not do this and I think it could be a problem because at least in direct IO case, we could have written some data into that block on disk. We really rely on the tail of the block being zeroed because otherwise extending truncate will show those old data in the block. I plan to change that in my mkwrite fixes but until then, we should preserve the old assumptions. So I think that ext2_truncate_blocks() should do all that tail page magic as well (although it's not needed in all cases). Honza
On Tue, Sep 01, 2009 at 08:29:29PM +0200, Jan Kara wrote: > Hi Nick, > > On Fri 10-07-09 17:30:33, npiggin@suse.de wrote: > > I have some questions, marked with XXX. > > > > Cc: linux-ext4@vger.kernel.org > > Signed-off-by: Nick Piggin <npiggin@suse.de> > > --- > > fs/ext2/ext2.h | 1 > > fs/ext2/file.c | 2 > > fs/ext2/inode.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----------- > > 3 files changed, 113 insertions(+), 28 deletions(-) > > > > Index: linux-2.6/fs/ext2/inode.c > > =================================================================== > > --- linux-2.6.orig/fs/ext2/inode.c > > +++ linux-2.6/fs/ext2/inode.c > ... > > +static void ext2_truncate_blocks(struct inode *inode, loff_t offset) > > +{ > > + /* > > + * XXX: it seems like a bug here that we don't allow > > + * IS_APPEND inode to have blocks-past-i_size trimmed off. > > + * review and fix this. > > + */ > > + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || > > + S_ISLNK(inode->i_mode))) > > + return -EINVAL; > > + if (ext2_inode_is_fast_symlink(inode)) > > + return -EINVAL; > > + if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) > > + return -EPERM; > > + __ext2_truncate_blocks(inode, offset); > > +} > Actually, looking again into this, I think you've introduced a subtle bug > into the code. When a write fails for some reason, we did vmtruncate() > previously which called block_truncate_page() which zeroed a tail of the > last block. Now, ext2_truncate_blocks() does not do this and I think it > could be a problem because at least in direct IO case, we could have written > some data into that block on disk. > We really rely on the tail of the block being zeroed because otherwise > extending truncate will show those old data in the block. I plan to change > that in my mkwrite fixes but until then, we should preserve the old > assumptions. > So I think that ext2_truncate_blocks() should do all that tail page magic > as well (although it's not needed in all cases). Hi Jan, Yeah I did think about this and yes we usually do need to zero out the page but for these error cases with normal writes we shouldn't write anything in there. For direct IO... I don't see the problem because we're not coherent with pagecache anyway... Hmm, but possiby it is a good idea just to keep the same block_truncate_page calls for this patchset and we can look at it again with your truncate patches. I'll work on fixing these up. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 02-09-09 11:14:26, Nick Piggin wrote: > On Tue, Sep 01, 2009 at 08:29:29PM +0200, Jan Kara wrote: > > Hi Nick, > > > > On Fri 10-07-09 17:30:33, npiggin@suse.de wrote: > > > I have some questions, marked with XXX. > > > > > > Cc: linux-ext4@vger.kernel.org > > > Signed-off-by: Nick Piggin <npiggin@suse.de> > > > --- > > > fs/ext2/ext2.h | 1 > > > fs/ext2/file.c | 2 > > > fs/ext2/inode.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----------- > > > 3 files changed, 113 insertions(+), 28 deletions(-) > > > > > > Index: linux-2.6/fs/ext2/inode.c > > > =================================================================== > > > --- linux-2.6.orig/fs/ext2/inode.c > > > +++ linux-2.6/fs/ext2/inode.c > > ... > > > +static void ext2_truncate_blocks(struct inode *inode, loff_t offset) > > > +{ > > > + /* > > > + * XXX: it seems like a bug here that we don't allow > > > + * IS_APPEND inode to have blocks-past-i_size trimmed off. > > > + * review and fix this. > > > + */ > > > + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || > > > + S_ISLNK(inode->i_mode))) > > > + return -EINVAL; > > > + if (ext2_inode_is_fast_symlink(inode)) > > > + return -EINVAL; > > > + if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) > > > + return -EPERM; > > > + __ext2_truncate_blocks(inode, offset); > > > +} > > Actually, looking again into this, I think you've introduced a subtle bug > > into the code. When a write fails for some reason, we did vmtruncate() > > previously which called block_truncate_page() which zeroed a tail of the > > last block. Now, ext2_truncate_blocks() does not do this and I think it > > could be a problem because at least in direct IO case, we could have written > > some data into that block on disk. > > We really rely on the tail of the block being zeroed because otherwise > > extending truncate will show those old data in the block. I plan to change > > that in my mkwrite fixes but until then, we should preserve the old > > assumptions. > > So I think that ext2_truncate_blocks() should do all that tail page magic > > as well (although it's not needed in all cases). > > Hi Jan, > > Yeah I did think about this and yes we usually do need to zero out > the page but for these error cases with normal writes we shouldn't > write anything in there. For direct IO... I don't see the problem > because we're not coherent with pagecache anyway... We are not coherent but that's irrelevant - if a failed direct write is followed by an extending truncate and read, it will read the block from disk and could see non-zeros where there should be zeros... > Hmm, but possiby it is a good idea just to keep the same block_truncate_page > calls for this patchset and we can look at it again with your truncate > patches. I'll work on fixing these up. Yes, I think that keeping this change for a separate patch is definitely better. Honza
Index: linux-2.6/fs/ext2/inode.c =================================================================== --- linux-2.6.orig/fs/ext2/inode.c +++ linux-2.6/fs/ext2/inode.c @@ -53,6 +53,8 @@ static inline int ext2_inode_is_fast_sym inode->i_blocks - ea_blocks == 0); } +static void ext2_truncate_blocks(struct inode *inode, loff_t offset); + /* * Called at the last iput() if i_nlink is zero. */ @@ -66,9 +68,10 @@ void ext2_delete_inode (struct inode * i mark_inode_dirty(inode); ext2_write_inode(inode, inode_needs_sync(inode)); + /* XXX: where is truncate_inode_pages? */ inode->i_size = 0; if (inode->i_blocks) - ext2_truncate (inode); + ext2_truncate_blocks(inode, 0); ext2_free_inode (inode); return; @@ -761,8 +764,33 @@ ext2_write_begin(struct file *file, stru loff_t pos, unsigned len, unsigned flags, struct page **pagep, void **fsdata) { + int ret; + *pagep = NULL; - return __ext2_write_begin(file, mapping, pos, len, flags, pagep,fsdata); + ret = __ext2_write_begin(file, mapping, pos, len, flags, pagep,fsdata); + if (ret < 0) { + struct inode *inode = mapping->host; + loff_t isize = inode->i_size; + if (pos + len > isize) + ext2_truncate_blocks(inode, isize); + } + return ret; +} + +static int ext2_write_end(struct file *file, struct address_space *mapping, + loff_t pos, unsigned len, unsigned copied, + struct page *page, void *fsdata) +{ + int ret; + + ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata); + if (ret < len) { + struct inode *inode = mapping->host; + loff_t isize = inode->i_size; + if (pos + len > isize) + ext2_truncate_blocks(inode, isize); + } + return ret; } static int @@ -770,13 +798,22 @@ ext2_nobh_write_begin(struct file *file, loff_t pos, unsigned len, unsigned flags, struct page **pagep, void **fsdata) { + int ret; + /* * Dir-in-pagecache still uses ext2_write_begin. Would have to rework * directory handling code to pass around offsets rather than struct * pages in order to make this work easily. */ - return nobh_write_begin(file, mapping, pos, len, flags, pagep, fsdata, + ret = nobh_write_begin(file, mapping, pos, len, flags, pagep, fsdata, ext2_get_block); + if (ret < 0) { + struct inode *inode = mapping->host; + loff_t isize = inode->i_size; + if (pos + len > isize) + ext2_truncate_blocks(inode, isize); + } + return ret; } static int ext2_nobh_writepage(struct page *page, @@ -796,9 +833,15 @@ ext2_direct_IO(int rw, struct kiocb *ioc { struct file *file = iocb->ki_filp; struct inode *inode = file->f_mapping->host; + ssize_t ret; - return blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, + ret = blockdev_direct_IO(rw, iocb, inode, inode->i_sb->s_bdev, iov, offset, nr_segs, ext2_get_block, NULL); + if (ret < 0 && (rw & WRITE)) { + loff_t isize = inode->i_size; + ext2_truncate_blocks(inode, isize); + } + return ret; } static int @@ -813,7 +856,7 @@ const struct address_space_operations ex .writepage = ext2_writepage, .sync_page = block_sync_page, .write_begin = ext2_write_begin, - .write_end = generic_write_end, + .write_end = ext2_write_end, .bmap = ext2_bmap, .direct_IO = ext2_direct_IO, .writepages = ext2_writepages, @@ -1020,7 +1063,7 @@ static void ext2_free_branches(struct in ext2_free_data(inode, p, q); } -void ext2_truncate(struct inode *inode) +static void __ext2_truncate_blocks(struct inode *inode, loff_t offset) { __le32 *i_data = EXT2_I(inode)->i_data; struct ext2_inode_info *ei = EXT2_I(inode); @@ -1032,27 +1075,8 @@ void ext2_truncate(struct inode *inode) int n; long iblock; unsigned blocksize; - - if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || - S_ISLNK(inode->i_mode))) - return; - if (ext2_inode_is_fast_symlink(inode)) - return; - if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) - return; - blocksize = inode->i_sb->s_blocksize; - iblock = (inode->i_size + blocksize-1) - >> EXT2_BLOCK_SIZE_BITS(inode->i_sb); - - if (mapping_is_xip(inode->i_mapping)) - xip_truncate_page(inode->i_mapping, inode->i_size); - else if (test_opt(inode->i_sb, NOBH)) - nobh_truncate_page(inode->i_mapping, - inode->i_size, ext2_get_block); - else - block_truncate_page(inode->i_mapping, - inode->i_size, ext2_get_block); + iblock = (offset + blocksize-1) >> EXT2_BLOCK_SIZE_BITS(inode->i_sb); n = ext2_block_to_path(inode, iblock, offsets, NULL); if (n == 0) @@ -1120,6 +1144,59 @@ do_indirects: ext2_discard_reservation(inode); mutex_unlock(&ei->truncate_mutex); +} + +static void ext2_truncate_blocks(struct inode *inode, loff_t offset) +{ + /* + * XXX: it seems like a bug here that we don't allow + * IS_APPEND inode to have blocks-past-i_size trimmed off. + * review and fix this. + */ + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || + S_ISLNK(inode->i_mode))) + return -EINVAL; + if (ext2_inode_is_fast_symlink(inode)) + return -EINVAL; + if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) + return -EPERM; + __ext2_truncate_blocks(inode, offset); +} + +int ext2_setsize(struct inode *inode, loff_t newsize) +{ + loff_t oldsize; + int error; + + error = inode_newsize_ok(inode, newsize); + if (error) + return error; + + if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || + S_ISLNK(inode->i_mode))) + return -EINVAL; + if (ext2_inode_is_fast_symlink(inode)) + return -EINVAL; + if (IS_APPEND(inode) || IS_IMMUTABLE(inode)) + return -EPERM; + + if (mapping_is_xip(inode->i_mapping)) + error = xip_truncate_page(inode->i_mapping, newsize); + else if (test_opt(inode->i_sb, NOBH)) + error = nobh_truncate_page(inode->i_mapping, + newsize, ext2_get_block); + else + error = block_truncate_page(inode->i_mapping, + newsize, ext2_get_block); + if (error) + return error; + + oldsize = inode->i_size; + i_size_write(inode, newsize); + truncate_pagecache(inode, oldsize, newsize); + + __ext2_truncate_blocks(inode, newsize); + inode->i_mtime = inode->i_ctime = CURRENT_TIME_SEC; if (inode_needs_sync(inode)) { sync_mapping_buffers(inode->i_mapping); @@ -1127,6 +1204,8 @@ do_indirects: } else { mark_inode_dirty(inode); } + + return 0; } static struct ext2_inode *ext2_get_inode(struct super_block *sb, ino_t ino, @@ -1459,6 +1538,13 @@ int ext2_setattr(struct dentry *dentry, if (error) return error; } + if (iattr->ia_valid & ATTR_SIZE) { + error = ext2_setsize(inode, iattr->ia_size); + if (error) + return error; + iattr->ia_valid &= ~ATTR_SIZE; + /* strip ATTR_SIZE for inode_setattr */ + } error = inode_setattr(inode, iattr); if (!error && (iattr->ia_valid & ATTR_MODE)) error = ext2_acl_chmod(inode); Index: linux-2.6/fs/ext2/ext2.h =================================================================== --- linux-2.6.orig/fs/ext2/ext2.h +++ linux-2.6/fs/ext2/ext2.h @@ -122,7 +122,6 @@ extern int ext2_write_inode (struct inod extern void ext2_delete_inode (struct inode *); extern int ext2_sync_inode (struct inode *); extern int ext2_get_block(struct inode *, sector_t, struct buffer_head *, int); -extern void ext2_truncate (struct inode *); extern int ext2_setattr (struct dentry *, struct iattr *); extern void ext2_set_inode_flags(struct inode *inode); extern void ext2_get_inode_flags(struct ext2_inode_info *); Index: linux-2.6/fs/ext2/file.c =================================================================== --- linux-2.6.orig/fs/ext2/file.c +++ linux-2.6/fs/ext2/file.c @@ -77,13 +77,13 @@ const struct file_operations ext2_xip_fi #endif const struct inode_operations ext2_file_inode_operations = { - .truncate = ext2_truncate, #ifdef CONFIG_EXT2_FS_XATTR .setxattr = generic_setxattr, .getxattr = generic_getxattr, .listxattr = ext2_listxattr, .removexattr = generic_removexattr, #endif + .new_truncate = 1, .setattr = ext2_setattr, .permission = ext2_permission, .fiemap = ext2_fiemap,
I have some questions, marked with XXX. Cc: linux-ext4@vger.kernel.org Signed-off-by: Nick Piggin <npiggin@suse.de> --- fs/ext2/ext2.h | 1 fs/ext2/file.c | 2 fs/ext2/inode.c | 138 +++++++++++++++++++++++++++++++++++++++++++++----------- 3 files changed, 113 insertions(+), 28 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html