Message ID | 20110517225926.8B4A94225B@ruihe.smo.corp.google.com |
---|---|
State | Superseded, archived |
Headers | show |
On Wed, May 18, 2011 at 6:59 AM, Jiaying Zhang <jiayingz@google.com> wrote: > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > flag and then ftruncate the file to a size larger than the file's i_size, > any allocated but unwritten blocks will be freed but the file size is set > to the size that ftruncate specifies. > > Here is a simple test to reproduce the problem: > 1. fallocate a 12k size file with KEEP_SIZE flag > 2. write the first 4k > 3. ftruncate the file to 8k > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > shows the file has only the first written block left. > > Below is the proposed patch to fix the bug: > > ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). > > Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead > of ext4_truncate(inode) when it needs to truncate an inode so that > if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate > to a size larger than the inode's i_size, we will only truncate the blocks > beyond the specified truncate size instead of all of blocks beyond i_size. > > Signed-off-by: Jiaying Zhang <jiayingz@google.com> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 3424e82..3bfad57 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > } > } > /* ext4_truncate will clear the flag */ > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > - ext4_truncate(inode); > + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { > + rc = vmtruncate(inode, attr->ia_size); > + if (rc) > + goto err_out; > + } Hi there, It seems that the error handling has problems. 1)We should handle error as the below code does. or 2) we can add a OR condition to the if statement below so that it can handle case here. I prefer the 2nd solution. it looks like: - /* ext4_truncate will clear the flag */ - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) - ext4_truncate(inode); - if ((attr->ia_valid & ATTR_SIZE) && - attr->ia_size != i_size_read(inode)) + if (((attr->ia_valid & ATTR_SIZE) && + attr->ia_size != i_size_read(inode)) || + ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS)) rc = vmtruncate(inode, attr->ia_size); Yongqiang. > } > > if ((attr->ia_valid & ATTR_SIZE) && > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
On Wed, May 18, 2011 at 6:59 AM, Jiaying Zhang <jiayingz@google.com> wrote: > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > flag and then ftruncate the file to a size larger than the file's i_size, > any allocated but unwritten blocks will be freed but the file size is set > to the size that ftruncate specifies. > > Here is a simple test to reproduce the problem: > 1. fallocate a 12k size file with KEEP_SIZE flag > 2. write the first 4k > 3. ftruncate the file to 8k > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > shows the file has only the first written block left. > > Below is the proposed patch to fix the bug: > > ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). > > Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead > of ext4_truncate(inode) when it needs to truncate an inode so that > if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate > to a size larger than the inode's i_size, we will only truncate the blocks > beyond the specified truncate size instead of all of blocks beyond i_size. > > Signed-off-by: Jiaying Zhang <jiayingz@google.com> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 3424e82..3bfad57 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > } > } > /* ext4_truncate will clear the flag */ > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > - ext4_truncate(inode); > + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { > + rc = vmtruncate(inode, attr->ia_size); > + if (rc) > + goto err_out; > + } > } Another question, does vmtruncate() write zeros to blocks beyond old EOF? I had a rough look and could not find the code doing this, maybe due to my density. Yongqiang. > > if ((attr->ia_valid & ATTR_SIZE) && > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
On 5/17/11 5:59 PM, Jiaying Zhang wrote: > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > flag and then ftruncate the file to a size larger than the file's i_size, > any allocated but unwritten blocks will be freed but the file size is set > to the size that ftruncate specifies. > > Here is a simple test to reproduce the problem: > 1. fallocate a 12k size file with KEEP_SIZE flag > 2. write the first 4k > 3. ftruncate the file to 8k > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > shows the file has only the first written block left. To be honest I'm not 100% certain what the fiesystem -should- do in this case. If I go through that same sequence on xfs, I get 4k written / 8k unwritten: # xfs_bmap -vp testfile testfile: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 size 8k: # ls -l testfile -rw-r--r-- 1 root root 8192 May 17 22:33 testfile and diskspace used 12k: # du -hc testfile 12K testfile 12K total I think this is a different result from ext4, either with or without your patch. On ext4 I get size 8k, but only the first 4k mapped, as you say. I don't recall when truncate is supposed to free fallocated blocks, and from what point? -Eric > Below is the proposed patch to fix the bug: > > ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). > > Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead > of ext4_truncate(inode) when it needs to truncate an inode so that > if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate > to a size larger than the inode's i_size, we will only truncate the blocks > beyond the specified truncate size instead of all of blocks beyond i_size. > > Signed-off-by: Jiaying Zhang <jiayingz@google.com> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 3424e82..3bfad57 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > } > } > /* ext4_truncate will clear the flag */ > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > - ext4_truncate(inode); > + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { > + rc = vmtruncate(inode, attr->ia_size); > + if (rc) > + goto err_out; > + } > } > > if ((attr->ia_valid & ATTR_SIZE) && > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 18, 2011 at 10:56 AM, Yongqiang Yang <xiaoqiangnk@gmail.com> wrote: > On Wed, May 18, 2011 at 6:59 AM, Jiaying Zhang <jiayingz@google.com> wrote: >> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >> flag and then ftruncate the file to a size larger than the file's i_size, >> any allocated but unwritten blocks will be freed but the file size is set >> to the size that ftruncate specifies. >> >> Here is a simple test to reproduce the problem: >> 1. fallocate a 12k size file with KEEP_SIZE flag >> 2. write the first 4k >> 3. ftruncate the file to 8k >> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >> shows the file has only the first written block left. >> >> Below is the proposed patch to fix the bug: >> >> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >> >> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >> of ext4_truncate(inode) when it needs to truncate an inode so that >> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >> to a size larger than the inode's i_size, we will only truncate the blocks >> beyond the specified truncate size instead of all of blocks beyond i_size. >> >> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index 3424e82..3bfad57 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >> } >> } >> /* ext4_truncate will clear the flag */ >> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >> - ext4_truncate(inode); >> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >> + rc = vmtruncate(inode, attr->ia_size); >> + if (rc) >> + goto err_out; >> + } >> } > Another question, does vmtruncate() write zeros to blocks beyond old > EOF? I had a rough look and could not find the code doing this, maybe > due to my density. Sorry, ignore this, the extent is unwritten, no need to zero. > > Yongqiang. >> >> if ((attr->ia_valid & ATTR_SIZE) && >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > > -- > Best Wishes > Yongqiang Yang >
On May 17, 2011, at 21:19, Eric Sandeen wrote: > On 5/17/11 5:59 PM, Jiaying Zhang wrote: >> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >> flag and then ftruncate the file to a size larger than the file's i_size, >> any allocated but unwritten blocks will be freed but the file size is set >> to the size that ftruncate specifies. >> >> Here is a simple test to reproduce the problem: >> 1. fallocate a 12k size file with KEEP_SIZE flag >> 2. write the first 4k >> 3. ftruncate the file to 8k >> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >> shows the file has only the first written block left. > > To be honest I'm not 100% certain what the fiesystem -should- do in this case. I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space. > If I go through that same sequence on xfs, I get 4k written / 8k unwritten: > > # xfs_bmap -vp testfile > testfile: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 > 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 > > size 8k: > # ls -l testfile > -rw-r--r-- 1 root root 8192 May 17 22:33 testfile > > and diskspace used 12k: > # du -hc testfile > 12K testfile > 12K total > > I think this is a different result from ext4, either with or without your patch. > > On ext4 I get size 8k, but only the first 4k mapped, as you say. > > I don't recall when truncate is supposed to free fallocated blocks, and from what point? > > -Eric > >> Below is the proposed patch to fix the bug: >> >> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >> >> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >> of ext4_truncate(inode) when it needs to truncate an inode so that >> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >> to a size larger than the inode's i_size, we will only truncate the blocks >> beyond the specified truncate size instead of all of blocks beyond i_size. >> >> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >> >> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >> index 3424e82..3bfad57 100644 >> --- a/fs/ext4/inode.c >> +++ b/fs/ext4/inode.c >> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >> } >> } >> /* ext4_truncate will clear the flag */ >> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >> - ext4_truncate(inode); >> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >> + rc = vmtruncate(inode, attr->ia_size); >> + if (rc) >> + goto err_out; >> + } >> } >> >> if ((attr->ia_valid & ATTR_SIZE) && >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 17, 2011 at 10:19:05PM -0500, Eric Sandeen wrote: > On 5/17/11 5:59 PM, Jiaying Zhang wrote: > > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > > flag and then ftruncate the file to a size larger than the file's i_size, > > any allocated but unwritten blocks will be freed but the file size is set > > to the size that ftruncate specifies. > > > > Here is a simple test to reproduce the problem: > > 1. fallocate a 12k size file with KEEP_SIZE flag > > 2. write the first 4k > > 3. ftruncate the file to 8k > > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > > shows the file has only the first written block left. > > To be honest I'm not 100% certain what the fiesystem -should- do in this case. > > If I go through that same sequence on xfs, I get 4k written / 8k unwritten: > > # xfs_bmap -vp testfile > testfile: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 > 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 Ok, so that's the case for a _truncate up_ from 4k to 8k: $ rm /mnt/test/foo $ xfs_io -f -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo fd.path = "/mnt/test/foo" fd.flags = non-sync,non-direct,read-write stat.ino = 71 stat.type = regular file stat.size = 0 stat.blocks = 24 fsxattr.xflags = 0x2 [-p------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.nextents = 1 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..23]: 9712..9735 0 (9712..9735) 24 10000 wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0000 sec (156 MiB/sec and 40000.0000 ops/sec) /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000 /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000 fd.path = "/mnt/test/foo" fd.flags = non-sync,non-direct,read-write stat.ino = 71 stat.type = regular file stat.size = 8192 stat.blocks = 24 fsxattr.xflags = 0x2 [-p------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.nextents = 2 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 But you get a different result on truncate down: $rm /mnt/test/foo $ xfs_io -f -c "truncate 12k" -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo fd.path = "/mnt/test/foo" fd.flags = non-sync,non-direct,read-write stat.ino = 71 stat.type = regular file stat.size = 12288 stat.blocks = 24 fsxattr.xflags = 0x2 [-p------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.nextents = 1 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..23]: 9584..9607 0 (9584..9607) 24 10000 wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0000 sec (217.014 MiB/sec and 55555.5556 ops/sec) /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000 1: [8..23]: 9592..9607 0 (9592..9607) 16 10000 /mnt/test/foo: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000 1: [8..15]: 9592..9599 0 (9592..9599) 8 10000 fd.path = "/mnt/test/foo" fd.flags = non-sync,non-direct,read-write stat.ino = 71 stat.type = regular file stat.size = 8192 stat.blocks = 16 fsxattr.xflags = 0x2 [-p------------] fsxattr.projid = 0 fsxattr.extsize = 0 fsxattr.nextents = 2 fsxattr.naextents = 0 dioattr.mem = 0x200 dioattr.miniosz = 512 dioattr.maxiosz = 2147483136 IOWs, on XFS a truncate up does not change the preallocation at all, while a truncate down will _always_ remove preallocation beyond the new EOF. It's always had this behaviour w.r.t. to truncate(2) and preallocation beyond EOF. > I think this is a different result from ext4, either with or without your patch. > > On ext4 I get size 8k, but only the first 4k mapped, as you say. > > I don't recall when truncate is supposed to free fallocated blocks, and from what point? It's entirely up to the filesystem how it treats blocks beyond EOF during truncation. XFS frees them on truncate down, because it is much safer to just truncate away everything beyond the new EOF than to leave written extents beyond EOF as potential landmines. Indeed, that's why calling vmtruncate() as a bad fix. If you have: NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU ....----+----------+--------+--------+ A B C D Where A = new EOF (N) A->B = unwritten (U) B->C = written (W) C = old EOF (O) C->D = unwritten (U) Then just calling vmtruncate() will leave the blocks in the range B->C as written blocks. Hence then doing an extending truncate back out to D will expose stale data rather than zeros in the range B->C.... Cheers, Dave.
On 5/18/11 1:13 AM, Dave Chinner wrote: > It's entirely up to the filesystem how it treats blocks beyond EOF > during truncation. XFS frees them on truncate down, because it is > much safer to just truncate away everything beyond the new EOF than > to leave written extents beyond EOF as potential landmines. > > Indeed, that's why calling vmtruncate() as a bad fix. If you have: > > > NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU > ....----+----------+--------+--------+ > A B C D > > Where A = new EOF (N) > A->B = unwritten (U) > B->C = written (W) > C = old EOF (O) > C->D = unwritten (U) > > Then just calling vmtruncate() will leave the blocks in the range > B->C as written blocks. Hence then doing an extending truncate back > out to D will expose stale data rather than zeros in the range > B->C.... Hm, running recent xfstests which includes fsx-with-fallocate should probably eventually catch that then. -Eric > Cheers, > > Dave. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 17, 2011 at 8:19 PM, Eric Sandeen <sandeen@redhat.com> wrote: > > On 5/17/11 5:59 PM, Jiaying Zhang wrote: > > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > > flag and then ftruncate the file to a size larger than the file's i_size, > > any allocated but unwritten blocks will be freed but the file size is set > > to the size that ftruncate specifies. > > > > Here is a simple test to reproduce the problem: > > 1. fallocate a 12k size file with KEEP_SIZE flag > > 2. write the first 4k > > 3. ftruncate the file to 8k > > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > > shows the file has only the first written block left. > > To be honest I'm not 100% certain what the fiesystem -should- do in this case. > > If I go through that same sequence on xfs, I get 4k written / 8k unwritten: > > # xfs_bmap -vp testfile > testfile: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 > 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 > > size 8k: > # ls -l testfile > -rw-r--r-- 1 root root 8192 May 17 22:33 testfile > > and diskspace used 12k: > # du -hc testfile > 12K testfile > 12K total > > I think this is a different result from ext4, either with or without your patch. > > On ext4 I get size 8k, but only the first 4k mapped, as you say. I agree that truncating to a size larger than i_size is un-specified by POSIX. However, I think the problem with the current behavior is that we have an inconsistency between file's i_size and its extent tree. Now we have 8k i_size but the file has only 4k space allocated. That can confuse applications. Jiaying > > I don't recall when truncate is supposed to free fallocated blocks, and from what point? > > -Eric > > > Below is the proposed patch to fix the bug: > > > > ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). > > > > Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead > > of ext4_truncate(inode) when it needs to truncate an inode so that > > if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate > > to a size larger than the inode's i_size, we will only truncate the blocks > > beyond the specified truncate size instead of all of blocks beyond i_size. > > > > Signed-off-by: Jiaying Zhang <jiayingz@google.com> > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > > index 3424e82..3bfad57 100644 > > --- a/fs/ext4/inode.c > > +++ b/fs/ext4/inode.c > > @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) > > } > > } > > /* ext4_truncate will clear the flag */ > > - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) > > - ext4_truncate(inode); > > + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { > > + rc = vmtruncate(inode, attr->ia_size); > > + if (rc) > > + goto err_out; > > + } > > } > > > > if ((attr->ia_valid & ATTR_SIZE) && > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 5/18/11 3:28 PM, Jiaying Zhang wrote: > > > On Tue, May 17, 2011 at 8:19 PM, Eric Sandeen <sandeen@redhat.com <mailto:sandeen@redhat.com>> wrote: > > On 5/17/11 5:59 PM, Jiaying Zhang wrote: > > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks > > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE > > flag and then ftruncate the file to a size larger than the file's i_size, > > any allocated but unwritten blocks will be freed but the file size is set > > to the size that ftruncate specifies. > > > > Here is a simple test to reproduce the problem: > > 1. fallocate a 12k size file with KEEP_SIZE flag > > 2. write the first 4k > > 3. ftruncate the file to 8k > > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs > > shows the file has only the first written block left. > > To be honest I'm not 100% certain what the fiesystem -should- do in this case. > > If I go through that same sequence on xfs, I get 4k written / 8k unwritten: > > # xfs_bmap -vp testfile > testfile: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 > 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 > > size 8k: > # ls -l testfile > -rw-r--r-- 1 root root 8192 May 17 22:33 testfile > > and diskspace used 12k: > # du -hc testfile > 12K testfile > 12K total > > I think this is a different result from ext4, either with or without your patch. > > On ext4 I get size 8k, but only the first 4k mapped, as you say. > > I agree that truncating to a size larger than i_size is un-specified by > POSIX. However, I think the problem with the current behavior is that > we have an inconsistency between file's i_size and its extent tree. > Now we have 8k i_size but the file has only 4k space allocated. That > can confuse applications. That's called "a sparse file" right? Apps should not be confused by that ... -Eric -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 17, 2011 at 10:35 PM, Andreas Dilger <adilger@dilger.ca> wrote: > On May 17, 2011, at 21:19, Eric Sandeen wrote: >> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >>> flag and then ftruncate the file to a size larger than the file's i_size, >>> any allocated but unwritten blocks will be freed but the file size is set >>> to the size that ftruncate specifies. >>> >>> Here is a simple test to reproduce the problem: >>> 1. fallocate a 12k size file with KEEP_SIZE flag >>> 2. write the first 4k >>> 3. ftruncate the file to 8k >>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >>> shows the file has only the first written block left. >> >> To be honest I'm not 100% certain what the fiesystem -should- do in this case. > > I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space. > Indeed we have applications that are doing exactly the same as you described. They always fallocate files to a pre-defined size with KEEP_SIZE flag and if they end up using less than the allocated size, they ftruncate files to their written size later. Jiaying >> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >> >> # xfs_bmap -vp testfile >> testfile: >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 >> >> size 8k: >> # ls -l testfile >> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile >> >> and diskspace used 12k: >> # du -hc testfile >> 12K testfile >> 12K total >> >> I think this is a different result from ext4, either with or without your patch. >> >> On ext4 I get size 8k, but only the first 4k mapped, as you say. >> >> I don't recall when truncate is supposed to free fallocated blocks, and from what point? >> >> -Eric >> >>> Below is the proposed patch to fix the bug: >>> >>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >>> >>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >>> of ext4_truncate(inode) when it needs to truncate an inode so that >>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >>> to a size larger than the inode's i_size, we will only truncate the blocks >>> beyond the specified truncate size instead of all of blocks beyond i_size. >>> >>> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >>> >>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>> index 3424e82..3bfad57 100644 >>> --- a/fs/ext4/inode.c >>> +++ b/fs/ext4/inode.c >>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >>> } >>> } >>> /* ext4_truncate will clear the flag */ >>> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >>> - ext4_truncate(inode); >>> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >>> + rc = vmtruncate(inode, attr->ia_size); >>> + if (rc) >>> + goto err_out; >>> + } >>> } >>> >>> if ((attr->ia_valid & ATTR_SIZE) && >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Cheers, Andreas > > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 18, 2011 at 1:31 PM, Eric Sandeen <sandeen@redhat.com> wrote: > On 5/18/11 3:28 PM, Jiaying Zhang wrote: >> >> >> On Tue, May 17, 2011 at 8:19 PM, Eric Sandeen <sandeen@redhat.com <mailto:sandeen@redhat.com>> wrote: >> >> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >> > flag and then ftruncate the file to a size larger than the file's i_size, >> > any allocated but unwritten blocks will be freed but the file size is set >> > to the size that ftruncate specifies. >> > >> > Here is a simple test to reproduce the problem: >> > 1. fallocate a 12k size file with KEEP_SIZE flag >> > 2. write the first 4k >> > 3. ftruncate the file to 8k >> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >> > shows the file has only the first written block left. >> >> To be honest I'm not 100% certain what the fiesystem -should- do in this case. >> >> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >> >> # xfs_bmap -vp testfile >> testfile: >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 >> >> size 8k: >> # ls -l testfile >> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile >> >> and diskspace used 12k: >> # du -hc testfile >> 12K testfile >> 12K total >> >> I think this is a different result from ext4, either with or without your patch. >> >> On ext4 I get size 8k, but only the first 4k mapped, as you say. >> >> I agree that truncating to a size larger than i_size is un-specified by >> POSIX. However, I think the problem with the current behavior is that >> we have an inconsistency between file's i_size and its extent tree. >> Now we have 8k i_size but the file has only 4k space allocated. That >> can confuse applications. > > That's called "a sparse file" right? Apps should not be confused by that ... But applications don't intend to create a sparse file in this case. They may create a lot of such files in this way with 'df' showing there is still plenty of free space available but as they start writing to these files they will fill up space and get ENOSPC unexpectedly. Jiaying > > -Eric > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, May 17, 2011 at 11:13 PM, Dave Chinner <david@fromorbit.com> wrote: > On Tue, May 17, 2011 at 10:19:05PM -0500, Eric Sandeen wrote: >> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >> > There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >> > intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >> > flag and then ftruncate the file to a size larger than the file's i_size, >> > any allocated but unwritten blocks will be freed but the file size is set >> > to the size that ftruncate specifies. >> > >> > Here is a simple test to reproduce the problem: >> > 1. fallocate a 12k size file with KEEP_SIZE flag >> > 2. write the first 4k >> > 3. ftruncate the file to 8k >> > Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >> > shows the file has only the first written block left. >> >> To be honest I'm not 100% certain what the fiesystem -should- do in this case. >> >> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >> >> # xfs_bmap -vp testfile >> testfile: >> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 > > Ok, so that's the case for a _truncate up_ from 4k to 8k: > > $ rm /mnt/test/foo > $ xfs_io -f -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo > fd.path = "/mnt/test/foo" > fd.flags = non-sync,non-direct,read-write > stat.ino = 71 > stat.type = regular file > stat.size = 0 > stat.blocks = 24 > fsxattr.xflags = 0x2 [-p------------] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.nextents = 1 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..23]: 9712..9735 0 (9712..9735) 24 10000 > wrote 4096/4096 bytes at offset 0 > 4 KiB, 1 ops; 0.0000 sec (156 MiB/sec and 40000.0000 ops/sec) > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000 > 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000 > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 9712..9719 0 (9712..9719) 8 00000 > 1: [8..23]: 9720..9735 0 (9720..9735) 16 10000 > fd.path = "/mnt/test/foo" > fd.flags = non-sync,non-direct,read-write > stat.ino = 71 > stat.type = regular file > stat.size = 8192 > stat.blocks = 24 > fsxattr.xflags = 0x2 [-p------------] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.nextents = 2 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > > But you get a different result on truncate down: > > $rm /mnt/test/foo > $ xfs_io -f -c "truncate 12k" -c "resvsp 0 12k" -c stat -c "bmap -vp" -c "pwrite 0 4k" -c "fsync" -c "bmap -vp" -c "t 8k" -c "bmap -vp" -c stat /mnt/test/foo > fd.path = "/mnt/test/foo" > fd.flags = non-sync,non-direct,read-write > stat.ino = 71 > stat.type = regular file > stat.size = 12288 > stat.blocks = 24 > fsxattr.xflags = 0x2 [-p------------] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.nextents = 1 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..23]: 9584..9607 0 (9584..9607) 24 10000 > wrote 4096/4096 bytes at offset 0 > 4 KiB, 1 ops; 0.0000 sec (217.014 MiB/sec and 55555.5556 ops/sec) > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000 > 1: [8..23]: 9592..9607 0 (9592..9607) 16 10000 > /mnt/test/foo: > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS > 0: [0..7]: 9584..9591 0 (9584..9591) 8 00000 > 1: [8..15]: 9592..9599 0 (9592..9599) 8 10000 > fd.path = "/mnt/test/foo" > fd.flags = non-sync,non-direct,read-write > stat.ino = 71 > stat.type = regular file > stat.size = 8192 > stat.blocks = 16 > fsxattr.xflags = 0x2 [-p------------] > fsxattr.projid = 0 > fsxattr.extsize = 0 > fsxattr.nextents = 2 > fsxattr.naextents = 0 > dioattr.mem = 0x200 > dioattr.miniosz = 512 > dioattr.maxiosz = 2147483136 > > IOWs, on XFS a truncate up does not change the preallocation at all, > while a truncate down will _always_ remove preallocation beyond the > new EOF. It's always had this behaviour w.r.t. to truncate(2) and > preallocation beyond EOF. > >> I think this is a different result from ext4, either with or without your patch. >> >> On ext4 I get size 8k, but only the first 4k mapped, as you say. >> >> I don't recall when truncate is supposed to free fallocated blocks, and from what point? > > It's entirely up to the filesystem how it treats blocks beyond EOF > during truncation. XFS frees them on truncate down, because it is > much safer to just truncate away everything beyond the new EOF than > to leave written extents beyond EOF as potential landmines. > > Indeed, that's why calling vmtruncate() as a bad fix. If you have: > > > NUUUUUUUUUUWWWWWWWWWOUUUUUUUUU > ....----+----------+--------+--------+ > A B C D > > Where A = new EOF (N) > A->B = unwritten (U) > B->C = written (W) > C = old EOF (O) > C->D = unwritten (U) > > Then just calling vmtruncate() will leave the blocks in the range > B->C as written blocks. Hence then doing an extending truncate back > out to D will expose stale data rather than zeros in the range > B->C.... Sorry I am a little confused. If I understand correctly, in the situation you described, we call a truncate that causes EOF to change from C to A. On ext4, we should free all of blocks after A. And when we do an extending truncate to D, any blocks beyond A should be treated as unwritten blocks so we should not expose any stale data, right? Jiaying > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On May 18, 2011, at 14:32, Jiaying Zhang wrote: > On Tue, May 17, 2011 at 10:35 PM, Andreas Dilger <adilger@dilger.ca> wrote: >> On May 17, 2011, at 21:19, Eric Sandeen wrote: >>> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >>>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >>>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >>>> flag and then ftruncate the file to a size larger than the file's i_size, >>>> any allocated but unwritten blocks will be freed but the file size is set >>>> to the size that ftruncate specifies. >>>> >>>> Here is a simple test to reproduce the problem: >>>> 1. fallocate a 12k size file with KEEP_SIZE flag >>>> 2. write the first 4k >>>> 3. ftruncate the file to 8k >>>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >>>> shows the file has only the first written block left. >>> >>> To be honest I'm not 100% certain what the fiesystem -should- do in this case. >> >> I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space. >> > > Indeed we have applications that are doing exactly the same as you > described. They always fallocate files to a pre-defined size with > KEEP_SIZE flag and if they end up using less than the allocated size, > they ftruncate files to their written size later. If XFS is already handling truncate-up and truncate-down differently, I don't mind keeping consistent behaviour with XFS. I had thought about this also, that truncate-up isn't intending to throw away space while truncate-down is. However, I didn't mention it in my email because I thought the semantics of having different behaviour for truncate-up vs. truncate-down was confusing. If XFS is already doing this, then it seems that this is at least somewhat expected by applications and/or is more efficient in the long run for the on-disk allocation. >>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >>> >>> # xfs_bmap -vp testfile >>> testfile: >>> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >>> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >>> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 >>> >>> size 8k: >>> # ls -l testfile >>> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile >>> >>> and diskspace used 12k: >>> # du -hc testfile >>> 12K testfile >>> 12K total >>> >>> I think this is a different result from ext4, either with or without your patch. >>> >>> On ext4 I get size 8k, but only the first 4k mapped, as you say. >>> >>> I don't recall when truncate is supposed to free fallocated blocks, and from what point? >>> >>> -Eric >>> >>>> Below is the proposed patch to fix the bug: >>>> >>>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >>>> >>>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >>>> of ext4_truncate(inode) when it needs to truncate an inode so that >>>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >>>> to a size larger than the inode's i_size, we will only truncate the blocks >>>> beyond the specified truncate size instead of all of blocks beyond i_size. >>>> >>>> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >>>> >>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>>> index 3424e82..3bfad57 100644 >>>> --- a/fs/ext4/inode.c >>>> +++ b/fs/ext4/inode.c >>>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >>>> } >>>> } >>>> /* ext4_truncate will clear the flag */ >>>> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >>>> - ext4_truncate(inode); >>>> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >>>> + rc = vmtruncate(inode, attr->ia_size); >>>> + if (rc) >>>> + goto err_out; >>>> + } >>>> } >>>> >>>> if ((attr->ia_valid & ATTR_SIZE) && >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> Cheers, Andreas >> >> >> >> >> >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html Cheers, Andreas -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 18, 2011 at 1:45 PM, Andreas Dilger <adilger@dilger.ca> wrote: > On May 18, 2011, at 14:32, Jiaying Zhang wrote: >> On Tue, May 17, 2011 at 10:35 PM, Andreas Dilger <adilger@dilger.ca> wrote: >>> On May 17, 2011, at 21:19, Eric Sandeen wrote: >>>> On 5/17/11 5:59 PM, Jiaying Zhang wrote: >>>>> There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks >>>>> intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE >>>>> flag and then ftruncate the file to a size larger than the file's i_size, >>>>> any allocated but unwritten blocks will be freed but the file size is set >>>>> to the size that ftruncate specifies. >>>>> >>>>> Here is a simple test to reproduce the problem: >>>>> 1. fallocate a 12k size file with KEEP_SIZE flag >>>>> 2. write the first 4k >>>>> 3. ftruncate the file to 8k >>>>> Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs >>>>> shows the file has only the first written block left. >>>> >>>> To be honest I'm not 100% certain what the fiesystem -should- do in this case. >>> >>> I think it makes sense from a usage POV to discard the blocks after EOF when a truncate is being done. For something like a PVR that is recording a show, but doesn't know the exact total size, it makes sense to fallocate() some estimated amount of space, and then when the show is finished recording it uses ftruncate() of the current size to drop the fallocated space. >>> >> >> Indeed we have applications that are doing exactly the same as you >> described. They always fallocate files to a pre-defined size with >> KEEP_SIZE flag and if they end up using less than the allocated size, >> they ftruncate files to their written size later. > > If XFS is already handling truncate-up and truncate-down differently, I don't mind keeping consistent behaviour with XFS. I had thought about this also, that truncate-up isn't intending to throw away space while truncate-down is. However, I didn't mention it in my email because I thought the semantics of having different behaviour for truncate-up vs. truncate-down was confusing. > > If XFS is already doing this, then it seems that this is at least somewhat expected by applications and/or is more efficient in the long run for the on-disk allocation. With my patch, it is still a little different from what xfs does in truncating-up case. As Eric mentioned, when an application fallocates 12k, writes 4k, and then truncates to 8k, on xfs there will be 12k allocated blocks left, but on ext4 with my patch there will be 8k allocated blocks left. The reason I think we may want to free any blocks beyond the truncate size is because there may be situations that applications are running out of space and want to shrink fallocated files to a smaller size that is still larger than the current i_size. Jiaying > >>>> If I go through that same sequence on xfs, I get 4k written / 8k unwritten: >>>> >>>> # xfs_bmap -vp testfile >>>> testfile: >>>> EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS >>>> 0: [0..7]: 2648750760..2648750767 3 (356066400..356066407) 8 00000 >>>> 1: [8..23]: 2648750768..2648750783 3 (356066408..356066423) 16 10000 >>>> >>>> size 8k: >>>> # ls -l testfile >>>> -rw-r--r-- 1 root root 8192 May 17 22:33 testfile >>>> >>>> and diskspace used 12k: >>>> # du -hc testfile >>>> 12K testfile >>>> 12K total >>>> >>>> I think this is a different result from ext4, either with or without your patch. >>>> >>>> On ext4 I get size 8k, but only the first 4k mapped, as you say. >>>> >>>> I don't recall when truncate is supposed to free fallocated blocks, and from what point? >>>> >>>> -Eric >>>> >>>>> Below is the proposed patch to fix the bug: >>>>> >>>>> ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). >>>>> >>>>> Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead >>>>> of ext4_truncate(inode) when it needs to truncate an inode so that >>>>> if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate >>>>> to a size larger than the inode's i_size, we will only truncate the blocks >>>>> beyond the specified truncate size instead of all of blocks beyond i_size. >>>>> >>>>> Signed-off-by: Jiaying Zhang <jiayingz@google.com> >>>>> >>>>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c >>>>> index 3424e82..3bfad57 100644 >>>>> --- a/fs/ext4/inode.c >>>>> +++ b/fs/ext4/inode.c >>>>> @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) >>>>> } >>>>> } >>>>> /* ext4_truncate will clear the flag */ >>>>> - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) >>>>> - ext4_truncate(inode); >>>>> + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { >>>>> + rc = vmtruncate(inode, attr->ia_size); >>>>> + if (rc) >>>>> + goto err_out; >>>>> + } >>>>> } >>>>> >>>>> if ((attr->ia_valid & ATTR_SIZE) && >>>>> -- >>>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>>>> the body of a message to majordomo@vger.kernel.org >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >>>> the body of a message to majordomo@vger.kernel.org >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >>> >>> Cheers, Andreas >>> >>> >>> >>> >>> >>> >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > Cheers, Andreas > > > > > > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 3424e82..3bfad57 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5347,8 +5347,11 @@ int ext4_setattr(struct dentry *dentry, struct iattr *attr) } } /* ext4_truncate will clear the flag */ - if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) - ext4_truncate(inode); + if ((ext4_test_inode_flag(inode, EXT4_INODE_EOFBLOCKS))) { + rc = vmtruncate(inode, attr->ia_size); + if (rc) + goto err_out; + } } if ((attr->ia_valid & ATTR_SIZE) &&
There is a bug in commit c8d46e41 "ext4: Add flag to files with blocks intentionally past EOF" that if we fallocate a file with FALLOC_FL_KEEP_SIZE flag and then ftruncate the file to a size larger than the file's i_size, any allocated but unwritten blocks will be freed but the file size is set to the size that ftruncate specifies. Here is a simple test to reproduce the problem: 1. fallocate a 12k size file with KEEP_SIZE flag 2. write the first 4k 3. ftruncate the file to 8k Then 'ls -l' shows that the i_size of the file becomes 8k but debugfs shows the file has only the first written block left. Below is the proposed patch to fix the bug: ext4: use vmtruncate() instead of ext4_truncate() in ext4_setattr(). Change ext4_setattr() to use vmtruncate(inode, attr->ia_size) instead of ext4_truncate(inode) when it needs to truncate an inode so that if the inode has EXT4_EOFBLOCKS_FL flag set and we are trying to truncate to a size larger than the inode's i_size, we will only truncate the blocks beyond the specified truncate size instead of all of blocks beyond i_size. Signed-off-by: Jiaying Zhang <jiayingz@google.com> -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html