Patchwork Question on fallocate/ftruncate sequence

login
register
mail settings
Submitter Jiaying Zhang
Date Aug. 28, 2009, 6:42 p.m.
Message ID <5df78e1d0908281142r683b902ube65288df858695d@mail.gmail.com>
Download mbox | patch
Permalink /patch/32401/
State New
Headers show

Comments

Jiaying Zhang - Aug. 28, 2009, 6:42 p.m.
Sorry for joining the conversation late. Frank and I had a discussion on this
problem this morning. We wonder whether we can just add the checking
on whether i_blocks is consistent with i_size during truncate. Here is the
patch I tried and it seems to have solved the problem. I.e., the space reserved
in fallocate(KEEP_SIZE) is now freed in the next truncate.


One thing I am not sure is whether adding this check in inode_setattr may
cause any problem in other cases. I saw inode_setattr is called at many
places as well as during ftruncate. Any opinions on this proposed solution?

Jiaying

On Thu, Jul 23, 2009 at 3:46 PM, Frank Mayhar<fmayhar@google.com> wrote:
> On Thu, 2009-07-23 at 15:56 -0600, Andreas Dilger wrote:
>> On Jul 23, 2009  11:05 -0700, Frank Mayhar wrote:
>> > On Thu, 2009-07-23 at 12:00 -0500, Eric Sandeen wrote:
>> > > Sorry I skimmed to fast, skipped over the fsck part.  But:
>> > >
>> > > # touch /mnt/test/testfile
>> > > # /root/fallocate -n -l 16m /mnt/test/testfile
>> > > # ls -l /mnt/test/testfile
>> > > -rw-r--r-- 1 root root 0 Jul 23 12:13 /mnt/test/testfile
>> > > # du -h /mnt/test/testfile
>> > > 16M       /mnt/test/testfile
>> > >
>> > > there doesn't seem to be a problem in fsck w/ block past EOF, or am I
>> > > missing something else?
>> >
>> > I was taking Andreas' word for it but now that you mention it, I see the
>> > same thing.  Andreas, did you have a specific case in mind?
>>
>> Ted and I had discussed this in the past, maybe he fixed e2fsck to not
>> change the file size when there are blocks allocated beyond EOF.  Having
>> a flag wouldn't be a terrible idea, IMHO, so that e2fsck can make a
>> better decision on whether the size or the blocks count are more correct.
>> I'm not dead set on it.
>
> For the moment I'm going to table the e2fsck change and make the flag
> memory-only.  It'll be easy enough to change this if and when you guys
> come to an agreement about what is right.
>
> As for the flag itself, I'll pick a bit that doesn't conflict with
> anything else and leave reconciling the already-conflicting bits to you
> guys.
> --
> Frank Mayhar <fmayhar@google.com>
> Google, Inc.
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger - Aug. 28, 2009, 7:40 p.m.
On Aug 28, 2009  11:42 -0700, Jiaying Zhang wrote:
> Sorry for joining the conversation late. Frank and I had a discussion on this
> problem this morning. We wonder whether we can just add the checking
> on whether i_blocks is consistent with i_size during truncate. Here is the
> patch I tried and it seems to have solved the problem. I.e., the space
> reserved in fallocate(KEEP_SIZE) is now freed in the next truncate.
> 
> --- git-linux/fs/attr.c	2009-05-20 18:05:55.000000000 -0700
> +++ linux-2.6.30.5/fs/attr.c	2009-08-27 14:34:48.000000000 -0700
> @@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode,
>  	unsigned int ia_valid = attr->ia_valid;
> 
>  	if (ia_valid & ATTR_SIZE &&
> -	    attr->ia_size != i_size_read(inode)) {
> +	    (attr->ia_size != i_size_read(inode) ||
> +            attr->ia_size >> 9 < inode->i_blocks - 1)) {
>  		int error = vmtruncate(inode, attr->ia_size);
>  		if (error)
>  			return error;

This isn't really correct, however, because i_blocks also contains
non-data blocks (indirect/index, EA, etc) blocks, so even with small
files with ACLs i_blocks may always be larger than ia_size >> 9, and
for ext2/3 at least this will ALWAYS be true for files > 48kB in size.

> On Thu, Jul 23, 2009 at 3:46 PM, Frank Mayhar<fmayhar@google.com> wrote:
> > On Thu, 2009-07-23 at 15:56 -0600, Andreas Dilger wrote:
> >> On Jul 23, 2009  11:05 -0700, Frank Mayhar wrote:
> >> > On Thu, 2009-07-23 at 12:00 -0500, Eric Sandeen wrote:
> >> > > Sorry I skimmed to fast, skipped over the fsck part.  But:
> >> > >
> >> > > # touch /mnt/test/testfile
> >> > > # /root/fallocate -n -l 16m /mnt/test/testfile
> >> > > # ls -l /mnt/test/testfile
> >> > > -rw-r--r-- 1 root root 0 Jul 23 12:13 /mnt/test/testfile
> >> > > # du -h /mnt/test/testfile
> >> > > 16M       /mnt/test/testfile
> >> > >
> >> > > there doesn't seem to be a problem in fsck w/ block past EOF, or am I
> >> > > missing something else?
> >> >
> >> > I was taking Andreas' word for it but now that you mention it, I see the
> >> > same thing.  Andreas, did you have a specific case in mind?
> >>
> >> Ted and I had discussed this in the past, maybe he fixed e2fsck to not
> >> change the file size when there are blocks allocated beyond EOF.  Having
> >> a flag wouldn't be a terrible idea, IMHO, so that e2fsck can make a
> >> better decision on whether the size or the blocks count are more correct.
> >> I'm not dead set on it.
> >
> > For the moment I'm going to table the e2fsck change and make the flag
> > memory-only.  It'll be easy enough to change this if and when you guys
> > come to an agreement about what is right.
> >
> > As for the flag itself, I'll pick a bit that doesn't conflict with
> > anything else and leave reconciling the already-conflicting bits to you
> > guys.
> > --
> > Frank Mayhar <fmayhar@google.com>
> > Google, Inc.
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiaying Zhang - Aug. 28, 2009, 9:44 p.m.
On Fri, Aug 28, 2009 at 12:40 PM, Andreas Dilger<adilger@sun.com> wrote:
> On Aug 28, 2009  11:42 -0700, Jiaying Zhang wrote:
>> Sorry for joining the conversation late. Frank and I had a discussion on this
>> problem this morning. We wonder whether we can just add the checking
>> on whether i_blocks is consistent with i_size during truncate. Here is the
>> patch I tried and it seems to have solved the problem. I.e., the space
>> reserved in fallocate(KEEP_SIZE) is now freed in the next truncate.
>>
>> --- git-linux/fs/attr.c       2009-05-20 18:05:55.000000000 -0700
>> +++ linux-2.6.30.5/fs/attr.c  2009-08-27 14:34:48.000000000 -0700
>> @@ -68,7 +68,8 @@ int inode_setattr(struct inode * inode,
>>       unsigned int ia_valid = attr->ia_valid;
>>
>>       if (ia_valid & ATTR_SIZE &&
>> -         attr->ia_size != i_size_read(inode)) {
>> +         (attr->ia_size != i_size_read(inode) ||
>> +            attr->ia_size >> 9 < inode->i_blocks - 1)) {
>>               int error = vmtruncate(inode, attr->ia_size);
>>               if (error)
>>                       return error;
>
> This isn't really correct, however, because i_blocks also contains
> non-data blocks (indirect/index, EA, etc) blocks, so even with small
> files with ACLs i_blocks may always be larger than ia_size >> 9, and
> for ext2/3 at least this will ALWAYS be true for files > 48kB in size.

I see. I guess we need to use a special flag then. Or is there any
other suggestions? I also have another question related to this
problem. Why those fallocated blocks are not marked as preallocated
blocks that will then be automatically freed in ext4_release_file?

Jiaying

>
>> On Thu, Jul 23, 2009 at 3:46 PM, Frank Mayhar<fmayhar@google.com> wrote:
>> > On Thu, 2009-07-23 at 15:56 -0600, Andreas Dilger wrote:
>> >> On Jul 23, 2009  11:05 -0700, Frank Mayhar wrote:
>> >> > On Thu, 2009-07-23 at 12:00 -0500, Eric Sandeen wrote:
>> >> > > Sorry I skimmed to fast, skipped over the fsck part.  But:
>> >> > >
>> >> > > # touch /mnt/test/testfile
>> >> > > # /root/fallocate -n -l 16m /mnt/test/testfile
>> >> > > # ls -l /mnt/test/testfile
>> >> > > -rw-r--r-- 1 root root 0 Jul 23 12:13 /mnt/test/testfile
>> >> > > # du -h /mnt/test/testfile
>> >> > > 16M       /mnt/test/testfile
>> >> > >
>> >> > > there doesn't seem to be a problem in fsck w/ block past EOF, or am I
>> >> > > missing something else?
>> >> >
>> >> > I was taking Andreas' word for it but now that you mention it, I see the
>> >> > same thing.  Andreas, did you have a specific case in mind?
>> >>
>> >> Ted and I had discussed this in the past, maybe he fixed e2fsck to not
>> >> change the file size when there are blocks allocated beyond EOF.  Having
>> >> a flag wouldn't be a terrible idea, IMHO, so that e2fsck can make a
>> >> better decision on whether the size or the blocks count are more correct.
>> >> I'm not dead set on it.
>> >
>> > For the moment I'm going to table the e2fsck change and make the flag
>> > memory-only.  It'll be easy enough to change this if and when you guys
>> > come to an agreement about what is right.
>> >
>> > As for the flag itself, I'll pick a bit that doesn't conflict with
>> > anything else and leave reconciling the already-conflicting bits to you
>> > guys.
>> > --
>> > Frank Mayhar <fmayhar@google.com>
>> > Google, Inc.
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>
> Cheers, Andreas
> --
> Andreas Dilger
> Sr. Staff Engineer, Lustre Group
> Sun Microsystems of Canada, Inc.
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger - Aug. 28, 2009, 10:14 p.m.
On Aug 28, 2009  14:44 -0700, Jiaying Zhang wrote:
> On Fri, Aug 28, 2009 at 12:40 PM, Andreas Dilger<adilger@sun.com> wrote:
> > This isn't really correct, however, because i_blocks also contains
> > non-data blocks (indirect/index, EA, etc) blocks, so even with small
> > files with ACLs i_blocks may always be larger than ia_size >> 9, and
> > for ext2/3 at least this will ALWAYS be true for files > 48kB in size.
> 
> I see. I guess we need to use a special flag then. Or is there any
> other suggestions? I also have another question related to this
> problem. Why those fallocated blocks are not marked as preallocated
> blocks that will then be automatically freed in ext4_release_file?

Because fallocate() means "persistent allocation on disk", not "in memory
preallocation".  The "in memory" preallocation already happens in ext4,
and it is released when the inode is cleaned up.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

--- git-linux/fs/attr.c	2009-05-20 18:05:55.000000000 -0700
+++ linux-2.6.30.5/fs/attr.c	2009-08-27 14:34:48.000000000 -0700
@@ -68,7 +68,8 @@  int inode_setattr(struct inode * inode,
 	unsigned int ia_valid = attr->ia_valid;

 	if (ia_valid & ATTR_SIZE &&
-	    attr->ia_size != i_size_read(inode)) {
+	    (attr->ia_size != i_size_read(inode) ||
+            attr->ia_size >> 9 < inode->i_blocks - 1)) {
 		int error = vmtruncate(inode, attr->ia_size);
 		if (error)
 			return error;