diff mbox

[RFC] ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl

Message ID 1235451952-2726-5-git-send-email-tytso@mit.edu
State Accepted, archived
Headers show

Commit Message

Theodore Ts'o Feb. 24, 2009, 5:05 a.m. UTC
Add an ioctl which forces all of the delay allocated blocks to be
allocated.  This also provides a function ext4_alloc_da_blocks() which
will be used by the following commits to force files to be fully
allocated to preserve application-expected ext3 behaviour.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---
 fs/ext4/ext4.h  |    2 ++
 fs/ext4/inode.c |   42 ++++++++++++++++++++++++++++++++++++++++++
 fs/ext4/ioctl.c |   13 +++++++++++++
 3 files changed, 57 insertions(+), 0 deletions(-)

Comments

Aneesh Kumar K.V Feb. 24, 2009, 9:38 a.m. UTC | #1
On Tue, Feb 24, 2009 at 12:05:50AM -0500, Theodore Ts'o wrote:
> diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
> index 22dd29f..0d59f5f 100644
> --- a/fs/ext4/ioctl.c
> +++ b/fs/ext4/ioctl.c
> @@ -262,6 +262,19 @@ setversion_out:
>  		return err;
>  	}
> 
> +	case EXT4_IOC_ALLOC_DA_BLKS:
> +	{
> +		int err;
> +		if (!is_owner_or_cap(inode))
> +			return -EACCES;
> +
> +		err = mnt_want_write(filp->f_path.mnt);
> +		if (err)
> +			return err;
> +		err = ext4_alloc_da_blocks(inode);
> +		return err;
> +	}
> +

I guess we need a  matching
	mnt_drop_write();

>  	default:
>  		return -ENOTTY;
>  	}
> -- 
> 1.5.6.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andreas Dilger Feb. 24, 2009, 10:11 a.m. UTC | #2
On Feb 24, 2009  00:05 -0500, Theodore Ts'o wrote:
> +#define EXT4_IOC_ALLOC_DA_BLKS		_IO('f', 10)
>   /* note ioctl 11 reserved for filesystem-independent FIEMAP ioctl */

Please don't use ioctl number 10 - it was used by an early
version of the FIEMAP ioctl that we still have in use in Lustre
filesystems.  Not that there is much chance of a collision, but
it is also easily enough avoided.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Feb. 24, 2009, 1:14 p.m. UTC | #3
On Tue, Feb 24, 2009 at 03:08:32PM +0530, Aneesh Kumar K.V wrote:
> 
> I guess we need a  matching
> 	mnt_drop_write();
> 

Oops.  Nice catch, thanks.  I'll fixed it up in the ext4 patch queue.

       	    	   	    	       	     - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o Feb. 24, 2009, 1:16 p.m. UTC | #4
On Tue, Feb 24, 2009 at 03:11:05AM -0700, Andreas Dilger wrote:
> On Feb 24, 2009  00:05 -0500, Theodore Ts'o wrote:
> > +#define EXT4_IOC_ALLOC_DA_BLKS		_IO('f', 10)
> >   /* note ioctl 11 reserved for filesystem-independent FIEMAP ioctl */
> 
> Please don't use ioctl number 10 - it was used by an early
> version of the FIEMAP ioctl that we still have in use in Lustre
> filesystems.  Not that there is much chance of a collision, but
> it is also easily enough avoided.

Ok, I'll also add a note that ioctl number 10 should be reserved.

    	      	    	      	    	   - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen March 11, 2009, 10:41 p.m. UTC | #5
Theodore Ts'o wrote:
> Add an ioctl which forces all of the delay allocated blocks to be
> allocated.  This also provides a function ext4_alloc_da_blocks() which
> will be used by the following commits to force files to be fully
> allocated to preserve application-expected ext3 behaviour.
> 

Is it worth checking whether a) the file has delalloc blocks, and/or b)
whether the mapping is dirty before we spin off a filemap_flush?

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V March 12, 2009, 5:29 a.m. UTC | #6
On Wed, Mar 11, 2009 at 05:41:05PM -0500, Eric Sandeen wrote:
> Theodore Ts'o wrote:
> > Add an ioctl which forces all of the delay allocated blocks to be
> > allocated.  This also provides a function ext4_alloc_da_blocks() which
> > will be used by the following commits to force files to be fully
> > allocated to preserve application-expected ext3 behaviour.
> > 
> 
> Is it worth checking whether a) the file has delalloc blocks, and/or b)
> whether the mapping is dirty before we spin off a filemap_flush?
> 
+int ext4_alloc_da_blocks(struct inode *inode)
+{
+       if (!EXT4_I(inode)->i_reserved_data_blocks &&
+           !EXT4_I(inode)->i_reserved_meta_blocks)
+               return 0;

This check test does (a). Since the ioctl is to force allocation of
delayed allocated blocks i guess (a) is enough because we don't want to
cause a filemap_flush when we don't have any delayed allocated blocks
but have dirty pages around.

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen March 12, 2009, 4:32 p.m. UTC | #7
Aneesh Kumar K.V wrote:
> On Wed, Mar 11, 2009 at 05:41:05PM -0500, Eric Sandeen wrote:
>> Theodore Ts'o wrote:
>>> Add an ioctl which forces all of the delay allocated blocks to be
>>> allocated.  This also provides a function ext4_alloc_da_blocks() which
>>> will be used by the following commits to force files to be fully
>>> allocated to preserve application-expected ext3 behaviour.
>>>
>> Is it worth checking whether a) the file has delalloc blocks, and/or b)
>> whether the mapping is dirty before we spin off a filemap_flush?
>>
> +int ext4_alloc_da_blocks(struct inode *inode)
> +{
> +       if (!EXT4_I(inode)->i_reserved_data_blocks &&
> +           !EXT4_I(inode)->i_reserved_meta_blocks)
> +               return 0;
> 
> This check test does (a). Since the ioctl is to force allocation of
> delayed allocated blocks i guess (a) is enough because we don't want to
> cause a filemap_flush when we don't have any delayed allocated blocks
> but have dirty pages around.

and b) is as simple as

if (!mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY))
	return 0;

I think?

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o March 12, 2009, 8:04 p.m. UTC | #8
On Thu, Mar 12, 2009 at 11:32:18AM -0500, Eric Sandeen wrote:
> Aneesh Kumar K.V wrote:
> > On Wed, Mar 11, 2009 at 05:41:05PM -0500, Eric Sandeen wrote:
> >> Theodore Ts'o wrote:
> >>> Add an ioctl which forces all of the delay allocated blocks to be
> >>> allocated.  This also provides a function ext4_alloc_da_blocks() which
> >>> will be used by the following commits to force files to be fully
> >>> allocated to preserve application-expected ext3 behaviour.
> >>>
> >> Is it worth checking whether a) the file has delalloc blocks, and/or b)
> >> whether the mapping is dirty before we spin off a filemap_flush?
> >>
> > +int ext4_alloc_da_blocks(struct inode *inode)
> > +{
> > +       if (!EXT4_I(inode)->i_reserved_data_blocks &&
> > +           !EXT4_I(inode)->i_reserved_meta_blocks)
> > +               return 0;
> > 
> > This check test does (a). Since the ioctl is to force allocation of
> > delayed allocated blocks i guess (a) is enough because we don't want to
> > cause a filemap_flush when we don't have any delayed allocated blocks
> > but have dirty pages around.
> 
> and b) is as simple as
> 
> if (!mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY))
> 	return 0;

Yeah, but (b) isn't necessary; if there are some delayed allocation
blocks, by definition there must be some dirty pages, right?

	   	      	    	    	 - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen March 12, 2009, 8:05 p.m. UTC | #9
Theodore Tso wrote:
> On Thu, Mar 12, 2009 at 11:32:18AM -0500, Eric Sandeen wrote:
>> Aneesh Kumar K.V wrote:
>>> On Wed, Mar 11, 2009 at 05:41:05PM -0500, Eric Sandeen wrote:
>>>> Theodore Ts'o wrote:
>>>>> Add an ioctl which forces all of the delay allocated blocks to be
>>>>> allocated.  This also provides a function ext4_alloc_da_blocks() which
>>>>> will be used by the following commits to force files to be fully
>>>>> allocated to preserve application-expected ext3 behaviour.
>>>>>
>>>> Is it worth checking whether a) the file has delalloc blocks, and/or b)
>>>> whether the mapping is dirty before we spin off a filemap_flush?
>>>>
>>> +int ext4_alloc_da_blocks(struct inode *inode)
>>> +{
>>> +       if (!EXT4_I(inode)->i_reserved_data_blocks &&
>>> +           !EXT4_I(inode)->i_reserved_meta_blocks)
>>> +               return 0;
>>>
>>> This check test does (a). Since the ioctl is to force allocation of
>>> delayed allocated blocks i guess (a) is enough because we don't want to
>>> cause a filemap_flush when we don't have any delayed allocated blocks
>>> but have dirty pages around.
>> and b) is as simple as
>>
>> if (!mapping_tagged(inode->i_mapping, PAGECACHE_TAG_DIRTY))
>> 	return 0;
> 
> Yeah, but (b) isn't necessary; if there are some delayed allocation
> blocks, by definition there must be some dirty pages, right?
> 
> 	   	      	    	    	 - Ted

oh, heh.  Right you are :)

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9840cad..626bda7 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -316,6 +316,7 @@  struct ext4_new_group_data {
 #define EXT4_IOC_GROUP_EXTEND		_IOW('f', 7, unsigned long)
 #define EXT4_IOC_GROUP_ADD		_IOW('f', 8, struct ext4_new_group_input)
 #define EXT4_IOC_MIGRATE		_IO('f', 9)
+#define EXT4_IOC_ALLOC_DA_BLKS		_IO('f', 10)
  /* note ioctl 11 reserved for filesystem-independent FIEMAP ioctl */
 
 /*
@@ -1094,6 +1095,7 @@  extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
 extern int ext4_block_truncate_page(handle_t *handle,
 		struct address_space *mapping, loff_t from);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
+extern int ext4_alloc_da_blocks(struct inode *inode);
 
 /* ioctl.c */
 extern long ext4_ioctl(struct file *, unsigned int, unsigned long);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 1216fb9..c66af1c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2807,6 +2807,48 @@  out:
 	return;
 }
 
+/*
+ * Force all delayed allocation blocks to be allocated for a given inode.
+ */
+int ext4_alloc_da_blocks(struct inode *inode)
+{
+	if (!EXT4_I(inode)->i_reserved_data_blocks &&
+	    !EXT4_I(inode)->i_reserved_meta_blocks)
+		return 0;
+
+	/*
+	 * We do something simple for now.  The filemap_flush() will
+	 * also start triggering a write of the data blocks, which is
+	 * not strictly speaking necessary (and for users of
+	 * laptop_mode, not even desirable).  However, to do otherwise
+	 * would require replicating code paths in:
+	 * 
+	 * ext4_da_writepages() ->
+	 *    write_cache_pages() ---> (via passed in callback function)
+	 *        __mpage_da_writepage() -->
+	 *           mpage_add_bh_to_extent()
+	 *           mpage_da_map_blocks()
+	 *
+	 * The problem is that write_cache_pages(), located in
+	 * mm/page-writeback.c, marks pages clean in preparation for
+	 * doing I/O, which is not desirable if we're not planning on
+	 * doing I/O at all.
+	 *
+	 * We could call write_cache_pages(), and then redirty all of
+	 * the pages by calling redirty_page_for_writeback() but that
+	 * would be ugly in the extreme.  So instead we would need to
+	 * replicate parts of the code in the above functions,
+	 * simplifying them becuase we wouldn't actually intend to
+	 * write out the pages, but rather only collect contiguous
+	 * logical block extents, call the multi-block allocator, and
+	 * then update the buffer heads with the block allocations.
+	 * 
+	 * For now, though, we'll cheat by calling filemap_flush(),
+	 * which will map the blocks, and start the I/O, but not
+	 * actually wait for the I/O to complete.
+	 */
+	return filemap_flush(inode->i_mapping);
+}
 
 /*
  * bmap() is special.  It gets used by applications such as lilo and by
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index 22dd29f..0d59f5f 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -262,6 +262,19 @@  setversion_out:
 		return err;
 	}
 
+	case EXT4_IOC_ALLOC_DA_BLKS:
+	{
+		int err;
+		if (!is_owner_or_cap(inode))
+			return -EACCES;
+
+		err = mnt_want_write(filp->f_path.mnt);
+		if (err)
+			return err;
+		err = ext4_alloc_da_blocks(inode);
+		return err;
+	}
+
 	default:
 		return -ENOTTY;
 	}