
ext4: Do not zeroout uninitialized extents beyond i_size

Message ID 1270833748-14381-1-git-send-email-dmonakhov@openvz.org
State Superseded, archived
Delegated to: Theodore Ts'o

Commit Message

Dmitry Monakhov April 9, 2010, 5:22 p.m. UTC
The zeroout trick allows us to optimize cases where it is more reasonable
to explicitly zero out an extent and mark it as initialized instead of
splitting it into several small ones.
But this optimization is not acceptable if the extent is beyond i_size,
because it is not possible to have initialized blocks after i_size:
fsck treats this as an incorrect inode size.

BUG# 15742

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/extents.c |   49 ++++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 38 insertions(+), 11 deletions(-)
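To make the trade-off in the commit message concrete, here is a minimal userspace sketch (not kernel code) of what converting part of an uninitialized extent involves. A write of max_blocks blocks at iblock splits the extent into up to three pieces, while the first zeroout shortcut in ext4_ext_convert_to_initialized() skips the split when ee_len <= 2*EXT4_EXT_ZERO_LEN. The names sketch_extent, split_pieces(), small_enough_to_zeroout() and the constant SKETCH_ZERO_LEN are hypothetical stand-ins; the value 7 is an assumption for illustration, and the real EXT4_EXT_ZERO_LEN should be checked in the ext4 headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for EXT4_EXT_ZERO_LEN; the value is an assumption. */
#define SKETCH_ZERO_LEN 7u

typedef uint32_t lblk_t;	/* stand-in for ext4_lblk_t */

/* A simplified extent: logical start block and length in blocks. */
struct sketch_extent {
	lblk_t start;
	unsigned int len;
};

/*
 * Split the uninitialized extent ex around the written range
 * [iblock, iblock + max_blocks). Fills out[] with the non-empty pieces in
 * logical order (left uninitialized, written, right uninitialized) and
 * returns how many pieces there are.
 */
static int split_pieces(struct sketch_extent ex, lblk_t iblock,
			unsigned int max_blocks, struct sketch_extent out[3])
{
	lblk_t ex_end = ex.start + ex.len;
	lblk_t wr_end = iblock + max_blocks;
	int n = 0;

	/* The written range must be non-empty and lie inside the extent. */
	assert(max_blocks > 0 && iblock >= ex.start && wr_end <= ex_end);
	if (iblock > ex.start)
		out[n++] = (struct sketch_extent){ ex.start, iblock - ex.start };
	out[n++] = (struct sketch_extent){ iblock, max_blocks };
	if (wr_end < ex_end)
		out[n++] = (struct sketch_extent){ wr_end, ex_end - wr_end };
	return n;
}

/* First zeroout shortcut: small extents are zeroed instead of split. */
static bool small_enough_to_zeroout(struct sketch_extent ex)
{
	return ex.len <= 2 * SKETCH_ZERO_LEN;
}
```

For a 20-block extent at block 100 and a 3-block write at block 105, split_pieces() yields [100,105), [105,108) and [108,120). Zeroing out the two outer pieces instead leaves one initialized extent, which is exactly what must not happen when that extent reaches past i_size.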

Comments

Aneesh Kumar K.V April 28, 2010, 4:40 a.m. UTC | #1
On Fri,  9 Apr 2010 21:22:28 +0400, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> [...]

With commit c8d46e41bc744c8fa0092112af3942fcd46c8b18, if we set
EXT4_EOFBLOCKS_FL we should be able to have blocks beyond i_size.
Maybe the zero-out path should set the flag instead of making all these
changes. Zero-out is already complex with all the ENOSPC-related
considerations. I guess we should try to keep it simple.

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
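Aneesh's alternative would keep the zeroout paths unchanged and record the condition instead of forbidding it. A rough sketch of that policy, assuming (as he describes) that setting EXT4_EOFBLOCKS_FL makes blocks past i_size acceptable; sketch_inode, note_zeroout() and SKETCH_EOFBLOCKS_FL are hypothetical names, and the flag value is not the real one.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t lblk_t;	/* stand-in for ext4_lblk_t */

/* Hypothetical flag bit for this sketch; not the real EXT4_EOFBLOCKS_FL value. */
#define SKETCH_EOFBLOCKS_FL 0x1u

struct sketch_inode {
	lblk_t eof_block;	/* i_size rounded up to whole blocks */
	unsigned int flags;
};

/*
 * Called after [ee_block, ee_block + ee_len) has been zeroed out and marked
 * initialized: instead of refusing the zeroout, flag the inode when the
 * extent now reaches past EOF.
 */
static void note_zeroout(struct sketch_inode *inode, lblk_t ee_block,
			 unsigned int ee_len)
{
	assert(ee_len > 0);
	if (ee_block + ee_len > inode->eof_block)
		inode->flags |= SKETCH_EOFBLOCKS_FL;
}
```

The appeal of this design is that every zeroout site stays as it was, with a single flag update; the cost, as the rest of the thread shows, is that fsck must accept an initialized extent past i_size when the flag is set.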
Dmitry Monakhov April 28, 2010, 7:38 a.m. UTC | #2
"Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> On Fri,  9 Apr 2010 21:22:28 +0400, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
>> [...]
>
> With commit c8d46e41bc744c8fa0092112af3942fcd46c8b18, if we set
> EXT4_EOFBLOCKS_FL we should be able to have blocks beyond i_size.
> Maybe the zero-out path should set the flag instead of making all these
> changes. Zero-out is already complex with all the ENOSPC-related
> considerations. I guess we should try to keep it simple.
For an initialized extent beyond i_size? I've checked fsck, and it seems
that is truly possible. So this optimization allows us to avoid some
bad EIO situations. But we have to rework ext_get_blocks( ,create == 1)
to clear EXT4_EOFBLOCKS_FL if the last block of latest_extent is requested.
I'll handle this.
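The follow-up work Dmitry mentions could look roughly like this: when a block-mapping request with create == 1 covers the last block of the file's last extent, the flag can be dropped, on the assumption that the write that triggered the request will then extend i_size over those blocks. This is a hypothetical sketch; maybe_clear_eofblocks(), last_ext_end and SKETCH_EOFBLOCKS_FL are illustrative names, not ext4 code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t lblk_t;	/* stand-in for ext4_lblk_t */

/* Hypothetical flag bit for this sketch; not the real EXT4_EOFBLOCKS_FL value. */
#define SKETCH_EOFBLOCKS_FL 0x1u

struct sketch_inode {
	unsigned int flags;
};

/*
 * last_ext_end is one past the last logical block of the file's last extent.
 * If the create == 1 request [iblock, iblock + max_blocks) includes that last
 * block, assume the write will extend i_size over it and drop the flag.
 */
static void maybe_clear_eofblocks(struct sketch_inode *inode,
				  lblk_t last_ext_end, lblk_t iblock,
				  unsigned int max_blocks)
{
	lblk_t last_block;

	assert(last_ext_end > 0 && max_blocks > 0);
	last_block = last_ext_end - 1;
	if (iblock <= last_block && last_block < iblock + max_blocks)
		inode->flags &= ~SKETCH_EOFBLOCKS_FL;
}
```

With the last extent ending at block 16, a request for blocks [0,8) leaves the flag set, while a request for [12,16) touches block 15 and clears it.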
Aneesh Kumar K.V May 27, 2010, 5:19 p.m. UTC | #3
On Wed, 28 Apr 2010 11:38:58 +0400, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> "Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> writes:
> 
> > [...]
> For an initialized extent beyond i_size? I've checked fsck, and it seems
> that is truly possible. So this optimization allows us to avoid some
> bad EIO situations. But we have to rework ext_get_blocks( ,create == 1)
> to clear EXT4_EOFBLOCKS_FL if the last block of latest_extent is requested.
> I'll handle this.


I thought this patch was going to be reworked to use EOFBLOCKS_FL. But I see
Ted sent a pull request with this patch. Did I miss something?

-aneesh
Dmitry Monakhov June 3, 2010, 8:32 a.m. UTC | #4
"Aneesh Kumar K. V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> On Wed, 28 Apr 2010 11:38:58 +0400, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
>> [...]
>
> I thought this patch was going to be reworked to use EOFBLOCKS_FL. But I see
> Ted sent a pull request with this patch. Did I miss something?
Sorry for the delayed reply.
As far as I can see, e2fslib currently allows the EXT4_EOFBLOCKS_FL flag
only for uninitialized extents. So we have to change e2fslib
first and then revert the kernel zeroout restriction logic.
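Dmitry's objection can be modeled as a consistency rule: blocks past i_size are tolerated only if EXT4_EOFBLOCKS_FL is set and the extent past EOF is still uninitialized. The sketch below encodes that rule as he states it; it is not taken from e2fsprogs, whose actual check may differ (Ted's reply questions whether any change is needed there). size_ok() and sketch_extent are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t lblk_t;	/* stand-in for ext4_lblk_t */

struct sketch_extent {
	lblk_t start;
	unsigned int len;
	bool uninit;	/* still marked uninitialized in the extent tree */
};

/*
 * Models the rule as Dmitry describes it: if the last extent ends past
 * eof_block (i_size in blocks), the size is accepted only when the inode
 * has the EOFBLOCKS flag set and that extent is uninitialized.
 */
static bool size_ok(lblk_t eof_block, bool eofblocks_fl,
		    struct sketch_extent last)
{
	assert(last.len > 0);
	if (last.start + last.len <= eof_block)
		return true;	/* every block is inside i_size */
	return eofblocks_fl && last.uninit;
}
```

Under this rule, the zeroed-out, initialized extent from Aneesh's proposal fails the check even with the flag set, which is why Dmitry wants e2fsprogs changed before the kernel restriction is reverted.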
Theodore Ts'o June 8, 2010, 9:46 p.m. UTC | #5
On Thu, Jun 03, 2010 at 12:32:09PM +0400, Dmitry Monakhov wrote:
> >
> > I thought this patch was going to be reworked to use EOFBLOCKS_FL. But I see
> > Ted sent a pull request with this patch. Did I miss something?

I added it to the patch queue a week before you sent your comment that
we should rework this with EOFBLOCKS_FL, and I forgot to pull the
patch back.

> As far as I can see, e2fslib currently allows the EXT4_EOFBLOCKS_FL flag
> only for uninitialized extents. So we have to change e2fslib
> first and then revert the kernel zeroout restriction logic.

I think I'm missing something.  What change do you think is needed in
e2fsprogs?

						- Ted




Patch

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 8bdee27..bdf94f3 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2631,11 +2631,15 @@  static int ext4_ext_convert_to_initialized(handle_t *handle,
 	struct ext4_extent *ex2 = NULL;
 	struct ext4_extent *ex3 = NULL;
 	struct ext4_extent_header *eh;
-	ext4_lblk_t ee_block;
+	ext4_lblk_t ee_block, eof_block;
 	unsigned int allocated, ee_len, depth;
 	ext4_fsblk_t newblock;
 	int err = 0;
 	int ret = 0;
+	int may_zeroout;
+	ext_debug("ext4_ext_convert_to_initialized: inode %lu, logical "
+		"block %llu, max_blocks %u\n",
+		inode->i_ino, (unsigned long long)iblock, max_blocks);
 
 	depth = ext_depth(inode);
 	eh = path[depth].p_hdr;
@@ -2644,16 +2648,25 @@  static int ext4_ext_convert_to_initialized(handle_t *handle,
 	ee_len = ext4_ext_get_actual_len(ex);
 	allocated = ee_len - (iblock - ee_block);
 	newblock = iblock - ee_block + ext_pblock(ex);
+	eof_block = (inode->i_size + inode->i_sb->s_blocksize - 1) >>
+		inode->i_sb->s_blocksize_bits;
 	ex2 = ex;
 	orig_ex.ee_block = ex->ee_block;
 	orig_ex.ee_len   = cpu_to_le16(ee_len);
 	ext4_ext_store_pblock(&orig_ex, ext_pblock(ex));
 
+	/*
+	 * It is safe to convert extent to initialized via explicit
+	 * zeroout only if extent is fully inside i_size or new_size.
+	 */
+	may_zeroout = ee_block + ee_len <= iblock + max_blocks ||
+		ee_block + ee_len <= eof_block;
+
 	err = ext4_ext_get_access(handle, inode, path + depth);
 	if (err)
 		goto out;
 	/* If extent has less than 2*EXT4_EXT_ZERO_LEN zerout directly */
-	if (ee_len <= 2*EXT4_EXT_ZERO_LEN) {
+	if (ee_len <= 2*EXT4_EXT_ZERO_LEN && may_zeroout) {
 		err =  ext4_ext_zeroout(inode, &orig_ex);
 		if (err)
 			goto fix_extent_len;
@@ -2684,7 +2697,7 @@  static int ext4_ext_convert_to_initialized(handle_t *handle,
 	if (allocated > max_blocks) {
 		unsigned int newdepth;
 		/* If extent has less than EXT4_EXT_ZERO_LEN zerout directly */
-		if (allocated <= EXT4_EXT_ZERO_LEN) {
+		if (allocated <= EXT4_EXT_ZERO_LEN && may_zeroout) {
 			/*
 			 * iblock == ee_block is handled by the zerouout
 			 * at the beginning.
@@ -2760,7 +2773,7 @@  static int ext4_ext_convert_to_initialized(handle_t *handle,
 		ex3->ee_len = cpu_to_le16(allocated - max_blocks);
 		ext4_ext_mark_uninitialized(ex3);
 		err = ext4_ext_insert_extent(handle, inode, path, ex3, 0);
-		if (err == -ENOSPC) {
+		if (err == -ENOSPC && may_zeroout) {
 			err =  ext4_ext_zeroout(inode, &orig_ex);
 			if (err)
 				goto fix_extent_len;
@@ -2784,8 +2797,11 @@  static int ext4_ext_convert_to_initialized(handle_t *handle,
 		 * update the extent length after successful insert of the
 		 * split extent
 		 */
-		orig_ex.ee_len = cpu_to_le16(ee_len -
-						ext4_ext_get_actual_len(ex3));
+		ee_len -= ext4_ext_get_actual_len(ex3);
+		orig_ex.ee_len = cpu_to_le16(ee_len);
+		may_zeroout = ee_block + ee_len <= iblock + max_blocks ||
+			ee_block + ee_len <= eof_block;
+
 		depth = newdepth;
 		ext4_ext_drop_refs(path);
 		path = ext4_ext_find_extent(inode, iblock, path);
@@ -2809,7 +2825,7 @@  static int ext4_ext_convert_to_initialized(handle_t *handle,
 		 * otherwise give the extent a chance to merge to left
 		 */
 		if (le16_to_cpu(orig_ex.ee_len) <= EXT4_EXT_ZERO_LEN &&
-							iblock != ee_block) {
+			iblock != ee_block && may_zeroout) {
 			err =  ext4_ext_zeroout(inode, &orig_ex);
 			if (err)
 				goto fix_extent_len;
@@ -2878,7 +2894,7 @@  static int ext4_ext_convert_to_initialized(handle_t *handle,
 	goto out;
 insert:
 	err = ext4_ext_insert_extent(handle, inode, path, &newex, 0);
-	if (err == -ENOSPC) {
+	if (err == -ENOSPC && may_zeroout) {
 		err =  ext4_ext_zeroout(inode, &orig_ex);
 		if (err)
 			goto fix_extent_len;
@@ -2938,14 +2954,16 @@  static int ext4_split_unwritten_extents(handle_t *handle,
 	struct ext4_extent *ex2 = NULL;
 	struct ext4_extent *ex3 = NULL;
 	struct ext4_extent_header *eh;
-	ext4_lblk_t ee_block;
+	ext4_lblk_t ee_block, eof_block;
 	unsigned int allocated, ee_len, depth;
 	ext4_fsblk_t newblock;
 	int err = 0;
+	int may_zeroout;
 
 	ext_debug("ext4_split_unwritten_extents: inode %lu,"
 		  "iblock %llu, max_blocks %u\n", inode->i_ino,
 		  (unsigned long long)iblock, max_blocks);
+
 	depth = ext_depth(inode);
 	eh = path[depth].p_hdr;
 	ex = path[depth].p_ext;
@@ -2953,10 +2971,19 @@  static int ext4_split_unwritten_extents(handle_t *handle,
 	ee_len = ext4_ext_get_actual_len(ex);
 	allocated = ee_len - (iblock - ee_block);
 	newblock = iblock - ee_block + ext_pblock(ex);
+	eof_block = (inode->i_size + inode->i_sb->s_blocksize - 1) >>
+		inode->i_sb->s_blocksize_bits;
+
 	ex2 = ex;
 	orig_ex.ee_block = ex->ee_block;
 	orig_ex.ee_len   = cpu_to_le16(ee_len);
 	ext4_ext_store_pblock(&orig_ex, ext_pblock(ex));
+	/*
+	 * It is safe to convert extent to initialized via explicit
+	 * zeroout only if extent is fully inside i_size or new_size.
+	 */
+	may_zeroout = ee_block + ee_len <= iblock + max_blocks ||
+		ee_block + ee_len <= eof_block;
 
 	/*
  	 * If the uninitialized extent begins at the same logical
@@ -2992,7 +3019,7 @@  static int ext4_split_unwritten_extents(handle_t *handle,
 		ex3->ee_len = cpu_to_le16(allocated - max_blocks);
 		ext4_ext_mark_uninitialized(ex3);
 		err = ext4_ext_insert_extent(handle, inode, path, ex3, flags);
-		if (err == -ENOSPC) {
+		if (err == -ENOSPC && may_zeroout) {
 			err =  ext4_ext_zeroout(inode, &orig_ex);
 			if (err)
 				goto fix_extent_len;
@@ -3063,7 +3090,7 @@  static int ext4_split_unwritten_extents(handle_t *handle,
 	goto out;
 insert:
 	err = ext4_ext_insert_extent(handle, inode, path, &newex, flags);
-	if (err == -ENOSPC) {
+	if (err == -ENOSPC && may_zeroout) {
 		err =  ext4_ext_zeroout(inode, &orig_ex);
 		if (err)
 			goto fix_extent_len;
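A userspace sketch of the check this patch adds, handy for working through edge cases by hand. eof_block is i_size rounded up to whole blocks, and may_zeroout() is true when the extent ends no later than the end of the write (the patch comment's "new_size") or no later than eof_block. The helper names are hypothetical; the arithmetic mirrors the + lines above, with i_size and the block size passed in explicitly instead of read from the inode and superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t lblk_t;	/* stand-in for ext4_lblk_t */

/* i_size rounded up to whole blocks, as the patch computes eof_block. */
static lblk_t eof_block_of(uint64_t i_size, unsigned int blocksize_bits)
{
	uint64_t blocksize = (uint64_t)1 << blocksize_bits;

	/* ext4 block sizes range from 1 KiB (2^10) to 64 KiB (2^16). */
	assert(blocksize_bits >= 10 && blocksize_bits <= 16);
	return (lblk_t)((i_size + blocksize - 1) >> blocksize_bits);
}

/*
 * Mirrors the patch: zeroout is allowed only if the extent ends within the
 * range being written or within eof_block, so that (once the write has
 * updated i_size) zeroing it out cannot leave initialized blocks after i_size.
 */
static bool may_zeroout(lblk_t ee_block, unsigned int ee_len, lblk_t iblock,
			unsigned int max_blocks, lblk_t eof_block)
{
	return ee_block + ee_len <= iblock + max_blocks ||
	       ee_block + ee_len <= eof_block;
}
```

For example, with 4096-byte blocks (blocksize_bits = 12) an i_size of 4097 gives eof_block = 2. A 16-block extent at block 0 with a 2-block write at block 4 and eof_block = 8 may not be zeroed out; once the right-hand 10 blocks have been split off (ee_len = 6, as in the recomputation after ext4_ext_insert_extent()), the remaining extent ends at the end of the write and zeroout is allowed again.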