diff mbox

[-V3] Fix sub-block zeroing for buffered writes into unwritten extents

Message ID 1240944653-4328-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com
State Superseded, archived
Headers show

Commit Message

Aneesh Kumar K.V April 28, 2009, 6:50 p.m. UTC
We need to mark the  buffer_head mapping prealloc space
as new during write_begin. Otherwise we don't zero out the
page cache content properly for a partial write. This will
cause file corruption with preallocation.

Also use block number -1 as the fake block number so that
unmap_underlying_metadata doesn't drop wrong buffer_head

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

---
 fs/ext4/inode.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)

Comments

Eric Sandeen April 28, 2009, 9:47 p.m. UTC | #1
Aneesh Kumar K.V wrote:
> We need to mark the  buffer_head mapping prealloc space
> as new during write_begin. Otherwise we don't zero out the
> page cache content properly for a partial write. This will
> cause file corruption with preallocation.
> 
> Also use block number -1 as the fake block number so that
> unmap_underlying_metadata doesn't drop wrong buffer_head
> 
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> 
> ---
>  fs/ext4/inode.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e91f978..0214389 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2318,11 +2318,20 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>  			/* not enough space to reserve */
>  			return ret;
>  
> -		map_bh(bh_result, inode->i_sb, 0);
> +		map_bh(bh_result, inode->i_sb, -1);

This seems fine, though unrelated, isn't it?  But mapping delalloc
blocks to -1 temporarily rather than to 0 seems safer to me (could this
possibly be related to our low-block corruption cases?)

Oh, I guess this is for the unmap_underlying_metadata stuff, though I
don't know what that call is for in ext4, to be honest.  :)  At any rate
this should make it not findable there which is fine, I guess.

>  		set_buffer_new(bh_result);
>  		set_buffer_delay(bh_result);
>  	} else if (ret > 0) {
>  		bh_result->b_size = (ret << inode->i_blkbits);
> +		bh_result->b_bdev = inode->i_sb->s_bdev;
> +		bh->b_blocknr = -1;

Mingming pointed out on irc that this sets the blocknr to -1 for every
mapping we find, which is probably not what we want.  :)  But if it's an
actually (pre)allocated block, why do we set it to a fake number at all?

I guess it seems to me that we should be setting up a preallocated
block/bh just about like any other, with a block nr, bdev, etc when we
create it or look it up - but with BH_Unwritten as well to flag it as
such.  It may not actually matter but it just seems odd to me for it to
have a fake block nr.

If surrounding infrastructure still expects to call get_block each time
to split up an unwritten extent, ok for now to leave it unmapped, but
that needs work I think, as we mentioned on irc.

FWIW, setting it to -1 even under the if (buffer_unwritten()) test below
is probably redundant, I think it's already set that way from
alloc_page_buffers().

> +		/*
> +		 * With sub-block writes into unwritten extents
> +		 * we also need to mark the buffer as new so that
> +		 * the unwritten parts of the buffer gets correctly zeroed.
> +		 */
> +		if (buffer_unwritten(bh_result))
> +			set_buffer_new(bh_result);
>  		ret = 0;
>  	}
>  

This part still seems fine to me :)

-Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e91f978..0214389 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2318,11 +2318,20 @@  static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 			/* not enough space to reserve */
 			return ret;
 
-		map_bh(bh_result, inode->i_sb, 0);
+		map_bh(bh_result, inode->i_sb, -1);
 		set_buffer_new(bh_result);
 		set_buffer_delay(bh_result);
 	} else if (ret > 0) {
 		bh_result->b_size = (ret << inode->i_blkbits);
+		bh_result->b_bdev = inode->i_sb->s_bdev;
+		bh->b_blocknr = -1;
+		/*
+		 * With sub-block writes into unwritten extents
+		 * we also need to mark the buffer as new so that
+		 * the unwritten parts of the buffer gets correctly zeroed.
+		 */
+		if (buffer_unwritten(bh_result))
+			set_buffer_new(bh_result);
 		ret = 0;
 	}