Message ID | 1240980441-8105-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com |
---|---|
State | Superseded, archived |

Aneesh Kumar K.V wrote:
> We need to mark the buffer_head mapping prealloc space
> as new during write_begin. Otherwise we don't zero out the
> page cache content properly for a partial write. This will
> cause file corruption with preallocation.
>
> Also use block number -1 as the fake block number so that
> unmap_underlying_metadata doesn't drop wrong buffer_head
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
>
> ---
>  fs/ext4/inode.c | 10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index e91f978..12dcfab 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>  		set_buffer_delay(bh_result);
>  	} else if (ret > 0) {
>  		bh_result->b_size = (ret << inode->i_blkbits);
> +		/*
> +		 * With sub-block writes into unwritten extents
> +		 * we also need to mark the buffer as new so that
> +		 * the unwritten parts of the buffer gets correctly zeroed.
> +		 */
> +		if (buffer_unwritten(bh_result)) {
> +			bh_result->b_bdev = inode->i_sb->s_bdev;
> +			set_buffer_new(bh_result);
> +			bh_result->b_blocknr = -1;
> +		}
>  		ret = 0;
>  	}
>

Ok, I guess this seems like the safest approach.

Long term we should look really hard at the state & block nr of these buffer heads, but I agree that keeping the changes restricted to the preallocation path for now is safest.

-Eric

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

On Wed, 2009-04-29 at 08:59 -0500, Eric Sandeen wrote:
> Aneesh Kumar K.V wrote:
> > We need to mark the buffer_head mapping prealloc space
> > as new during write_begin. Otherwise we don't zero out the
> > page cache content properly for a partial write. This will
> > cause file corruption with preallocation.
> >
> > Also use block number -1 as the fake block number so that
> > unmap_underlying_metadata doesn't drop wrong buffer_head
> >
> > Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> >
> > ---
> >  fs/ext4/inode.c | 10 ++++++++++
> >  1 files changed, 10 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index e91f978..12dcfab 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
> >  		set_buffer_delay(bh_result);
> >  	} else if (ret > 0) {
> >  		bh_result->b_size = (ret << inode->i_blkbits);
> > +		/*
> > +		 * With sub-block writes into unwritten extents
> > +		 * we also need to mark the buffer as new so that
> > +		 * the unwritten parts of the buffer gets correctly zeroed.
> > +		 */
> > +		if (buffer_unwritten(bh_result)) {
> > +			bh_result->b_bdev = inode->i_sb->s_bdev;
> > +			set_buffer_new(bh_result);
> > +			bh_result->b_blocknr = -1;
> > +		}
> >  		ret = 0;
> >  	}
> >
>
> Ok, I guess this seems like the safest approach. Long term we should
> look really hard at the state & block nr of these buffer heads, but I
> agree that keeping the changes restricted to the preallocation path for
> now is safest.
>

This path (ret > 0) is the path where get_blocks() finds the block already allocated or preallocated. The buffer_unwritten() check is specific to the preallocation case, but why not take care of buffer_new() when we set buffer_unwritten() for preallocation in ext4_ext_get_blocks() in the first place? That would keep all of the "preallocation" case handling together in one place.

But both patches are correct. I have tested prealloc, prealloc->partial write, and prealloc->partial long write->partial short write; the content read back afterwards looks sane with both patches.

Any thoughts about the comment update I made in my previous patch? This part of the comment in the preallocation handling in ext4_ext_get_blocks() needs some cleanup.

Thinking this over: if we set the buffer new here (i.e. in the write_begin() path), I wonder about the read case. Where do we set buffer_new() for a read on preallocated space? ext4_ext_get_blocks() with create = 0 on a preallocated extent will return the bh unwritten, but not new. However, my read tests right after a new preallocation return all-zeroed data. I wonder what I am missing.

Mingming

> -Eric
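
The prealloc -> partial write -> read sequence described above can be approximated from userspace. The following is only a minimal sketch, not the test that was actually run in this thread: the function name, the sizes, and the path template are assumptions made for illustration, and it uses the portable posix_fallocate() call rather than any ext4-specific interface. It reads back through the page cache, so a pass does not prove the on-disk state is correct.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Illustrative sizes (assumptions): a 16 KiB preallocation and a
 * 100-byte write that starts in the middle of the first block. */
enum { FILE_LEN = 16384, WRITE_OFF = 1000, WRITE_LEN = 100 };

/*
 * Hypothetical helper: preallocate a file, do one sub-block write, read
 * the whole file back.
 * Returns 0 if the written range holds the payload and every other byte
 * is zero, 1 if stale or wrong data is seen, -1 on a setup error.
 * path_template must end in "XXXXXX", as mkstemp() requires.
 */
static int check_prealloc_partial_write(const char *path_template)
{
    static unsigned char readback[FILE_LEN];
    char path[256];
    char payload[WRITE_LEN];
    int fd;
    int result = -1;

    assert(WRITE_OFF + WRITE_LEN <= FILE_LEN);
    if (strlen(path_template) >= sizeof(path))
        return -1;
    strcpy(path, path_template);

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                        /* file disappears on close */

    /* Step 1: preallocate; posix_fallocate() returns an errno value. */
    if (posix_fallocate(fd, 0, FILE_LEN) != 0)
        goto out;

    /* Step 2: partial write inside the preallocated range. */
    memset(payload, 'A', sizeof(payload));
    if (pwrite(fd, payload, WRITE_LEN, WRITE_OFF) != WRITE_LEN)
        goto out;

    /* Step 3: read everything back and compare byte by byte. */
    if (pread(fd, readback, FILE_LEN, 0) != FILE_LEN)
        goto out;

    result = 0;
    for (size_t i = 0; i < (size_t)FILE_LEN; i++) {
        int in_write = i >= (size_t)WRITE_OFF &&
                       i < (size_t)(WRITE_OFF + WRITE_LEN);
        unsigned char expected = in_write ? 'A' : 0;

        if (readback[i] != expected) {
            result = 1;                  /* non-zero data outside the write */
            break;
        }
    }
out:
    close(fd);
    return result;
}
```

Calling check_prealloc_partial_write("/mnt/test/prealloc-XXXXXX") with a directory on the filesystem under test (the path here is a placeholder) returns 1 when bytes outside the written range are not zero, which is the kind of corruption the patch description talks about.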

On Wed, Apr 29, 2009 at 10:17:20AM +0530, Aneesh Kumar K.V wrote:
> We need to mark the buffer_head mapping prealloc space
> as new during write_begin. Otherwise we don't zero out the
> page cache content properly for a partial write. This will
> cause file corruption with preallocation.
>
> Also use block number -1 as the fake block number so that
> unmap_underlying_metadata doesn't drop wrong buffer_head

The buffer_head code is starting to scare me more and more.

I'm looking at this code again and I can't figure out why it's safe (or why we would need to) put in an invalid number into bh_result->b_blocknr:

> @@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>  		set_buffer_delay(bh_result);
>  	} else if (ret > 0) {
>  		bh_result->b_size = (ret << inode->i_blkbits);
> +		/*
> +		 * With sub-block writes into unwritten extents
> +		 * we also need to mark the buffer as new so that
> +		 * the unwritten parts of the buffer gets correctly zeroed.
> +		 */
> +		if (buffer_unwritten(bh_result)) {
> +			bh_result->b_bdev = inode->i_sb->s_bdev;
> +			set_buffer_new(bh_result);
> +			bh_result->b_blocknr = -1;

Why do we need to avoid calling unmap_underlying_metadata()?

And after the buffer is zero'ed out, it leaves b_blocknr in a buffer_head attached to the page at an invalid block number. Doesn't that get us in trouble later on?

I see that this line is removed later on in the for-2.6.31 patch "Mark the unwritten buffer_head as mapped during write_begin". But is it safe for 2.6.30?

- Ted

Theodore Tso wrote:
> On Wed, Apr 29, 2009 at 10:17:20AM +0530, Aneesh Kumar K.V wrote:
>> We need to mark the buffer_head mapping prealloc space
>> as new during write_begin. Otherwise we don't zero out the
>> page cache content properly for a partial write. This will
>> cause file corruption with preallocation.
>>
>> Also use block number -1 as the fake block number so that
>> unmap_underlying_metadata doesn't drop wrong buffer_head
>
> The buffer_head code is starting to scare me more and more.
>
> I'm looking at this code again and I can't figure out why it's safe
> (or why we would need to) put in an invalid number into
> bh_result->b_blocknr:

I don't know for sure why it should be invalid; I think a preallocated block, since it has an *actual* *block* *allocated* after all, should have that block number. But if it's going to be fake, let's not use a "real" one like the superblock location...

A real block nr does eventually get assigned when we do getblock with create=1 AFAICT.

>> @@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
>>  		set_buffer_delay(bh_result);
>>  	} else if (ret > 0) {
>>  		bh_result->b_size = (ret << inode->i_blkbits);
>> +		/*
>> +		 * With sub-block writes into unwritten extents
>> +		 * we also need to mark the buffer as new so that
>> +		 * the unwritten parts of the buffer gets correctly zeroed.
>> +		 */
>> +		if (buffer_unwritten(bh_result)) {
>> +			bh_result->b_bdev = inode->i_sb->s_bdev;
>> +			set_buffer_new(bh_result);
>> +			bh_result->b_blocknr = -1;
>
> Why do we need to avoid calling unmap_underlying_metadata()?

For that matter, why do we call unmap_underlying_metadata at all, ever?

> And after the buffer is zero'ed out, it leaves b_blocknr in a
> buffer_head attached to the page at an invalid block number. Doesn't
> that get us in trouble later on?
>
> I see that this line is removed later on in the for-2.6.31 patch "Mark
> the unwritten buffer_head as mapped during write_begin". But is it
> safe for 2.6.30?

I have this in F11 now, but it's giving me the heebie-jeebies still. At least it's confined to preallocation (one of the great new ext4 features I've been promoting recently... :)

-Eric
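
The concern about which fake block number to use can be illustrated with a toy userspace model. This is only a sketch under stated assumptions: `blk_t`, `struct toy_bh`, `toy_unmap_alias()` and `toy_dirty_lost()` are hypothetical names invented here, the cache is a plain array rather than the block device's page cache, and `toy_unmap_alias()` is a heavily simplified stand-in for what unmap_underlying_metadata() does when it finds a buffer aliasing the given block. Block 0 stands in for a "real" location such as the superblock one mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: model the kernel's unsigned sector_t as a 64-bit unsigned. */
typedef uint64_t blk_t;

/* Assigning -1 to an unsigned type sets every bit, giving the largest
 * representable value, which no block in this toy cache can have. */
#define FAKE_BLOCK ((blk_t)-1)

struct toy_bh {
    blk_t blocknr;   /* block this cached buffer maps */
    bool  dirty;     /* pending metadata update */
};

/*
 * Toy stand-in for unmap_underlying_metadata(): look for a cached
 * buffer with the same block number and discard its dirty state.
 * Returns true if an alias was found.
 */
static bool toy_unmap_alias(struct toy_bh *cache, size_t n, blk_t block)
{
    assert(cache != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cache[i].blocknr == block) {
            cache[i].dirty = false;
            return true;
        }
    }
    return false;
}

/*
 * Run the toy unmap with the given fake block number against a cache
 * holding three dirty metadata buffers, and count how many of them
 * lost their dirty state as a side effect.
 */
static size_t toy_dirty_lost(blk_t fake_blocknr)
{
    struct toy_bh cache[] = {
        { 0,  true },   /* stands in for the superblock location */
        { 1,  true },
        { 42, true },
    };
    size_t n = sizeof(cache) / sizeof(cache[0]);
    size_t lost = 0;

    toy_unmap_alias(cache, n, fake_blocknr);
    for (size_t i = 0; i < n; i++)
        if (!cache[i].dirty)
            lost++;
    return lost;
}
```

In this model, a fake number of 0 matches the first cached buffer and clears its dirty flag, while `FAKE_BLOCK` matches nothing. It says nothing about whether leaving -1 in b_blocknr is safe afterwards, which is the question still open in the thread.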

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e91f978..12dcfab 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2323,6 +2323,16 @@ static int ext4_da_get_block_prep(struct inode *inode, sector_t iblock,
 		set_buffer_delay(bh_result);
 	} else if (ret > 0) {
 		bh_result->b_size = (ret << inode->i_blkbits);
+		/*
+		 * With sub-block writes into unwritten extents
+		 * we also need to mark the buffer as new so that
+		 * the unwritten parts of the buffer gets correctly zeroed.
+		 */
+		if (buffer_unwritten(bh_result)) {
+			bh_result->b_bdev = inode->i_sb->s_bdev;
+			set_buffer_new(bh_result);
+			bh_result->b_blocknr = -1;
+		}
 		ret = 0;
 	}

We need to mark the buffer_head mapping prealloc space
as new during write_begin. Otherwise we don't zero out the
page cache content properly for a partial write. This will
cause file corruption with preallocation.

Also use block number -1 as the fake block number so that
unmap_underlying_metadata doesn't drop wrong buffer_head

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

---
 fs/ext4/inode.c | 10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)
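
Why the "new" flag matters for a partial write can be shown with a small userspace model. This is a sketch only, not kernel code: `toy_partial_write()` and `toy_stale_bytes()` are hypothetical names, the 4096-byte block size is an assumption, and the page is a plain array prefilled with 0xAA to stand in for leftover page cache contents. The model assumes, as the description above implies, that a buffer over unwritten space is not read from disk, so zeroing the bytes outside the write is the only thing that initializes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum { TOY_BLOCK_SIZE = 4096 };   /* assumed block size for the model */

/*
 * Model of a sub-block write into one block of page cache.
 * If the buffer is flagged new, the parts of the block outside
 * [from, from + len) are zeroed before the data is copied in.
 * Otherwise whatever was in the page stays there.
 */
static void toy_partial_write(unsigned char *page, bool is_new,
                              size_t from, const void *data, size_t len)
{
    size_t to = from + len;

    assert(to <= TOY_BLOCK_SIZE);
    if (is_new) {
        memset(page, 0, from);                        /* head of block */
        memset(page + to, 0, TOY_BLOCK_SIZE - to);    /* tail of block */
    }
    memcpy(page + from, data, len);
}

/*
 * Write 5 bytes at offset 100 into a page holding stale data and count
 * the bytes outside the written range that are still non-zero.
 */
static size_t toy_stale_bytes(bool is_new)
{
    unsigned char page[TOY_BLOCK_SIZE];
    const char data[] = "hello";
    size_t from = 100;
    size_t len = sizeof(data) - 1;
    size_t stale = 0;

    memset(page, 0xAA, sizeof(page));     /* leftover, uninitialized content */
    toy_partial_write(page, is_new, from, data, len);

    for (size_t i = 0; i < sizeof(page); i++)
        if ((i < from || i >= from + len) && page[i] != 0)
            stale++;
    return stale;
}
```

With the flag set, no stale bytes remain outside the written range. Without it, every byte outside the 5 written ones still holds the old pattern, and a later read or writeback would expose it as file content.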