Message ID | 87pr0ilw3n.fsf_-_@openvz.org |
---|---|
State | Superseded, archived |
One more thing. Why do you need EXT4_STATE_EXT_TRUNC?

The only place which tests it in any kind of real way is ext4_ext_truncate_extend_restart(), and it is only called by one function, ext4_ext_rm_leaf(), and *it* is only called in one place, inside ext4_ext_remove_space(), and *it* surrounds the call with ext4_set_inode_state(inode, EXT4_STATE_EXT_TRUNC) and ext4_clear_inode_state(inode, EXT4_STATE_EXT_TRUNC).

And while a truncate is happening, no other block allocation can happen, so the test in ext4_ext_map_blocks() doesn't seem to do much. (It only clears STATE_EXT_TRUNC if it is set and if the flag EXT4_GET_BLOCKS_CREATE is set. I'm not sure what the point of that is, either.)

- Ted

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 26-05-10 09:23:52, tytso@mit.edu wrote:
> One more thing. Why do you need EXT4_STATE_EXT_TRUNC?
>
> The only place which tests it in any kind of real way is
> ext4_ext_truncate_extend_restart(), and it is only called by one
> function, ext4_ext_rm_leaf(), and *it* is only called in one place,
> inside ext4_ext_remove_space(), and *it* surrounds the call with
> ext4_set_inode_state(inode, EXT4_STATE_EXT_TRUNC) and
> ext4_clear_inode_state(inode, EXT4_STATE_EXT_TRUNC).
>
> And while a truncate is happening, no other block allocation can
> happen, so the test in ext4_ext_map_blocks() doesn't seem to do much.

This is false. As soon as we drop i_data_sem, allocation *can* happen from the writeback path. Because truncate has already invalidated all the pages past new_size, it must be for some page before new_size, but it could still modify an extent tree node we passed through when looking up our extent...

> (It only clears STATE_EXT_TRUNC if it is set and if the flag
> EXT4_GET_BLOCKS_CREATE is set. I'm not sure what the point of that
> is, either.)

I think the idea Dmitry is trying to implement is: when an allocation like the one I describe above happens while we have dropped i_data_sem, restart the whole truncation.

Honza
tytso@mit.edu writes:
> One more thing. Why do you need EXT4_STATE_EXT_TRUNC?
>
> The only place which tests it in any kind of real way is
> ext4_ext_truncate_extend_restart(), and it is only called by one
> function, ext4_ext_rm_leaf(), and *it* is only called in one place,
> inside ext4_ext_remove_space(), and *it* surrounds the call with
> ext4_set_inode_state(inode, EXT4_STATE_EXT_TRUNC) and
> ext4_clear_inode_state(inode, EXT4_STATE_EXT_TRUNC).
>
> And while a truncate is happening, no other block allocation can
> happen, so the test in ext4_ext_map_blocks() doesn't seem to do much.

This is the biggest myth about truncate. I always used to think so myself, but later I found that it is not so. See below.

> (It only clears STATE_EXT_TRUNC if it is set and if the flag
> EXT4_GET_BLOCKS_CREATE is set. I'm not sure what the point of that
> is, either.)

This is the core idea of the patch.

The *truncate* task sets the bit to signal that a truncate is in progress for the inode. Later we may need to restart the transaction, which results in an internal drop/re-acquire of i_data_sem. While the semaphore is dropped, a new block may be allocated by a flusher task (delayed-allocation writeback in the middle of the file).

The *flusher* will discover that STATE_EXT_TRUNC is set and clear it if the allocation is really necessary (EXT4_GET_BLOCKS_CREATE is set). By clearing the STATE_EXT_TRUNC bit, the flusher lets the truncate task know that it has to restart its job.

*Back to the truncate task*. We can make sure that no allocation happened by testing the STATE_EXT_TRUNC bit. If it was cleared, we have to restart the truncate from the very beginning, because all the data we have collected may no longer be valid; even the depth of the extent tree may have increased.
For example, suppose we are about to truncate an inode with the following leaf block:

    {ee_block:1000, ee_len:100}

If a block allocation happens while we are restarting the transaction, the leaf block may look like this:

    {ee_block:500, ee_len:10} {ee_block:1000, ee_len:10}

Note that the extent starting at block 1000 has changed its position. This was not an issue for ext3, because its blocks are placed at deterministic positions.
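To make the position shift concrete, here is a minimal userspace sketch in C. It assumes a hypothetical `struct model_extent` holding only the two fields from the example (it is not the real on-disk `struct ext4_extent`) and a hypothetical `leaf_insert()` helper that keeps a leaf sorted by `ee_block`. A slot index remembered before the insertion ends up referring to a different extent; only the shift is modelled, not the length change shown in the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified extent: only the fields used in the example. */
struct model_extent {
	unsigned int ee_block; /* first logical block covered */
	unsigned int ee_len;   /* number of blocks */
};

/*
 * Insert ext into a leaf kept sorted by ee_block and return the new
 * number of entries. cap is the capacity of the leaf array.
 */
static size_t leaf_insert(struct model_extent *leaf, size_t n, size_t cap,
			  struct model_extent ext)
{
	size_t pos = 0;

	assert(n < cap); /* the caller must leave room for one more entry */
	while (pos < n && leaf[pos].ee_block < ext.ee_block)
		pos++;
	/* Shift everything at or after pos one slot to the right. */
	memmove(&leaf[pos + 1], &leaf[pos], (n - pos) * sizeof(*leaf));
	leaf[pos] = ext;
	return n + 1;
}
```

For instance, starting from a one-entry leaf `{1000, 100}`, inserting `{500, 10}` returns 2, and slot 0 (where the truncate had found the block-1000 extent before dropping i_data_sem) now holds the block-500 extent.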
On Wed, May 26, 2010 at 06:23:55PM +0400, Dmitry Monakhov wrote:
> Later we may need to restart the transaction, which results in an
> internal drop/re-acquire of i_data_sem. While the semaphore is
> dropped, a new block may be allocated by a flusher task
> (delayed-allocation writeback in the middle of the file).
>
> The *flusher* will discover that STATE_EXT_TRUNC is set and clear it
> if the allocation is really necessary (EXT4_GET_BLOCKS_CREATE is set).
> By clearing the STATE_EXT_TRUNC bit, the flusher lets the truncate
> task know that it has to restart its job.

OK, but why not have the truncate *always* restart its job after restarting the transaction? #1, it's relatively rare in most workloads that we need to restart the transaction at all in the first place; #2, it's easier to test if we always restart the truncate; and #3, it's not like we'll be doing that much extra work if we restart the truncate and the file wasn't extended significantly...

- Ted
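Ted's alternative can be sketched as a simple userspace model. Everything here is hypothetical and illustrative, not ext4 code: `struct model_trunc` assumes each pass can free at most `per_txn` leaves before the journal transaction has to be restarted, and after every restart the walk unconditionally starts again from the root, so no state bit is needed. Since each pass frees at least one leaf (assuming `per_txn >= 1`), the loop always terminates.

```c
#include <assert.h>

/* Hypothetical model of "always restart the truncate after a journal restart". */
struct model_trunc {
	int remaining;    /* leaves still to be freed */
	int per_txn;      /* leaves that fit in one transaction's credits */
	int passes;       /* walks started from the root */
	int txn_restarts; /* journal restarts (i_data_sem dropped each time) */
};

static void model_remove_space_always_restart(struct model_trunc *t)
{
	assert(t->per_txn >= 1); /* otherwise no pass could make progress */
again:
	t->passes++;
	/* Fresh lookup from the root: nothing is reused from an earlier pass. */
	for (int freed = 0; t->remaining > 0; freed++) {
		if (freed == t->per_txn) {
			/*
			 * Credits exhausted: restart the transaction, which
			 * drops i_data_sem, then redo the walk unconditionally.
			 */
			t->txn_restarts++;
			goto again;
		}
		t->remaining--;
	}
}
```

With `remaining = 5` and `per_txn = 2`, the model makes three passes and two transaction restarts. The price of restarting unconditionally is one extra descent from the root per restart, which is the extra work Ted's point #3 refers to.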
```diff
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 3b63837..36e6a32 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1162,6 +1162,7 @@ enum {
 	EXT4_STATE_DA_ALLOC_CLOSE,	/* Alloc DA blks on close */
 	EXT4_STATE_EXT_MIGRATE,		/* Inode is migrating */
 	EXT4_STATE_DIO_UNWRITTEN,	/* need convert on dio done*/
+	EXT4_STATE_EXT_TRUNC,		/* truncate is in progress, modified under i_data_sem */
 };
 
 #define EXT4_INODE_BIT_FNS(name, field)					\
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c7c304f..3321f57 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -107,11 +107,8 @@ static int ext4_ext_truncate_extend_restart(handle_t *handle,
 	if (err <= 0)
 		return err;
 	err = ext4_truncate_restart_trans(handle, inode, needed);
-	/*
-	 * We have dropped i_data_sem so someone might have cached again
-	 * an extent we are going to truncate.
-	 */
-	ext4_ext_invalidate_cache(inode);
+	if (!err && !ext4_test_inode_state(inode, EXT4_STATE_EXT_TRUNC))
+		err = -EAGAIN;
 
 	return err;
 }
@@ -2359,7 +2356,7 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start)
 	int depth = ext_depth(inode);
 	struct ext4_ext_path *path;
 	handle_t *handle;
-	int i = 0, err = 0;
+	int i, err = 0;
 
 	ext_debug("truncate since %u\n", start);
 
@@ -2368,12 +2365,16 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start)
 	if (IS_ERR(handle))
 		return PTR_ERR(handle);
 
+again:
 	ext4_ext_invalidate_cache(inode);
 
 	/*
 	 * We start scanning from right side, freeing all the blocks
 	 * after i_size and walking into the tree depth-wise.
 	 */
+	i = 0;
+	ext4_set_inode_state(inode, EXT4_STATE_EXT_TRUNC);
+	depth = ext_depth(inode);
 	path = kzalloc(sizeof(struct ext4_ext_path) * (depth + 1), GFP_NOFS);
 	if (path == NULL) {
 		ext4_journal_stop(handle);
@@ -2478,6 +2479,11 @@ static int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start)
 out:
 	ext4_ext_drop_refs(path);
 	kfree(path);
+	if (err == -EAGAIN) {
+		err = 0;
+		goto again;
+	}
+	ext4_clear_inode_state(inode, EXT4_STATE_EXT_TRUNC);
 	ext4_journal_stop(handle);
 
 	return err;
@@ -3327,6 +3333,9 @@ int ext4_ext_map_blocks(handle_t *handle, struct inode *inode,
 	ext_debug("blocks %u/%u requested for inode %lu\n",
 		  map->m_lblk, map->m_len, inode->i_ino);
 
+	if (unlikely((flags & EXT4_GET_BLOCKS_CREATE)) &&
+	    ext4_test_inode_state(inode, EXT4_STATE_EXT_TRUNC))
+		ext4_clear_inode_state(inode, EXT4_STATE_EXT_TRUNC);
 	/* check in cache */
 	cache_type = ext4_ext_in_cache(inode, map->m_lblk, &newex);
 	if (cache_type) {
```
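The control flow added by this patch can be modelled in userspace to check the restart logic. The sketch below is hypothetical: `struct model_inode`, `MODEL_GET_BLOCKS_CREATE` and the `model_*` functions only mirror the roles of EXT4_STATE_EXT_TRUNC, ext4_ext_map_blocks(), ext4_ext_truncate_extend_restart() and ext4_ext_remove_space(). The writeback allocation that can happen while i_data_sem is dropped is simulated by a `pending_allocs` counter rather than a real concurrent thread, and each pass is assumed to need exactly one transaction restart.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not kernel definitions. */
#define MODEL_GET_BLOCKS_CREATE 0x1

struct model_inode {
	bool ext_trunc;     /* plays the role of EXT4_STATE_EXT_TRUNC */
	int nr_extents;     /* grows when the flusher allocates */
	int pending_allocs; /* allocations writeback performs while the lock is dropped */
};

/* Mirrors the ext4_ext_map_blocks() hunk: a block-creating request
 * clears the truncate bit before allocating. */
static void model_map_blocks(struct model_inode *inode, int flags)
{
	if (!(flags & MODEL_GET_BLOCKS_CREATE))
		return;
	if (inode->ext_trunc)
		inode->ext_trunc = false;
	inode->nr_extents++;
}

/* Mirrors ext4_ext_truncate_extend_restart(): restarting the transaction
 * drops i_data_sem, and writeback may allocate in that window. */
static int model_truncate_restart(struct model_inode *inode)
{
	if (inode->pending_allocs > 0) {
		inode->pending_allocs--;
		model_map_blocks(inode, MODEL_GET_BLOCKS_CREATE);
	}
	return inode->ext_trunc ? 0 : -EAGAIN;
}

/* Mirrors ext4_ext_remove_space(); returns how many times the walk
 * had to start over from the root. */
static int model_remove_space(struct model_inode *inode)
{
	int restarts = 0;
	int err;

again:
	inode->ext_trunc = true;
	/* ... look up the path and free leaves; one restart per pass ... */
	err = model_truncate_restart(inode);
	if (err == -EAGAIN) {
		restarts++;
		goto again;
	}
	assert(err == 0);
	inode->ext_trunc = false;
	return restarts;
}
```

With `pending_allocs = 1`, the first pass finds the bit cleared by the simulated flusher and gets -EAGAIN, so `model_remove_space()` reports one restart; with no concurrent allocation it reports none, and in both cases the bit is clear on return.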