From patchwork Sat Dec 15 05:48:39 2018
X-Patchwork-Submitter: Xiaoguang Wang
X-Patchwork-Id: 1013837
From: Xiaoguang Wang
To: linux-ext4@vger.kernel.org
Cc: Xiaoguang Wang
Subject: [PATCH v2 1/2] ext4: try to merge unwritten extents who are also not under io
Date: Sat, 15 Dec 2018 13:48:39 +0800
Message-Id: <20181215054840.5960-1-xiaoguang.wang@linux.alibaba.com>

Currently in ext4_can_extents_be_merged(), if one file has unwritten
extents under io, we will not merge
any other unwritten extents, even if they do not overlap the unwritten
extents under io. This restriction is too coarse: unwritten extents that
are not under io can safely be merged.

Add a new ES_IO_B flag to track unwritten extents under io in the extent
status tree. When we try to merge unwritten extents, look up the merged
range in the extent status tree; if no extent under io is found there,
the unwritten extents can be merged.

Note that currently we only track unwritten extents under io.

Signed-off-by: Xiaoguang Wang
---
 fs/ext4/extents.c        | 24 +++++++++++++++++++++--
 fs/ext4/extents_status.c | 41 ++++++++++++++++++++++++++++++++++++++++
 fs/ext4/extents_status.h | 12 +++++++++++-
 fs/ext4/inode.c          | 28 ++++++++++++++++++++++++++-
 4 files changed, 101 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 240b6dea5441..444c739470a5 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1713,6 +1713,25 @@ static int ext4_ext_correct_indexes(handle_t *handle, struct inode *inode,
 	return err;
 }
 
+static int ext4_unwritten_extent_under_io(struct inode *inode,
+			ext4_lblk_t start, unsigned int len)
+{
+	/*
+	 * The check for IO to unwritten extent is somewhat racy as we
+	 * increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after
+	 * dropping i_data_sem. But reserved blocks should save us in that
+	 * case.
+	 */
+	if (atomic_read(&EXT4_I(inode)->i_unwritten) == 0)
+		return 0;
+
+	if (ext4_es_scan_range(inode, &ext4_es_is_under_io, start,
+			       start + len - 1))
+		return 1;
+
+	return 0;
+}
+
 int
 ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
 				struct ext4_extent *ex2)
@@ -1744,8 +1763,9 @@ ext4_can_extents_be_merged(struct inode *inode, struct ext4_extent *ex1,
 	 */
 	if (ext4_ext_is_unwritten(ex1) &&
 	    (ext4_test_inode_state(inode, EXT4_STATE_DIO_UNWRITTEN) ||
-	     atomic_read(&EXT4_I(inode)->i_unwritten) ||
-	     (ext1_ee_len + ext2_ee_len > EXT_UNWRITTEN_MAX_LEN)))
+	     (ext1_ee_len + ext2_ee_len > EXT_UNWRITTEN_MAX_LEN) ||
+	     ext4_unwritten_extent_under_io(inode, le32_to_cpu(ex1->ee_block),
+	     ext1_ee_len + ext2_ee_len)))
 		return 0;
 #ifdef AGGRESSIVE_TEST
 	if (ext1_ee_len >= 4)
diff --git a/fs/ext4/extents_status.c b/fs/ext4/extents_status.c
index 2b439afafe13..1262184dac29 100644
--- a/fs/ext4/extents_status.c
+++ b/fs/ext4/extents_status.c
@@ -860,6 +860,38 @@ int ext4_es_insert_extent(struct inode *inode, ext4_lblk_t lblk,
 	return err;
 }
 
+/*
+ * When writing unwritten extents, we mark EXTENT_STATUS_IO in es,
+ * but if some errors happen and stop submitting IO, also need to
+ * clear EXTENT_STATUS_IO flag.
+ */
+int ext4_es_clear_io_status(struct inode *inode, ext4_lblk_t lblk,
+			    ext4_lblk_t end, ext4_fsblk_t block)
+{
+	struct ext4_es_tree *tree = &EXT4_I(inode)->i_es_tree;
+	struct extent_status *es;
+	unsigned int status;
+	int err = 0;
+
+	read_lock(&EXT4_I(inode)->i_es_lock);
+	es = __es_tree_search(&tree->root, lblk);
+	if (!es || es->es_lblk > end) {
+		read_unlock(&EXT4_I(inode)->i_es_lock);
+		return err;
+	}
+	status = ext4_es_type(es);
+	status &= ~EXTENT_STATUS_IO;
+	read_unlock(&EXT4_I(inode)->i_es_lock);
+
+	/*
+	 * Note ext4_es_insert_extent will remove es firstly and insert new es
+	 * with new status without EXTENT_STATUS_IO.
+	 */
+	err = ext4_es_insert_extent(inode, lblk, end - lblk + 1, block, status);
+	return err;
+}
+
+
 /*
  * ext4_es_cache_extent() inserts information into the extent status
  * tree if and only if there isn't information about the range in
@@ -1332,6 +1364,15 @@ static int es_do_reclaim_extents(struct ext4_inode_info *ei, ext4_lblk_t end,
 		 */
 		if (ext4_es_is_delayed(es))
 			goto next;
+
+		/*
+		 * We don't reclaim unwritten extent under io because we use
+		 * it to check whether we can merge other unwritten extents
+		 * who are not under io, and when io completes, then we can
+		 * reclaim this extent.
+		 */
+		if (ext4_es_is_under_io(es))
+			goto next;
 		if (ext4_es_is_referenced(es)) {
 			ext4_es_clear_referenced(es);
 			goto next;
diff --git a/fs/ext4/extents_status.h b/fs/ext4/extents_status.h
index 131a8b7df265..0452a98af90d 100644
--- a/fs/ext4/extents_status.h
+++ b/fs/ext4/extents_status.h
@@ -36,6 +36,7 @@ enum {
 	ES_DELAYED_B,
 	ES_HOLE_B,
 	ES_REFERENCED_B,
+	ES_IO_B,
 	ES_FLAGS
 };
 
@@ -47,11 +48,13 @@ enum {
 #define EXTENT_STATUS_DELAYED	(1 << ES_DELAYED_B)
 #define EXTENT_STATUS_HOLE	(1 << ES_HOLE_B)
 #define EXTENT_STATUS_REFERENCED	(1 << ES_REFERENCED_B)
+#define EXTENT_STATUS_IO	(1 << ES_IO_B)
 
 #define ES_TYPE_MASK	((ext4_fsblk_t)(EXTENT_STATUS_WRITTEN | \
 			  EXTENT_STATUS_UNWRITTEN | \
 			  EXTENT_STATUS_DELAYED | \
-			  EXTENT_STATUS_HOLE) << ES_SHIFT)
+			  EXTENT_STATUS_HOLE | \
+			  EXTENT_STATUS_IO) << ES_SHIFT)
 
 struct ext4_sb_info;
 struct ext4_extent;
@@ -147,6 +150,8 @@ extern bool ext4_es_scan_range(struct inode *inode,
 extern bool ext4_es_scan_clu(struct inode *inode,
 			     int (*matching_fn)(struct extent_status *es),
 			     ext4_lblk_t lblk);
+extern int ext4_es_clear_io_status(struct inode *inode, ext4_lblk_t lblk,
+				   ext4_lblk_t end, ext4_fsblk_t block);
 
 static inline unsigned int ext4_es_status(struct extent_status *es)
 {
@@ -173,6 +178,11 @@ static inline int ext4_es_is_delayed(struct extent_status *es)
 	return (ext4_es_type(es) & EXTENT_STATUS_DELAYED) != 0;
 }
 
+static inline int ext4_es_is_under_io(struct extent_status *es)
+{
+	return (ext4_es_type(es) & EXTENT_STATUS_IO) != 0;
+}
+
 static inline int ext4_es_is_hole(struct extent_status *es)
 {
 	return (ext4_es_type(es) & EXTENT_STATUS_HOLE) != 0;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 22a9d8159720..ba557a731081 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -704,6 +704,16 @@ int ext4_map_blocks(handle_t *handle, struct inode *inode,
 		    ext4_es_scan_range(inode, &ext4_es_is_delayed, map->m_lblk,
 				       map->m_lblk + map->m_len - 1))
 			status |= EXTENT_STATUS_DELAYED;
+		/*
+		 * Track unwritten extent under io. When io completes, we'll
+		 * convert unwritten extent to written, ext4_es_insert_extent()
+		 * will be called again to insert this written extent, then
+		 * EXTENT_STATUS_IO will be cleared automatically, see remove
+		 * logic in ext4_es_insert_extent().
+		 */
+		if ((status & EXTENT_STATUS_UNWRITTEN) && (flags &
+		    EXT4_GET_BLOCKS_IO_SUBMIT))
+			status |= EXTENT_STATUS_IO;
 		ret = ext4_es_insert_extent(inode, map->m_lblk, map->m_len,
 					    map->m_pblk, status);
 		if (ret < 0) {
@@ -2526,6 +2536,8 @@ static int mpage_map_and_submit_extent(handle_t *handle,
 	int err;
 	loff_t disksize;
 	int progress = 0;
+	ext4_lblk_t start = 0, end = 0, submitted = 0;
+	ext4_fsblk_t phy_start = 0;
 
 	mpd->io_submit.io_end->offset =
 				((loff_t)map->m_lblk) << inode->i_blkbits;
@@ -2565,13 +2577,27 @@ static int mpage_map_and_submit_extent(handle_t *handle,
 			return err;
 		}
 		progress = 1;
+
+		if (mpd->io_submit.io_end->flag & EXT4_IO_END_UNWRITTEN) {
+			start = mpd->map.m_lblk;
+			end = start + mpd->map.m_len - 1;
+			phy_start = mpd->map.m_pblk;
+		}
 		/*
 		 * Update buffer state, submit mapped pages, and get us new
 		 * extent to map
 		 */
 		err = mpage_map_and_submit_buffers(mpd);
-		if (err < 0)
+		if (err < 0) {
+			submitted = mpd->io_submit.io_end->size >>
+				    inode->i_blkbits;
+			if (mpd->io_submit.io_end->flag & EXT4_IO_END_UNWRITTEN
+			    && submitted < (end - start + 1))
+				ext4_es_clear_io_status(inode,
+							start + submitted, end,
+							phy_start + submitted);
 			goto update_disksize;
+		}
 	} while (map->m_len);
 
 update_disksize:

From patchwork Sat Dec 15 05:48:40 2018
X-Patchwork-Submitter: Xiaoguang Wang
X-Patchwork-Id: 1013836
From: Xiaoguang Wang
To: linux-ext4@vger.kernel.org
Cc: Xiaoguang Wang, Liu Bo
Subject: [PATCH v2 2/2] ext4: fix slow writeback under dioread_nolock and nodelalloc
Date: Sat, 15 Dec 2018 13:48:40 +0800
Message-Id: <20181215054840.5960-2-xiaoguang.wang@linux.alibaba.com>
In-Reply-To: <20181215054840.5960-1-xiaoguang.wang@linux.alibaba.com>
References: <20181215054840.5960-1-xiaoguang.wang@linux.alibaba.com>

With "nodelalloc", blocks are allocated at write time, and with
"dioread_nolock", these allocated blocks are also marked as unwritten,
so the buffer heads attached to them have both BH_Unwritten and
BH_Mapped set.

This would be fine, except that with "dioread_nolock" all extents are
allocated with EXT4_GET_BLOCKS_PRE_IO, which prevents adjacent extents
from being merged.

Then, in writepages, because the buffer heads are marked BH_Unwritten,
a journal handle must be held to process these extents. Although
writepages() batches a run of dirty pages into an mpd, it can only find
one block to map at a time, so it submits a single page and then loops
to the next page, over and over again:

ext4_writepages
  ...                                   # starting from the 1st dirty page
  ext4_journal_start_with_reserve
  mpage_prepare_extent_to_map           # batch up to 2048 dirty pages
  mpage_map_and_submit_extent
    mpage_map_one_extent
      ext4_map_blocks                   # with EXT4_GET_BLOCKS_IO_CREATE_EXT
        ext4_ext_map_blocks
          ext4_find_extent              # find an extent with only one block at the offset
          ext4_ext_handle_unwritten_extents
                                        # try to split due to EXT4_GET_BLOCKS_PRE_IO,
                                        # but no need to in this case as there is
                                        # only one block in this extent
    mpage_map_and_submit_buffers        # submit io for only 1st page
  # start from the 2nd dirty page
  ...

---
Given that this is for buffered writes, what we want from
"dioread_nolock" is that extents are converted from unwritten at endio,
so we really don't have to use PRE_IO, which was originally designed for
the direct IO path.
With this change, extents get merged in the "nodelalloc" case, and
writeback no longer needs the extra batching and looping. Performance
numbers:

mount -o dioread_nolock,nodelalloc /dev/loop0 /mnt/
xfs_io -f -c "pwrite -W 0 1G" $M/foobar

- without this patch:
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 262144 ops; 0:02:27.00 (6.951 MiB/sec and 1779.3791 ops/sec)

- with this patch:
wrote 1073741824/1073741824 bytes at offset 0
1 GiB, 262144 ops; 0:00:06.00 (161.915 MiB/sec and 41450.3184 ops/sec)

Signed-off-by: Liu Bo
Signed-off-by: Xiaoguang Wang
---
 fs/ext4/extents.c | 6 +++++-
 fs/ext4/inode.c   | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 444c739470a5..de73b0152892 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4038,9 +4038,13 @@ ext4_ext_handle_unwritten_extents(handle_t *handle, struct inode *inode,
 	/*
 	 * repeat fallocate creation request
 	 * we already have an unwritten extent
+	 *
+	 * With nodelalloc + dioread_nolock, write can also come here,
+	 * so make sure map is set with new to avoid exposing stale
+	 * data to reads.
 	 */
 	if (flags & EXT4_GET_BLOCKS_UNWRIT_EXT) {
-		map->m_flags |= EXT4_MAP_UNWRITTEN;
+		map->m_flags |= EXT4_MAP_UNWRITTEN | EXT4_MAP_NEW;
 		goto map_out;
 	}
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ba557a731081..4dbb43ab9d6e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -822,7 +822,7 @@ int ext4_get_block_unwritten(struct inode *inode, sector_t iblock,
 	ext4_debug("ext4_get_block_unwritten: inode %lu, create flag %d\n",
 		   inode->i_ino, create);
 	return _ext4_get_block(inode, iblock, bh_result,
-			       EXT4_GET_BLOCKS_IO_CREATE_EXT);
+			       EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT);
 }
 
 /* Maximum number of blocks we map for direct IO at once. */
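The merge rule changed by patch 1/2 can be illustrated outside the kernel. The sketch below is a simplified userspace model, not ext4 code: struct toy_extent, struct io_range, TOY_UNWRITTEN_MAX_LEN and the can_merge_* helpers are hypothetical names, physical-block contiguity and the EXT4_STATE_DIO_UNWRITTEN check are omitted, the pending-IO count stands in for i_unwritten, and the under-IO set is a plain array rather than the extent status tree. It contrasts the old test (refuse to merge unwritten extents while any unwritten IO is pending on the inode) with the patched test (refuse only when the merged range overlaps an extent marked under IO).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror EXT_UNWRITTEN_MAX_LEN (32767 blocks) in ext4_extents.h. */
#define TOY_UNWRITTEN_MAX_LEN 32767u

/* Hypothetical, simplified stand-in for an on-disk extent (not kernel code). */
struct toy_extent {
	unsigned int start; /* first logical block */
	unsigned int len;   /* number of blocks, must be > 0 */
	bool unwritten;
};

/* Hypothetical inclusive range [start, end] under IO (models EXTENT_STATUS_IO). */
struct io_range {
	unsigned int start;
	unsigned int end;
};

/* Models the ext4_es_scan_range() lookup: does any under-IO range overlap [start, end]? */
static bool range_under_io(const struct io_range *io, size_t nr_io,
			   unsigned int start, unsigned int end)
{
	for (size_t i = 0; i < nr_io; i++)
		if (io[i].start <= end && start <= io[i].end)
			return true;
	return false;
}

/* Checks shared by both policies: logical adjacency, same state, length cap. */
static bool basic_mergeable(const struct toy_extent *a, const struct toy_extent *b)
{
	assert(a->len > 0 && b->len > 0);
	if (a->start + a->len != b->start || a->unwritten != b->unwritten)
		return false;
	return !(a->unwritten && a->len + b->len > TOY_UNWRITTEN_MAX_LEN);
}

/* Old policy: nr_io > 0 (like a non-zero i_unwritten) blocks every unwritten merge. */
static bool can_merge_coarse(const struct toy_extent *a, const struct toy_extent *b,
			     size_t nr_io)
{
	if (!basic_mergeable(a, b))
		return false;
	return !(a->unwritten && nr_io > 0);
}

/* Patched policy: only IO overlapping the merged range blocks the merge. */
static bool can_merge_fine(const struct toy_extent *a, const struct toy_extent *b,
			   const struct io_range *io, size_t nr_io)
{
	if (!basic_mergeable(a, b))
		return false;
	return !(a->unwritten &&
		 range_under_io(io, nr_io, a->start, a->start + a->len + b->len - 1));
}
```

For example, with unwritten extents covering blocks 0-7 and 8-15 and IO in flight only on blocks 100-107, can_merge_coarse() refuses the merge while can_merge_fine() allows it; if the in-flight IO covers blocks 4-11 instead, both refuse. A linear scan keeps the model short; the kernel instead searches the per-inode extent status rbtree.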