Message ID: 1381682393-5769-1-git-send-email-ming.lei@canonical.com
State: Superseded, archived
On Mon, Oct 14, 2013 at 12:39 AM, Ming Lei <ming.lei@canonical.com> wrote:
>
> On one arm A15 board (arndale) with a SATA 3.0 SSD (CPU: 1.5GHz dual
> core, RAM: 2GB),

Sorry for forgetting to mention: the SATA controller is working at 3.0Gbps.

Thanks,
--
Ming Lei
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello,

On Mon 14-10-13 00:39:52, Ming Lei wrote:
> Commit 4e7ea81db5 (ext4: restructure writeback path) introduces
> another performance regression on random write:
>
> - one more page may be mapped to an ext4 extent in
>   mpage_prepare_extent_to_map, and will be submitted for I/O, so
>   nr_to_write will become -1 before 'done' is set
>
> - worse, dirty pages may still be retrieved from the page cache after
>   nr_to_write becomes negative, so lots of small chunks can be
>   submitted to the block device when page writeback is catching up
>   with the write path, and performance is hurt.

  Umm, I guess I see what you are pointing at. Thanks for catching that.
mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write
is already <= 0. But I would somewhat prefer not to call
mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch
like:

	ret = mpage_prepare_extent_to_map(&mpd);
	if (!ret) {
-		if (mpd.map.m_len)
+		if (mpd.map.m_len) {
			ret = mpage_map_and_submit_extent(handle, &mpd,
					&give_up_on_write);
-		else {
+			done = (wbc->nr_to_write <= 0);
+		} else {

Should also fix your problem, am I right?

								Honza

> On one arm A15 board (arndale) with a SATA 3.0 SSD (CPU: 1.5GHz dual
> core, RAM: 2GB), this patch can improve the test result below from
> 157MB/sec to 174MB/sec (>10%):
>
> 	dd if=/dev/zero of=./z.img bs=8K count=512K
>
> The above test is actually the prototype of block write in the
> bonnie++ utility.
>
> This patch fixes the check on nr_to_write in mpage_prepare_extent_to_map()
> to make sure nr_to_write won't become negative.
>
> Cc: Ted Tso <tytso@mit.edu>
> Cc: Jan Kara <jack@suse.cz>
> Cc: linux-ext4@vger.kernel.org
> Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
> Signed-off-by: Ming Lei <ming.lei@canonical.com>
> ---
>  fs/ext4/inode.c | 20 ++++++++++----------
>  1 file changed, 10 insertions(+), 10 deletions(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 32c04ab..6a62803 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2356,15 +2356,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  			if (mpd->map.m_len == 0)
>  				mpd->first_page = page->index;
>  			mpd->next_page = page->index + 1;
> -			/* Add all dirty buffers to mpd */
> -			lblk = ((ext4_lblk_t)page->index) <<
> -				(PAGE_CACHE_SHIFT - blkbits);
> -			head = page_buffers(page);
> -			err = mpage_process_page_bufs(mpd, head, head, lblk);
> -			if (err <= 0)
> -				goto out;
> -			err = 0;
> -
>  			/*
>  			 * Accumulated enough dirty pages? This doesn't apply
>  			 * to WB_SYNC_ALL mode. For integrity sync we have to
> @@ -2374,9 +2365,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
>  			 * of the old dirty pages.
>  			 */
>  			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
> -			    mpd->next_page - mpd->first_page >=
> +			    mpd->next_page - mpd->first_page >
>  			    mpd->wbc->nr_to_write)
>  				goto out;
> +
> +			/* Add all dirty buffers to mpd */
> +			lblk = ((ext4_lblk_t)page->index) <<
> +				(PAGE_CACHE_SHIFT - blkbits);
> +			head = page_buffers(page);
> +			err = mpage_process_page_bufs(mpd, head, head, lblk);
> +			if (err <= 0)
> +				goto out;
> +			err = 0;
>  		}
>  		pagevec_release(&pvec);
>  		cond_resched();
> --
> 1.7.9.5
>
On Mon, Oct 14, 2013 at 8:58 PM, Jan Kara <jack@suse.cz> wrote:
> Umm, I guess I see what you are pointing at. Thanks for catching that.
> mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write
> is already <= 0. But I would somewhat prefer not to call
> mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch
> like:
> 	ret = mpage_prepare_extent_to_map(&mpd);
> 	if (!ret) {
> -		if (mpd.map.m_len)
> +		if (mpd.map.m_len) {
> 			ret = mpage_map_and_submit_extent(handle, &mpd,
> 					&give_up_on_write);
> -		else {
> +			done = (wbc->nr_to_write <= 0);
> +		} else {
>
> Should also fix your problem, am I right?

I am afraid it can't, because we need to stop scanning the page cache
once the end of file is reached.

nr_to_write becomes negative inside mpage_map_and_submit_extent(), which
is why I fix it inside mpage_prepare_extent_to_map().

Thanks,
--
Ming Lei
On Mon 14-10-13 21:50:54, Ming Lei wrote:
> On Mon, Oct 14, 2013 at 8:58 PM, Jan Kara <jack@suse.cz> wrote:
> > Umm, I guess I see what you are pointing at. Thanks for catching that.
> > mpage_process_page_bufs() always adds a buffer to mpd even if nr_to_write
> > is already <= 0. But I would somewhat prefer not to call
> > mpage_prepare_extent_to_map() at all when nr_to_write <= 0. So a patch
> > like:
> > 	ret = mpage_prepare_extent_to_map(&mpd);
> > 	if (!ret) {
> > -		if (mpd.map.m_len)
> > +		if (mpd.map.m_len) {
> > 			ret = mpage_map_and_submit_extent(handle, &mpd,
> > 					&give_up_on_write);
> > -		else {
> > +			done = (wbc->nr_to_write <= 0);
> > +		} else {
> >
> > Should also fix your problem, am I right?
>
> I am afraid it can't, because we need to stop scanning the page cache
> once the end of file is reached.

That should be OK. mpage_process_page_bufs() won't add a buffer beyond
EOF, so we end the extent at EOF and next time we don't add anything to
the extent. My change wouldn't alter anything in this respect.

> nr_to_write becomes negative inside mpage_map_and_submit_extent(), which
> is why I fix it inside mpage_prepare_extent_to_map().

Yes, mpage_map_and_submit_extent() creates negative nr_to_write, but
only because mpage_prepare_extent_to_map() asked for mapping too big an
extent. But if I'm reading the code correctly, we first ask for writing
an extent of just the right size (nr_to_write becomes 0), but then
ext4_writepages() asks mpage_prepare_extent_to_map() for another extent,
and it creates a single-page extent for mapping before it finds out
nr_to_write <= 0 and terminates. Am I understanding the problem
correctly?

After thinking about it again, moving the condition in
mpage_prepare_extent_to_map() as you did is probably better than what I
suggested. Just move it further up in the loop - right after the
page->index > end check - so that we don't unnecessarily lock the page
etc.

								Honza
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 32c04ab..6a62803 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2356,15 +2356,6 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			if (mpd->map.m_len == 0)
 				mpd->first_page = page->index;
 			mpd->next_page = page->index + 1;
-			/* Add all dirty buffers to mpd */
-			lblk = ((ext4_lblk_t)page->index) <<
-				(PAGE_CACHE_SHIFT - blkbits);
-			head = page_buffers(page);
-			err = mpage_process_page_bufs(mpd, head, head, lblk);
-			if (err <= 0)
-				goto out;
-			err = 0;
-
 			/*
 			 * Accumulated enough dirty pages? This doesn't apply
 			 * to WB_SYNC_ALL mode. For integrity sync we have to
@@ -2374,9 +2365,18 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 			 * of the old dirty pages.
 			 */
 			if (mpd->wbc->sync_mode == WB_SYNC_NONE &&
-			    mpd->next_page - mpd->first_page >=
+			    mpd->next_page - mpd->first_page >
 			    mpd->wbc->nr_to_write)
 				goto out;
+
+			/* Add all dirty buffers to mpd */
+			lblk = ((ext4_lblk_t)page->index) <<
+				(PAGE_CACHE_SHIFT - blkbits);
+			head = page_buffers(page);
+			err = mpage_process_page_bufs(mpd, head, head, lblk);
+			if (err <= 0)
+				goto out;
+			err = 0;
 		}
 		pagevec_release(&pvec);
 		cond_resched();
Commit 4e7ea81db5 (ext4: restructure writeback path) introduces another
performance regression on random write:

- one more page may be mapped to an ext4 extent in
  mpage_prepare_extent_to_map, and will be submitted for I/O, so
  nr_to_write will become -1 before 'done' is set

- worse, dirty pages may still be retrieved from the page cache after
  nr_to_write becomes negative, so lots of small chunks can be submitted
  to the block device when page writeback is catching up with the write
  path, and performance is hurt.

On one arm A15 board (arndale) with a SATA 3.0 SSD (CPU: 1.5GHz dual
core, RAM: 2GB), this patch can improve the test result below from
157MB/sec to 174MB/sec (>10%):

	dd if=/dev/zero of=./z.img bs=8K count=512K

The above test is actually the prototype of block write in the bonnie++
utility.

This patch fixes the check on nr_to_write in mpage_prepare_extent_to_map()
to make sure nr_to_write won't become negative.

Cc: Ted Tso <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Cc: linux-ext4@vger.kernel.org
Cc: "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
---
 fs/ext4/inode.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)