diff mbox

[updated] ext4: Fix file fragmentation during large file write.

Message ID 20081016091015.GA3354@skywalker
State Accepted, archived
Headers show

Commit Message

Aneesh Kumar K.V Oct. 16, 2008, 9:10 a.m. UTC
On Wed, Oct 15, 2008 at 07:51:32PM -0400, Chris Mason wrote:
> On Wed, 2008-10-15 at 16:41 -0400, Chris Mason wrote:
> > On Fri, 2008-10-10 at 23:32 +0530, Aneesh Kumar K.V wrote:
> > > The range_cyclic writeback mode use the address_space
> > > writeback_index as the start index for writeback. With
> > > delayed allocation we were updating writeback_index
> > > wrongly resulting in highly fragmented file. Number of
> > > extents reduced from 4000 to 27 for a 3GB file with
> > > the below patch.
> > > 
> > 
> > I tested the ext4 patch queue from today on top of 2.6.27, and this
> > includes Aneesh's latest patches.
> > 
> > Things are going at disk speed for streaming writes, with the number of
> > extents generated for a 32GB file down to 27.  So, this is definitely an
> > improvement for ext4.
> 
> Just FYI, I ran this with compilebench -i 20 --makej and my log is full
> of these:
> 
> ext4_da_writepages: jbd2_start: 1024 pages, ino 520417; err -30
> Pid: 4072, comm: pdflush Not tainted 2.6.27 #2
> 

last patch I sent didn't solve the problem. The below patch
fixes it for me. The problem is not introduced by the file fragmentation
fixes. Full diff below. The first hunk is the important change. I also
kept the rest of the changes because I think they are fine. Also I added
a printk to figure out whether we ever exit from ext4_da_writepages with
increasing pages_skipped value


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Aneesh Kumar K.V Oct. 16, 2008, 11:13 a.m. UTC | #1
On Thu, Oct 16, 2008 at 02:40:15PM +0530, Aneesh Kumar K.V wrote:
> On Wed, Oct 15, 2008 at 07:51:32PM -0400, Chris Mason wrote:
> > On Wed, 2008-10-15 at 16:41 -0400, Chris Mason wrote:
> > > On Fri, 2008-10-10 at 23:32 +0530, Aneesh Kumar K.V wrote:
> > > > The range_cyclic writeback mode use the address_space
> > > > writeback_index as the start index for writeback. With
> > > > delayed allocation we were updating writeback_index
> > > > wrongly resulting in highly fragmented file. Number of
> > > > extents reduced from 4000 to 27 for a 3GB file with
> > > > the below patch.
> > > > 
> > > 
> > > I tested the ext4 patch queue from today on top of 2.6.27, and this
> > > includes Aneesh's latest patches.
> > > 
> > > Things are going at disk speed for streaming writes, with the number of
> > > extents generated for a 32GB file down to 27.  So, this is definitely an
> > > improvement for ext4.
> > 
> > Just FYI, I ran this with compilebench -i 20 --makej and my log is full
> > of these:
> > 
> > ext4_da_writepages: jbd2_start: 1024 pages, ino 520417; err -30
> > Pid: 4072, comm: pdflush Not tainted 2.6.27 #2
> > 
> 

compilebench numbers

ext4
==========================================================================
intial create total runs 20 avg 30.25 MB/s (user 0.74s sys 2.40s)
no runs for create
no runs for patch
compile total runs 20 avg 39.79 MB/s (user 0.17s sys 2.55s)
no runs for clean
no runs for read tree
read compiled tree total runs 3 avg 19.83 MB/s (user 0.97s sys 4.08s)
no runs for delete tree
delete compiled tree total runs 20 avg 4.42 seconds (user 0.58s sys 1.79s)
no runs for stat tree

ext3
======================
intial create total runs 20 avg 27.96 MB/s (user 0.73s sys 2.57s)
no runs for create
no runs for patch
compile total runs 20 avg 28.84 MB/s (user 0.17s sys 4.03s)
no runs for clean
no runs for read tree
read compiled tree total runs 3 avg 19.46 MB/s (user 0.97s sys 4.13s)
no runs for delete tree
delete compiled tree total runs 20 avg 18.09 seconds (user 0.58s sys 1.61s)
no runs for stat tree
no runs for stat compiled tree

-aneesh
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3c4b9b4..b6768eb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2136,6 +2136,9 @@  static int mpage_da_writepages(struct address_space *mapping,
 	if (!mpd->io_done && mpd->next_page != mpd->first_page) {
 		if (mpage_da_map_blocks(mpd) == 0)
 			mpage_da_submit_io(mpd);
+
+		mpd->io_done = 1;
+		ret = MPAGE_DA_EXTENT_TAIL;
 	}
 
 	wbc->nr_to_write -= mpd->pages_written;
@@ -2370,10 +2373,10 @@  static int ext4_da_writepages(struct address_space *mapping,
 	pgoff_t	index;
 	int range_whole = 0;
 	handle_t *handle = NULL;
-	long pages_written = 0;
 	struct mpage_da_data mpd;
 	struct inode *inode = mapping->host;
 	int no_nrwrite_index_update;
+	long pages_written = 0, pages_skipped;
 	int needed_blocks, ret = 0, nr_to_writebump = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(mapping->host->i_sb);
 
@@ -2411,6 +2414,7 @@  static int ext4_da_writepages(struct address_space *mapping,
 	 */
 	no_nrwrite_index_update = wbc->no_nrwrite_index_update;
 	wbc->no_nrwrite_index_update = 1;
+	pages_skipped = wbc->pages_skipped;
 
 	while (!ret && wbc->nr_to_write > 0) {
 
@@ -2444,6 +2448,7 @@  static int ext4_da_writepages(struct address_space *mapping,
 			 * and try again
 			 */
 			jbd2_journal_force_commit_nested(sbi->s_journal);
+			wbc->pages_skipped = pages_skipped;
 			ret = 0;
 		} else if (ret == MPAGE_DA_EXTENT_TAIL) {
 			/*
@@ -2451,6 +2456,7 @@  static int ext4_da_writepages(struct address_space *mapping,
 			 * rest of the pages
 			 */
 			pages_written += mpd.pages_written;
+			wbc->pages_skipped = pages_skipped;
 			ret = 0;
 		} else if (wbc->nr_to_write)
 			/*
@@ -2460,6 +2466,11 @@  static int ext4_da_writepages(struct address_space *mapping,
 			 */
 			break;
 	}
+	if (pages_skipped != wbc->pages_skipped)
+		printk(KERN_EMERG "This should not happen leaving %s "
+				"with nr_to_write %ld ret = %d\n",
+				__func__, wbc->nr_to_write, ret);
+
 	/* Update index */
 	index += pages_written;
 	if (wbc->range_cyclic || (range_whole && wbc->nr_to_write > 0))