diff mbox

[1/3] block_write_full_page: Use synchronous writes for WBC_SYNC_ALL writebacks

Message ID 20090407190913.GA31723@mit.edu
State Not Applicable, archived
Headers show

Commit Message

Theodore Ts'o April 7, 2009, 7:09 p.m. UTC
On Tue, Apr 07, 2009 at 09:57:32AM +0200, Jens Axboe wrote:
> > > It looks like a good candidate for WRITE_SYNC_PLUG instead,
> > 

So is this patch sane?  (Compile-tested only, since I'm at a
conference at the moment).  Am I using the proper abstraction to
unplug the block device?  If not, it might be nice to document the
preferred for callers into the block layer.

(BTW, I was looking at Documentation/biodoc.txt, and I found some
clearly old documentation bits: "This is just the same as in 2.4 so
far, though per-device unplugging support is anticipated for 2.5."  :-)

					- Ted

[RFQ] Smart unplugging for page writeback

Now that we have a distinction between WRITE_SYNC and WRITE_SYNC_PLUG,
use WRITE_SYNC_PLUG in block_write_full_page(), and then before we
wait for page writebacks to complete in jbd, jbd2, and filemap, call
blk_unplug() to make sure the writes are on the way to the disk.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jens Axboe April 7, 2009, 7:32 p.m. UTC | #1
On Tue, Apr 07 2009, Theodore Tso wrote:
> On Tue, Apr 07, 2009 at 09:57:32AM +0200, Jens Axboe wrote:
> > > > It looks like a good candidate for WRITE_SYNC_PLUG instead,
> > > 
> 
> So is this patch sane?  (Compile-tested only, since I'm at a
> conference at the moment).  Am I using the proper abstraction to
> unplug the block device?  If not, it might be nice to document the
> preferred for callers into the block layer.
> 
> (BTW, I was looking at Documentation/biodoc.txt, and I found some
> clearly old documentation bits: "This is just the same as in 2.4 so
> far, though per-device unplugging support is anticipated for 2.5."  :-)

Woops, something to update :-)

> [RFQ] Smart unplugging for page writeback
> 
> Now that we have a distinction between WRITE_SYNC and WRITE_SYNC_PLUG,
> use WRITE_SYNC_PLUG in block_write_full_page(), and then before we
> wait for page writebacks to complete in jbd, jbd2, and filemap, call
> blk_unplug() to make sure the writes are on the way to the disk.
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 977e12a..95b5390 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1646,7 +1646,8 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
>  	struct buffer_head *bh, *head;
>  	const unsigned blocksize = 1 << inode->i_blkbits;
>  	int nr_underway = 0;
> -	int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
> +	int write_op = (wbc->sync_mode == WB_SYNC_ALL ?
> +			WRITE_SYNC_PLUG : WRITE);
>  
>  	BUG_ON(!PageLocked(page));

This should be all you need, forget the below. The manual unplugging
only really makes sense for very select situations, since it'll be done
when someone waits on the buffer anyway. And the block layer will unplug
implicitly if 3ms has passed or 4 requests are pending, if someone
hasn't done a wait for a page/bh in flight for that device anyway.

So at least with the current setup, it'll only really make a difference
if you end up waiting on the buffer 1-2ms after submission and you
haven't submitted enough IO to kick off the unplugging. If you wait
immediately, the time period between your blk_unplug() call and the one
that wait_on_bit() would do is so short that it will not make a
difference. And if you don't wait on the IO, you just want to let it hit
the disk eventually.


> diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
> index a8e8513..3e6726f 100644
> --- a/fs/jbd/commit.c
> +++ b/fs/jbd/commit.c
> @@ -21,6 +21,7 @@
>  #include <linux/mm.h>
>  #include <linux/pagemap.h>
>  #include <linux/bio.h>
> +#include <linux/blkdev.h>
>  
>  /*
>   * Default IO end handler for temporary BJ_IO buffer_heads.
> @@ -196,6 +197,7 @@ static int journal_submit_data_buffers(journal_t *journal,
>  	int locked;
>  	int bufs = 0;
>  	struct buffer_head **wbuf = journal->j_wbuf;
> +	struct block_device *fs_bdev = 0;
>  	int err = 0;
>  
>  	/*
> @@ -213,6 +215,7 @@ write_out_data:
>  	while (commit_transaction->t_sync_datalist) {
>  		jh = commit_transaction->t_sync_datalist;
>  		bh = jh2bh(jh);
> +		fs_bdev = bh->b_bdev;
>  		locked = 0;
>  
>  		/* Get reference just to make sure buffer does not disappear
> @@ -290,6 +293,8 @@ write_out_data:
>  	}
>  	spin_unlock(&journal->j_list_lock);
>  	journal_do_submit_data(wbuf, bufs, write_op);
> +	if (fs_bdev)
> +		blk_unplug(bdev_get_queue(fs_bdev));
>  
>  	return err;
>  }
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 282750c..b5448dd 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -26,6 +26,7 @@
>  #include <linux/writeback.h>
>  #include <linux/backing-dev.h>
>  #include <linux/bio.h>
> +#include <linux/blkdev.h>
>  
>  /*
>   * Default IO end handler for temporary BJ_IO buffer_heads.
> @@ -176,6 +177,7 @@ static int journal_wait_on_commit_record(journal_t *journal,
>  
>  retry:
>  	clear_buffer_dirty(bh);
> +	blk_unplug(bdev_get_queue(bh->b_bdev));
>  	wait_on_buffer(bh);

There's really not a lot of point to doing it this way, since
wait_on_buffer() will immediately do an unplug of the device anyway.

So it's quite on purpose that I didn't do a lot of tracking for this and
manual unplugging, we simply don't need it.

>  	if (buffer_eopnotsupp(bh) && (journal->j_flags & JBD2_BARRIER)) {
>  		printk(KERN_WARNING
> @@ -241,10 +243,12 @@ static int journal_submit_data_buffers(journal_t *journal,
>  	struct jbd2_inode *jinode;
>  	int err, ret = 0;
>  	struct address_space *mapping;
> +	struct block_device *fs_bdev = 0;
>  
>  	spin_lock(&journal->j_list_lock);
>  	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
>  		mapping = jinode->i_vfs_inode->i_mapping;
> +		fs_bdev = jinode->i_vfs_inode->i_sb->s_bdev;
>  		jinode->i_flags |= JI_COMMIT_RUNNING;
>  		spin_unlock(&journal->j_list_lock);
>  		/*
> @@ -262,6 +266,8 @@ static int journal_submit_data_buffers(journal_t *journal,
>  		wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
>  	}
>  	spin_unlock(&journal->j_list_lock);
> +	if (fs_bdev)
> +		blk_unplug(bdev_get_queue(fs_bdev));
>  	return ret;
>  }
>  
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 2e2d38e..eff2ed9 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -274,6 +274,10 @@ int wait_on_page_writeback_range(struct address_space *mapping,
>  	if (end < start)
>  		return 0;
>  
> +	if (mapping->host && mapping->host->i_sb && mapping->host->i_sb &&
> +	    mapping->host->i_sb->s_bdev)
> +		blk_unplug(bdev_get_queue(mapping->host->i_sb->s_bdev));
> +

Ugh, this one is nasty!

        blk_run_address_space(mapping);

would be a lot cleaner.
Theodore Ts'o April 7, 2009, 9:44 p.m. UTC | #2
On Tue, Apr 07, 2009 at 09:32:39PM +0200, Jens Axboe wrote:
> 
> This should be all you need, forget the below. The manual unplugging
> only really makes sense for very select situations, since it'll be done
> when someone waits on the buffer anyway.

Thanks, ok; I missed the "sync_buffer" in the call to wait_on_bit() in
__wait_on_buffer().  It also wasn't obvious to me that
wait_on_writeback_range() ultimately ends up calling aops->sync_page
which for filemap.c will call blk_run_backing_dev() --- and that
blk_run_*() results in the right device queue getting unplugged.  So
this would be a nice thing to get documented in
Documentation/biodoc.txt.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe April 8, 2009, 5:58 a.m. UTC | #3
On Tue, Apr 07 2009, Theodore Tso wrote:
> On Tue, Apr 07, 2009 at 09:32:39PM +0200, Jens Axboe wrote:
> > 
> > This should be all you need, forget the below. The manual unplugging
> > only really makes sense for very select situations, since it'll be done
> > when someone waits on the buffer anyway.
> 
> Thanks, ok; I missed the "sync_buffer" in the call to wait_on_bit() in
> __wait_on_buffer().  It also wasn't obvious to me that
> wait_on_writeback_range() ultimately ends up calling aops->sync_page
> which for filemap.c will call blk_run_backing_dev() --- and that
> blk_run_*() results in the right device queue getting unplugged.  So
> this would be a nice thing to get documented in
> Documentation/biodoc.txt.

I'm not even sure what is documented about this sort of thing in biodoc,
but it's actually somewhat outside the realm of block layer
documentation. The block layer should document that you need to call
unplug if you are going to be waiting for that IO. How the vm code does
it is implementation detail mostly, the important thing is that it gets
done :-)
Theodore Ts'o April 8, 2009, 3:25 p.m. UTC | #4
On Wed, Apr 08, 2009 at 07:58:54AM +0200, Jens Axboe wrote:
> 
> I'm not even sure what is documented about this sort of thing in biodoc,
> but it's actually somewhat outside the realm of block layer
> documentation. The block layer should document that you need to call
> unplug if you are going to be waiting for that IO. How the vm code does
> it is implementation detail mostly, the important thing is that it gets
> done :-)

Fair enough; I agree a lot of the problem is the page writeback
subsystem isn't well documented at all, and is crying out for a review
for possible refactorization and cleanup....

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/buffer.c b/fs/buffer.c
index 977e12a..95b5390 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1646,7 +1646,8 @@  static int __block_write_full_page(struct inode *inode, struct page *page,
 	struct buffer_head *bh, *head;
 	const unsigned blocksize = 1 << inode->i_blkbits;
 	int nr_underway = 0;
-	int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
+	int write_op = (wbc->sync_mode == WB_SYNC_ALL ?
+			WRITE_SYNC_PLUG : WRITE);
 
 	BUG_ON(!PageLocked(page));
 
diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index a8e8513..3e6726f 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -21,6 +21,7 @@ 
 #include <linux/mm.h>
 #include <linux/pagemap.h>
 #include <linux/bio.h>
+#include <linux/blkdev.h>
 
 /*
  * Default IO end handler for temporary BJ_IO buffer_heads.
@@ -196,6 +197,7 @@  static int journal_submit_data_buffers(journal_t *journal,
 	int locked;
 	int bufs = 0;
 	struct buffer_head **wbuf = journal->j_wbuf;
+	struct block_device *fs_bdev = 0;
 	int err = 0;
 
 	/*
@@ -213,6 +215,7 @@  write_out_data:
 	while (commit_transaction->t_sync_datalist) {
 		jh = commit_transaction->t_sync_datalist;
 		bh = jh2bh(jh);
+		fs_bdev = bh->b_bdev;
 		locked = 0;
 
 		/* Get reference just to make sure buffer does not disappear
@@ -290,6 +293,8 @@  write_out_data:
 	}
 	spin_unlock(&journal->j_list_lock);
 	journal_do_submit_data(wbuf, bufs, write_op);
+	if (fs_bdev)
+		blk_unplug(bdev_get_queue(fs_bdev));
 
 	return err;
 }
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 282750c..b5448dd 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -26,6 +26,7 @@ 
 #include <linux/writeback.h>
 #include <linux/backing-dev.h>
 #include <linux/bio.h>
+#include <linux/blkdev.h>
 
 /*
  * Default IO end handler for temporary BJ_IO buffer_heads.
@@ -176,6 +177,7 @@  static int journal_wait_on_commit_record(journal_t *journal,
 
 retry:
 	clear_buffer_dirty(bh);
+	blk_unplug(bdev_get_queue(bh->b_bdev));
 	wait_on_buffer(bh);
 	if (buffer_eopnotsupp(bh) && (journal->j_flags & JBD2_BARRIER)) {
 		printk(KERN_WARNING
@@ -241,10 +243,12 @@  static int journal_submit_data_buffers(journal_t *journal,
 	struct jbd2_inode *jinode;
 	int err, ret = 0;
 	struct address_space *mapping;
+	struct block_device *fs_bdev = 0;
 
 	spin_lock(&journal->j_list_lock);
 	list_for_each_entry(jinode, &commit_transaction->t_inode_list, i_list) {
 		mapping = jinode->i_vfs_inode->i_mapping;
+		fs_bdev = jinode->i_vfs_inode->i_sb->s_bdev;
 		jinode->i_flags |= JI_COMMIT_RUNNING;
 		spin_unlock(&journal->j_list_lock);
 		/*
@@ -262,6 +266,8 @@  static int journal_submit_data_buffers(journal_t *journal,
 		wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
 	}
 	spin_unlock(&journal->j_list_lock);
+	if (fs_bdev)
+		blk_unplug(bdev_get_queue(fs_bdev));
 	return ret;
 }
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 2e2d38e..eff2ed9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -274,6 +274,10 @@  int wait_on_page_writeback_range(struct address_space *mapping,
 	if (end < start)
 		return 0;
 
+	if (mapping->host && mapping->host->i_sb && mapping->host->i_sb &&
+	    mapping->host->i_sb->s_bdev)
+		blk_unplug(bdev_get_queue(mapping->host->i_sb->s_bdev));
+
 	pagevec_init(&pvec, 0);
 	index = start;
 	while ((index <= end) &&