block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG

Message ID 20090407221933.GB7031@mit.edu
State Not Applicable, archived

Commit Message

Theodore Ts'o April 7, 2009, 10:19 p.m. UTC
Now that we have a distinction between WRITE_SYNC and WRITE_SYNC_PLUG,
use WRITE_SYNC_PLUG in __block_write_full_page() to avoid unplugging
the block device I/O queue between each page that gets flushed out.

The upstream callers of block_write_full_page() which wait for the
writes to finish call wait_on_buffer(), wait_on_writeback_range()
(which ultimately calls sync_page(), which calls
blk_run_backing_dev(), which will unplug the device queue), and so on.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
---

We should get this applied to avoid any performance regressions
resulting from commit a64c8610.

 fs/buffer.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

Comments

Andrew Morton April 7, 2009, 11:09 p.m. UTC | #1
On Tue, 7 Apr 2009 18:19:33 -0400
Theodore Tso <tytso@mit.edu> wrote:

> Now that we have a distinction between WRITE_SYNC and WRITE_SYNC_PLUG,
> use WRITE_SYNC_PLUG in __block_write_full_page() to avoid unplugging
> the block device I/O queue between each page that gets flushed out.
> 
> The upstream callers of block_write_full_page() which wait for the
> writes to finish call wait_on_buffer(), wait_on_writeback_range()
> (which ultimately calls sync_page(), which calls
> blk_run_backing_dev(), which will unplug the device queue), and so on.
> 

<sob>

> 
> We should get this applied to avoid any performance regressions
> resulting from commit a64c8610.
> 
>  fs/buffer.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 977e12a..95b5390 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1646,7 +1646,8 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
>  	struct buffer_head *bh, *head;
>  	const unsigned blocksize = 1 << inode->i_blkbits;
>  	int nr_underway = 0;
> -	int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
> +	int write_op = (wbc->sync_mode == WB_SYNC_ALL ?
> +			WRITE_SYNC_PLUG : WRITE);
>  
>  	BUG_ON(!PageLocked(page));

So how does WRITE_SYNC_PLUG differ from WRITE, and what effect does
this change have upon kernel behaviour?


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o April 7, 2009, 11:46 p.m. UTC | #2
On Tue, Apr 07, 2009 at 04:09:44PM -0700, Andrew Morton wrote:
> > 
> > The upstream callers of block_write_full_page() which wait for the
> > writes to finish call wait_on_buffer(), wait_on_writeback_range()
> > (which ultimately calls sync_page(), which calls
> > blk_run_backing_dev(), which will unplug the device queue), and so on.
> 
> <sob>

No question, this stuff needs to be better documented; the codepaths
involved are scattered between files in the block/, fs/, and mm/
directories, and it's not well documented at *all* what a filesystem
developer is supposed to do.

> >  	const unsigned blocksize = 1 << inode->i_blkbits;
> >  	int nr_underway = 0;
> > -	int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
> > +	int write_op = (wbc->sync_mode == WB_SYNC_ALL ?
> > +			WRITE_SYNC_PLUG : WRITE);
> >  
> >  	BUG_ON(!PageLocked(page));
> 
> So how does WRITE_SYNC_PLUG differ from WRITE, and what effect does
> this change have upon kernel behaviour?

The difference between WRITE_SYNC_PLUG and WRITE is that, from the
perspective of the I/O scheduler, WRITE_SYNC_PLUG requests are treated
as "synchronous" operations.  Some I/O schedulers, such as AS and CFQ,
prioritize synchronous writes and put them in the same bucket as
synchronous reads, and above asynchronous writes.

Currently, when wbc->sync_mode is WB_SYNC_ALL, we use WRITE_SYNC, which
has an implicit unplug.  WRITE_SYNC_PLUG removes the implicit unplug,
which was the issue you had expressed concern about.

	      	      	    	     	 - Ted
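
As a rough illustration of the distinction described in this reply, the
three write types can be modelled as combinations of two properties:
"the I/O scheduler treats it as synchronous" and "submission implicitly
unplugs the queue".  This is only a sketch; the SK_* names, the helper
functions, and the bit positions are invented for the example and are
not the kernel's actual definitions.

```c
#include <stdbool.h>

/*
 * Hypothetical flag bits, chosen only for this sketch.  The real
 * values live in the kernel headers and are not reproduced here.
 */
enum {
	SK_WRITE  = 1 << 0,	/* request is a write */
	SK_SYNCIO = 1 << 1,	/* scheduler treats it as synchronous */
	SK_UNPLUG = 1 << 2,	/* submission implicitly unplugs the queue */
};

/* Plain asynchronous write: neither synchronous nor unplugging. */
#define SK_OP_WRITE		(SK_WRITE)
/* Synchronous priority, but the queue stays plugged. */
#define SK_OP_WRITE_SYNC_PLUG	(SK_WRITE | SK_SYNCIO)
/* Synchronous priority plus an implicit unplug on submission. */
#define SK_OP_WRITE_SYNC	(SK_OP_WRITE_SYNC_PLUG | SK_UNPLUG)

/* True if the scheduler would bucket this op with synchronous I/O. */
bool sk_op_is_sync(int op)
{
	return (op & SK_SYNCIO) != 0;
}

/* True if submitting this op unplugs the queue by itself. */
bool sk_op_unplugs(int op)
{
	return (op & SK_UNPLUG) != 0;
}
```

In this model, moving from SK_OP_WRITE_SYNC to SK_OP_WRITE_SYNC_PLUG
keeps the synchronous priority and drops only the implicit unplug,
which is the change the patch makes for the WB_SYNC_ALL case.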
Jens Axboe April 8, 2009, 6 a.m. UTC | #3
On Tue, Apr 07 2009, Theodore Tso wrote:
> Now that we have a distinction between WRITE_SYNC and WRITE_SYNC_PLUG,
> use WRITE_SYNC_PLUG in __block_write_full_page() to avoid unplugging
> the block device I/O queue between each page that gets flushed out.
> 
> The upstream callers of block_write_full_page() which wait for the
> writes to finish call wait_on_buffer(), wait_on_writeback_range()
> (which ultimately calls sync_page(), which calls
> blk_run_backing_dev(), which will unplug the device queue), and so on.
> 
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> ---
> 
> We should get this applied to avoid any performance regressions
> resulting from commit a64c8610.
> 
>  fs/buffer.c |    3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 977e12a..95b5390 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -1646,7 +1646,8 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
>  	struct buffer_head *bh, *head;
>  	const unsigned blocksize = 1 << inode->i_blkbits;
>  	int nr_underway = 0;
> -	int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
> +	int write_op = (wbc->sync_mode == WB_SYNC_ALL ?
> +			WRITE_SYNC_PLUG : WRITE);
>  
>  	BUG_ON(!PageLocked(page));

I think you should comment on why we don't need to do the actual unplug.
See what I added in fs/jbd/commit.c:journal_commit_transaction():

        /*
         * Use plugged writes here, since we want to submit several
         * before we unplug the device. We don't do explicit
         * unplugging in here, instead we rely on sync_buffer() doing
         * the unplug for us.
         */
Theodore Ts'o April 8, 2009, 3:26 p.m. UTC | #4
On Wed, Apr 08, 2009 at 08:00:33AM +0200, Jens Axboe wrote:
> 
> I think you should comment on why we don't need to do the actual unplug.
> See what I added in fs/jbd/commit.c:journal_commit_transaction():
> 
>         /*
>          * Use plugged writes here, since we want to submit several
>          * before we unplug the device. We don't do explicit
>          * unplugging in here, instead we rely on sync_buffer() doing
>          * the unplug for us.
>          */

OK, agreed.  I'll add a comment explaining what is going on in the
patch; better there than in the commit log.

					- Ted
Patch

diff --git a/fs/buffer.c b/fs/buffer.c
index 977e12a..95b5390 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1646,7 +1646,8 @@  static int __block_write_full_page(struct inode *inode, struct page *page,
 	struct buffer_head *bh, *head;
 	const unsigned blocksize = 1 << inode->i_blkbits;
 	int nr_underway = 0;
-	int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
+	int write_op = (wbc->sync_mode == WB_SYNC_ALL ?
+			WRITE_SYNC_PLUG : WRITE);
 
 	BUG_ON(!PageLocked(page));
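
The effect described in the commit message (no unplug between pages,
with the waiter doing the unplug) can be sketched with a toy user-space
model of a plugged queue.  Nothing below is kernel code: struct
sk_queue and the sk_* functions are hypothetical, and the model assumes
the only two unplug triggers are an implicit unplug at submission and a
single unplug when the caller waits for the writes.

```c
#include <stdbool.h>

/* Toy model of a plugged request queue (hypothetical, user-space only). */
struct sk_queue {
	int pending;	/* requests queued but not yet dispatched */
	int dispatches;	/* number of batches handed to the device */
};

/* Unplug: hand every pending request to the device as one batch. */
void sk_unplug(struct sk_queue *q)
{
	if (q->pending > 0) {
		q->pending = 0;
		q->dispatches++;
	}
}

/* Queue one page; unplug right away only if the write op asks for it. */
void sk_submit_page(struct sk_queue *q, bool implicit_unplug)
{
	q->pending++;
	if (implicit_unplug)
		sk_unplug(q);
}

/*
 * Flush nr_pages pages and then wait for them.  The wait is modelled
 * as one unplug, standing in for the unplug the thread attributes to
 * the wait path.  Returns how many batches reached the device.
 */
int sk_flush_pages(int nr_pages, bool implicit_unplug)
{
	struct sk_queue q = { 0, 0 };
	int i;

	for (i = 0; i < nr_pages; i++)
		sk_submit_page(&q, implicit_unplug);
	sk_unplug(&q);	/* the waiter unplugs the queue */
	return q.dispatches;
}
```

Calling sk_flush_pages(8, true) models the old WRITE_SYNC behaviour and
dispatches one batch per page, while sk_flush_pages(8, false) models
WRITE_SYNC_PLUG and dispatches a single batch once the caller waits.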