diff mbox

block_write_full_page: switch synchronous writes to use WRITE_SYNC_PLUG

Message ID 20090408080844.GW5178@kernel.dk
State Not Applicable, archived
Headers show

Commit Message

Jens Axboe April 8, 2009, 8:08 a.m. UTC
On Tue, Apr 07 2009, Andrew Morton wrote:
> On Tue, 7 Apr 2009 18:19:33 -0400
> Theodore Tso <tytso@mit.edu> wrote:
> 
> > Now that we have a distinction between WRITE_SYNC and WRITE_SYNC_PLUG,
> > use WRITE_SYNC_PLUG in __block_write_full_page() to avoid unplugging
> > the block device I/O queue between each page that gets flushed out.
> > 
> > The upstream callers of block_write_full_page() which wait for the
> > writes to finish call wait_on_buffer(), wait_on_writeback_range()
> > (which ultimately calls sync_page(), which calls
> > blk_run_backing_dev(), which will unplug the device queue), and so on.
> > 
> 
> <sob>
> 
> > 
> > We should get this applied to avoid any performance regressions
> > resulting from commit a64c8610.
> > 
> >  fs/buffer.c |    3 ++-
> >  1 files changed, 2 insertions(+), 1 deletions(-)
> > 
> > diff --git a/fs/buffer.c b/fs/buffer.c
> > index 977e12a..95b5390 100644
> > --- a/fs/buffer.c
> > +++ b/fs/buffer.c
> > @@ -1646,7 +1646,8 @@ static int __block_write_full_page(struct inode *inode, struct page *page,
> >  	struct buffer_head *bh, *head;
> >  	const unsigned blocksize = 1 << inode->i_blkbits;
> >  	int nr_underway = 0;
> > -	int write_op = (wbc->sync_mode == WB_SYNC_ALL ? WRITE_SYNC : WRITE);
> > +	int write_op = (wbc->sync_mode == WB_SYNC_ALL ?
> > +			WRITE_SYNC_PLUG : WRITE);
> >  
> >  	BUG_ON(!PageLocked(page));
> 
> So how does WRITE_SYNC_PLUG differ from WRITE, and what effect does
> this change have upon kernel behaviour?

How about something like this. Comments welcome. Should we move this to
a dedicated header file? fs.h is amazingly cluttered as it is.

Comments

Andrew Morton April 8, 2009, 10:34 p.m. UTC | #1
On Wed, 8 Apr 2009 10:08:44 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:

> > So how does WRITE_SYNC_PLUG differ from WRITE, and what effect does
> > this change have upon kernel behaviour?
> 
> How about something like this. Comments welcome.

It's lovely.

> Should we move this to
> a dedicated header file? fs.h is amazingly cluttered as it is.

Sometime, perhaps.

> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 562d285..6b6597a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -87,6 +87,57 @@ struct inodes_stat_t {
>   */
>  #define FMODE_NOCMTIME		((__force fmode_t)2048)
>  
> +/*
> + * The below are the various read and write types that we support. Some of
> + * them include behavioral modifiers that send information down to the
> + * block layer and IO scheduler. Terminology:
> + *
> + *	The block layer uses device plugging to defer IO a little bit, in
> + *	the hope that we will see more IO very shortly. This increases
> + *	coalescing of adjacent IO and thus reduces the number of IOs we
> + *	have to send to the device. It also allows for better queuing,
> + *	if the IO isn't mergeable. If the caller is going to be waiting
> + *	for the IO, then he must ensure that the device is unplugged so
> + *	that the IO is dispatched to the driver.
> + *
> + *	All IO is handled async in Linux. This is fine for background
> + *	writes, but for reads or writes that someone waits for completion
> + *	on, we want to notify the block layer and IO scheduler so that they
> + *	know about it. That allows them to make better scheduling
> + *	decisions. So when the below references 'sync' and 'async', it
> + *	is referencing this priority hint.
> + *
> + * With that in mind, the available types are:
> + *
> + * READ			A normal read operation. Device will be plugged.
> + * READ_SYNC		A synchronous read. Device is not plugged, caller can
> + *			immediately wait on this read without caring about
> + *			unplugging.
> + * READA		Used for read-ahead operations. Lower priority, and the
> + *			 block layer could (in theory) choose to ignore this
> + *			request if it runs into resource problems.
> + * WRITE		A normal async write. Device will be plugged.
> + * SWRITE		Like WRITE, but a special case for ll_rw_block() that
> + *			tells it to lock the buffer first. Normally a buffer
> + *			must be locked before doing IO.
> + * WRITE_SYNC_PLUG	Synchronous write. Identical to WRITE, but passes down
> + *			the hint that someone will be waiting on this IO
> + *			shortly.

From the text, I'd expect WRITE_SYNC_PLUG to, err, unplug!

> + * WRITE_SYNC		Like WRITE_SYNC_PLUG, but also unplugs the device
> + *			immediately after submission. The write equivalent
> + *			of READ_SYNC.

But this contradicts my expectation.

So what does WRITE_SYNC_PLUG really do dofferent from WRITE?

> + * WRITE_ODIRECT	Special case write for O_DIRECT only.
> + * SWRITE_SYNC
> + * SWRITE_SYNC_PLUG	Like WRITE_SYNC/WRITE_SYNC_PLUG, but locks the buffer.
> + *			See SWRITE.
> + * WRITE_BARRIER	Like WRITE, but tells the block layer that all
> + *			previously submitted writes must be safely on storage
> + *			before this one is started. Also guarantees that when
> + *			this write is complete, it itself is also safely on
> + *			storage. Prevents reordering of writes on both sides
> + *			of this IO.
> + *
> + */
>  #define RW_MASK		1
>  #define RWA_MASK	2
>  #define READ 0
> @@ -102,6 +153,11 @@ struct inodes_stat_t {
>  			(SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
>  #define SWRITE_SYNC	(SWRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))
>  #define WRITE_BARRIER	(WRITE | (1 << BIO_RW_BARRIER))
> +
> +/*
> + * These aren't really reads or writes, they pass down information about
> + * parts of device that are now unused by the file system.
> + */
>  #define DISCARD_NOBARRIER (1 << BIO_RW_DISCARD)
>  #define DISCARD_BARRIER ((1 << BIO_RW_DISCARD) | (1 << BIO_RW_BARRIER))

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jens Axboe April 9, 2009, 5:59 p.m. UTC | #2
On Wed, Apr 08 2009, Andrew Morton wrote:
> On Wed, 8 Apr 2009 10:08:44 +0200 Jens Axboe <jens.axboe@oracle.com> wrote:
> 
> > > So how does WRITE_SYNC_PLUG differ from WRITE, and what effect does
> > > this change have upon kernel behaviour?
> > 
> > How about something like this. Comments welcome.
> 
> It's lovely.
> 
> > Should we move this to
> > a dedicated header file? fs.h is amazingly cluttered as it is.
> 
> Sometime, perhaps.
> 
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 562d285..6b6597a 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -87,6 +87,57 @@ struct inodes_stat_t {
> >   */
> >  #define FMODE_NOCMTIME		((__force fmode_t)2048)
> >  
> > +/*
> > + * The below are the various read and write types that we support. Some of
> > + * them include behavioral modifiers that send information down to the
> > + * block layer and IO scheduler. Terminology:
> > + *
> > + *	The block layer uses device plugging to defer IO a little bit, in
> > + *	the hope that we will see more IO very shortly. This increases
> > + *	coalescing of adjacent IO and thus reduces the number of IOs we
> > + *	have to send to the device. It also allows for better queuing,
> > + *	if the IO isn't mergeable. If the caller is going to be waiting
> > + *	for the IO, then he must ensure that the device is unplugged so
> > + *	that the IO is dispatched to the driver.
> > + *
> > + *	All IO is handled async in Linux. This is fine for background
> > + *	writes, but for reads or writes that someone waits for completion
> > + *	on, we want to notify the block layer and IO scheduler so that they
> > + *	know about it. That allows them to make better scheduling
> > + *	decisions. So when the below references 'sync' and 'async', it
> > + *	is referencing this priority hint.
> > + *
> > + * With that in mind, the available types are:
> > + *
> > + * READ			A normal read operation. Device will be plugged.
> > + * READ_SYNC		A synchronous read. Device is not plugged, caller can
> > + *			immediately wait on this read without caring about
> > + *			unplugging.
> > + * READA		Used for read-ahead operations. Lower priority, and the
> > + *			 block layer could (in theory) choose to ignore this
> > + *			request if it runs into resource problems.
> > + * WRITE		A normal async write. Device will be plugged.
> > + * SWRITE		Like WRITE, but a special case for ll_rw_block() that
> > + *			tells it to lock the buffer first. Normally a buffer
> > + *			must be locked before doing IO.
> > + * WRITE_SYNC_PLUG	Synchronous write. Identical to WRITE, but passes down
> > + *			the hint that someone will be waiting on this IO
> > + *			shortly.
> 
> From the text, I'd expect WRITE_SYNC_PLUG to, err, unplug!

You are still mixing up the sync hint and unplugging. I'll expand it a
bit, I guess.

> > + * WRITE_SYNC		Like WRITE_SYNC_PLUG, but also unplugs the device
> > + *			immediately after submission. The write equivalent
> > + *			of READ_SYNC.
> 
> But this contradicts my expectation.
> 
> So what does WRITE_SYNC_PLUG really do dofferent from WRITE?

It tells the IO scheduler that this write is really sync, not async.
The key difference between the two is as written - the sync one will
have someone waiting for its completion. What the IO scheduler does with
the flag is its own business. In reality it means that the write is
expedited, whereas async writes are done somewhat more lazily.

> > + * WRITE_ODIRECT	Special case write for O_DIRECT only.
> > + * SWRITE_SYNC
> > + * SWRITE_SYNC_PLUG	Like WRITE_SYNC/WRITE_SYNC_PLUG, but locks the buffer.
> > + *			See SWRITE.
> > + * WRITE_BARRIER	Like WRITE, but tells the block layer that all
> > + *			previously submitted writes must be safely on storage
> > + *			before this one is started. Also guarantees that when
> > + *			this write is complete, it itself is also safely on
> > + *			storage. Prevents reordering of writes on both sides
> > + *			of this IO.
> > + *
> > + */
> >  #define RW_MASK		1
> >  #define RWA_MASK	2
> >  #define READ 0
> > @@ -102,6 +153,11 @@ struct inodes_stat_t {
> >  			(SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
> >  #define SWRITE_SYNC	(SWRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))
> >  #define WRITE_BARRIER	(WRITE | (1 << BIO_RW_BARRIER))
> > +
> > +/*
> > + * These aren't really reads or writes, they pass down information about
> > + * parts of device that are now unused by the file system.
> > + */
> >  #define DISCARD_NOBARRIER (1 << BIO_RW_DISCARD)
> >  #define DISCARD_BARRIER ((1 << BIO_RW_DISCARD) | (1 << BIO_RW_BARRIER))
>
diff mbox

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 562d285..6b6597a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -87,6 +87,57 @@  struct inodes_stat_t {
  */
 #define FMODE_NOCMTIME		((__force fmode_t)2048)
 
+/*
+ * The below are the various read and write types that we support. Some of
+ * them include behavioral modifiers that send information down to the
+ * block layer and IO scheduler. Terminology:
+ *
+ *	The block layer uses device plugging to defer IO a little bit, in
+ *	the hope that we will see more IO very shortly. This increases
+ *	coalescing of adjacent IO and thus reduces the number of IOs we
+ *	have to send to the device. It also allows for better queuing,
+ *	if the IO isn't mergeable. If the caller is going to be waiting
+ *	for the IO, then he must ensure that the device is unplugged so
+ *	that the IO is dispatched to the driver.
+ *
+ *	All IO is handled async in Linux. This is fine for background
+ *	writes, but for reads or writes that someone waits for completion
+ *	on, we want to notify the block layer and IO scheduler so that they
+ *	know about it. That allows them to make better scheduling
+ *	decisions. So when the below references 'sync' and 'async', it
+ *	is referencing this priority hint.
+ *
+ * With that in mind, the available types are:
+ *
+ * READ			A normal read operation. Device will be plugged.
+ * READ_SYNC		A synchronous read. Device is not plugged, caller can
+ *			immediately wait on this read without caring about
+ *			unplugging.
+ * READA		Used for read-ahead operations. Lower priority, and the
+ *			 block layer could (in theory) choose to ignore this
+ *			request if it runs into resource problems.
+ * WRITE		A normal async write. Device will be plugged.
+ * SWRITE		Like WRITE, but a special case for ll_rw_block() that
+ *			tells it to lock the buffer first. Normally a buffer
+ *			must be locked before doing IO.
+ * WRITE_SYNC_PLUG	Synchronous write. Identical to WRITE, but passes down
+ *			the hint that someone will be waiting on this IO
+ *			shortly.
+ * WRITE_SYNC		Like WRITE_SYNC_PLUG, but also unplugs the device
+ *			immediately after submission. The write equivalent
+ *			of READ_SYNC.
+ * WRITE_ODIRECT	Special case write for O_DIRECT only.
+ * SWRITE_SYNC
+ * SWRITE_SYNC_PLUG	Like WRITE_SYNC/WRITE_SYNC_PLUG, but locks the buffer.
+ *			See SWRITE.
+ * WRITE_BARRIER	Like WRITE, but tells the block layer that all
+ *			previously submitted writes must be safely on storage
+ *			before this one is started. Also guarantees that when
+ *			this write is complete, it itself is also safely on
+ *			storage. Prevents reordering of writes on both sides
+ *			of this IO.
+ *
+ */
 #define RW_MASK		1
 #define RWA_MASK	2
 #define READ 0
@@ -102,6 +153,11 @@  struct inodes_stat_t {
 			(SWRITE | (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_NOIDLE))
 #define SWRITE_SYNC	(SWRITE_SYNC_PLUG | (1 << BIO_RW_UNPLUG))
 #define WRITE_BARRIER	(WRITE | (1 << BIO_RW_BARRIER))
+
+/*
+ * These aren't really reads or writes, they pass down information about
+ * parts of device that are now unused by the file system.
+ */
 #define DISCARD_NOBARRIER (1 << BIO_RW_DISCARD)
 #define DISCARD_BARRIER ((1 << BIO_RW_DISCARD) | (1 << BIO_RW_BARRIER))