Patchwork Performance testing of various barrier reduction patches [was: Re: [RFC v4] ext4: Coordinate fsync requests]

login
register
mail settings
Submitter Christoph Hellwig
Date Oct. 12, 2010, 2:14 p.m.
Message ID <20101012141455.GA27572@lst.de>
Download mbox | patch
Permalink /patch/67592/
State New
Headers show

Comments

Christoph Hellwig - Oct. 12, 2010, 2:14 p.m.
I still think adding code to every filesystem to optimize for a rather
stupid use case is not a good idea.  I dropped out a bit from the
thread in the middle, but what was the real use case for lots of
concurrent fsyncs on the same inode again?

And what is the amount of performance you need?  If we go back to the
direct submission of REQ_FLUSH request from the earlier flush+fua
setups that were faster or high end storage, would that be enough for
you?

Below is a patch brining the optimization back.

	WARNING: completely untested!


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong - Oct. 15, 2010, 11:39 p.m.
On Tue, Oct 12, 2010 at 04:14:55PM +0200, Christoph Hellwig wrote:
> I still think adding code to every filesystem to optimize for a rather
> stupid use case is not a good idea.  I dropped out a bit from the
> thread in the middle, but what was the real use case for lots of
> concurrent fsyncs on the same inode again?

The use case I'm looking at is concurrent fsyncs on /different/ inodes,
actually.  We have _n_ different processes, each writing (and fsyncing) its own
separate file on the same filesystem.

iirc, ext4_sync_file is called with the inode mutex held, which prevents
concurrent fsyncs on the same inode.

> And what is the amount of performance you need?  If we go back to the
> direct submission of REQ_FLUSH request from the earlier flush+fua
> setups that were faster or high end storage, would that be enough for
> you?
> 
> Below is a patch brining the optimization back.
> 
> 	WARNING: completely untested!

So I hacked up a patch to the block layer that collects measurements of the
time delay between blk_start_request and blk_finish_request when a flush
command is encountered, and what I noticed was that there's a rather large
discrepancy between the delay as observed by the block layer and the delay as
observed by ext4.  In general, the discrepancy is a nearly 2x increase between
what the block layer sees and what ext4 sees, so I'll give Christoph's
direct-flush patch (below) a try over the weekend.

--D
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig - Oct. 15, 2010, 11:40 p.m.
On Fri, Oct 15, 2010 at 04:39:04PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 12, 2010 at 04:14:55PM +0200, Christoph Hellwig wrote:
> > I still think adding code to every filesystem to optimize for a rather
> > stupid use case is not a good idea.  I dropped out a bit from the
> > thread in the middle, but what was the real use case for lots of
> > concurrent fsyncs on the same inode again?
> 
> The use case I'm looking at is concurrent fsyncs on /different/ inodes,
> actually.  We have _n_ different processes, each writing (and fsyncing) its own
> separate file on the same filesystem.
> 
> iirc, ext4_sync_file is called with the inode mutex held, which prevents
> concurrent fsyncs on the same inode.

Indeed.  Although we could drop it at least for the cache flush
call.  We already do this for block devices.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Darrick J. Wong - Oct. 16, 2010, 12:02 a.m.
On Sat, Oct 16, 2010 at 01:40:41AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 15, 2010 at 04:39:04PM -0700, Darrick J. Wong wrote:
> > On Tue, Oct 12, 2010 at 04:14:55PM +0200, Christoph Hellwig wrote:
> > > I still think adding code to every filesystem to optimize for a rather
> > > stupid use case is not a good idea.  I dropped out a bit from the
> > > thread in the middle, but what was the real use case for lots of
> > > concurrent fsyncs on the same inode again?
> > 
> > The use case I'm looking at is concurrent fsyncs on /different/ inodes,
> > actually.  We have _n_ different processes, each writing (and fsyncing) its own
> > separate file on the same filesystem.
> > 
> > iirc, ext4_sync_file is called with the inode mutex held, which prevents
> > concurrent fsyncs on the same inode.
> 
> Indeed.  Although we could drop it at least for the cache flush
> call.  We already do this for block devices.

<nod>

Unfortunately, the patch immediately triggers the BUG at
drivers/scsi/scsi_lib.c:1064:
 /*
  * BLOCK_PC requests may transfer data, in which case they must
  * a bio attached to them.  Or they might contain a SCSI command
  * that does not transfer data, in which case they may optionally
  * submit a request without an attached bio.
  */
 if (req->bio) {
         int ret;

         BUG_ON(!req->nr_phys_segments);

--D
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

Index: linux-2.6/block/blk-flush.c
===================================================================
--- linux-2.6.orig/block/blk-flush.c	2010-10-12 10:08:43.777004514 -0400
+++ linux-2.6/block/blk-flush.c	2010-10-12 10:10:37.547016093 -0400
@@ -143,6 +143,17 @@  struct request *blk_do_flush(struct requ
 	unsigned skip = 0;
 
 	/*
+	 * Just issue pure flushes directly.
+	 */
+	if (!blk_rq_sectors(rq)) {
+		if (!do_preflush) {
+			__blk_end_request_all(rq, 0);
+			return NULL;
+		}
+		return rq;
+	}
+
+	/*
 	 * Special case.  If there's data but flush is not necessary,
 	 * the request can be issued directly.
 	 *