Message ID | 20101012141455.GA27572@lst.de |
---|---|
State | New, archived |
Headers | show |
On Tue, Oct 12, 2010 at 04:14:55PM +0200, Christoph Hellwig wrote: > I still think adding code to every filesystem to optimize for a rather > stupid use case is not a good idea. I dropped out a bit from the > thread in the middle, but what was the real use case for lots of > concurrent fsyncs on the same inode again? The use case I'm looking at is concurrent fsyncs on /different/ inodes, actually. We have _n_ different processes, each writing (and fsyncing) its own separate file on the same filesystem. iirc, ext4_sync_file is called with the inode mutex held, which prevents concurrent fsyncs on the same inode. > And what is the amount of performance you need? If we go back to the > direct submission of REQ_FLUSH request from the earlier flush+fua > setups that were faster or high end storage, would that be enough for > you? > > Below is a patch brining the optimization back. > > WARNING: completely untested! So I hacked up a patch to the block layer that collects measurements of the time delay between blk_start_request and blk_finish_request when a flush command is encountered, and what I noticed was that there's a rather large discrepancy between the delay as observed by the block layer and the delay as observed by ext4. In general, the discrepancy is a nearly 2x increase between what the block layer sees and what ext4 sees, so I'll give Christoph's direct-flush patch (below) a try over the weekend. --D -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Oct 15, 2010 at 04:39:04PM -0700, Darrick J. Wong wrote: > On Tue, Oct 12, 2010 at 04:14:55PM +0200, Christoph Hellwig wrote: > > I still think adding code to every filesystem to optimize for a rather > > stupid use case is not a good idea. I dropped out a bit from the > > thread in the middle, but what was the real use case for lots of > > concurrent fsyncs on the same inode again? > > The use case I'm looking at is concurrent fsyncs on /different/ inodes, > actually. We have _n_ different processes, each writing (and fsyncing) its own > separate file on the same filesystem. > > iirc, ext4_sync_file is called with the inode mutex held, which prevents > concurrent fsyncs on the same inode. Indeed. Although we could drop it at least for the cache flush call. We already do this for block devices. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Oct 16, 2010 at 01:40:41AM +0200, Christoph Hellwig wrote: > On Fri, Oct 15, 2010 at 04:39:04PM -0700, Darrick J. Wong wrote: > > On Tue, Oct 12, 2010 at 04:14:55PM +0200, Christoph Hellwig wrote: > > > I still think adding code to every filesystem to optimize for a rather > > > stupid use case is not a good idea. I dropped out a bit from the > > > thread in the middle, but what was the real use case for lots of > > > concurrent fsyncs on the same inode again? > > > > The use case I'm looking at is concurrent fsyncs on /different/ inodes, > > actually. We have _n_ different processes, each writing (and fsyncing) its own > > separate file on the same filesystem. > > > > iirc, ext4_sync_file is called with the inode mutex held, which prevents > > concurrent fsyncs on the same inode. > > Indeed. Although we could drop it at least for the cache flush > call. We already do this for block devices. <nod> Unfortunately, the patch immediately triggers the BUG at drivers/scsi/scsi_lib.c:1064: /* * BLOCK_PC requests may transfer data, in which case they must * a bio attached to them. Or they might contain a SCSI command * that does not transfer data, in which case they may optionally * submit a request without an attached bio. */ if (req->bio) { int ret; BUG_ON(!req->nr_phys_segments); --D -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-2.6/block/blk-flush.c =================================================================== --- linux-2.6.orig/block/blk-flush.c 2010-10-12 10:08:43.777004514 -0400 +++ linux-2.6/block/blk-flush.c 2010-10-12 10:10:37.547016093 -0400 @@ -143,6 +143,17 @@ struct request *blk_do_flush(struct requ unsigned skip = 0; /* + * Just issue pure flushes directly. + */ + if (!blk_rq_sectors(rq)) { + if (!do_preflush) { + __blk_end_request_all(rq, 0); + return NULL; + } + return rq; + } + + /* * Special case. If there's data but flush is not necessary, * the request can be issued directly. *