
buffered writeback torture program

Message ID 20110421083258.GA26784@infradead.org
State New, archived

Commit Message

Christoph Hellwig April 21, 2011, 8:32 a.m. UTC
On Wed, Apr 20, 2011 at 02:23:29PM -0400, Chris Mason wrote:
> # fsync-tester
> setting up random write file
> done setting up random write file
> starting fsync run
> starting random io!
> write time 0.0009s fsync time: 2.0142s
> write time 128.9305s fsync time: 2.6046s
> run done 2 fsyncs total, killing random writer
> 
> In this case the 128s spent in write was on a single 4K overwrite on a
> 4K file.
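
For context, the workload under discussion pairs a timed 4K overwrite-plus-fsync loop with a background process doing random buffered writes to a large file on the same filesystem. The following is a minimal sketch of that kind of workload, not the actual fsync-tester source; file names and sizes here are made up for illustration:

/*
 * Sketch of an fsync-tester-style workload (illustrative only).
 * One process times a single 4K overwrite plus fsync() while a
 * forked child floods the filesystem with random buffered writes.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

#define BUF_SIZE   4096
#define RAND_PAGES (2 * 1024 * 1024)	/* 8GB worth of 4K pages */

static double now(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void)
{
	char buf[BUF_SIZE];
	memset(buf, 'a', sizeof(buf));

	pid_t writer = fork();
	if (writer == 0) {
		/* random writer: dirty 4K pages all over a big file */
		int fd = open("random-file", O_WRONLY | O_CREAT, 0644);
		for (;;) {
			off_t off = (off_t)(rand() % RAND_PAGES) * BUF_SIZE;
			pwrite(fd, buf, BUF_SIZE, off);
		}
	}

	/* timed loop: one 4K overwrite and fsync on a small file */
	int fd = open("fsync-file", O_WRONLY | O_CREAT, 0644);
	for (int i = 0; i < 10; i++) {
		double t0 = now();
		pwrite(fd, buf, BUF_SIZE, 0);
		double t1 = now();
		fsync(fd);
		printf("write time %.4fs fsync time: %.4fs\n",
		       t1 - t0, now() - t1);
	}
	kill(writer, SIGKILL);		/* "killing random writer" */
	wait(NULL);
	return 0;
}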

I can't really reproduce this locally on XFS:

setting up random write file
done setting up random write file
starting fsync run
starting random io!
write time: 0.0023s fsync time: 0.5949s
write time: 0.0605s fsync time: 0.2339s
write time: 0.0018s fsync time: 0.0179s
write time: 0.0020s fsync time: 0.0201s
write time: 0.0019s fsync time: 0.0176s
write time: 0.0018s fsync time: 0.0209s
write time: 0.0025s fsync time: 0.0197s
write time: 0.0013s fsync time: 0.0183s
write time: 0.0013s fsync time: 0.0217s
write time: 0.0016s fsync time: 0.0158s
write time: 0.0022s fsync time: 0.0240s
write time: 0.0024s fsync time: 0.0190s
write time: 0.0017s fsync time: 0.0205s
write time: 0.0030s fsync time: 0.0688s
write time: 0.0045s fsync time: 0.0193s
write time: 0.0022s fsync time: 0.0356s

But given that you are able to reproduce it, does the following patch
help your latencies?  Currently XFS actually does stop I/O when
nr_to_write reaches zero, but only for non-blocking I/O. This behaviour
was introduced in commit efceab1d563153a2b1a6e7d35376241a48126989

	"xfs: handle negative wbc->nr_to_write during sync writeback"

and works around issues in the generic writeback code.
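
For reference, the check in question mirrors the generic writeback loop: write_cache_pages() in mm/page-writeback.c decrements wbc->nr_to_write for each page written and stops early only for WB_SYNC_NONE writeback, since data-integrity writeback must not leave dirty pages behind. A toy model of just that termination rule (heavily simplified, not the kernel code):

/*
 * Toy model of the nr_to_write accounting in the generic
 * write_cache_pages() loop (heavily simplified, not kernel code).
 */
#include <stdio.h>

enum sync_mode { WB_SYNC_NONE, WB_SYNC_ALL };

struct writeback_control {
	long nr_to_write;		/* page quota for this pass */
	enum sync_mode sync_mode;
};

/* Write out up to ndirty pages, honouring the quota rule. */
static long writeback_pass(struct writeback_control *wbc, long ndirty)
{
	long written = 0;

	while (ndirty-- > 0) {
		written++;		/* stands in for ->writepage */

		/*
		 * Non-blocking writeback stops once the quota is used
		 * up; sync writeback keeps going so that no dirty
		 * page is left behind.
		 */
		if (--wbc->nr_to_write <= 0 &&
		    wbc->sync_mode == WB_SYNC_NONE)
			break;
	}
	return written;
}

int main(void)
{
	struct writeback_control async = { 1024, WB_SYNC_NONE };
	struct writeback_control sync  = { 1024, WB_SYNC_ALL };

	printf("async: %ld pages\n", writeback_pass(&async, 1 << 20));
	printf("sync:  %ld pages\n", writeback_pass(&sync,  1 << 20));
	return 0;
}

The posted patch makes the equivalent check in xfs_convert_page() honour the quota unconditionally, i.e. it drops the WB_SYNC_NONE condition.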



Comments

Chris Mason April 21, 2011, 5:34 p.m. UTC | #1
Excerpts from Christoph Hellwig's message of 2011-04-21 04:32:58 -0400:
> On Wed, Apr 20, 2011 at 02:23:29PM -0400, Chris Mason wrote:
> But given that you are able to reproduce it, does the following patch
> help your latencies?  Currently XFS actually does stop I/O when
> nr_to_write reaches zero, but only for non-blocking I/O. This behaviour
> was introduced in commit efceab1d563153a2b1a6e7d35376241a48126989
> 
>     "xfs: handle negative wbc->nr_to_write during sync writeback"
> 
> and works around issues in the generic writeback code.
> 
> 
> Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
> ===================================================================
> --- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c    2011-04-21 10:20:48.303550404 +0200
> +++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c    2011-04-21 10:20:58.203496773 +0200
> @@ -765,8 +765,7 @@ xfs_convert_page(
>          SetPageUptodate(page);
>  
>      if (count) {
> -        if (--wbc->nr_to_write <= 0 &&
> -            wbc->sync_mode == WB_SYNC_NONE)
> +        if (--wbc->nr_to_write <= 0)
>              done = 1;
>      }
>      xfs_start_page_writeback(page, !page_dirty, count);

Sorry, this doesn't do it.  I think that given what a strange special
case this is, we're best off waiting for the IO-less throttling, and
maybe changing the code in xfs/ext4 to be a little more seek aware.  Or
maybe not, it has to get written eventually either way.

-chris
Christoph Hellwig April 21, 2011, 5:41 p.m. UTC | #2
On Thu, Apr 21, 2011 at 01:34:44PM -0400, Chris Mason wrote:
> Sorry, this doesn't do it.  I think that given what a strange special
> case this is, we're best off waiting for the IO-less throttling, and
> maybe changing the code in xfs/ext4 to be a little more seek aware.  Or
> maybe not, it has to get written eventually either way.

I'm not sure what you mean by seek aware.  XFS only clusters
additional pages that are in the same extent, and in fact only does
so for asynchronous writeback.  Not sure how this should be more
seek aware.

Andreas Dilger April 21, 2011, 5:59 p.m. UTC | #3
On 2011-04-21, at 11:41 AM, Christoph Hellwig wrote:
> On Thu, Apr 21, 2011 at 01:34:44PM -0400, Chris Mason wrote:
>> Sorry, this doesn't do it.  I think that given what a strange special
>> case this is, we're best off waiting for the IO-less throttling, and
>> maybe changing the code in xfs/ext4 to be a little more seek aware.  Or
>> maybe not, it has to get written eventually either way.
> 
> I'm not sure what you mean by seek aware.  XFS only clusters
> additional pages that are in the same extent, and in fact only does
> so for asynchronous writeback.  Not sure how this should be more
> seek aware.

But doesn't XFS have potentially very large extents, especially in the case of files that were fallocate()'d or linearly written?  If there is a single 8GB extent, and then random writes within that extent (seems very database-like), grouping all of the writes in the extent doesn't seem so great.


Cheers, Andreas





Chris Mason April 21, 2011, 6 p.m. UTC | #4
Excerpts from Christoph Hellwig's message of 2011-04-21 13:41:21 -0400:
> On Thu, Apr 21, 2011 at 01:34:44PM -0400, Chris Mason wrote:
> > Sorry, this doesn't do it.  I think that given what a strange special
> > case this is, we're best off waiting for the IO-less throttling, and
> > maybe changing the code in xfs/ext4 to be a little more seek aware.  Or
> > maybe not, it has to get written eventually either way.
> 
> I'm not sure what you mean by seek aware.  XFS only clusters
> additional pages that are in the same extent, and in fact only does
> so for asynchronous writeback.  Not sure how this should be more
> seek aware.
> 

How big are the extents?  fiemap tells me the file has a single 8GB extent.

There's a little room for seeking inside there.

-chris
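
For reference, a file's extent layout can be inspected with filefrag -v, or directly through the FIEMAP ioctl; a minimal sketch (error handling mostly omitted):

/* Print the extent map of a file via the FIEMAP ioctl. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
	int fd = open(argv[1], O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	/* room for up to 32 extent records */
	struct fiemap *fm = calloc(1, sizeof(*fm) +
				   32 * sizeof(struct fiemap_extent));
	fm->fm_length = ~0ULL;			/* map the whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush dirty data first */
	fm->fm_extent_count = 32;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FS_IOC_FIEMAP");
		return 1;
	}

	for (unsigned i = 0; i < fm->fm_mapped_extents; i++)
		printf("extent %u: logical %llu len %llu\n", i,
		       (unsigned long long)fm->fm_extents[i].fe_logical,
		       (unsigned long long)fm->fm_extents[i].fe_length);
	free(fm);
	close(fd);
	return 0;
}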
Christoph Hellwig April 21, 2011, 6:02 p.m. UTC | #5
On Thu, Apr 21, 2011 at 11:59:37AM -0600, Andreas Dilger wrote:
> But doesn't XFS have potentially very large extents, especially in the case of files that were fallocate()'d or linearly written?  If there is a single 8GB extent, and then random writes within that extent (seems very database-like), grouping all of the writes in the extent doesn't seem so great.

It doesn't cluster any writes in an extent.  It only writes out
additional dirty pages directly following the one we were asked to
write out.  As soon as we hit a non-dirty page we give up.
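
A toy model of that rule (simplified; the real logic is in xfs_cluster_write()/xfs_convert_page() in fs/xfs/linux-2.6/xfs_aops.c):

/*
 * Toy model of XFS writeback clustering: starting from the page
 * ->writepage was called for, keep adding the pages immediately
 * after it to the same I/O while they are dirty and still inside
 * the current extent; the first clean page ends the cluster.
 */
#include <stdbool.h>
#include <stdio.h>

#define NPAGES 16

int main(void)
{
	/* dirty[i]: is page i of the file dirty? */
	bool dirty[NPAGES] = { 0, 1, 1, 1, 0, 1, 1, 0,
			       0, 0, 0, 0, 0, 0, 0, 0 };
	int extent_end = 12;	/* last page index inside the extent */
	int start = 1;		/* the page ->writepage was called for */
	int idx;

	for (idx = start; idx <= extent_end; idx++) {
		if (!dirty[idx])
			break;	/* first clean page stops clustering */
	}
	printf("submitting pages %d-%d as a single I/O\n", start, idx - 1);
	return 0;
}

So even inside an 8GB extent, a lone dirty page surrounded by clean ones still goes out as a single-page I/O.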

Chris Mason April 21, 2011, 6:02 p.m. UTC | #6
Excerpts from Christoph Hellwig's message of 2011-04-21 14:02:13 -0400:
> On Thu, Apr 21, 2011 at 11:59:37AM -0600, Andreas Dilger wrote:
> > But doesn't XFS have potentially very large extents, especially in the case of files that were fallocate()'d or linearly written?  If there is a single 8GB extent, and then random writes within that extent (seems very database-like), grouping all of the writes in the extent doesn't seem so great.
> 
> It doesn't cluster any writes in an extent.  It only writes out
> additional dirty pages directly following the one we were asked to
> write out.  As soon as we hit a non-dirty page we give up.

For this program, they are almost all dirty pages.

I tried patching it to give up if we seek but it is still pretty slow.
There's something else going on in addition to the xfs clustering being
too aggressive.

I'll try dropping the clustering completely.

-chris
Christoph Hellwig April 21, 2011, 6:08 p.m. UTC | #7
On Thu, Apr 21, 2011 at 02:02:43PM -0400, Chris Mason wrote:
> For this program, they are almost all dirty pages.
> 
> I tried patching it to give up if we seek but it is still pretty slow.
> There's something else going on in addition to the xfs clustering being
> too aggressive.

I'm not sure where you get this being too aggressive from - it's doing
exactly the same amount of I/O as a filesystem writing out a single
page from ->writepage or using write_cache_pages (either directly
or as a copy) as ->writepages.  The only thing special compared to
the no ->writepages case is that it's submitting a large I/O
from the first ->writepage call.

Chris Mason April 21, 2011, 6:29 p.m. UTC | #8
Excerpts from Christoph Hellwig's message of 2011-04-21 14:08:05 -0400:
> On Thu, Apr 21, 2011 at 02:02:43PM -0400, Chris Mason wrote:
> > For this program, they are almost all dirty pages.
> > 
> > I tried patching it to give up if we seek but it is still pretty slow.
> > There's something else going on in addition to the xfs clustering being
> > too aggressive.
> 
> I'm not sure where you get this being too aggressive from - it's doing
> exactly the same amount of I/O as a filesystem writing out a single
> page from ->writepage or using write_cache_pages (either directly
> or as a copy) as ->writepages.  The only thing special compared to
> the no ->writepages case is that it's submitting a large I/O
> from the first ->writepage call.
> 

Ok, I see what you mean.  The clustering code stops once it hits
nr_to_write, I missed that.  So we shouldn't be doing more than a single
writepages call.

-chris
Andreas Dilger April 21, 2011, 6:43 p.m. UTC | #9
On 2011-04-21, at 12:29 PM, Chris Mason wrote:
> Excerpts from Christoph Hellwig's message of 2011-04-21 14:08:05 -0400:
>> On Thu, Apr 21, 2011 at 02:02:43PM -0400, Chris Mason wrote:
>>> For this program, they are almost all dirty pages.
>>> 
>>> I tried patching it to give up if we seek but it is still pretty slow.
>>> There's something else going on in addition to the xfs clustering being
>>> too aggressive.
>> 
>> I'm not sure where you get this being too aggressive from - it's doing
>> exactly the same amount of I/O as a filesystem writing out a single
>> page from ->writepage or using write_cache_pages (either directly
>> or as a copy) as ->writepages.  The only thing special compared to
>> the no ->writepages case is that it's submitting a large I/O
>> from the first ->writepage call.
> 
> Ok, I see what you mean.  The clustering code stops once it hits
> nr_to_write, I missed that.  So we shouldn't be doing more than a single
> writepages call.

I wonder if it makes sense to disentangle the two processes' state in the kernel by forking the fsync thread before any writes are done.  That would avoid penalizing the random writer in the VM/VFS, but means there needs to be some coordination between the threads (e.g. polling for a sentinel file written when the sequential phase is complete).

Cheers, Andreas
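
A sketch of that scheme (hypothetical and untested; the sentinel file name and polling interval here are made up):

/*
 * Fork the fsync process before any writes happen, so the two
 * processes never share dirty state from the setup writes.  The
 * child polls for a sentinel file that the parent creates once
 * the sequential setup phase is done.
 */
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SENTINEL "setup-done"	/* hypothetical sentinel file */

int main(void)
{
	if (fork() == 0) {
		/* fsync child: wait for the setup phase to finish */
		struct stat st;
		while (stat(SENTINEL, &st) != 0)
			usleep(100 * 1000);
		/* ... run the timed write+fsync loop here ... */
		_exit(0);
	}

	/* parent: sequential setup phase (writing the big file) ... */
	close(open(SENTINEL, O_CREAT | O_WRONLY, 0644));  /* signal */
	/* ... then start the random writer, and eventually: */
	wait(NULL);
	return 0;
}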





Chris Mason April 21, 2011, 6:47 p.m. UTC | #10
Excerpts from Andreas Dilger's message of 2011-04-21 14:43:47 -0400:
> On 2011-04-21, at 12:29 PM, Chris Mason wrote:
> > Excerpts from Christoph Hellwig's message of 2011-04-21 14:08:05 -0400:
> >> On Thu, Apr 21, 2011 at 02:02:43PM -0400, Chris Mason wrote:
> >>> For this program, they are almost all dirty pages.
> >>> 
> >>> I tried patching it to give up if we seek but it is still pretty slow.
> >>> There's something else going on in addition to the xfs clustering being
> >>> too aggressive.
> >> 
> >> I'm not sure where you get this being too aggressive from - it's doing
> >> exactly the same amount of I/O as a filesystem writing out a single
> >> page from ->writepage or using write_cache_pages (either directly
> >> or as a copy) as ->writepages.  The only thing special compared to
> >> the no ->writepages case is that it's submitting a large I/O
> >> from the first ->writepage call.
> > 
> > Ok, I see what you mean.  The clustering code stops once it hits
> > nr_to_write, I missed that.  So we shouldn't be doing more than a single
> > writepages call.
> 
> I wonder if it makes sense to disentangle the two processes' state in the kernel by forking the fsync thread before any writes are done.  That would avoid penalizing the random writer in the VM/VFS, but means there needs to be some coordination between the threads (e.g. polling for a sentinel file written when the sequential phase is complete).
> 

The test itself may not be realistic, but I actually think it's a feature
that we end up stuck doing the random buffered I/Os.  Somehow it is much
slower than it should be on my box on xfs.  This isn't universal; other
machines seem to be working fine.

-chris

Patch

Index: linux-2.6/fs/xfs/linux-2.6/xfs_aops.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_aops.c	2011-04-21 10:20:48.303550404 +0200
+++ linux-2.6/fs/xfs/linux-2.6/xfs_aops.c	2011-04-21 10:20:58.203496773 +0200
@@ -765,8 +765,7 @@  xfs_convert_page(
 		SetPageUptodate(page);
 
 	if (count) {
-		if (--wbc->nr_to_write <= 0 &&
-		    wbc->sync_mode == WB_SYNC_NONE)
+		if (--wbc->nr_to_write <= 0)
 			done = 1;
 	}
 	xfs_start_page_writeback(page, !page_dirty, count);