Patchwork ext4: Use single thread to perform DIO unwritten conversion

Submitter Mingming Cao
Date March 3, 2011, 7:29 p.m.
Message ID <1299180594.2826.6.camel@mingming-laptop>
Permalink /patch/85328/
State Accepted

Comments

Mingming Cao - March 3, 2011, 7:29 p.m.
While running ext4 tests on a multi-core machine, we found per-CPU ext4-dio-unwritten threads processing
the conversion from unwritten extents to written for IOs completed from async direct IO patch.
One thread per filesystem is enough; we don't need per-CPU threads to work on the conversion.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
---
 fs/ext4/super.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
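
One way to observe the effect described above is to count the kernel threads backing the workqueue before and after applying the patch. The helper below is a minimal sketch, not part of the patch: the function name `count_wq_threads` is hypothetical, it assumes a Linux `/proc` with per-task `comm` files, and it matches on the first 15 characters of the workqueue name because the kernel limits task names to that length.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): count tasks whose name starts
# with the given prefix by reading /proc/<pid>/comm.
count_wq_threads() {
    prefix=$1
    count=0
    for f in /proc/[0-9]*/comm; do
        # A task may exit between the glob and the read; skip it quietly.
        name=$(cat "$f" 2>/dev/null) || continue
        case $name in
            "$prefix"*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}

# Task names are limited to 15 characters, so "ext4-dio-unwritten"
# shows up as "ext4-dio-unwrit".
count_wq_threads "ext4-dio-unwrit"
```

Going by the description, the expectation with the patch applied would be one such thread per mounted ext4 filesystem instead of one per CPU for each filesystem.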
Theodore Ts'o - March 5, 2011, 4:54 p.m.
On Thu, Mar 03, 2011 at 11:29:54AM -0800, Mingming Cao wrote:
> While running ext4 testing on multiple core, we found there are per cpu ext4-dio-unwritten threads processing
> conversion from unwritten extents to written for IOs completed from async direct IO patch.
> Per filesystem is enough, we don't need per cpu threads to work on conversion.
> 
> Signed-off-by: Mingming Cao <cmm@us.ibm.com>

Thanks, added to the ext4 patch queue.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - March 5, 2011, 5:46 p.m.
On Thu, Mar 03, 2011 at 11:29:54AM -0800, Mingming Cao wrote:
> While running ext4 testing on multiple core, we found there are per
> cpu ext4-dio-unwritten threads processing conversion from unwritten
> extents to written for IOs completed from async direct IO patch.
> Per filesystem is enough, we don't need per cpu threads to work on
> conversion.
> 
> Signed-off-by: Mingming Cao <cmm@us.ibm.com>

Eric, would you be able to do a very quick sanity check on your
48-core machine?  I can definitely see how having a huge number of
threads per file system could be problematic, especially on a system
with 32 or 64 ext4 file systems.  I'm curious though if we'll end up
taking a performance hit on direct I/O workloads.

If I remember correctly we currently have large file create with DIO
turned off, right?  Would it be possible to do a large file create
with DIO enabled, and do a quick run both with and without this patch?

In the future it would also be interesting to see how we are doing
versus other file systems using a DIO workload.  This is probably
another area where I suspect some lockstat and oprofile runs may give
us opportunities for further optimization.

       	     	  	  	      - Ted
Eric Whitney - March 7, 2011, 3:47 p.m.
On 03/05/2011 12:46 PM, Ted Ts'o wrote:
> On Thu, Mar 03, 2011 at 11:29:54AM -0800, Mingming Cao wrote:
>> While running ext4 testing on multiple core, we found there are per
>> cpu ext4-dio-unwritten threads processing conversion from unwritten
>> extents to written for IOs completed from async direct IO patch.
>> Per filesystem is enough, we don't need per cpu threads to work on
>> conversion.
>>
>> Signed-off-by: Mingming Cao<cmm@us.ibm.com>
>
> Eric, would you be able to do a very quick sanity check on your
> 48-core machine?  I can definitely see how having a huge number of
> threads per file system could be problematic, especially on a system
> with 32 or 64 ext4 file systems.  I'm curious though if we'll end up
> taking a performance hit on direct I/O workloads.
>

Hi Ted:

Sure, I can do that - I'll queue it up once I'm done with the "for .39" 
patch measurements.

> If I remember correctly we currently have large file create with DIO
> turned off, right?  Would it be possible to do a large file create
> with DIO enabled, and do a quick run both with and without this patch?

That's right, we're not measuring DIO right now.  I think I've got 
enough hardware to run a filesystem per core (or more), and I think it 
should be straightforward to write a modified ffsb profile to run (say) 
48 filesystems in parallel.

>
> In the future it would also be interesting to see how we are doing
> versus other file systems using a DIO workload.  This is probably
> another area where I suspect some lockstat and oprofile runs may give
> us opportunities for further optimization.

Yes - as discussed at Plumber's.  I'll put that on the list as well. 
With luck, there should be some time towards the end of the .39 merge 
window.

Eric

>
>         	     	  	  	      - Ted
Mingming Cao - March 8, 2011, 1:40 a.m.
On Sat, 2011-03-05 at 12:46 -0500, Ted Ts'o wrote: 
> On Thu, Mar 03, 2011 at 11:29:54AM -0800, Mingming Cao wrote:
> > While running ext4 testing on multiple core, we found there are per
> > cpu ext4-dio-unwritten threads processing conversion from unwritten
> > extents to written for IOs completed from async direct IO patch.
> > Per filesystem is enough, we don't need per cpu threads to work on
> > conversion.
> > 
> > Signed-off-by: Mingming Cao <cmm@us.ibm.com>
> 
> Eric, would you be able to do a very quick sanity check on your
> 48-core machine?  I can definitely see how having a huge number of
> threads per file system could be problematic, especially on a system
> with 32 or 64 ext4 file systems.  I'm curious though if we'll end up
> taking a performance hit on direct I/O workloads.
> 
> If I remember correctly we currently have large file create with DIO
> turned off, right?  Would it be possible to do a large file create
> with DIO enabled, and do a quick run both with and without this patch?
> 
The background thread performs the conversion when async DIO writes to
holes or preallocated space complete. So we would need to set up
fallocated files and run async direct IO over them to exercise any
potential scalability issue with the background DIO conversion
thread...

I took a look at FFSB; it doesn't support fallocate and async IO yet.
But fio does support aio and fallocate. This is a simple fio profile I
use to test a file set up by fallocate() with random aio dio run over
it. It may be useful for Eric to try, or to use as a reference, on his
48-core machine.

examples$ cat aio-setup 
; Random read/write to fallocated files with aio dio
[global]
ioengine=libaio
direct=1
rw=randrw
bs=4k
size=2m
filesize=1024m
fallocate=1
directory=/tmp

[file1]
iodepth=4
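
To cover the many-filesystems case raised earlier in the thread, the same profile could be generated with one job section per mount point. This is only a sketch: the function name `gen_aio_job`, the section names `fsN`, and the mount points in the usage line are made up for illustration; the option values are copied from the profile above.

```shell
#!/bin/sh
# Hypothetical generator: emit a fio job file with the [global] options
# from the profile above and one job section per directory argument.
gen_aio_job() {
    cat <<'EOF'
; Random read/write to fallocated files with aio dio
[global]
ioengine=libaio
direct=1
rw=randrw
bs=4k
size=2m
filesize=1024m
fallocate=1
EOF
    n=1
    for dir in "$@"; do
        # One job section per filesystem, each with its own directory.
        printf '\n[fs%d]\ndirectory=%s\niodepth=4\n' "$n" "$dir"
        n=$((n + 1))
    done
}

# Example with made-up mount points; redirect to a file and run "fio <file>".
gen_aio_job /mnt/ext4-1 /mnt/ext4-2
```

Setting `directory=` per job section rather than in `[global]` is what lets a single fio run spread its files across several filesystems.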

> In the future it would also be interesting to see how we are doing
> versus other file systems using a DIO workload.  This is probably
> another area where I suspect some lockstat and oprofile runs may give
> us opportunities for further optimization.
> 
>        	     	  	  	      - Ted

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f6a318f..c76a6a5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3509,7 +3509,7 @@  static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	percpu_counter_set(&sbi->s_dirtyblocks_counter, 0);
 
 no_journal:
-	EXT4_SB(sb)->dio_unwritten_wq = create_workqueue("ext4-dio-unwritten");
+	EXT4_SB(sb)->dio_unwritten_wq = create_singlethread_workqueue("ext4-dio-unwritten");
 	if (!EXT4_SB(sb)->dio_unwritten_wq) {
 		printk(KERN_ERR "EXT4-fs: failed to create DIO workqueue\n");
 		goto failed_mount_wq;