Message ID: 20120816152513.GA31346@thunk.org
State:      Superseded, archived
This would probably be much more readable code if the 'i=0' init was
before path=kzalloc.

On Thu, Aug 16, 2012 at 8:25 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Thu, Aug 16, 2012 at 07:10:51PM +0800, Fengguang Wu wrote:
>>
>> Here is the dmesg. BTW, it seems 3.5.0 doesn't have this issue.
>
> Fengguang,
>
> It sounds like you have a (at least fairly) reliable reproduction for
> this problem? Is it something you can share? It would be good to get
> this into our test suites, since it was _not_ something that was
> caught by xfstests, apparently.
>
> Can you see if this patch addresses it? (The first two patch hunks
> are the same debugging additions I had posted before.)
>
> It looks like the responsible commit is 968dee7722: "ext4: fix hole
> punch failure when depth is greater than 0". I had thought this patch
> was low risk if you weren't using the new punch ioctl, but it turns
> out it did make a critical change in the non-punch (i.e., truncate)
> code path, which is what the addition of "i = 0;" in the patch below
> addresses.
>
> Regards,
>
> - Ted
>
> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> index 769151d..fa829dc 100644
> --- a/fs/ext4/extents.c
> +++ b/fs/ext4/extents.c
> @@ -2432,6 +2432,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
>
>  	/* the header must be checked already in ext4_ext_remove_space() */
>  	ext_debug("truncate since %u in leaf to %u\n", start, end);
> +	if (!path[depth].p_hdr && !path[depth].p_bh) {
> +		EXT4_ERROR_INODE(inode, "depth %d", depth);
> +		BUG_ON(1);
> +	}
>  	if (!path[depth].p_hdr)
>  		path[depth].p_hdr = ext_block_hdr(path[depth].p_bh);
>  	eh = path[depth].p_hdr;
> @@ -2730,6 +2734,10 @@ cont:
>  		/* this is index block */
>  		if (!path[i].p_hdr) {
>  			ext_debug("initialize header\n");
> +			if (!path[i].p_hdr && !path[i].p_bh) {
> +				EXT4_ERROR_INODE(inode, "i=%d", i);
> +				BUG_ON(1);
> +			}
>  			path[i].p_hdr = ext_block_hdr(path[i].p_bh);
>  		}
>
> @@ -2828,6 +2836,7 @@ out:
>  	kfree(path);
>  	if (err == -EAGAIN) {
>  		path = NULL;
> +		i = 0;
>  		goto again;
>  	}
>  	ext4_journal_stop(handle);
On Thu, Aug 16, 2012 at 01:21:12PM -0700, Maciej Żenczykowski wrote:
> This would probably be much more readable code if the 'i=0' init was
> before path=kzalloc.

Good point, I agree. I'll move the initialization so i gets initialized
in both branches of the if statement.

Maciej, you weren't able to reliably repro the crash, were you? I'm
pretty sure this should fix the crash, but it would be really great to
confirm things.

I suspect creating a file system with a really small journal may make
it easier to reproduce, but I haven't had time to try to create a
reliable repro for this bug yet.

Thanks,

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> Maciej, you weren't able to reliably repro the crash, were you? I'm
> pretty sure this should fix the crash, but it would be really great to
> confirm things.
>
> I suspect creating a file system with a really small journal may make
> it easier to reproduce, but I haven't had time to try to create a
> reliable repro for this bug yet.

This happened twice to me while moving data off of a ~1TB ext4
partition. The data portion was on a stripe raid across 2 ~500GB
drives; the journal was on a relatively large partition (500MB?) on an
SSD. (crypto and lvm were also involved.) I've since emptied the
partition and deleted even the raid array.

Both times it happened during rm: the first time during rm -rf of a
directory tree, the second time during rm of a 250GB disk image
generated by dd (from a notebook drive). Both rm's were manually run by
me from a shell command line, and there was pretty much nothing else
happening on the machine at the time.

I'm not aware of there having been anything interesting (like:
holes/punch/sparseness, much r/w activity in the middle of files, etc)
on this filesystem; it was pretty much just a write-once data backup
that I had copied elsewhere and was deleting. The 250GB disk image was
definitely just a sequentially written disk dump, and I think the same
thing holds true for the contents of the wiped directory tree (although
in many much smaller files).

I know i=1 in both cases (and disassembly pointed out the location
where the above debug patch is BUGing), but I don't think it's possible
to figure out what inode # it crashed on.

Perhaps just untarring a bunch of kernels onto an empty partition,
filling it up, then deleting those kernels should be sufficient to
repro this (untried). Perhaps something like:

  create 1TB filesystem
  untar a thousand kernel source trees onto it
  create 20GB files of junk until it is full
  rm -rf /

- Maciej
On Thu, Aug 16, 2012 at 02:40:53PM -0700, Maciej Żenczykowski wrote:
>
> This happened twice to me while moving data off of a ~1TB ext4 partition.
> The data portion was on a stripe raid across 2 ~500GB drives, the
> journal was on a relatively large partition (500MB?) on an SSD.
> (crypto and lvm were also involved).
> ...
> Perhaps just untarring a bunch of kernels onto an empty partition,
> filling it up, then deleting those kernels should be sufficient to
> repro this (untried).

Thanks, that's really helpful. I can say that using a 4MB journal and
running fsstress is _not_ enough to trigger the problem.

Looking more closely at what might be needed to trigger the bug, 'i'
gets left uninitialized when err is set to -EAGAIN, and that happens
when ext4_ext_truncate_extend_restart() is unable to extend the
journal transaction. But that also means we need to be deleting a
sufficiently large file that the blocks span multiple block groups
(which is why we need to extend the transaction, so we can modify more
bitmap blocks) at the point when there is no more room in the journal,
so we have to close the current transaction, and then retry it again
with a new journal handle in a new transaction.

So that implies that untarring a bunch of kernels probably won't be
sufficient, since the files will be too small. What we probably will
need to do is to fill a large file system with lots of large files,
use a small journal, and then try to do an rm -rf.

- Ted
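The trigger condition Ted describes can be sketched with a toy model: each block group touched by a truncate consumes journal credits, and the handle is restarted (the -EAGAIN path that leaves 'i' uninitialized) only when credits run out partway through. The numbers below (blocks per group, credits per handle) are invented for illustration and are not the real ext4/jbd2 values.

```python
# Toy model of the -EAGAIN restart path described above.
BLOCKS_PER_GROUP = 32768   # assumed group size in blocks
HANDLE_CREDITS = 8         # assumed bitmap blocks one handle may touch

def truncate_restarts(file_blocks):
    """Count how many times the handle must be restarted (-EAGAIN)."""
    groups = (file_blocks + BLOCKS_PER_GROUP - 1) // BLOCKS_PER_GROUP
    restarts, credits = 0, HANDLE_CREDITS
    for _ in range(groups):
        if credits == 0:           # can't extend the transaction:
            restarts += 1          # close it, retry with a new handle
            credits = HANDLE_CREDITS
        credits -= 1               # each group's bitmap costs a credit
    return restarts
```

In this model a small file (one block group) never restarts, while a file spanning hundreds of block groups does, which matches the observation that untarred kernel trees are too small to reach the buggy retry path.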
> Thanks, that's really helpful. I can say that using a 4MB journal and
> running fsstress is _not_ enough to trigger the problem.
>
> Looking more closely at what might be needed to trigger the bug, 'i'
> gets left uninitialized when err is set to -EAGAIN, and that happens
> when ext4_ext_truncate_extend_restart() is unable to extend the
> journal transaction. But that also means we need to be deleting a
> sufficiently large file that the blocks span multiple block groups
> (which is why we need to extend the transaction, so we can modify
> more bitmap blocks) at the point when there is no more room in the
> journal, so we have to close the current transaction, and then retry
> it again with a new journal handle in a new transaction.
>
> So that implies that untarring a bunch of kernels probably won't be
> sufficient, since the files will be too small. What we probably will
> need to do is to fill a large file system with lots of large files,
> use a small journal, and then try to do an rm -rf.
>
> - Ted

My suggestion of untarring kernels was to cause the big multi-gigabyte
files created later on to be massively fragmented, and thus have tons
of extents and a relatively deep extent tree. But maybe that's not
needed to trigger this bug, if, as you say, it is caused by the
absolute number of disk blocks being freed and not by the
size/depth/complexity of the extent tree.

My knowledge of the internals of ext4 is pretty much non-existent. ;-)
In this case I'm just an end user.
On Thu, Aug 16, 2012 at 11:25:13AM -0400, Theodore Ts'o wrote:
> On Thu, Aug 16, 2012 at 07:10:51PM +0800, Fengguang Wu wrote:
> >
> > Here is the dmesg. BTW, it seems 3.5.0 doesn't have this issue.
>
> Fengguang,
>
> It sounds like you have a (at least fairly) reliable reproduction for
> this problem? Is it something you can share? It would be good to get

Right, it can be easily reproduced here. I'm running these writeback
performance tests:

https://github.com/fengguang/writeback-tests

which is basically doing N parallel dd writes to JBOD/RAID arrays on
various filesystems. It seems that the RAID test can reliably trigger
the problem.

> this into our test suites, since it was _not_ something that was
> caught by xfstests, apparently.
>
> Can you see if this patch addresses it? (The first two patch hunks
> are the same debugging additions I had posted before.)
>
> It looks like the responsible commit is 968dee7722: "ext4: fix hole
> punch failure when depth is greater than 0". I had thought this patch
> was low risk if you weren't using the new punch ioctl, but it turns
> out it did make a critical change in the non-punch (i.e., truncate)
> code path, which is what the addition of "i = 0;" in the patch below
> addresses.

Yes, I'm sure the patch fixed the bug. With the fix, the writeback
tests have run flawlessly for a dozen hours without any problem.

Thanks,
Fengguang
Ted,

I find ext4 write performance dropped by 3.3% on average in the
3.6-rc1 merge window. xfs and btrfs are fine.

Two machines are tested. The performance regression happens on the
lkp-nex04 machine, which is equipped with 12 SSD drives. lkp-st02,
which is equipped with HDD drives, does not see the regression. I'll
continue to repeat the tests and report variations.

The below 3.6.0-rc1+ kernel is 3.6.0-rc1 plus the NULL dereference fix.

wfg@bee /export/writeback% ./compare -g ext4 lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
                   3.5.0                3.6.0-rc1+
------------------------  ------------------------
    720.62  -1.5%   710.16  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
    706.04  -0.0%   705.86  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
    702.86  -0.2%   701.74  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
    779.52  +6.5%   830.11  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-1-3.5.0
    646.70  +4.9%   678.59  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-1-3.5.0
    704.49  +2.6%   723.00  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
    705.26  -1.2%   696.61  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-1-3.5.0
    703.37  +0.1%   703.76  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-1-3.5.0
    701.66  -0.1%   700.83  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-1-3.5.0
    675.08 -10.5%   604.29  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
    676.52  -2.7%   658.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
    512.70  +4.0%   533.22  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
    709.76 -15.7%   598.44  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-1-3.5.0
    681.39  -2.1%   667.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-1-3.5.0
    699.77 -19.2%   565.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1-3.5.0
    675.79  -1.9%   663.17  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-1-3.5.0
    484.84  -7.4%   448.83  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-1-3.5.0
    167.97 -38.7%   103.03  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
    243.67  -9.1%   221.41  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
    248.98 +12.2%   279.33  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
     71.18 -34.2%    46.82  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-1-3.5.0
    145.84  -7.3%   135.25  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-10dd-1-3.5.0
    255.22  +6.7%   272.35  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-1-3.5.0
    209.24 -23.6%   159.96  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-1-3.5.0
    243.73 -10.9%   217.28  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-10dd-1-3.5.0
    214.25  +5.6%   226.32  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-1dd-1-3.5.0
  13286.46  -3.3% 12851.55  TOTAL write_bw

wfg@bee /export/writeback% ./compare -g xfs lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
                   3.5.0                3.6.0-rc1+
------------------------  ------------------------
    687.76  +2.4%   704.52  lkp-nex04/JBOD-12HDD-thresh=1000M/xfs-100dd-1-3.5.0
    705.09  +0.0%   705.11  lkp-nex04/JBOD-12HDD-thresh=1000M/xfs-10dd-1-3.5.0
    702.21  -0.1%   701.72  lkp-nex04/JBOD-12HDD-thresh=1000M/xfs-1dd-1-3.5.0
    664.86 +21.8%   809.81  lkp-nex04/JBOD-12HDD-thresh=100M/xfs-100dd-1-3.5.0
    609.97 +13.6%   693.12  lkp-nex04/JBOD-12HDD-thresh=100M/xfs-10dd-1-3.5.0
    708.30  +0.8%   713.68  lkp-nex04/JBOD-12HDD-thresh=100M/xfs-1dd-1-3.5.0
    701.19  -0.0%   700.85  lkp-nex04/JBOD-12HDD-thresh=8G/xfs-10dd-1-3.5.0
    701.69  -0.1%   701.01  lkp-nex04/JBOD-12HDD-thresh=8G/xfs-1dd-1-3.5.0
    699.98  -0.4%   697.40  lkp-nex04/RAID0-12HDD-thresh=1000M/xfs-10dd-1-3.5.0
    653.92  +0.3%   656.07  lkp-nex04/RAID0-12HDD-thresh=1000M/xfs-1dd-1-3.5.0
    650.25  +0.5%   653.32  lkp-nex04/RAID0-12HDD-thresh=100M/xfs-10dd-1-3.5.0
    612.47  -2.9%   594.93  lkp-nex04/RAID0-12HDD-thresh=100M/xfs-1dd-1-3.5.0
    694.90  +0.0%   695.19  lkp-nex04/RAID0-12HDD-thresh=8G/xfs-10dd-1-3.5.0
    607.37 +14.2%   693.36  lkp-nex04/RAID0-12HDD-thresh=8G/xfs-1dd-1-3.5.0
    273.54 +27.1%   347.67  lkp-nex04/RAID5-12HDD-thresh=1000M/xfs-10dd-1-3.5.0
    277.00 +30.6%   361.71  lkp-nex04/RAID5-12HDD-thresh=1000M/xfs-1dd-1-3.5.0
    194.74  +6.6%   207.62  lkp-nex04/RAID5-12HDD-thresh=100M/xfs-10dd-1-3.5.0
    288.92 +21.2%   350.05  lkp-nex04/RAID5-12HDD-thresh=100M/xfs-1dd-1-3.5.0
    278.33 +26.4%   351.78  lkp-nex04/RAID5-12HDD-thresh=8G/xfs-10dd-1-3.5.0
    285.64 +24.2%   354.68  lkp-nex04/RAID5-12HDD-thresh=8G/xfs-1dd-1-3.5.0
  10998.15  +6.3% 11693.60  TOTAL write_bw

wfg@bee /export/writeback% ./compare -g btrfs lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
                   3.5.0                3.6.0-rc1+
------------------------  ------------------------
    703.26  -0.1%   702.57  lkp-nex04/JBOD-12HDD-thresh=1000M/btrfs-10dd-1-3.5.0
    701.88  -0.0%   701.85  lkp-nex04/JBOD-12HDD-thresh=1000M/btrfs-1dd-1-3.5.0
    697.67  +7.1%   747.07  lkp-nex04/JBOD-12HDD-thresh=100M/btrfs-10dd-1-3.5.0
    712.91  -0.4%   710.36  lkp-nex04/JBOD-12HDD-thresh=100M/btrfs-1dd-1-3.5.0
    702.02  -0.1%   701.26  lkp-nex04/JBOD-12HDD-thresh=8G/btrfs-10dd-1-3.5.0
    702.06  -0.1%   701.66  lkp-nex04/JBOD-12HDD-thresh=8G/btrfs-1dd-1-3.5.0
    709.01  -0.7%   703.83  lkp-nex04/RAID0-12HDD-thresh=1000M/btrfs-10dd-1-3.5.0
    696.67  -4.2%   667.22  lkp-nex04/RAID0-12HDD-thresh=1000M/btrfs-1dd-1-3.5.0
    822.15  +0.1%   823.01  lkp-nex04/RAID0-12HDD-thresh=100M/btrfs-10dd-1-3.5.0
    685.14  +2.9%   705.35  lkp-nex04/RAID0-12HDD-thresh=100M/btrfs-1dd-1-3.5.0
    702.55  -0.0%   702.23  lkp-nex04/RAID0-12HDD-thresh=8G/btrfs-10dd-1-3.5.0
    674.09  -7.1%   626.31  lkp-nex04/RAID0-12HDD-thresh=8G/btrfs-1dd-1-3.5.0
    270.81 +21.0%   327.76  lkp-nex04/RAID5-12HDD-thresh=1000M/btrfs-10dd-1-3.5.0
    267.19 +15.8%   309.36  lkp-nex04/RAID5-12HDD-thresh=1000M/btrfs-1dd-1-3.5.0
    273.89 +25.3%   343.10  lkp-nex04/RAID5-12HDD-thresh=100M/btrfs-10dd-1-3.5.0
    276.31 +19.7%   330.87  lkp-nex04/RAID5-12HDD-thresh=100M/btrfs-1dd-1-3.5.0
    251.25 +17.3%   294.80  lkp-nex04/RAID5-12HDD-thresh=8G/btrfs-10dd-1-3.5.0
    267.48  +7.1%   286.47  lkp-nex04/RAID5-12HDD-thresh=8G/btrfs-1dd-1-3.5.0
  10116.34  +2.7% 10385.07  TOTAL write_bw

wfg@bee /export/writeback% ./compare -g ext4 lkp-st02-x8664/*/*-{3.5.0,3.6.0-rc1+}
                   3.5.0                3.6.0-rc1+
------------------------  ------------------------
    900.62  +0.1%   901.66  lkp-st02-x8664/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
    898.13  +1.4%   910.73  lkp-st02-x8664/JBOD-12HDD-thresh=2G/ext4-1dd-1-3.5.0
    166.95  +3.8%   173.33  lkp-st02-x8664/RAID5-12HDD-thresh=100M/ext4-1dd-1-3.5.0
    176.14  +2.8%   181.01  lkp-st02-x8664/RAID5-12HDD-thresh=2G/ext4-1dd-1-3.5.0
     25.84  +0.3%    25.92  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randrw_mmap_0_4k-1-3.5.0
     92.34  -4.8%    87.88  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randrw_mmap_0_64k-1-3.5.0
     21.20  +2.1%    21.65  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randrw_mmap_1_4k-1-3.5.0
     90.43  +1.6%    91.90  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randrw_mmap_1_64k-1-3.5.0
     28.69  -1.8%    28.18  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randrw_sync_0_4k-1-3.5.0
    201.86  +0.2%   202.17  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randrw_sync_0_64k-1-3.5.0
     28.43  -0.2%    28.37  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randwrite_mmap_0_4k-1-3.5.0
    110.25  -0.1%   110.20  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randwrite_mmap_0_64k-1-3.5.0
     31.20  +0.5%    31.36  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randwrite_sync_0_4k-1-3.5.0
    289.28  +1.0%   292.08  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randwrite_sync_0_64k-1-3.5.0
     20.50  +0.9%    20.67  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randwrite_sync_1_4k-1-3.5.0
    294.64  +0.4%   295.94  lkp-st02-x8664/jbod_12hdd/ext4-fio_jbod_12hdd_randwrite_sync_1_64k-1-3.5.0
   3376.51  +0.8%  3403.05  TOTAL write_bw

wfg@bee /export/writeback% ./compare -g xfs lkp-st02-x8664/*/*-{3.5.0,3.6.0-rc1+}
                   3.5.0                3.6.0-rc1+
------------------------  ------------------------
    976.57  -4.8%   929.50  lkp-st02-x8664/JBOD-12HDD-thresh=100M/xfs-1dd-1-3.5.0
   1003.33  +2.3%  1026.41  lkp-st02-x8664/JBOD-12HDD-thresh=2G/xfs-1dd-1-3.5.0
    796.67  -2.1%   780.09  lkp-st02-x8664/RAID0-12HDD-thresh=100M/xfs-1dd-1-3.5.0
    754.89  +0.3%   757.24  lkp-st02-x8664/RAID0-12HDD-thresh=2G/xfs-1dd-1-3.5.0
    183.18  +7.6%   197.02  lkp-st02-x8664/RAID5-12HDD-thresh=100M/xfs-1dd-1-3.5.0
    191.62  +9.0%   208.92  lkp-st02-x8664/RAID5-12HDD-thresh=2G/xfs-1dd-1-3.5.0
     71.83  -1.0%    71.13  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randrw_mmap_0_4k-1-3.5.0
    104.93  -1.3%   103.56  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randrw_mmap_0_64k-1-3.5.0
     25.90  -0.4%    25.79  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randrw_mmap_1_4k-1-3.5.0
     88.13  +1.1%    89.06  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randrw_mmap_1_64k-1-3.5.0
     88.63  +0.2%    88.85  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randrw_sync_0_4k-1-3.5.0
    291.55  +0.1%   291.70  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randrw_sync_0_64k-1-3.5.0
     87.44  -1.5%    86.15  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randwrite_mmap_0_4k-1-3.5.0
    122.64  -1.6%   120.69  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randwrite_mmap_0_64k-1-3.5.0
    507.15  +0.2%   508.12  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randwrite_sync_0_64k-1-3.5.0
     32.09  -0.8%    31.85  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randwrite_sync_1_4k-1-3.5.0
    331.16  +0.2%   331.77  lkp-st02-x8664/jbod_12hdd/xfs-fio_jbod_12hdd_randwrite_sync_1_64k-1-3.5.0
   5657.70  -0.2%  5647.85  TOTAL write_bw

wfg@bee /export/writeback% ./compare -g btrfs lkp-st02-x8664/*/*-{3.5.0,3.6.0-rc1+}
                   3.5.0                3.6.0-rc1+
------------------------  ------------------------
    970.57  -2.9%   942.80  lkp-st02-x8664/JBOD-12HDD-thresh=100M/btrfs-1dd-1-3.5.0
    965.95  -0.1%   964.91  lkp-st02-x8664/JBOD-12HDD-thresh=2G/btrfs-1dd-1-3.5.0
    813.94  -2.3%   794.99  lkp-st02-x8664/RAID0-12HDD-thresh=100M/btrfs-1dd-1-3.5.0
    860.05 -11.1%   764.50  lkp-st02-x8664/RAID0-12HDD-thresh=2G/btrfs-1dd-1-3.5.0
    164.02 +15.3%   189.09  lkp-st02-x8664/RAID5-12HDD-thresh=100M/btrfs-1dd-1-3.5.0
    163.78 +14.1%   186.94  lkp-st02-x8664/RAID5-12HDD-thresh=2G/btrfs-1dd-1-3.5.0
   3938.30  -2.4%  3843.24  TOTAL write_bw

Thanks,
Fengguang
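The arithmetic behind the `compare` columns above is straightforward; here is a hedged Python sketch of it. The real tool lives in Fengguang's writeback-tests repository and may differ in detail; the function and formatting below are illustrative only.

```python
# Sketch of what each `compare` row computes: per-case percent change
# from the 3.5.0 baseline, plus an overall TOTAL across all cases.
# (Illustrative only; the real ./compare script may differ.)
def compare(baseline, patched):
    rows = []
    for case in sorted(baseline):
        b, p = baseline[case], patched[case]
        rows.append((b, 100.0 * (p - b) / b, p, case))
    tb, tp = sum(baseline.values()), sum(patched.values())
    rows.append((tb, 100.0 * (tp - tb) / tb, tp, "TOTAL write_bw"))
    return rows

def render(rows):
    return "\n".join("%10.2f  %+6.1f%%  %10.2f  %s" % r for r in rows)
```

Feeding it the first row's numbers (720.62 vs 710.16) reproduces the reported -1.5% change after rounding.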
Thanks Fengguang:

For the record, I was able to find my own easy repro last night, using
only a 220 meg partition:

  # mke2fs -t ext4 -b 1024 -J size=1 /dev/vdc
  # mount -t ext2 /dev/vdc /vdc
  # mkdir /vdc/a
  # cd /vdc/a
  # seq 1 210000 | xargs -n 1 fallocate -l 1m
  # seq 1 2 210000 | xargs /bin/rm
  # mkdir /vdc/b
  # cd /vdc/b
  # seq 1 103 | xargs -n 1 fallocate -l 1g
  # cd /
  # umount /vdc
  # mount -t ext4 -o commit=10000 /dev/vdc /vdc
  # rm -rf /vdc/b

For future reference, there are a couple of things that are of
interest to ext4 developers when trying to create repros:

1) The use of mounting with ext2 to speed up the setup.

2) The first two "seq ... | xargs ..." commands to create a very
   fragmented file system.

3) Using a 1k block size file system to stress the extent tree code
   and htree directory code (since it's easier to make larger tree
   structures).

4) The use of the mount option commit=10000 to test what happens when
   the journal is full (without using a nice, fast device such as a
   RAID array or without burning write cycles on an expensive flash
   device).

- Ted
Hi Ted,

On Fri, Aug 17, 2012 at 09:15:58AM -0400, Theodore Ts'o wrote:
> Thanks Fengguang:
>
> For the record, I was able to find my own easy repro last night, using
> only a 220 meg partition:
>
> # mke2fs -t ext4 -b 1024 -J size=1 /dev/vdc
> # mount -t ext2 /dev/vdc /vdc
> # mkdir /vdc/a
> # cd /vdc/a
> # seq 1 210000 | xargs -n 1 fallocate -l 1m
> # seq 1 2 210000 | xargs /bin/rm
> # mkdir /vdc/b
> # cd /vdc/b
> # seq 1 103 | xargs -n 1 fallocate -l 1g
> # cd /
> # umount /vdc
> # mount -t ext4 -o commit=10000 /dev/vdc /vdc
> # rm -rf /vdc/b

It makes a nice and simple test script; I'd very much like to add it
to my 0day test system :-)

> For future reference, there are a couple of things that are of
> interest to ext4 developers when trying to create repros:
>
> 1) The use of mounting with ext2 to speed up the setup.
>
> 2) The first two "seq ... | xargs ..." commands to create a very
>    fragmented file system.
>
> 3) Using a 1k block size file system to stress the extent tree code
>    and htree directory code (since it's easier to make larger tree
>    structures).
>
> 4) The use of the mount option commit=10000 to test what happens when
>    the journal is full (without using a nice, fast device such as a
>    RAID array or without burning write cycles on an expensive flash
>    device).

Thanks for the directions! I'll make that a big comment.

Thanks,
Fengguang
On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> Ted,
>
> I find ext4 write performance dropped by 3.3% on average in the
> 3.6-rc1 merge window. xfs and btrfs are fine.
>
> Two machines are tested. The performance regression happens on the
> lkp-nex04 machine, which is equipped with 12 SSD drives. lkp-st02,
> which is equipped with HDD drives, does not see the regression. I'll
> continue to repeat the tests and report variations.

Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
fs/ext4 fs/jbd2" and I don't see anything that I would expect would
cause that. There are the lock elimination changes for Direct I/O
overwrites, but those shouldn't matter for your tests, which are
measuring buffered writes, correct?

Is there any chance you could do me a favor and do a git bisect
restricted to commits involving fs/ext4 and fs/jbd2?

Many thanks,

- Ted
On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > Ted,
> >
> > I find ext4 write performance dropped by 3.3% on average in the
> > 3.6-rc1 merge window. xfs and btrfs are fine.
> >
> > Two machines are tested. The performance regression happens on the
> > lkp-nex04 machine, which is equipped with 12 SSD drives. lkp-st02,
> > which is equipped with HDD drives, does not see the regression.
> > I'll continue to repeat the tests and report variations.
>
> Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> cause that. There are the lock elimination changes for Direct I/O
> overwrites, but those shouldn't matter for your tests, which are
> measuring buffered writes, correct?
>
> Is there any chance you could do me a favor and do a git bisect
> restricted to commits involving fs/ext4 and fs/jbd2?

No problem :)

Thanks,
Fengguang
[CC md list]

On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > Ted,
> >
> > I find ext4 write performance dropped by 3.3% on average in the
> > 3.6-rc1 merge window. xfs and btrfs are fine.
> >
> > Two machines are tested. The performance regression happens on the
> > lkp-nex04 machine, which is equipped with 12 SSD drives. lkp-st02,
> > which is equipped with HDD drives, does not see the regression.
> > I'll continue to repeat the tests and report variations.
>
> Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> cause that. There are the lock elimination changes for Direct I/O
> overwrites, but those shouldn't matter for your tests, which are
> measuring buffered writes, correct?
>
> Is there any chance you could do me a favor and do a git bisect
> restricted to commits involving fs/ext4 and fs/jbd2?

I noticed that the regressions all happen in the RAID0/RAID5 cases, so
it may be some interaction between the RAID and ext4 code. I'll try to
get some ext2/3 numbers, which should have fewer changes on the fs side.
wfg@bee /export/writeback% ./compare -g ext4 lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
                   3.5.0                3.6.0-rc1+
------------------------  ------------------------
    720.62  -1.5%   710.16  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
    706.04  -0.0%   705.86  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
    702.86  -0.2%   701.74  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
    702.41  -0.0%   702.06  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
    779.52  +6.5%   830.11  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-1-3.5.0
    646.70  +4.9%   678.59  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-1-3.5.0
    704.49  +2.6%   723.00  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
    704.21  +1.2%   712.47  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-2-3.5.0
    705.26  -1.2%   696.61  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-1-3.5.0
    703.37  +0.1%   703.76  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-1-3.5.0
    701.66  -0.1%   700.83  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-1-3.5.0
    701.17  +0.0%   701.36  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-2-3.5.0
    675.08 -10.5%   604.29  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
    676.52  -2.7%   658.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
    512.70  +4.0%   533.22  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
    524.61  -0.3%   522.90  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
    709.76 -15.7%   598.44  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-1-3.5.0
    681.39  -2.1%   667.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-1-3.5.0
    524.16  +0.8%   528.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-1dd-2-3.5.0
    699.77 -19.2%   565.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1-3.5.0
    675.79  -1.9%   663.17  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-1-3.5.0
    484.84  -7.4%   448.83  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-1-3.5.0
    470.40  -3.2%   455.31  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-2-3.5.0
    167.97 -38.7%   103.03  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
    243.67  -9.1%   221.41  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
    248.98 +12.2%   279.33  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
    208.45 +14.1%   237.86  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
     71.18 -34.2%    46.82  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-1-3.5.0
    145.84  -7.3%   135.25  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-10dd-1-3.5.0
    255.22  +6.7%   272.35  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-1-3.5.0
    243.09 +20.7%   293.30  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-2-3.5.0
    209.24 -23.6%   159.96  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-1-3.5.0
    243.73 -10.9%   217.28  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-10dd-1-3.5.0
    214.25  +5.6%   226.32  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-1dd-1-3.5.0
    207.16 +13.4%   234.98  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-1dd-2-3.5.0
  17572.12  -1.9% 17240.05  TOTAL write_bw

Thanks,
Fengguang
On Fri, Aug 17, 2012 at 11:13:18PM +0800, Fengguang Wu wrote:
>
> Obviously the major regressions happen to the 100dd over raid cases.
> Some 10dd cases are also impacted.
>
> The attached graphs show that everything becomes more fluctuated in
> 3.6.0-rc1 for the lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1 case.

Hmm... I'm not seeing any differences in the block allocation code, or
in ext4's buffered writeback code paths, which would be the most
likely cause of such problems.

Maybe a quick eyeball of the blktrace to see if we're doing something
pathologically stupid?

You could also try running a filefrag -v on a few of the dd files to
see if there's any significant difference, although as I said, there
don't look to be any significant changes in the block allocation code
between v3.5 and v3.6-rc1 --- although I suppose changes in timing
could have caused the block allocation decisions to be different, so
it's worth checking that out.

Thanks, regards,

- Ted
On Fri, Aug 17, 2012 at 09:15:58AM -0400, Theodore Ts'o wrote:
> Thanks Fengguang:
>
> For the record, I was able to find my own easy repro last night, using
> only a 220 meg partition:
>
> # mke2fs -t ext4 -b 1024 -J size=1 /dev/vdc
> # mount -t ext2 /dev/vdc /vdc
> # mkdir /vdc/a
> # cd /vdc/a
> # seq 1 210000 | xargs -n 1 fallocate -l 1m
> # seq 1 2 210000 | xargs /bin/rm
> # mkdir /vdc/b
> # cd /vdc/b
> # seq 1 103 | xargs -n 1 fallocate -l 1g
> # cd /
> # umount /vdc
> # mount -t ext4 -o commit=10000 /dev/vdc /vdc
> # rm -rf /vdc/b

Can you submit this for xfstests?
On Fri, Aug 17, 2012 at 01:48:41PM -0400, Christoph Hellwig wrote:
>
> Can you submit this for xfstests?

This is actually something I wanted to ask you guys about. There are a
series of ext4-specific tests that I could potentially add, but I
wasn't sure how welcome they would be in xfstests. Assuming that
ext4-specific tests would be welcome, is there a number range for
these ext4-specific tests that I should use?

BTW, we have an extension to xfstests that we've been using inside
Google where Google-internal tests have a "g" prefix (i.e., g001,
g002, etc.). That way we didn't need to worry about conflicts between
newly added upstream xfstests and ones which were added internally.
Would it make sense to start using some kind of prefix such as "e001"
for ext2/3/4-specific tests?

Regards,

- Ted
On Fri, 17 Aug 2012 22:25:26 +0800 Fengguang Wu <fengguang.wu@intel.com> wrote:

> [CC md list]
>
> On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> > On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > > Ted,
> > >
> > > I find ext4 write performance dropped by 3.3% on average in the
> > > 3.6-rc1 merge window. xfs and btrfs are fine.
> > >
> > > Two machines are tested. The performance regression happens on
> > > the lkp-nex04 machine, which is equipped with 12 SSD drives.
> > > lkp-st02, which is equipped with HDD drives, does not see the
> > > regression. I'll continue to repeat the tests and report
> > > variations.
> >
> > Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> > fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> > cause that. There are the lock elimination changes for Direct I/O
> > overwrites, but those shouldn't matter for your tests, which are
> > measuring buffered writes, correct?
> >
> > Is there any chance you could do me a favor and do a git bisect
> > restricted to commits involving fs/ext4 and fs/jbd2?
>
> I noticed that the regressions all happen in the RAID0/RAID5 cases,
> so it may be some interaction between the RAID and ext4 code.

I'm aware of some performance regression in RAID5 which I will be
drilling down into next week. Some things are faster, but some are
slower :-(

RAID0 should be unchanged though - I don't think I've changed anything
there.

Looking at your numbers:

  JBOD  ranges from +6.5% to -1.5%
  RAID0 ranges from +4.0% to -19.2%
  RAID5 ranges from +20.7% to -38.7%

I'm guessing + is good and - is bad? The RAID5 numbers don't surprise
me. The RAID0 ones do.

> I'll try to get some ext2/3 numbers, which should have fewer changes
> on the fs side.

Thanks. That will be useful.
NeilBrown

> wfg@bee /export/writeback% ./compare -g ext4 lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
>        3.5.0        3.6.0-rc1+
> ------------------------  ------------------------
>   720.62   -1.5%   710.16  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
>   706.04   -0.0%   705.86  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
>   702.86   -0.2%   701.74  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
>   702.41   -0.0%   702.06  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
>   779.52   +6.5%   830.11  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-1-3.5.0
>   646.70   +4.9%   678.59  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-1-3.5.0
>   704.49   +2.6%   723.00  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
>   704.21   +1.2%   712.47  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-2-3.5.0
>   705.26   -1.2%   696.61  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-1-3.5.0
>   703.37   +0.1%   703.76  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-1-3.5.0
>   701.66   -0.1%   700.83  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-1-3.5.0
>   701.17   +0.0%   701.36  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-2-3.5.0
>   675.08  -10.5%   604.29  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
>   676.52   -2.7%   658.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
>   512.70   +4.0%   533.22  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
>   524.61   -0.3%   522.90  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
>   709.76  -15.7%   598.44  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-1-3.5.0
>   681.39   -2.1%   667.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-1-3.5.0
>   524.16   +0.8%   528.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-1dd-2-3.5.0
>   699.77  -19.2%   565.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1-3.5.0
>   675.79   -1.9%   663.17  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-1-3.5.0
>   484.84   -7.4%   448.83  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-1-3.5.0
>   470.40   -3.2%   455.31  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-2-3.5.0
>   167.97  -38.7%   103.03  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
>   243.67   -9.1%   221.41  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
>   248.98  +12.2%   279.33  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
>   208.45  +14.1%   237.86  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
>    71.18  -34.2%    46.82  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-1-3.5.0
>   145.84   -7.3%   135.25  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-10dd-1-3.5.0
>   255.22   +6.7%   272.35  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-1-3.5.0
>   243.09  +20.7%   293.30  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-2-3.5.0
>   209.24  -23.6%   159.96  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-1-3.5.0
>   243.73  -10.9%   217.28  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-10dd-1-3.5.0
>   214.25   +5.6%   226.32  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-1dd-1-3.5.0
>   207.16  +13.4%   234.98  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-1dd-2-3.5.0
> 17572.12   -1.9%  17240.05  TOTAL write_bw
>
> Thanks,
> Fengguang
On Fri, Aug 17, 2012 at 04:34:38PM -0400, Theodore Ts'o wrote:
> On Fri, Aug 17, 2012 at 01:48:41PM -0400, Christoph Hellwig wrote:
> > 
> > Can you submit this for xfstests?
>
> This is actually something I wanted to ask you guys about.  There is a
> series of ext4-specific tests that I could potentially add, but I
> wasn't sure how welcome they would be in xfstests.  Assuming that
> ext4-specific tests would be welcome, is there a number range for
> these ext4-specific tests that I should use?

Dave actually has an outstanding series to move tests from the toplevel
directory to directories for categories.  We already have a lot of
btrfs-specific tests that have a separate directory, as well as
xfs-specific ones; ext4 would just follow this model.  For this specific
test it actually seems fairly generic except for the commit interval,
so I'd love to run it for all filesystems, just setting the interval
for ext4.

> BTW, we have an extension to xfstests that we've been using inside
> Google where Google-internal tests have a "g" prefix (i.e., g001,
> g002, etc.).  That way we didn't need to worry about conflicts between
> newly added upstream xfstests and ones which were added internally.
> Would it make sense to start using some kind of prefix such as "e001"
> for ext2/3/4 specific tests?

Can you take a look at Dave's series and see if that helps you?  I
haven't really reviewed it much myself yet, but I'll try to get to it
ASAP.
On Fri, Aug 17, 2012 at 05:05:27PM -0400, Christoph Hellwig wrote:
> On Fri, Aug 17, 2012 at 04:34:38PM -0400, Theodore Ts'o wrote:
> > On Fri, Aug 17, 2012 at 01:48:41PM -0400, Christoph Hellwig wrote:
> > > 
> > > Can you submit this for xfstests?
> >
> > This is actually something I wanted to ask you guys about.  There is a
> > series of ext4-specific tests that I could potentially add, but I
> > wasn't sure how welcome they would be in xfstests.  Assuming that
> > ext4-specific tests would be welcome, is there a number range for
> > these ext4-specific tests that I should use?
>
> Dave actually has an outstanding series to move tests from the toplevel
> directory to directories for categories.

And a whole lot more stuff, like a separate results directory, being
able to run just a directory of tests rather than a group (e.g. just
run ext4-specific tests), being able to use names rather than numbers
for tests (not quite there yet), being able to exclude different tests
(e.g. for older distro testing with won't-fix bugs), etc.

Basically, all those things I talked about at the LSF/MM conference
about making xfstests easier to use, develop and deploy for the wider
filesystem community are started in the patchsets here:

http://oss.sgi.com/archives/xfs/2012-07/msg00361.html
http://oss.sgi.com/archives/xfs/2012-07/msg00373.html

"This moves all the tests into a ./tests subdirectory, and sorts them
into classes of related tests.  Those are:

	tests/generic:	valid for all filesystems
	tests/shared:	valid for a limited number of filesystems
	tests/xfs:	xfs specific tests
	tests/btrfs:	btrfs specific tests
	tests/ext4:	ext4 specific tests
	tests/udf:	udf specific tests

Each directory has its own group file to determine what groups the
tests are associated with.  Tests are run in exactly the same way as
before, but when trying to run individual tests you need to specify
the class as well.  e.g. the old way:

# ./check 001

The new way:

# ./check generic/001
...."
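[To make the class-qualified naming above concrete: a reorganized harness could resolve "class/id" names against the tests/ tree roughly as follows. This is a sketch only; the helper name and error handling are invented for illustration, not the actual xfstests check script.]

```shell
#!/bin/sh
# Sketch: map a class-qualified test name ("generic/001") to a path
# under the tests/ tree described above.  Illustrative only -- the
# real xfstests check script also handles groups, exclusions, etc.
TESTS_DIR=tests

resolve_test() {
    case "$1" in
        */*) echo "$TESTS_DIR/$1" ;;
        *)   echo "error: need class-qualified name, e.g. generic/001" >&2
             return 1 ;;
    esac
}

resolve_test generic/001    # prints: tests/generic/001
```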
> We already have a lot of
> btrfs-specific tests that have a separate directory, as well as
> xfs-specific ones; ext4 would just follow this model.  For this specific
> test it actually seems fairly generic except for the commit interval,
> so I'd love to run it for all filesystems, just setting the interval
> for ext4.

Yeah, anything that is not deeply filesystem-specific should be written
as a generic test so that it can run on all filesystems.  If it's
mostly generic, with a small fs-specific extension, that extension is
easy to do under an 'if [ $FSTYP = "ext4" ]; then' branch....

> > BTW, we have an extension to xfstests that we've been using inside
> > Google where Google-internal tests have a "g" prefix (i.e., g001,
> > g002, etc.).  That way we didn't need to worry about conflicts between
> > newly added upstream xfstests and ones which were added internally.
> > Would it make sense to start using some kind of prefix such as "e001"
> > for ext2/3/4 specific tests?

No. The whole point of moving to multiple directories is to allow
easy extension for domain-specific tests without having to hack up
the check script or play other games with test naming.  Duplicate
names in different test subdirectories are most certainly allowed.

> Can you take a look at Dave's series and see if that helps you?  I
> haven't really reviewed it much myself yet, but I'll try to get to it
> ASAP.

Well, I'd appreciate it if somebody looked at it.  It's been almost
a month since I posted it and all I've heard is crickets so far...

Cheers,

Dave.
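[The `$FSTYP` branch Dave suggests typically just adjusts variables before the generic test body runs. A minimal sketch, where the `commit=1` mount option value is an assumption for this hypothetical test; only the branching pattern comes from the mail above.]

```shell
#!/bin/sh
# Mostly-generic test with a small ext4-only extension, following the
# 'if [ $FSTYP = "ext4" ]; then' pattern suggested above.
# FSTYP is normally exported by the harness; defaulted here so the
# sketch is self-contained.
FSTYP=${FSTYP:-ext4}

MOUNT_OPTIONS=""
if [ "$FSTYP" = "ext4" ]; then
    # ext4-specific tweak: shorten the journal commit interval
    # (the value here is illustrative)
    MOUNT_OPTIONS="-o commit=1"
fi

# the generic test body would mount and exercise the filesystem here
echo "would mount with: $MOUNT_OPTIONS"
```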
On Sat, Aug 18, 2012 at 08:55:24AM +1000, Dave Chinner wrote:
> 
> No. The whole point of moving to multiple directories is to allow
> easy extension for domain-specific tests without having to hack up
> the check script or play other games with test naming.  Duplicate
> names in different test subdirectories are most certainly allowed.

Oh, I agree, using separate directories is *way* better than the hack
we're using internally.  The main benefit of what we did was that the
patches were minimally intrusive....

> > Can you take a look at Dave's series and see if that helps you?  I
> > haven't really reviewed it much myself yet, but I'll try to get to it
> > ASAP.
>
> Well, I'd appreciate it if somebody looked at it.  It's been almost
> a month since I posted it and all I've heard is crickets so far...

I definitely want to look at it, but realistically, I probably won't
have time until after San Diego....  I've been crazy busy lately.

Cheers,

						- Ted
On Sat, Aug 18, 2012 at 06:44:57AM +1000, NeilBrown wrote:
> On Fri, 17 Aug 2012 22:25:26 +0800 Fengguang Wu <fengguang.wu@intel.com>
> wrote:
>
> > [CC md list]
> >
> > On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> > > On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > > > Ted,
> > > >
> > > > I find ext4 write performance dropped by 3.3% on average in the
> > > > 3.6-rc1 merge window.  xfs and btrfs are fine.
> > > >
> > > > Two machines are tested.  The performance regression happens in the
> > > > lkp-nex04 machine, which is equipped with 12 SSD drives.  lkp-st02,
> > > > which is equipped with HDD drives, does not see the regression.  I'll
> > > > continue to repeat the tests and report variations.
> > >
> > > Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> > > fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> > > cause that.  There are the lock elimination changes for Direct I/O
> > > overwrites, but those shouldn't matter for your tests, which are
> > > measuring buffered writes, correct?
> > >
> > > Is there any chance you could do me a favor and do a git bisect
> > > restricted to commits involving fs/ext4 and fs/jbd2?
> >
> > I noticed that the regressions all happen in the RAID0/RAID5 cases.
> > So it may be some interaction between the RAID/ext4 code?
>
> I'm aware of some performance regression in RAID5 which I will be drilling
> down into next week.  Some things are faster, but some are slower :-(
>
> RAID0 should be unchanged though - I don't think I've changed anything there.
>
> Looking at your numbers:
>   JBOD  ranges from  +6.5% to  -1.5%
>   RAID0 ranges from  +4.0% to -19.2%
>   RAID5 ranges from +20.7% to -38.7%
>
> I'm guessing + is good and - is bad?

Yes.

> The RAID5 numbers don't surprise me.  The RAID0 ones do.

You are right.  I did more tests and it's now obvious that RAID0 is
mostly fine.  The major regressions are in the RAID5 10/100dd cases.
JBOD is performing better in 3.6.0-rc1 :-)

> > I'll try to get some ext2/3 numbers, which should have less changes on the fs side.
>
> Thanks.  That will be useful.

Here are the more complete results.

RAID5 ext4 100dd   -7.3%
RAID5 ext4 10dd    -2.2%
RAID5 ext4 1dd    +12.1%
RAID5 ext3 100dd   -3.1%
RAID5 ext3 10dd   -11.5%
RAID5 ext3 1dd     +8.9%
RAID5 ext2 100dd  -10.5%
RAID5 ext2 10dd    -5.2%
RAID5 ext2 1dd    +10.0%
RAID0 ext4 100dd   +1.7%
RAID0 ext4 10dd    -0.9%
RAID0 ext4 1dd     -1.1%
RAID0 ext3 100dd   -4.2%
RAID0 ext3 10dd    -0.2%
RAID0 ext3 1dd     -1.0%
RAID0 ext2 100dd  +11.3%
RAID0 ext2 10dd    +4.7%
RAID0 ext2 1dd     -1.6%
JBOD  ext4 100dd   +5.9%
JBOD  ext4 10dd    +6.0%
JBOD  ext4 1dd     +0.6%
JBOD  ext3 100dd   +6.1%
JBOD  ext3 10dd    +1.9%
JBOD  ext3 1dd     +1.7%
JBOD  ext2 100dd   +9.9%
JBOD  ext2 10dd    +9.4%
JBOD  ext2 1dd     +0.5%

wfg@bee /export/writeback% ./compare-groups 'RAID5 RAID0 JBOD' 'ext4 ext3 ext2' '100dd 10dd 1dd' lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}

RAID5 ext4 100dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  167.97  -38.7%   103.03  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
  130.42  -21.7%   102.06  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-2-3.5.0
   83.45  +10.2%    91.96  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-3-3.5.0
  105.97  +11.5%   118.12  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-4-3.5.0
   71.18  -34.2%    46.82  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-1-3.5.0
   52.79   +1.1%    53.36  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-2-3.5.0
   40.75   -5.1%    38.69  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-3-3.5.0
   42.79  +14.5%    48.99  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-4-3.5.0
  209.24  -23.6%   159.96  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-1-3.5.0
  176.21  +11.3%   196.16  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-2-3.5.0
  158.12   +3.7%   163.99  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-3-3.5.0
  180.18   +6.4%   191.74  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-4-3.5.0
 1419.08   -7.3%  1314.88  TOTAL write_bw

RAID5 ext4 10dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  243.67   -9.1%   221.41  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
  212.84  +16.7%   248.39  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-2-3.5.0
  145.84   -7.3%   135.25  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-10dd-1-3.5.0
  124.61   +3.2%   128.65  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-10dd-2-3.5.0
  243.73  -10.9%   217.28  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-10dd-1-3.5.0
  229.35   -2.8%   222.82  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-10dd-2-3.5.0
 1200.03   -2.2%  1173.81  TOTAL write_bw

RAID5 ext4 1dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  248.98  +12.2%   279.33  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
  208.45  +14.1%   237.86  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
  255.22   +6.7%   272.35  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-1-3.5.0
  243.09  +20.7%   293.30  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-2-3.5.0
  214.25   +5.6%   226.32  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-1dd-1-3.5.0
  207.16  +13.4%   234.98  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-1dd-2-3.5.0
 1377.15  +12.1%  1544.14  TOTAL write_bw

RAID5 ext3 100dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
   72.75   -5.8%    68.50  lkp-nex04/RAID5-12HDD-thresh=1000M/ext3-100dd-1-3.5.0
   52.04   +0.8%    52.45  lkp-nex04/RAID5-12HDD-thresh=1000M/ext3-100dd-2-3.5.0
   48.85  +19.2%    58.21  lkp-nex04/RAID5-12HDD-thresh=1000M/ext3-100dd-3-3.5.0
   47.04   +9.4%    51.44  lkp-nex04/RAID5-12HDD-thresh=1000M/ext3-100dd-4-3.5.0
   53.89   -7.4%    49.90  lkp-nex04/RAID5-12HDD-thresh=100M/ext3-100dd-1-3.5.0
   43.00  -10.7%    38.39  lkp-nex04/RAID5-12HDD-thresh=100M/ext3-100dd-2-3.5.0
   37.82   +0.8%    38.11  lkp-nex04/RAID5-12HDD-thresh=100M/ext3-100dd-3-3.5.0
   39.59   -4.0%    38.02  lkp-nex04/RAID5-12HDD-thresh=100M/ext3-100dd-4-3.5.0
   54.45  -15.0%    46.26  lkp-nex04/RAID5-12HDD-thresh=8G/ext3-100dd-1-3.5.0
   45.81   -4.5%    43.77  lkp-nex04/RAID5-12HDD-thresh=8G/ext3-100dd-2-3.5.0
   51.20  -12.6%    44.75  lkp-nex04/RAID5-12HDD-thresh=8G/ext3-100dd-3-3.5.0
   47.39   -3.9%    45.53  lkp-nex04/RAID5-12HDD-thresh=8G/ext3-100dd-4-3.5.0
  593.84   -3.1%   575.32  TOTAL write_bw

RAID5 ext3 10dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
   50.29  -10.2%    45.14  lkp-nex04/RAID5-12HDD-thresh=1000M/ext3-10dd-1-3.5.0
   48.46   -7.1%    45.04  lkp-nex04/RAID5-12HDD-thresh=1000M/ext3-10dd-2-3.5.0
   67.11  -17.8%    55.16  lkp-nex04/RAID5-12HDD-thresh=100M/ext3-10dd-1-3.5.0
   75.45  -28.2%    54.21  lkp-nex04/RAID5-12HDD-thresh=100M/ext3-10dd-2-3.5.0
   42.08   +6.4%    44.78  lkp-nex04/RAID5-12HDD-thresh=8G/ext3-10dd-1-3.5.0
   40.48   +4.8%    42.44  lkp-nex04/RAID5-12HDD-thresh=8G/ext3-10dd-2-3.5.0
  323.87  -11.5%   286.76  TOTAL write_bw

RAID5 ext3 1dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  190.20  +14.5%   217.69  lkp-nex04/RAID5-12HDD-thresh=1000M/ext3-1dd-1-3.5.0
  192.30   +9.4%   210.43  lkp-nex04/RAID5-12HDD-thresh=1000M/ext3-1dd-2-3.5.0
  193.63  +14.0%   220.64  lkp-nex04/RAID5-12HDD-thresh=100M/ext3-1dd-1-3.5.0
  224.33   -6.8%   209.07  lkp-nex04/RAID5-12HDD-thresh=100M/ext3-1dd-2-3.5.0
  188.30  +14.6%   215.83  lkp-nex04/RAID5-12HDD-thresh=8G/ext3-1dd-1-3.5.0
  179.04  +10.7%   198.17  lkp-nex04/RAID5-12HDD-thresh=8G/ext3-1dd-2-3.5.0
 1167.79   +8.9%  1271.83  TOTAL write_bw

RAID5 ext2 100dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
   76.75  -17.9%    62.98  lkp-nex04/RAID5-12HDD-thresh=1000M/ext2-100dd-1-3.5.0
   72.32   -7.7%    66.73  lkp-nex04/RAID5-12HDD-thresh=1000M/ext2-100dd-2-3.5.0
   56.48   +2.2%    57.75  lkp-nex04/RAID5-12HDD-thresh=1000M/ext2-100dd-3-3.5.0
   56.81   -1.8%    55.81  lkp-nex04/RAID5-12HDD-thresh=1000M/ext2-100dd-4-3.5.0
   58.58   -5.0%    55.67  lkp-nex04/RAID5-12HDD-thresh=100M/ext2-100dd-1-3.5.0
   60.02   -3.1%    58.15  lkp-nex04/RAID5-12HDD-thresh=100M/ext2-100dd-2-3.5.0
   54.01   -9.1%    49.12  lkp-nex04/RAID5-12HDD-thresh=100M/ext2-100dd-3-3.5.0
   61.00  -22.3%    47.38  lkp-nex04/RAID5-12HDD-thresh=100M/ext2-100dd-4-3.5.0
   50.98  -15.2%    43.22  lkp-nex04/RAID5-12HDD-thresh=8G/ext2-100dd-1-3.5.0
   49.52  -14.8%    42.18  lkp-nex04/RAID5-12HDD-thresh=8G/ext2-100dd-2-3.5.0
   48.35  -17.2%    40.04  lkp-nex04/RAID5-12HDD-thresh=8G/ext2-100dd-3-3.5.0
   49.14  -14.8%    41.88  lkp-nex04/RAID5-12HDD-thresh=8G/ext2-100dd-4-3.5.0
  693.96  -10.5%   620.90  TOTAL write_bw

RAID5 ext2 10dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
   46.88   -0.5%    46.67  lkp-nex04/RAID5-12HDD-thresh=1000M/ext2-10dd-1-3.5.0
   49.98   -8.8%    45.59  lkp-nex04/RAID5-12HDD-thresh=1000M/ext2-10dd-2-3.5.0
   45.01   +0.7%    45.32  lkp-nex04/RAID5-12HDD-thresh=100M/ext2-10dd-1-3.5.0
   84.88  -25.5%    63.27  lkp-nex04/RAID5-12HDD-thresh=100M/ext2-10dd-2-3.5.0
   44.49  +15.9%    51.56  lkp-nex04/RAID5-12HDD-thresh=8G/ext2-10dd-1-3.5.0
   43.73   +5.6%    46.19  lkp-nex04/RAID5-12HDD-thresh=8G/ext2-10dd-2-3.5.0
  314.97   -5.2%   298.60  TOTAL write_bw

RAID5 ext2 1dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  234.85   +7.6%   252.80  lkp-nex04/RAID5-12HDD-thresh=1000M/ext2-1dd-1-3.5.0
  250.77  +17.9%   295.65  lkp-nex04/RAID5-12HDD-thresh=1000M/ext2-1dd-2-3.5.0
  205.84   +4.9%   215.93  lkp-nex04/RAID5-12HDD-thresh=100M/ext2-1dd-1-3.5.0
  213.89   +7.2%   229.37  lkp-nex04/RAID5-12HDD-thresh=100M/ext2-1dd-2-3.5.0
  217.70  +13.1%   246.25  lkp-nex04/RAID5-12HDD-thresh=8G/ext2-1dd-1-3.5.0
  241.22   +8.3%   261.19  lkp-nex04/RAID5-12HDD-thresh=8G/ext2-1dd-2-3.5.0
 1364.27  +10.0%  1501.19  TOTAL write_bw

RAID0 ext4 100dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  675.08  -10.5%   604.29  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
  640.40   -0.8%   635.21  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-2-3.5.0
  370.03   +4.9%   388.06  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-3-3.5.0
  376.90   +6.1%   399.96  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-4-3.5.0
  709.76  -15.7%   598.44  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-1-3.5.0
  399.91  +52.7%   610.76  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-2-3.5.0
  342.58   +6.3%   364.24  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-3-3.5.0
  300.55  +24.6%   374.34  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-4-3.5.0
  699.77  -19.2%   565.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1-3.5.0
  582.28   -1.5%   573.28  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-2-3.5.0
  491.00   +8.4%   532.13  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-3-3.5.0
  485.84   +9.5%   532.12  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-4-3.5.0
 6074.09   +1.7%  6178.38  TOTAL write_bw

RAID0 ext4 10dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  676.52   -2.7%   658.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
  626.18   -3.2%   606.21  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-2-3.5.0
  681.39   -2.1%   667.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-1-3.5.0
  630.30   +3.4%   651.81  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-2-3.5.0
  675.79   -1.9%   663.17  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-1-3.5.0
  665.04   +1.3%   673.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-2-3.5.0
 3955.21   -0.9%  3920.37  TOTAL write_bw

RAID0 ext4 1dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  512.70   +4.0%   533.22  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
  524.61   -0.3%   522.90  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
  524.16   +0.8%   528.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-1dd-2-3.5.0
  484.84   -7.4%   448.83  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-1-3.5.0
  470.40   -3.2%   455.31  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-2-3.5.0
 2516.71   -1.1%  2488.51  TOTAL write_bw

RAID0 ext3 100dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  500.94   -4.0%   481.11  lkp-nex04/RAID0-12HDD-thresh=1000M/ext3-100dd-1-3.5.0
  494.13   -4.3%   473.13  lkp-nex04/RAID0-12HDD-thresh=1000M/ext3-100dd-2-3.5.0
  513.57   -9.2%   466.07  lkp-nex04/RAID0-12HDD-thresh=1000M/ext3-100dd-3-3.5.0
  490.19   -5.9%   461.42  lkp-nex04/RAID0-12HDD-thresh=1000M/ext3-100dd-4-3.5.0
  511.08   -2.9%   496.04  lkp-nex04/RAID0-12HDD-thresh=100M/ext3-100dd-1-3.5.0
  520.57   -7.6%   480.95  lkp-nex04/RAID0-12HDD-thresh=100M/ext3-100dd-2-3.5.0
  523.62   -5.2%   496.52  lkp-nex04/RAID0-12HDD-thresh=100M/ext3-100dd-3-3.5.0
  497.72   -0.1%   497.16  lkp-nex04/RAID0-12HDD-thresh=100M/ext3-100dd-4-3.5.0
  470.99   -5.0%   447.64  lkp-nex04/RAID0-12HDD-thresh=8G/ext3-100dd-1-3.5.0
  444.63   +2.0%   453.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext3-100dd-2-3.5.0
  448.25   -4.7%   427.18  lkp-nex04/RAID0-12HDD-thresh=8G/ext3-100dd-3-3.5.0
  475.57   -3.1%   460.84  lkp-nex04/RAID0-12HDD-thresh=8G/ext3-100dd-4-3.5.0
 5891.26   -4.2%  5641.62  TOTAL write_bw

RAID0 ext3 10dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  560.26   +2.8%   576.15  lkp-nex04/RAID0-12HDD-thresh=1000M/ext3-10dd-1-3.5.0
  583.44   +0.5%   586.08  lkp-nex04/RAID0-12HDD-thresh=1000M/ext3-10dd-2-3.5.0
  566.37   -3.2%   548.19  lkp-nex04/RAID0-12HDD-thresh=100M/ext3-10dd-1-3.5.0
  579.37   -2.1%   567.13  lkp-nex04/RAID0-12HDD-thresh=100M/ext3-10dd-2-3.5.0
  623.24   +0.2%   624.71  lkp-nex04/RAID0-12HDD-thresh=8G/ext3-10dd-1-3.5.0
  624.26   +0.7%   628.74  lkp-nex04/RAID0-12HDD-thresh=8G/ext3-10dd-2-3.5.0
 3536.93   -0.2%  3531.00  TOTAL write_bw

RAID0 ext3 1dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  351.69   -2.3%   343.73  lkp-nex04/RAID0-12HDD-thresh=1000M/ext3-1dd-1-3.5.0
  355.50   -4.1%   340.77  lkp-nex04/RAID0-12HDD-thresh=1000M/ext3-1dd-2-3.5.0
  383.34   +0.9%   386.96  lkp-nex04/RAID0-12HDD-thresh=100M/ext3-1dd-1-3.5.0
  385.74   +1.3%   390.69  lkp-nex04/RAID0-12HDD-thresh=100M/ext3-1dd-2-3.5.0
  315.53   -0.2%   314.81  lkp-nex04/RAID0-12HDD-thresh=8G/ext3-1dd-1-3.5.0
  319.52   -1.9%   313.36  lkp-nex04/RAID0-12HDD-thresh=8G/ext3-1dd-2-3.5.0
 2111.31   -1.0%  2090.32  TOTAL write_bw

RAID0 ext2 100dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  694.06   +0.0%   694.22  lkp-nex04/RAID0-12HDD-thresh=1000M/ext2-100dd-1-3.5.0
  693.19   -0.1%   692.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext2-100dd-2-3.5.0
  686.16   +1.2%   694.35  lkp-nex04/RAID0-12HDD-thresh=1000M/ext2-100dd-3-3.5.0
  691.17   -0.2%   690.13  lkp-nex04/RAID0-12HDD-thresh=1000M/ext2-100dd-4-3.5.0
  668.16   +1.3%   677.07  lkp-nex04/RAID0-12HDD-thresh=100M/ext2-100dd-1-3.5.0
  404.60  +62.9%   658.90  lkp-nex04/RAID0-12HDD-thresh=100M/ext2-100dd-2-3.5.0
  346.48  +81.1%   627.62  lkp-nex04/RAID0-12HDD-thresh=100M/ext2-100dd-3-3.5.0
  373.48  +71.3%   639.85  lkp-nex04/RAID0-12HDD-thresh=100M/ext2-100dd-4-3.5.0
  691.96   +0.2%   693.29  lkp-nex04/RAID0-12HDD-thresh=8G/ext2-100dd-1-3.5.0
  690.73   +0.5%   694.40  lkp-nex04/RAID0-12HDD-thresh=8G/ext2-100dd-2-3.5.0
  692.65   +0.1%   693.27  lkp-nex04/RAID0-12HDD-thresh=8G/ext2-100dd-3-3.5.0
  690.10   +0.2%   691.71  lkp-nex04/RAID0-12HDD-thresh=8G/ext2-100dd-4-3.5.0
 7322.74  +11.3%  8147.20  TOTAL write_bw

RAID0 ext2 10dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  247.29  +23.2%   304.58  lkp-nex04/RAID0-12HDD-thresh=1000M/ext2-10dd-1-3.5.0
  697.35   +0.4%   700.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext2-10dd-2-3.5.0
  662.14   +1.8%   673.83  lkp-nex04/RAID0-12HDD-thresh=100M/ext2-10dd-1-3.5.0
  613.81  +10.0%   675.44  lkp-nex04/RAID0-12HDD-thresh=100M/ext2-10dd-2-3.5.0
  337.37   +5.5%   355.95  lkp-nex04/RAID0-12HDD-thresh=8G/ext2-10dd-1-3.5.0
  682.57   +0.0%   682.90  lkp-nex04/RAID0-12HDD-thresh=8G/ext2-10dd-2-3.5.0
 3240.53   +4.7%  3393.07  TOTAL write_bw

RAID0 ext2 1dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  526.72   -4.1%   505.29  lkp-nex04/RAID0-12HDD-thresh=1000M/ext2-1dd-1-3.5.0
  516.77   -0.6%   513.81  lkp-nex04/RAID0-12HDD-thresh=1000M/ext2-1dd-2-3.5.0
  617.83   +0.3%   619.45  lkp-nex04/RAID0-12HDD-thresh=100M/ext2-1dd-1-3.5.0
  617.49   +0.6%   621.09  lkp-nex04/RAID0-12HDD-thresh=100M/ext2-1dd-2-3.5.0
  502.60   -1.0%   497.39  lkp-nex04/RAID0-12HDD-thresh=8G/ext2-1dd-1-3.5.0
  504.82   -5.7%   475.86  lkp-nex04/RAID0-12HDD-thresh=8G/ext2-1dd-2-3.5.0
 3286.22   -1.6%  3232.89  TOTAL write_bw

JBOD ext4 100dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  720.62   -1.5%   710.16  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
  469.82  +14.3%   536.78  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-2-3.5.0
  666.90   -2.6%   649.61  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-3-3.5.0
  343.93  +24.1%   426.86  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-4-3.5.0
  779.52   +6.5%   830.11  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-1-3.5.0
  457.65   -1.4%   451.18  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-2-3.5.0
  739.08   +5.7%   781.16  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-3-3.5.0
  332.98   -9.6%   301.13  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-4-3.5.0
  705.26   -1.2%   696.61  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-1-3.5.0
  565.73  +16.8%   660.76  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-2-3.5.0
  647.47   +5.3%   681.63  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-3-3.5.0
  416.22  +25.5%   522.50  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-4-3.5.0
 6845.19   +5.9%  7248.49  TOTAL write_bw

JBOD ext4 10dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  706.04   -0.0%   705.86  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
  525.34  +12.1%   589.06  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-2-3.5.0
  646.70   +4.9%   678.59  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-1-3.5.0
  335.12  +25.1%   419.10  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-2-3.5.0
  703.37   +0.1%   703.76  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-1-3.5.0
  665.60   +5.4%   701.85  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-2-3.5.0
 3582.17   +6.0%  3798.23  TOTAL write_bw

JBOD ext4 1dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  702.86   -0.2%   701.74  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
  702.41   -0.0%   702.06  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
  704.49   +2.6%   723.00  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
  704.21   +1.2%   712.47  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-2-3.5.0
  701.66   -0.1%   700.83  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-1-3.5.0
  701.17   +0.0%   701.36  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-2-3.5.0
 4216.81   +0.6%  4241.46  TOTAL write_bw

JBOD ext3 100dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  683.15   -3.8%   657.31  lkp-nex04/JBOD-12HDD-thresh=1000M/ext3-100dd-1-3.5.0
  711.48   -0.1%   710.83  lkp-nex04/JBOD-12HDD-thresh=1000M/ext3-100dd-2-3.5.0
  677.50   +0.0%   677.62  lkp-nex04/JBOD-12HDD-thresh=1000M/ext3-100dd-3-3.5.0
  713.31   -0.5%   709.88  lkp-nex04/JBOD-12HDD-thresh=1000M/ext3-100dd-4-3.5.0
  648.70  +16.1%   753.15  lkp-nex04/JBOD-12HDD-thresh=100M/ext3-100dd-1-3.5.0
  633.24  +26.5%   801.02  lkp-nex04/JBOD-12HDD-thresh=100M/ext3-100dd-2-3.5.0
  568.48  +23.8%   703.49  lkp-nex04/JBOD-12HDD-thresh=100M/ext3-100dd-3-3.5.0
  680.59  +21.6%   827.77  lkp-nex04/JBOD-12HDD-thresh=100M/ext3-100dd-4-3.5.0
  656.73   -0.9%   651.09  lkp-nex04/JBOD-12HDD-thresh=8G/ext3-100dd-1-3.5.0
  697.29   -0.3%   695.49  lkp-nex04/JBOD-12HDD-thresh=8G/ext3-100dd-2-3.5.0
  669.99   -1.9%   657.24  lkp-nex04/JBOD-12HDD-thresh=8G/ext3-100dd-3-3.5.0
  697.73   -2.1%   683.17  lkp-nex04/JBOD-12HDD-thresh=8G/ext3-100dd-4-3.5.0
 8038.18   +6.1%  8528.06  TOTAL write_bw

JBOD ext3 10dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  669.69   -1.0%   663.26  lkp-nex04/JBOD-12HDD-thresh=1000M/ext3-10dd-1-3.5.0
  704.60   +0.3%   707.03  lkp-nex04/JBOD-12HDD-thresh=1000M/ext3-10dd-2-3.5.0
  629.95   +3.0%   648.55  lkp-nex04/JBOD-12HDD-thresh=100M/ext3-10dd-1-3.5.0
  616.65   +9.6%   676.08  lkp-nex04/JBOD-12HDD-thresh=100M/ext3-10dd-2-3.5.0
  691.77   +0.6%   695.88  lkp-nex04/JBOD-12HDD-thresh=8G/ext3-10dd-1-3.5.0
  706.05   -0.2%   704.95  lkp-nex04/JBOD-12HDD-thresh=8G/ext3-10dd-2-3.5.0
 4018.71   +1.9%  4095.75  TOTAL write_bw

JBOD ext3 1dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  700.88   +0.1%   701.30  lkp-nex04/JBOD-12HDD-thresh=1000M/ext3-1dd-1-3.5.0
  699.78   +0.1%   700.27  lkp-nex04/JBOD-12HDD-thresh=1000M/ext3-1dd-2-3.5.0
  700.07   -0.0%   699.98  lkp-nex04/JBOD-12HDD-thresh=100M/ext3-1dd-1-3.5.0
  598.09  +11.7%   668.09  lkp-nex04/JBOD-12HDD-thresh=100M/ext3-1dd-2-3.5.0
  700.53   -0.0%   700.47  lkp-nex04/JBOD-12HDD-thresh=8G/ext3-1dd-1-3.5.0
  700.65   +0.1%   701.43  lkp-nex04/JBOD-12HDD-thresh=8G/ext3-1dd-2-3.5.0
 4100.00   +1.7%  4171.54  TOTAL write_bw

JBOD ext2 100dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  644.32   +6.1%   683.56  lkp-nex04/JBOD-12HDD-thresh=1000M/ext2-100dd-1-3.5.0
  558.44  +16.9%   652.63  lkp-nex04/JBOD-12HDD-thresh=1000M/ext2-100dd-2-3.5.0
  443.68  +30.1%   577.18  lkp-nex04/JBOD-12HDD-thresh=1000M/ext2-100dd-3-3.5.0
  449.28  +36.8%   614.49  lkp-nex04/JBOD-12HDD-thresh=1000M/ext2-100dd-4-3.5.0
  526.02  +10.2%   579.52  lkp-nex04/JBOD-12HDD-thresh=100M/ext2-100dd-1-3.5.0
  442.03  +10.6%   488.71  lkp-nex04/JBOD-12HDD-thresh=100M/ext2-100dd-2-3.5.0
  375.04   -5.5%   354.36  lkp-nex04/JBOD-12HDD-thresh=100M/ext2-100dd-3-3.5.0
  365.83   +3.9%   379.96  lkp-nex04/JBOD-12HDD-thresh=100M/ext2-100dd-4-3.5.0
  693.56   +0.8%   699.06  lkp-nex04/JBOD-12HDD-thresh=8G/ext2-100dd-1-3.5.0
  661.00   +3.3%   682.82  lkp-nex04/JBOD-12HDD-thresh=8G/ext2-100dd-2-3.5.0
  584.28   +9.1%   637.22  lkp-nex04/JBOD-12HDD-thresh=8G/ext2-100dd-3-3.5.0
  657.01   +4.2%   684.28  lkp-nex04/JBOD-12HDD-thresh=8G/ext2-100dd-4-3.5.0
 6400.48   +9.9%  7033.79  TOTAL write_bw

JBOD ext2 10dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  662.51   +3.4%   685.05  lkp-nex04/JBOD-12HDD-thresh=1000M/ext2-10dd-1-3.5.0
  665.07   +3.2%   686.05  lkp-nex04/JBOD-12HDD-thresh=1000M/ext2-10dd-2-3.5.0
  431.26  +26.9%   547.16  lkp-nex04/JBOD-12HDD-thresh=100M/ext2-10dd-1-3.5.0
  397.42  +39.4%   553.83  lkp-nex04/JBOD-12HDD-thresh=100M/ext2-10dd-2-3.5.0
  685.99   +0.9%   691.90  lkp-nex04/JBOD-12HDD-thresh=8G/ext2-10dd-1-3.5.0
  685.68   +1.2%   693.94  lkp-nex04/JBOD-12HDD-thresh=8G/ext2-10dd-2-3.5.0
 3527.93   +9.4%  3857.93  TOTAL write_bw

JBOD ext2 1dd
       3.5.0        3.6.0-rc1+
------------------------  ------------------------
  718.45   -0.1%   717.89  lkp-nex04/JBOD-12HDD-thresh=1000M/ext2-1dd-1-3.5.0
  717.25   +0.2%   718.89  lkp-nex04/JBOD-12HDD-thresh=1000M/ext2-1dd-2-3.5.0
  686.97   +1.0%   693.82  lkp-nex04/JBOD-12HDD-thresh=100M/ext2-1dd-1-3.5.0
  683.79   +1.5%   694.03  lkp-nex04/JBOD-12HDD-thresh=100M/ext2-1dd-2-3.5.0
  699.79   +0.1%   700.41  lkp-nex04/JBOD-12HDD-thresh=8G/ext2-1dd-1-3.5.0
  700.22   +0.1%   701.15  lkp-nex04/JBOD-12HDD-thresh=8G/ext2-1dd-2-3.5.0
 4206.46   +0.5%  4226.19  TOTAL write_bw
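[For reference, the delta columns in the tables above are plain before/after percentages of write_bw. Something like the following reproduces them; this is a sketch, not the actual `compare` script used in the thread.]

```shell
#!/bin/sh
# Recompute one delta-column entry from a pair of write_bw numbers,
# matching rows like "167.97  -38.7%  103.03  ..." above.
delta() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%+.1f%%\n", (b / a - 1) * 100 }'
}

delta 167.97 103.03   # worst RAID5 100dd case above: -38.7%
delta 243.09 293.30   # best RAID5 1dd case above:    +20.7%
```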
On Tue, Aug 21, 2012 at 05:42:21PM +0800, Fengguang Wu wrote:
> On Sat, Aug 18, 2012 at 06:44:57AM +1000, NeilBrown wrote:
> > On Fri, 17 Aug 2012 22:25:26 +0800 Fengguang Wu <fengguang.wu@intel.com>
> > wrote:
> >
> > > [CC md list]
> > >
> > > On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> > > > On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> > > > > Ted,
> > > > >
> > > > > I find ext4 write performance dropped by 3.3% on average in the
> > > > > 3.6-rc1 merge window.  xfs and btrfs are fine.
> > > > >
> > > > > Two machines are tested.  The performance regression happens in the
> > > > > lkp-nex04 machine, which is equipped with 12 SSD drives.  lkp-st02,
> > > > > which is equipped with HDD drives, does not see the regression.  I'll
> > > > > continue to repeat the tests and report variations.
> > > >
> > > > Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> > > > fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> > > > cause that.  There are the lock elimination changes for Direct I/O
> > > > overwrites, but those shouldn't matter for your tests, which are
> > > > measuring buffered writes, correct?
> > > >
> > > > Is there any chance you could do me a favor and do a git bisect
> > > > restricted to commits involving fs/ext4 and fs/jbd2?
> > >
> > > I noticed that the regressions all happen in the RAID0/RAID5 cases.
> > > So it may be some interaction between the RAID/ext4 code?
> >
> > I'm aware of some performance regression in RAID5 which I will be drilling
> > down into next week.  Some things are faster, but some are slower :-(
> >
> > RAID0 should be unchanged though - I don't think I've changed anything there.
> >
> > Looking at your numbers:
> >   JBOD  ranges from  +6.5% to  -1.5%
> >   RAID0 ranges from  +4.0% to -19.2%
> >   RAID5 ranges from +20.7% to -38.7%
> >
> > I'm guessing + is good and - is bad?
>
> Yes.
>
> > The RAID5 numbers don't surprise me.  The RAID0 ones do.
>
> You are right.
> I did more tests and it's now obvious that RAID0 is
> mostly fine. The major regressions are in the RAID5 10/100dd cases.
> JBOD is performing better in 3.6.0-rc1 :-)
>
> > > I'll try to get some ext2/3 numbers, which should have less changes
> > > on the fs side.
> >
> > Thanks. That will be useful.
>
> Here are the more complete results.
>
> RAID5 ext4 100dd  -7.3%
> RAID5 ext4 10dd   -2.2%
> RAID5 ext4 1dd   +12.1%
> RAID5 ext3 100dd  -3.1%
> RAID5 ext3 10dd  -11.5%
> RAID5 ext3 1dd    +8.9%
> RAID5 ext2 100dd -10.5%
> RAID5 ext2 10dd   -5.2%
> RAID5 ext2 1dd   +10.0%
> RAID0 ext4 100dd  +1.7%
> RAID0 ext4 10dd   -0.9%
> RAID0 ext4 1dd    -1.1%
> RAID0 ext3 100dd  -4.2%
> RAID0 ext3 10dd   -0.2%
> RAID0 ext3 1dd    -1.0%
> RAID0 ext2 100dd +11.3%
> RAID0 ext2 10dd   +4.7%
> RAID0 ext2 1dd    -1.6%
> JBOD  ext4 100dd  +5.9%
> JBOD  ext4 10dd   +6.0%
> JBOD  ext4 1dd    +0.6%
> JBOD  ext3 100dd  +6.1%
> JBOD  ext3 10dd   +1.9%
> JBOD  ext3 1dd    +1.7%
> JBOD  ext2 100dd  +9.9%
> JBOD  ext2 10dd   +9.4%
> JBOD  ext2 1dd    +0.5%

And here are the xfs/btrfs results. Very impressive RAID5 improvements!

RAID5 btrfs 100dd +25.8%
RAID5 btrfs 10dd  +21.3%
RAID5 btrfs 1dd   +14.3%
RAID5 xfs   100dd +32.8%
RAID5 xfs   10dd  +21.5%
RAID5 xfs   1dd   +25.2%
RAID0 btrfs 100dd  -7.4%
RAID0 btrfs 10dd   -0.2%
RAID0 btrfs 1dd    -2.8%
RAID0 xfs   100dd +18.8%
RAID0 xfs   10dd   +0.0%
RAID0 xfs   1dd    +3.8%
JBOD  btrfs 100dd  -0.0%
JBOD  btrfs 10dd   +2.3%
JBOD  btrfs 1dd    -0.1%
JBOD  xfs   100dd  +8.3%
JBOD  xfs   10dd   +4.1%
JBOD  xfs   1dd    +0.1%

Thanks,
Fengguang
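[Editorial aside: the +/-% columns throughout these tables are plain relative deltas between the two kernels' bandwidth figures. The `compare` script itself is not included in the thread, but the arithmetic behind any row can be sanity-checked with a one-liner like the following (the 431.26 -> 547.16 row is used as the example):]

```shell
# Relative change between two write_bw samples, matching the +/-%
# column in the compare output above.
awk -v old=431.26 -v new=547.16 \
    'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
# prints +26.9%
```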
On 8/22/12 11:57 AM, Yuanhan Liu wrote:
> On Fri, Aug 17, 2012 at 10:25:26PM +0800, Fengguang Wu wrote:
> > [CC md list]
> >
> > On Fri, Aug 17, 2012 at 09:40:39AM -0400, Theodore Ts'o wrote:
> >> On Fri, Aug 17, 2012 at 02:09:15PM +0800, Fengguang Wu wrote:
> >>> Ted,
> >>>
> >>> I find ext4 write performance dropped by 3.3% on average in the
> >>> 3.6-rc1 merge window. xfs and btrfs are fine.
> >>>
> >>> Two machines are tested. The performance regression happens in the
> >>> lkp-nex04 machine, which is equipped with 12 SSD drives. lkp-st02,
> >>> which is equipped with HDD drives, does not see the regression. I'll
> >>> continue to repeat the tests and report variations.
> >>
> >> Hmm... I've checked out the commits in "git log v3.5..v3.6-rc1 --
> >> fs/ext4 fs/jbd2" and I don't see anything that I would expect would
> >> cause that. There are the lock elimination changes for Direct I/O
> >> overwrites, but that shouldn't matter for your tests, which are
> >> measuring buffered writes, correct?
> >>
> >> Is there any chance you could do me a favor and do a git bisect
> >> restricted to commits involving fs/ext4 and fs/jbd2?
> >
> > I noticed that the regressions all happen in the RAID0/RAID5 cases.
> > So it may be some interaction between the RAID and ext4 code?
> >
> > I'll try to get some ext2/3 numbers, which should have less changes on
> > the fs side.
> >
> > wfg@bee /export/writeback% ./compare -g ext4 lkp-nex04/*/*-{3.5.0,3.6.0-rc1+}
> >                    3.5.0                3.6.0-rc1+
> > ------------------------  ------------------------
> >  720.62   -1.5%   710.16  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
> >  706.04   -0.0%   705.86  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
> >  702.86   -0.2%   701.74  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
> >  702.41   -0.0%   702.06  lkp-nex04/JBOD-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
> >  779.52   +6.5%   830.11  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-100dd-1-3.5.0
> >  646.70   +4.9%   678.59  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-10dd-1-3.5.0
> >  704.49   +2.6%   723.00  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-1-3.5.0
> >  704.21   +1.2%   712.47  lkp-nex04/JBOD-12HDD-thresh=100M/ext4-1dd-2-3.5.0
> >  705.26   -1.2%   696.61  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-100dd-1-3.5.0
> >  703.37   +0.1%   703.76  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-10dd-1-3.5.0
> >  701.66   -0.1%   700.83  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-1-3.5.0
> >  701.17   +0.0%   701.36  lkp-nex04/JBOD-12HDD-thresh=8G/ext4-1dd-2-3.5.0
> >  675.08  -10.5%   604.29  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
> >  676.52   -2.7%   658.38  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
> >  512.70   +4.0%   533.22  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
> >  524.61   -0.3%   522.90  lkp-nex04/RAID0-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
> >  709.76  -15.7%   598.44  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-100dd-1-3.5.0
> >  681.39   -2.1%   667.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-10dd-1-3.5.0
> >  524.16   +0.8%   528.25  lkp-nex04/RAID0-12HDD-thresh=100M/ext4-1dd-2-3.5.0
> >  699.77  -19.2%   565.54  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-100dd-1-3.5.0
> >  675.79   -1.9%   663.17  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-10dd-1-3.5.0
> >  484.84   -7.4%   448.83  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-1-3.5.0
> >  470.40   -3.2%   455.31  lkp-nex04/RAID0-12HDD-thresh=8G/ext4-1dd-2-3.5.0
> >  167.97  -38.7%   103.03  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-100dd-1-3.5.0
> >  243.67   -9.1%   221.41
> >      lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-10dd-1-3.5.0
> >  248.98  +12.2%   279.33  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-1-3.5.0
> >  208.45  +14.1%   237.86  lkp-nex04/RAID5-12HDD-thresh=1000M/ext4-1dd-2-3.5.0
> >   71.18  -34.2%    46.82  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-100dd-1-3.5.0
> >  145.84   -7.3%   135.25  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-10dd-1-3.5.0
> >  255.22   +6.7%   272.35  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-1-3.5.0
> >  243.09  +20.7%   293.30  lkp-nex04/RAID5-12HDD-thresh=100M/ext4-1dd-2-3.5.0
> >  209.24  -23.6%   159.96  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-100dd-1-3.5.0
> >  243.73  -10.9%   217.28  lkp-nex04/RAID5-12HDD-thresh=8G/ext4-10dd-1-3.5.0
>
> Hi,
>
> About this issue, I did some investigation and found that we are blocked
> at get_active_stripes() most of the time. That is reasonable, since
> max_nr_stripes is set to 256 now, which is a rather small value, so I
> tried different values. Please see the following patch for detailed
> numbers.
>
> The test machine is the same as above.
>
> From 85c27fca12b770da5bc8ec9f26a22cb414e84c68 Mon Sep 17 00:00:00 2001
> From: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> Date: Wed, 22 Aug 2012 10:51:48 +0800
> Subject: [RFC PATCH] md/raid5: increase NR_STRIPES to 1024
>
> A stripe head is a resource that must be held before doing any IO, and
> it is limited to 256 by default. In the 10dd case, we found that we are
> blocked at get_active_stripes() most of the time (please see the ps
> output attached).
>
> Thus I did some tries with different values for NR_STRIPES, and here
> are some numbers (EXT4 only) I got with each setting:
>
> write bandwidth:
> ================
> 3.5.0-rc1-256+:   (here 256 means max stripe heads set to 256)
>     write bandwidth: 280
> 3.5.0-rc1-1024+:
>     write bandwidth: 421  (+50.4%)
> 3.5.0-rc1-4096+:
>     write bandwidth: 506  (+80.7%)
> 3.5.0-rc1-32768+:
>     write bandwidth: 615  (+119.6%)
>
> (Here 'sh' means with Shaohua's "multiple threads to handle stripes" patch [0])
> 3.5.0-rc3-strip-sh+-256:
>     write bandwidth: 465
>
> 3.5.0-rc3-strip-sh+-1024:
>     write bandwidth: 599
>
> 3.5.0-rc3-strip-sh+-32768:
>     write bandwidth: 615
>
> The kernel may be a bit old, but I found the data are still valid.
> Though, I haven't tried Shaohua's latest patch.
>
> As you can see from the data above, write bandwidth increases (a lot)
> as NR_STRIPES increases: the bigger NR_STRIPES, the better the write
> bandwidth. But we can't set NR_STRIPES to a very large number,
> especially by default, or it would need lots of memory. Based on the
> numbers I got with Shaohua's patch applied, I guess 1024 would be a
> nice value: it's not too big, yet we gain over 110% performance.
>
> Comments? BTW, I have a more flexible (but also more naive) alternative:
> change max_nr_stripes dynamically based on need?
>
> Here I also attached more data: the script I used to get those numbers,
> the ps output, and the iostat -kx 3 output.
>
> The script does its job in a straightforward way: start NR dd tasks in
> the background, trace the writeback/global_dirty_state events in the
> background to count the write bandwidth, and sample the ps output
> regularly.
>
> ---
> [0]: patch: http://lwn.net/Articles/500200/
>
> Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
> ---
>  drivers/md/raid5.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index adda94d..82dca53 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -62,7 +62,7 @@
>   * Stripe cache
>   */
>
> -#define NR_STRIPES		256
> +#define NR_STRIPES		1024
>  #define STRIPE_SIZE		PAGE_SIZE
>  #define STRIPE_SHIFT		(PAGE_SHIFT - 9)
>  #define STRIPE_SECTORS		(STRIPE_SIZE>>9)

does revert commit 8811b5968f6216e fix the problem?

Thanks,
Shaohua
On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu <yuanhan.liu@linux.intel.com>
wrote:

>
> -#define NR_STRIPES		256
> +#define NR_STRIPES		1024

Changing one magic number into another magic number might help your case,
but it's not really a general solution.

Possibly making sure that max_nr_stripes is at least some multiple of the
chunk size might make sense, but I wouldn't want to see a very large
multiple.

I think the problems with RAID5 are deeper than that. Hopefully I'll
figure out exactly what the best fix is soon - I'm trying to look into it.

I don't think the size of the cache is a big part of the solution. I
think correct scheduling of IO is the real answer.

Thanks,
NeilBrown
On Wed, Aug 22, 2012 at 04:00:25PM +1000, NeilBrown wrote:
> On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
>
> >
> > -#define NR_STRIPES		256
> > +#define NR_STRIPES		1024
>
> Changing one magic number into another magic number might help your
> case, but it's not really a general solution.

Agreed.

> Possibly making sure that max_nr_stripes is at least some multiple of
> the chunk size might make sense, but I wouldn't want to see a very
> large multiple.
>
> I think the problems with RAID5 are deeper than that. Hopefully I'll
> figure out exactly what the best fix is soon - I'm trying to look into
> it.
>
> I don't think the size of the cache is a big part of the solution. I
> think correct scheduling of IO is the real answer.

Yes, it should not be. But with a smaller max_nr_stripes, the chance of
getting a full stripe write is lower, and maybe that's why we block at
get_active_stripe() more often; there is also more reading. In the
perfect case there would be no reading at all: with max_nr_stripes set
to 32768 (the largest value we tried), you will find the reading is far
lower (almost zero - please see the iostat output attached to my former
email).

Anyway, I do agree this should not be the big part of the solution. If
we can handle those stripes faster, I guess 256 would be enough.

Thanks,
Yuanhan Liu
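[Editorial aside: for anyone reproducing the 256/1024/32768 variants compared above, the raid456 stripe cache is also runtime-tunable per array through sysfs, so a rebuild isn't strictly required for this kind of experiment. A minimal sketch, assuming the array under test is /dev/md0 (a hypothetical device name):]

```shell
# The raid456 stripe cache size (in stripe heads) is exposed per md
# array; 256 is the compiled-in NR_STRIPES default. Read it, raise it
# for the experiment, then confirm the new value took effect.
cat /sys/block/md0/md/stripe_cache_size
echo 1024 > /sys/block/md0/md/stripe_cache_size
cat /sys/block/md0/md/stripe_cache_size
```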
On 2012-08-22, at 12:00 AM, NeilBrown wrote:
> On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
>>
>> -#define NR_STRIPES		256
>> +#define NR_STRIPES		1024
>
> Changing one magic number into another magic number might help your
> case, but it's not really a general solution.

We've actually been carrying a patch for a few years in Lustre to
increase NR_STRIPES to 2048, and made it a configurable module
parameter. This made a noticeable improvement to performance on fast
systems.

> Possibly making sure that max_nr_stripes is at least some multiple of
> the chunk size might make sense, but I wouldn't want to see a very
> large multiple.
>
> I think the problems with RAID5 are deeper than that. Hopefully I'll
> figure out exactly what the best fix is soon - I'm trying to look into
> it.

The other MD RAID-5/6 patches that we have change the page submission
order to avoid the need to merge pages in the elevator so much, plus a
patch to allow zero-copy IO submission if the caller marks the page for
direct IO (indicating it will not be modified until after IO completes).
This avoids a lot of overhead on fast systems.

This isn't really my area of expertise, but the patches against RHEL6
can be seen at http://review.whamcloud.com/1142 if you want to take a
look. I don't know if that code is at all relevant to what is in 3.x
today.

> I don't think the size of the cache is a big part of the solution. I
> think correct scheduling of IO is the real answer.

My experience is that on fast systems the IO scheduler just gets in the
way. Submitting larger contiguous IOs to each disk in the first place
is far better than trying to merge small IOs again at the back end.

Cheers, Andreas
On Tue, Aug 21, 2012 at 11:00 PM, NeilBrown <neilb@suse.de> wrote:
> On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu <yuanhan.liu@linux.intel.com>
> wrote:
>
>>
>> -#define NR_STRIPES		256
>> +#define NR_STRIPES		1024
>
> Changing one magic number into another magic number might help your
> case, but it's not really a general solution.
>
> Possibly making sure that max_nr_stripes is at least some multiple of
> the chunk size might make sense, but I wouldn't want to see a very
> large multiple.
>
> I think the problems with RAID5 are deeper than that. Hopefully I'll
> figure out exactly what the best fix is soon - I'm trying to look into
> it.
>
> I don't think the size of the cache is a big part of the solution. I
> think correct scheduling of IO is the real answer.

Not sure if this is what we are seeing here, but we still have the
unresolved fast-parity effect, whereby slower parity calculation gives
a larger window to coalesce writes. I saw this effect when playing with
xor offload.

--
Dan
On Wed, 22 Aug 2012 13:47:07 -0700 Dan Williams <djbw@fb.com> wrote:

> On Tue, Aug 21, 2012 at 11:00 PM, NeilBrown <neilb@suse.de> wrote:
> > On Wed, 22 Aug 2012 11:57:02 +0800 Yuanhan Liu <yuanhan.liu@linux.intel.com>
> > wrote:
> >
> >>
> >> -#define NR_STRIPES		256
> >> +#define NR_STRIPES		1024
> >
> > Changing one magic number into another magic number might help your
> > case, but it's not really a general solution.
> >
> > Possibly making sure that max_nr_stripes is at least some multiple of
> > the chunk size might make sense, but I wouldn't want to see a very
> > large multiple.
> >
> > I think the problems with RAID5 are deeper than that. Hopefully I'll
> > figure out exactly what the best fix is soon - I'm trying to look into
> > it.
> >
> > I don't think the size of the cache is a big part of the solution. I
> > think correct scheduling of IO is the real answer.
>
> Not sure if this is what we are seeing here, but we still have the
> unresolved fast-parity effect, whereby slower parity calculation gives
> a larger window to coalesce writes. I saw this effect when playing with
> xor offload.

I did find a case where inserting a printk made it go faster again.
Replacing that with msleep(2) worked as well. :-)
I'm looking for a more robust solution though.

Thanks for the reminder.

NeilBrown
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 769151d..fa829dc 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -2432,6 +2432,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
 
 	/* the header must be checked already in ext4_ext_remove_space() */
 	ext_debug("truncate since %u in leaf to %u\n", start, end);
+	if (!path[depth].p_hdr && !path[depth].p_bh) {
+		EXT4_ERROR_INODE(inode, "depth %d", depth);
+		BUG_ON(1);
+	}
 	if (!path[depth].p_hdr)
 		path[depth].p_hdr = ext_block_hdr(path[depth].p_bh);
 	eh = path[depth].p_hdr;
@@ -2730,6 +2734,10 @@ cont:
 		/* this is index block */
 		if (!path[i].p_hdr) {
 			ext_debug("initialize header\n");
+			if (!path[i].p_hdr && !path[i].p_bh) {
+				EXT4_ERROR_INODE(inode, "i=%d", i);
+				BUG_ON(1);
+			}
 			path[i].p_hdr = ext_block_hdr(path[i].p_bh);
 		}
 
@@ -2828,6 +2836,7 @@ out:
 	kfree(path);
 	if (err == -EAGAIN) {
 		path = NULL;
+		i = 0;
 		goto again;
 	}
 	ext4_journal_stop(handle);