Message ID | 1358068095-9034-3-git-send-email-wenqing.lz@taobao.com |
---|---|
State | Accepted, archived |
On Sun, Jan 13, 2013 at 05:08:15PM +0800, Zheng Liu wrote:
> From: Zheng Liu <wenqing.lz@taobao.com>
>
> Bigalloc feature has been used for a long time, but the documentation in mke2fs
> is still missing. So add it.
>
> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>

Applied (with some changes to improve the english/wording).

- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Jan 14, 2013 at 10:10:06PM -0500, Theodore Ts'o wrote:
> On Sun, Jan 13, 2013 at 05:08:15PM +0800, Zheng Liu wrote:
> > From: Zheng Liu <wenqing.lz@taobao.com>
> >
> > Bigalloc feature has been used for a long time, but the documentation in mke2fs
> > is still missing. So add it.
> >
> > Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
>
> Applied (with some changes to improve the english/wording).

BTW, I used the following modified text:

    bigalloc
        This feature enables clustered allocation, so that the unit of
        allocation is a power of two number of blocks. That is, each
        bit in what had traditionally been known as the block
        allocation bitmap now indicates whether a cluster is in use or
        not, where a cluster is by default composed of 16 blocks. This
        feature can decrease the time spent on doing block allocation
        and brings smaller fragmentation, especially for large files.
        The size can be specified using the -C option.

        Warning: The bigalloc feature is still under development, and
        may not be fully supported with your kernel or may have various
        bugs. Please see the web page
        http://ext4.wiki.kernel.org/index.php/Bigalloc for details.

And I populated the page on the ext4 wiki so that when we finally fix
the delalloc/bigalloc problem, we can update the ext4 wiki to reflect
this.

- Ted
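As a rough illustration of the cluster bitmap described in the text above, here is a minimal Python sketch. It is not e2fsprogs or kernel code: the names `BLOCKS_PER_CLUSTER`, `cluster_of`, and `bitmap_bits_needed` are hypothetical, and the ratio of 16 is simply the default quoted in the man page text.

```python
# Illustrative only: how block numbers relate to cluster bitmap bits.
# All names here are hypothetical, not taken from e2fsprogs or the kernel.

BLOCKS_PER_CLUSTER = 16  # default ratio quoted in the man page text


def cluster_of(block: int, blocks_per_cluster: int = BLOCKS_PER_CLUSTER) -> int:
    """Return the index of the cluster (and so the bitmap bit) covering a block."""
    return block // blocks_per_cluster


def bitmap_bits_needed(total_blocks: int,
                       blocks_per_cluster: int = BLOCKS_PER_CLUSTER) -> int:
    """Return how many bitmap bits are needed to cover total_blocks (rounded up)."""
    return -(-total_blocks // blocks_per_cluster)


if __name__ == "__main__":
    # Blocks 0..15 share one bit, block 16 starts the next cluster.
    print(cluster_of(15), cluster_of(16))
    # Number of bits needed for an example range of 32768 blocks.
    print(bitmap_bits_needed(32768))
```

Under these assumptions the bitmap shrinks by the cluster ratio, which is presumably where the cheaper allocation for large files comes from.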
On 1/15/2013 2:12 PM, Theodore Ts'o wrote:
> BTW, I used the following modified text:
>
> bigalloc This feature enables clustered allocation, so that the
> unit of allocation is a power of two number of blocks. That
> is, each bit in what had traditionally been known as the
> block allocation bitmap now indicates whether a cluster is in
> use or not, where a cluster is by default composed of 16 blocks.
> This feature can decrease the time spent on doing block
> allocation and brings smaller fragmentation, especially for
> large files. The size can be specified using the -C option.
>
> Warning: The bigalloc feature is still under development, and
> may not be fully supported with your kernel or may have various
> bugs. Please see the web page
> http://ext4.wiki.kernel.org/index.php/Bigalloc for details.

Does this mean that a cluster is the minimum allocation unit, or can
two small files allocate different blocks in the same cluster, leaving
the cluster partially used? If the former, then how is this different
than just using a larger block size?
On Tue, Jan 15, 2013 at 02:46:17PM -0500, Phillip Susi wrote:
>
> Does this mean that a cluster is the minimum allocation unit, or can
> two small files allocate different blocks in the same cluster, leaving
> the cluster partially used? If the former, then how is this different
> than just using a larger block size?

The former. The difference is that we use units of blocks in the
indirect blocks and extents --- and the reason for this is because
there's a pretty fundamental limitation baked into the MM layer that
the file system block size is less than or equal to the page size. So
on architectures where we have 16k page sizes, we can use a 16k block
size --- but then you won't be able to mount that file system on an
x86 system.

So bigalloc is basically a hack because it was easier to make this
change in the file system than it is to deal with block sizes greater
than the page size.

- Ted
On 1/15/2013 2:57 PM, Theodore Ts'o wrote:
> So bigalloc is basically a hack because it was easier to make this
> change in the file system than it is to deal with block sizes
> greater than the page size.

If it is only to get around the mm pagesize limit, then why not just
have the fs automatically lie to the kernel about the block size and
shift the references back and forth on the fly when it detects a
larger blocksize?
On Tue, Jan 15, 2013 at 03:38:47PM -0500, Phillip Susi wrote:
>
> If it is only to get around the mm pagesize limit, then why not just
> have the fs automatically lie to the kernel about the block size and
> shift the references back and forth on the fly when it detects a
> larger blocksize?

Because of the pain in dealing with how to handle random writes into a
sparse file. We need to either track which blocks in the large block
have been initialized, or we would need to erase the entire large
block before writing the first page into the large block (and then you
still need to track whether or not you are writing that first or
subsequent page into a large block).

What we're doing with bigalloc is effectively tracking which blocks in
the cluster have been initialized by using entries in the extent tree,
since entries in the allocation bitmaps are in units of clusters, but
entries in the extent tree are in units of blocks.

Looking back at how complicated it has been to get delalloc right, it
may have been the case that just using a brute-force sb_issue_zeroout
when the block is freshly allocated, unless the arguments to the
request to ext4_writepages() exactly covered the large block, might
have been simpler. Getting the Direct I/O path right would have been
messy, but perhaps it would have been less work in the end.

- Ted
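To make the bookkeeping described above a little more concrete, here is a toy Python model. It is a sketch under heavy assumptions, not how ext4 is implemented: the class `ToyBigallocFile` and its methods are hypothetical, a plain set stands in for the cluster allocation bitmap, another set stands in for block-granular extent tree entries, and logical cluster numbers stand in for physical ones.

```python
# Toy model only: cluster-granular allocation, block-granular mapping.
# Every name here is hypothetical and exists purely for illustration.


class ToyBigallocFile:
    def __init__(self, blocks_per_cluster: int = 16):
        self.blocks_per_cluster = blocks_per_cluster
        self.allocated_clusters = set()  # stand-in for the cluster bitmap
        self.mapped_blocks = set()       # stand-in for extent tree entries

    def write_block(self, block: int) -> None:
        """Write one block: reserve its whole cluster, map only that block."""
        self.allocated_clusters.add(block // self.blocks_per_cluster)
        self.mapped_blocks.add(block)

    def cluster_in_use(self, block: int) -> bool:
        """True if the cluster containing this block has been allocated."""
        return block // self.blocks_per_cluster in self.allocated_clusters

    def is_initialized(self, block: int) -> bool:
        """True only if this specific block has been written (mapped)."""
        return block in self.mapped_blocks


if __name__ == "__main__":
    f = ToyBigallocFile()
    f.write_block(35)  # a random write into a sparse file
    # The whole cluster (blocks 32..47) is reserved, but only block 35 is mapped.
    print(f.cluster_in_use(32), f.is_initialized(32), f.is_initialized(35))
```

In this model a random write into a sparse file reserves the whole cluster, yet the neighbouring blocks stay unmapped, so nothing has to be zeroed up front.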
diff --git a/misc/mke2fs.8.in b/misc/mke2fs.8.in
index d4fbe00..ca3083d 100644
--- a/misc/mke2fs.8.in
+++ b/misc/mke2fs.8.in
@@ -187,6 +187,11 @@ Check the device for bad blocks before creating the file system. If this
 option is specified twice, then a slower read-write test is used
 instead of a fast read-only test.
 .TP
+.B \-C " cluster-size"
+Specify the size of cluster in bytes. Valid cluster-size values are from
+2048 to 256M bytes per cluster. If omitted, cluster-size is 64KB by
+default.
+.TP
 .B \-D
 Use direct I/O when writing to the disk. This avoids mke2fs dirtying a
 lot of buffer cache memory, which may impact other applications running
@@ -516,6 +521,12 @@ prefix the feature name with a caret ('^') character. The
 pseudo-filesystem feature "none" will clear all filesystem features.
 .RS 1.2i
 .TP
+.B bigalloc
+Allow to allocate block-size beyond the 4096 bytes. That can decrease the time
+spent on doing block allocation and brings smaller fragmentation, especially
+for large files. The size can be specified using the
+.B \-C option.
+.TP
 .B dir_index
 Use hashed b-trees to speed up lookups in large directories.
 .TP
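The limits stated in the patch text for `-C` can be written down as a small check. This is a sketch, not the validation mke2fs actually performs: the function `blocks_per_cluster` and the constants are hypothetical, the 2048 to 256M range and the 64KB default come from the patch, the power-of-two requirement comes from the wording discussed in the thread, and the rule that the cluster size is a whole multiple of the block size is an assumption.

```python
# Sketch of the -C limits as stated in the patch text; not mke2fs source code.
# Function and constant names are hypothetical.

MIN_CLUSTER_SIZE = 2048               # lower bound stated in the patch
MAX_CLUSTER_SIZE = 256 * 1024 * 1024  # upper bound stated in the patch (256M)
DEFAULT_CLUSTER_SIZE = 64 * 1024      # default stated in the patch (64KB)


def blocks_per_cluster(cluster_size: int = DEFAULT_CLUSTER_SIZE,
                       block_size: int = 4096) -> int:
    """Return the number of blocks per cluster, or raise ValueError."""
    if not MIN_CLUSTER_SIZE <= cluster_size <= MAX_CLUSTER_SIZE:
        raise ValueError("cluster size outside the 2048..256M range")
    # Assumption: a cluster is a whole number of blocks.
    if cluster_size % block_size != 0:
        raise ValueError("cluster size is not a multiple of the block size")
    ratio = cluster_size // block_size
    # The thread describes the unit as a power-of-two number of blocks.
    if ratio & (ratio - 1) != 0:
        raise ValueError("blocks per cluster is not a power of two")
    return ratio


if __name__ == "__main__":
    # With 4096-byte blocks, the 64KB default gives 16 blocks per cluster.
    print(blocks_per_cluster())
```

With 4096-byte blocks, the 64KB default corresponds to the 16 blocks per cluster mentioned in the revised man page text.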