Message ID: 20130912122317.GC12918@thunk.org
On 09/12/2013 06:23 AM, Theodore Ts'o wrote:
> (I've trimmed the cc list to stop spamming people who probably don't
> care as much about this).
>
> So I've tried the following patch, and I've confirmed that for short
> xattrs (i.e., that fit inside the inode body, assuming an inode size >
> 128 bytes), the mbcache paths don't trigger at all.
>
> Could you try this patch and see if we can figure out why mbcache code
> paths are triggering for you?
>
> - Ted

I tried the patch. First off, it produces a tremendous number of non-stop warnings during the aim7 run, particularly the one from line 1627.

Unfortunately, it looks like with this patch we are still calling ext4_xattr_cache_find(). I added more debugging code to print out the first few xattrs that get us into ext4_xattr_cache_find(). This is what I have:

[   80.540977] ext4_xattr_block_set: name  len 28.
[  155.480692] ext4_xattr_block_set: name selinux len 32.
[  155.486531] ext4_xattr_block_set: name selinux len 32.
[  155.492283] ext4_xattr_block_set: name selinux len 32.
[  155.497988] ext4_xattr_block_set: name selinux len 32.
[  155.504064] ext4_xattr_block_set: name selinux len 32.
[  155.509771] ext4_xattr_block_set: name selinux len 32.
[  155.515475] ext4_xattr_block_set: name selinux len 32.
[  155.521179] ext4_xattr_block_set: name selinux len 32.

I'll continue debugging and get back with any new findings.

Thanks,
Mak.

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 9/13/13 7:04 AM, Thavatchai Makphaibulchoke wrote:
> On 09/12/2013 06:23 AM, Theodore Ts'o wrote:
>> (I've trimmed the cc list to stop spamming people who probably don't
>> care as much about this).
>>
>> So I've tried the following patch, and I've confirmed that for short
>> xattrs (i.e., that fit inside the inode body, assuming an inode size >
>> 128 bytes), the mbcache paths don't trigger at all.
>>
>> Could you try this patch and see if we can figure out why mbcache code
>> paths are triggering for you?
>>
>> - Ted
>
> I tried the patch. First off, it produces a tremendous number of
> non-stop warnings during the aim7 run, particularly the one from
> line 1627.
>
> Unfortunately, it looks like with this patch we are still calling
> ext4_xattr_cache_find().

All the patch does is issue warnings, so no fixing was expected ;)

> I added more debugging code to print out the first few xattrs that
> get us into ext4_xattr_cache_find(). This is what I have:
>
> [   80.540977] ext4_xattr_block_set: name  len 28.
> [  155.480692] ext4_xattr_block_set: name selinux len 32.
> [  155.486531] ext4_xattr_block_set: name selinux len 32.
> [  155.492283] ext4_xattr_block_set: name selinux len 32.
> [  155.497988] ext4_xattr_block_set: name selinux len 32.
> [  155.504064] ext4_xattr_block_set: name selinux len 32.
> [  155.509771] ext4_xattr_block_set: name selinux len 32.
> [  155.515475] ext4_xattr_block_set: name selinux len 32.
> [  155.521179] ext4_xattr_block_set: name selinux len 32.

ok, all small...

> I'll continue debugging and get back with any new findings.
suggestions:

1) send us dumpe2fs -h /dev/sdXX | grep "Inode size" for this filesystem
2) print out the inode number in your printk's above
3) stat one of those inodes in debugfs after the run (debugfs stat <inodenumber>) - with <> around the inode nr

-Eric
On 09/13/2013 12:59 PM, Eric Sandeen wrote:
>
> suggestions:
>
> 1) send us dumpe2fs -h /dev/sdXX | grep "Inode size" for this filesystem
> 2) print out the inode number in your printk's above
> 3) stat one of those inodes in debugfs after the run (debugfs stat <inodenumber>) - with <> around the inode nr
>
> -Eric

I think I found out why we get into the mbcache path. The test uses ramfs filesystems, which are mounted and unmounted at the start and end of the test. It looks like a ramfs's default inode size is 128 bytes, causing all the mbcaching for the xattrs. There seems to be nothing wrong with either SELinux or xattr. Sorry for the confusion.

Thanks,
Mak.
On 9/17/13 11:43 AM, Thavatchai Makphaibulchoke wrote:
> On 09/13/2013 12:59 PM, Eric Sandeen wrote:
>>
>> suggestions:
>>
>> 1) send us dumpe2fs -h /dev/sdXX | grep "Inode size" for this
>> filesystem
>> 2) print out the inode number in your printk's above
>> 3) stat one of those inodes in debugfs after the run (debugfs stat
>> <inodenumber>) - with <> around the inode nr
>>
>> -Eric
>
> I think I found out why we get into the mbcache path. The test uses
> ramfs filesystems, which are mounted and unmounted at the start and
> end of the test. It looks like a ramfs's default inode size is 128
> bytes, causing all the mbcaching for the xattrs. There seems to be
> nothing wrong with either SELinux or xattr. Sorry for the confusion.

smaller filesystems go back to 128 byte inodes, IIRC. That's why I asked
for dumpe2fs long ago ;)

But glad to have the mystery solved!

Thanks,
-Eric

> Thanks,
> Mak.
On Wed, Sep 18, 2013 at 09:19:10AM -0500, Eric Sandeen wrote:
> > I think I found out why we get into the mbcache path. The test uses
> > ramfs filesystems, which are mounted and unmounted at the start and
> > end of the test. It looks like a ramfs's default inode size is 128
> > bytes, causing all the mbcaching for the xattrs. There seems to be
> > nothing wrong with either SELinux or xattr. Sorry for the confusion.
>
> smaller filesystems go back to 128 byte inodes, IIRC.

Mak, did you mean an ext4 file system created on a ramdisk, using a
/dev/ramNN device, or the literal "ramfs" file system?

I assume the former, right?

- Ted
On 09/21/2013 12:53 PM, Theodore Ts'o wrote:
>
> Mak, did you mean an ext4 file system created on a ramdisk, using a
> /dev/ramNN device, or the literal "ramfs" file system?
>
> I assume the former, right?
>
> - Ted

Yes Ted, that is correct, the filesystem is using /dev/ramNN.

Thanks,
Mak.
On 09/21/2013 05:12 PM, Andreas Dilger wrote:
> I just noticed that we have a patch for Lustre that adds a "no_mbcache"
> mount option to disable mbcache on a per-filesystem basis.
>
> The newest patch we have is based on FC19 (3.10 kernel), is this
> something that would be of interest if we submitted it upstream?
>
> http://review.whamcloud.com/#/c/7263/8/ldiskfs/kernel_patches/patches/fc19/ext4-disable-mb-cache.patch
>
> Cheers, Andreas

Here are some of the improvements I saw with SELinux disabled.

On an 80 core machine:

custom        15.5%
disk          27.81%
fserver        5.33%
new_dbase      9.06%
new_fserver    5.45%

On an 8 core machine:

alltests       5.24%
custom         2.26%
shared         9.32%
short         24.08%

The rest are not noticeably different.

Thanks,
Mak.
On 2013-09-25, at 12:43 PM, Thavatchai Makphaibulchoke wrote:
> On 09/21/2013 05:12 PM, Andreas Dilger wrote:
>> I just noticed that we have a patch for Lustre that adds a "no_mbcache"
>> mount option to disable mbcache on a per-filesystem basis.
>>
>> The newest patch we have is based on FC19 (3.10 kernel), is this
>> something that would be of interest if we submitted it upstream?
>>
>> http://review.whamcloud.com/#/c/7263/8/ldiskfs/kernel_patches/patches/fc19/ext4-disable-mb-cache.patch
>>
>> Cheers, Andreas
>
> Here are some of the improvements I saw with SELinux disabled.

To confirm, is this data on your "128-byte inode on ramdisk" config, or is this on a system with a real disk and 256-byte inodes?

> On an 80 core machine:
>
> custom        15.5%
> disk          27.81%
> fserver        5.33%
> new_dbase      9.06%
> new_fserver    5.45%
>
> On an 8 core machine:
>
> alltests       5.24%
> custom         2.26%
> shared         9.32%
> short         24.08%
>
> The rest are not noticeably different.

This is a pretty significant price to pay for SELinux in any case. I guess it is probably lower overhead with 256-byte inodes, but anything that adds 5-25% overhead shouldn't be taken for granted.

Cheers, Andreas
On 10/03/2013 09:22 PM, Andreas Dilger wrote:
>
> This is a pretty significant price to pay for SELinux in any case.
> I guess it is probably lower overhead with 256-byte inodes, but
> anything that adds 5-25% overhead shouldn't be taken for granted.
>
> Cheers, Andreas

So far we have determined that:

- With SELinux enabled on a ramdisk filesystem, with a default inode size of 128 bytes, extended xattrs are generated.
- In one of the aim7 workloads, mbcache has a hit ratio of about 65%.

It seems that mbcache itself, and the mbcache lock optimization attempted by the patch, could actually improve performance for a real-world system where extended xattrs are employed. I believe the patch should go in if there isn't any concern.

Please let me know if there is any comment or concern about the patch.

Thanks,
Mak.
On 10/18/13 7:43 AM, Thavatchai Makphaibulchoke wrote:
> On 10/03/2013 09:22 PM, Andreas Dilger wrote:
>
>> This is a pretty significant price to pay for SELinux in any case.
>> I guess it is probably lower overhead with 256-byte inodes, but
>> anything that adds 5-25% overhead shouldn't be taken for granted.
>>
>> Cheers, Andreas
>
> So far we have determined that:
>
> - With SELinux enabled on a ramdisk filesystem, with a default inode
>   size of 128 bytes, extended xattrs are generated.
> - In one of the aim7 workloads, mbcache has a hit ratio of about 65%.
>
> It seems that mbcache itself, and the mbcache lock optimization
> attempted by the patch, could actually improve performance for a
> real-world system where extended xattrs are employed. I believe the
> patch should go in if there isn't any concern.

But the 128-byte inode size is not common, so in general, mbcache isn't exercised that often. (still, it is there, and if there's a nice scalability patch for it, it's probably worth it).

-Eric

> Please let me know if there is any comment or concern about the patch.
>
> Thanks,
> Mak.
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index c081e34..0942ae3 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -265,6 +265,8 @@ ext4_xattr_block_get(struct inode *inode, int name_index, const char *name,
 	struct ext4_xattr_entry *entry;
 	size_t size;
 	int error;
+	static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
 
 	ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld",
 		  name_index, name, buffer, (long)buffer_size);
@@ -286,6 +288,11 @@ bad_block:
 		error = -EIO;
 		goto cleanup;
 	}
+	if (__ratelimit(&_rs)) {
+		ext4_warning(inode->i_sb, "inode %lu",
+			     (unsigned long) inode->i_ino);
+		WARN_ON(1);
+	}
 	ext4_xattr_cache_insert(bh);
 	entry = BFIRST(bh);
 	error = ext4_xattr_find_entry(&entry, name_index, name, bh->b_size, 1);
@@ -409,6 +416,8 @@ ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size)
 	struct inode *inode = dentry->d_inode;
 	struct buffer_head *bh = NULL;
 	int error;
+	static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
 
 	ea_idebug(inode, "buffer=%p, buffer_size=%ld",
 		  buffer, (long)buffer_size);
@@ -430,6 +439,11 @@ ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size)
 		error = -EIO;
 		goto cleanup;
 	}
+	if (__ratelimit(&_rs)) {
+		ext4_warning(inode->i_sb, "inode %lu",
+			     (unsigned long) inode->i_ino);
+		WARN_ON(1);
+	}
 	ext4_xattr_cache_insert(bh);
 	error = ext4_xattr_list_entries(dentry, BFIRST(bh),
					buffer, buffer_size);
@@ -526,7 +540,14 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 {
 	struct mb_cache_entry *ce = NULL;
 	int error = 0;
+	static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
 
+	if (__ratelimit(&_rs)) {
+		ext4_warning(inode->i_sb, "inode %lu",
+			     (unsigned long) inode->i_ino);
+		WARN_ON(1);
+	}
 	ce = mb_cache_entry_get(ext4_xattr_cache, bh->b_bdev, bh->b_blocknr);
 	error = ext4_journal_get_write_access(handle, bh);
 	if (error)
@@ -745,12 +766,19 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 	struct ext4_xattr_search *s = &bs->s;
 	struct mb_cache_entry *ce = NULL;
 	int error = 0;
+	static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
 #define header(x) ((struct ext4_xattr_header *)(x))
 
 	if (i->value && i->value_len > sb->s_blocksize)
 		return -ENOSPC;
 	if (s->base) {
+		if (__ratelimit(&_rs)) {
+			ext4_warning(inode->i_sb, "inode %lu",
+				     (unsigned long) inode->i_ino);
+			WARN_ON(1);
+		}
 		ce = mb_cache_entry_get(ext4_xattr_cache, bs->bh->b_bdev,
					bs->bh->b_blocknr);
 		error = ext4_journal_get_write_access(handle, bs->bh);
@@ -769,6 +797,12 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 			if (!IS_LAST_ENTRY(s->first))
 				ext4_xattr_rehash(header(s->base),
						  s->here);
+			if (__ratelimit(&_rs)) {
+				ext4_warning(inode->i_sb,
+					     "inode %lu",
+					     (unsigned long) inode->i_ino);
+				WARN_ON(1);
+			}
 			ext4_xattr_cache_insert(bs->bh);
 		}
 		unlock_buffer(bs->bh);
@@ -905,6 +939,11 @@ getblk_failed:
 			memcpy(new_bh->b_data, s->base, new_bh->b_size);
 			set_buffer_uptodate(new_bh);
 			unlock_buffer(new_bh);
+			if (__ratelimit(&_rs)) {
+				ext4_warning(inode->i_sb, "inode %lu",
+					     (unsigned long) inode->i_ino);
+				WARN_ON(1);
+			}
 			ext4_xattr_cache_insert(new_bh);
 			error = ext4_handle_dirty_xattr_block(handle, inode,
							      new_bh);
@@ -1570,10 +1609,17 @@ ext4_xattr_cache_find(struct inode *inode, struct ext4_xattr_header *header,
 {
 	__u32 hash = le32_to_cpu(header->h_hash);
 	struct mb_cache_entry *ce;
+	static DEFINE_RATELIMIT_STATE(_rs, DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
 
 	if (!header->h_hash)
 		return NULL;  /* never share */
 	ea_idebug(inode, "looking for cached blocks [%x]", (int)hash);
+	if (__ratelimit(&_rs)) {
+		ext4_warning(inode->i_sb, "inode %lu",
+			     (unsigned long) inode->i_ino);
+		WARN_ON(1);
+	}
 again:
 	ce = mb_cache_entry_find_first(ext4_xattr_cache, inode->i_sb->s_bdev,
				       hash);