From patchwork Thu Nov 16 16:17:39 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 838650 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3yd5yX6jW3z9s74; Fri, 17 Nov 2017 03:18:28 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1eFMrw-0008VI-Bd; Thu, 16 Nov 2017 16:18:24 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1eFMrt-0008Tw-Kp for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:21 +0000 Received: from 1.general.cascardo.us.vpn ([10.172.70.58] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1eFMrs-0000ir-Ly for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:21 +0000 From: Thadeu Lima de Souza Cascardo To: kernel-team@lists.ubuntu.com Subject: [PATCH 01/11] mbcache2: reimplement mbcache Date: Thu, 16 Nov 2017 14:17:39 -0200 Message-Id: <20171116161749.15878-2-cascardo@canonical.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com> References: <20171116161749.15878-1-cascardo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team"

From: Jan Kara

Original mbcache was designed to have more features than what ext? filesystems ended up using. It supported an entry being in multiple hashes, it had home-grown rwlocking of each entry, and one cache could cache entries from multiple filesystems. This genericity also resulted in more complex locking, larger cache entries, and generally more code complexity.

This is a reimplementation of the mbcache functionality to exactly fit the purpose ext? filesystems use it for. Cache entries are now considerably smaller (7 instead of 13 longs), the code is considerably smaller as well (414 vs 913 lines of code), and IMO also simpler. The new code is also much more lightweight.

I have measured the speed using an artificial xattr-bench benchmark, which spawns P processes; each process sets an xattr for F different files, and the value of the xattr is randomly chosen from a pool of V values. Averages of runtimes for 5 runs for various combinations of parameters are below. The first value in each cell is the old mbcache, the second value is the new mbcache.
V=10
F\P   1              2              4              8              16             32             64
10    0.158,0.157    0.208,0.196    0.500,0.277    0.798,0.400    3.258,0.584    13.807,1.047   61.339,2.803
100   0.172,0.167    0.279,0.222    0.520,0.275    0.825,0.341    2.981,0.505    12.022,1.202   44.641,2.943
1000  0.185,0.174    0.297,0.239    0.445,0.283    0.767,0.340    2.329,0.480    6.342,1.198    16.440,3.888

V=100
F\P   1              2              4              8              16             32             64
10    0.162,0.153    0.200,0.186    0.362,0.257    0.671,0.496    1.433,0.943    3.801,1.345    7.938,2.501
100   0.153,0.160    0.221,0.199    0.404,0.264    0.945,0.379    1.556,0.485    3.761,1.156    7.901,2.484
1000  0.215,0.191    0.303,0.246    0.471,0.288    0.960,0.347    1.647,0.479    3.916,1.176    8.058,3.160

V=1000
F\P   1              2              4              8              16             32             64
10    0.151,0.129    0.210,0.163    0.326,0.245    0.685,0.521    1.284,0.859    3.087,2.251    6.451,4.801
100   0.154,0.153    0.211,0.191    0.276,0.282    0.687,0.506    1.202,0.877    3.259,1.954    8.738,2.887
1000  0.145,0.179    0.202,0.222    0.449,0.319    0.899,0.333    1.577,0.524    4.221,1.240    9.782,3.579

V=10000
F\P   1              2              4              8              16             32             64
10    0.161,0.154    0.198,0.190    0.296,0.256    0.662,0.480    1.192,0.818    2.989,2.200    6.362,4.746
100   0.176,0.174    0.236,0.203    0.326,0.255    0.696,0.511    1.183,0.855    4.205,3.444    19.510,17.760
1000  0.199,0.183    0.240,0.227    1.159,1.014    2.286,2.154    6.023,6.039    ---,10.933     ---,36.620

V=100000
F\P   1              2              4              8              16             32             64
10    0.171,0.162    0.204,0.198    0.285,0.230    0.692,0.500    1.225,0.881    2.990,2.243    6.379,4.771
100   0.151,0.171    0.220,0.210    0.295,0.255    0.720,0.518    1.226,0.844    3.423,2.831    19.234,17.544
1000  0.192,0.189    0.249,0.225    1.162,1.043    2.257,2.093    5.853,4.997    ---,10.399     ---,32.198

We see that the new code is faster in pretty much all the cases, and starting from 4 processes there are significant gains, with the new code resulting in up to 20-times shorter runtimes. Also, for large numbers of cached entries, not all values for the old code could be measured (marked --- above), as the kernel started hitting softlockups and died before the test completed.

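For readers who want a feel for the semantics the new cache provides (keys may repeat, but a key/block pair is unique, and lookups walk all entries sharing a key), the following is a minimal user-space sketch. It is an illustration only, not the kernel code: every toy_* name is hypothetical, the multiplicative hash merely stands in for the kernel's hash_32(), and locking, reference counting, and the LRU list are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_BUCKET_BITS 4
#define TOY_BUCKETS (1u << TOY_BUCKET_BITS)

/* One cached (key, block) pair; chained per hash bucket. */
struct toy_entry {
	struct toy_entry *next;
	uint32_t key;	/* e.g. hash of an xattr block's contents */
	uint64_t block;	/* e.g. block number holding that xattr block */
};

struct toy_cache {
	struct toy_entry *hash[TOY_BUCKETS];
};

/* Hypothetical stand-in for hash_32(key, bits). */
static unsigned toy_bucket(uint32_t key)
{
	return (uint32_t)(key * 0x9E3779B1u) >> (32 - TOY_BUCKET_BITS);
}

/* Returns 0 on success, -EBUSY if (key, block) is already cached. */
static int toy_create(struct toy_cache *c, uint32_t key, uint64_t block)
{
	unsigned b = toy_bucket(key);
	struct toy_entry *e;

	for (e = c->hash[b]; e; e = e->next)
		if (e->key == key && e->block == block)
			return -EBUSY;
	e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->key = key;
	e->block = block;
	e->next = c->hash[b];	/* add at the head of the chain */
	c->hash[b] = e;
	return 0;
}

/*
 * With prev == NULL, return the first entry with @key; otherwise return
 * the next entry after @prev with the same key. NULL when none is left.
 */
static struct toy_entry *toy_find(struct toy_cache *c, struct toy_entry *prev,
				  uint32_t key)
{
	struct toy_entry *e;

	assert(!prev || prev->key == key);
	e = prev ? prev->next : c->hash[toy_bucket(key)];
	while (e && e->key != key)
		e = e->next;
	return e;
}

/* Remove the unique entry matching (key, block); returns 1 if it existed. */
static int toy_delete_block(struct toy_cache *c, uint32_t key, uint64_t block)
{
	struct toy_entry **pp = &c->hash[toy_bucket(key)];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->key == key && (*pp)->block == block) {
			struct toy_entry *victim = *pp;

			*pp = victim->next;
			free(victim);
			return 1;
		}
	}
	return 0;
}
```

A caller would start from a zero-initialized struct toy_cache, insert with toy_create(), and iterate over all candidates for a key by calling toy_find() with NULL first and then with the previously returned entry. Because a key/block pair is unique, toy_delete_block() never has to look past the first match.
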
Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit f9a61eb4e2471c56a63cd804c7474128138c38ac) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/Makefile | 2 +- fs/mbcache2.c | 359 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/mbcache2.h | 50 +++++++ 3 files changed, 410 insertions(+), 1 deletion(-) create mode 100644 fs/mbcache2.c create mode 100644 include/linux/mbcache2.h diff --git a/fs/Makefile b/fs/Makefile index a7c7f160371c..b7078cf5437c 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -41,7 +41,7 @@ obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o -obj-$(CONFIG_FS_MBCACHE) += mbcache.o +obj-$(CONFIG_FS_MBCACHE) += mbcache.o mbcache2.o obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o obj-$(CONFIG_NFS_COMMON) += nfs_common/ obj-$(CONFIG_COREDUMP) += coredump.o diff --git a/fs/mbcache2.c b/fs/mbcache2.c new file mode 100644 index 000000000000..5c3e1a8c38f6 --- /dev/null +++ b/fs/mbcache2.c @@ -0,0 +1,359 @@ +#include +#include +#include +#include +#include +#include +#include + +/* + * Mbcache is a simple key-value store. Keys need not be unique, however + * key-value pairs are expected to be unique (we use this fact in + * mb2_cache_entry_delete_block()). + * + * Ext2 and ext4 use this cache for deduplication of extended attribute blocks. + * They use hash of a block contents as a key and block number as a value. + * That's why keys need not be unique (different xattr blocks may end up having + * the same hash). However block number always uniquely identifies a cache + * entry. + * + * We provide functions for creation and removal of entries, search by key, + * and a special "delete entry with given key-value pair" operation. Fixed + * size hash table is used for fast key lookups. 
+ */ + +struct mb2_cache { + /* Hash table of entries */ + struct hlist_bl_head *c_hash; + /* log2 of hash table size */ + int c_bucket_bits; + /* Protects c_lru_list, c_entry_count */ + spinlock_t c_lru_list_lock; + struct list_head c_lru_list; + /* Number of entries in cache */ + unsigned long c_entry_count; + struct shrinker c_shrink; +}; + +static struct kmem_cache *mb2_entry_cache; + +/* + * mb2_cache_entry_create - create entry in cache + * @cache - cache where the entry should be created + * @mask - gfp mask with which the entry should be allocated + * @key - key of the entry + * @block - block that contains data + * + * Creates entry in @cache with key @key and records that data is stored in + * block @block. The function returns -EBUSY if entry with the same key + * and for the same block already exists in cache. Otherwise 0 is returned. + */ +int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, + sector_t block) +{ + struct mb2_cache_entry *entry, *dup; + struct hlist_bl_node *dup_node; + struct hlist_bl_head *head; + + entry = kmem_cache_alloc(mb2_entry_cache, mask); + if (!entry) + return -ENOMEM; + + INIT_LIST_HEAD(&entry->e_lru_list); + /* One ref for hash, one ref returned */ + atomic_set(&entry->e_refcnt, 1); + entry->e_key = key; + entry->e_block = block; + head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; + entry->e_hash_list_head = head; + hlist_bl_lock(head); + hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) { + if (dup->e_key == key && dup->e_block == block) { + hlist_bl_unlock(head); + kmem_cache_free(mb2_entry_cache, entry); + return -EBUSY; + } + } + hlist_bl_add_head(&entry->e_hash_list, head); + hlist_bl_unlock(head); + + spin_lock(&cache->c_lru_list_lock); + list_add_tail(&entry->e_lru_list, &cache->c_lru_list); + /* Grab ref for LRU list */ + atomic_inc(&entry->e_refcnt); + cache->c_entry_count++; + spin_unlock(&cache->c_lru_list_lock); + + return 0; +} +EXPORT_SYMBOL(mb2_cache_entry_create); + 
+void __mb2_cache_entry_free(struct mb2_cache_entry *entry) +{ + kmem_cache_free(mb2_entry_cache, entry); +} +EXPORT_SYMBOL(__mb2_cache_entry_free); + +static struct mb2_cache_entry *__entry_find(struct mb2_cache *cache, + struct mb2_cache_entry *entry, + u32 key) +{ + struct mb2_cache_entry *old_entry = entry; + struct hlist_bl_node *node; + struct hlist_bl_head *head; + + if (entry) + head = entry->e_hash_list_head; + else + head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; + hlist_bl_lock(head); + if (entry && !hlist_bl_unhashed(&entry->e_hash_list)) + node = entry->e_hash_list.next; + else + node = hlist_bl_first(head); + while (node) { + entry = hlist_bl_entry(node, struct mb2_cache_entry, + e_hash_list); + if (entry->e_key == key) { + atomic_inc(&entry->e_refcnt); + goto out; + } + node = node->next; + } + entry = NULL; +out: + hlist_bl_unlock(head); + if (old_entry) + mb2_cache_entry_put(cache, old_entry); + + return entry; +} + +/* + * mb2_cache_entry_find_first - find the first entry in cache with given key + * @cache: cache where we should search + * @key: key to look for + * + * Search in @cache for entry with key @key. Grabs reference to the first + * entry found and returns the entry. + */ +struct mb2_cache_entry *mb2_cache_entry_find_first(struct mb2_cache *cache, + u32 key) +{ + return __entry_find(cache, NULL, key); +} +EXPORT_SYMBOL(mb2_cache_entry_find_first); + +/* + * mb2_cache_entry_find_next - find next entry in cache with the same + * @cache: cache where we should search + * @entry: entry to start search from + * + * Finds next entry in the hash chain which has the same key as @entry. + * If @entry is unhashed (which can happen when deletion of entry races + * with the search), finds the first entry in the hash chain. The function + * drops reference to @entry and returns with a reference to the found entry. 
+ */ +struct mb2_cache_entry *mb2_cache_entry_find_next(struct mb2_cache *cache, + struct mb2_cache_entry *entry) +{ + return __entry_find(cache, entry, entry->e_key); +} +EXPORT_SYMBOL(mb2_cache_entry_find_next); + +/* mb2_cache_entry_delete_block - remove information about block from cache + * @cache - cache we work with + * @key - key of the entry to remove + * @block - block containing data for @key + * + * Remove entry from cache @cache with key @key with data stored in @block. + */ +void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key, + sector_t block) +{ + struct hlist_bl_node *node; + struct hlist_bl_head *head; + struct mb2_cache_entry *entry; + + head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; + hlist_bl_lock(head); + hlist_bl_for_each_entry(entry, node, head, e_hash_list) { + if (entry->e_key == key && entry->e_block == block) { + /* We keep hash list reference to keep entry alive */ + hlist_bl_del_init(&entry->e_hash_list); + hlist_bl_unlock(head); + spin_lock(&cache->c_lru_list_lock); + if (!list_empty(&entry->e_lru_list)) { + list_del_init(&entry->e_lru_list); + cache->c_entry_count--; + atomic_dec(&entry->e_refcnt); + } + spin_unlock(&cache->c_lru_list_lock); + mb2_cache_entry_put(cache, entry); + return; + } + } + hlist_bl_unlock(head); +} +EXPORT_SYMBOL(mb2_cache_entry_delete_block); + +/* mb2_cache_entry_touch - cache entry got used + * @cache - cache the entry belongs to + * @entry - entry that got used + * + * Move entry in lru list to reflect the fact that it was used. 
+ */ +void mb2_cache_entry_touch(struct mb2_cache *cache, + struct mb2_cache_entry *entry) +{ + spin_lock(&cache->c_lru_list_lock); + if (!list_empty(&entry->e_lru_list)) + list_move_tail(&entry->e_lru_list, &cache->c_lru_list); + spin_unlock(&cache->c_lru_list_lock); +} +EXPORT_SYMBOL(mb2_cache_entry_touch); + +static unsigned long mb2_cache_count(struct shrinker *shrink, + struct shrink_control *sc) +{ + struct mb2_cache *cache = container_of(shrink, struct mb2_cache, + c_shrink); + + return cache->c_entry_count; +} + +/* Shrink number of entries in cache */ +static unsigned long mb2_cache_scan(struct shrinker *shrink, + struct shrink_control *sc) +{ + int nr_to_scan = sc->nr_to_scan; + struct mb2_cache *cache = container_of(shrink, struct mb2_cache, + c_shrink); + struct mb2_cache_entry *entry; + struct hlist_bl_head *head; + unsigned int shrunk = 0; + + spin_lock(&cache->c_lru_list_lock); + while (nr_to_scan-- && !list_empty(&cache->c_lru_list)) { + entry = list_first_entry(&cache->c_lru_list, + struct mb2_cache_entry, e_lru_list); + list_del_init(&entry->e_lru_list); + cache->c_entry_count--; + /* + * We keep LRU list reference so that entry doesn't go away + * from under us. + */ + spin_unlock(&cache->c_lru_list_lock); + head = entry->e_hash_list_head; + hlist_bl_lock(head); + if (!hlist_bl_unhashed(&entry->e_hash_list)) { + hlist_bl_del_init(&entry->e_hash_list); + atomic_dec(&entry->e_refcnt); + } + hlist_bl_unlock(head); + if (mb2_cache_entry_put(cache, entry)) + shrunk++; + cond_resched(); + spin_lock(&cache->c_lru_list_lock); + } + spin_unlock(&cache->c_lru_list_lock); + + return shrunk; +} + +/* + * mb2_cache_create - create cache + * @bucket_bits: log2 of the hash table size + * + * Create cache for keys with 2^bucket_bits hash entries. 
+ */ +struct mb2_cache *mb2_cache_create(int bucket_bits) +{ + struct mb2_cache *cache; + int bucket_count = 1 << bucket_bits; + int i; + + if (!try_module_get(THIS_MODULE)) + return NULL; + + cache = kzalloc(sizeof(struct mb2_cache), GFP_KERNEL); + if (!cache) + goto err_out; + cache->c_bucket_bits = bucket_bits; + INIT_LIST_HEAD(&cache->c_lru_list); + spin_lock_init(&cache->c_lru_list_lock); + cache->c_hash = kmalloc(bucket_count * sizeof(struct hlist_bl_head), + GFP_KERNEL); + if (!cache->c_hash) { + kfree(cache); + goto err_out; + } + for (i = 0; i < bucket_count; i++) + INIT_HLIST_BL_HEAD(&cache->c_hash[i]); + + cache->c_shrink.count_objects = mb2_cache_count; + cache->c_shrink.scan_objects = mb2_cache_scan; + cache->c_shrink.seeks = DEFAULT_SEEKS; + register_shrinker(&cache->c_shrink); + + return cache; + +err_out: + module_put(THIS_MODULE); + return NULL; +} +EXPORT_SYMBOL(mb2_cache_create); + +/* + * mb2_cache_destroy - destroy cache + * @cache: the cache to destroy + * + * Free all entries in cache and cache itself. Caller must make sure nobody + * (except shrinker) can reach @cache when calling this. + */ +void mb2_cache_destroy(struct mb2_cache *cache) +{ + struct mb2_cache_entry *entry, *next; + + unregister_shrinker(&cache->c_shrink); + + /* + * We don't bother with any locking. Cache must not be used at this + * point. 
+ */ + list_for_each_entry_safe(entry, next, &cache->c_lru_list, e_lru_list) { + if (!hlist_bl_unhashed(&entry->e_hash_list)) { + hlist_bl_del_init(&entry->e_hash_list); + atomic_dec(&entry->e_refcnt); + } else + WARN_ON(1); + list_del(&entry->e_lru_list); + WARN_ON(atomic_read(&entry->e_refcnt) != 1); + mb2_cache_entry_put(cache, entry); + } + kfree(cache->c_hash); + kfree(cache); + module_put(THIS_MODULE); +} +EXPORT_SYMBOL(mb2_cache_destroy); + +static int __init mb2cache_init(void) +{ + mb2_entry_cache = kmem_cache_create("mbcache", + sizeof(struct mb2_cache_entry), 0, + SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL); + BUG_ON(!mb2_entry_cache); + return 0; +} + +static void __exit mb2cache_exit(void) +{ + kmem_cache_destroy(mb2_entry_cache); +} + +module_init(mb2cache_init) +module_exit(mb2cache_exit) + +MODULE_AUTHOR("Jan Kara "); +MODULE_DESCRIPTION("Meta block cache (for extended attributes)"); +MODULE_LICENSE("GPL"); diff --git a/include/linux/mbcache2.h b/include/linux/mbcache2.h new file mode 100644 index 000000000000..b6f160ff2533 --- /dev/null +++ b/include/linux/mbcache2.h @@ -0,0 +1,50 @@ +#ifndef _LINUX_MB2CACHE_H +#define _LINUX_MB2CACHE_H + +#include +#include +#include +#include +#include + +struct mb2_cache; + +struct mb2_cache_entry { + /* LRU list - protected by cache->c_lru_list_lock */ + struct list_head e_lru_list; + /* Hash table list - protected by bitlock in e_hash_list_head */ + struct hlist_bl_node e_hash_list; + atomic_t e_refcnt; + /* Key in hash - stable during lifetime of the entry */ + u32 e_key; + /* Block number of hashed block - stable during lifetime of the entry */ + sector_t e_block; + /* Head of hash list (for list bit lock) - stable */ + struct hlist_bl_head *e_hash_list_head; +}; + +struct mb2_cache *mb2_cache_create(int bucket_bits); +void mb2_cache_destroy(struct mb2_cache *cache); + +int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, + sector_t block); +void __mb2_cache_entry_free(struct 
mb2_cache_entry *entry); +static inline int mb2_cache_entry_put(struct mb2_cache *cache, + struct mb2_cache_entry *entry) +{ + if (!atomic_dec_and_test(&entry->e_refcnt)) + return 0; + __mb2_cache_entry_free(entry); + return 1; +} + +void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key, + sector_t block); +struct mb2_cache_entry *mb2_cache_entry_find_first(struct mb2_cache *cache, + u32 key); +struct mb2_cache_entry *mb2_cache_entry_find_next(struct mb2_cache *cache, + struct mb2_cache_entry *entry); +void mb2_cache_entry_touch(struct mb2_cache *cache, + struct mb2_cache_entry *entry); + +#endif /* _LINUX_MB2CACHE_H */ From patchwork Thu Nov 16 16:17:40 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 838653 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3yd5yc60mxz9s9Y; Fri, 17 Nov 2017 03:18:32 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1eFMs0-00006T-M2; Thu, 16 Nov 2017 16:18:28 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1eFMrv-0008Ud-3Q for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:23 +0000 Received: from 1.general.cascardo.us.vpn ([10.172.70.58] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 
1eFMru-0000ir-5F for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:22 +0000 From: Thadeu Lima de Souza Cascardo To: kernel-team@lists.ubuntu.com Subject: [PATCH 02/11] ext2: convert to mbcache2 Date: Thu, 16 Nov 2017 14:17:40 -0200 Message-Id: <20171116161749.15878-3-cascardo@canonical.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com> References: <20171116161749.15878-1-cascardo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jan Kara The conversion is generally straightforward. We convert filesystem from a global cache to per-fs one. Similarly to ext4 the tricky part is that xattr block corresponding to found mbcache entry can get freed before we get buffer lock for that block. So we have to check whether the entry is still valid after getting the buffer lock. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit be0726d33cb8f411945884664924bed3cb8c70ee) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/ext2/ext2.h | 3 ++ fs/ext2/super.c | 25 ++++++---- fs/ext2/xattr.c | 143 ++++++++++++++++++++++++++------------------------------ fs/ext2/xattr.h | 21 ++------- 4 files changed, 92 insertions(+), 100 deletions(-) diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index 4c69c94cafd8..f98ce7e60a0f 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -61,6 +61,8 @@ struct ext2_block_alloc_info { #define rsv_start rsv_window._rsv_start #define rsv_end rsv_window._rsv_end +struct mb2_cache; + /* * second extended-fs super-block data in memory */ @@ -111,6 +113,7 @@ struct ext2_sb_info { * of the mount options. 
*/ spinlock_t s_lock; + struct mb2_cache *s_mb_cache; }; static inline spinlock_t * diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 748d35afc902..111a31761ffa 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -131,7 +131,10 @@ static void ext2_put_super (struct super_block * sb) dquot_disable(sb, -1, DQUOT_USAGE_ENABLED | DQUOT_LIMITS_ENABLED); - ext2_xattr_put_super(sb); + if (sbi->s_mb_cache) { + ext2_xattr_destroy_cache(sbi->s_mb_cache); + sbi->s_mb_cache = NULL; + } if (!(sb->s_flags & MS_RDONLY)) { struct ext2_super_block *es = sbi->s_es; @@ -1104,6 +1107,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent) ext2_msg(sb, KERN_ERR, "error: insufficient memory"); goto failed_mount3; } + +#ifdef CONFIG_EXT2_FS_XATTR + sbi->s_mb_cache = ext2_xattr_create_cache(); + if (!sbi->s_mb_cache) { + ext2_msg(sb, KERN_ERR, "Failed to create an mb_cache"); + goto failed_mount3; + } +#endif /* * set up enough so that it can read an inode */ @@ -1149,6 +1160,8 @@ cantfind_ext2: sb->s_id); goto failed_mount; failed_mount3: + if (sbi->s_mb_cache) + ext2_xattr_destroy_cache(sbi->s_mb_cache); percpu_counter_destroy(&sbi->s_freeblocks_counter); percpu_counter_destroy(&sbi->s_freeinodes_counter); percpu_counter_destroy(&sbi->s_dirs_counter); @@ -1555,20 +1568,17 @@ MODULE_ALIAS_FS("ext2"); static int __init init_ext2_fs(void) { - int err = init_ext2_xattr(); - if (err) - return err; + int err; + err = init_inodecache(); if (err) - goto out1; + return err; err = register_filesystem(&ext2_fs_type); if (err) goto out; return 0; out: destroy_inodecache(); -out1: - exit_ext2_xattr(); return err; } @@ -1576,7 +1586,6 @@ static void __exit exit_ext2_fs(void) { unregister_filesystem(&ext2_fs_type); destroy_inodecache(); - exit_ext2_xattr(); } MODULE_AUTHOR("Remy Card and others"); diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c index fa70848afa8f..24736c8b3d51 100644 --- a/fs/ext2/xattr.c +++ b/fs/ext2/xattr.c @@ -56,7 +56,7 @@ #include #include #include 
-#include +#include #include #include #include @@ -92,14 +92,12 @@ static int ext2_xattr_set2(struct inode *, struct buffer_head *, struct ext2_xattr_header *); -static int ext2_xattr_cache_insert(struct buffer_head *); +static int ext2_xattr_cache_insert(struct mb2_cache *, struct buffer_head *); static struct buffer_head *ext2_xattr_cache_find(struct inode *, struct ext2_xattr_header *); static void ext2_xattr_rehash(struct ext2_xattr_header *, struct ext2_xattr_entry *); -static struct mb_cache *ext2_xattr_cache; - static const struct xattr_handler *ext2_xattr_handler_map[] = { [EXT2_XATTR_INDEX_USER] = &ext2_xattr_user_handler, #ifdef CONFIG_EXT2_FS_POSIX_ACL @@ -154,6 +152,7 @@ ext2_xattr_get(struct inode *inode, int name_index, const char *name, size_t name_len, size; char *end; int error; + struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld", name_index, name, buffer, (long)buffer_size); @@ -198,7 +197,7 @@ bad_block: ext2_error(inode->i_sb, "ext2_xattr_get", goto found; entry = next; } - if (ext2_xattr_cache_insert(bh)) + if (ext2_xattr_cache_insert(ext2_mb_cache, bh)) ea_idebug(inode, "cache insert failed"); error = -ENODATA; goto cleanup; @@ -211,7 +210,7 @@ found: le16_to_cpu(entry->e_value_offs) + size > inode->i_sb->s_blocksize) goto bad_block; - if (ext2_xattr_cache_insert(bh)) + if (ext2_xattr_cache_insert(ext2_mb_cache, bh)) ea_idebug(inode, "cache insert failed"); if (buffer) { error = -ERANGE; @@ -249,6 +248,7 @@ ext2_xattr_list(struct dentry *dentry, char *buffer, size_t buffer_size) char *end; size_t rest = buffer_size; int error; + struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; ea_idebug(inode, "buffer=%p, buffer_size=%ld", buffer, (long)buffer_size); @@ -283,7 +283,7 @@ bad_block: ext2_error(inode->i_sb, "ext2_xattr_list", goto bad_block; entry = next; } - if (ext2_xattr_cache_insert(bh)) + if (ext2_xattr_cache_insert(ext2_mb_cache, bh)) 
ea_idebug(inode, "cache insert failed"); /* list the attribute names */ @@ -480,22 +480,23 @@ bad_block: ext2_error(sb, "ext2_xattr_set", /* Here we know that we can set the new attribute. */ if (header) { - struct mb_cache_entry *ce; - /* assert(header == HDR(bh)); */ - ce = mb_cache_entry_get(ext2_xattr_cache, bh->b_bdev, - bh->b_blocknr); lock_buffer(bh); if (header->h_refcount == cpu_to_le32(1)) { + __u32 hash = le32_to_cpu(header->h_hash); + ea_bdebug(bh, "modifying in-place"); - if (ce) - mb_cache_entry_free(ce); + /* + * This must happen under buffer lock for + * ext2_xattr_set2() to reliably detect modified block + */ + mb2_cache_entry_delete_block(EXT2_SB(sb)->s_mb_cache, + hash, bh->b_blocknr); + /* keep the buffer locked while modifying it. */ } else { int offset; - if (ce) - mb_cache_entry_release(ce); unlock_buffer(bh); ea_bdebug(bh, "cloning"); header = kmalloc(bh->b_size, GFP_KERNEL); @@ -623,6 +624,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, struct super_block *sb = inode->i_sb; struct buffer_head *new_bh = NULL; int error; + struct mb2_cache *ext2_mb_cache = EXT2_SB(sb)->s_mb_cache; if (header) { new_bh = ext2_xattr_cache_find(inode, header); @@ -650,7 +652,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, don't need to change the reference count. 
*/ new_bh = old_bh; get_bh(new_bh); - ext2_xattr_cache_insert(new_bh); + ext2_xattr_cache_insert(ext2_mb_cache, new_bh); } else { /* We need to allocate a new block */ ext2_fsblk_t goal = ext2_group_first_block_no(sb, @@ -671,7 +673,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, memcpy(new_bh->b_data, header, new_bh->b_size); set_buffer_uptodate(new_bh); unlock_buffer(new_bh); - ext2_xattr_cache_insert(new_bh); + ext2_xattr_cache_insert(ext2_mb_cache, new_bh); ext2_xattr_update_super_block(sb); } @@ -704,19 +706,21 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, error = 0; if (old_bh && old_bh != new_bh) { - struct mb_cache_entry *ce; - /* * If there was an old block and we are no longer using it, * release the old block. */ - ce = mb_cache_entry_get(ext2_xattr_cache, old_bh->b_bdev, - old_bh->b_blocknr); lock_buffer(old_bh); if (HDR(old_bh)->h_refcount == cpu_to_le32(1)) { + __u32 hash = le32_to_cpu(HDR(old_bh)->h_hash); + + /* + * This must happen under buffer lock for + * ext2_xattr_set2() to reliably detect freed block + */ + mb2_cache_entry_delete_block(ext2_mb_cache, + hash, old_bh->b_blocknr); /* Free the old block. */ - if (ce) - mb_cache_entry_free(ce); ea_bdebug(old_bh, "freeing"); ext2_free_blocks(inode, old_bh->b_blocknr, 1); mark_inode_dirty(inode); @@ -727,8 +731,6 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, } else { /* Decrement the refcount only. 
*/ le32_add_cpu(&HDR(old_bh)->h_refcount, -1); - if (ce) - mb_cache_entry_release(ce); dquot_free_block_nodirty(inode, 1); mark_inode_dirty(inode); mark_buffer_dirty(old_bh); @@ -754,7 +756,6 @@ void ext2_xattr_delete_inode(struct inode *inode) { struct buffer_head *bh = NULL; - struct mb_cache_entry *ce; down_write(&EXT2_I(inode)->xattr_sem); if (!EXT2_I(inode)->i_file_acl) @@ -774,19 +775,22 @@ ext2_xattr_delete_inode(struct inode *inode) EXT2_I(inode)->i_file_acl); goto cleanup; } - ce = mb_cache_entry_get(ext2_xattr_cache, bh->b_bdev, bh->b_blocknr); lock_buffer(bh); if (HDR(bh)->h_refcount == cpu_to_le32(1)) { - if (ce) - mb_cache_entry_free(ce); + __u32 hash = le32_to_cpu(HDR(bh)->h_hash); + + /* + * This must happen under buffer lock for ext2_xattr_set2() to + * reliably detect freed block + */ + mb2_cache_entry_delete_block(EXT2_SB(inode->i_sb)->s_mb_cache, + hash, bh->b_blocknr); ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1); get_bh(bh); bforget(bh); unlock_buffer(bh); } else { le32_add_cpu(&HDR(bh)->h_refcount, -1); - if (ce) - mb_cache_entry_release(ce); ea_bdebug(bh, "refcount now=%d", le32_to_cpu(HDR(bh)->h_refcount)); unlock_buffer(bh); @@ -802,18 +806,6 @@ cleanup: up_write(&EXT2_I(inode)->xattr_sem); } -/* - * ext2_xattr_put_super() - * - * This is called when a file system is unmounted. - */ -void -ext2_xattr_put_super(struct super_block *sb) -{ - mb_cache_shrink(sb->s_bdev); -} - - /* * ext2_xattr_cache_insert() * @@ -823,28 +815,20 @@ ext2_xattr_put_super(struct super_block *sb) * Returns 0, or a negative error number on failure. 
*/ static int -ext2_xattr_cache_insert(struct buffer_head *bh) +ext2_xattr_cache_insert(struct mb2_cache *cache, struct buffer_head *bh) { __u32 hash = le32_to_cpu(HDR(bh)->h_hash); - struct mb_cache_entry *ce; int error; - ce = mb_cache_entry_alloc(ext2_xattr_cache, GFP_NOFS); - if (!ce) - return -ENOMEM; - error = mb_cache_entry_insert(ce, bh->b_bdev, bh->b_blocknr, hash); + error = mb2_cache_entry_create(cache, GFP_NOFS, hash, bh->b_blocknr); if (error) { - mb_cache_entry_free(ce); if (error == -EBUSY) { ea_bdebug(bh, "already in cache (%d cache entries)", atomic_read(&ext2_xattr_cache->c_entry_count)); error = 0; } - } else { - ea_bdebug(bh, "inserting [%x] (%d cache entries)", (int)hash, - atomic_read(&ext2_xattr_cache->c_entry_count)); - mb_cache_entry_release(ce); - } + } else + ea_bdebug(bh, "inserting [%x]", (int)hash); return error; } @@ -900,23 +884,17 @@ static struct buffer_head * ext2_xattr_cache_find(struct inode *inode, struct ext2_xattr_header *header) { __u32 hash = le32_to_cpu(header->h_hash); - struct mb_cache_entry *ce; + struct mb2_cache_entry *ce; + struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; if (!header->h_hash) return NULL; /* never share */ ea_idebug(inode, "looking for cached blocks [%x]", (int)hash); again: - ce = mb_cache_entry_find_first(ext2_xattr_cache, inode->i_sb->s_bdev, - hash); + ce = mb2_cache_entry_find_first(ext2_mb_cache, hash); while (ce) { struct buffer_head *bh; - if (IS_ERR(ce)) { - if (PTR_ERR(ce) == -EAGAIN) - goto again; - break; - } - bh = sb_bread(inode->i_sb, ce->e_block); if (!bh) { ext2_error(inode->i_sb, "ext2_xattr_cache_find", @@ -924,7 +902,21 @@ again: inode->i_ino, (unsigned long) ce->e_block); } else { lock_buffer(bh); - if (le32_to_cpu(HDR(bh)->h_refcount) > + /* + * We have to be careful about races with freeing or + * rehashing of xattr block. Once we hold buffer lock + * xattr block's state is stable so we can check + * whether the block got freed / rehashed or not. 
+ * Since we unhash mbcache entry under buffer lock when + * freeing / rehashing xattr block, checking whether + * entry is still hashed is reliable. + */ + if (hlist_bl_unhashed(&ce->e_hash_list)) { + mb2_cache_entry_put(ext2_mb_cache, ce); + unlock_buffer(bh); + brelse(bh); + goto again; + } else if (le32_to_cpu(HDR(bh)->h_refcount) > EXT2_XATTR_REFCOUNT_MAX) { ea_idebug(inode, "block %ld refcount %d>%d", (unsigned long) ce->e_block, @@ -933,13 +925,14 @@ again: } else if (!ext2_xattr_cmp(header, HDR(bh))) { ea_bdebug(bh, "b_count=%d", atomic_read(&(bh->b_count))); - mb_cache_entry_release(ce); + mb2_cache_entry_touch(ext2_mb_cache, ce); + mb2_cache_entry_put(ext2_mb_cache, ce); return bh; } unlock_buffer(bh); brelse(bh); } - ce = mb_cache_entry_find_next(ce, inode->i_sb->s_bdev, hash); + ce = mb2_cache_entry_find_next(ext2_mb_cache, ce); } return NULL; } @@ -1012,17 +1005,15 @@ static void ext2_xattr_rehash(struct ext2_xattr_header *header, #undef BLOCK_HASH_SHIFT -int __init -init_ext2_xattr(void) +#define HASH_BUCKET_BITS 10 + +struct mb2_cache *ext2_xattr_create_cache(void) { - ext2_xattr_cache = mb_cache_create("ext2_xattr", 6); - if (!ext2_xattr_cache) - return -ENOMEM; - return 0; + return mb2_cache_create(HASH_BUCKET_BITS); } -void -exit_ext2_xattr(void) +void ext2_xattr_destroy_cache(struct mb2_cache *cache) { - mb_cache_destroy(ext2_xattr_cache); + if (cache) + mb2_cache_destroy(cache); } diff --git a/fs/ext2/xattr.h b/fs/ext2/xattr.h index 60edf298644e..6ea38aa9563a 100644 --- a/fs/ext2/xattr.h +++ b/fs/ext2/xattr.h @@ -53,6 +53,8 @@ struct ext2_xattr_entry { #define EXT2_XATTR_SIZE(size) \ (((size) + EXT2_XATTR_ROUND) & ~EXT2_XATTR_ROUND) +struct mb2_cache; + # ifdef CONFIG_EXT2_FS_XATTR extern const struct xattr_handler ext2_xattr_user_handler; @@ -65,10 +67,9 @@ extern int ext2_xattr_get(struct inode *, int, const char *, void *, size_t); extern int ext2_xattr_set(struct inode *, int, const char *, const void *, size_t, int); extern void 
ext2_xattr_delete_inode(struct inode *); -extern void ext2_xattr_put_super(struct super_block *); -extern int init_ext2_xattr(void); -extern void exit_ext2_xattr(void); +extern struct mb2_cache *ext2_xattr_create_cache(void); +extern void ext2_xattr_destroy_cache(struct mb2_cache *cache); extern const struct xattr_handler *ext2_xattr_handlers[]; @@ -93,19 +94,7 @@ ext2_xattr_delete_inode(struct inode *inode) { } -static inline void -ext2_xattr_put_super(struct super_block *sb) -{ -} - -static inline int -init_ext2_xattr(void) -{ - return 0; -} - -static inline void -exit_ext2_xattr(void) +static inline void ext2_xattr_destroy_cache(struct mb2_cache *cache) { } From patchwork Thu Nov 16 16:17:41 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 838654 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3yd5yd4rXZz9t2R; Fri, 17 Nov 2017 03:18:33 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1eFMs0-00006o-TW; Thu, 16 Nov 2017 16:18:28 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1eFMrw-0008VB-9r for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:24 +0000 Received: from 1.general.cascardo.us.vpn ([10.172.70.58] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) 
(envelope-from ) id 1eFMrv-0000ir-Jw for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:24 +0000 From: Thadeu Lima de Souza Cascardo To: kernel-team@lists.ubuntu.com Subject: [PATCH 03/11] ext4: convert to mbcache2 Date: Thu, 16 Nov 2017 14:17:41 -0200 Message-Id: <20171116161749.15878-4-cascardo@canonical.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com> References: <20171116161749.15878-1-cascardo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jan Kara The conversion is generally straightforward. The only tricky part is that the xattr block corresponding to a found mbcache entry can get freed before we get the buffer lock for that block. So we have to check whether the entry is still valid after getting the buffer lock. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (backported from commit 82939d7999dfc1f1998c4b1c12e2f19edbdff272) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/ext4/ext4.h | 2 +- fs/ext4/super.c | 7 ++- fs/ext4/xattr.c | 136 ++++++++++++++++++++++++++++---------------------------- fs/ext4/xattr.h | 5 +-- 4 files changed, 75 insertions(+), 75 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 331f82510d15..25e36e527b0e 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1441,7 +1441,7 @@ struct ext4_sb_info { struct list_head s_es_list; /* List of inodes with reclaimable extents */ long s_es_nr_inode; struct ext4_es_stats s_es_stats; - struct mb_cache *s_mb_cache; + struct mb2_cache *s_mb_cache; spinlock_t s_es_lock ____cacheline_aligned_in_smp; /* Ratelimit ext4 messages.
*/ diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 9181cec0224c..1bb69c90358c 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -821,7 +821,6 @@ static void ext4_put_super(struct super_block *sb) ext4_release_system_zone(sb); ext4_mb_release(sb); ext4_ext_release(sb); - ext4_xattr_put_super(sb); if (!(sb->s_flags & MS_RDONLY) && !aborted) { ext4_clear_feature_journal_needs_recovery(sb); @@ -3889,7 +3888,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) no_journal: if (ext4_mballoc_ready) { - sbi->s_mb_cache = ext4_xattr_create_cache(sb->s_id); + sbi->s_mb_cache = ext4_xattr_create_cache(); if (!sbi->s_mb_cache) { ext4_msg(sb, KERN_ERR, "Failed to create an mb_cache"); goto failed_mount_wq; @@ -4121,6 +4120,10 @@ failed_mount4: if (EXT4_SB(sb)->rsv_conversion_wq) destroy_workqueue(EXT4_SB(sb)->rsv_conversion_wq); failed_mount_wq: + if (sbi->s_mb_cache) { + ext4_xattr_destroy_cache(sbi->s_mb_cache); + sbi->s_mb_cache = NULL; + } if (sbi->s_journal) { jbd2_journal_destroy(sbi->s_journal); sbi->s_journal = NULL; diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 571166eb1dfc..ba6b8ac5f462 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -53,7 +53,7 @@ #include #include #include -#include +#include #include #include "ext4_jbd2.h" #include "ext4.h" @@ -80,10 +80,10 @@ # define ea_bdebug(bh, fmt, ...) 
no_printk(fmt, ##__VA_ARGS__) #endif -static void ext4_xattr_cache_insert(struct mb_cache *, struct buffer_head *); +static void ext4_xattr_cache_insert(struct mb2_cache *, struct buffer_head *); static struct buffer_head *ext4_xattr_cache_find(struct inode *, struct ext4_xattr_header *, - struct mb_cache_entry **); + struct mb2_cache_entry **); static void ext4_xattr_rehash(struct ext4_xattr_header *, struct ext4_xattr_entry *); static int ext4_xattr_list(struct dentry *dentry, char *buffer, @@ -295,7 +295,7 @@ ext4_xattr_block_get(struct inode *inode, int name_index, const char *name, struct ext4_xattr_entry *entry; size_t size; int error; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld", name_index, name, buffer, (long)buffer_size); @@ -442,7 +442,7 @@ ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size) struct inode *inode = d_inode(dentry); struct buffer_head *bh = NULL; int error; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); ea_idebug(inode, "buffer=%p, buffer_size=%ld", buffer, (long)buffer_size); @@ -559,11 +559,8 @@ static void ext4_xattr_release_block(handle_t *handle, struct inode *inode, struct buffer_head *bh) { - struct mb_cache_entry *ce = NULL; int error = 0; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); - ce = mb_cache_entry_get(ext4_mb_cache, bh->b_bdev, bh->b_blocknr); BUFFER_TRACE(bh, "get_write_access"); error = ext4_journal_get_write_access(handle, bh); if (error) @@ -571,9 +568,15 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode, lock_buffer(bh); if (BHDR(bh)->h_refcount == cpu_to_le32(1)) { + __u32 hash = le32_to_cpu(BHDR(bh)->h_hash); + ea_bdebug(bh, "refcount now=0; freeing"); - if (ce) - mb_cache_entry_free(ce); + /* + * This must happen under buffer lock for + * 
ext4_xattr_block_set() to reliably detect freed block + */ + mb2_cache_entry_delete_block(EXT4_GET_MB_CACHE(inode), hash, + bh->b_blocknr); get_bh(bh); unlock_buffer(bh); ext4_free_blocks(handle, inode, bh, 0, 1, @@ -581,8 +584,6 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode, EXT4_FREE_BLOCKS_FORGET); } else { le32_add_cpu(&BHDR(bh)->h_refcount, -1); - if (ce) - mb_cache_entry_release(ce); ext4_xattr_block_csum_set(inode, bh); /* @@ -795,17 +796,15 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, struct super_block *sb = inode->i_sb; struct buffer_head *new_bh = NULL; struct ext4_xattr_search *s = &bs->s; - struct mb_cache_entry *ce = NULL; + struct mb2_cache_entry *ce = NULL; int error = 0; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); #define header(x) ((struct ext4_xattr_header *)(x)) if (i->value && i->value_len > sb->s_blocksize) return -ENOSPC; if (s->base) { - ce = mb_cache_entry_get(ext4_mb_cache, bs->bh->b_bdev, - bs->bh->b_blocknr); BUFFER_TRACE(bs->bh, "get_write_access"); error = ext4_journal_get_write_access(handle, bs->bh); if (error) @@ -813,10 +812,15 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, lock_buffer(bs->bh); if (header(s->base)->h_refcount == cpu_to_le32(1)) { - if (ce) { - mb_cache_entry_free(ce); - ce = NULL; - } + __u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash); + + /* + * This must happen under buffer lock for + * ext4_xattr_block_set() to reliably detect modified + * block + */ + mb2_cache_entry_delete_block(ext4_mb_cache, hash, + bs->bh->b_blocknr); ea_bdebug(bs->bh, "modifying in-place"); error = ext4_xattr_set_entry(i, s); if (!error) { @@ -841,10 +845,6 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, int offset = (char *)s->here - bs->bh->b_data; unlock_buffer(bs->bh); - if (ce) { - mb_cache_entry_release(ce); - ce = NULL; - } ea_bdebug(bs->bh, "cloning"); s->base = kmalloc(bs->bh->b_size, 
GFP_NOFS); error = -ENOMEM; @@ -899,6 +899,31 @@ inserted: if (error) goto cleanup_dquot; lock_buffer(new_bh); + /* + * We have to be careful about races with + * freeing or rehashing of xattr block. Once we + * hold buffer lock xattr block's state is + * stable so we can check whether the block got + * freed / rehashed or not. Since we unhash + * mbcache entry under buffer lock when freeing + * / rehashing xattr block, checking whether + * entry is still hashed is reliable. + */ + if (hlist_bl_unhashed(&ce->e_hash_list)) { + /* + * Undo everything and check mbcache + * again. + */ + unlock_buffer(new_bh); + dquot_free_block(inode, + EXT4_C2B(EXT4_SB(sb), + 1)); + brelse(new_bh); + mb2_cache_entry_put(ext4_mb_cache, ce); + ce = NULL; + new_bh = NULL; + goto inserted; + } le32_add_cpu(&BHDR(new_bh)->h_refcount, 1); ea_bdebug(new_bh, "reusing; refcount now=%d", le32_to_cpu(BHDR(new_bh)->h_refcount)); @@ -910,7 +935,8 @@ inserted: if (error) goto cleanup_dquot; } - mb_cache_entry_release(ce); + mb2_cache_entry_touch(ext4_mb_cache, ce); + mb2_cache_entry_put(ext4_mb_cache, ce); ce = NULL; } else if (bs->bh && s->base == bs->bh->b_data) { /* We were modifying this block in-place. */ @@ -976,7 +1002,7 @@ getblk_failed: cleanup: if (ce) - mb_cache_entry_release(ce); + mb2_cache_entry_put(ext4_mb_cache, ce); brelse(new_bh); if (!(bs->bh && s->base == bs->bh->b_data)) kfree(s->base); @@ -1540,17 +1566,6 @@ cleanup: brelse(bh); } -/* - * ext4_xattr_put_super() - * - * This is called when a file system is unmounted. - */ -void -ext4_xattr_put_super(struct super_block *sb) -{ - mb_cache_shrink(sb->s_bdev); -} - /* * ext4_xattr_cache_insert() * @@ -1560,28 +1575,18 @@ ext4_xattr_put_super(struct super_block *sb) * Returns 0, or a negative error number on failure. 
*/ static void -ext4_xattr_cache_insert(struct mb_cache *ext4_mb_cache, struct buffer_head *bh) +ext4_xattr_cache_insert(struct mb2_cache *ext4_mb_cache, struct buffer_head *bh) { __u32 hash = le32_to_cpu(BHDR(bh)->h_hash); - struct mb_cache_entry *ce; int error; - ce = mb_cache_entry_alloc(ext4_mb_cache, GFP_NOFS); - if (!ce) { - ea_bdebug(bh, "out of memory"); - return; - } - error = mb_cache_entry_insert(ce, bh->b_bdev, bh->b_blocknr, hash); + error = mb2_cache_entry_create(ext4_mb_cache, GFP_NOFS, hash, + bh->b_blocknr); if (error) { - mb_cache_entry_free(ce); - if (error == -EBUSY) { + if (error == -EBUSY) ea_bdebug(bh, "already in cache"); - error = 0; - } - } else { + } else ea_bdebug(bh, "inserting [%x]", (int)hash); - mb_cache_entry_release(ce); - } } /* @@ -1634,26 +1639,19 @@ ext4_xattr_cmp(struct ext4_xattr_header *header1, */ static struct buffer_head * ext4_xattr_cache_find(struct inode *inode, struct ext4_xattr_header *header, - struct mb_cache_entry **pce) + struct mb2_cache_entry **pce) { __u32 hash = le32_to_cpu(header->h_hash); - struct mb_cache_entry *ce; - struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb2_cache_entry *ce; + struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); if (!header->h_hash) return NULL; /* never share */ ea_idebug(inode, "looking for cached blocks [%x]", (int)hash); -again: - ce = mb_cache_entry_find_first(ext4_mb_cache, inode->i_sb->s_bdev, - hash); + ce = mb2_cache_entry_find_first(ext4_mb_cache, hash); while (ce) { struct buffer_head *bh; - if (IS_ERR(ce)) { - if (PTR_ERR(ce) == -EAGAIN) - goto again; - break; - } bh = sb_bread(inode->i_sb, ce->e_block); if (!bh) { EXT4_ERROR_INODE(inode, "block %lu read error", @@ -1669,7 +1667,7 @@ again: return bh; } brelse(bh); - ce = mb_cache_entry_find_next(ce, inode->i_sb->s_bdev, hash); + ce = mb2_cache_entry_find_next(ext4_mb_cache, ce); } return NULL; } @@ -1744,15 +1742,15 @@ static void ext4_xattr_rehash(struct ext4_xattr_header *header, #define 
HASH_BUCKET_BITS 10 -struct mb_cache * -ext4_xattr_create_cache(char *name) +struct mb2_cache * +ext4_xattr_create_cache(void) { - return mb_cache_create(name, HASH_BUCKET_BITS); + return mb2_cache_create(HASH_BUCKET_BITS); } -void ext4_xattr_destroy_cache(struct mb_cache *cache) +void ext4_xattr_destroy_cache(struct mb2_cache *cache) { if (cache) - mb_cache_destroy(cache); + mb2_cache_destroy(cache); } diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h index ddc0957760ba..10b0f7323ed6 100644 --- a/fs/ext4/xattr.h +++ b/fs/ext4/xattr.h @@ -108,7 +108,6 @@ extern int ext4_xattr_set(struct inode *, int, const char *, const void *, size_ extern int ext4_xattr_set_handle(handle_t *, struct inode *, int, const char *, const void *, size_t, int); extern void ext4_xattr_delete_inode(handle_t *, struct inode *); -extern void ext4_xattr_put_super(struct super_block *); extern int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize, struct ext4_inode *raw_inode, handle_t *handle); @@ -124,8 +123,8 @@ extern int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode, struct ext4_xattr_info *i, struct ext4_xattr_ibody_find *is); -extern struct mb_cache *ext4_xattr_create_cache(char *name); -extern void ext4_xattr_destroy_cache(struct mb_cache *); +extern struct mb2_cache *ext4_xattr_create_cache(void); +extern void ext4_xattr_destroy_cache(struct mb2_cache *); #ifdef CONFIG_EXT4_FS_SECURITY extern int ext4_init_security(handle_t *handle, struct inode *inode, From patchwork Thu Nov 16 16:17:42 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 838651 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; 
envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3yd5yb3sZKz9s7f; Fri, 17 Nov 2017 03:18:31 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1eFMrz-00005T-HM; Thu, 16 Nov 2017 16:18:27 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1eFMrx-0008W4-GU for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:25 +0000 Received: from 1.general.cascardo.us.vpn ([10.172.70.58] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1eFMrw-0000ir-QS for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:25 +0000 From: Thadeu Lima de Souza Cascardo To: kernel-team@lists.ubuntu.com Subject: [PATCH 04/11] mbcache2: limit cache size Date: Thu, 16 Nov 2017 14:17:42 -0200 Message-Id: <20171116161749.15878-5-cascardo@canonical.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com> References: <20171116161749.15878-1-cascardo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jan Kara So far the number of entries in mbcache is limited only by pressure from the shrinker. Since too many entries degrade the hash table, and we generally expect that caching more entries has diminishing returns, limit the number of entries the same way as in the old mbcache, to 16 * hash table size.
Once we exceed the desired maximum number of entries, we schedule background work to reclaim entries. If the background work cannot keep up and the number of entries exceeds two times the desired maximum, we reclaim some entries directly when allocating a new entry. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit c2f3140fe2eceb3a6c1615b2648b9471544881c6) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/mbcache2.c | 50 +++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 45 insertions(+), 5 deletions(-) diff --git a/fs/mbcache2.c b/fs/mbcache2.c index 5c3e1a8c38f6..3e3198d6b9d6 100644 --- a/fs/mbcache2.c +++ b/fs/mbcache2.c @@ -4,6 +4,7 @@ #include #include #include +#include #include /* @@ -27,16 +28,29 @@ struct mb2_cache { struct hlist_bl_head *c_hash; /* log2 of hash table size */ int c_bucket_bits; + /* Maximum entries in cache to avoid degrading hash too much */ + int c_max_entries; /* Protects c_lru_list, c_entry_count */ spinlock_t c_lru_list_lock; struct list_head c_lru_list; /* Number of entries in cache */ unsigned long c_entry_count; struct shrinker c_shrink; + /* Work for shrinking when the cache has too many entries */ + struct work_struct c_shrink_work; }; static struct kmem_cache *mb2_entry_cache; +static unsigned long mb2_cache_shrink(struct mb2_cache *cache, + unsigned int nr_to_scan); + +/* + * Number of entries to reclaim synchronously when there are too many entries + * in cache + */ +#define SYNC_SHRINK_BATCH 64 + /* * mb2_cache_entry_create - create entry in cache * @cache - cache where the entry should be created @@ -55,6 +69,13 @@ int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, struct hlist_bl_node *dup_node; struct hlist_bl_head *head; + /* Schedule background reclaim if there are too many entries */ + if (cache->c_entry_count >= cache->c_max_entries) + schedule_work(&cache->c_shrink_work); + /* Do some sync reclaim if background reclaim cannot keep up
*/ + if (cache->c_entry_count >= 2*cache->c_max_entries) + mb2_cache_shrink(cache, SYNC_SHRINK_BATCH); + entry = kmem_cache_alloc(mb2_entry_cache, mask); if (!entry) return -ENOMEM; @@ -223,12 +244,9 @@ static unsigned long mb2_cache_count(struct shrinker *shrink, } /* Shrink number of entries in cache */ -static unsigned long mb2_cache_scan(struct shrinker *shrink, - struct shrink_control *sc) +static unsigned long mb2_cache_shrink(struct mb2_cache *cache, + unsigned int nr_to_scan) { - int nr_to_scan = sc->nr_to_scan; - struct mb2_cache *cache = container_of(shrink, struct mb2_cache, - c_shrink); struct mb2_cache_entry *entry; struct hlist_bl_head *head; unsigned int shrunk = 0; @@ -261,6 +279,25 @@ static unsigned long mb2_cache_scan(struct shrinker *shrink, return shrunk; } +static unsigned long mb2_cache_scan(struct shrinker *shrink, + struct shrink_control *sc) +{ + int nr_to_scan = sc->nr_to_scan; + struct mb2_cache *cache = container_of(shrink, struct mb2_cache, + c_shrink); + return mb2_cache_shrink(cache, nr_to_scan); +} + +/* We shrink 1/X of the cache when we have too many entries in it */ +#define SHRINK_DIVISOR 16 + +static void mb2_cache_shrink_worker(struct work_struct *work) +{ + struct mb2_cache *cache = container_of(work, struct mb2_cache, + c_shrink_work); + mb2_cache_shrink(cache, cache->c_max_entries / SHRINK_DIVISOR); +} + /* * mb2_cache_create - create cache * @bucket_bits: log2 of the hash table size @@ -280,6 +317,7 @@ struct mb2_cache *mb2_cache_create(int bucket_bits) if (!cache) goto err_out; cache->c_bucket_bits = bucket_bits; + cache->c_max_entries = bucket_count << 4; INIT_LIST_HEAD(&cache->c_lru_list); spin_lock_init(&cache->c_lru_list_lock); cache->c_hash = kmalloc(bucket_count * sizeof(struct hlist_bl_head), @@ -296,6 +334,8 @@ struct mb2_cache *mb2_cache_create(int bucket_bits) cache->c_shrink.seeks = DEFAULT_SEEKS; register_shrinker(&cache->c_shrink); + INIT_WORK(&cache->c_shrink_work, mb2_cache_shrink_worker); + return cache; 
err_out: From patchwork Thu Nov 16 16:17:43 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 838656 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3yd5yg6nJ2z9s83; Fri, 17 Nov 2017 03:18:35 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1eFMs3-0000CE-Tk; Thu, 16 Nov 2017 16:18:31 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1eFMry-000056-Ox for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:26 +0000 Received: from 1.general.cascardo.us.vpn ([10.172.70.58] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1eFMry-0000ir-1F for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:26 +0000 From: Thadeu Lima de Souza Cascardo To: kernel-team@lists.ubuntu.com Subject: [PATCH 05/11] mbcache2: Use referenced bit instead of LRU Date: Thu, 16 Nov 2017 14:17:43 -0200 Message-Id: <20171116161749.15878-6-cascardo@canonical.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com> References: <20171116161749.15878-1-cascardo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: 
List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jan Kara Currently we maintain a perfect LRU list by moving an entry to the tail of the list when it gets used. However, these operations on a cache-global list are relatively expensive. In this patch we switch to lazy updates of the LRU list. Whenever an entry gets used, we set a referenced bit in it. When reclaiming entries, we give referenced entries another round in the LRU. Since the list is not a real LRU anymore, rename it to just 'list'. In my testing this logic gives about a 30% boost to workloads with mostly unique xattr blocks (e.g. xattr-bench with 10 files and 10000 unique xattr values). Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit f0c8b46238db9d51ef9ea0858259958d0c601cec) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/mbcache2.c | 87 +++++++++++++++++++++++++++++++----------------- include/linux/mbcache2.h | 11 +++--- 2 files changed, 63 insertions(+), 35 deletions(-) diff --git a/fs/mbcache2.c b/fs/mbcache2.c index 3e3198d6b9d6..49f7a6feaa83 100644 --- a/fs/mbcache2.c +++ b/fs/mbcache2.c @@ -30,9 +30,9 @@ struct mb2_cache { int c_bucket_bits; /* Maximum entries in cache to avoid degrading hash too much */ int c_max_entries; - /* Protects c_lru_list, c_entry_count */ - spinlock_t c_lru_list_lock; - struct list_head c_lru_list; + /* Protects c_list, c_entry_count */ + spinlock_t c_list_lock; + struct list_head c_list; /* Number of entries in cache */ unsigned long c_entry_count; struct shrinker c_shrink; @@ -45,6 +45,29 @@ static struct kmem_cache *mb2_entry_cache; static unsigned long mb2_cache_shrink(struct mb2_cache *cache, unsigned int nr_to_scan); +static inline bool mb2_cache_entry_referenced(struct mb2_cache_entry *entry) +{ + return entry->_e_hash_list_head & 1; +} + +static inline void mb2_cache_entry_set_referenced(struct mb2_cache_entry *entry) +{ + entry->_e_hash_list_head |= 1; +} + +static
inline void mb2_cache_entry_clear_referenced( + struct mb2_cache_entry *entry) +{ + entry->_e_hash_list_head &= ~1; +} + +static inline struct hlist_bl_head *mb2_cache_entry_head( + struct mb2_cache_entry *entry) +{ + return (struct hlist_bl_head *) + (entry->_e_hash_list_head & ~1); +} + /* * Number of entries to reclaim synchronously when there are too many entries * in cache @@ -80,13 +103,13 @@ int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, if (!entry) return -ENOMEM; - INIT_LIST_HEAD(&entry->e_lru_list); + INIT_LIST_HEAD(&entry->e_list); /* One ref for hash, one ref returned */ atomic_set(&entry->e_refcnt, 1); entry->e_key = key; entry->e_block = block; head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; - entry->e_hash_list_head = head; + entry->_e_hash_list_head = (unsigned long)head; hlist_bl_lock(head); hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) { if (dup->e_key == key && dup->e_block == block) { @@ -98,12 +121,12 @@ int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, hlist_bl_add_head(&entry->e_hash_list, head); hlist_bl_unlock(head); - spin_lock(&cache->c_lru_list_lock); - list_add_tail(&entry->e_lru_list, &cache->c_lru_list); + spin_lock(&cache->c_list_lock); + list_add_tail(&entry->e_list, &cache->c_list); /* Grab ref for LRU list */ atomic_inc(&entry->e_refcnt); cache->c_entry_count++; - spin_unlock(&cache->c_lru_list_lock); + spin_unlock(&cache->c_list_lock); return 0; } @@ -124,7 +147,7 @@ static struct mb2_cache_entry *__entry_find(struct mb2_cache *cache, struct hlist_bl_head *head; if (entry) - head = entry->e_hash_list_head; + head = mb2_cache_entry_head(entry); else head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; hlist_bl_lock(head); @@ -203,13 +226,13 @@ void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key, /* We keep hash list reference to keep entry alive */ hlist_bl_del_init(&entry->e_hash_list); hlist_bl_unlock(head); - 
spin_lock(&cache->c_lru_list_lock); - if (!list_empty(&entry->e_lru_list)) { - list_del_init(&entry->e_lru_list); + spin_lock(&cache->c_list_lock); + if (!list_empty(&entry->e_list)) { + list_del_init(&entry->e_list); cache->c_entry_count--; atomic_dec(&entry->e_refcnt); } - spin_unlock(&cache->c_lru_list_lock); + spin_unlock(&cache->c_list_lock); mb2_cache_entry_put(cache, entry); return; } @@ -222,15 +245,12 @@ EXPORT_SYMBOL(mb2_cache_entry_delete_block); * @cache - cache the entry belongs to * @entry - entry that got used * - * Move entry in lru list to reflect the fact that it was used. + * Marks entry as used to give hit higher chances of surviving in cache. */ void mb2_cache_entry_touch(struct mb2_cache *cache, struct mb2_cache_entry *entry) { - spin_lock(&cache->c_lru_list_lock); - if (!list_empty(&entry->e_lru_list)) - list_move_tail(&cache->c_lru_list, &entry->e_lru_list); - spin_unlock(&cache->c_lru_list_lock); + mb2_cache_entry_set_referenced(entry); } EXPORT_SYMBOL(mb2_cache_entry_touch); @@ -251,18 +271,23 @@ static unsigned long mb2_cache_shrink(struct mb2_cache *cache, struct hlist_bl_head *head; unsigned int shrunk = 0; - spin_lock(&cache->c_lru_list_lock); - while (nr_to_scan-- && !list_empty(&cache->c_lru_list)) { - entry = list_first_entry(&cache->c_lru_list, - struct mb2_cache_entry, e_lru_list); - list_del_init(&entry->e_lru_list); + spin_lock(&cache->c_list_lock); + while (nr_to_scan-- && !list_empty(&cache->c_list)) { + entry = list_first_entry(&cache->c_list, + struct mb2_cache_entry, e_list); + if (mb2_cache_entry_referenced(entry)) { + mb2_cache_entry_clear_referenced(entry); + list_move_tail(&cache->c_list, &entry->e_list); + continue; + } + list_del_init(&entry->e_list); cache->c_entry_count--; /* * We keep LRU list reference so that entry doesn't go away * from under us. 
*/ - spin_unlock(&cache->c_lru_list_lock); - head = entry->e_hash_list_head; + spin_unlock(&cache->c_list_lock); + head = mb2_cache_entry_head(entry); hlist_bl_lock(head); if (!hlist_bl_unhashed(&entry->e_hash_list)) { hlist_bl_del_init(&entry->e_hash_list); @@ -272,9 +297,9 @@ static unsigned long mb2_cache_shrink(struct mb2_cache *cache, if (mb2_cache_entry_put(cache, entry)) shrunk++; cond_resched(); - spin_lock(&cache->c_lru_list_lock); + spin_lock(&cache->c_list_lock); } - spin_unlock(&cache->c_lru_list_lock); + spin_unlock(&cache->c_list_lock); return shrunk; } @@ -318,8 +343,8 @@ struct mb2_cache *mb2_cache_create(int bucket_bits) goto err_out; cache->c_bucket_bits = bucket_bits; cache->c_max_entries = bucket_count << 4; - INIT_LIST_HEAD(&cache->c_lru_list); - spin_lock_init(&cache->c_lru_list_lock); + INIT_LIST_HEAD(&cache->c_list); + spin_lock_init(&cache->c_list_lock); cache->c_hash = kmalloc(bucket_count * sizeof(struct hlist_bl_head), GFP_KERNEL); if (!cache->c_hash) { @@ -361,13 +386,13 @@ void mb2_cache_destroy(struct mb2_cache *cache) * We don't bother with any locking. Cache must not be used at this * point. 
*/ - list_for_each_entry_safe(entry, next, &cache->c_lru_list, e_lru_list) { + list_for_each_entry_safe(entry, next, &cache->c_list, e_list) { if (!hlist_bl_unhashed(&entry->e_hash_list)) { hlist_bl_del_init(&entry->e_hash_list); atomic_dec(&entry->e_refcnt); } else WARN_ON(1); - list_del(&entry->e_lru_list); + list_del(&entry->e_list); WARN_ON(atomic_read(&entry->e_refcnt) != 1); mb2_cache_entry_put(cache, entry); } diff --git a/include/linux/mbcache2.h b/include/linux/mbcache2.h index b6f160ff2533..c934843a6a31 100644 --- a/include/linux/mbcache2.h +++ b/include/linux/mbcache2.h @@ -10,8 +10,8 @@ struct mb2_cache; struct mb2_cache_entry { - /* LRU list - protected by cache->c_lru_list_lock */ - struct list_head e_lru_list; + /* List of entries in cache - protected by cache->c_list_lock */ + struct list_head e_list; /* Hash table list - protected by bitlock in e_hash_list_head */ struct hlist_bl_node e_hash_list; atomic_t e_refcnt; @@ -19,8 +19,11 @@ struct mb2_cache_entry { u32 e_key; /* Block number of hashed block - stable during lifetime of the entry */ sector_t e_block; - /* Head of hash list (for list bit lock) - stable */ - struct hlist_bl_head *e_hash_list_head; + /* + * Head of hash list (for list bit lock) - stable. 
Combined with + * referenced bit of entry + */ + unsigned long _e_hash_list_head; }; struct mb2_cache *mb2_cache_create(int bucket_bits); From patchwork Thu Nov 16 16:17:44 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 838652 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3yd5yd2sFWz9s4q; Fri, 17 Nov 2017 03:18:33 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1eFMs1-00007N-60; Thu, 16 Nov 2017 16:18:29 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1eFMrz-00005t-VJ for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:27 +0000 Received: from 1.general.cascardo.us.vpn ([10.172.70.58] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1eFMrz-0000ir-8j for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:27 +0000 From: Thadeu Lima de Souza Cascardo To: kernel-team@lists.ubuntu.com Subject: [PATCH 06/11] ext4: kill ext4_mballoc_ready Date: Thu, 16 Nov 2017 14:17:44 -0200 Message-Id: <20171116161749.15878-7-cascardo@canonical.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com> References: <20171116161749.15878-1-cascardo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 
2.1.20 Precedence: list List-Id: Kernel team discussions Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Andreas Gruenbacher This variable, introduced in commit 9c191f70, is unnecessary: it is set once the module has been initialized correctly, and ext4_fill_super cannot run unless the module has been initialized correctly. Signed-off-by: Andreas Gruenbacher Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit 2335d05f3a83f5290ec28c1ed30c1c742a37edc9) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/ext4/super.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 1bb69c90358c..89a013629820 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -56,7 +56,6 @@ static struct ext4_lazy_init *ext4_li_info; static struct mutex ext4_li_mtx; -static int ext4_mballoc_ready; static struct ratelimit_state ext4_mount_msg_ratelimit; static int ext4_load_journal(struct super_block *, struct ext4_super_block *, @@ -3887,12 +3886,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) sbi->s_journal->j_commit_callback = ext4_journal_commit_callback; no_journal: - if (ext4_mballoc_ready) { - sbi->s_mb_cache = ext4_xattr_create_cache(); - if (!sbi->s_mb_cache) { - ext4_msg(sb, KERN_ERR, "Failed to create an mb_cache"); - goto failed_mount_wq; - } + sbi->s_mb_cache = ext4_xattr_create_cache(); + if (!sbi->s_mb_cache) { + ext4_msg(sb, KERN_ERR, "Failed to create an mb_cache"); + goto failed_mount_wq; } if ((DUMMY_ENCRYPTION_ENABLED(sbi) || ext4_has_feature_encrypt(sb)) && @@ -5431,8 +5428,6 @@ static int __init ext4_init_fs(void) err = ext4_init_mballoc(); if (err) goto out2; - else - ext4_mballoc_ready = 1; err = init_inodecache(); if (err) goto out1; @@ -5448,7 +5443,6 @@ out: unregister_as_ext3(); destroy_inodecache(); out1: -
ext4_mballoc_ready = 0; ext4_exit_mballoc(); out2: ext4_exit_sysfs(); From patchwork Thu Nov 16 16:17:45 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 838655 From: Thadeu Lima de Souza Cascardo To: kernel-team@lists.ubuntu.com Subject: [PATCH 07/11] ext4: shortcut setting of xattr to the same value Date: Thu, 16 Nov 2017 14:17:45 -0200 Message-Id: <20171116161749.15878-8-cascardo@canonical.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com> References: <20171116161749.15878-1-cascardo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team
discussions Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jan Kara When someone tried to set an xattr to the same value (i.e., not changing anything), we did all the work of removing the original xattr, possibly breaking references to a shared xattr block, inserting the new xattr, and merging xattr blocks again. Since this is not such a rare operation and it is relatively cheap for us to detect this case, check for it and shortcut xattr setting in that case. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit 3fd164629d25b04f291a79a013dcc7ce1a301269) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/ext4/xattr.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index ba6b8ac5f462..6f8fd0ec76e2 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1112,6 +1112,17 @@ static int ext4_xattr_ibody_set(handle_t *handle, struct inode *inode, return 0; } +static int ext4_xattr_value_same(struct ext4_xattr_search *s, + struct ext4_xattr_info *i) +{ + void *value; + + if (le32_to_cpu(s->here->e_value_size) != i->value_len) + return 0; + value = ((void *)s->base) + le16_to_cpu(s->here->e_value_offs); + return !memcmp(value, i->value, i->value_len); +} + /* * ext4_xattr_set_handle() * @@ -1188,6 +1199,13 @@ ext4_xattr_set_handle(handle_t *handle, struct inode *inode, int name_index, else if (!bs.s.not_found) error = ext4_xattr_block_set(handle, inode, &i, &bs); } else { + error = 0; + /* Xattr value did not change?
Save us some work and bail out */ + if (!is.s.not_found && ext4_xattr_value_same(&is.s, &i)) + goto cleanup; + if (!bs.s.not_found && ext4_xattr_value_same(&bs.s, &i)) + goto cleanup; + error = ext4_xattr_ibody_set(handle, inode, &i, &is); if (!error && !bs.s.not_found) { i.value = NULL; From patchwork Thu Nov 16 16:17:46 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 838657 From: Thadeu Lima de Souza Cascardo To: kernel-team@lists.ubuntu.com Subject: [PATCH 08/11] mbcache: remove mbcache Date: Thu, 16 Nov 2017 14:17:46 -0200 Message-Id: <20171116161749.15878-9-cascardo@canonical.com> X-Mailer: git-send-email 2.14.1 In-Reply-To:
<20171116161749.15878-1-cascardo@canonical.com> References: <20171116161749.15878-1-cascardo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jan Kara Both ext2 and ext4 are now converted to mbcache2. Remove the old mbcache code. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit ecd1e64412d5242b8afdef58a714bab3c5464f79) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/Makefile | 2 +- fs/mbcache.c | 858 ------------------------------------------------ include/linux/mbcache.h | 55 ---- 3 files changed, 1 insertion(+), 914 deletions(-) delete mode 100644 fs/mbcache.c delete mode 100644 include/linux/mbcache.h diff --git a/fs/Makefile b/fs/Makefile index b7078cf5437c..4fe151d92550 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -41,7 +41,7 @@ obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o -obj-$(CONFIG_FS_MBCACHE) += mbcache.o mbcache2.o +obj-$(CONFIG_FS_MBCACHE) += mbcache2.o obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o obj-$(CONFIG_NFS_COMMON) += nfs_common/ obj-$(CONFIG_COREDUMP) += coredump.o diff --git a/fs/mbcache.c b/fs/mbcache.c deleted file mode 100644 index 187477ded6b3..000000000000 --- a/fs/mbcache.c +++ /dev/null @@ -1,858 +0,0 @@ -/* - * linux/fs/mbcache.c - * (C) 2001-2002 Andreas Gruenbacher, - */ - -/* - * Filesystem Meta Information Block Cache (mbcache) - * - * The mbcache caches blocks of block devices that need to be located - * by their device/block number, as well as by other criteria (such - * as the block's contents). - * - * There can only be one cache entry in a cache per device and block number. - * Additional indexes need not be unique in this sense. 
The number of - * additional indexes (=other criteria) can be hardwired at compile time - * or specified at cache create time. - * - * Each cache entry is of fixed size. An entry may be `valid' or `invalid' - * in the cache. A valid entry is in the main hash tables of the cache, - * and may also be in the lru list. An invalid entry is not in any hashes - * or lists. - * - * A valid cache entry is only in the lru list if no handles refer to it. - * Invalid cache entries will be freed when the last handle to the cache - * entry is released. Entries that cannot be freed immediately are put - * back on the lru list. - */ - -/* - * Lock descriptions and usage: - * - * Each hash chain of both the block and index hash tables now contains - * a built-in lock used to serialize accesses to the hash chain. - * - * Accesses to global data structures mb_cache_list and mb_cache_lru_list - * are serialized via the global spinlock mb_cache_spinlock. - * - * Each mb_cache_entry contains a spinlock, e_entry_lock, to serialize - * accesses to its local data, such as e_used and e_queued. - * - * Lock ordering: - * - * Each block hash chain's lock has the highest lock order, followed by an - * index hash chain's lock, mb_cache_bg_lock (used to implement mb_cache_entry's - * lock), and mb_cach_spinlock, with the lowest order. While holding - * either a block or index hash chain lock, a thread can acquire an - * mc_cache_bg_lock, which in turn can also acquire mb_cache_spinlock. - * - * Synchronization: - * - * Since both mb_cache_entry_get and mb_cache_entry_find scan the block and - * index hash chian, it needs to lock the corresponding hash chain. For each - * mb_cache_entry within the chain, it needs to lock the mb_cache_entry to - * prevent either any simultaneous release or free on the entry and also - * to serialize accesses to either the e_used or e_queued member of the entry. 
- * - * To avoid having a dangling reference to an already freed - * mb_cache_entry, an mb_cache_entry is only freed when it is not on a - * block hash chain and also no longer being referenced, both e_used, - * and e_queued are 0's. When an mb_cache_entry is explicitly freed it is - * first removed from a block hash chain. - */ - -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#ifdef MB_CACHE_DEBUG -# define mb_debug(f...) do { \ - printk(KERN_DEBUG f); \ - printk("\n"); \ - } while (0) -#define mb_assert(c) do { if (!(c)) \ - printk(KERN_ERR "assertion " #c " failed\n"); \ - } while(0) -#else -# define mb_debug(f...) do { } while(0) -# define mb_assert(c) do { } while(0) -#endif -#define mb_error(f...) do { \ - printk(KERN_ERR f); \ - printk("\n"); \ - } while(0) - -#define MB_CACHE_WRITER ((unsigned short)~0U >> 1) - -#define MB_CACHE_ENTRY_LOCK_BITS ilog2(NR_BG_LOCKS) -#define MB_CACHE_ENTRY_LOCK_INDEX(ce) \ - (hash_long((unsigned long)ce, MB_CACHE_ENTRY_LOCK_BITS)) - -static DECLARE_WAIT_QUEUE_HEAD(mb_cache_queue); -static struct blockgroup_lock *mb_cache_bg_lock; -static struct kmem_cache *mb_cache_kmem_cache; - -MODULE_AUTHOR("Andreas Gruenbacher "); -MODULE_DESCRIPTION("Meta block cache (for extended attributes)"); -MODULE_LICENSE("GPL"); - -EXPORT_SYMBOL(mb_cache_create); -EXPORT_SYMBOL(mb_cache_shrink); -EXPORT_SYMBOL(mb_cache_destroy); -EXPORT_SYMBOL(mb_cache_entry_alloc); -EXPORT_SYMBOL(mb_cache_entry_insert); -EXPORT_SYMBOL(mb_cache_entry_release); -EXPORT_SYMBOL(mb_cache_entry_free); -EXPORT_SYMBOL(mb_cache_entry_get); -#if !defined(MB_CACHE_INDEXES_COUNT) || (MB_CACHE_INDEXES_COUNT > 0) -EXPORT_SYMBOL(mb_cache_entry_find_first); -EXPORT_SYMBOL(mb_cache_entry_find_next); -#endif - -/* - * Global data: list of all mbcache's, lru list, and a spinlock for - * accessing cache data structures on SMP machines. The lru list is - * global across all mbcaches. 
- */ - -static LIST_HEAD(mb_cache_list); -static LIST_HEAD(mb_cache_lru_list); -static DEFINE_SPINLOCK(mb_cache_spinlock); - -static inline void -__spin_lock_mb_cache_entry(struct mb_cache_entry *ce) -{ - spin_lock(bgl_lock_ptr(mb_cache_bg_lock, - MB_CACHE_ENTRY_LOCK_INDEX(ce))); -} - -static inline void -__spin_unlock_mb_cache_entry(struct mb_cache_entry *ce) -{ - spin_unlock(bgl_lock_ptr(mb_cache_bg_lock, - MB_CACHE_ENTRY_LOCK_INDEX(ce))); -} - -static inline int -__mb_cache_entry_is_block_hashed(struct mb_cache_entry *ce) -{ - return !hlist_bl_unhashed(&ce->e_block_list); -} - - -static inline void -__mb_cache_entry_unhash_block(struct mb_cache_entry *ce) -{ - if (__mb_cache_entry_is_block_hashed(ce)) - hlist_bl_del_init(&ce->e_block_list); -} - -static inline int -__mb_cache_entry_is_index_hashed(struct mb_cache_entry *ce) -{ - return !hlist_bl_unhashed(&ce->e_index.o_list); -} - -static inline void -__mb_cache_entry_unhash_index(struct mb_cache_entry *ce) -{ - if (__mb_cache_entry_is_index_hashed(ce)) - hlist_bl_del_init(&ce->e_index.o_list); -} - -/* - * __mb_cache_entry_unhash_unlock() - * - * This function is called to unhash both the block and index hash - * chain. - * It assumes both the block and index hash chain is locked upon entry. - * It also unlock both hash chains both exit - */ -static inline void -__mb_cache_entry_unhash_unlock(struct mb_cache_entry *ce) -{ - __mb_cache_entry_unhash_index(ce); - hlist_bl_unlock(ce->e_index_hash_p); - __mb_cache_entry_unhash_block(ce); - hlist_bl_unlock(ce->e_block_hash_p); -} - -static void -__mb_cache_entry_forget(struct mb_cache_entry *ce, gfp_t gfp_mask) -{ - struct mb_cache *cache = ce->e_cache; - - mb_assert(!(ce->e_used || ce->e_queued || atomic_read(&ce->e_refcnt))); - kmem_cache_free(cache->c_entry_cache, ce); - atomic_dec(&cache->c_entry_count); -} - -static void -__mb_cache_entry_release(struct mb_cache_entry *ce) -{ - /* First lock the entry to serialize access to its local data. 
*/ - __spin_lock_mb_cache_entry(ce); - /* Wake up all processes queuing for this cache entry. */ - if (ce->e_queued) - wake_up_all(&mb_cache_queue); - if (ce->e_used >= MB_CACHE_WRITER) - ce->e_used -= MB_CACHE_WRITER; - /* - * Make sure that all cache entries on lru_list have - * both e_used and e_qued of 0s. - */ - ce->e_used--; - if (!(ce->e_used || ce->e_queued || atomic_read(&ce->e_refcnt))) { - if (!__mb_cache_entry_is_block_hashed(ce)) { - __spin_unlock_mb_cache_entry(ce); - goto forget; - } - /* - * Need access to lru list, first drop entry lock, - * then reacquire the lock in the proper order. - */ - spin_lock(&mb_cache_spinlock); - if (list_empty(&ce->e_lru_list)) - list_add_tail(&ce->e_lru_list, &mb_cache_lru_list); - spin_unlock(&mb_cache_spinlock); - } - __spin_unlock_mb_cache_entry(ce); - return; -forget: - mb_assert(list_empty(&ce->e_lru_list)); - __mb_cache_entry_forget(ce, GFP_KERNEL); -} - -/* - * mb_cache_shrink_scan() memory pressure callback - * - * This function is called by the kernel memory management when memory - * gets low. - * - * @shrink: (ignored) - * @sc: shrink_control passed from reclaim - * - * Returns the number of objects freed. 
- */ -static unsigned long -mb_cache_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) -{ - LIST_HEAD(free_list); - struct mb_cache_entry *entry, *tmp; - int nr_to_scan = sc->nr_to_scan; - gfp_t gfp_mask = sc->gfp_mask; - unsigned long freed = 0; - - mb_debug("trying to free %d entries", nr_to_scan); - spin_lock(&mb_cache_spinlock); - while ((nr_to_scan-- > 0) && !list_empty(&mb_cache_lru_list)) { - struct mb_cache_entry *ce = - list_entry(mb_cache_lru_list.next, - struct mb_cache_entry, e_lru_list); - list_del_init(&ce->e_lru_list); - if (ce->e_used || ce->e_queued || atomic_read(&ce->e_refcnt)) - continue; - spin_unlock(&mb_cache_spinlock); - /* Prevent any find or get operation on the entry */ - hlist_bl_lock(ce->e_block_hash_p); - hlist_bl_lock(ce->e_index_hash_p); - /* Ignore if it is touched by a find/get */ - if (ce->e_used || ce->e_queued || atomic_read(&ce->e_refcnt) || - !list_empty(&ce->e_lru_list)) { - hlist_bl_unlock(ce->e_index_hash_p); - hlist_bl_unlock(ce->e_block_hash_p); - spin_lock(&mb_cache_spinlock); - continue; - } - __mb_cache_entry_unhash_unlock(ce); - list_add_tail(&ce->e_lru_list, &free_list); - spin_lock(&mb_cache_spinlock); - } - spin_unlock(&mb_cache_spinlock); - - list_for_each_entry_safe(entry, tmp, &free_list, e_lru_list) { - __mb_cache_entry_forget(entry, gfp_mask); - freed++; - } - return freed; -} - -static unsigned long -mb_cache_shrink_count(struct shrinker *shrink, struct shrink_control *sc) -{ - struct mb_cache *cache; - unsigned long count = 0; - - spin_lock(&mb_cache_spinlock); - list_for_each_entry(cache, &mb_cache_list, c_cache_list) { - mb_debug("cache %s (%d)", cache->c_name, - atomic_read(&cache->c_entry_count)); - count += atomic_read(&cache->c_entry_count); - } - spin_unlock(&mb_cache_spinlock); - - return vfs_pressure_ratio(count); -} - -static struct shrinker mb_cache_shrinker = { - .count_objects = mb_cache_shrink_count, - .scan_objects = mb_cache_shrink_scan, - .seeks = DEFAULT_SEEKS, -}; - -/* - * 
mb_cache_create() create a new cache - * - * All entries in one cache are equal size. Cache entries may be from - * multiple devices. If this is the first mbcache created, registers - * the cache with kernel memory management. Returns NULL if no more - * memory was available. - * - * @name: name of the cache (informal) - * @bucket_bits: log2(number of hash buckets) - */ -struct mb_cache * -mb_cache_create(const char *name, int bucket_bits) -{ - int n, bucket_count = 1 << bucket_bits; - struct mb_cache *cache = NULL; - - if (!mb_cache_bg_lock) { - mb_cache_bg_lock = kmalloc(sizeof(struct blockgroup_lock), - GFP_KERNEL); - if (!mb_cache_bg_lock) - return NULL; - bgl_lock_init(mb_cache_bg_lock); - } - - cache = kmalloc(sizeof(struct mb_cache), GFP_KERNEL); - if (!cache) - return NULL; - cache->c_name = name; - atomic_set(&cache->c_entry_count, 0); - cache->c_bucket_bits = bucket_bits; - cache->c_block_hash = kmalloc(bucket_count * - sizeof(struct hlist_bl_head), GFP_KERNEL); - if (!cache->c_block_hash) - goto fail; - for (n=0; n<bucket_count; n++) - INIT_HLIST_BL_HEAD(&cache->c_block_hash[n]); - cache->c_index_hash = kmalloc(bucket_count * - sizeof(struct hlist_bl_head), GFP_KERNEL); - if (!cache->c_index_hash) - goto fail; - for (n=0; n<bucket_count; n++) - INIT_HLIST_BL_HEAD(&cache->c_index_hash[n]); - if (!mb_cache_kmem_cache) { - mb_cache_kmem_cache = kmem_cache_create(name, - sizeof(struct mb_cache_entry), 0, - SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL); - if (!mb_cache_kmem_cache) - goto fail2; - } - cache->c_entry_cache = mb_cache_kmem_cache; - - /* - * Set an upper limit on the number of cache entries so that the hash - * chains won't grow too long. - */ - cache->c_max_entries = bucket_count << 4; - - spin_lock(&mb_cache_spinlock); - list_add(&cache->c_cache_list, &mb_cache_list); - spin_unlock(&mb_cache_spinlock); - return cache; - -fail2: - kfree(cache->c_index_hash); - -fail: - kfree(cache->c_block_hash); - kfree(cache); - return NULL; -} - - -/* - * mb_cache_shrink() - * - * Removes all cache entries of a device from the cache.
All cache entries - * currently in use cannot be freed, and thus remain in the cache. All others - * are freed. - * - * @bdev: which device's cache entries to shrink - */ -void -mb_cache_shrink(struct block_device *bdev) -{ - LIST_HEAD(free_list); - struct list_head *l; - struct mb_cache_entry *ce, *tmp; - - l = &mb_cache_lru_list; - spin_lock(&mb_cache_spinlock); - while (!list_is_last(l, &mb_cache_lru_list)) { - l = l->next; - ce = list_entry(l, struct mb_cache_entry, e_lru_list); - if (ce->e_bdev == bdev) { - list_del_init(&ce->e_lru_list); - if (ce->e_used || ce->e_queued || - atomic_read(&ce->e_refcnt)) - continue; - spin_unlock(&mb_cache_spinlock); - /* - * Prevent any find or get operation on the entry. - */ - hlist_bl_lock(ce->e_block_hash_p); - hlist_bl_lock(ce->e_index_hash_p); - /* Ignore if it is touched by a find/get */ - if (ce->e_used || ce->e_queued || - atomic_read(&ce->e_refcnt) || - !list_empty(&ce->e_lru_list)) { - hlist_bl_unlock(ce->e_index_hash_p); - hlist_bl_unlock(ce->e_block_hash_p); - l = &mb_cache_lru_list; - spin_lock(&mb_cache_spinlock); - continue; - } - __mb_cache_entry_unhash_unlock(ce); - mb_assert(!(ce->e_used || ce->e_queued || - atomic_read(&ce->e_refcnt))); - list_add_tail(&ce->e_lru_list, &free_list); - l = &mb_cache_lru_list; - spin_lock(&mb_cache_spinlock); - } - } - spin_unlock(&mb_cache_spinlock); - - list_for_each_entry_safe(ce, tmp, &free_list, e_lru_list) { - __mb_cache_entry_forget(ce, GFP_KERNEL); - } -} - - -/* - * mb_cache_destroy() - * - * Shrinks the cache to its minimum possible size (hopefully 0 entries), - * and then destroys it. If this was the last mbcache, un-registers the - * mbcache from kernel memory management. 
- */ -void -mb_cache_destroy(struct mb_cache *cache) -{ - LIST_HEAD(free_list); - struct mb_cache_entry *ce, *tmp; - - spin_lock(&mb_cache_spinlock); - list_for_each_entry_safe(ce, tmp, &mb_cache_lru_list, e_lru_list) { - if (ce->e_cache == cache) - list_move_tail(&ce->e_lru_list, &free_list); - } - list_del(&cache->c_cache_list); - spin_unlock(&mb_cache_spinlock); - - list_for_each_entry_safe(ce, tmp, &free_list, e_lru_list) { - list_del_init(&ce->e_lru_list); - /* - * Prevent any find or get operation on the entry. - */ - hlist_bl_lock(ce->e_block_hash_p); - hlist_bl_lock(ce->e_index_hash_p); - mb_assert(!(ce->e_used || ce->e_queued || - atomic_read(&ce->e_refcnt))); - __mb_cache_entry_unhash_unlock(ce); - __mb_cache_entry_forget(ce, GFP_KERNEL); - } - - if (atomic_read(&cache->c_entry_count) > 0) { - mb_error("cache %s: %d orphaned entries", - cache->c_name, - atomic_read(&cache->c_entry_count)); - } - - if (list_empty(&mb_cache_list)) { - kmem_cache_destroy(mb_cache_kmem_cache); - mb_cache_kmem_cache = NULL; - } - kfree(cache->c_index_hash); - kfree(cache->c_block_hash); - kfree(cache); -} - -/* - * mb_cache_entry_alloc() - * - * Allocates a new cache entry. The new entry will not be valid initially, - * and thus cannot be looked up yet. It should be filled with data, and - * then inserted into the cache using mb_cache_entry_insert(). Returns NULL - * if no more memory was available. 
- */ -struct mb_cache_entry * -mb_cache_entry_alloc(struct mb_cache *cache, gfp_t gfp_flags) -{ - struct mb_cache_entry *ce; - - if (atomic_read(&cache->c_entry_count) >= cache->c_max_entries) { - struct list_head *l; - - l = &mb_cache_lru_list; - spin_lock(&mb_cache_spinlock); - while (!list_is_last(l, &mb_cache_lru_list)) { - l = l->next; - ce = list_entry(l, struct mb_cache_entry, e_lru_list); - if (ce->e_cache == cache) { - list_del_init(&ce->e_lru_list); - if (ce->e_used || ce->e_queued || - atomic_read(&ce->e_refcnt)) - continue; - spin_unlock(&mb_cache_spinlock); - /* - * Prevent any find or get operation on the - * entry. - */ - hlist_bl_lock(ce->e_block_hash_p); - hlist_bl_lock(ce->e_index_hash_p); - /* Ignore if it is touched by a find/get */ - if (ce->e_used || ce->e_queued || - atomic_read(&ce->e_refcnt) || - !list_empty(&ce->e_lru_list)) { - hlist_bl_unlock(ce->e_index_hash_p); - hlist_bl_unlock(ce->e_block_hash_p); - l = &mb_cache_lru_list; - spin_lock(&mb_cache_spinlock); - continue; - } - mb_assert(list_empty(&ce->e_lru_list)); - mb_assert(!(ce->e_used || ce->e_queued || - atomic_read(&ce->e_refcnt))); - __mb_cache_entry_unhash_unlock(ce); - goto found; - } - } - spin_unlock(&mb_cache_spinlock); - } - - ce = kmem_cache_alloc(cache->c_entry_cache, gfp_flags); - if (!ce) - return NULL; - atomic_inc(&cache->c_entry_count); - INIT_LIST_HEAD(&ce->e_lru_list); - INIT_HLIST_BL_NODE(&ce->e_block_list); - INIT_HLIST_BL_NODE(&ce->e_index.o_list); - ce->e_cache = cache; - ce->e_queued = 0; - atomic_set(&ce->e_refcnt, 0); -found: - ce->e_block_hash_p = &cache->c_block_hash[0]; - ce->e_index_hash_p = &cache->c_index_hash[0]; - ce->e_used = 1 + MB_CACHE_WRITER; - return ce; -} - - -/* - * mb_cache_entry_insert() - * - * Inserts an entry that was allocated using mb_cache_entry_alloc() into - * the cache. After this, the cache entry can be looked up, but is not yet - * in the lru list as the caller still holds a handle to it. 
Returns 0 on - * success, or -EBUSY if a cache entry for that device + inode exists - * already (this may happen after a failed lookup, but when another process - * has inserted the same cache entry in the meantime). - * - * @bdev: device the cache entry belongs to - * @block: block number - * @key: lookup key - */ -int -mb_cache_entry_insert(struct mb_cache_entry *ce, struct block_device *bdev, - sector_t block, unsigned int key) -{ - struct mb_cache *cache = ce->e_cache; - unsigned int bucket; - struct hlist_bl_node *l; - struct hlist_bl_head *block_hash_p; - struct hlist_bl_head *index_hash_p; - struct mb_cache_entry *lce; - - mb_assert(ce); - bucket = hash_long((unsigned long)bdev + (block & 0xffffffff), - cache->c_bucket_bits); - block_hash_p = &cache->c_block_hash[bucket]; - hlist_bl_lock(block_hash_p); - hlist_bl_for_each_entry(lce, l, block_hash_p, e_block_list) { - if (lce->e_bdev == bdev && lce->e_block == block) { - hlist_bl_unlock(block_hash_p); - return -EBUSY; - } - } - mb_assert(!__mb_cache_entry_is_block_hashed(ce)); - __mb_cache_entry_unhash_block(ce); - __mb_cache_entry_unhash_index(ce); - ce->e_bdev = bdev; - ce->e_block = block; - ce->e_block_hash_p = block_hash_p; - ce->e_index.o_key = key; - hlist_bl_add_head(&ce->e_block_list, block_hash_p); - hlist_bl_unlock(block_hash_p); - bucket = hash_long(key, cache->c_bucket_bits); - index_hash_p = &cache->c_index_hash[bucket]; - hlist_bl_lock(index_hash_p); - ce->e_index_hash_p = index_hash_p; - hlist_bl_add_head(&ce->e_index.o_list, index_hash_p); - hlist_bl_unlock(index_hash_p); - return 0; -} - - -/* - * mb_cache_entry_release() - * - * Release a handle to a cache entry. When the last handle to a cache entry - * is released it is either freed (if it is invalid) or otherwise inserted - * in to the lru list. 
- */ -void -mb_cache_entry_release(struct mb_cache_entry *ce) -{ - __mb_cache_entry_release(ce); -} - - -/* - * mb_cache_entry_free() - * - */ -void -mb_cache_entry_free(struct mb_cache_entry *ce) -{ - mb_assert(ce); - mb_assert(list_empty(&ce->e_lru_list)); - hlist_bl_lock(ce->e_index_hash_p); - __mb_cache_entry_unhash_index(ce); - hlist_bl_unlock(ce->e_index_hash_p); - hlist_bl_lock(ce->e_block_hash_p); - __mb_cache_entry_unhash_block(ce); - hlist_bl_unlock(ce->e_block_hash_p); - __mb_cache_entry_release(ce); -} - - -/* - * mb_cache_entry_get() - * - * Get a cache entry by device / block number. (There can only be one entry - * in the cache per device and block.) Returns NULL if no such cache entry - * exists. The returned cache entry is locked for exclusive access ("single - * writer"). - */ -struct mb_cache_entry * -mb_cache_entry_get(struct mb_cache *cache, struct block_device *bdev, - sector_t block) -{ - unsigned int bucket; - struct hlist_bl_node *l; - struct mb_cache_entry *ce; - struct hlist_bl_head *block_hash_p; - - bucket = hash_long((unsigned long)bdev + (block & 0xffffffff), - cache->c_bucket_bits); - block_hash_p = &cache->c_block_hash[bucket]; - /* First serialize access to the block corresponding hash chain. */ - hlist_bl_lock(block_hash_p); - hlist_bl_for_each_entry(ce, l, block_hash_p, e_block_list) { - mb_assert(ce->e_block_hash_p == block_hash_p); - if (ce->e_bdev == bdev && ce->e_block == block) { - /* - * Prevent a free from removing the entry. 
- */ - atomic_inc(&ce->e_refcnt); - hlist_bl_unlock(block_hash_p); - __spin_lock_mb_cache_entry(ce); - atomic_dec(&ce->e_refcnt); - if (ce->e_used > 0) { - DEFINE_WAIT(wait); - while (ce->e_used > 0) { - ce->e_queued++; - prepare_to_wait(&mb_cache_queue, &wait, - TASK_UNINTERRUPTIBLE); - __spin_unlock_mb_cache_entry(ce); - schedule(); - __spin_lock_mb_cache_entry(ce); - ce->e_queued--; - } - finish_wait(&mb_cache_queue, &wait); - } - ce->e_used += 1 + MB_CACHE_WRITER; - __spin_unlock_mb_cache_entry(ce); - - if (!list_empty(&ce->e_lru_list)) { - spin_lock(&mb_cache_spinlock); - list_del_init(&ce->e_lru_list); - spin_unlock(&mb_cache_spinlock); - } - if (!__mb_cache_entry_is_block_hashed(ce)) { - __mb_cache_entry_release(ce); - return NULL; - } - return ce; - } - } - hlist_bl_unlock(block_hash_p); - return NULL; -} - -#if !defined(MB_CACHE_INDEXES_COUNT) || (MB_CACHE_INDEXES_COUNT > 0) - -static struct mb_cache_entry * -__mb_cache_entry_find(struct hlist_bl_node *l, struct hlist_bl_head *head, - struct block_device *bdev, unsigned int key) -{ - - /* The index hash chain is alredy acquire by caller. */ - while (l != NULL) { - struct mb_cache_entry *ce = - hlist_bl_entry(l, struct mb_cache_entry, - e_index.o_list); - mb_assert(ce->e_index_hash_p == head); - if (ce->e_bdev == bdev && ce->e_index.o_key == key) { - /* - * Prevent a free from removing the entry. - */ - atomic_inc(&ce->e_refcnt); - hlist_bl_unlock(head); - __spin_lock_mb_cache_entry(ce); - atomic_dec(&ce->e_refcnt); - ce->e_used++; - /* Incrementing before holding the lock gives readers - priority over writers. 
*/ - if (ce->e_used >= MB_CACHE_WRITER) { - DEFINE_WAIT(wait); - - while (ce->e_used >= MB_CACHE_WRITER) { - ce->e_queued++; - prepare_to_wait(&mb_cache_queue, &wait, - TASK_UNINTERRUPTIBLE); - __spin_unlock_mb_cache_entry(ce); - schedule(); - __spin_lock_mb_cache_entry(ce); - ce->e_queued--; - } - finish_wait(&mb_cache_queue, &wait); - } - __spin_unlock_mb_cache_entry(ce); - if (!list_empty(&ce->e_lru_list)) { - spin_lock(&mb_cache_spinlock); - list_del_init(&ce->e_lru_list); - spin_unlock(&mb_cache_spinlock); - } - if (!__mb_cache_entry_is_block_hashed(ce)) { - __mb_cache_entry_release(ce); - return ERR_PTR(-EAGAIN); - } - return ce; - } - l = l->next; - } - hlist_bl_unlock(head); - return NULL; -} - - -/* - * mb_cache_entry_find_first() - * - * Find the first cache entry on a given device with a certain key in - * an additional index. Additional matches can be found with - * mb_cache_entry_find_next(). Returns NULL if no match was found. The - * returned cache entry is locked for shared access ("multiple readers"). - * - * @cache: the cache to search - * @bdev: the device the cache entry should belong to - * @key: the key in the index - */ -struct mb_cache_entry * -mb_cache_entry_find_first(struct mb_cache *cache, struct block_device *bdev, - unsigned int key) -{ - unsigned int bucket = hash_long(key, cache->c_bucket_bits); - struct hlist_bl_node *l; - struct mb_cache_entry *ce = NULL; - struct hlist_bl_head *index_hash_p; - - index_hash_p = &cache->c_index_hash[bucket]; - hlist_bl_lock(index_hash_p); - if (!hlist_bl_empty(index_hash_p)) { - l = hlist_bl_first(index_hash_p); - ce = __mb_cache_entry_find(l, index_hash_p, bdev, key); - } else - hlist_bl_unlock(index_hash_p); - return ce; -} - - -/* - * mb_cache_entry_find_next() - * - * Find the next cache entry on a given device with a certain key in an - * additional index. Returns NULL if no match could be found. 
The previous - * entry is atomatically released, so that mb_cache_entry_find_next() can - * be called like this: - * - * entry = mb_cache_entry_find_first(); - * while (entry) { - * ... - * entry = mb_cache_entry_find_next(entry, ...); - * } - * - * @prev: The previous match - * @bdev: the device the cache entry should belong to - * @key: the key in the index - */ -struct mb_cache_entry * -mb_cache_entry_find_next(struct mb_cache_entry *prev, - struct block_device *bdev, unsigned int key) -{ - struct mb_cache *cache = prev->e_cache; - unsigned int bucket = hash_long(key, cache->c_bucket_bits); - struct hlist_bl_node *l; - struct mb_cache_entry *ce; - struct hlist_bl_head *index_hash_p; - - index_hash_p = &cache->c_index_hash[bucket]; - mb_assert(prev->e_index_hash_p == index_hash_p); - hlist_bl_lock(index_hash_p); - mb_assert(!hlist_bl_empty(index_hash_p)); - l = prev->e_index.o_list.next; - ce = __mb_cache_entry_find(l, index_hash_p, bdev, key); - __mb_cache_entry_release(prev); - return ce; -} - -#endif /* !defined(MB_CACHE_INDEXES_COUNT) || (MB_CACHE_INDEXES_COUNT > 0) */ - -static int __init init_mbcache(void) -{ - register_shrinker(&mb_cache_shrinker); - return 0; -} - -static void __exit exit_mbcache(void) -{ - unregister_shrinker(&mb_cache_shrinker); -} - -module_init(init_mbcache) -module_exit(exit_mbcache) - diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h deleted file mode 100644 index 6a392e7a723a..000000000000 --- a/include/linux/mbcache.h +++ /dev/null @@ -1,55 +0,0 @@ -/* - File: linux/mbcache.h - - (C) 2001 by Andreas Gruenbacher, -*/ -struct mb_cache_entry { - struct list_head e_lru_list; - struct mb_cache *e_cache; - unsigned short e_used; - unsigned short e_queued; - atomic_t e_refcnt; - struct block_device *e_bdev; - sector_t e_block; - struct hlist_bl_node e_block_list; - struct { - struct hlist_bl_node o_list; - unsigned int o_key; - } e_index; - struct hlist_bl_head *e_block_hash_p; - struct hlist_bl_head *e_index_hash_p; -}; - 
-struct mb_cache { - struct list_head c_cache_list; - const char *c_name; - atomic_t c_entry_count; - int c_max_entries; - int c_bucket_bits; - struct kmem_cache *c_entry_cache; - struct hlist_bl_head *c_block_hash; - struct hlist_bl_head *c_index_hash; -}; - -/* Functions on caches */ - -struct mb_cache *mb_cache_create(const char *, int); -void mb_cache_shrink(struct block_device *); -void mb_cache_destroy(struct mb_cache *); - -/* Functions on cache entries */ - -struct mb_cache_entry *mb_cache_entry_alloc(struct mb_cache *, gfp_t); -int mb_cache_entry_insert(struct mb_cache_entry *, struct block_device *, - sector_t, unsigned int); -void mb_cache_entry_release(struct mb_cache_entry *); -void mb_cache_entry_free(struct mb_cache_entry *); -struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *, - struct block_device *, - sector_t); -struct mb_cache_entry *mb_cache_entry_find_first(struct mb_cache *cache, - struct block_device *, - unsigned int); -struct mb_cache_entry *mb_cache_entry_find_next(struct mb_cache_entry *, - struct block_device *, - unsigned int); From patchwork Thu Nov 16 16:17:47 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo X-Patchwork-Id: 838660 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3yd5yw6HpMz9s83; Fri, 17 Nov 2017 03:18:48 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1eFMsF-0000P7-IG; Thu, 16 Nov 2017 16:18:43 +0000 Received: from 
youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1eFMs3-0000Bw-RB for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:31 +0000 Received: from 1.general.cascardo.us.vpn ([10.172.70.58] helo=localhost.localdomain) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1eFMs3-0000ir-45 for kernel-team@lists.ubuntu.com; Thu, 16 Nov 2017 16:18:31 +0000 From: Thadeu Lima de Souza Cascardo To: kernel-team@lists.ubuntu.com Subject: [PATCH 09/11] mbcache2: rename to mbcache Date: Thu, 16 Nov 2017 14:17:47 -0200 Message-Id: <20171116161749.15878-10-cascardo@canonical.com> X-Mailer: git-send-email 2.14.1 In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com> References: <20171116161749.15878-1-cascardo@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jan Kara Since old mbcache code is gone, let's rename new code to mbcache since number 2 is now meaningless. This is just a mechanical replacement. 
Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit 7a2508e1b657cfc7e1371550f88c7a7bc4288f32) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/Makefile | 2 +- fs/ext2/ext2.h | 4 +- fs/ext2/xattr.c | 48 ++++++------ fs/ext2/xattr.h | 8 +- fs/ext4/ext4.h | 2 +- fs/ext4/xattr.c | 54 ++++++------- fs/ext4/xattr.h | 4 +- fs/{mbcache2.c => mbcache.c} | 180 +++++++++++++++++++++---------------------- include/linux/mbcache.h | 53 +++++++++++++ include/linux/mbcache2.h | 53 ------------- 10 files changed, 204 insertions(+), 204 deletions(-) rename fs/{mbcache2.c => mbcache.c} (66%) create mode 100644 include/linux/mbcache.h delete mode 100644 include/linux/mbcache2.h diff --git a/fs/Makefile b/fs/Makefile index 4fe151d92550..a7c7f160371c 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -41,7 +41,7 @@ obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o -obj-$(CONFIG_FS_MBCACHE) += mbcache2.o +obj-$(CONFIG_FS_MBCACHE) += mbcache.o obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o obj-$(CONFIG_NFS_COMMON) += nfs_common/ obj-$(CONFIG_COREDUMP) += coredump.o diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index f98ce7e60a0f..170939f379d7 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -61,7 +61,7 @@ struct ext2_block_alloc_info { #define rsv_start rsv_window._rsv_start #define rsv_end rsv_window._rsv_end -struct mb2_cache; +struct mb_cache; /* * second extended-fs super-block data in memory @@ -113,7 +113,7 @@ struct ext2_sb_info { * of the mount options. 
*/ spinlock_t s_lock; - struct mb2_cache *s_mb_cache; + struct mb_cache *s_mb_cache; }; static inline spinlock_t * diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c index 24736c8b3d51..d24506dba8f7 100644 --- a/fs/ext2/xattr.c +++ b/fs/ext2/xattr.c @@ -56,7 +56,7 @@ #include #include #include -#include +#include #include #include #include @@ -92,7 +92,7 @@ static int ext2_xattr_set2(struct inode *, struct buffer_head *, struct ext2_xattr_header *); -static int ext2_xattr_cache_insert(struct mb2_cache *, struct buffer_head *); +static int ext2_xattr_cache_insert(struct mb_cache *, struct buffer_head *); static struct buffer_head *ext2_xattr_cache_find(struct inode *, struct ext2_xattr_header *); static void ext2_xattr_rehash(struct ext2_xattr_header *, @@ -152,7 +152,7 @@ ext2_xattr_get(struct inode *inode, int name_index, const char *name, size_t name_len, size; char *end; int error; - struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; + struct mb_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld", name_index, name, buffer, (long)buffer_size); @@ -248,7 +248,7 @@ ext2_xattr_list(struct dentry *dentry, char *buffer, size_t buffer_size) char *end; size_t rest = buffer_size; int error; - struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; + struct mb_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; ea_idebug(inode, "buffer=%p, buffer_size=%ld", buffer, (long)buffer_size); @@ -490,8 +490,8 @@ bad_block: ext2_error(sb, "ext2_xattr_set", * This must happen under buffer lock for * ext2_xattr_set2() to reliably detect modified block */ - mb2_cache_entry_delete_block(EXT2_SB(sb)->s_mb_cache, - hash, bh->b_blocknr); + mb_cache_entry_delete_block(EXT2_SB(sb)->s_mb_cache, + hash, bh->b_blocknr); /* keep the buffer locked while modifying it. 
*/ } else { @@ -624,7 +624,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, struct super_block *sb = inode->i_sb; struct buffer_head *new_bh = NULL; int error; - struct mb2_cache *ext2_mb_cache = EXT2_SB(sb)->s_mb_cache; + struct mb_cache *ext2_mb_cache = EXT2_SB(sb)->s_mb_cache; if (header) { new_bh = ext2_xattr_cache_find(inode, header); @@ -718,8 +718,8 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, * This must happen under buffer lock for * ext2_xattr_set2() to reliably detect freed block */ - mb2_cache_entry_delete_block(ext2_mb_cache, - hash, old_bh->b_blocknr); + mb_cache_entry_delete_block(ext2_mb_cache, + hash, old_bh->b_blocknr); /* Free the old block. */ ea_bdebug(old_bh, "freeing"); ext2_free_blocks(inode, old_bh->b_blocknr, 1); @@ -783,8 +783,8 @@ ext2_xattr_delete_inode(struct inode *inode) * This must happen under buffer lock for ext2_xattr_set2() to * reliably detect freed block */ - mb2_cache_entry_delete_block(EXT2_SB(inode->i_sb)->s_mb_cache, - hash, bh->b_blocknr); + mb_cache_entry_delete_block(EXT2_SB(inode->i_sb)->s_mb_cache, + hash, bh->b_blocknr); ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1); get_bh(bh); bforget(bh); @@ -815,12 +815,12 @@ cleanup: * Returns 0, or a negative error number on failure. 
*/ static int -ext2_xattr_cache_insert(struct mb2_cache *cache, struct buffer_head *bh) +ext2_xattr_cache_insert(struct mb_cache *cache, struct buffer_head *bh) { __u32 hash = le32_to_cpu(HDR(bh)->h_hash); int error; - error = mb2_cache_entry_create(cache, GFP_NOFS, hash, bh->b_blocknr); + error = mb_cache_entry_create(cache, GFP_NOFS, hash, bh->b_blocknr); if (error) { if (error == -EBUSY) { ea_bdebug(bh, "already in cache (%d cache entries)", @@ -884,14 +884,14 @@ static struct buffer_head * ext2_xattr_cache_find(struct inode *inode, struct ext2_xattr_header *header) { __u32 hash = le32_to_cpu(header->h_hash); - struct mb2_cache_entry *ce; - struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; + struct mb_cache_entry *ce; + struct mb_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; if (!header->h_hash) return NULL; /* never share */ ea_idebug(inode, "looking for cached blocks [%x]", (int)hash); again: - ce = mb2_cache_entry_find_first(ext2_mb_cache, hash); + ce = mb_cache_entry_find_first(ext2_mb_cache, hash); while (ce) { struct buffer_head *bh; @@ -912,7 +912,7 @@ again: * entry is still hashed is reliable. 
*/ if (hlist_bl_unhashed(&ce->e_hash_list)) { - mb2_cache_entry_put(ext2_mb_cache, ce); + mb_cache_entry_put(ext2_mb_cache, ce); unlock_buffer(bh); brelse(bh); goto again; @@ -925,14 +925,14 @@ again: } else if (!ext2_xattr_cmp(header, HDR(bh))) { ea_bdebug(bh, "b_count=%d", atomic_read(&(bh->b_count))); - mb2_cache_entry_touch(ext2_mb_cache, ce); - mb2_cache_entry_put(ext2_mb_cache, ce); + mb_cache_entry_touch(ext2_mb_cache, ce); + mb_cache_entry_put(ext2_mb_cache, ce); return bh; } unlock_buffer(bh); brelse(bh); } - ce = mb2_cache_entry_find_next(ext2_mb_cache, ce); + ce = mb_cache_entry_find_next(ext2_mb_cache, ce); } return NULL; } @@ -1007,13 +1007,13 @@ static void ext2_xattr_rehash(struct ext2_xattr_header *header, #define HASH_BUCKET_BITS 10 -struct mb2_cache *ext2_xattr_create_cache(void) +struct mb_cache *ext2_xattr_create_cache(void) { - return mb2_cache_create(HASH_BUCKET_BITS); + return mb_cache_create(HASH_BUCKET_BITS); } -void ext2_xattr_destroy_cache(struct mb2_cache *cache) +void ext2_xattr_destroy_cache(struct mb_cache *cache) { if (cache) - mb2_cache_destroy(cache); + mb_cache_destroy(cache); } diff --git a/fs/ext2/xattr.h b/fs/ext2/xattr.h index 6ea38aa9563a..6f82ab1b00ca 100644 --- a/fs/ext2/xattr.h +++ b/fs/ext2/xattr.h @@ -53,7 +53,7 @@ struct ext2_xattr_entry { #define EXT2_XATTR_SIZE(size) \ (((size) + EXT2_XATTR_ROUND) & ~EXT2_XATTR_ROUND) -struct mb2_cache; +struct mb_cache; # ifdef CONFIG_EXT2_FS_XATTR @@ -68,8 +68,8 @@ extern int ext2_xattr_set(struct inode *, int, const char *, const void *, size_ extern void ext2_xattr_delete_inode(struct inode *); -extern struct mb2_cache *ext2_xattr_create_cache(void); -extern void ext2_xattr_destroy_cache(struct mb2_cache *cache); +extern struct mb_cache *ext2_xattr_create_cache(void); +extern void ext2_xattr_destroy_cache(struct mb_cache *cache); extern const struct xattr_handler *ext2_xattr_handlers[]; @@ -94,7 +94,7 @@ ext2_xattr_delete_inode(struct inode *inode) { } -static inline void 
ext2_xattr_destroy_cache(struct mb2_cache *cache) +static inline void ext2_xattr_destroy_cache(struct mb_cache *cache) { } diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index 25e36e527b0e..331f82510d15 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -1441,7 +1441,7 @@ struct ext4_sb_info { struct list_head s_es_list; /* List of inodes with reclaimable extents */ long s_es_nr_inode; struct ext4_es_stats s_es_stats; - struct mb2_cache *s_mb_cache; + struct mb_cache *s_mb_cache; spinlock_t s_es_lock ____cacheline_aligned_in_smp; /* Ratelimit ext4 messages. */ diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 6f8fd0ec76e2..09edc652a04e 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -53,7 +53,7 @@ #include #include #include -#include +#include #include #include "ext4_jbd2.h" #include "ext4.h" @@ -80,10 +80,10 @@ # define ea_bdebug(bh, fmt, ...) no_printk(fmt, ##__VA_ARGS__) #endif -static void ext4_xattr_cache_insert(struct mb2_cache *, struct buffer_head *); +static void ext4_xattr_cache_insert(struct mb_cache *, struct buffer_head *); static struct buffer_head *ext4_xattr_cache_find(struct inode *, struct ext4_xattr_header *, - struct mb2_cache_entry **); + struct mb_cache_entry **); static void ext4_xattr_rehash(struct ext4_xattr_header *, struct ext4_xattr_entry *); static int ext4_xattr_list(struct dentry *dentry, char *buffer, @@ -295,7 +295,7 @@ ext4_xattr_block_get(struct inode *inode, int name_index, const char *name, struct ext4_xattr_entry *entry; size_t size; int error; - struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld", name_index, name, buffer, (long)buffer_size); @@ -442,7 +442,7 @@ ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size) struct inode *inode = d_inode(dentry); struct buffer_head *bh = NULL; int error; - struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct 
mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); ea_idebug(inode, "buffer=%p, buffer_size=%ld", buffer, (long)buffer_size); @@ -575,8 +575,8 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode, * This must happen under buffer lock for * ext4_xattr_block_set() to reliably detect freed block */ - mb2_cache_entry_delete_block(EXT4_GET_MB_CACHE(inode), hash, - bh->b_blocknr); + mb_cache_entry_delete_block(EXT4_GET_MB_CACHE(inode), hash, + bh->b_blocknr); get_bh(bh); unlock_buffer(bh); ext4_free_blocks(handle, inode, bh, 0, 1, @@ -796,9 +796,9 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, struct super_block *sb = inode->i_sb; struct buffer_head *new_bh = NULL; struct ext4_xattr_search *s = &bs->s; - struct mb2_cache_entry *ce = NULL; + struct mb_cache_entry *ce = NULL; int error = 0; - struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); #define header(x) ((struct ext4_xattr_header *)(x)) @@ -819,8 +819,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, * ext4_xattr_block_set() to reliably detect modified * block */ - mb2_cache_entry_delete_block(ext4_mb_cache, hash, - bs->bh->b_blocknr); + mb_cache_entry_delete_block(ext4_mb_cache, hash, + bs->bh->b_blocknr); ea_bdebug(bs->bh, "modifying in-place"); error = ext4_xattr_set_entry(i, s); if (!error) { @@ -919,7 +919,7 @@ inserted: EXT4_C2B(EXT4_SB(sb), 1)); brelse(new_bh); - mb2_cache_entry_put(ext4_mb_cache, ce); + mb_cache_entry_put(ext4_mb_cache, ce); ce = NULL; new_bh = NULL; goto inserted; @@ -935,8 +935,8 @@ inserted: if (error) goto cleanup_dquot; } - mb2_cache_entry_touch(ext4_mb_cache, ce); - mb2_cache_entry_put(ext4_mb_cache, ce); + mb_cache_entry_touch(ext4_mb_cache, ce); + mb_cache_entry_put(ext4_mb_cache, ce); ce = NULL; } else if (bs->bh && s->base == bs->bh->b_data) { /* We were modifying this block in-place. 
*/ @@ -1002,7 +1002,7 @@ getblk_failed: cleanup: if (ce) - mb2_cache_entry_put(ext4_mb_cache, ce); + mb_cache_entry_put(ext4_mb_cache, ce); brelse(new_bh); if (!(bs->bh && s->base == bs->bh->b_data)) kfree(s->base); @@ -1593,13 +1593,13 @@ cleanup: * Returns 0, or a negative error number on failure. */ static void -ext4_xattr_cache_insert(struct mb2_cache *ext4_mb_cache, struct buffer_head *bh) +ext4_xattr_cache_insert(struct mb_cache *ext4_mb_cache, struct buffer_head *bh) { __u32 hash = le32_to_cpu(BHDR(bh)->h_hash); int error; - error = mb2_cache_entry_create(ext4_mb_cache, GFP_NOFS, hash, - bh->b_blocknr); + error = mb_cache_entry_create(ext4_mb_cache, GFP_NOFS, hash, + bh->b_blocknr); if (error) { if (error == -EBUSY) ea_bdebug(bh, "already in cache"); @@ -1657,16 +1657,16 @@ ext4_xattr_cmp(struct ext4_xattr_header *header1, */ static struct buffer_head * ext4_xattr_cache_find(struct inode *inode, struct ext4_xattr_header *header, - struct mb2_cache_entry **pce) + struct mb_cache_entry **pce) { __u32 hash = le32_to_cpu(header->h_hash); - struct mb2_cache_entry *ce; - struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + struct mb_cache_entry *ce; + struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); if (!header->h_hash) return NULL; /* never share */ ea_idebug(inode, "looking for cached blocks [%x]", (int)hash); - ce = mb2_cache_entry_find_first(ext4_mb_cache, hash); + ce = mb_cache_entry_find_first(ext4_mb_cache, hash); while (ce) { struct buffer_head *bh; @@ -1685,7 +1685,7 @@ ext4_xattr_cache_find(struct inode *inode, struct ext4_xattr_header *header, return bh; } brelse(bh); - ce = mb2_cache_entry_find_next(ext4_mb_cache, ce); + ce = mb_cache_entry_find_next(ext4_mb_cache, ce); } return NULL; } @@ -1760,15 +1760,15 @@ static void ext4_xattr_rehash(struct ext4_xattr_header *header, #define HASH_BUCKET_BITS 10 -struct mb2_cache * +struct mb_cache * ext4_xattr_create_cache(void) { - return mb2_cache_create(HASH_BUCKET_BITS); + return 
mb_cache_create(HASH_BUCKET_BITS); } -void ext4_xattr_destroy_cache(struct mb2_cache *cache) +void ext4_xattr_destroy_cache(struct mb_cache *cache) { if (cache) - mb2_cache_destroy(cache); + mb_cache_destroy(cache); } diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h index 10b0f7323ed6..69dd3e6566e0 100644 --- a/fs/ext4/xattr.h +++ b/fs/ext4/xattr.h @@ -123,8 +123,8 @@ extern int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode, struct ext4_xattr_info *i, struct ext4_xattr_ibody_find *is); -extern struct mb2_cache *ext4_xattr_create_cache(void); -extern void ext4_xattr_destroy_cache(struct mb2_cache *); +extern struct mb_cache *ext4_xattr_create_cache(void); +extern void ext4_xattr_destroy_cache(struct mb_cache *); #ifdef CONFIG_EXT4_FS_SECURITY extern int ext4_init_security(handle_t *handle, struct inode *inode, diff --git a/fs/mbcache2.c b/fs/mbcache.c similarity index 66% rename from fs/mbcache2.c rename to fs/mbcache.c index 49f7a6feaa83..4241b633f155 100644 --- a/fs/mbcache2.c +++ b/fs/mbcache.c @@ -5,12 +5,12 @@ #include #include #include -#include +#include /* * Mbcache is a simple key-value store. Keys need not be unique, however * key-value pairs are expected to be unique (we use this fact in - * mb2_cache_entry_delete_block()). + * mb_cache_entry_delete_block()). * * Ext2 and ext4 use this cache for deduplication of extended attribute blocks. * They use hash of a block contents as a key and block number as a value. @@ -23,7 +23,7 @@ * size hash table is used for fast key lookups. 
*/ -struct mb2_cache { +struct mb_cache { /* Hash table of entries */ struct hlist_bl_head *c_hash; /* log2 of hash table size */ @@ -40,29 +40,29 @@ struct mb2_cache { struct work_struct c_shrink_work; }; -static struct kmem_cache *mb2_entry_cache; +static struct kmem_cache *mb_entry_cache; -static unsigned long mb2_cache_shrink(struct mb2_cache *cache, - unsigned int nr_to_scan); +static unsigned long mb_cache_shrink(struct mb_cache *cache, + unsigned int nr_to_scan); -static inline bool mb2_cache_entry_referenced(struct mb2_cache_entry *entry) +static inline bool mb_cache_entry_referenced(struct mb_cache_entry *entry) { return entry->_e_hash_list_head & 1; } -static inline void mb2_cache_entry_set_referenced(struct mb2_cache_entry *entry) +static inline void mb_cache_entry_set_referenced(struct mb_cache_entry *entry) { entry->_e_hash_list_head |= 1; } -static inline void mb2_cache_entry_clear_referenced( - struct mb2_cache_entry *entry) +static inline void mb_cache_entry_clear_referenced( + struct mb_cache_entry *entry) { entry->_e_hash_list_head &= ~1; } -static inline struct hlist_bl_head *mb2_cache_entry_head( - struct mb2_cache_entry *entry) +static inline struct hlist_bl_head *mb_cache_entry_head( + struct mb_cache_entry *entry) { return (struct hlist_bl_head *) (entry->_e_hash_list_head & ~1); @@ -75,7 +75,7 @@ static inline struct hlist_bl_head *mb2_cache_entry_head( #define SYNC_SHRINK_BATCH 64 /* - * mb2_cache_entry_create - create entry in cache + * mb_cache_entry_create - create entry in cache * @cache - cache where the entry should be created * @mask - gfp mask with which the entry should be allocated * @key - key of the entry @@ -85,10 +85,10 @@ static inline struct hlist_bl_head *mb2_cache_entry_head( * block @block. The function returns -EBUSY if entry with the same key * and for the same block already exists in cache. Otherwise 0 is returned. 
*/ -int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, - sector_t block) +int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, + sector_t block) { - struct mb2_cache_entry *entry, *dup; + struct mb_cache_entry *entry, *dup; struct hlist_bl_node *dup_node; struct hlist_bl_head *head; @@ -97,9 +97,9 @@ int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, schedule_work(&cache->c_shrink_work); /* Do some sync reclaim if background reclaim cannot keep up */ if (cache->c_entry_count >= 2*cache->c_max_entries) - mb2_cache_shrink(cache, SYNC_SHRINK_BATCH); + mb_cache_shrink(cache, SYNC_SHRINK_BATCH); - entry = kmem_cache_alloc(mb2_entry_cache, mask); + entry = kmem_cache_alloc(mb_entry_cache, mask); if (!entry) return -ENOMEM; @@ -114,7 +114,7 @@ int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) { if (dup->e_key == key && dup->e_block == block) { hlist_bl_unlock(head); - kmem_cache_free(mb2_entry_cache, entry); + kmem_cache_free(mb_entry_cache, entry); return -EBUSY; } } @@ -130,24 +130,24 @@ int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, return 0; } -EXPORT_SYMBOL(mb2_cache_entry_create); +EXPORT_SYMBOL(mb_cache_entry_create); -void __mb2_cache_entry_free(struct mb2_cache_entry *entry) +void __mb_cache_entry_free(struct mb_cache_entry *entry) { - kmem_cache_free(mb2_entry_cache, entry); + kmem_cache_free(mb_entry_cache, entry); } -EXPORT_SYMBOL(__mb2_cache_entry_free); +EXPORT_SYMBOL(__mb_cache_entry_free); -static struct mb2_cache_entry *__entry_find(struct mb2_cache *cache, - struct mb2_cache_entry *entry, - u32 key) +static struct mb_cache_entry *__entry_find(struct mb_cache *cache, + struct mb_cache_entry *entry, + u32 key) { - struct mb2_cache_entry *old_entry = entry; + struct mb_cache_entry *old_entry = entry; struct hlist_bl_node *node; struct hlist_bl_head *head; if (entry) - head = 
mb2_cache_entry_head(entry); + head = mb_cache_entry_head(entry); else head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; hlist_bl_lock(head); @@ -156,7 +156,7 @@ static struct mb2_cache_entry *__entry_find(struct mb2_cache *cache, else node = hlist_bl_first(head); while (node) { - entry = hlist_bl_entry(node, struct mb2_cache_entry, + entry = hlist_bl_entry(node, struct mb_cache_entry, e_hash_list); if (entry->e_key == key) { atomic_inc(&entry->e_refcnt); @@ -168,28 +168,28 @@ static struct mb2_cache_entry *__entry_find(struct mb2_cache *cache, out: hlist_bl_unlock(head); if (old_entry) - mb2_cache_entry_put(cache, old_entry); + mb_cache_entry_put(cache, old_entry); return entry; } /* - * mb2_cache_entry_find_first - find the first entry in cache with given key + * mb_cache_entry_find_first - find the first entry in cache with given key * @cache: cache where we should search * @key: key to look for * * Search in @cache for entry with key @key. Grabs reference to the first * entry found and returns the entry. */ -struct mb2_cache_entry *mb2_cache_entry_find_first(struct mb2_cache *cache, - u32 key) +struct mb_cache_entry *mb_cache_entry_find_first(struct mb_cache *cache, + u32 key) { return __entry_find(cache, NULL, key); } -EXPORT_SYMBOL(mb2_cache_entry_find_first); +EXPORT_SYMBOL(mb_cache_entry_find_first); /* - * mb2_cache_entry_find_next - find next entry in cache with the same + * mb_cache_entry_find_next - find next entry in cache with the same * @cache: cache where we should search * @entry: entry to start search from * @@ -198,26 +198,26 @@ EXPORT_SYMBOL(mb2_cache_entry_find_first); * with the search), finds the first entry in the hash chain. The function * drops reference to @entry and returns with a reference to the found entry. 
*/ -struct mb2_cache_entry *mb2_cache_entry_find_next(struct mb2_cache *cache, - struct mb2_cache_entry *entry) +struct mb_cache_entry *mb_cache_entry_find_next(struct mb_cache *cache, + struct mb_cache_entry *entry) { return __entry_find(cache, entry, entry->e_key); } -EXPORT_SYMBOL(mb2_cache_entry_find_next); +EXPORT_SYMBOL(mb_cache_entry_find_next); -/* mb2_cache_entry_delete_block - remove information about block from cache +/* mb_cache_entry_delete_block - remove information about block from cache * @cache - cache we work with * @key - key of the entry to remove * @block - block containing data for @key * * Remove entry from cache @cache with key @key with data stored in @block. */ -void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key, - sector_t block) +void mb_cache_entry_delete_block(struct mb_cache *cache, u32 key, + sector_t block) { struct hlist_bl_node *node; struct hlist_bl_head *head; - struct mb2_cache_entry *entry; + struct mb_cache_entry *entry; head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; hlist_bl_lock(head); @@ -233,50 +233,50 @@ void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key, atomic_dec(&entry->e_refcnt); } spin_unlock(&cache->c_list_lock); - mb2_cache_entry_put(cache, entry); + mb_cache_entry_put(cache, entry); return; } } hlist_bl_unlock(head); } -EXPORT_SYMBOL(mb2_cache_entry_delete_block); +EXPORT_SYMBOL(mb_cache_entry_delete_block); -/* mb2_cache_entry_touch - cache entry got used +/* mb_cache_entry_touch - cache entry got used * @cache - cache the entry belongs to * @entry - entry that got used * * Marks entry as used to give hit higher chances of surviving in cache. 
*/ -void mb2_cache_entry_touch(struct mb2_cache *cache, - struct mb2_cache_entry *entry) +void mb_cache_entry_touch(struct mb_cache *cache, + struct mb_cache_entry *entry) { - mb2_cache_entry_set_referenced(entry); + mb_cache_entry_set_referenced(entry); } -EXPORT_SYMBOL(mb2_cache_entry_touch); +EXPORT_SYMBOL(mb_cache_entry_touch); -static unsigned long mb2_cache_count(struct shrinker *shrink, - struct shrink_control *sc) +static unsigned long mb_cache_count(struct shrinker *shrink, + struct shrink_control *sc) { - struct mb2_cache *cache = container_of(shrink, struct mb2_cache, - c_shrink); + struct mb_cache *cache = container_of(shrink, struct mb_cache, + c_shrink); return cache->c_entry_count; } /* Shrink number of entries in cache */ -static unsigned long mb2_cache_shrink(struct mb2_cache *cache, - unsigned int nr_to_scan) +static unsigned long mb_cache_shrink(struct mb_cache *cache, + unsigned int nr_to_scan) { - struct mb2_cache_entry *entry; + struct mb_cache_entry *entry; struct hlist_bl_head *head; unsigned int shrunk = 0; spin_lock(&cache->c_list_lock); while (nr_to_scan-- && !list_empty(&cache->c_list)) { entry = list_first_entry(&cache->c_list, - struct mb2_cache_entry, e_list); - if (mb2_cache_entry_referenced(entry)) { - mb2_cache_entry_clear_referenced(entry); + struct mb_cache_entry, e_list); + if (mb_cache_entry_referenced(entry)) { + mb_cache_entry_clear_referenced(entry); list_move_tail(&cache->c_list, &entry->e_list); continue; } @@ -287,14 +287,14 @@ static unsigned long mb2_cache_shrink(struct mb2_cache *cache, * from under us. 
*/ spin_unlock(&cache->c_list_lock); - head = mb2_cache_entry_head(entry); + head = mb_cache_entry_head(entry); hlist_bl_lock(head); if (!hlist_bl_unhashed(&entry->e_hash_list)) { hlist_bl_del_init(&entry->e_hash_list); atomic_dec(&entry->e_refcnt); } hlist_bl_unlock(head); - if (mb2_cache_entry_put(cache, entry)) + if (mb_cache_entry_put(cache, entry)) shrunk++; cond_resched(); spin_lock(&cache->c_list_lock); @@ -304,41 +304,41 @@ static unsigned long mb2_cache_shrink(struct mb2_cache *cache, return shrunk; } -static unsigned long mb2_cache_scan(struct shrinker *shrink, - struct shrink_control *sc) +static unsigned long mb_cache_scan(struct shrinker *shrink, + struct shrink_control *sc) { int nr_to_scan = sc->nr_to_scan; - struct mb2_cache *cache = container_of(shrink, struct mb2_cache, + struct mb_cache *cache = container_of(shrink, struct mb_cache, c_shrink); - return mb2_cache_shrink(cache, nr_to_scan); + return mb_cache_shrink(cache, nr_to_scan); } /* We shrink 1/X of the cache when we have too many entries in it */ #define SHRINK_DIVISOR 16 -static void mb2_cache_shrink_worker(struct work_struct *work) +static void mb_cache_shrink_worker(struct work_struct *work) { - struct mb2_cache *cache = container_of(work, struct mb2_cache, - c_shrink_work); - mb2_cache_shrink(cache, cache->c_max_entries / SHRINK_DIVISOR); + struct mb_cache *cache = container_of(work, struct mb_cache, + c_shrink_work); + mb_cache_shrink(cache, cache->c_max_entries / SHRINK_DIVISOR); } /* - * mb2_cache_create - create cache + * mb_cache_create - create cache * @bucket_bits: log2 of the hash table size * * Create cache for keys with 2^bucket_bits hash entries. 
*/ -struct mb2_cache *mb2_cache_create(int bucket_bits) +struct mb_cache *mb_cache_create(int bucket_bits) { - struct mb2_cache *cache; + struct mb_cache *cache; int bucket_count = 1 << bucket_bits; int i; if (!try_module_get(THIS_MODULE)) return NULL; - cache = kzalloc(sizeof(struct mb2_cache), GFP_KERNEL); + cache = kzalloc(sizeof(struct mb_cache), GFP_KERNEL); if (!cache) goto err_out; cache->c_bucket_bits = bucket_bits; @@ -354,12 +354,12 @@ struct mb2_cache *mb2_cache_create(int bucket_bits) for (i = 0; i < bucket_count; i++) INIT_HLIST_BL_HEAD(&cache->c_hash[i]); - cache->c_shrink.count_objects = mb2_cache_count; - cache->c_shrink.scan_objects = mb2_cache_scan; + cache->c_shrink.count_objects = mb_cache_count; + cache->c_shrink.scan_objects = mb_cache_scan; cache->c_shrink.seeks = DEFAULT_SEEKS; register_shrinker(&cache->c_shrink); - INIT_WORK(&cache->c_shrink_work, mb2_cache_shrink_worker); + INIT_WORK(&cache->c_shrink_work, mb_cache_shrink_worker); return cache; @@ -367,18 +367,18 @@ err_out: module_put(THIS_MODULE); return NULL; } -EXPORT_SYMBOL(mb2_cache_create); +EXPORT_SYMBOL(mb_cache_create); /* - * mb2_cache_destroy - destroy cache + * mb_cache_destroy - destroy cache * @cache: the cache to destroy * * Free all entries in cache and cache itself. Caller must make sure nobody * (except shrinker) can reach @cache when calling this. 
*/ -void mb2_cache_destroy(struct mb2_cache *cache) +void mb_cache_destroy(struct mb_cache *cache) { - struct mb2_cache_entry *entry, *next; + struct mb_cache_entry *entry, *next; unregister_shrinker(&cache->c_shrink); @@ -394,30 +394,30 @@ void mb2_cache_destroy(struct mb2_cache *cache) WARN_ON(1); list_del(&entry->e_list); WARN_ON(atomic_read(&entry->e_refcnt) != 1); - mb2_cache_entry_put(cache, entry); + mb_cache_entry_put(cache, entry); } kfree(cache->c_hash); kfree(cache); module_put(THIS_MODULE); } -EXPORT_SYMBOL(mb2_cache_destroy); +EXPORT_SYMBOL(mb_cache_destroy); -static int __init mb2cache_init(void) +static int __init mbcache_init(void) { - mb2_entry_cache = kmem_cache_create("mbcache", - sizeof(struct mb2_cache_entry), 0, + mb_entry_cache = kmem_cache_create("mbcache", + sizeof(struct mb_cache_entry), 0, SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL); - BUG_ON(!mb2_entry_cache); + BUG_ON(!mb_entry_cache); return 0; } -static void __exit mb2cache_exit(void) +static void __exit mbcache_exit(void) { - kmem_cache_destroy(mb2_entry_cache); + kmem_cache_destroy(mb_entry_cache); } -module_init(mb2cache_init) -module_exit(mb2cache_exit) +module_init(mbcache_init) +module_exit(mbcache_exit) MODULE_AUTHOR("Jan Kara "); MODULE_DESCRIPTION("Meta block cache (for extended attributes)"); diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h new file mode 100644 index 000000000000..a74a1f3082fb --- /dev/null +++ b/include/linux/mbcache.h @@ -0,0 +1,53 @@ +#ifndef _LINUX_MBCACHE_H +#define _LINUX_MBCACHE_H + +#include +#include +#include +#include +#include + +struct mb_cache; + +struct mb_cache_entry { + /* List of entries in cache - protected by cache->c_list_lock */ + struct list_head e_list; + /* Hash table list - protected by bitlock in e_hash_list_head */ + struct hlist_bl_node e_hash_list; + atomic_t e_refcnt; + /* Key in hash - stable during lifetime of the entry */ + u32 e_key; + /* Block number of hashed block - stable during lifetime of the entry */ + 
sector_t e_block; + /* + * Head of hash list (for list bit lock) - stable. Combined with + * referenced bit of entry + */ + unsigned long _e_hash_list_head; +}; + +struct mb_cache *mb_cache_create(int bucket_bits); +void mb_cache_destroy(struct mb_cache *cache); + +int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, + sector_t block); +void __mb_cache_entry_free(struct mb_cache_entry *entry); +static inline int mb_cache_entry_put(struct mb_cache *cache, + struct mb_cache_entry *entry) +{ + if (!atomic_dec_and_test(&entry->e_refcnt)) + return 0; + __mb_cache_entry_free(entry); + return 1; +} + +void mb_cache_entry_delete_block(struct mb_cache *cache, u32 key, + sector_t block); +struct mb_cache_entry *mb_cache_entry_find_first(struct mb_cache *cache, + u32 key); +struct mb_cache_entry *mb_cache_entry_find_next(struct mb_cache *cache, + struct mb_cache_entry *entry); +void mb_cache_entry_touch(struct mb_cache *cache, + struct mb_cache_entry *entry); + +#endif /* _LINUX_MBCACHE_H */ diff --git a/include/linux/mbcache2.h b/include/linux/mbcache2.h deleted file mode 100644 index c934843a6a31..000000000000 --- a/include/linux/mbcache2.h +++ /dev/null @@ -1,53 +0,0 @@ -#ifndef _LINUX_MB2CACHE_H -#define _LINUX_MB2CACHE_H - -#include -#include -#include -#include -#include - -struct mb2_cache; - -struct mb2_cache_entry { - /* List of entries in cache - protected by cache->c_list_lock */ - struct list_head e_list; - /* Hash table list - protected by bitlock in e_hash_list_head */ - struct hlist_bl_node e_hash_list; - atomic_t e_refcnt; - /* Key in hash - stable during lifetime of the entry */ - u32 e_key; - /* Block number of hashed block - stable during lifetime of the entry */ - sector_t e_block; - /* - * Head of hash list (for list bit lock) - stable. 
Combined with - * referenced bit of entry - */ - unsigned long _e_hash_list_head; -}; - -struct mb2_cache *mb2_cache_create(int bucket_bits); -void mb2_cache_destroy(struct mb2_cache *cache); - -int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key, - sector_t block); -void __mb2_cache_entry_free(struct mb2_cache_entry *entry); -static inline int mb2_cache_entry_put(struct mb2_cache *cache, - struct mb2_cache_entry *entry) -{ - if (!atomic_dec_and_test(&entry->e_refcnt)) - return 0; - __mb2_cache_entry_free(entry); - return 1; -} - -void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key, - sector_t block); -struct mb2_cache_entry *mb2_cache_entry_find_first(struct mb2_cache *cache, - u32 key); -struct mb2_cache_entry *mb2_cache_entry_find_next(struct mb2_cache *cache, - struct mb2_cache_entry *entry); -void mb2_cache_entry_touch(struct mb2_cache *cache, - struct mb2_cache_entry *entry); - -#endif /* _LINUX_MB2CACHE_H */
From patchwork Thu Nov 16 16:17:48 2017
X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo
X-Patchwork-Id: 838659
From: Thadeu Lima de Souza Cascardo
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 10/11] mbcache: get rid of _e_hash_list_head
Date: Thu, 16 Nov 2017 14:17:48 -0200
Message-Id: <20171116161749.15878-11-cascardo@canonical.com>
In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com>
References: <20171116161749.15878-1-cascardo@canonical.com>

From: Andreas Gruenbacher

Get rid of field _e_hash_list_head in cache entries and add bit field e_referenced instead.
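Illustration only, not part of the patch: a minimal userspace sketch of the layout change described above. It assumes the old scheme kept the "referenced" flag in the low bit of the stored bucket pointer, and that the new scheme recomputes the bucket from the key. All names here (toy_bucket, toy_hash, old_entry, new_entry, toy_same_bucket) are hypothetical, and toy_hash is only a stand-in for the kernel's hash_32(), not a copy of it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_BUCKET_BITS 4

/* Hypothetical stand-in for struct hlist_bl_head. The int member gives
 * it at least 2-byte alignment, so bit 0 of its address is always 0. */
struct toy_bucket { int unused; };

static struct toy_bucket toy_table[1u << TOY_BUCKET_BITS];

/* Stand-in for hash_32(): multiplicative hash keeping the top bits. */
static uint32_t toy_hash(uint32_t key, unsigned int bits)
{
	assert(bits > 0 && bits <= 32);
	return (uint32_t)(key * 0x9E3779B1u) >> (32 - bits);
}

/* Old layout: bucket pointer and "referenced" flag share one word. */
struct old_entry {
	uint32_t key;
	uintptr_t head_and_ref;
};

/* New layout: a one-bit flag; the bucket is recomputed from the key. */
struct new_entry {
	uint32_t key;
	unsigned int referenced:1;
};

/* Old lookup: mask off the flag bit to recover the pointer. */
static struct toy_bucket *old_head(const struct old_entry *e)
{
	return (struct toy_bucket *)(e->head_and_ref & ~(uintptr_t)1);
}

/* New lookup: hash the key again, no per-entry pointer needed. */
static struct toy_bucket *new_head(const struct new_entry *e)
{
	return &toy_table[toy_hash(e->key, TOY_BUCKET_BITS)];
}

/* Returns 1 if both layouts resolve @key to the same bucket and both
 * report the entry as referenced after the flag has been set. */
int toy_same_bucket(uint32_t key)
{
	struct old_entry o = {
		key, (uintptr_t)&toy_table[toy_hash(key, TOY_BUCKET_BITS)]
	};
	struct new_entry n = { key, 0 };

	o.head_and_ref |= 1;	/* old: set the low bit of the pointer word */
	n.referenced = 1;	/* new: set the bit field */

	return old_head(&o) == new_head(&n) &&
	       (unsigned int)(o.head_and_ref & 1) == (unsigned int)n.referenced;
}
```

The trade-off in this sketch is one extra hash computation per bucket lookup in exchange for dropping a pointer-sized field from every entry.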
Signed-off-by: Andreas Gruenbacher Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit dc8d5e565f00c9442fa1cbf9acc115475628527c) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/mbcache.c | 41 ++++++++++------------------------------- include/linux/mbcache.h | 8 ++------ 2 files changed, 12 insertions(+), 37 deletions(-) diff --git a/fs/mbcache.c b/fs/mbcache.c index 4241b633f155..903be151dcfe 100644 --- a/fs/mbcache.c +++ b/fs/mbcache.c @@ -45,27 +45,10 @@ static struct kmem_cache *mb_entry_cache; static unsigned long mb_cache_shrink(struct mb_cache *cache, unsigned int nr_to_scan); -static inline bool mb_cache_entry_referenced(struct mb_cache_entry *entry) +static inline struct hlist_bl_head *mb_cache_entry_head(struct mb_cache *cache, + u32 key) { - return entry->_e_hash_list_head & 1; -} - -static inline void mb_cache_entry_set_referenced(struct mb_cache_entry *entry) -{ - entry->_e_hash_list_head |= 1; -} - -static inline void mb_cache_entry_clear_referenced( - struct mb_cache_entry *entry) -{ - entry->_e_hash_list_head &= ~1; -} - -static inline struct hlist_bl_head *mb_cache_entry_head( - struct mb_cache_entry *entry) -{ - return (struct hlist_bl_head *) - (entry->_e_hash_list_head & ~1); + return &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; } /* @@ -108,8 +91,7 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, atomic_set(&entry->e_refcnt, 1); entry->e_key = key; entry->e_block = block; - head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; - entry->_e_hash_list_head = (unsigned long)head; + head = mb_cache_entry_head(cache, key); hlist_bl_lock(head); hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) { if (dup->e_key == key && dup->e_block == block) { @@ -146,10 +128,7 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache, struct hlist_bl_node *node; struct hlist_bl_head *head; - if (entry) - head = mb_cache_entry_head(entry); - else - head = 
&cache->c_hash[hash_32(key, cache->c_bucket_bits)]; + head = mb_cache_entry_head(cache, key); hlist_bl_lock(head); if (entry && !hlist_bl_unhashed(&entry->e_hash_list)) node = entry->e_hash_list.next; @@ -219,7 +198,7 @@ void mb_cache_entry_delete_block(struct mb_cache *cache, u32 key, struct hlist_bl_head *head; struct mb_cache_entry *entry; - head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)]; + head = mb_cache_entry_head(cache, key); hlist_bl_lock(head); hlist_bl_for_each_entry(entry, node, head, e_hash_list) { if (entry->e_key == key && entry->e_block == block) { @@ -250,7 +229,7 @@ EXPORT_SYMBOL(mb_cache_entry_delete_block); void mb_cache_entry_touch(struct mb_cache *cache, struct mb_cache_entry *entry) { - mb_cache_entry_set_referenced(entry); + entry->e_referenced = 1; } EXPORT_SYMBOL(mb_cache_entry_touch); @@ -275,8 +254,8 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache, while (nr_to_scan-- && !list_empty(&cache->c_list)) { entry = list_first_entry(&cache->c_list, struct mb_cache_entry, e_list); - if (mb_cache_entry_referenced(entry)) { - mb_cache_entry_clear_referenced(entry); + if (entry->e_referenced) { + entry->e_referenced = 0; list_move_tail(&cache->c_list, &entry->e_list); continue; } @@ -287,7 +266,7 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache, * from under us. 
*/ spin_unlock(&cache->c_list_lock); - head = mb_cache_entry_head(entry); + head = mb_cache_entry_head(cache, entry->e_key); hlist_bl_lock(head); if (!hlist_bl_unhashed(&entry->e_hash_list)) { hlist_bl_del_init(&entry->e_hash_list); diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h index a74a1f3082fb..607e6968542e 100644 --- a/include/linux/mbcache.h +++ b/include/linux/mbcache.h @@ -12,18 +12,14 @@ struct mb_cache; struct mb_cache_entry { /* List of entries in cache - protected by cache->c_list_lock */ struct list_head e_list; - /* Hash table list - protected by bitlock in e_hash_list_head */ + /* Hash table list - protected by hash chain bitlock */ struct hlist_bl_node e_hash_list; atomic_t e_refcnt; /* Key in hash - stable during lifetime of the entry */ u32 e_key; + u32 e_referenced:1; /* Block number of hashed block - stable during lifetime of the entry */ sector_t e_block; - /* - * Head of hash list (for list bit lock) - stable. Combined with - * referenced bit of entry - */ - unsigned long _e_hash_list_head; }; struct mb_cache *mb_cache_create(int bucket_bits);
From patchwork Thu Nov 16 16:17:49 2017
X-Patchwork-Submitter: Thadeu Lima de Souza Cascardo
X-Patchwork-Id: 838658
From: Thadeu Lima de Souza Cascardo
To: kernel-team@lists.ubuntu.com
Subject: [PATCH 11/11] mbcache: add reusable flag to cache entries
Date: Thu, 16 Nov 2017 14:17:49 -0200
Message-Id: <20171116161749.15878-12-cascardo@canonical.com>
In-Reply-To: <20171116161749.15878-1-cascardo@canonical.com>
References: <20171116161749.15878-1-cascardo@canonical.com>

From: Andreas Gruenbacher

To reduce the amount of damage caused by a single bad block, we limit the number of inodes sharing an xattr block to 1024. Thus there can be several xattr blocks with the same contents when there are lots of files with the same extended attributes. These xattr blocks naturally result in hash collisions and can form long hash chains, and we unnecessarily check each such block only to find out we cannot use it because it is already shared by too many inodes.

Add a reusable flag to cache entries which is cleared when a cache entry has reached its maximum refcount. Cache entries which are not marked reusable are skipped by mb_cache_entry_find_{first,next}. This significantly speeds up mbcache when there are many identical xattr blocks.
For example for xattr-bench with 5 values and each process handling 20000 files, the run for 64 processes is 25x faster with this patch. Even for 8 processes the speedup is almost 3x. We have also verified that for situations where there is only one xattr block of each kind, the patch doesn't have a measurable cost. [JK: Remove handling of setting the same value since it is not needed anymore, check for races in e_reusable setting, improve changelog, add measurements] Signed-off-by: Andreas Gruenbacher Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o (cherry picked from commit 6048c64b26097a0ffbd966866b599f990e674e9b) CVE-2015-8952 Signed-off-by: Thadeu Lima de Souza Cascardo --- fs/ext2/xattr.c | 2 +- fs/ext4/xattr.c | 65 +++++++++++++++++++++++++++++++------------------ fs/mbcache.c | 38 ++++++++++++++++++++++++++--- include/linux/mbcache.h | 5 +++- 4 files changed, 80 insertions(+), 30 deletions(-) diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c index d24506dba8f7..bf77a9f0051a 100644 --- a/fs/ext2/xattr.c +++ b/fs/ext2/xattr.c @@ -820,7 +820,7 @@ ext2_xattr_cache_insert(struct mb_cache *cache, struct buffer_head *bh) __u32 hash = le32_to_cpu(HDR(bh)->h_hash); int error; - error = mb_cache_entry_create(cache, GFP_NOFS, hash, bh->b_blocknr); + error = mb_cache_entry_create(cache, GFP_NOFS, hash, bh->b_blocknr, 1); if (error) { if (error == -EBUSY) { ea_bdebug(bh, "already in cache (%d cache entries)", diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 09edc652a04e..00eb94a3a6dc 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -559,6 +559,8 @@ static void ext4_xattr_release_block(handle_t *handle, struct inode *inode, struct buffer_head *bh) { + struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode); + u32 hash, ref; int error = 0; BUFFER_TRACE(bh, "get_write_access"); @@ -567,23 +569,33 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode, goto out; lock_buffer(bh); - if (BHDR(bh)->h_refcount == cpu_to_le32(1)) { - __u32 hash = 
le32_to_cpu(BHDR(bh)->h_hash); - + hash = le32_to_cpu(BHDR(bh)->h_hash); + ref = le32_to_cpu(BHDR(bh)->h_refcount); + if (ref == 1) { ea_bdebug(bh, "refcount now=0; freeing"); /* * This must happen under buffer lock for * ext4_xattr_block_set() to reliably detect freed block */ - mb_cache_entry_delete_block(EXT4_GET_MB_CACHE(inode), hash, - bh->b_blocknr); + mb_cache_entry_delete_block(ext4_mb_cache, hash, bh->b_blocknr); get_bh(bh); unlock_buffer(bh); ext4_free_blocks(handle, inode, bh, 0, 1, EXT4_FREE_BLOCKS_METADATA | EXT4_FREE_BLOCKS_FORGET); } else { - le32_add_cpu(&BHDR(bh)->h_refcount, -1); + ref--; + BHDR(bh)->h_refcount = cpu_to_le32(ref); + if (ref == EXT4_XATTR_REFCOUNT_MAX - 1) { + struct mb_cache_entry *ce; + + ce = mb_cache_entry_get(ext4_mb_cache, hash, + bh->b_blocknr); + if (ce) { + ce->e_reusable = 1; + mb_cache_entry_put(ext4_mb_cache, ce); + } + } ext4_xattr_block_csum_set(inode, bh); /* @@ -887,6 +899,8 @@ inserted: if (new_bh == bs->bh) ea_bdebug(new_bh, "keeping"); else { + u32 ref; + /* The old block is released after updating the inode. */ error = dquot_alloc_block(inode, @@ -901,15 +915,18 @@ inserted: lock_buffer(new_bh); /* * We have to be careful about races with - * freeing or rehashing of xattr block. Once we - * hold buffer lock xattr block's state is - * stable so we can check whether the block got - * freed / rehashed or not. Since we unhash - * mbcache entry under buffer lock when freeing - * / rehashing xattr block, checking whether - * entry is still hashed is reliable. + * freeing, rehashing or adding references to + * xattr block. Once we hold buffer lock xattr + * block's state is stable so we can check + * whether the block got freed / rehashed or + * not. Since we unhash mbcache entry under + * buffer lock when freeing / rehashing xattr + * block, checking whether entry is still + * hashed is reliable. Same rules hold for + * e_reusable handling. 
*/ - if (hlist_bl_unhashed(&ce->e_hash_list)) { + if (hlist_bl_unhashed(&ce->e_hash_list) || + !ce->e_reusable) { /* * Undo everything and check mbcache * again. @@ -924,9 +941,12 @@ inserted: new_bh = NULL; goto inserted; } - le32_add_cpu(&BHDR(new_bh)->h_refcount, 1); + ref = le32_to_cpu(BHDR(new_bh)->h_refcount) + 1; + BHDR(new_bh)->h_refcount = cpu_to_le32(ref); + if (ref >= EXT4_XATTR_REFCOUNT_MAX) + ce->e_reusable = 0; ea_bdebug(new_bh, "reusing; refcount now=%d", - le32_to_cpu(BHDR(new_bh)->h_refcount)); + ref); ext4_xattr_block_csum_set(inode, new_bh); unlock_buffer(new_bh); error = ext4_handle_dirty_metadata(handle, @@ -1595,11 +1615,14 @@ cleanup: static void ext4_xattr_cache_insert(struct mb_cache *ext4_mb_cache, struct buffer_head *bh) { - __u32 hash = le32_to_cpu(BHDR(bh)->h_hash); + struct ext4_xattr_header *header = BHDR(bh); + __u32 hash = le32_to_cpu(header->h_hash); + int reusable = le32_to_cpu(header->h_refcount) < + EXT4_XATTR_REFCOUNT_MAX; int error; error = mb_cache_entry_create(ext4_mb_cache, GFP_NOFS, hash, - bh->b_blocknr); + bh->b_blocknr, reusable); if (error) { if (error == -EBUSY) ea_bdebug(bh, "already in cache"); @@ -1674,12 +1697,6 @@ ext4_xattr_cache_find(struct inode *inode, struct ext4_xattr_header *header, if (!bh) { EXT4_ERROR_INODE(inode, "block %lu read error", (unsigned long) ce->e_block); - } else if (le32_to_cpu(BHDR(bh)->h_refcount) >= - EXT4_XATTR_REFCOUNT_MAX) { - ea_idebug(inode, "block %lu refcount %d>=%d", - (unsigned long) ce->e_block, - le32_to_cpu(BHDR(bh)->h_refcount), - EXT4_XATTR_REFCOUNT_MAX); } else if (ext4_xattr_cmp(header, BHDR(bh)) == 0) { *pce = ce; return bh; diff --git a/fs/mbcache.c b/fs/mbcache.c index 903be151dcfe..eccda3a02de6 100644 --- a/fs/mbcache.c +++ b/fs/mbcache.c @@ -63,13 +63,14 @@ static inline struct hlist_bl_head *mb_cache_entry_head(struct mb_cache *cache, * @mask - gfp mask with which the entry should be allocated * @key - key of the entry * @block - block that contains data + * 
@reusable - is the block reusable by other inodes? * * Creates entry in @cache with key @key and records that data is stored in * block @block. The function returns -EBUSY if entry with the same key * and for the same block already exists in cache. Otherwise 0 is returned. */ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, - sector_t block) + sector_t block, bool reusable) { struct mb_cache_entry *entry, *dup; struct hlist_bl_node *dup_node; @@ -91,6 +92,7 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, atomic_set(&entry->e_refcnt, 1); entry->e_key = key; entry->e_block = block; + entry->e_reusable = reusable; head = mb_cache_entry_head(cache, key); hlist_bl_lock(head); hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) { @@ -137,7 +139,7 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache, while (node) { entry = hlist_bl_entry(node, struct mb_cache_entry, e_hash_list); - if (entry->e_key == key) { + if (entry->e_key == key && entry->e_reusable) { atomic_inc(&entry->e_refcnt); goto out; } @@ -184,10 +186,38 @@ struct mb_cache_entry *mb_cache_entry_find_next(struct mb_cache *cache, } EXPORT_SYMBOL(mb_cache_entry_find_next); +/* + * mb_cache_entry_get - get a cache entry by block number (and key) + * @cache - cache we work with + * @key - key of block number @block + * @block - block number + */ +struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key, + sector_t block) +{ + struct hlist_bl_node *node; + struct hlist_bl_head *head; + struct mb_cache_entry *entry; + + head = mb_cache_entry_head(cache, key); + hlist_bl_lock(head); + hlist_bl_for_each_entry(entry, node, head, e_hash_list) { + if (entry->e_key == key && entry->e_block == block) { + atomic_inc(&entry->e_refcnt); + goto out; + } + } + entry = NULL; +out: + hlist_bl_unlock(head); + return entry; +} +EXPORT_SYMBOL(mb_cache_entry_get); + /* mb_cache_entry_delete_block - remove information about block from cache * @cache 
- cache we work with - * @key - key of the entry to remove - * @block - block containing data for @key + * @key - key of block @block + * @block - block number * * Remove entry from cache @cache with key @key with data stored in @block. */ diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h index 607e6968542e..86c9a8b480c5 100644 --- a/include/linux/mbcache.h +++ b/include/linux/mbcache.h @@ -18,6 +18,7 @@ struct mb_cache_entry { /* Key in hash - stable during lifetime of the entry */ u32 e_key; u32 e_referenced:1; + u32 e_reusable:1; /* Block number of hashed block - stable during lifetime of the entry */ sector_t e_block; }; @@ -26,7 +27,7 @@ struct mb_cache *mb_cache_create(int bucket_bits); void mb_cache_destroy(struct mb_cache *cache); int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, - sector_t block); + sector_t block, bool reusable); void __mb_cache_entry_free(struct mb_cache_entry *entry); static inline int mb_cache_entry_put(struct mb_cache *cache, struct mb_cache_entry *entry) @@ -39,6 +40,8 @@ static inline int mb_cache_entry_put(struct mb_cache *cache, void mb_cache_entry_delete_block(struct mb_cache *cache, u32 key, sector_t block); +struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key, + sector_t block); struct mb_cache_entry *mb_cache_entry_find_first(struct mb_cache *cache, u32 key); struct mb_cache_entry *mb_cache_entry_find_next(struct mb_cache *cache,
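Illustration only, not part of the series: a minimal userspace sketch of the reusable-flag logic that the changelog of patch 11 describes. It assumes a singly linked hash chain and a tiny sharing limit; toy_xentry, toy_find, toy_add_ref, toy_drop_ref and TOY_REFCOUNT_MAX are hypothetical names, and the real limit (EXT4_XATTR_REFCOUNT_MAX, 1024 per the changelog) and the buffer locking are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical small limit so the effect is easy to see. */
#define TOY_REFCOUNT_MAX 3

struct toy_xentry {
	uint32_t key;			/* hash of the xattr block contents */
	uint64_t block;			/* block number */
	uint32_t refcount;		/* models the on-disk h_refcount */
	bool reusable;			/* models e_reusable */
	struct toy_xentry *next;	/* next entry in the same hash chain */
};

/* Return the first entry in @chain that matches @key and may still be
 * shared; entries that already reached the limit are skipped. */
struct toy_xentry *toy_find(struct toy_xentry *chain, uint32_t key)
{
	for (; chain; chain = chain->next)
		if (chain->key == key && chain->reusable)
			return chain;
	return NULL;
}

/* One more inode shares the block; clear the flag at the limit. */
void toy_add_ref(struct toy_xentry *e)
{
	e->refcount++;
	if (e->refcount >= TOY_REFCOUNT_MAX)
		e->reusable = false;
}

/* One inode stops sharing the block; the entry becomes reusable again
 * when the count drops from the limit to limit - 1. Freeing the block
 * when the last reference goes away is not modeled here. */
void toy_drop_ref(struct toy_xentry *e)
{
	assert(e->refcount > 0);
	e->refcount--;
	if (e->refcount == TOY_REFCOUNT_MAX - 1)
		e->reusable = true;
}
```

In this sketch a full block costs one flag test during lookup, instead of reading the block and comparing its refcount against the limit, which is the saving the changelog attributes to the flag.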