From patchwork Fri Sep 22 09:11:28 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shrirang Bagul X-Patchwork-Id: 817398 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3xz75W70yPz9t3h; Fri, 22 Sep 2017 19:11:43 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1dvJzo-0002qk-A1; Fri, 22 Sep 2017 09:11:40 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1dvJzm-0002pw-A6 for kernel-team@lists.ubuntu.com; Fri, 22 Sep 2017 09:11:38 +0000 Received: from 1.general.shrirang--bagul.uk.vpn ([10.172.198.4] helo=snb-ubuntu.taipei) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1dvJzk-0001Ln-Uh for kernel-team@lists.ubuntu.com; Fri, 22 Sep 2017 09:11:38 +0000 From: Shrirang Bagul To: kernel-team@lists.ubuntu.com Subject: [Xenial SRU][PATCH 1/3] mbcache2: reimplement mbcache Date: Fri, 22 Sep 2017 17:11:28 +0800 Message-Id: <20170922091130.15674-2-shrirang.bagul@canonical.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20170922091130.15674-1-shrirang.bagul@canonical.com> References: <20170922091130.15674-1-shrirang.bagul@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , 
MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team"

From: Jan Kara

The original mbcache was designed to have more features than what ext?
filesystems ended up using. It supported an entry being in multiple
hashes, it had home-grown rwlocking for each entry, and one cache could
cache entries from multiple filesystems. This genericity also resulted
in more complex locking, larger cache entries, and generally more code
complexity.

This is a reimplementation of the mbcache functionality to exactly fit
the purpose ext? filesystems use it for. Cache entries are now
considerably smaller (7 instead of 13 longs), the code is considerably
smaller as well (414 vs 913 lines of code), and IMO also simpler. The
new code is also much more lightweight.

I have measured the speed using an artificial xattr-bench benchmark,
which spawns P processes; each process sets an xattr for F different
files, and the value of the xattr is randomly chosen from a pool of V
values. Averages of runtimes for 5 runs for various combinations of
parameters are below. The first value in each cell is the old mbcache,
the second value is the new mbcache.

V=10
F\P   1             2             4             8             16            32            64
10    0.158,0.157   0.208,0.196   0.500,0.277   0.798,0.400   3.258,0.584   13.807,1.047  61.339,2.803
100   0.172,0.167   0.279,0.222   0.520,0.275   0.825,0.341   2.981,0.505   12.022,1.202  44.641,2.943
1000  0.185,0.174   0.297,0.239   0.445,0.283   0.767,0.340   2.329,0.480   6.342,1.198   16.440,3.888

V=100
F\P   1             2             4             8             16            32            64
10    0.162,0.153   0.200,0.186   0.362,0.257   0.671,0.496   1.433,0.943   3.801,1.345   7.938,2.501
100   0.153,0.160   0.221,0.199   0.404,0.264   0.945,0.379   1.556,0.485   3.761,1.156   7.901,2.484
1000  0.215,0.191   0.303,0.246   0.471,0.288   0.960,0.347   1.647,0.479   3.916,1.176   8.058,3.160

V=1000
F\P   1             2             4             8             16            32            64
10    0.151,0.129   0.210,0.163   0.326,0.245   0.685,0.521   1.284,0.859   3.087,2.251   6.451,4.801
100   0.154,0.153   0.211,0.191   0.276,0.282   0.687,0.506   1.202,0.877   3.259,1.954   8.738,2.887
1000  0.145,0.179   0.202,0.222   0.449,0.319   0.899,0.333   1.577,0.524   4.221,1.240   9.782,3.579

V=10000
F\P   1             2             4             8             16            32            64
10    0.161,0.154   0.198,0.190   0.296,0.256   0.662,0.480   1.192,0.818   2.989,2.200   6.362,4.746
100   0.176,0.174   0.236,0.203   0.326,0.255   0.696,0.511   1.183,0.855   4.205,3.444   19.510,17.760
1000  0.199,0.183   0.240,0.227   1.159,1.014   2.286,2.154   6.023,6.039   ---,10.933    ---,36.620

V=100000
F\P   1             2             4             8             16            32            64
10    0.171,0.162   0.204,0.198   0.285,0.230   0.692,0.500   1.225,0.881   2.990,2.243   6.379,4.771
100   0.151,0.171   0.220,0.210   0.295,0.255   0.720,0.518   1.226,0.844   3.423,2.831   19.234,17.544
1000  0.192,0.189   0.249,0.225   1.162,1.043   2.257,2.093   5.853,4.997   ---,10.399    ---,32.198

We see that the new code is faster in pretty much all the cases, and
starting from 4 processes there are significant gains, with the new code
resulting in up to 20 times shorter runtimes. Also, for large numbers of
cached entries, some values for the old code could not be measured
(shown as ---) because the kernel started hitting softlockups and died
before the test completed.

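As a reading aid for the diff, the semantics the new cache provides (per the header comment of fs/mbcache2.c: keys may repeat, but each key/block pair is unique) can be modelled in userspace C. This is only a sketch under stated assumptions, not the kernel implementation: every `model_*` name and `MODEL_BUCKET_BITS` are hypothetical, there is no locking, reference counting, LRU list or shrinker, and `model_hash()` is merely a stand-in for the kernel's `hash_32()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of the cache semantics; not kernel code. */
#define MODEL_BUCKET_BITS 4	/* arbitrary for the sketch */

struct model_entry {
	struct model_entry *next;	/* next entry in the same hash bucket */
	uint32_t key;			/* hash of the xattr block contents */
	uint64_t block;			/* block number, unique per entry */
};

struct model_cache {
	struct model_entry *buckets[1 << MODEL_BUCKET_BITS];
	unsigned long entry_count;
};

/* Stand-in for the kernel's hash_32(): multiplicative hash, top bits. */
static unsigned int model_hash(uint32_t key)
{
	return (uint32_t)(key * 0x61C88647u) >> (32 - MODEL_BUCKET_BITS);
}

/* Returns 0, -ENOMEM, or -EBUSY if this (key, block) pair is already cached. */
int model_entry_create(struct model_cache *cache, uint32_t key, uint64_t block)
{
	struct model_entry **head = &cache->buckets[model_hash(key)];
	struct model_entry *e;

	for (e = *head; e; e = e->next)
		if (e->key == key && e->block == block)
			return -EBUSY;
	e = malloc(sizeof(*e));
	if (!e)
		return -ENOMEM;
	e->key = key;
	e->block = block;
	e->next = *head;	/* new entries go to the head of the chain */
	*head = e;
	cache->entry_count++;
	return 0;
}

/* Return the first entry at or after e in its chain that has the given key. */
static struct model_entry *model_scan(struct model_entry *e, uint32_t key)
{
	while (e && e->key != key)
		e = e->next;
	return e;
}

struct model_entry *model_find_first(struct model_cache *cache, uint32_t key)
{
	return model_scan(cache->buckets[model_hash(key)], key);
}

struct model_entry *model_find_next(struct model_entry *entry)
{
	return model_scan(entry->next, entry->key);
}

/* Remove the entry identified by the (key, block) pair, if present. */
void model_delete_block(struct model_cache *cache, uint32_t key, uint64_t block)
{
	struct model_entry **pp = &cache->buckets[model_hash(key)];

	for (; *pp; pp = &(*pp)->next) {
		struct model_entry *e = *pp;

		if (e->key == key && e->block == block) {
			assert(cache->entry_count > 0);
			*pp = e->next;
			free(e);
			cache->entry_count--;
			return;
		}
	}
}

/* Free every entry; the cache struct itself is owned by the caller. */
void model_cache_clear(struct model_cache *cache)
{
	size_t i;

	for (i = 0; i < sizeof(cache->buckets) / sizeof(cache->buckets[0]); i++) {
		while (cache->buckets[i]) {
			struct model_entry *e = cache->buckets[i];

			cache->buckets[i] = e->next;
			free(e);
		}
	}
	cache->entry_count = 0;
}
```

A caller would zero-initialise a `struct model_cache`, insert pairs with `model_entry_create()`, and walk all candidates for one key with `model_find_first()` followed by `model_find_next()` until NULL. That is the same iteration shape the filesystem conversions in this series use, minus the reference handling the real API requires.
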
Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o This fixes CVE-2015-8952 (cherry picked from commit f9a61eb4e2471c56a63cd804c7474128138c38ac) Signed-off-by: Shrirang Bagul --- fs/Makefile | 2 +- fs/mbcache2.c | 359 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/mbcache2.h | 50 +++++++ 3 files changed, 410 insertions(+), 1 deletion(-) create mode 100644 fs/mbcache2.c create mode 100644 include/linux/mbcache2.h diff --git a/fs/Makefile b/fs/Makefile index a7c7f160371c..b7078cf5437c 100644 --- a/fs/Makefile +++ b/fs/Makefile @@ -41,7 +41,7 @@ obj-$(CONFIG_COMPAT_BINFMT_ELF) += compat_binfmt_elf.o obj-$(CONFIG_BINFMT_ELF_FDPIC) += binfmt_elf_fdpic.o obj-$(CONFIG_BINFMT_FLAT) += binfmt_flat.o -obj-$(CONFIG_FS_MBCACHE) += mbcache.o +obj-$(CONFIG_FS_MBCACHE) += mbcache.o mbcache2.o obj-$(CONFIG_FS_POSIX_ACL) += posix_acl.o obj-$(CONFIG_NFS_COMMON) += nfs_common/ obj-$(CONFIG_COREDUMP) += coredump.o diff --git a/fs/mbcache2.c b/fs/mbcache2.c new file mode 100644 index 000000000000..5c3e1a8c38f6 --- /dev/null +++ b/fs/mbcache2.c @@ -0,0 +1,359 @@ +#include +#include +#include +#include +#include +#include +#include + +/* + * Mbcache is a simple key-value store. Keys need not be unique, however + * key-value pairs are expected to be unique (we use this fact in + * mb2_cache_entry_delete_block()). + * + * Ext2 and ext4 use this cache for deduplication of extended attribute blocks. + * They use hash of a block contents as a key and block number as a value. + * That's why keys need not be unique (different xattr blocks may end up having + * the same hash). However block number always uniquely identifies a cache + * entry. + * + * We provide functions for creation and removal of entries, search by key, + * and a special "delete entry with given key-value pair" operation. Fixed + * size hash table is used for fast key lookups. 
+ */
+
+struct mb2_cache {
+	/* Hash table of entries */
+	struct hlist_bl_head *c_hash;
+	/* log2 of hash table size */
+	int c_bucket_bits;
+	/* Protects c_lru_list, c_entry_count */
+	spinlock_t c_lru_list_lock;
+	struct list_head c_lru_list;
+	/* Number of entries in cache */
+	unsigned long c_entry_count;
+	struct shrinker c_shrink;
+};
+
+static struct kmem_cache *mb2_entry_cache;
+
+/*
+ * mb2_cache_entry_create - create entry in cache
+ * @cache - cache where the entry should be created
+ * @mask - gfp mask with which the entry should be allocated
+ * @key - key of the entry
+ * @block - block that contains data
+ *
+ * Creates entry in @cache with key @key and records that data is stored in
+ * block @block. The function returns -EBUSY if entry with the same key
+ * and for the same block already exists in cache. Otherwise 0 is returned.
+ */
+int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key,
+			   sector_t block)
+{
+	struct mb2_cache_entry *entry, *dup;
+	struct hlist_bl_node *dup_node;
+	struct hlist_bl_head *head;
+
+	entry = kmem_cache_alloc(mb2_entry_cache, mask);
+	if (!entry)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&entry->e_lru_list);
+	/* One ref for hash, one ref returned */
+	atomic_set(&entry->e_refcnt, 1);
+	entry->e_key = key;
+	entry->e_block = block;
+	head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)];
+	entry->e_hash_list_head = head;
+	hlist_bl_lock(head);
+	hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) {
+		if (dup->e_key == key && dup->e_block == block) {
+			hlist_bl_unlock(head);
+			kmem_cache_free(mb2_entry_cache, entry);
+			return -EBUSY;
+		}
+	}
+	hlist_bl_add_head(&entry->e_hash_list, head);
+	hlist_bl_unlock(head);
+
+	spin_lock(&cache->c_lru_list_lock);
+	list_add_tail(&entry->e_lru_list, &cache->c_lru_list);
+	/* Grab ref for LRU list */
+	atomic_inc(&entry->e_refcnt);
+	cache->c_entry_count++;
+	spin_unlock(&cache->c_lru_list_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(mb2_cache_entry_create);
+
+void __mb2_cache_entry_free(struct mb2_cache_entry *entry)
+{
+	kmem_cache_free(mb2_entry_cache, entry);
+}
+EXPORT_SYMBOL(__mb2_cache_entry_free);
+
+static struct mb2_cache_entry *__entry_find(struct mb2_cache *cache,
+					    struct mb2_cache_entry *entry,
+					    u32 key)
+{
+	struct mb2_cache_entry *old_entry = entry;
+	struct hlist_bl_node *node;
+	struct hlist_bl_head *head;
+
+	if (entry)
+		head = entry->e_hash_list_head;
+	else
+		head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)];
+	hlist_bl_lock(head);
+	if (entry && !hlist_bl_unhashed(&entry->e_hash_list))
+		node = entry->e_hash_list.next;
+	else
+		node = hlist_bl_first(head);
+	while (node) {
+		entry = hlist_bl_entry(node, struct mb2_cache_entry,
+				       e_hash_list);
+		if (entry->e_key == key) {
+			atomic_inc(&entry->e_refcnt);
+			goto out;
+		}
+		node = node->next;
+	}
+	entry = NULL;
+out:
+	hlist_bl_unlock(head);
+	if (old_entry)
+		mb2_cache_entry_put(cache, old_entry);
+
+	return entry;
+}
+
+/*
+ * mb2_cache_entry_find_first - find the first entry in cache with given key
+ * @cache: cache where we should search
+ * @key: key to look for
+ *
+ * Search in @cache for entry with key @key. Grabs reference to the first
+ * entry found and returns the entry.
+ */
+struct mb2_cache_entry *mb2_cache_entry_find_first(struct mb2_cache *cache,
+						   u32 key)
+{
+	return __entry_find(cache, NULL, key);
+}
+EXPORT_SYMBOL(mb2_cache_entry_find_first);
+
+/*
+ * mb2_cache_entry_find_next - find next entry in cache with the same key
+ * @cache: cache where we should search
+ * @entry: entry to start search from
+ *
+ * Finds next entry in the hash chain which has the same key as @entry.
+ * If @entry is unhashed (which can happen when deletion of entry races
+ * with the search), finds the first entry in the hash chain. The function
+ * drops reference to @entry and returns with a reference to the found entry.
+ */
+struct mb2_cache_entry *mb2_cache_entry_find_next(struct mb2_cache *cache,
+						  struct mb2_cache_entry *entry)
+{
+	return __entry_find(cache, entry, entry->e_key);
+}
+EXPORT_SYMBOL(mb2_cache_entry_find_next);
+
+/* mb2_cache_entry_delete_block - remove information about block from cache
+ * @cache - cache we work with
+ * @key - key of the entry to remove
+ * @block - block containing data for @key
+ *
+ * Remove entry from cache @cache with key @key with data stored in @block.
+ */
+void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key,
+				  sector_t block)
+{
+	struct hlist_bl_node *node;
+	struct hlist_bl_head *head;
+	struct mb2_cache_entry *entry;
+
+	head = &cache->c_hash[hash_32(key, cache->c_bucket_bits)];
+	hlist_bl_lock(head);
+	hlist_bl_for_each_entry(entry, node, head, e_hash_list) {
+		if (entry->e_key == key && entry->e_block == block) {
+			/* We keep hash list reference to keep entry alive */
+			hlist_bl_del_init(&entry->e_hash_list);
+			hlist_bl_unlock(head);
+			spin_lock(&cache->c_lru_list_lock);
+			if (!list_empty(&entry->e_lru_list)) {
+				list_del_init(&entry->e_lru_list);
+				cache->c_entry_count--;
+				atomic_dec(&entry->e_refcnt);
+			}
+			spin_unlock(&cache->c_lru_list_lock);
+			mb2_cache_entry_put(cache, entry);
+			return;
+		}
+	}
+	hlist_bl_unlock(head);
+}
+EXPORT_SYMBOL(mb2_cache_entry_delete_block);
+
+/* mb2_cache_entry_touch - cache entry got used
+ * @cache - cache the entry belongs to
+ * @entry - entry that got used
+ *
+ * Move entry in lru list to reflect the fact that it was used.
+ */
+void mb2_cache_entry_touch(struct mb2_cache *cache,
+			   struct mb2_cache_entry *entry)
+{
+	spin_lock(&cache->c_lru_list_lock);
+	if (!list_empty(&entry->e_lru_list))
+		list_move_tail(&entry->e_lru_list, &cache->c_lru_list);
+	spin_unlock(&cache->c_lru_list_lock);
+}
+EXPORT_SYMBOL(mb2_cache_entry_touch);
+
+static unsigned long mb2_cache_count(struct shrinker *shrink,
+				     struct shrink_control *sc)
+{
+	struct mb2_cache *cache = container_of(shrink, struct mb2_cache,
+					       c_shrink);
+
+	return cache->c_entry_count;
+}
+
+/* Shrink number of entries in cache */
+static unsigned long mb2_cache_scan(struct shrinker *shrink,
+				    struct shrink_control *sc)
+{
+	int nr_to_scan = sc->nr_to_scan;
+	struct mb2_cache *cache = container_of(shrink, struct mb2_cache,
+					      c_shrink);
+	struct mb2_cache_entry *entry;
+	struct hlist_bl_head *head;
+	unsigned int shrunk = 0;
+
+	spin_lock(&cache->c_lru_list_lock);
+	while (nr_to_scan-- && !list_empty(&cache->c_lru_list)) {
+		entry = list_first_entry(&cache->c_lru_list,
+					 struct mb2_cache_entry, e_lru_list);
+		list_del_init(&entry->e_lru_list);
+		cache->c_entry_count--;
+		/*
+		 * We keep LRU list reference so that entry doesn't go away
+		 * from under us.
+		 */
+		spin_unlock(&cache->c_lru_list_lock);
+		head = entry->e_hash_list_head;
+		hlist_bl_lock(head);
+		if (!hlist_bl_unhashed(&entry->e_hash_list)) {
+			hlist_bl_del_init(&entry->e_hash_list);
+			atomic_dec(&entry->e_refcnt);
+		}
+		hlist_bl_unlock(head);
+		if (mb2_cache_entry_put(cache, entry))
+			shrunk++;
+		cond_resched();
+		spin_lock(&cache->c_lru_list_lock);
+	}
+	spin_unlock(&cache->c_lru_list_lock);
+
+	return shrunk;
+}
+
+/*
+ * mb2_cache_create - create cache
+ * @bucket_bits: log2 of the hash table size
+ *
+ * Create cache for keys with 2^bucket_bits hash entries.
+ */
+struct mb2_cache *mb2_cache_create(int bucket_bits)
+{
+	struct mb2_cache *cache;
+	int bucket_count = 1 << bucket_bits;
+	int i;
+
+	if (!try_module_get(THIS_MODULE))
+		return NULL;
+
+	cache = kzalloc(sizeof(struct mb2_cache), GFP_KERNEL);
+	if (!cache)
+		goto err_out;
+	cache->c_bucket_bits = bucket_bits;
+	INIT_LIST_HEAD(&cache->c_lru_list);
+	spin_lock_init(&cache->c_lru_list_lock);
+	cache->c_hash = kmalloc(bucket_count * sizeof(struct hlist_bl_head),
+				GFP_KERNEL);
+	if (!cache->c_hash) {
+		kfree(cache);
+		goto err_out;
+	}
+	for (i = 0; i < bucket_count; i++)
+		INIT_HLIST_BL_HEAD(&cache->c_hash[i]);
+
+	cache->c_shrink.count_objects = mb2_cache_count;
+	cache->c_shrink.scan_objects = mb2_cache_scan;
+	cache->c_shrink.seeks = DEFAULT_SEEKS;
+	register_shrinker(&cache->c_shrink);
+
+	return cache;
+
+err_out:
+	module_put(THIS_MODULE);
+	return NULL;
+}
+EXPORT_SYMBOL(mb2_cache_create);
+
+/*
+ * mb2_cache_destroy - destroy cache
+ * @cache: the cache to destroy
+ *
+ * Free all entries in cache and cache itself. Caller must make sure nobody
+ * (except shrinker) can reach @cache when calling this.
+ */
+void mb2_cache_destroy(struct mb2_cache *cache)
+{
+	struct mb2_cache_entry *entry, *next;
+
+	unregister_shrinker(&cache->c_shrink);
+
+	/*
+	 * We don't bother with any locking. Cache must not be used at this
+	 * point.
+	 */
+	list_for_each_entry_safe(entry, next, &cache->c_lru_list, e_lru_list) {
+		if (!hlist_bl_unhashed(&entry->e_hash_list)) {
+			hlist_bl_del_init(&entry->e_hash_list);
+			atomic_dec(&entry->e_refcnt);
+		} else
+			WARN_ON(1);
+		list_del(&entry->e_lru_list);
+		WARN_ON(atomic_read(&entry->e_refcnt) != 1);
+		mb2_cache_entry_put(cache, entry);
+	}
+	kfree(cache->c_hash);
+	kfree(cache);
+	module_put(THIS_MODULE);
+}
+EXPORT_SYMBOL(mb2_cache_destroy);
+
+static int __init mb2cache_init(void)
+{
+	mb2_entry_cache = kmem_cache_create("mbcache",
+				sizeof(struct mb2_cache_entry), 0,
+				SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD, NULL);
+	BUG_ON(!mb2_entry_cache);
+	return 0;
+}
+
+static void __exit mb2cache_exit(void)
+{
+	kmem_cache_destroy(mb2_entry_cache);
+}
+
+module_init(mb2cache_init)
+module_exit(mb2cache_exit)
+
+MODULE_AUTHOR("Jan Kara ");
+MODULE_DESCRIPTION("Meta block cache (for extended attributes)");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/mbcache2.h b/include/linux/mbcache2.h
new file mode 100644
index 000000000000..b6f160ff2533
--- /dev/null
+++ b/include/linux/mbcache2.h
@@ -0,0 +1,50 @@
+#ifndef _LINUX_MB2CACHE_H
+#define _LINUX_MB2CACHE_H
+
+#include
+#include
+#include
+#include
+#include
+
+struct mb2_cache;
+
+struct mb2_cache_entry {
+	/* LRU list - protected by cache->c_lru_list_lock */
+	struct list_head e_lru_list;
+	/* Hash table list - protected by bitlock in e_hash_list_head */
+	struct hlist_bl_node e_hash_list;
+	atomic_t e_refcnt;
+	/* Key in hash - stable during lifetime of the entry */
+	u32 e_key;
+	/* Block number of hashed block - stable during lifetime of the entry */
+	sector_t e_block;
+	/* Head of hash list (for list bit lock) - stable */
+	struct hlist_bl_head *e_hash_list_head;
+};
+
+struct mb2_cache *mb2_cache_create(int bucket_bits);
+void mb2_cache_destroy(struct mb2_cache *cache);
+
+int mb2_cache_entry_create(struct mb2_cache *cache, gfp_t mask, u32 key,
+			   sector_t block);
+void __mb2_cache_entry_free(struct mb2_cache_entry *entry);
+static inline int mb2_cache_entry_put(struct mb2_cache *cache,
+				      struct mb2_cache_entry *entry)
+{
+	if (!atomic_dec_and_test(&entry->e_refcnt))
+		return 0;
+	__mb2_cache_entry_free(entry);
+	return 1;
+}
+
+void mb2_cache_entry_delete_block(struct mb2_cache *cache, u32 key,
+				  sector_t block);
+struct mb2_cache_entry *mb2_cache_entry_find_first(struct mb2_cache *cache,
+						   u32 key);
+struct mb2_cache_entry *mb2_cache_entry_find_next(struct mb2_cache *cache,
+						  struct mb2_cache_entry *entry);
+void mb2_cache_entry_touch(struct mb2_cache *cache,
+			   struct mb2_cache_entry *entry);
+
+#endif	/* _LINUX_MB2CACHE_H */

From patchwork Fri Sep 22 09:11:29 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shrirang Bagul X-Patchwork-Id: 817400 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) by ozlabs.org (Postfix) with ESMTP id 3xz75c04wDz9t3h; Fri, 22 Sep 2017 19:11:48 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1dvJzs-0002tn-1p; Fri, 22 Sep 2017 09:11:44 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2) (envelope-from ) id 1dvJzn-0002qX-OD for kernel-team@lists.ubuntu.com; Fri, 22 Sep 2017 09:11:39 +0000 Received: from 1.general.shrirang--bagul.uk.vpn ([10.172.198.4] helo=snb-ubuntu.taipei) by youngberry.canonical.com with esmtpsa (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76) (envelope-from ) id 1dvJzm-0001Ln-Uu for 
kernel-team@lists.ubuntu.com; Fri, 22 Sep 2017 09:11:39 +0000 From: Shrirang Bagul To: kernel-team@lists.ubuntu.com Subject: [Xenial SRU][PATCH 2/3] ext2: convert to mbcache2 Date: Fri, 22 Sep 2017 17:11:29 +0800 Message-Id: <20170922091130.15674-3-shrirang.bagul@canonical.com> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20170922091130.15674-1-shrirang.bagul@canonical.com> References: <20170922091130.15674-1-shrirang.bagul@canonical.com> X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Jan Kara The conversion is generally straightforward. We convert filesystem from a global cache to per-fs one. Similarly to ext4 the tricky part is that xattr block corresponding to found mbcache entry can get freed before we get buffer lock for that block. So we have to check whether the entry is still valid after getting the buffer lock. Signed-off-by: Jan Kara Signed-off-by: Theodore Ts'o This fixes CVE-2015-8952 (cherry picked from commit be0726d33cb8f411945884664924bed3cb8c70ee) Signed-off-by: Shrirang Bagul --- fs/ext2/ext2.h | 3 ++ fs/ext2/super.c | 25 ++++++---- fs/ext2/xattr.c | 143 ++++++++++++++++++++++++++------------------------------ fs/ext2/xattr.h | 21 ++------- 4 files changed, 92 insertions(+), 100 deletions(-) diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index 4c69c94cafd8..f98ce7e60a0f 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -61,6 +61,8 @@ struct ext2_block_alloc_info { #define rsv_start rsv_window._rsv_start #define rsv_end rsv_window._rsv_end +struct mb2_cache; + /* * second extended-fs super-block data in memory */ @@ -111,6 +113,7 @@ struct ext2_sb_info { * of the mount options. 
*/ spinlock_t s_lock; + struct mb2_cache *s_mb_cache; }; static inline spinlock_t * diff --git a/fs/ext2/super.c b/fs/ext2/super.c index 748d35afc902..111a31761ffa 100644 --- a/fs/ext2/super.c +++ b/fs/ext2/super.c @@ -131,7 +131,10 @@ static void ext2_put_super (struct super_block * sb) dquot_disable(sb, -1, DQUOT_USAGE_ENABLED | DQUOT_LIMITS_ENABLED); - ext2_xattr_put_super(sb); + if (sbi->s_mb_cache) { + ext2_xattr_destroy_cache(sbi->s_mb_cache); + sbi->s_mb_cache = NULL; + } if (!(sb->s_flags & MS_RDONLY)) { struct ext2_super_block *es = sbi->s_es; @@ -1104,6 +1107,14 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent) ext2_msg(sb, KERN_ERR, "error: insufficient memory"); goto failed_mount3; } + +#ifdef CONFIG_EXT2_FS_XATTR + sbi->s_mb_cache = ext2_xattr_create_cache(); + if (!sbi->s_mb_cache) { + ext2_msg(sb, KERN_ERR, "Failed to create an mb_cache"); + goto failed_mount3; + } +#endif /* * set up enough so that it can read an inode */ @@ -1149,6 +1160,8 @@ cantfind_ext2: sb->s_id); goto failed_mount; failed_mount3: + if (sbi->s_mb_cache) + ext2_xattr_destroy_cache(sbi->s_mb_cache); percpu_counter_destroy(&sbi->s_freeblocks_counter); percpu_counter_destroy(&sbi->s_freeinodes_counter); percpu_counter_destroy(&sbi->s_dirs_counter); @@ -1555,20 +1568,17 @@ MODULE_ALIAS_FS("ext2"); static int __init init_ext2_fs(void) { - int err = init_ext2_xattr(); - if (err) - return err; + int err; + err = init_inodecache(); if (err) - goto out1; + return err; err = register_filesystem(&ext2_fs_type); if (err) goto out; return 0; out: destroy_inodecache(); -out1: - exit_ext2_xattr(); return err; } @@ -1576,7 +1586,6 @@ static void __exit exit_ext2_fs(void) { unregister_filesystem(&ext2_fs_type); destroy_inodecache(); - exit_ext2_xattr(); } MODULE_AUTHOR("Remy Card and others"); diff --git a/fs/ext2/xattr.c b/fs/ext2/xattr.c index fa70848afa8f..24736c8b3d51 100644 --- a/fs/ext2/xattr.c +++ b/fs/ext2/xattr.c @@ -56,7 +56,7 @@ #include #include #include 
-#include +#include #include #include #include @@ -92,14 +92,12 @@ static int ext2_xattr_set2(struct inode *, struct buffer_head *, struct ext2_xattr_header *); -static int ext2_xattr_cache_insert(struct buffer_head *); +static int ext2_xattr_cache_insert(struct mb2_cache *, struct buffer_head *); static struct buffer_head *ext2_xattr_cache_find(struct inode *, struct ext2_xattr_header *); static void ext2_xattr_rehash(struct ext2_xattr_header *, struct ext2_xattr_entry *); -static struct mb_cache *ext2_xattr_cache; - static const struct xattr_handler *ext2_xattr_handler_map[] = { [EXT2_XATTR_INDEX_USER] = &ext2_xattr_user_handler, #ifdef CONFIG_EXT2_FS_POSIX_ACL @@ -154,6 +152,7 @@ ext2_xattr_get(struct inode *inode, int name_index, const char *name, size_t name_len, size; char *end; int error; + struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld", name_index, name, buffer, (long)buffer_size); @@ -198,7 +197,7 @@ bad_block: ext2_error(inode->i_sb, "ext2_xattr_get", goto found; entry = next; } - if (ext2_xattr_cache_insert(bh)) + if (ext2_xattr_cache_insert(ext2_mb_cache, bh)) ea_idebug(inode, "cache insert failed"); error = -ENODATA; goto cleanup; @@ -211,7 +210,7 @@ found: le16_to_cpu(entry->e_value_offs) + size > inode->i_sb->s_blocksize) goto bad_block; - if (ext2_xattr_cache_insert(bh)) + if (ext2_xattr_cache_insert(ext2_mb_cache, bh)) ea_idebug(inode, "cache insert failed"); if (buffer) { error = -ERANGE; @@ -249,6 +248,7 @@ ext2_xattr_list(struct dentry *dentry, char *buffer, size_t buffer_size) char *end; size_t rest = buffer_size; int error; + struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; ea_idebug(inode, "buffer=%p, buffer_size=%ld", buffer, (long)buffer_size); @@ -283,7 +283,7 @@ bad_block: ext2_error(inode->i_sb, "ext2_xattr_list", goto bad_block; entry = next; } - if (ext2_xattr_cache_insert(bh)) + if (ext2_xattr_cache_insert(ext2_mb_cache, bh)) 
ea_idebug(inode, "cache insert failed"); /* list the attribute names */ @@ -480,22 +480,23 @@ bad_block: ext2_error(sb, "ext2_xattr_set", /* Here we know that we can set the new attribute. */ if (header) { - struct mb_cache_entry *ce; - /* assert(header == HDR(bh)); */ - ce = mb_cache_entry_get(ext2_xattr_cache, bh->b_bdev, - bh->b_blocknr); lock_buffer(bh); if (header->h_refcount == cpu_to_le32(1)) { + __u32 hash = le32_to_cpu(header->h_hash); + ea_bdebug(bh, "modifying in-place"); - if (ce) - mb_cache_entry_free(ce); + /* + * This must happen under buffer lock for + * ext2_xattr_set2() to reliably detect modified block + */ + mb2_cache_entry_delete_block(EXT2_SB(sb)->s_mb_cache, + hash, bh->b_blocknr); + /* keep the buffer locked while modifying it. */ } else { int offset; - if (ce) - mb_cache_entry_release(ce); unlock_buffer(bh); ea_bdebug(bh, "cloning"); header = kmalloc(bh->b_size, GFP_KERNEL); @@ -623,6 +624,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, struct super_block *sb = inode->i_sb; struct buffer_head *new_bh = NULL; int error; + struct mb2_cache *ext2_mb_cache = EXT2_SB(sb)->s_mb_cache; if (header) { new_bh = ext2_xattr_cache_find(inode, header); @@ -650,7 +652,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, don't need to change the reference count. 
*/ new_bh = old_bh; get_bh(new_bh); - ext2_xattr_cache_insert(new_bh); + ext2_xattr_cache_insert(ext2_mb_cache, new_bh); } else { /* We need to allocate a new block */ ext2_fsblk_t goal = ext2_group_first_block_no(sb, @@ -671,7 +673,7 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, memcpy(new_bh->b_data, header, new_bh->b_size); set_buffer_uptodate(new_bh); unlock_buffer(new_bh); - ext2_xattr_cache_insert(new_bh); + ext2_xattr_cache_insert(ext2_mb_cache, new_bh); ext2_xattr_update_super_block(sb); } @@ -704,19 +706,21 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, error = 0; if (old_bh && old_bh != new_bh) { - struct mb_cache_entry *ce; - /* * If there was an old block and we are no longer using it, * release the old block. */ - ce = mb_cache_entry_get(ext2_xattr_cache, old_bh->b_bdev, - old_bh->b_blocknr); lock_buffer(old_bh); if (HDR(old_bh)->h_refcount == cpu_to_le32(1)) { + __u32 hash = le32_to_cpu(HDR(old_bh)->h_hash); + + /* + * This must happen under buffer lock for + * ext2_xattr_set2() to reliably detect freed block + */ + mb2_cache_entry_delete_block(ext2_mb_cache, + hash, old_bh->b_blocknr); /* Free the old block. */ - if (ce) - mb_cache_entry_free(ce); ea_bdebug(old_bh, "freeing"); ext2_free_blocks(inode, old_bh->b_blocknr, 1); mark_inode_dirty(inode); @@ -727,8 +731,6 @@ ext2_xattr_set2(struct inode *inode, struct buffer_head *old_bh, } else { /* Decrement the refcount only. 
*/ le32_add_cpu(&HDR(old_bh)->h_refcount, -1); - if (ce) - mb_cache_entry_release(ce); dquot_free_block_nodirty(inode, 1); mark_inode_dirty(inode); mark_buffer_dirty(old_bh); @@ -754,7 +756,6 @@ void ext2_xattr_delete_inode(struct inode *inode) { struct buffer_head *bh = NULL; - struct mb_cache_entry *ce; down_write(&EXT2_I(inode)->xattr_sem); if (!EXT2_I(inode)->i_file_acl) @@ -774,19 +775,22 @@ ext2_xattr_delete_inode(struct inode *inode) EXT2_I(inode)->i_file_acl); goto cleanup; } - ce = mb_cache_entry_get(ext2_xattr_cache, bh->b_bdev, bh->b_blocknr); lock_buffer(bh); if (HDR(bh)->h_refcount == cpu_to_le32(1)) { - if (ce) - mb_cache_entry_free(ce); + __u32 hash = le32_to_cpu(HDR(bh)->h_hash); + + /* + * This must happen under buffer lock for ext2_xattr_set2() to + * reliably detect freed block + */ + mb2_cache_entry_delete_block(EXT2_SB(inode->i_sb)->s_mb_cache, + hash, bh->b_blocknr); ext2_free_blocks(inode, EXT2_I(inode)->i_file_acl, 1); get_bh(bh); bforget(bh); unlock_buffer(bh); } else { le32_add_cpu(&HDR(bh)->h_refcount, -1); - if (ce) - mb_cache_entry_release(ce); ea_bdebug(bh, "refcount now=%d", le32_to_cpu(HDR(bh)->h_refcount)); unlock_buffer(bh); @@ -803,18 +807,6 @@ cleanup: } /* - * ext2_xattr_put_super() - * - * This is called when a file system is unmounted. - */ -void -ext2_xattr_put_super(struct super_block *sb) -{ - mb_cache_shrink(sb->s_bdev); -} - - -/* * ext2_xattr_cache_insert() * * Create a new entry in the extended attribute cache, and insert @@ -823,28 +815,20 @@ ext2_xattr_put_super(struct super_block *sb) * Returns 0, or a negative error number on failure. 
*/ static int -ext2_xattr_cache_insert(struct buffer_head *bh) +ext2_xattr_cache_insert(struct mb2_cache *cache, struct buffer_head *bh) { __u32 hash = le32_to_cpu(HDR(bh)->h_hash); - struct mb_cache_entry *ce; int error; - ce = mb_cache_entry_alloc(ext2_xattr_cache, GFP_NOFS); - if (!ce) - return -ENOMEM; - error = mb_cache_entry_insert(ce, bh->b_bdev, bh->b_blocknr, hash); + error = mb2_cache_entry_create(cache, GFP_NOFS, hash, bh->b_blocknr); if (error) { - mb_cache_entry_free(ce); if (error == -EBUSY) { ea_bdebug(bh, "already in cache (%d cache entries)", atomic_read(&ext2_xattr_cache->c_entry_count)); error = 0; } - } else { - ea_bdebug(bh, "inserting [%x] (%d cache entries)", (int)hash, - atomic_read(&ext2_xattr_cache->c_entry_count)); - mb_cache_entry_release(ce); - } + } else + ea_bdebug(bh, "inserting [%x]", (int)hash); return error; } @@ -900,23 +884,17 @@ static struct buffer_head * ext2_xattr_cache_find(struct inode *inode, struct ext2_xattr_header *header) { __u32 hash = le32_to_cpu(header->h_hash); - struct mb_cache_entry *ce; + struct mb2_cache_entry *ce; + struct mb2_cache *ext2_mb_cache = EXT2_SB(inode->i_sb)->s_mb_cache; if (!header->h_hash) return NULL; /* never share */ ea_idebug(inode, "looking for cached blocks [%x]", (int)hash); again: - ce = mb_cache_entry_find_first(ext2_xattr_cache, inode->i_sb->s_bdev, - hash); + ce = mb2_cache_entry_find_first(ext2_mb_cache, hash); while (ce) { struct buffer_head *bh; - if (IS_ERR(ce)) { - if (PTR_ERR(ce) == -EAGAIN) - goto again; - break; - } - bh = sb_bread(inode->i_sb, ce->e_block); if (!bh) { ext2_error(inode->i_sb, "ext2_xattr_cache_find", @@ -924,7 +902,21 @@ again: inode->i_ino, (unsigned long) ce->e_block); } else { lock_buffer(bh); - if (le32_to_cpu(HDR(bh)->h_refcount) > + /* + * We have to be careful about races with freeing or + * rehashing of xattr block. Once we hold buffer lock + * xattr block's state is stable so we can check + * whether the block got freed / rehashed or not. 
+			 * Since we unhash mbcache entry under buffer lock when
+			 * freeing / rehashing xattr block, checking whether
+			 * entry is still hashed is reliable.
+			 */
+			if (hlist_bl_unhashed(&ce->e_hash_list)) {
+				mb2_cache_entry_put(ext2_mb_cache, ce);
+				unlock_buffer(bh);
+				brelse(bh);
+				goto again;
+			} else if (le32_to_cpu(HDR(bh)->h_refcount) >
 				   EXT2_XATTR_REFCOUNT_MAX) {
 				ea_idebug(inode, "block %ld refcount %d>%d",
 					  (unsigned long) ce->e_block,
@@ -933,13 +925,14 @@ again:
 			} else if (!ext2_xattr_cmp(header, HDR(bh))) {
 				ea_bdebug(bh, "b_count=%d",
 					  atomic_read(&(bh->b_count)));
-				mb_cache_entry_release(ce);
+				mb2_cache_entry_touch(ext2_mb_cache, ce);
+				mb2_cache_entry_put(ext2_mb_cache, ce);
 				return bh;
 			}
 			unlock_buffer(bh);
 			brelse(bh);
 		}
-		ce = mb_cache_entry_find_next(ce, inode->i_sb->s_bdev, hash);
+		ce = mb2_cache_entry_find_next(ext2_mb_cache, ce);
 	}
 	return NULL;
 }
@@ -1012,17 +1005,15 @@ static void ext2_xattr_rehash(struct ext2_xattr_header *header,
 
 #undef BLOCK_HASH_SHIFT
 
-int __init
-init_ext2_xattr(void)
+#define HASH_BUCKET_BITS 10
+
+struct mb2_cache *ext2_xattr_create_cache(void)
 {
-	ext2_xattr_cache = mb_cache_create("ext2_xattr", 6);
-	if (!ext2_xattr_cache)
-		return -ENOMEM;
-	return 0;
+	return mb2_cache_create(HASH_BUCKET_BITS);
 }
 
-void
-exit_ext2_xattr(void)
+void ext2_xattr_destroy_cache(struct mb2_cache *cache)
 {
-	mb_cache_destroy(ext2_xattr_cache);
+	if (cache)
+		mb2_cache_destroy(cache);
 }
diff --git a/fs/ext2/xattr.h b/fs/ext2/xattr.h
index 60edf298644e..6ea38aa9563a 100644
--- a/fs/ext2/xattr.h
+++ b/fs/ext2/xattr.h
@@ -53,6 +53,8 @@ struct ext2_xattr_entry {
 #define EXT2_XATTR_SIZE(size) \
 	(((size) + EXT2_XATTR_ROUND) & ~EXT2_XATTR_ROUND)
 
+struct mb2_cache;
+
 # ifdef CONFIG_EXT2_FS_XATTR
 
 extern const struct xattr_handler ext2_xattr_user_handler;
@@ -65,10 +67,9 @@ extern int ext2_xattr_get(struct inode *, int, const char *, void *, size_t);
 extern int ext2_xattr_set(struct inode *, int, const char *, const void *, size_t, int);
 
 extern void ext2_xattr_delete_inode(struct inode *);
-extern void ext2_xattr_put_super(struct super_block *);
 
-extern int init_ext2_xattr(void);
-extern void exit_ext2_xattr(void);
+extern struct mb2_cache *ext2_xattr_create_cache(void);
+extern void ext2_xattr_destroy_cache(struct mb2_cache *cache);
 
 extern const struct xattr_handler *ext2_xattr_handlers[];
 
@@ -93,19 +94,7 @@ ext2_xattr_delete_inode(struct inode *inode)
 {
 }
 
-static inline void
-ext2_xattr_put_super(struct super_block *sb)
-{
-}
-
-static inline int
-init_ext2_xattr(void)
-{
-	return 0;
-}
-
-static inline void
-exit_ext2_xattr(void)
+static inline void ext2_xattr_destroy_cache(struct mb2_cache *cache)
 {
 }
 

From patchwork Fri Sep 22 09:11:30 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Shrirang Bagul
X-Patchwork-Id: 817399
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org;
	spf=none (mailfrom) smtp.mailfrom=lists.ubuntu.com
	(client-ip=91.189.94.19; helo=huckleberry.canonical.com;
	envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=)
Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19])
	by ozlabs.org (Postfix) with ESMTP id 3xz75b69sBz9t16;
	Fri, 22 Sep 2017 19:11:47 +1000 (AEST)
Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com)
	by huckleberry.canonical.com with esmtp (Exim 4.86_2)
	(envelope-from )
	id 1dvJzr-0002tF-Nh; Fri, 22 Sep 2017 09:11:43 +0000
Received: from youngberry.canonical.com ([91.189.89.112])
	by huckleberry.canonical.com with esmtps
	(TLS1.0:DHE_RSA_AES_128_CBC_SHA1:128) (Exim 4.86_2)
	(envelope-from )
	id 1dvJzp-0002rQ-5t for kernel-team@lists.ubuntu.com;
	Fri, 22 Sep 2017 09:11:41 +0000
Received: from 1.general.shrirang--bagul.uk.vpn ([10.172.198.4] helo=snb-ubuntu.taipei)
	by youngberry.canonical.com with esmtpsa
	(TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.76)
	(envelope-from )
	id 1dvJzo-0001Ln-Cl for kernel-team@lists.ubuntu.com;
	Fri, 22 Sep 2017 09:11:40 +0000
From: Shrirang Bagul
To: kernel-team@lists.ubuntu.com
Subject: [Xenial SRU][PATCH 3/3] ext4: convert to mbcache2
Date: Fri, 22 Sep 2017 17:11:30 +0800
Message-Id: <20170922091130.15674-4-shrirang.bagul@canonical.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20170922091130.15674-1-shrirang.bagul@canonical.com>
References: <20170922091130.15674-1-shrirang.bagul@canonical.com>
X-BeenThere: kernel-team@lists.ubuntu.com
X-Mailman-Version: 2.1.20
Precedence: list
List-Id: Kernel team discussions
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
MIME-Version: 1.0
Errors-To: kernel-team-bounces@lists.ubuntu.com
Sender: "kernel-team"

From: Jan Kara

The conversion is generally straightforward. The only tricky part is
that xattr block corresponding to found mbcache entry can get freed
before we get buffer lock for that block. So we have to check whether
the entry is still valid after getting buffer lock.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

This fixes CVE-2015-8952
(backported from commit 82939d7999dfc1f1998c4b1c12e2f19edbdff272)
Signed-off-by: Shrirang Bagul
---
 fs/ext4/ext4.h  |   2 +-
 fs/ext4/super.c |   7 ++-
 fs/ext4/xattr.c | 136 ++++++++++++++++++++++++++++----------------------------
 fs/ext4/xattr.h |   5 +--
 4 files changed, 75 insertions(+), 75 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 331f82510d15..25e36e527b0e 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1441,7 +1441,7 @@ struct ext4_sb_info {
 	struct list_head s_es_list;	/* List of inodes with reclaimable extents */
 	long s_es_nr_inode;
 	struct ext4_es_stats s_es_stats;
-	struct mb_cache *s_mb_cache;
+	struct mb2_cache *s_mb_cache;
 	spinlock_t s_es_lock ____cacheline_aligned_in_smp;
 
 	/* Ratelimit ext4 messages. */
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 029296aa42f3..32a0c1ac0e3d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -821,7 +821,6 @@ static void ext4_put_super(struct super_block *sb)
 	ext4_release_system_zone(sb);
 	ext4_mb_release(sb);
 	ext4_ext_release(sb);
-	ext4_xattr_put_super(sb);
 
 	if (!(sb->s_flags & MS_RDONLY) && !aborted) {
 		ext4_clear_feature_journal_needs_recovery(sb);
@@ -3868,7 +3867,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 
 no_journal:
 	if (ext4_mballoc_ready) {
-		sbi->s_mb_cache = ext4_xattr_create_cache(sb->s_id);
+		sbi->s_mb_cache = ext4_xattr_create_cache();
 		if (!sbi->s_mb_cache) {
 			ext4_msg(sb, KERN_ERR, "Failed to create an mb_cache");
 			goto failed_mount_wq;
@@ -4100,6 +4099,10 @@ failed_mount4:
 	if (EXT4_SB(sb)->rsv_conversion_wq)
 		destroy_workqueue(EXT4_SB(sb)->rsv_conversion_wq);
 failed_mount_wq:
+	if (sbi->s_mb_cache) {
+		ext4_xattr_destroy_cache(sbi->s_mb_cache);
+		sbi->s_mb_cache = NULL;
+	}
 	if (sbi->s_journal) {
 		jbd2_journal_destroy(sbi->s_journal);
 		sbi->s_journal = NULL;
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 571166eb1dfc..ba6b8ac5f462 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -53,7 +53,7 @@
 #include
 #include
 #include
-#include
+#include
 #include
 #include "ext4_jbd2.h"
 #include "ext4.h"
@@ -80,10 +80,10 @@
 # define ea_bdebug(bh, fmt, ...)	no_printk(fmt, ##__VA_ARGS__)
 #endif
 
-static void ext4_xattr_cache_insert(struct mb_cache *, struct buffer_head *);
+static void ext4_xattr_cache_insert(struct mb2_cache *, struct buffer_head *);
 static struct buffer_head *ext4_xattr_cache_find(struct inode *,
 						 struct ext4_xattr_header *,
-						 struct mb_cache_entry **);
+						 struct mb2_cache_entry **);
 static void ext4_xattr_rehash(struct ext4_xattr_header *,
 			      struct ext4_xattr_entry *);
 static int ext4_xattr_list(struct dentry *dentry, char *buffer,
@@ -295,7 +295,7 @@ ext4_xattr_block_get(struct inode *inode, int name_index, const char *name,
 	struct ext4_xattr_entry *entry;
 	size_t size;
 	int error;
-	struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode);
+	struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode);
 
 	ea_idebug(inode, "name=%d.%s, buffer=%p, buffer_size=%ld",
 		  name_index, name, buffer, (long)buffer_size);
@@ -442,7 +442,7 @@ ext4_xattr_block_list(struct dentry *dentry, char *buffer, size_t buffer_size)
 	struct inode *inode = d_inode(dentry);
 	struct buffer_head *bh = NULL;
 	int error;
-	struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode);
+	struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode);
 
 	ea_idebug(inode, "buffer=%p, buffer_size=%ld",
 		  buffer, (long)buffer_size);
@@ -559,11 +559,8 @@ static void
 ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 			 struct buffer_head *bh)
 {
-	struct mb_cache_entry *ce = NULL;
 	int error = 0;
-	struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode);
 
-	ce = mb_cache_entry_get(ext4_mb_cache, bh->b_bdev, bh->b_blocknr);
 	BUFFER_TRACE(bh, "get_write_access");
 	error = ext4_journal_get_write_access(handle, bh);
 	if (error)
@@ -571,9 +568,15 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 
 	lock_buffer(bh);
 	if (BHDR(bh)->h_refcount == cpu_to_le32(1)) {
+		__u32 hash = le32_to_cpu(BHDR(bh)->h_hash);
+
 		ea_bdebug(bh, "refcount now=0; freeing");
-		if (ce)
-			mb_cache_entry_free(ce);
+		/*
+		 * This must happen under buffer lock for
+		 * ext4_xattr_block_set() to reliably detect freed block
+		 */
+		mb2_cache_entry_delete_block(EXT4_GET_MB_CACHE(inode), hash,
+					     bh->b_blocknr);
 		get_bh(bh);
 		unlock_buffer(bh);
 		ext4_free_blocks(handle, inode, bh, 0, 1,
@@ -581,8 +584,6 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 				 EXT4_FREE_BLOCKS_FORGET);
 	} else {
 		le32_add_cpu(&BHDR(bh)->h_refcount, -1);
-		if (ce)
-			mb_cache_entry_release(ce);
 		ext4_xattr_block_csum_set(inode, bh);
 
 		/*
@@ -795,17 +796,15 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 	struct super_block *sb = inode->i_sb;
 	struct buffer_head *new_bh = NULL;
 	struct ext4_xattr_search *s = &bs->s;
-	struct mb_cache_entry *ce = NULL;
+	struct mb2_cache_entry *ce = NULL;
 	int error = 0;
-	struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode);
+	struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode);
 
 #define header(x) ((struct ext4_xattr_header *)(x))
 
 	if (i->value && i->value_len > sb->s_blocksize)
 		return -ENOSPC;
 	if (s->base) {
-		ce = mb_cache_entry_get(ext4_mb_cache, bs->bh->b_bdev,
-					bs->bh->b_blocknr);
 		BUFFER_TRACE(bs->bh, "get_write_access");
 		error = ext4_journal_get_write_access(handle, bs->bh);
 		if (error)
@@ -813,10 +812,15 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 		lock_buffer(bs->bh);
 
 		if (header(s->base)->h_refcount == cpu_to_le32(1)) {
-			if (ce) {
-				mb_cache_entry_free(ce);
-				ce = NULL;
-			}
+			__u32 hash = le32_to_cpu(BHDR(bs->bh)->h_hash);
+
+			/*
+			 * This must happen under buffer lock for
+			 * ext4_xattr_block_set() to reliably detect modified
+			 * block
+			 */
+			mb2_cache_entry_delete_block(ext4_mb_cache, hash,
+						     bs->bh->b_blocknr);
 			ea_bdebug(bs->bh, "modifying in-place");
 			error = ext4_xattr_set_entry(i, s);
 			if (!error) {
@@ -841,10 +845,6 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 			int offset = (char *)s->here - bs->bh->b_data;
 
 			unlock_buffer(bs->bh);
-			if (ce) {
-				mb_cache_entry_release(ce);
-				ce = NULL;
-			}
 			ea_bdebug(bs->bh, "cloning");
 			s->base = kmalloc(bs->bh->b_size, GFP_NOFS);
 			error = -ENOMEM;
@@ -899,6 +899,31 @@ inserted:
 				if (error)
 					goto cleanup_dquot;
 				lock_buffer(new_bh);
+				/*
+				 * We have to be careful about races with
+				 * freeing or rehashing of xattr block. Once we
+				 * hold buffer lock xattr block's state is
+				 * stable so we can check whether the block got
+				 * freed / rehashed or not. Since we unhash
+				 * mbcache entry under buffer lock when freeing
+				 * / rehashing xattr block, checking whether
+				 * entry is still hashed is reliable.
+				 */
+				if (hlist_bl_unhashed(&ce->e_hash_list)) {
+					/*
+					 * Undo everything and check mbcache
+					 * again.
+					 */
+					unlock_buffer(new_bh);
+					dquot_free_block(inode,
+							 EXT4_C2B(EXT4_SB(sb),
+								  1));
+					brelse(new_bh);
+					mb2_cache_entry_put(ext4_mb_cache, ce);
+					ce = NULL;
+					new_bh = NULL;
+					goto inserted;
+				}
 				le32_add_cpu(&BHDR(new_bh)->h_refcount, 1);
 				ea_bdebug(new_bh, "reusing; refcount now=%d",
 					le32_to_cpu(BHDR(new_bh)->h_refcount));
@@ -910,7 +935,8 @@ inserted:
 				if (error)
 					goto cleanup_dquot;
 			}
-			mb_cache_entry_release(ce);
+			mb2_cache_entry_touch(ext4_mb_cache, ce);
+			mb2_cache_entry_put(ext4_mb_cache, ce);
 			ce = NULL;
 		} else if (bs->bh && s->base == bs->bh->b_data) {
 			/* We were modifying this block in-place. */
@@ -976,7 +1002,7 @@ getblk_failed:
 
 cleanup:
 	if (ce)
-		mb_cache_entry_release(ce);
+		mb2_cache_entry_put(ext4_mb_cache, ce);
 	brelse(new_bh);
 	if (!(bs->bh && s->base == bs->bh->b_data))
 		kfree(s->base);
@@ -1541,17 +1567,6 @@ cleanup:
 }
 
 /*
- * ext4_xattr_put_super()
- *
- * This is called when a file system is unmounted.
- */
-void
-ext4_xattr_put_super(struct super_block *sb)
-{
-	mb_cache_shrink(sb->s_bdev);
-}
-
-/*
  * ext4_xattr_cache_insert()
  *
  * Create a new entry in the extended attribute cache, and insert
@@ -1560,28 +1575,18 @@ ext4_xattr_put_super(struct super_block *sb)
  * Returns 0, or a negative error number on failure.
  */
 static void
-ext4_xattr_cache_insert(struct mb_cache *ext4_mb_cache, struct buffer_head *bh)
+ext4_xattr_cache_insert(struct mb2_cache *ext4_mb_cache, struct buffer_head *bh)
 {
 	__u32 hash = le32_to_cpu(BHDR(bh)->h_hash);
-	struct mb_cache_entry *ce;
 	int error;
 
-	ce = mb_cache_entry_alloc(ext4_mb_cache, GFP_NOFS);
-	if (!ce) {
-		ea_bdebug(bh, "out of memory");
-		return;
-	}
-	error = mb_cache_entry_insert(ce, bh->b_bdev, bh->b_blocknr, hash);
+	error = mb2_cache_entry_create(ext4_mb_cache, GFP_NOFS, hash,
+				       bh->b_blocknr);
 	if (error) {
-		mb_cache_entry_free(ce);
-		if (error == -EBUSY) {
+		if (error == -EBUSY)
 			ea_bdebug(bh, "already in cache");
-			error = 0;
-		}
-	} else {
+	} else
 		ea_bdebug(bh, "inserting [%x]", (int)hash);
-		mb_cache_entry_release(ce);
-	}
 }
 
 /*
@@ -1634,26 +1639,19 @@ ext4_xattr_cmp(struct ext4_xattr_header *header1,
  */
 static struct buffer_head *
 ext4_xattr_cache_find(struct inode *inode, struct ext4_xattr_header *header,
-		      struct mb_cache_entry **pce)
+		      struct mb2_cache_entry **pce)
 {
 	__u32 hash = le32_to_cpu(header->h_hash);
-	struct mb_cache_entry *ce;
-	struct mb_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode);
+	struct mb2_cache_entry *ce;
+	struct mb2_cache *ext4_mb_cache = EXT4_GET_MB_CACHE(inode);
 
 	if (!header->h_hash)
 		return NULL;  /* never share */
 	ea_idebug(inode, "looking for cached blocks [%x]", (int)hash);
-again:
-	ce = mb_cache_entry_find_first(ext4_mb_cache, inode->i_sb->s_bdev,
-				       hash);
+	ce = mb2_cache_entry_find_first(ext4_mb_cache, hash);
 	while (ce) {
 		struct buffer_head *bh;
 
-		if (IS_ERR(ce)) {
-			if (PTR_ERR(ce) == -EAGAIN)
-				goto again;
-			break;
-		}
 		bh = sb_bread(inode->i_sb, ce->e_block);
 		if (!bh) {
 			EXT4_ERROR_INODE(inode, "block %lu read error",
@@ -1669,7 +1667,7 @@ again:
 			return bh;
 		}
 		brelse(bh);
-		ce = mb_cache_entry_find_next(ce, inode->i_sb->s_bdev, hash);
+		ce = mb2_cache_entry_find_next(ext4_mb_cache, ce);
 	}
 	return NULL;
 }
@@ -1744,15 +1742,15 @@ static void ext4_xattr_rehash(struct ext4_xattr_header *header,
 
 #define HASH_BUCKET_BITS 10
 
-struct mb_cache *
-ext4_xattr_create_cache(char *name)
+struct mb2_cache *
+ext4_xattr_create_cache(void)
 {
-	return mb_cache_create(name, HASH_BUCKET_BITS);
+	return mb2_cache_create(HASH_BUCKET_BITS);
 }
 
-void ext4_xattr_destroy_cache(struct mb_cache *cache)
+void ext4_xattr_destroy_cache(struct mb2_cache *cache)
 {
 	if (cache)
-		mb_cache_destroy(cache);
+		mb2_cache_destroy(cache);
 }
 
diff --git a/fs/ext4/xattr.h b/fs/ext4/xattr.h
index ddc0957760ba..10b0f7323ed6 100644
--- a/fs/ext4/xattr.h
+++ b/fs/ext4/xattr.h
@@ -108,7 +108,6 @@ extern int ext4_xattr_set(struct inode *, int, const char *, const void *, size_
 extern int ext4_xattr_set_handle(handle_t *, struct inode *, int, const char *, const void *, size_t, int);
 
 extern void ext4_xattr_delete_inode(handle_t *, struct inode *);
-extern void ext4_xattr_put_super(struct super_block *);
 
 extern int ext4_expand_extra_isize_ea(struct inode *inode, int new_extra_isize,
 			    struct ext4_inode *raw_inode, handle_t *handle);
@@ -124,8 +123,8 @@ extern int ext4_xattr_ibody_inline_set(handle_t *handle, struct inode *inode,
 				       struct ext4_xattr_info *i,
 				       struct ext4_xattr_ibody_find *is);
 
-extern struct mb_cache *ext4_xattr_create_cache(char *name);
-extern void ext4_xattr_destroy_cache(struct mb_cache *);
+extern struct mb2_cache *ext4_xattr_create_cache(void);
+extern void ext4_xattr_destroy_cache(struct mb2_cache *);
 
 #ifdef CONFIG_EXT4_FS_SECURITY
 extern int ext4_init_security(handle_t *handle, struct inode *inode,