From patchwork Mon Sep 10 18:13:21 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: [1/2, Oneiric, CVE-2012-2127] VFS : mount lock scalability for internal mounts
Date: Mon, 10 Sep 2012 08:13:21 -0000
From: Tim Gardner
X-Patchwork-Id: 182942
Message-Id: <1347300802-57353-1-git-send-email-tim.gardner@canonical.com>
To: kernel-team@lists.ubuntu.com

From: Tim Chen

CVE-2012-2127

BugLink: http://bugs.launchpad.net/bugs/990365

A number of file systems that don't have a mount point (e.g. sockfs
and pipefs) are not marked as long term. Therefore in mntput_no_expire,
all locks in the vfs_mount lock are taken, instead of just the local
CPU's lock, to aggregate reference counts when we release a reference
to file objects. In fact, only the local lock needs to be taken to
update the ref counts, as these file systems are in no danger of going
away until we are ready to unregister them.

The attached patch marks file systems using kern_mount without a mount
point as long term. The contention on the vfs_mount lock is now
eliminated. Before unregistering such a file system, kern_unmount
should be called to remove the long term flag and make the mount point
ready to be freed.

Signed-off-by: Tim Chen
Signed-off-by: Al Viro
(back ported from commit 423e0ab086ad8b33626e45fa94ac7613146b7ffa)

Conflicts:

	fs/namespace.c

Signed-off-by: Tim Gardner
---
 drivers/mtd/mtdchar.c        |    2 +-
 fs/anon_inodes.c             |    2 +-
 fs/hugetlbfs/inode.c         |    1 +
 fs/namespace.c               |   21 ++++++++++++++++++++-
 fs/pipe.c                    |    2 +-
 include/linux/fs.h           |    1 +
 security/selinux/selinuxfs.c |    1 +
 7 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/mtd/mtdchar.c b/drivers/mtd/mtdchar.c
index 9f8658e..49e20a4 100644
--- a/drivers/mtd/mtdchar.c
+++ b/drivers/mtd/mtdchar.c
@@ -1193,7 +1193,7 @@ err_unregister_chdev:
 static void __exit cleanup_mtdchar(void)
 {
 	unregister_mtd_user(&mtdchar_notifier);
-	mntput(mtd_inode_mnt);
+	kern_unmount(mtd_inode_mnt);
 	unregister_filesystem(&mtd_inodefs_type);
 	__unregister_chrdev(MTD_CHAR_MAJOR, 0, 1 << MINORBITS, "mtd");
 }
diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index c5567cb..4d433d3 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -233,7 +233,7 @@ static int __init anon_inode_init(void)
 	return 0;
 
 err_mntput:
-	mntput(anon_inode_mnt);
+	kern_unmount(anon_inode_mnt);
 err_unregister_filesystem:
 	unregister_filesystem(&anon_inode_fs_type);
 err_exit:
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 7476273..203e520 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -1012,6 +1012,7 @@ static int __init init_hugetlbfs_fs(void)
 static void __exit exit_hugetlbfs_fs(void)
 {
 	kmem_cache_destroy(hugetlbfs_inode_cachep);
+	kern_unmount(hugetlbfs_vfsmount);
 	unregister_filesystem(&hugetlbfs_fs_type);
 	bdi_destroy(&hugetlbfs_backing_dev_info);
 }
diff --git a/fs/namespace.c b/fs/namespace.c
index 5e25baa..c22d58a 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2738,7 +2738,16 @@ EXPORT_SYMBOL(put_mnt_ns);
 
 struct vfsmount *kern_mount_data(struct file_system_type *type, void *data)
 {
-	return vfs_kern_mount(type, MS_KERNMOUNT, type->name, data);
+	struct vfsmount *mnt;
+	mnt = vfs_kern_mount(type, MS_KERNMOUNT, type->name, data);
+	if (!IS_ERR(mnt)) {
+		/*
+		 * it is a longterm mount, don't release mnt until
+		 * we unmount before file sys is unregistered
+		*/
+		mnt_make_longterm(mnt);
+	}
+	return mnt;
 }
 EXPORT_SYMBOL_GPL(kern_mount_data);
 
@@ -2746,3 +2755,13 @@ bool our_mnt(struct vfsmount *mnt)
 {
 	return check_mnt(mnt);
 }
+
+void kern_unmount(struct vfsmount *mnt)
+{
+	/* release long term mount so mount point can be released */
+	if (!IS_ERR_OR_NULL(mnt)) {
+		mnt_make_shortterm(mnt);
+		mntput(mnt);
+	}
+}
+EXPORT_SYMBOL(kern_unmount);
diff --git a/fs/pipe.c b/fs/pipe.c
index 0499a96..b97d5ec 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1318,8 +1318,8 @@ static int __init init_pipe_fs(void)
 
 static void __exit exit_pipe_fs(void)
 {
+	kern_unmount(pipe_mnt);
 	unregister_filesystem(&pipe_fs_type);
-	mntput(pipe_mnt);
 }
 
 fs_initcall(init_pipe_fs);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d5447ab..834e60a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1886,6 +1886,7 @@ extern int register_filesystem(struct file_system_type *);
 extern int unregister_filesystem(struct file_system_type *);
 extern struct vfsmount *kern_mount_data(struct file_system_type *, void *data);
 #define kern_mount(type) kern_mount_data(type, NULL)
+extern void kern_unmount(struct vfsmount *mnt);
 extern int may_umount_tree(struct vfsmount *);
 extern int may_umount(struct vfsmount *);
 extern long do_mount(char *, char *, char *, unsigned long, void *);
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 27a9673..7cefaf3 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -1985,6 +1985,7 @@ __initcall(init_sel_fs);
 void exit_sel_fs(void)
 {
 	kobject_put(selinuxfs_kobj);
+	kern_unmount(selinuxfs_mount);
 	unregister_filesystem(&sel_fs_type);
 }
 #endif
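
Note for reviewers: the effect described in the commit message can be
illustrated with a small user-space model. This is only a sketch, not
kernel code. Every identifier prefixed with model_ or MODEL_ is
hypothetical, the "locks" are just a counter of how many per-CPU locks
a release would have to take, and the model assumes that taking a
reference does not touch the lock at all.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4	/* hypothetical CPU count for the model */

/* Simplified stand-in for a mount with per-CPU reference counts. */
struct model_mount {
	int count[MODEL_NR_CPUS];	/* per-CPU reference deltas */
	bool longterm;			/* set by the modelled kern_mount */
	bool freed;			/* set when the last reference is dropped */
};

/* Number of per-CPU locks acquired so far (models lock traffic). */
static unsigned long model_locks_taken;

static int model_total(const struct model_mount *m)
{
	int sum = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		sum += m->count[cpu];
	return sum;
}

/* Taking a reference only bumps the local counter (assumption: no lock). */
static void model_mntget(struct model_mount *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	m->count[cpu]++;
}

/* Dropping a reference: long-term mounts need only the local lock. */
static void model_mntput(struct model_mount *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (m->longterm) {
		model_locks_taken += 1;		/* local CPU's lock only */
		m->count[cpu]--;
		return;
	}
	model_locks_taken += MODEL_NR_CPUS;	/* every CPU's lock */
	m->count[cpu]--;
	if (model_total(m) == 0)		/* aggregate all counters */
		m->freed = true;
}

/* Models kern_unmount: clear the long-term flag, then drop the reference. */
static void model_kern_unmount(struct model_mount *m, int cpu)
{
	m->longterm = false;
	model_mntput(m, cpu);
}

/*
 * Run `rounds` get/put pairs and return the locks taken by those pairs.
 * The final teardown is checked but not included in the returned count.
 */
static unsigned long model_run(bool longterm, int rounds)
{
	struct model_mount m = { .count = { 0 }, .longterm = longterm,
				 .freed = false };
	unsigned long taken;

	m.count[0] = 1;		/* reference held by the kern_mount caller */
	model_locks_taken = 0;
	for (int i = 0; i < rounds; i++) {
		int cpu = i % MODEL_NR_CPUS;

		model_mntget(&m, cpu);
		/* release on a different CPU than the one that took it */
		model_mntput(&m, (cpu + 1) % MODEL_NR_CPUS);
	}
	taken = model_locks_taken;
	model_kern_unmount(&m, 0);
	assert(m.freed);	/* last reference gone, mount can be freed */
	return taken;
}
```

In this model, model_run(false, n) takes n * MODEL_NR_CPUS locks while
model_run(true, n) takes n, which is the difference the patch is after.
The teardown step also shows why kern_unmount must clear the long-term
flag first: the final release has to aggregate the per-CPU counters to
notice that the count has reached zero.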