From patchwork Thu Nov 20 00:32:00 2008
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Toshiyuki Okajima
X-Patchwork-Id: 9670
Date: Thu, 20 Nov 2008 09:32:00 +0900
From: Toshiyuki Okajima
To: akpm@linux-foundation.org, tytso@mit.edu, viro@zeniv.linux.org.uk,
 sct@redhat.com, adilger@sun.com
Cc: linux-ext4@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [RESEND][PATCH 1/3 BUG,RFC] vfs: release block-device-mapping
 buffer_heads which have the filesystem private data for avoiding oom-killer
Message-Id: <20081120093200.68b5bbd6.toshi.okajima@jp.fujitsu.com>
Organization: Fujitsu co.,ltd.
X-Mailer: Sylpheed 2.5.0rc2 (GTK+ 2.12.12; i386-redhat-linux-gnu)
Sender: linux-ext4-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-ext4@vger.kernel.org

From: Toshiyuki Okajima

Implement blkdev_releasepage() to release the buffer_heads and the page
after a client's private data has been released. One such client is a
filesystem. blkdev_releasepage() calls the client's releasepage(), if one
has been registered through blkdev_register_client_releasepage(), so that
the client can release its private data.

Signed-off-by: Toshiyuki Okajima
---

 fs/block_dev.c     |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/super.c         |   22 ++++++++++++++++++
 include/linux/fs.h |    9 +++++++
 3 files changed, 93 insertions(+)

diff -Nurp linux-2.6.28-rc4.orig/fs/block_dev.c linux-2.6.28-rc4/fs/block_dev.c
--- linux-2.6.28-rc4.orig/fs/block_dev.c	2008-11-10 09:36:15.000000000 +0900
+++ linux-2.6.28-rc4/fs/block_dev.c	2008-11-10 18:33:52.000000000 +0900
@@ -29,6 +29,9 @@
 
 struct bdev_inode {
 	struct block_device bdev;
+	void *client;
+	int (*client_releasepage)(void*, struct page*, gfp_t);
+	rwlock_t client_lock;
 	struct inode vfs_inode;
 };
 
@@ -260,6 +263,9 @@ static struct inode *bdev_alloc_inode(st
 	struct bdev_inode *ei = kmem_cache_alloc(bdev_cachep, GFP_KERNEL);
 	if (!ei)
 		return NULL;
+	ei->client = NULL;
+	ei->client_releasepage = NULL;
+	rwlock_init(&ei->client_lock);
 	return &ei->vfs_inode;
 }
 
@@ -1208,6 +1214,61 @@ static long block_ioctl(struct file *fil
 	return blkdev_ioctl(bdev, mode, cmd, arg);
 }
 
+/*
+ * blkdev_releasepage: call ei->client_releasepage() if one is registered;
+ * otherwise call try_to_free_buffers().
+ * ei->client_releasepage() releases a page held privately by the client,
+ * if possible. While a client holds a page for private use, the reference
+ * count of its buffer_heads is greater than 0, so try_to_free_buffers()
+ * cannot release them. The client must therefore release the page itself.
+ */
+static int blkdev_releasepage(struct page *page, gfp_t wait)
+{
+	struct bdev_inode *ei = BDEV_I(page->mapping->host);
+	int ret;
+
+	read_lock(&ei->client_lock);
+	if (ei->client_releasepage != NULL)
+		ret = (*ei->client_releasepage)(ei->client, page, wait);
+	else
+		ret = try_to_free_buffers(page);
+	read_unlock(&ei->client_lock);
+	return ret;
+}
+
+/*
+ * blkdev_register_client_releasepage: register client_releasepage.
+ */
+int blkdev_register_client_releasepage(struct block_device *bdev,
+	void *client, int (*releasepage)(void*, struct page*, gfp_t))
+{
+	struct bdev_inode *ei = BDEV_I(bdev->bd_inode);
+	int ret = 1;
+
+	write_lock(&ei->client_lock);
+	if (ei->client == NULL && ei->client_releasepage == NULL) {
+		ei->client = client;
+		ei->client_releasepage = releasepage;
+	} else if (ei->client != client
+		|| ei->client_releasepage != releasepage)
+		ret = 0;
+	write_unlock(&ei->client_lock);
+	return ret;
+}
+
+/*
+ * blkdev_unregister_client_releasepage: unregister client_releasepage.
+ */
+void blkdev_unregister_client_releasepage(struct block_device *bdev)
+{
+	struct bdev_inode *ei = BDEV_I(bdev->bd_inode);
+
+	write_lock(&ei->client_lock);
+	ei->client = NULL;
+	ei->client_releasepage = NULL;
+	write_unlock(&ei->client_lock);
+}
+
 static const struct address_space_operations def_blk_aops = {
 	.readpage	= blkdev_readpage,
 	.writepage	= blkdev_writepage,
@@ -1215,6 +1276,7 @@ static const struct address_space_operat
 	.write_begin	= blkdev_write_begin,
 	.write_end	= blkdev_write_end,
 	.writepages	= generic_writepages,
+	.releasepage	= blkdev_releasepage,
 	.direct_IO	= blkdev_direct_IO,
 };
 
diff -Nurp linux-2.6.28-rc4.orig/fs/super.c linux-2.6.28-rc4/fs/super.c
--- linux-2.6.28-rc4.orig/fs/super.c	2008-11-10 09:36:15.000000000 +0900
+++ linux-2.6.28-rc4/fs/super.c	2008-11-11 09:25:04.000000000 +0900
@@ -801,6 +801,18 @@ int get_sb_bdev(struct file_system_type
 
 		s->s_flags |= MS_ACTIVE;
 	}
+	/*
+	 * Register a client function which releases a page whose mapping
+	 * is the block device.
+	 */
+	if (fs_type->release_metadata != NULL
+		&& !blkdev_register_client_releasepage(bdev, s,
+			fs_type->release_metadata)) {
+		up_write(&s->s_umount);
+		deactivate_super(s);
+		error = -EBUSY;
+		goto error_bdev;
+	}
 
 	return simple_set_mnt(mnt, s);
 
@@ -819,6 +831,16 @@ void kill_block_super(struct super_block
 	struct block_device *bdev = sb->s_bdev;
 	fmode_t mode = sb->s_mode;
 
+	/*
+	 * Unregister the client function which releases a page whose mapping
+	 * is the block device.
+	 *
+	 * The filesystem is certain to be unmounting here and releases all of
+	 * its own data itself, so the release function that it registered with
+	 * the block device is no longer needed.
+	 */
+	if (sb->s_type->release_metadata != NULL)
+		blkdev_unregister_client_releasepage(bdev);
 	generic_shutdown_super(sb);
 	sync_blockdev(bdev);
 	close_bdev_exclusive(bdev, mode);
diff -Nurp linux-2.6.28-rc4.orig/include/linux/fs.h linux-2.6.28-rc4/include/linux/fs.h
--- linux-2.6.28-rc4.orig/include/linux/fs.h	2008-11-10 09:36:15.000000000 +0900
+++ linux-2.6.28-rc4/include/linux/fs.h	2008-11-11 09:01:12.000000000 +0900
@@ -1538,6 +1538,7 @@ struct file_system_type {
 	int (*get_sb) (struct file_system_type *, int,
 		       const char *, void *, struct vfsmount *);
 	void (*kill_sb) (struct super_block *);
+	int (*release_metadata)(void*, struct page*, gfp_t);
 	struct module *owner;
 	struct file_system_type * next;
 	struct list_head fs_supers;
@@ -1699,8 +1700,16 @@ extern void bd_set_size(struct block_dev
 extern void bd_forget(struct inode *inode);
 extern void bdput(struct block_device *);
 extern struct block_device *open_by_devnum(dev_t, fmode_t);
+extern int blkdev_register_client_releasepage(struct block_device *,
+	void *, int (*releasepage)(void *, struct page*, gfp_t));
+extern void blkdev_unregister_client_releasepage(struct block_device *);
 #else
 static inline void bd_forget(struct inode *inode) {}
+static inline int blkdev_register_client_releasepage(struct block_device *bdev,
+	void *client, int (*releasepage)(void *, struct page*, gfp_t))
+{ return 1; }
+static inline void blkdev_unregister_client_releasepage(struct block_device *bdev)
+{}
 #endif
 extern const struct file_operations def_blk_fops;
 extern const struct file_operations def_chr_fops;
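
For anyone who wants to try the register/dispatch/unregister rules outside the kernel, here is a minimal user-space sketch of the logic in the fs/block_dev.c hunks. It is only a model built on assumptions, not kernel code. Every name in it (struct bdev_model, release_fn, model_register(), model_unregister(), model_releasepage(), default_release(), struct fs_client, fs_release(), model_demo()) is hypothetical. An int stands in for struct page, the gfp_t argument is dropped, default_release() stands in for try_to_free_buffers(), and client_lock is omitted because the model is single-threaded.

```c
#include <stddef.h>

/* Hypothetical stand-in for int (*)(void *, struct page *, gfp_t). */
typedef int (*release_fn)(void *client, int page);

/* Hypothetical model of the fields the patch adds to struct bdev_inode.
 * client_lock is left out: this sketch is single-threaded. */
struct bdev_model {
	void *client;
	release_fn client_releasepage;
};

/* Counts calls to the fallback, which stands in for try_to_free_buffers(). */
static int default_calls;

static int default_release(int page)
{
	(void)page;
	default_calls++;
	return 1;
}

/* Same decision as blkdev_register_client_releasepage(): 1 if the slot was
 * free or already holds this exact client and callback, 0 if another client
 * owns it. The patch makes this decision under write_lock(). */
static int model_register(struct bdev_model *ei, void *client, release_fn fn)
{
	if (ei->client == NULL && ei->client_releasepage == NULL) {
		ei->client = client;
		ei->client_releasepage = fn;
		return 1;
	}
	return ei->client == client && ei->client_releasepage == fn;
}

/* Same effect as blkdev_unregister_client_releasepage(). */
static void model_unregister(struct bdev_model *ei)
{
	ei->client = NULL;
	ei->client_releasepage = NULL;
}

/* Same dispatch as blkdev_releasepage(): client callback if registered,
 * otherwise the fallback. The patch does this under read_lock(). */
static int model_releasepage(struct bdev_model *ei, int page)
{
	if (ei->client_releasepage != NULL)
		return ei->client_releasepage(ei->client, page);
	return default_release(page);
}

/* Hypothetical filesystem client that only counts how often it is called. */
struct fs_client {
	int calls;
};

static int fs_release(void *client, int page)
{
	struct fs_client *fs = client;

	(void)page;
	fs->calls++;
	return 1;
}

/* Walks through the cases; returns 0 if each behaves as commented,
 * otherwise the number of the first step that did not. */
static int model_demo(void)
{
	struct bdev_model ei = { NULL, NULL };
	struct fs_client a = { 0 }, b = { 0 };

	default_calls = 0;
	model_releasepage(&ei, 1);		/* no client yet: fallback runs */
	if (default_calls != 1)
		return 1;
	if (!model_register(&ei, &a, fs_release))
		return 2;			/* first client is accepted */
	if (!model_register(&ei, &a, fs_release))
		return 3;			/* same client again is accepted */
	if (model_register(&ei, &b, fs_release))
		return 4;			/* a different client is refused */
	model_releasepage(&ei, 2);		/* dispatches to client a only */
	if (a.calls != 1 || b.calls != 0 || default_calls != 1)
		return 5;
	model_unregister(&ei);
	model_releasepage(&ei, 3);		/* back to the fallback */
	if (default_calls != 2 || a.calls != 1)
		return 6;
	return 0;
}
```

Calling model_demo() from a small main() and checking for a return value of 0 exercises all three entry points. The refused second registration corresponds to the case in which get_sb_bdev() fails the mount with -EBUSY in this patch.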