From patchwork Tue Jan 20 16:05:27 2009 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Jan Kara X-Patchwork-Id: 19517 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.176.167]) by ozlabs.org (Postfix) with ESMTP id BC80EDDF59 for ; Wed, 21 Jan 2009 03:05:32 +1100 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757617AbZATQFa (ORCPT ); Tue, 20 Jan 2009 11:05:30 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1757705AbZATQFa (ORCPT ); Tue, 20 Jan 2009 11:05:30 -0500 Received: from styx.suse.cz ([82.119.242.94]:57485 "EHLO mail.suse.cz" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1757617AbZATQF3 (ORCPT ); Tue, 20 Jan 2009 11:05:29 -0500 Received: from duck.suse.cz (duck.suse.cz [10.20.1.74]) by mail.suse.cz (Postfix) with ESMTP id 22B5E6280C9; Tue, 20 Jan 2009 17:05:28 +0100 (CET) Received: by duck.suse.cz (Postfix, from userid 10005) id E86011F1E2A; Tue, 20 Jan 2009 17:05:27 +0100 (CET) Date: Tue, 20 Jan 2009 17:05:27 +0100 From: Jan Kara To: linux-fsdevel@vger.kernel.org Cc: linux-ext4@vger.kernel.org, Andrew Morton , Theodore Tso Subject: [RFC] [PATCH] vfs: Call filesystem callback when backing device caches should be flushed Message-ID: <20090120160527.GA17067@duck.suse.cz> MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.17 (2007-11-01) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Hi, we noted in our testing that ext2 (and it seems some other filesystems as well) don't flush disk's write caches on cases like fsync() or changing DIRSYNC directory. This is my attempt to solve the problem in a generic way by calling a filesystem callback from VFS at appropriate place as Andrew suggested. For ext2 what I did is enough (it just then fills in block_flush_device() as .flush_device callback) and I think it could be fine for other filesystems as well. There is one remaining issue though: Should we call .flush_device in generic_sync_sb_inodes()? Generally places like __fsync_super() or do_sync() seem more appropriate (although in do_sync() we'd have to do one more traversal of superblocks) to me. But maybe question which should be answered first is: Is it correct how we implement __fsync_super() or do_sync()? E.g. in do_sync() we do: sync_inodes(0); /* All mappings, inodes and their blockdevs */ DQUOT_SYNC(NULL); sync_supers(); /* Write the superblocks */ sync_filesystems(0); /* Start syncing the filesystems */ sync_filesystems(wait); /* Waitingly sync the filesystems */ sync_inodes(wait); /* Mappings, inodes and blockdevs, again. */ But sync_inodes(0) results in writeback done with WB_SYNC_NONE which does not have to flush all the dirty data. So until last sync_inodes(wait) we cannot be sure that all the dirty data has been submitted to disk. But such writes could in theory dirty the superblock or similar structures again. So shouldn't we rather do: ... sync_inodes(wait); /* Mappings, inodes and blockdevs, again. */ sync_filesystems(wait); /* Waitingly sync the filesystems */ And one more question: Should we also call .flush_device after fsync? Filesystems can already issue the flush in their .fsync callbacks so it's not necessary. The question is what's easier to use... Thanks for comments. Honza diff --git a/fs/buffer.c b/fs/buffer.c index b6e8b86..1be876c 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -165,6 +165,17 @@ void end_buffer_write_sync(struct buffer_head *bh, int uptodate) put_bh(bh); } +/* Issue flush of write caches on the block device */ +int block_flush_device(struct super_block *sb) +{ + int ret = blkdev_issue_flush(sb->s_bdev, NULL); + + if (ret == -EOPNOTSUPP) + return 0; + return ret; +} +EXPORT_SYMBOL(block_flush_device); + /* * Write out and wait upon all the dirty data associated with a block * device via its mapping. Does not take the superblock lock. diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c index e5eaa62..fcfacb2 100644 --- a/fs/fs-writeback.c +++ b/fs/fs-writeback.c @@ -412,6 +412,16 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc) } /* + * Make filesystem flush backing device of the filesystem + */ +static int flush_backing_device(struct super_block *sb) +{ + if (sb->s_op->flush_device) + return sb->s_op->flush_device(sb); + return 0; +} + +/* * Write out a superblock's list of dirty inodes. A wait will be performed * upon no inodes, all inodes or the final one, depending upon sync_mode. * @@ -557,6 +567,7 @@ void generic_sync_sb_inodes(struct super_block *sb, } spin_unlock(&inode_lock); iput(old_inode); + flush_backing_device(sb); } else spin_unlock(&inode_lock); @@ -752,6 +763,8 @@ int sync_inode(struct inode *inode, struct writeback_control *wbc) spin_lock(&inode_lock); ret = __writeback_single_inode(inode, wbc); spin_unlock(&inode_lock); + if (!ret && wbc->sync_mode == WB_SYNC_ALL) + ret = flush_backing_device(inode->i_sb); return ret; } EXPORT_SYMBOL(sync_inode); @@ -806,6 +819,9 @@ int generic_osync_inode(struct inode *inode, struct address_space *mapping, int else inode_sync_wait(inode); + if (!err) + err = flush_backing_device(inode->i_sb); + return err; } EXPORT_SYMBOL(generic_osync_inode); diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index bd7ac79..c154cfd 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -238,6 +238,7 @@ int nobh_write_end(struct file *, struct address_space *, int nobh_truncate_page(struct address_space *, loff_t, get_block_t *); int nobh_writepage(struct page *page, get_block_t *get_block, struct writeback_control *wbc); +int block_flush_device(struct super_block *sb); void buffer_init(void); diff --git a/include/linux/fs.h b/include/linux/fs.h index 6022f44..c37f9f0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1390,6 +1390,7 @@ struct super_operations { int (*remount_fs) (struct super_block *, int *, char *); void (*clear_inode) (struct inode *); void (*umount_begin) (struct super_block *); + int (*flush_device) (struct super_block *); int (*show_options)(struct seq_file *, struct vfsmount *); int (*show_stats)(struct seq_file *, struct vfsmount *);