Message ID | 20101125074909.GA4160@amd |
---|---|
State | New, archived |
Headers | show |
On 11/25/2010 09:49 AM, Nick Piggin wrote: > This is a call for review and comments from all filesystem developers. > Not just those cced, but everyone who maintains a filesystem[*], because > I wasn't able to put in a lot of time to understand the more complex > filesystems. > > [*] You all read linux-fsdevel anyway, right? If not, please do, it's > pretty low volume and high s/n. > > So there seem to be several problems with inode data and metadata > synchronization. Some of it is bugs in core code, but also a couple > of oft repeated bugs in filesystems. > > http://marc.info/?l=linux-fsdevel&m=129052164315259&w=2 > > Basically 2 patterns of problem. > > First is checking inode dirty bits > (I_DIRTY/I_DIRTY_SYNC/I_DIRTY_DATASYNC) without locking. What happens is > that other paths (async writeout or even a concurrent sync or fsync) can > clear these bits with the data not being on platter. > > * solution: must wait on I_SYNC before testing these things. See my > patch in above linked series. I think I covered everyone here, but > please double check. > > * There is opportunity in some filesystems for clearing inode dirty bits > in fsync, and for checking and avoiding fsync work if dirty bits are > not sure. > > Second is confusing sync and async inode metadata writeout > Core code clears I_DIRTY_SYNC and I_DIRTY_DATASYNC before calling > ->write_inode *regardless* of whether it is a for-integrity call or > not. This means background writeback can clear it, and subsequent > sync_inode_metadata or sync(2) call will skip the next ->write_inode > completely. > > * solution: for fsync, you must ensure that everything that a > synchronous ->write_inode call does is also done by an > asynchronous ->write_inode call *plus* a subsequent fsync. > > So if a synchronous ->write_inode syncs a bh, but an async one > just marks it dirty, your .fsync would have to sync the bh. > > * solution: for sync(2), you must ensure that everything that a > synchronous ->write_inode call does is also done by an > asynchronous ->write_inode call plus a subsequent ->sync_fs > call, plus __sync_blockdev call on the buffer mapping. > > Many simple filesystems just go via buffer mapping, and > ->write_inode simply dirties the inode's bh. These guys are > fine here (although most are broken wrt fsync). > > If you need any more state bits in your inode to work out what is going > on, Christoph's recent hfsplus fsync optimisation patches has a good > example of how it can be done. > > The ext2 fix below is an example of how we can do this nicely, the > rest of the filesystems I note when it looks like they went wrong, and > band aid fixed it. > > Index: linux-2.6/fs/exofs/inode.c > =================================================================== > --- linux-2.6.orig/fs/exofs/inode.c 2010-11-25 17:25:54.000000000 +1100 > +++ linux-2.6/fs/exofs/inode.c 2010-11-25 17:36:32.000000000 +1100 > @@ -1273,7 +1273,7 @@ static int exofs_update_inode(struct ino > > int exofs_write_inode(struct inode *inode, struct writeback_control *wbc) > { > - return exofs_update_inode(inode, wbc->sync_mode == WB_SYNC_ALL); > + return exofs_update_inode(inode, 1 /* XXX: fix fsync and use wbc->sync_mode == WB_SYNC_ALL */ ); > } > > /* Hi Nick. Thanks for digging into this issue, I bet it's causing pain. Which I totally missed in my tests. I wish I had a better xsync+reboot tests for all this. So in that previous patch you had: > Index: linux-2.6/fs/exofs/file.c > =================================================================== > --- linux-2.6.orig/fs/exofs/file.c 2010-11-19 16:50:00.000000000 +1100 > +++ linux-2.6/fs/exofs/file.c 2010-11-19 16:50:07.000000000 +1100 > @@ -48,11 +48,6 @@ static int exofs_file_fsync(struct file > struct inode *inode = filp->f_mapping->host; > struct super_block *sb; > > - if (!(inode->i_state & I_DIRTY)) > - return 0; > - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) > - return 0; > - > ret = sync_inode_metadata(inode, 1); > > /* This is a good place to write the sb */ > Is that a good enough fix for the issue in your opinion? Or is there more involved? In exofs there is nothing special to do other than VFS managment and the final call, by vfs, to .write_inode. I wish we had a simple_file_fsync() from VFS that does what the VFS expects us to do. So when code evolves it does not need to change all FSs. This is the third time I'm fixing this code trying to second guess the VFS. Actually the only other thing I need to do in file_fsync today is sb_sync. But this is a stupidity (and a bug) that I'm fixing soon. So that theoretical simple_file_fsync() would be all I need. Please advise? BTW: Do you want that I take the changes through my tree? Thanks for taking care of this Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 25, 2010 at 11:28:14AM +0200, Boaz Harrosh wrote: > Hi Nick. > Thanks for digging into this issue, I bet it's causing pain. Which > I totally missed in my tests. I wish I had a better xsync+reboot > tests for all this. That's no problem, thanks for looking. > So in that previous patch you had: > > Index: linux-2.6/fs/exofs/file.c > > =================================================================== > > --- linux-2.6.orig/fs/exofs/file.c 2010-11-19 16:50:00.000000000 +1100 > > +++ linux-2.6/fs/exofs/file.c 2010-11-19 16:50:07.000000000 +1100 > > @@ -48,11 +48,6 @@ static int exofs_file_fsync(struct file > > struct inode *inode = filp->f_mapping->host; > > struct super_block *sb; > > > > - if (!(inode->i_state & I_DIRTY)) > > - return 0; > > - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) > > - return 0; > > - > > ret = sync_inode_metadata(inode, 1); > > > > /* This is a good place to write the sb */ > > > > Is that a good enough fix for the issue in your opinion? > Or is there more involved? For the inode dirty bit race problem, yes it should fix it. sync_inode_metadata basically makes the same checks without races (in a subsequent patch I re-introduced the datasync optimisation). > In exofs there is nothing special to do other than VFS > managment and the final call, by vfs, to .write_inode. > > I wish we had a simple_file_fsync() from VFS that does > what the VFS expects us to do. So when code evolves it > does not need to change all FSs. This is the third time > I'm fixing this code trying to second guess the VFS. Well in your fsync, you need to wait for inode writeback that might have been started by an asynchronous write_inode. Also, with your sync_inode_metadata call, you shouldn't need the sync_inode call by the looks. > Actually the only other thing I need to do in file_fsync > today is sb_sync. But this is a stupidity (and a bug) that > I'm fixing soon. So that theoretical simple_file_fsync() > would be all I need. > > Please advise? > BTW: Do you want that I take the changes through my tree? At this point I'd just like some review and feedback, we might get some other opinions on how to fix it, so don't take the changes quite yet. I'll cc you again with a broken out patch. Thanks, Nick -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/25/2010 12:06 PM, Nick Piggin wrote: > On Thu, Nov 25, 2010 at 11:28:14AM +0200, Boaz Harrosh wrote: >>> Index: linux-2.6/fs/exofs/file.c >>> =================================================================== >>> --- linux-2.6.orig/fs/exofs/file.c 2010-11-19 16:50:00.000000000 +1100 >>> +++ linux-2.6/fs/exofs/file.c 2010-11-19 16:50:07.000000000 +1100 >>> @@ -48,11 +48,6 @@ static int exofs_file_fsync(struct file >>> struct inode *inode = filp->f_mapping->host; >>> struct super_block *sb; >>> >>> - if (!(inode->i_state & I_DIRTY)) >>> - return 0; >>> - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) >>> - return 0; >>> - >>> ret = sync_inode_metadata(inode, 1); >>> >>> /* This is a good place to write the sb */ >>> >> >> Is that a good enough fix for the issue in your opinion? >> Or is there more involved? > > For the inode dirty bit race problem, yes it should fix it. > sync_inode_metadata basically makes the same checks without > races (in a subsequent patch I re-introduced the datasync > optimisation). > > > > Well in your fsync, you need to wait for inode writeback > that might have been started by an asynchronous write_inode. > All I'm calling is sync_inode_metadata(,1) which calls sync_inode() which calls writeback_single_inode(sync_mode == WB_SYNC_ALL). It gets a little complicated but from the looks of it, even though the call to .write_inode() is not under any lock the state machine there will do inode_wait_for_writeback() if there was one in motion all ready. ? And it looks like writeback_single_inode() does all the proper checks in the correct order for these flags above. So current code in exofs_file_fsync() looks scary to me. I would like to push your above patch for this Kernel. (I'll repost it) > Also, with your sync_inode_metadata call, you shouldn't need the > sync_inode call by the looks. > What? I missed you. You mean I don't need to sync_inode_metadata(,wait==1), or what did you mean? > Thanks, > Nick > Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 25, 2010 at 12:51:11PM +0200, Boaz Harrosh wrote: > On 11/25/2010 12:06 PM, Nick Piggin wrote: > > On Thu, Nov 25, 2010 at 11:28:14AM +0200, Boaz Harrosh wrote: > > >>> Index: linux-2.6/fs/exofs/file.c > >>> =================================================================== > >>> --- linux-2.6.orig/fs/exofs/file.c 2010-11-19 16:50:00.000000000 +1100 > >>> +++ linux-2.6/fs/exofs/file.c 2010-11-19 16:50:07.000000000 +1100 > >>> @@ -48,11 +48,6 @@ static int exofs_file_fsync(struct file > >>> struct inode *inode = filp->f_mapping->host; > >>> struct super_block *sb; > >>> > >>> - if (!(inode->i_state & I_DIRTY)) > >>> - return 0; > >>> - if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) > >>> - return 0; > >>> - > >>> ret = sync_inode_metadata(inode, 1); > >>> > >>> /* This is a good place to write the sb */ > >>> > >> > >> Is that a good enough fix for the issue in your opinion? > >> Or is there more involved? > > > > For the inode dirty bit race problem, yes it should fix it. > > sync_inode_metadata basically makes the same checks without > > races (in a subsequent patch I re-introduced the datasync > > optimisation). > > > > > > > > > Well in your fsync, you need to wait for inode writeback > > that might have been started by an asynchronous write_inode. > > > > All I'm calling is sync_inode_metadata(,1) which calls sync_inode() > which calls writeback_single_inode(sync_mode == WB_SYNC_ALL). It gets > a little complicated but from the looks of it, even though the > call to .write_inode() is not under any lock the state machine there > will do inode_wait_for_writeback() if there was one in motion > all ready. ? > > And it looks like writeback_single_inode() does all the proper > checks in the correct order for these flags above. > > So current code in exofs_file_fsync() looks scary to me. I would > like to push your above patch for this Kernel. (I'll repost it) It does not get it right, because of the situation I described above. Background writeout can come in first, and clear the inode dirty bits, and call your ->write_inode for async writeout. That means you skip doing the exofs_put_io_state(), and (I presume) this means you aren't waiting for write completion there. What then happens is that sync_inode_metadata() from your fsync does not call ->write_inode because the inode dirty bits are clear. It's basically a noop. So you need to either make your .write_inode always synchronous, or wait for it in your .fsync and .sync_fs. > > Also, with your sync_inode_metadata call, you shouldn't need the > > sync_inode call by the looks. > > > > What? I missed you. You mean I don't need to sync_inode_metadata(,wait==1), > or what did you mean? Sorry, I was looking at the wrong code, ignore that. Nick -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 25, 2010 at 06:49:09PM +1100, Nick Piggin wrote: > Second is confusing sync and async inode metadata writeout > Core code clears I_DIRTY_SYNC and I_DIRTY_DATASYNC before calling > ->write_inode *regardless* of whether it is a for-integrity call or > not. This means background writeback can clear it, and subsequent > sync_inode_metadata or sync(2) call will skip the next ->write_inode > completely. Hmm, this also means that write_inode_now(sync=1) is buggy. It needs to in fact call ->fsync -- which is a file operation unfortunately, Christoph didn't you have some patches to move it into an inode operation? -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Nov 25, 2010 at 10:54:57PM +1100, Nick Piggin wrote: > On Thu, Nov 25, 2010 at 06:49:09PM +1100, Nick Piggin wrote: > > Second is confusing sync and async inode metadata writeout > > Core code clears I_DIRTY_SYNC and I_DIRTY_DATASYNC before calling > > ->write_inode *regardless* of whether it is a for-integrity call or > > not. This means background writeback can clear it, and subsequent > > sync_inode_metadata or sync(2) call will skip the next ->write_inode > > completely. > > Hmm, this also means that write_inode_now(sync=1) is buggy. It > needs to in fact call ->fsync -- which is a file operation > unfortunately, Christoph didn't you have some patches to move it > into an inode operation? No, it doesn't really make much sense either. But what I've slowly started doing is to phase out write_inode_now. For the cases where we really only want to write the inode we should use sync_inode_metadata. That only leaves two others callsers: - iput_final for a filesystem during unmount. This should be caught by the need to call ->sync_fs rule you mentioned above, but needs a closer audit. - nfsd. Any filesystem that cares should just use the commit_metadata export operations, which is a subsystem of ->fsync as it only need to guarantee that metadata is on disk, but not actually any file data - so no cache flush mess as in a real fsync implementation. -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11/25/2010 01:47 PM, Nick Piggin wrote: > On Thu, Nov 25, 2010 at 12:51:11PM +0200, Boaz Harrosh wrote: > > It does not get it right, because of the situation I described > above. Background writeout can come in first, and clear the inode > dirty bits, and call your ->write_inode for async writeout. > > That means you skip doing the exofs_put_io_state(), and (I presume) > this means you aren't waiting for write completion there. > > What then happens is that sync_inode_metadata() from your fsync > does not call ->write_inode because the inode dirty bits are clear. > It's basically a noop. So you need to either make your .write_inode > always synchronous, or wait for it in your .fsync and .sync_fs. > Rrr I now see what you mean, that the previous call to writeback_single_inode(wait==0) came and go without actually finishing because exofs_write_inode(wait==0). So I wish there was an write_inode_async_done() that will actually do the final inode_sync_complete or something like that, right? (That I could call from exofs::updatei_done) Sigh. I think I'll go with the always wait==1 at exofs_write_inode() for now. Just as your patch. If there is nothing better by the next Kernel I might consider my own wait_event() in exofs_file_fsync() ACK-by: Boaz Harrosh <bharrosh@panasas.com> Thanks Boaz -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Index: linux-2.6/fs/ext2/inode.c =================================================================== --- linux-2.6.orig/fs/ext2/inode.c 2010-11-24 11:29:42.000000000 +1100 +++ linux-2.6/fs/ext2/inode.c 2010-11-25 17:22:49.000000000 +1100 @@ -1211,7 +1211,7 @@ static int ext2_setsize(struct inode *in return 0; } -static struct ext2_inode *ext2_get_inode(struct super_block *sb, ino_t ino, +struct ext2_inode *ext2_get_inode(struct super_block *sb, ino_t ino, struct buffer_head **p) { struct buffer_head * bh; @@ -1505,16 +1505,8 @@ static int __ext2_write_inode(struct ino } else for (n = 0; n < EXT2_N_BLOCKS; n++) raw_inode->i_block[n] = ei->i_data[n]; mark_buffer_dirty(bh); - if (do_sync) { - sync_dirty_buffer(bh); - if (buffer_req(bh) && !buffer_uptodate(bh)) { - printk ("IO error syncing ext2 inode [%s:%08lx]\n", - sb->s_id, (unsigned long) ino); - err = -EIO; - } - } - ei->i_state &= ~EXT2_STATE_NEW; brelse (bh); + ei->i_state &= ~EXT2_STATE_NEW; return err; } Index: linux-2.6/fs/ext2/file.c =================================================================== --- linux-2.6.orig/fs/ext2/file.c 2010-11-24 11:29:42.000000000 +1100 +++ linux-2.6/fs/ext2/file.c 2010-11-24 12:23:53.000000000 +1100 @@ -21,6 +21,7 @@ #include <linux/time.h> #include <linux/pagemap.h> #include <linux/quotaops.h> +#include <linux/buffer_head.h> #include "ext2.h" #include "xattr.h" #include "acl.h" @@ -43,16 +44,33 @@ static int ext2_release_file (struct ino int ext2_fsync(struct file *file, int datasync) { int ret; - struct super_block *sb = file->f_mapping->host->i_sb; - struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping; + struct inode *inode = file->f_mapping->host; + ino_t ino = inode->i_ino; + struct super_block *sb = inode->i_sb; + struct address_space *sb_mapping = sb->s_bdev->bd_inode->i_mapping; + struct buffer_head *bh; + struct ext2_inode *raw_inode; ret = generic_file_fsync(file, datasync); - if (ret == -EIO || test_and_clear_bit(AS_EIO, &mapping->flags)) { + if (ret == -EIO || test_and_clear_bit(AS_EIO, &sb_mapping->flags)) { /* We don't really know where the IO error happened... */ ext2_error(sb, __func__, "detected IO error when writing metadata buffers"); + return -EIO; + } + + raw_inode = ext2_get_inode(sb, ino, &bh); + if (IS_ERR(raw_inode)) + return -EIO; + + sync_dirty_buffer(bh); + if (buffer_req(bh) && !buffer_uptodate(bh)) { + printk ("IO error syncing ext2 inode [%s:%08lx]\n", + sb->s_id, (unsigned long) ino); ret = -EIO; } + brelse (bh); + return ret; } Index: linux-2.6/fs/ext2/ext2.h =================================================================== --- linux-2.6.orig/fs/ext2/ext2.h 2010-11-24 11:29:42.000000000 +1100 +++ linux-2.6/fs/ext2/ext2.h 2010-11-24 12:23:53.000000000 +1100 @@ -124,6 +124,8 @@ extern int ext2_get_block(struct inode * extern int ext2_setattr (struct dentry *, struct iattr *); extern void ext2_set_inode_flags(struct inode *inode); extern void ext2_get_inode_flags(struct ext2_inode_info *); +extern struct ext2_inode *ext2_get_inode(struct super_block *sb, ino_t ino, + struct buffer_head **p); extern int ext2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo, u64 start, u64 len); Index: linux-2.6/fs/adfs/inode.c =================================================================== --- linux-2.6.orig/fs/adfs/inode.c 2010-11-25 17:25:54.000000000 +1100 +++ linux-2.6/fs/adfs/inode.c 2010-11-25 17:30:08.000000000 +1100 @@ -383,7 +383,7 @@ int adfs_write_inode(struct inode *inode obj.attr = ADFS_I(inode)->attr; obj.size = inode->i_size; - ret = adfs_dir_update(sb, &obj, wbc->sync_mode == WB_SYNC_ALL); + ret = adfs_dir_update(sb, &obj, 1 /* XXX: fix fsync and use 'wbc->sync_mode == WB_SYNC_ALL' */); unlock_kernel(); return ret; } Index: linux-2.6/fs/affs/file.c =================================================================== --- linux-2.6.orig/fs/affs/file.c 2010-11-25 17:31:12.000000000 +1100 +++ linux-2.6/fs/affs/file.c 2010-11-25 17:31:30.000000000 +1100 @@ -931,6 +931,7 @@ int affs_file_fsync(struct file *filp, i int ret, err; ret = write_inode_now(inode, 0); + /* XXX: could just sync the buffer been dirtied by write_inode */ err = sync_blockdev(inode->i_sb->s_bdev); if (!ret) ret = err; Index: linux-2.6/fs/bfs/inode.c =================================================================== --- linux-2.6.orig/fs/bfs/inode.c 2010-11-25 17:23:40.000000000 +1100 +++ linux-2.6/fs/bfs/inode.c 2010-11-25 17:32:26.000000000 +1100 @@ -151,7 +151,7 @@ static int bfs_write_inode(struct inode di->i_eoffset = cpu_to_le32(i_sblock * BFS_BSIZE + inode->i_size - 1); mark_buffer_dirty(bh); - if (wbc->sync_mode == WB_SYNC_ALL) { + if (1 /* XXX: fix fsync and use wbc->sync_mode == WB_SYNC_ALL */ ) { sync_dirty_buffer(bh); if (buffer_req(bh) && !buffer_uptodate(bh)) err = -EIO; Index: linux-2.6/fs/exofs/inode.c =================================================================== --- linux-2.6.orig/fs/exofs/inode.c 2010-11-25 17:25:54.000000000 +1100 +++ linux-2.6/fs/exofs/inode.c 2010-11-25 17:36:32.000000000 +1100 @@ -1273,7 +1273,7 @@ static int exofs_update_inode(struct ino int exofs_write_inode(struct inode *inode, struct writeback_control *wbc) { - return exofs_update_inode(inode, wbc->sync_mode == WB_SYNC_ALL); + return exofs_update_inode(inode, 1 /* XXX: fix fsync and use wbc->sync_mode == WB_SYNC_ALL */ ); } /* Index: linux-2.6/fs/ext4/inode.c =================================================================== --- linux-2.6.orig/fs/ext4/inode.c 2010-11-25 17:25:54.000000000 +1100 +++ linux-2.6/fs/ext4/inode.c 2010-11-25 17:40:37.000000000 +1100 @@ -5240,7 +5240,7 @@ int ext4_write_inode(struct inode *inode err = __ext4_get_inode_loc(inode, &iloc, 0); if (err) return err; - if (wbc->sync_mode == WB_SYNC_ALL) + if (1 /* XXX: fix fxync and use wbc->sync_mode == WB_SYNC_ALL */) sync_dirty_buffer(iloc.bh); if (buffer_req(iloc.bh) && !buffer_uptodate(iloc.bh)) { EXT4_ERROR_INODE_BLOCK(inode, iloc.bh->b_blocknr, Index: linux-2.6/fs/fat/inode.c =================================================================== --- linux-2.6.orig/fs/fat/inode.c 2010-11-25 17:23:40.000000000 +1100 +++ linux-2.6/fs/fat/inode.c 2010-11-25 17:42:04.000000000 +1100 @@ -645,7 +645,7 @@ static int __fat_write_inode(struct inod spin_unlock(&sbi->inode_hash_lock); mark_buffer_dirty(bh); err = 0; - if (wait) + if (1 /* XXX: fix fsync and use wait */) err = sync_dirty_buffer(bh); brelse(bh); return err; Index: linux-2.6/fs/jfs/inode.c =================================================================== --- linux-2.6.orig/fs/jfs/inode.c 2010-11-25 17:25:54.000000000 +1100 +++ linux-2.6/fs/jfs/inode.c 2010-11-25 17:45:27.000000000 +1100 @@ -123,7 +123,7 @@ int jfs_commit_inode(struct inode *inode int jfs_write_inode(struct inode *inode, struct writeback_control *wbc) { - int wait = wbc->sync_mode == WB_SYNC_ALL; + int wait = 1; /* XXX fix fsync and use wbc->sync_mode == WB_SYNC_ALL; */ if (test_cflag(COMMIT_Nolink, inode)) return 0; Index: linux-2.6/fs/minix/inode.c =================================================================== --- linux-2.6.orig/fs/minix/inode.c 2010-11-25 17:23:40.000000000 +1100 +++ linux-2.6/fs/minix/inode.c 2010-11-25 17:46:42.000000000 +1100 @@ -576,7 +576,7 @@ static int minix_write_inode(struct inod bh = V2_minix_update_inode(inode); if (!bh) return -EIO; - if (wbc->sync_mode == WB_SYNC_ALL && buffer_dirty(bh)) { + if (1 /* XXX: fix fsync and use wbc->sync_mode == WB_SYNC_ALL */ && buffer_dirty(bh)) { sync_dirty_buffer(bh); if (buffer_req(bh) && !buffer_uptodate(bh)) { printk("IO error syncing minix inode [%s:%08lx]\n", Index: linux-2.6/fs/omfs/inode.c =================================================================== --- linux-2.6.orig/fs/omfs/inode.c 2010-11-25 17:23:40.000000000 +1100 +++ linux-2.6/fs/omfs/inode.c 2010-11-25 17:50:50.000000000 +1100 @@ -169,7 +169,7 @@ static int __omfs_write_inode(struct ino static int omfs_write_inode(struct inode *inode, struct writeback_control *wbc) { - return __omfs_write_inode(inode, wbc->sync_mode == WB_SYNC_ALL); + return __omfs_write_inode(inode, 1 /* XXX: fix fsync and use wbc->sync_mode == WB_SYNC_ALL */); } int omfs_sync_inode(struct inode *inode) Index: linux-2.6/fs/reiserfs/inode.c =================================================================== --- linux-2.6.orig/fs/reiserfs/inode.c 2010-11-25 17:25:54.000000000 +1100 +++ linux-2.6/fs/reiserfs/inode.c 2010-11-25 17:53:45.000000000 +1100 @@ -1635,6 +1635,8 @@ int reiserfs_write_inode(struct inode *i ** these cases are just when the system needs ram, not when the ** inode needs to reach disk for safety, and they can safely be ** ignored because the altered inode has already been logged. + ** XXX: is this really OK? The caller clears the inode dirty bit, so + ** a subsequent sync for integrity might never reach here. */ if (wbc->sync_mode == WB_SYNC_ALL && !(current->flags & PF_MEMALLOC)) { reiserfs_write_lock(inode->i_sb); Index: linux-2.6/fs/sysv/inode.c =================================================================== --- linux-2.6.orig/fs/sysv/inode.c 2010-11-25 17:23:40.000000000 +1100 +++ linux-2.6/fs/sysv/inode.c 2010-11-25 17:54:47.000000000 +1100 @@ -286,7 +286,7 @@ static int __sysv_write_inode(struct ino write3byte(sbi, (u8 *)&si->i_data[block], &raw_inode->i_data[3*block]); mark_buffer_dirty(bh); - if (wait) { + if (1 /* XXX: fix fsync and use wait */) { sync_dirty_buffer(bh); if (buffer_req(bh) && !buffer_uptodate(bh)) { printk ("IO error syncing sysv inode [%s:%08x]\n", Index: linux-2.6/fs/udf/inode.c =================================================================== --- linux-2.6.orig/fs/udf/inode.c 2010-11-25 17:25:54.000000000 +1100 +++ linux-2.6/fs/udf/inode.c 2010-11-25 17:55:36.000000000 +1100 @@ -1598,7 +1598,7 @@ static int udf_update_inode(struct inode /* write the data blocks */ mark_buffer_dirty(bh); - if (do_sync) { + if (1 /* XXX fix fsync and use do_sync */) { sync_dirty_buffer(bh); if (buffer_write_io_error(bh)) { printk(KERN_WARNING "IO error syncing udf inode " Index: linux-2.6/fs/ufs/inode.c =================================================================== --- linux-2.6.orig/fs/ufs/inode.c 2010-11-25 17:25:54.000000000 +1100 +++ linux-2.6/fs/ufs/inode.c 2010-11-25 17:56:12.000000000 +1100 @@ -889,7 +889,7 @@ static int ufs_update_inode(struct inode } mark_buffer_dirty(bh); - if (do_sync) + if (1 /* XXX: fix fsync and use do_sync */) sync_dirty_buffer(bh); brelse (bh);