Message ID | 1305097841-2308-1-git-send-email-surbhi.palande@canonical.com |
---|---|
State | Superseded, archived |
Headers | show |
On 5/11/11 2:10 AM, Surbhi Palande wrote: > While the fsstress background writes are busy dirtying the page cache, if a > fsfreeze happens then the background writes should stall. A sync should then > not have any data to sync to the FS. If it does have any data to sync then > sync will cause a deadlock by holding the s_umount write semaphore and waiting > in the wait queue for the FS to thaw, whereas the F.S can never thaw without > getting the s_umount write semaphore. > > Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com> Seems ok to me. In the future, when sending xfstests patches, if you can add "xfstests" to the subject, and cc: the xfs list, it'd be great. I presume that this test does fail for you without your fixes? I'll see if anyone on the xfs list has comments and if not, I can check this in. Thanks, -Eric > --- > 068 | 5 +++++ > 1 files changed, 5 insertions(+), 0 deletions(-) > > diff --git a/068 b/068 > index 82c1a4e..b9ac58d 100755 > --- a/068 > +++ b/068 > @@ -101,6 +101,11 @@ do > tee -a $seq.full > sleep 2 > > + # there should be nothing to sync at this point. This may hang in case > + # of fsstress background writes dirtying the page cache while the F.S is frozen > + sync & > + sleep 2 > + > echo "*** thawing \$SCRATCH_MNT" | tee -a $seq.full > xfs_freeze -u "$SCRATCH_MNT" | tee -a $seq.full > [ $? != 0 ] && echo xfs_freeze -u "$SCRATCH_MNT" failed | \ -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 11, 2011 at 10:10:41AM +0300, Surbhi Palande wrote: > While the fsstress background writes are busy dirtying the page cache, if a > fsfreeze happens then the background writes should stall. A sync should then > not have any data to sync to the FS. If it does have any data to sync then > sync will cause a deadlock by holding the s_umount write semaphore and waiting > in the wait queue for the FS to thaw, whereas the F.S can never thaw without > getting the s_umount write semaphore. > > Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com> Hi Surbhi, Have you tried out Jan Kara's patches? [1/3] fs: Create __block_page_mkwrite() helper passing error values back [2/3] vfs: Block mmapped writes while the fs is frozen [3/3] ext4: Rewrite ext4_page_mkwrite() to return locked page Do these patches fix the problem you've been trying to fix with your patches? I believe they should, but I would appreciate confirmation that with these patches, you're no longer able to reproduce the problem you've been concerned about. Thanks, regards, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Ted, On 05/25/2011 12:42 AM, Ted Ts'o wrote: > On Wed, May 11, 2011 at 10:10:41AM +0300, Surbhi Palande wrote: >> While the fsstress background writes are busy dirtying the page cache, if a >> fsfreeze happens then the background writes should stall. A sync should then >> not have any data to sync to the FS. If it does have any data to sync then >> sync will cause a deadlock by holding the s_umount write semaphore and waiting >> in the wait queue for the FS to thaw, whereas the F.S can never thaw without >> getting the s_umount write semaphore. >> >> Signed-off-by: Surbhi Palande<surbhi.palande@canonical.com> > > Hi Surbhi, > > Have you tried out Jan Kara's patches? > > [1/3] fs: Create __block_page_mkwrite() helper passing error values back > [2/3] vfs: Block mmapped writes while the fs is frozen > [3/3] ext4: Rewrite ext4_page_mkwrite() to return locked page Yes! We have tried these patches and we still see the same deadlock/hang. The following is the reason for it: // lets assume the inode is clean and so are its pages. P1: process that tries mmap write t1) __do_fault() t2) ext4_page_mkwrite() t3) block_page_mkwrite() t4) vfs_check_frozen() // filesystem is not frozen so control falls through. t5) __block_page_mkwrite() t6) set_page_dirty() t7) __set_page_dirty() t8) radix_tree_tag_set(PAGECACHE_TAG_DIRTY) // page is dirtied, but inode is yet clean. ---------------------- Pre-empted----------------- P2: freeze process t9) freeze_super() t10) sync_filesystem() // page cache now clean! no inode is dirty. // however we have a dirty page belonging to a clean inode. ----------------------Freeze process finishes, filesystem frozen!---- P1: process that tries mmap write gets control. t11) __set_page_dirty() // gets control back t12) __mark_inode_dirty()v // inode is now dirty and it has a dirty page. // though in reality there is no write which has occured. t13) if (inode->i_sb->s_frozen != SB_UNFROZEN) // __block_page_mkwrite() gets control back t14) unlock_page() t15) __block_page_mkwrite() returns -EAGAIN t16) block_page_mkwrite() returns VM_FAULT_RETRY --------------------------- // now we see the original deadlock reported. P3: sync a filesystem t17) down_read(s_umount) t18) sync_filesystem() t19) sb->s_op->sync_fs() // =ext4_sync_fs() t20) vfs_check_frozen() // now blocks for thaw. // so thaw cannot happen because sync process sleeps with s_umount! This deadlock can occur whenever the freeze happens after the vfs_check_frozen() but before the __mark_inode_dirty(). We see blocked sync processes every time we do the following: 1) executing iozone on multipath and 2) I modified the script that Toshiyuki sent, attaching it here. This script reproduces the bug faster when executed with iozone. (Note, that since this is a race, this script _may not_ always produce it on its own) I also found one more missing piece in the "Add support to freeze and unfreeze journal": 1) Call jdb2_journal_thaw() from ext4_unfreeze() to restart the transactions. I shall send a patch for the same as a reply to this email again. Thanks! Warm Regards, Surbhi. P3: sync > > Do these patches fix the problem you've been trying to fix with your > patches? I believe they should, but I would appreciate confirmation > that with these patches, you're no longer able to reproduce the > problem you've been concerned about. > > Thanks, regards, > > - Ted
Hi Surbhi, Just as a request --- could you start a new thread (this one is getting so long it's hard to follow)? And could you also include a reliable reproduction case? Many thanks!! -- Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Ted, On Wed 25-05-11 08:12:15, Ted Tso wrote: > Just as a request --- could you start a new thread (this one is getting > so long it's hard to follow)? > > And could you also include a reliable reproduction case? Just a quick note - this patch series was not really meant to fix the deadlocks. They are meant to make freezing reliable in combination with mmapped writes. As a side-effect, they make the deadlock Surbhi describes less probable but I'm aware it's still there. I plan to have another look at how the deadlock could be fixed (the first attempt was rejected by Dave Chinner) but currently I'm busy with other stuff... Honza
diff --git a/068 b/068 index 82c1a4e..b9ac58d 100755 --- a/068 +++ b/068 @@ -101,6 +101,11 @@ do tee -a $seq.full sleep 2 + # there should be nothing to sync at this point. This may hang in case + # of fsstress background writes dirtying the page cache while the F.S is frozen + sync & + sleep 2 + echo "*** thawing \$SCRATCH_MNT" | tee -a $seq.full xfs_freeze -u "$SCRATCH_MNT" | tee -a $seq.full [ $? != 0 ] && echo xfs_freeze -u "$SCRATCH_MNT" failed | \
While the fsstress background writes are busy dirtying the page cache, if a fsfreeze happens then the background writes should stall. A sync should then not have any data to sync to the FS. If it does have any data to sync then sync will cause a deadlock by holding the s_umount write semaphore and waiting in the wait queue for the FS to thaw, whereas the F.S can never thaw without getting the s_umount write semaphore. Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com> --- 068 | 5 +++++ 1 files changed, 5 insertions(+), 0 deletions(-)