Patchwork Attempt to sync the fsstress writes to a frozen F.S

login
register
mail settings
Submitter Surbhi Palande
Date May 11, 2011, 7:10 a.m.
Message ID <1305097841-2308-1-git-send-email-surbhi.palande@canonical.com>
Download mbox | patch
Permalink /patch/95177/
State Superseded
Headers show

Comments

Surbhi Palande - May 11, 2011, 7:10 a.m.
While the fsstress background writes are busy dirtying the page cache, if a
fsfreeze happens then the background writes should stall. A sync should then
not have any data to sync to the FS. If it does have any data to sync then
sync will cause a deadlock by holding the s_umount write semaphore and waiting
in the wait queue for the FS to thaw, whereas the F.S can never thaw without
getting the s_umount write semaphore.

Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
---
 068 |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)
Eric Sandeen - May 12, 2011, 2:22 p.m.
On 5/11/11 2:10 AM, Surbhi Palande wrote:
> While the fsstress background writes are busy dirtying the page cache, if a
> fsfreeze happens then the background writes should stall. A sync should then
> not have any data to sync to the FS. If it does have any data to sync then
> sync will cause a deadlock by holding the s_umount write semaphore and waiting
> in the wait queue for the FS to thaw, whereas the F.S can never thaw without
> getting the s_umount write semaphore.
> 
> Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>

Seems ok to me.  In the future, when sending xfstests patches,
if you can add "xfstests" to the subject, and cc: the xfs list,
it'd be great.

I presume that this test does fail for you without your fixes?

I'll see if anyone on the xfs list has comments and if not, I can check this in.

Thanks,
-Eric

> ---
>  068 |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/068 b/068
> index 82c1a4e..b9ac58d 100755
> --- a/068
> +++ b/068
> @@ -101,6 +101,11 @@ do
>  	    tee -a $seq.full
>  	sleep 2
>  
> +	# there should be nothing to sync at this point. This may hang in case
> +	# of fsstress background writes dirtying the page cache while the F.S is frozen
> +	sync &
> +	sleep 2
> +
>  	echo "*** thawing  \$SCRATCH_MNT" | tee -a $seq.full
>  	xfs_freeze -u "$SCRATCH_MNT" | tee -a $seq.full
>  	[ $? != 0 ] && echo xfs_freeze -u "$SCRATCH_MNT" failed | \

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o - May 24, 2011, 9:42 p.m.
On Wed, May 11, 2011 at 10:10:41AM +0300, Surbhi Palande wrote:
> While the fsstress background writes are busy dirtying the page cache, if a
> fsfreeze happens then the background writes should stall. A sync should then
> not have any data to sync to the FS. If it does have any data to sync then
> sync will cause a deadlock by holding the s_umount write semaphore and waiting
> in the wait queue for the FS to thaw, whereas the F.S can never thaw without
> getting the s_umount write semaphore.
> 
> Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>

Hi Surbhi,

Have you tried out Jan Kara's patches?

[1/3] fs: Create __block_page_mkwrite() helper passing error values back
[2/3] vfs: Block mmapped writes while the fs is frozen
[3/3] ext4: Rewrite ext4_page_mkwrite() to return locked page

Do these patches fix the problem you've been trying to fix with your
patches?  I believe they should, but I would appreciate confirmation
that with these patches, you're no longer able to reproduce the
problem you've been concerned about.

Thanks, regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Surbhi Palande - May 25, 2011, noon
Hi Ted,


On 05/25/2011 12:42 AM, Ted Ts'o wrote:
> On Wed, May 11, 2011 at 10:10:41AM +0300, Surbhi Palande wrote:
>> While the fsstress background writes are busy dirtying the page cache, if a
>> fsfreeze happens then the background writes should stall. A sync should then
>> not have any data to sync to the FS. If it does have any data to sync then
>> sync will cause a deadlock by holding the s_umount write semaphore and waiting
>> in the wait queue for the FS to thaw, whereas the F.S can never thaw without
>> getting the s_umount write semaphore.
>>
>> Signed-off-by: Surbhi Palande<surbhi.palande@canonical.com>
>
> Hi Surbhi,
>
> Have you tried out Jan Kara's patches?
>
> [1/3] fs: Create __block_page_mkwrite() helper passing error values back
> [2/3] vfs: Block mmapped writes while the fs is frozen
> [3/3] ext4: Rewrite ext4_page_mkwrite() to return locked page

Yes! We have tried these patches and we still see the same 
deadlock/hang. The following is the reason for it:


// lets assume the inode is clean and so are its pages.
P1: process that tries mmap write
t1) __do_fault()
   t2) ext4_page_mkwrite()
     t3) block_page_mkwrite()
       t4) vfs_check_frozen()
// filesystem is not frozen so control falls through.
       t5) __block_page_mkwrite()
         t6) set_page_dirty()
           t7) __set_page_dirty()
	    t8) radix_tree_tag_set(PAGECACHE_TAG_DIRTY)
// page is dirtied, but inode is yet clean.
---------------------- Pre-empted-----------------
P2: freeze process

t9) freeze_super()
   t10) sync_filesystem()
  // page cache now clean! no inode is dirty.
// however we have a dirty page belonging to a clean inode.
----------------------Freeze process finishes, filesystem frozen!----


P1: process that tries mmap write gets control.
t11) __set_page_dirty() // gets control back
     t12) __mark_inode_dirty()v
    // inode is now dirty and it has a dirty page.
    // though in reality there is no write which has occured.
t13)   if (inode->i_sb->s_frozen != SB_UNFROZEN)
     // __block_page_mkwrite() gets control back
t14) unlock_page()
t15) __block_page_mkwrite() returns -EAGAIN
t16) block_page_mkwrite() returns VM_FAULT_RETRY

---------------------------
// now we see the original deadlock reported.
P3: sync a filesystem
t17) down_read(s_umount)
  t18) sync_filesystem()
   t19) sb->s_op->sync_fs() // =ext4_sync_fs()
    t20) vfs_check_frozen() // now blocks for thaw.
// so thaw cannot happen because sync process sleeps with s_umount!

This deadlock can occur whenever the freeze happens after the 
vfs_check_frozen() but before the __mark_inode_dirty().

We see blocked sync processes every time we do the following:

1) executing iozone on multipath and
2) I modified the script that Toshiyuki sent, attaching it here. This 
script reproduces the bug faster when executed with iozone.
(Note, that since this is a race, this script _may not_ always produce 
it on its own)


I also found one more missing piece in the "Add support to freeze and 
unfreeze journal":
1) Call jdb2_journal_thaw() from ext4_unfreeze() to restart the 
transactions.

I shall send a patch for the same as a reply to this email again.

Thanks!

Warm Regards,
Surbhi.












P3: sync







>
> Do these patches fix the problem you've been trying to fix with your
> patches?  I believe they should, but I would appreciate confirmation
> that with these patches, you're no longer able to reproduce the
> problem you've been concerned about.
>
> Thanks, regards,
>
> 						- Ted
Theodore Ts'o - May 25, 2011, 12:12 p.m.
Hi Surbhi,

Just as a request --- could you start a new thread (this one is getting so long it's hard to follow)?

And could you also include a reliable reproduction case?

Many thanks!!

-- Ted


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara - May 27, 2011, 4:28 p.m.
Ted,

On Wed 25-05-11 08:12:15, Ted Tso wrote:
> Just as a request --- could you start a new thread (this one is getting
> so long it's hard to follow)?
> 
> And could you also include a reliable reproduction case?
  Just a quick note - this patch series was not really meant to fix the
deadlocks. They are meant to make freezing reliable in combination with
mmapped writes. As a side-effect, they make the deadlock Surbhi describes
less probable but I'm aware it's still there.

I plan to have another look at how the deadlock could be fixed (the first
attempt was rejected by Dave Chinner) but currently I'm busy with other
stuff...

								Honza

Patch

diff --git a/068 b/068
index 82c1a4e..b9ac58d 100755
--- a/068
+++ b/068
@@ -101,6 +101,11 @@  do
 	    tee -a $seq.full
 	sleep 2
 
+	# there should be nothing to sync at this point. This may hang in case
+	# of fsstress background writes dirtying the page cache while the F.S is frozen
+	sync &
+	sleep 2
+
 	echo "*** thawing  \$SCRATCH_MNT" | tee -a $seq.full
 	xfs_freeze -u "$SCRATCH_MNT" | tee -a $seq.full
 	[ $? != 0 ] && echo xfs_freeze -u "$SCRATCH_MNT" failed | \