diff mbox

Attempt to sync the fsstress writes to a frozen F.S

Message ID 1305097841-2308-1-git-send-email-surbhi.palande@canonical.com
State Superseded, archived
Headers show

Commit Message

Surbhi Palande May 11, 2011, 7:10 a.m. UTC
While the fsstress background writes are busy dirtying the page cache, if a
fsfreeze happens then the background writes should stall. A sync should then
not have any data to sync to the FS. If it does have any data to sync then
sync will cause a deadlock by holding the s_umount write semaphore and waiting
in the wait queue for the FS to thaw, whereas the F.S can never thaw without
getting the s_umount write semaphore.

Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
---
 068 |    5 +++++
 1 files changed, 5 insertions(+), 0 deletions(-)

Comments

Eric Sandeen May 12, 2011, 2:22 p.m. UTC | #1
On 5/11/11 2:10 AM, Surbhi Palande wrote:
> While the fsstress background writes are busy dirtying the page cache, if a
> fsfreeze happens then the background writes should stall. A sync should then
> not have any data to sync to the FS. If it does have any data to sync then
> sync will cause a deadlock by holding the s_umount write semaphore and waiting
> in the wait queue for the FS to thaw, whereas the F.S can never thaw without
> getting the s_umount write semaphore.
> 
> Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>

Seems ok to me.  In the future, when sending xfstests patches,
if you can add "xfstests" to the subject, and cc: the xfs list,
it'd be great.

I presume that this test does fail for you without your fixes?

I'll see if anyone on the xfs list has comments and if not, I can check this in.

Thanks,
-Eric

> ---
>  068 |    5 +++++
>  1 files changed, 5 insertions(+), 0 deletions(-)
> 
> diff --git a/068 b/068
> index 82c1a4e..b9ac58d 100755
> --- a/068
> +++ b/068
> @@ -101,6 +101,11 @@ do
>  	    tee -a $seq.full
>  	sleep 2
>  
> +	# there should be nothing to sync at this point. This may hang in case
> +	# of fsstress background writes dirtying the page cache while the F.S is frozen
> +	sync &
> +	sleep 2
> +
>  	echo "*** thawing  \$SCRATCH_MNT" | tee -a $seq.full
>  	xfs_freeze -u "$SCRATCH_MNT" | tee -a $seq.full
>  	[ $? != 0 ] && echo xfs_freeze -u "$SCRATCH_MNT" failed | \

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Theodore Ts'o May 24, 2011, 9:42 p.m. UTC | #2
On Wed, May 11, 2011 at 10:10:41AM +0300, Surbhi Palande wrote:
> While the fsstress background writes are busy dirtying the page cache, if a
> fsfreeze happens then the background writes should stall. A sync should then
> not have any data to sync to the FS. If it does have any data to sync then
> sync will cause a deadlock by holding the s_umount write semaphore and waiting
> in the wait queue for the FS to thaw, whereas the F.S can never thaw without
> getting the s_umount write semaphore.
> 
> Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>

Hi Surbhi,

Have you tried out Jan Kara's patches?

[1/3] fs: Create __block_page_mkwrite() helper passing error values back
[2/3] vfs: Block mmapped writes while the fs is frozen
[3/3] ext4: Rewrite ext4_page_mkwrite() to return locked page

Do these patches fix the problem you've been trying to fix with your
patches?  I believe they should, but I would appreciate confirmation
that with these patches, you're no longer able to reproduce the
problem you've been concerned about.

Thanks, regards,

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Surbhi Palande May 25, 2011, noon UTC | #3
Hi Ted,


On 05/25/2011 12:42 AM, Ted Ts'o wrote:
> On Wed, May 11, 2011 at 10:10:41AM +0300, Surbhi Palande wrote:
>> While the fsstress background writes are busy dirtying the page cache, if a
>> fsfreeze happens then the background writes should stall. A sync should then
>> not have any data to sync to the FS. If it does have any data to sync then
>> sync will cause a deadlock by holding the s_umount write semaphore and waiting
>> in the wait queue for the FS to thaw, whereas the F.S can never thaw without
>> getting the s_umount write semaphore.
>>
>> Signed-off-by: Surbhi Palande<surbhi.palande@canonical.com>
>
> Hi Surbhi,
>
> Have you tried out Jan Kara's patches?
>
> [1/3] fs: Create __block_page_mkwrite() helper passing error values back
> [2/3] vfs: Block mmapped writes while the fs is frozen
> [3/3] ext4: Rewrite ext4_page_mkwrite() to return locked page

Yes! We have tried these patches and we still see the same 
deadlock/hang. The following is the reason for it:


// lets assume the inode is clean and so are its pages.
P1: process that tries mmap write
t1) __do_fault()
   t2) ext4_page_mkwrite()
     t3) block_page_mkwrite()
       t4) vfs_check_frozen()
// filesystem is not frozen so control falls through.
       t5) __block_page_mkwrite()
         t6) set_page_dirty()
           t7) __set_page_dirty()
	    t8) radix_tree_tag_set(PAGECACHE_TAG_DIRTY)
// page is dirtied, but inode is yet clean.
---------------------- Pre-empted-----------------
P2: freeze process

t9) freeze_super()
   t10) sync_filesystem()
  // page cache now clean! no inode is dirty.
// however we have a dirty page belonging to a clean inode.
----------------------Freeze process finishes, filesystem frozen!----


P1: process that tries mmap write gets control.
t11) __set_page_dirty() // gets control back
     t12) __mark_inode_dirty()v
    // inode is now dirty and it has a dirty page.
    // though in reality there is no write which has occured.
t13)   if (inode->i_sb->s_frozen != SB_UNFROZEN)
     // __block_page_mkwrite() gets control back
t14) unlock_page()
t15) __block_page_mkwrite() returns -EAGAIN
t16) block_page_mkwrite() returns VM_FAULT_RETRY

---------------------------
// now we see the original deadlock reported.
P3: sync a filesystem
t17) down_read(s_umount)
  t18) sync_filesystem()
   t19) sb->s_op->sync_fs() // =ext4_sync_fs()
    t20) vfs_check_frozen() // now blocks for thaw.
// so thaw cannot happen because sync process sleeps with s_umount!

This deadlock can occur whenever the freeze happens after the 
vfs_check_frozen() but before the __mark_inode_dirty().

We see blocked sync processes every time we do the following:

1) executing iozone on multipath and
2) I modified the script that Toshiyuki sent, attaching it here. This 
script reproduces the bug faster when executed with iozone.
(Note, that since this is a race, this script _may not_ always produce 
it on its own)


I also found one more missing piece in the "Add support to freeze and 
unfreeze journal":
1) Call jdb2_journal_thaw() from ext4_unfreeze() to restart the 
transactions.

I shall send a patch for the same as a reply to this email again.

Thanks!

Warm Regards,
Surbhi.












P3: sync







>
> Do these patches fix the problem you've been trying to fix with your
> patches?  I believe they should, but I would appreciate confirmation
> that with these patches, you're no longer able to reproduce the
> problem you've been concerned about.
>
> Thanks, regards,
>
> 						- Ted
Theodore Ts'o May 25, 2011, 12:12 p.m. UTC | #4
Hi Surbhi,

Just as a request --- could you start a new thread (this one is getting so long it's hard to follow)?

And could you also include a reliable reproduction case?

Many thanks!!

-- Ted


--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara May 27, 2011, 4:28 p.m. UTC | #5
Ted,

On Wed 25-05-11 08:12:15, Ted Tso wrote:
> Just as a request --- could you start a new thread (this one is getting
> so long it's hard to follow)?
> 
> And could you also include a reliable reproduction case?
  Just a quick note - this patch series was not really meant to fix the
deadlocks. They are meant to make freezing reliable in combination with
mmapped writes. As a side-effect, they make the deadlock Surbhi describes
less probable but I'm aware it's still there.

I plan to have another look at how the deadlock could be fixed (the first
attempt was rejected by Dave Chinner) but currently I'm busy with other
stuff...

								Honza
diff mbox

Patch

diff --git a/068 b/068
index 82c1a4e..b9ac58d 100755
--- a/068
+++ b/068
@@ -101,6 +101,11 @@  do
 	    tee -a $seq.full
 	sleep 2
 
+	# there should be nothing to sync at this point. This may hang in case
+	# of fsstress background writes dirtying the page cache while the F.S is frozen
+	sync &
+	sleep 2
+
 	echo "*** thawing  \$SCRATCH_MNT" | tee -a $seq.full
 	xfs_freeze -u "$SCRATCH_MNT" | tee -a $seq.full
 	[ $? != 0 ] && echo xfs_freeze -u "$SCRATCH_MNT" failed | \