Patchwork [Lucid] SRU: Fix LVM snapshot regression

login
register
mail settings
Submitter Stefan Bader
Date Aug. 6, 2010, 9:36 a.m.
Message ID <1281087376-2411-1-git-send-email-stefan.bader@canonical.com>
Download mbox | patch
Permalink /patch/61093/
State Accepted
Headers show

Comments

Stefan Bader - Aug. 6, 2010, 9:36 a.m.
SRU Justification:

Impact: Changes to ext4 which are part of 2.6.32.17 and we took in advance to
fix other issues are causing a lockup regression when working with lvm snap-
shots.

Fix: This patch, which is slowly making its way upstream, was verified to fix
this problem and is also small and contained enough to be reasonably save.
Under normal circumstances we would wait for this patch to appear upstream
but as this is a more serious regression and we are planning to do a limited
change upload to Lucid to address some priority problems I think we should
consider this patch even if it does not make it upstream in time.
For Maverick the same problem exists but it is probably enough time to wait
and this is marked to be reviewed before beta as well.
Should the patch go upstream before doing the Lucid upload, references will
be updated.

Testcase:
Use lvm to create a LV, mount it and have it actively doing IO, then try
to create a snapshot will hang without further IO being possible.

-Stefan

From 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f Mon Sep 17 00:00:00 2001
From: Eric Sandeen <sandeen@sandeen.net>
Date: Sun, 1 Aug 2010 17:33:29 -0400
Subject: [PATCH] (pre-stable) ext4: fix freeze deadlock under IO

BugLink: http://bugs.launchpad.net/bugs/595489

Commit 6b0310fbf087ad6 caused a regression resulting in deadlocks
when freezing a filesystem which had active IO; the vfs_check_frozen
level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
through.  Duh.

Changing the test to FREEZE_TRANS should let the normal freeze
syncing get through the fs, but still block any transactions from
starting once the fs is completely frozen.

I tested this by running fsstress in the background while periodically
snapshotting the fs and running fsck on the result.  I ran into
occasional deadlocks, but different ones.  I think this is a
fine fix for the problem at hand, and the other deadlocky things
will need more investigation.

Reported-by: Phillip Susi <psusi@cfl.rr.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
(cherry-picked from commit 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f linux-next)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 fs/ext4/super.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)
Tim Gardner - Aug. 6, 2010, 2:04 p.m.
On 08/06/2010 03:36 AM, Stefan Bader wrote:
> SRU Justification:
>
> Impact: Changes to ext4 which are part of 2.6.32.17 and we took in advance to
> fix other issues are causing a lockup regression when working with lvm snap-
> shots.
>
> Fix: This patch, which is slowly making its way upstream, was verified to fix
> this problem and is also small and contained enough to be reasonably save.
> Under normal circumstances we would wait for this patch to appear upstream
> but as this is a more serious regression and we are planning to do a limited
> change upload to Lucid to address some priority problems I think we should
> consider this patch even if it does not make it upstream in time.
> For Maverick the same problem exists but it is probably enough time to wait
> and this is marked to be reviewed before beta as well.
> Should the patch go upstream before doing the Lucid upload, references will
> be updated.
>
> Testcase:
> Use lvm to create a LV, mount it and have it actively doing IO, then try
> to create a snapshot will hang without further IO being possible.
>
> -Stefan
>
>  From 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f Mon Sep 17 00:00:00 2001
> From: Eric Sandeen<sandeen@sandeen.net>
> Date: Sun, 1 Aug 2010 17:33:29 -0400
> Subject: [PATCH] (pre-stable) ext4: fix freeze deadlock under IO
>
> BugLink: http://bugs.launchpad.net/bugs/595489
>
> Commit 6b0310fbf087ad6 caused a regression resulting in deadlocks
> when freezing a filesystem which had active IO; the vfs_check_frozen
> level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
> through.  Duh.
>
> Changing the test to FREEZE_TRANS should let the normal freeze
> syncing get through the fs, but still block any transactions from
> starting once the fs is completely frozen.
>
> I tested this by running fsstress in the background while periodically
> snapshotting the fs and running fsck on the result.  I ran into
> occasional deadlocks, but different ones.  I think this is a
> fine fix for the problem at hand, and the other deadlocky things
> will need more investigation.
>
> Reported-by: Phillip Susi<psusi@cfl.rr.com>
> Signed-off-by: Eric Sandeen<sandeen@redhat.com>
> Signed-off-by: "Theodore Ts'o"<tytso@mit.edu>
> (cherry-picked from commit 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f linux-next)
> Signed-off-by: Stefan Bader<stefan.bader@canonical.com>
> ---
>   fs/ext4/super.c |    4 ++--
>   1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e046eba..282a270 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -241,7 +241,7 @@ handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
>   	if (sb->s_flags&  MS_RDONLY)
>   		return ERR_PTR(-EROFS);
>
> -	vfs_check_frozen(sb, SB_FREEZE_WRITE);
> +	vfs_check_frozen(sb, SB_FREEZE_TRANS);
>   	/* Special case here: if the journal has aborted behind our
>   	 * backs (eg. EIO in the commit thread), then we still need to
>   	 * take the FS itself readonly cleanly. */
> @@ -3608,7 +3608,7 @@ int ext4_force_commit(struct super_block *sb)
>
>   	journal = EXT4_SB(sb)->s_journal;
>   	if (journal) {
> -		vfs_check_frozen(sb, SB_FREEZE_WRITE);
> +		vfs_check_frozen(sb, SB_FREEZE_TRANS);
>   		ret = ext4_journal_force_commit(journal);
>   	}
>

Acked-by: Tim Gardner <tim.gardner@canonical;.com>
Steve Conklin - Aug. 6, 2010, 2:36 p.m.
On Fri, 2010-08-06 at 11:36 +0200, Stefan Bader wrote:
> SRU Justification:
> 
> Impact: Changes to ext4 which are part of 2.6.32.17 and we took in advance to
> fix other issues are causing a lockup regression when working with lvm snap-
> shots.
> 
> Fix: This patch, which is slowly making its way upstream, was verified to fix
> this problem and is also small and contained enough to be reasonably save.
> Under normal circumstances we would wait for this patch to appear upstream
> but as this is a more serious regression and we are planning to do a limited
> change upload to Lucid to address some priority problems I think we should
> consider this patch even if it does not make it upstream in time.
> For Maverick the same problem exists but it is probably enough time to wait
> and this is marked to be reviewed before beta as well.
> Should the patch go upstream before doing the Lucid upload, references will
> be updated.
> 
> Testcase:
> Use lvm to create a LV, mount it and have it actively doing IO, then try
> to create a snapshot will hang without further IO being possible.
> 
> -Stefan
> 
> From 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f Mon Sep 17 00:00:00 2001
> From: Eric Sandeen <sandeen@sandeen.net>
> Date: Sun, 1 Aug 2010 17:33:29 -0400
> Subject: [PATCH] (pre-stable) ext4: fix freeze deadlock under IO
> 
> BugLink: http://bugs.launchpad.net/bugs/595489
> 
> Commit 6b0310fbf087ad6 caused a regression resulting in deadlocks
> when freezing a filesystem which had active IO; the vfs_check_frozen
> level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
> through.  Duh.
> 
> Changing the test to FREEZE_TRANS should let the normal freeze
> syncing get through the fs, but still block any transactions from
> starting once the fs is completely frozen.
> 
> I tested this by running fsstress in the background while periodically
> snapshotting the fs and running fsck on the result.  I ran into
> occasional deadlocks, but different ones.  I think this is a
> fine fix for the problem at hand, and the other deadlocky things
> will need more investigation.
> 
> Reported-by: Phillip Susi <psusi@cfl.rr.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> (cherry-picked from commit 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f linux-next)
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  fs/ext4/super.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e046eba..282a270 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -241,7 +241,7 @@ handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
>  	if (sb->s_flags & MS_RDONLY)
>  		return ERR_PTR(-EROFS);
>  
> -	vfs_check_frozen(sb, SB_FREEZE_WRITE);
> +	vfs_check_frozen(sb, SB_FREEZE_TRANS);
>  	/* Special case here: if the journal has aborted behind our
>  	 * backs (eg. EIO in the commit thread), then we still need to
>  	 * take the FS itself readonly cleanly. */
> @@ -3608,7 +3608,7 @@ int ext4_force_commit(struct super_block *sb)
>  
>  	journal = EXT4_SB(sb)->s_journal;
>  	if (journal) {
> -		vfs_check_frozen(sb, SB_FREEZE_WRITE);
> +		vfs_check_frozen(sb, SB_FREEZE_TRANS);
>  		ret = ext4_journal_force_commit(journal);
>  	}
>  
> -- 
> 1.7.0.4
> 
> 
Acked-by: Steve Conklin <sconklin@canonical.com>
Stefan Bader - Aug. 6, 2010, 2:42 p.m.
Steve, though we got the acks, I would hold back with applying the patch itself.
I think we won't be stitching together the next proposed upload before Monday.
So this gives a little more time to potentially let the patch go upstream.

Stefan
Andy Whitcroft - Aug. 6, 2010, 5:15 p.m.
On Fri, Aug 06, 2010 at 11:36:16AM +0200, Stefan Bader wrote:
> SRU Justification:
> 
> Impact: Changes to ext4 which are part of 2.6.32.17 and we took in advance to
> fix other issues are causing a lockup regression when working with lvm snap-
> shots.
> 
> Fix: This patch, which is slowly making its way upstream, was verified to fix
> this problem and is also small and contained enough to be reasonably save.
> Under normal circumstances we would wait for this patch to appear upstream
> but as this is a more serious regression and we are planning to do a limited
> change upload to Lucid to address some priority problems I think we should
> consider this patch even if it does not make it upstream in time.
> For Maverick the same problem exists but it is probably enough time to wait
> and this is marked to be reviewed before beta as well.
> Should the patch go upstream before doing the Lucid upload, references will
> be updated.
> 
> Testcase:
> Use lvm to create a LV, mount it and have it actively doing IO, then try
> to create a snapshot will hang without further IO being possible.
> 
> -Stefan
> 
> From 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f Mon Sep 17 00:00:00 2001
> From: Eric Sandeen <sandeen@sandeen.net>
> Date: Sun, 1 Aug 2010 17:33:29 -0400
> Subject: [PATCH] (pre-stable) ext4: fix freeze deadlock under IO
> 
> BugLink: http://bugs.launchpad.net/bugs/595489
> 
> Commit 6b0310fbf087ad6 caused a regression resulting in deadlocks
> when freezing a filesystem which had active IO; the vfs_check_frozen
> level (SB_FREEZE_WRITE) did not let the freeze-related IO syncing
> through.  Duh.
> 
> Changing the test to FREEZE_TRANS should let the normal freeze
> syncing get through the fs, but still block any transactions from
> starting once the fs is completely frozen.
> 
> I tested this by running fsstress in the background while periodically
> snapshotting the fs and running fsck on the result.  I ran into
> occasional deadlocks, but different ones.  I think this is a
> fine fix for the problem at hand, and the other deadlocky things
> will need more investigation.
> 
> Reported-by: Phillip Susi <psusi@cfl.rr.com>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
> (cherry-picked from commit 437f88cc031ffe7f37f3e705367f4fe1f4be8b0f linux-next)
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  fs/ext4/super.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index e046eba..282a270 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -241,7 +241,7 @@ handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
>  	if (sb->s_flags & MS_RDONLY)
>  		return ERR_PTR(-EROFS);
>  
> -	vfs_check_frozen(sb, SB_FREEZE_WRITE);
> +	vfs_check_frozen(sb, SB_FREEZE_TRANS);
>  	/* Special case here: if the journal has aborted behind our
>  	 * backs (eg. EIO in the commit thread), then we still need to
>  	 * take the FS itself readonly cleanly. */
> @@ -3608,7 +3608,7 @@ int ext4_force_commit(struct super_block *sb)
>  
>  	journal = EXT4_SB(sb)->s_journal;
>  	if (journal) {
> -		vfs_check_frozen(sb, SB_FREEZE_WRITE);
> +		vfs_check_frozen(sb, SB_FREEZE_TRANS);
>  		ret = ext4_journal_force_commit(journal);
>  	}
>  

Looks good.  The change is very clear.

Acked-by: Andy Whitcroft <apw@canonical.com>

-apw
Stefan Bader - Aug. 9, 2010, 9:09 a.m.
Applied to Lucid master. Patch went upstream as of today (same SHA1).

Patch

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index e046eba..282a270 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -241,7 +241,7 @@  handle_t *ext4_journal_start_sb(struct super_block *sb, int nblocks)
 	if (sb->s_flags & MS_RDONLY)
 		return ERR_PTR(-EROFS);
 
-	vfs_check_frozen(sb, SB_FREEZE_WRITE);
+	vfs_check_frozen(sb, SB_FREEZE_TRANS);
 	/* Special case here: if the journal has aborted behind our
 	 * backs (eg. EIO in the commit thread), then we still need to
 	 * take the FS itself readonly cleanly. */
@@ -3608,7 +3608,7 @@  int ext4_force_commit(struct super_block *sb)
 
 	journal = EXT4_SB(sb)->s_journal;
 	if (journal) {
-		vfs_check_frozen(sb, SB_FREEZE_WRITE);
+		vfs_check_frozen(sb, SB_FREEZE_TRANS);
 		ret = ext4_journal_force_commit(journal);
 	}