[2/2] ext4: fix delalloc retry loop logic v2

Message ID 87sk9irve0.fsf@openvz.org
State Rejected, archived

Commit Message

Dmitry Monakhov Feb. 3, 2010, 7:07 p.m. UTC
Dmitry Monakhov <dmonakhov@openvz.org> writes:

> Theodore please review this patch ASAP, currently ext4+quota is 
> fatally broken due to your patch. Christmas holidays when you
> submit your patch is not good time for good review, IMHO
> i was too lazy to review it carefully.
> Testcase is trivial it is enough just hit a quota barrier.
> dmon$ set-quota-limit /mnt id=dmon --bsoft=1000 --bsoft=1000
> dmon$ dd if=/dev/zero of=/mnt/file
>
> kernel BUG at fs/jbd2/transaction.c:1027!
Oops, I'm sorry. It seems that I sent the wrong patch version;
the only difference is as follows:
-	dqretry = (ret == -EDQUOT) || EXT4_I(inode)->i_reserved_meta_blocks;
+	dqretry = (ret == -EDQUOT) && EXT4_I(inode)->i_reserved_meta_blocks;
Correct version attached.
From 3dd53f88470fdc4ec3f06da34cfc760fa8359be8 Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@openvz.org>
Date: Wed, 3 Feb 2010 22:03:17 +0300
Subject: [PATCH 2/2] ext4: fix delalloc retry loop logic -v2

The current delalloc write path is broken:
ext4_da_write_begin()
  ext4_journal_start(inode, 1);  -> current->journal_info != NULL
  block_write_begin
    ext4_da_get_block_prep()
      ext4_da_reserve_space()
        ext4_should_retry_alloc() -> deadlock
        write_inode_now() -> BUG_ON due to lack of journal credits

The bug was partly introduced by the following commit:
  0637c6f4135f592f094207c7c21e7c0fc5557834
  ext4: Patch up how we claim metadata blocks for quota purposes
In order to preserve the retry logic and eliminate the bugs, we have to
move the retry loop to ext4_da_write_begin().

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
---
 fs/ext4/inode.c |   41 ++++++++++++++++++-----------------------
 1 files changed, 18 insertions(+), 23 deletions(-)

Comments

Theodore Ts'o Feb. 3, 2010, 9:16 p.m. UTC | #1
On Wed, Feb 03, 2010 at 10:07:03PM +0300, Dmitry Monakhov wrote:
> Dmitry Monakhov <dmonakhov@openvz.org> writes:
> 
> > Theodore please review this patch ASAP, currently ext4+quota is 
> > fatally broken due to your patch. Christmas holidays when you
> > submit your patch is not good time for good review, IMHO
> > i was too lazy to review it carefully.
> > Testcase is trivial it is enough just hit a quota barrier.
> > dmon$ set-quota-limit /mnt id=dmon --bsoft=1000 --bsoft=1000
> > dmon$ dd if=/dev/zero of=/mnt/file

Sorry, I had to submit 0637c6f somewhat in a hurry because commit
d21cd8f (your patch) was causing a rather large number of failures
that users were complaining about.  In retrospect maybe I should have
just backed out d21cd8f entirely and tried to sort out this whole mess
before the next merge window.

OK, I'll look this over as soon as I can.

    	      	   	   	     - Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Aneesh Kumar K.V Feb. 4, 2010, 11:29 a.m. UTC | #2
On Wed, 03 Feb 2010 22:07:03 +0300, Dmitry Monakhov <dmonakhov@openvz.org> wrote:
> Dmitry Monakhov <dmonakhov@openvz.org> writes:
> 
> > Theodore please review this patch ASAP, currently ext4+quota is 
> > fatally broken due to your patch. Christmas holidays when you
> > submit your patch is not good time for good review, IMHO
> > i was too lazy to review it carefully.
> > Testcase is trivial it is enough just hit a quota barrier.
> > dmon$ set-quota-limit /mnt id=dmon --bsoft=1000 --bsoft=1000
> > dmon$ dd if=/dev/zero of=/mnt/file
> >
> > kernel BUG at fs/jbd2/transaction.c:1027!
> Oops, I'm sorry. It seems that I sent the wrong patch version;
> the only difference is as follows:
> -	dqretry = (ret == -EDQUOT) || EXT4_I(inode)->i_reserved_meta_blocks;
> +	dqretry = (ret == -EDQUOT) && EXT4_I(inode)->i_reserved_meta_blocks;
> Correct version attached.
> From 3dd53f88470fdc4ec3f06da34cfc760fa8359be8 Mon Sep 17 00:00:00 2001
> From: Dmitry Monakhov <dmonakhov@openvz.org>
> Date: Wed, 3 Feb 2010 22:03:17 +0300
> Subject: [PATCH 2/2] ext4: fix delalloc retry loop logic -v2
> 
> The current delalloc write path is broken:
> ext4_da_write_begin()
>   ext4_journal_start(inode, 1);  -> current->journal_info != NULL
>   block_write_begin
>     ext4_da_get_block_prep()
>       ext4_da_reserve_space()
>         ext4_should_retry_alloc() -> deadlock
>         write_inode_now() -> BUG_ON due to lack of journal credits
> 
> The bug was partly introduced by the following commit:
>   0637c6f4135f592f094207c7c21e7c0fc5557834
>   ext4: Patch up how we claim metadata blocks for quota purposes
> In order to preserve the retry logic and eliminate the bugs, we have to
> move the retry loop to ext4_da_write_begin().
> 
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> ---
>  fs/ext4/inode.c |   41 ++++++++++++++++++-----------------------
>  1 files changed, 18 insertions(+), 23 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 2d3fe4d..bd9e573 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1815,7 +1815,6 @@ static int ext4_journalled_write_end(struct file *file,
>   */
>  static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
>  {
> -	int retries = 0;
>  	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
>  	struct ext4_inode_info *ei = EXT4_I(inode);
>  	unsigned long md_needed, md_reserved;
> @@ -1825,7 +1824,6 @@ static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
>  	 * in order to allocate nrblocks
>  	 * worse case is one extent per block
>  	 */
> -repeat:
>  	spin_lock(&ei->i_block_reservation_lock);
>  	md_reserved = ei->i_reserved_meta_blocks;
>  	md_needed = ext4_calc_metadata_amount(inode, lblock);
> @@ -1836,27 +1834,11 @@ repeat:
>  	 * later. Real quota accounting is done at pages writeout
>  	 * time.
>  	 */
> -	if (vfs_dq_reserve_block(inode, md_needed + 1)) {
> -		/* 
> -		 * We tend to badly over-estimate the amount of
> -		 * metadata blocks which are needed, so if we have
> -		 * reserved any metadata blocks, try to force out the
> -		 * inode and see if we have any better luck.
> -		 */
> -		if (md_reserved && retries++ <= 3)
> -			goto retry;
> +	if (vfs_dq_reserve_block(inode, md_needed + 1))
>  		return -EDQUOT;
> -	}
> 
>  	if (ext4_claim_free_blocks(sbi, md_needed + 1)) {
>  		vfs_dq_release_reservation_block(inode, md_needed + 1);
> -		if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
> -		retry:
> -			if (md_reserved)
> -				write_inode_now(inode, (retries == 3));
> -			yield();
> -			goto repeat;
> -		}
>  		return -ENOSPC;
>  	}
>  	spin_lock(&ei->i_block_reservation_lock);
> @@ -3033,7 +3015,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
>  			       loff_t pos, unsigned len, unsigned flags,
>  			       struct page **pagep, void **fsdata)
>  {
> -	int ret, retries = 0;
> +	int ret, dqretry, retries = 0;
>  	struct page *page;
>  	pgoff_t index;
>  	unsigned from, to;
> @@ -3090,9 +3072,22 @@ retry:
>  			ext4_truncate_failed_write(inode);
>  	}
> 
> -	if (!(flags & EXT4_AOP_FLAG_NORETRY) &&
> -		ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
> -		goto retry;
> +	dqretry = (ret == -EDQUOT) && EXT4_I(inode)->i_reserved_meta_blocks;
> +	if ( !(flags & EXT4_AOP_FLAG_NORETRY) &&
> +		(ret == -ENOSPC || dqretry) &&
> +		ext4_should_retry_alloc(inode->i_sb, &retries)) {
> +		if (dqretry) {
> +			/*
> +			 * We tend to badly over-estimate the amount of
> +			 * metadata blocks which are needed, so if we have
> +			 * reserved any metadata blocks, try to force out the
> +			 * inode and see if we have any better luck.
> +			 */
> +			write_inode_now(inode, (retries == 3));
> +		}
> +		yield();
> + 		goto retry;
> +	}
>  out:
>  	return ret;
>  }


Where is EXT4_AOP_FLAG_NORETRY defined? I have submitted a different
version of the patch, and it is already upstream as commit
1db913823c0f8360fccbd24ca67eb073966a5ffd.


-aneesh
Theodore Ts'o Feb. 4, 2010, 7:45 p.m. UTC | #3
Dmitry, what version were you testing when you ran into the problem
that you reported?  Aneesh's patch hit mainline just before 2.6.33-rc6.

     	 	    	     	       - Ted
Dmitry Monakhov Feb. 4, 2010, 9:50 p.m. UTC | #4
tytso@mit.edu writes:

> Dmitry, what version were you testing when you ran into the problem
> that you reported?  Aneesh's patch hit mainline just before 2.6.33-rc6.
I hit the bug on Jan's quota tree (33-rc5, before Aneesh's patch).
Yesterday I had no internet connection to check the mainline tree.
Naturally, I expected that the quota tree would contain working quota code.
My patch is almost identical to Aneesh's, so current mainline is OK.
Sorry for the false alarm.
BTW, I want to deploy an automated testing suite to test some devel
trees on a daily basis in order to avoid obvious regressions (e.g. when I
broke ext3+quota). Do you know a good one?
Currently I'm looking into autotest.kernel.org

Theodore Ts'o Feb. 5, 2010, 3:55 a.m. UTC | #5
On Fri, Feb 05, 2010 at 12:50:15AM +0300, Dmitry Monakhov wrote:
> BTW. I want to deploy automated testing suite in order to test some devel
> trees on daily basis in order to avoid obvious regressions (f.e. when i
> broke ext3+quota). Do you know a good one?

My general rule is that I won't push a patch set to Linus until I run
it against the XFSQA test suite.  There has been talk about adding
generic quota tests (as opposed to the XFS-specific quota tests, since
XFS has its own quota system different from the one used by other
Linux file systems) to XFSQA, and I think there are a few, but clearly
we need to add more.

So if you want to make the biggest impact in terms of trying to avoid
regressions, helping to contribute more tests to the XFSQA test suite
would be the most useful thing to do.  Right now Eric is the only ext4
developer who is really familiar with the test suites, and he's added a
few tests, but he's super busy as of late.  I've dabbled with the test
suites a little, and made a few changes, but I haven't added a new
test before, and I'm also super busy as of late.  :-(

> Currently i'm looking in to autotest.kernel.org

Personally, I don't find frameworks for running automated tests to be
that useful.  They have their place, but the problem isn't really
running the tests; the challenge is getting someone to actually *look*
at the results.  Having a set of tests which is easy to set up, and
easy to run, is far more important.

If someone sets up autotest, but I don't have an occasion to look at
the results, it's not terribly useful.  If it's really easy for me to
run the XFSQA test suite, then I'll run it every couple of patches
that I add to the ext4 patch queue, and run the complete set before I
push a set of patches to Linus.  That's **far** more useful.

Automated tests are good, but they tend to be too noisy, and so no one
ever bothers to look at the output.  A useful automated system would
only run tests that had clear and unambiguous failures; be able to
tolerate it if some test starts to fail and still be useful, and then
be able to do git-style bisection searches so it can say, "test NNN
started failing at commit XXX", "test MMM started failing at commit
YYY", etc.  If it then mailed the results to the relevant maintainer and
to the people who were the patch authors and the people who signed off
on the patch, then it would have a *chance* of being something that
people actually would pay attention to.  Unfortunately, I don't know
of any automated test framework which fits this bill.  :-(

So instead, I use the discipline of "make check" between almost every
single commit for e2fsprogs, and running "xfsqa -g quick" between most
patches (because the tests take a lot longer to run, I can't afford to
do it between every single patch), and "xfsqa -g auto" before I submit
a patchset to Linus (the most comprehensive set of tests, but it takes
hours so I have to run them overnight).

						- Ted
Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 2d3fe4d..bd9e573 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1815,7 +1815,6 @@  static int ext4_journalled_write_end(struct file *file,
  */
 static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
 {
-	int retries = 0;
 	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
 	struct ext4_inode_info *ei = EXT4_I(inode);
 	unsigned long md_needed, md_reserved;
@@ -1825,7 +1824,6 @@  static int ext4_da_reserve_space(struct inode *inode, sector_t lblock)
 	 * in order to allocate nrblocks
 	 * worse case is one extent per block
 	 */
-repeat:
 	spin_lock(&ei->i_block_reservation_lock);
 	md_reserved = ei->i_reserved_meta_blocks;
 	md_needed = ext4_calc_metadata_amount(inode, lblock);
@@ -1836,27 +1834,11 @@  repeat:
 	 * later. Real quota accounting is done at pages writeout
 	 * time.
 	 */
-	if (vfs_dq_reserve_block(inode, md_needed + 1)) {
-		/* 
-		 * We tend to badly over-estimate the amount of
-		 * metadata blocks which are needed, so if we have
-		 * reserved any metadata blocks, try to force out the
-		 * inode and see if we have any better luck.
-		 */
-		if (md_reserved && retries++ <= 3)
-			goto retry;
+	if (vfs_dq_reserve_block(inode, md_needed + 1))
 		return -EDQUOT;
-	}
 
 	if (ext4_claim_free_blocks(sbi, md_needed + 1)) {
 		vfs_dq_release_reservation_block(inode, md_needed + 1);
-		if (ext4_should_retry_alloc(inode->i_sb, &retries)) {
-		retry:
-			if (md_reserved)
-				write_inode_now(inode, (retries == 3));
-			yield();
-			goto repeat;
-		}
 		return -ENOSPC;
 	}
 	spin_lock(&ei->i_block_reservation_lock);
@@ -3033,7 +3015,7 @@  static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 			       loff_t pos, unsigned len, unsigned flags,
 			       struct page **pagep, void **fsdata)
 {
-	int ret, retries = 0;
+	int ret, dqretry, retries = 0;
 	struct page *page;
 	pgoff_t index;
 	unsigned from, to;
@@ -3090,9 +3072,22 @@  retry:
 			ext4_truncate_failed_write(inode);
 	}
 
-	if (!(flags & EXT4_AOP_FLAG_NORETRY) &&
-		ret == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))
-		goto retry;
+	dqretry = (ret == -EDQUOT) && EXT4_I(inode)->i_reserved_meta_blocks;
+	if ( !(flags & EXT4_AOP_FLAG_NORETRY) &&
+		(ret == -ENOSPC || dqretry) &&
+		ext4_should_retry_alloc(inode->i_sb, &retries)) {
+		if (dqretry) {
+			/*
+			 * We tend to badly over-estimate the amount of
+			 * metadata blocks which are needed, so if we have
+			 * reserved any metadata blocks, try to force out the
+			 * inode and see if we have any better luck.
+			 */
+			write_inode_now(inode, (retries == 3));
+		}
+		yield();
+ 		goto retry;
+	}
 out:
 	return ret;
 }