diff mbox

kernel BUG at fs/jbd2/transaction.c:1086!

Message ID 20110331222306.GI21524@quack.suse.cz
State Accepted, archived
Headers show

Commit Message

Jan Kara March 31, 2011, 10:23 p.m. UTC
On Thu 31-03-11 22:59:24, Lukas Czerner wrote:
> On Thu, 31 Mar 2011, Jan Kara wrote:
> 
> > On Wed 30-03-11 17:32:06, Lukas Czerner wrote:
> > > Hello,
> > > 
> > > I have hit BUG_ON while running xfstest 234 in the loop
> > > on the ext4. Backtrace below .
> > > 
> > > I thought that this problem was solved with
> > > 67eeb5685d2a211c0252ac7884142e503c759500 however it is still present.
> > > I might be a bit hard to hit, but once you run
> > > 
> > > while true; do sync; sleep 0.3; done
> > > 
> > > concurrently it is reproducible almost all the time. I have tried to
> > > understand what is going on but only thing I can see is this course
> > > of action:
> > > 
> > > ext4_write_dquot
> > >   ext4_journal_start <- h_buffer_credits = 2
> > >   dquot_commit
> > >     v2_write_dquot
> > >       qtree_write_dquot
> > >         ext4_quota_write
> > >           ext4_handle_dirty_metadata	<- this takes one block reserved
> > > 	  				   for journal
> > > 					   h_buffer_credits--
> > >     v2_write_file_info
> > >       ext4_quota_write
> > >         ext4_handle_dirty_metadata	<- this takes second block reserved
> > > 	  				   for journal
> > > 					   h_buffer_credits--
> > >   ext4_journal_stop
> > > 
> > > However apparently I am missing something, because according to the
> > > backtrace the second call of jbd2_journal_dirty_metadata() may end up
> > > with BUG_ON -> J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
> > > 
> > > So someone else is touching our handle apparently, but I go not exacly
> > > know what is going on. From my simple debugging printk information I
> > > have put with before ext4_handle_dirty_metadata() in ext4_quota_write()
> > > we can see:
> > > 
> > > [  226.450263] ext4_quota_write: len 48, off 5136, type 0 h_ref 1
> > > credits 2
> > > [  226.458950] ext4_quota_write: len 24, off 8, type 0 h_ref 1 credits 0
> > > 
> > > It is clear someone changed handle between v2_write_dquot() and
> > > v2_write_file_info() does anyone have idea what is going on ?
> >   It's simpler than that I believe. ext4_write_dquot() does also
> >         inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> >         ext4_mark_inode_dirty(handle, inode);
> > and that eats another credit. Looking at the comment before
> > EXT4_QUOTA_TRANS_BLOCKS, we in fact counted with inode and data update but
> > didn't count with the info update. It's actually a race that we ended up
> > doing info update in our transaction - someone marked info dirty and before
> > he managed to write it, we went in, saw the dirty flag and wrote it for
> > him. So the attached patch should fix the problem for you.
> > 
> 
> The patch fixes the problem, you can add
> 
> Reported-and-tested-by: Lukas Czerner <lczerner@redhat.com>
> 
> Ted, could you pick this up ASAP ? Also I think this needs to go into
> stable as well.
  I'll merge it from my tree next week, OK? I don't think there's so much
hurry as the bug has been there for years and you're the first one to
report it :).

BTW, I've also created the patch below which can slightly improve
performance of quotas on ext4. The modification of the inode was
unnecessary as well... Ted, can you merge this patch? It's independent from
the previous one.

								Honza
--
From 51741760fe3682ea37bd264471f897bda9d5bd6d Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 31 Mar 2011 23:12:14 +0200
Subject: [PATCH] ext4: Remove unnecessary [cm]time update of quota file

It is not necessary to update [cm]time of quota file on each quota file
write and it wastes journal space and IO throughput with inode writes. So
just remove the updating from ext4_quota_write() and only update times when
quotas are being turned off. Userspace cannot get anything reliable from
quota files while they are used by the kernel anyway.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4_jbd2.h |    4 ++--
 fs/ext4/super.c     |   16 ++++++++++++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

Comments

Amir Goldstein April 1, 2011, 9:54 a.m. UTC | #1
On Thu, Mar 31, 2011 at 3:23 PM, Jan Kara <jack@suse.cz> wrote:
> On Thu 31-03-11 22:59:24, Lukas Czerner wrote:
>> On Thu, 31 Mar 2011, Jan Kara wrote:
>>
>> > On Wed 30-03-11 17:32:06, Lukas Czerner wrote:
>> > > Hello,
>> > >
>> > > I have hit BUG_ON while running xfstest 234 in the loop
>> > > on the ext4. Backtrace below .
>> > >
>> > > I thought that this problem was solved with
>> > > 67eeb5685d2a211c0252ac7884142e503c759500 however it is still present.
>> > > I might be a bit hard to hit, but once you run
>> > >
>> > > while true; do sync; sleep 0.3; done
>> > >
>> > > concurrently it is reproducible almost all the time. I have tried to
>> > > understand what is going on but only thing I can see is this course
>> > > of action:
>> > >
>> > > ext4_write_dquot
>> > >   ext4_journal_start <- h_buffer_credits = 2
>> > >   dquot_commit
>> > >     v2_write_dquot
>> > >       qtree_write_dquot
>> > >         ext4_quota_write
>> > >           ext4_handle_dirty_metadata      <- this takes one block reserved
>> > >                                      for journal
>> > >                                      h_buffer_credits--
>> > >     v2_write_file_info
>> > >       ext4_quota_write
>> > >         ext4_handle_dirty_metadata        <- this takes second block reserved
>> > >                                      for journal
>> > >                                      h_buffer_credits--
>> > >   ext4_journal_stop
>> > >
>> > > However apparently I am missing something, because according to the
>> > > backtrace the second call of jbd2_journal_dirty_metadata() may end up
>> > > with BUG_ON -> J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
>> > >
>> > > So someone else is touching our handle apparently, but I go not exacly
>> > > know what is going on. From my simple debugging printk information I
>> > > have put with before ext4_handle_dirty_metadata() in ext4_quota_write()
>> > > we can see:
>> > >
>> > > [  226.450263] ext4_quota_write: len 48, off 5136, type 0 h_ref 1
>> > > credits 2
>> > > [  226.458950] ext4_quota_write: len 24, off 8, type 0 h_ref 1 credits 0
>> > >
>> > > It is clear someone changed handle between v2_write_dquot() and
>> > > v2_write_file_info() does anyone have idea what is going on ?
>> >   It's simpler than that I believe. ext4_write_dquot() does also
>> >         inode->i_mtime = inode->i_ctime = CURRENT_TIME;
>> >         ext4_mark_inode_dirty(handle, inode);
>> > and that eats another credit. Looking at the comment before
>> > EXT4_QUOTA_TRANS_BLOCKS, we in fact counted with inode and data update but
>> > didn't count with the info update. It's actually a race that we ended up
>> > doing info update in our transaction - someone marked info dirty and before
>> > he managed to write it, we went in, saw the dirty flag and wrote it for
>> > him. So the attached patch should fix the problem for you.
>> >
>>
>> The patch fixes the problem, you can add
>>
>> Reported-and-tested-by: Lukas Czerner <lczerner@redhat.com>
>>
>> Ted, could you pick this up ASAP ? Also I think this needs to go into
>> stable as well.
>  I'll merge it from my tree next week, OK? I don't think there's so much
> hurry as the bug has been there for years and you're the first one to
> report it :).
>
> BTW, I've also created the patch below which can slightly improve
> performance of quotas on ext4. The modification of the inode was
> unnecessary as well... Ted, can you merge this patch? It's independent from
> the previous one.
>
>                                                                Honza
> --
> From 51741760fe3682ea37bd264471f897bda9d5bd6d Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 31 Mar 2011 23:12:14 +0200
> Subject: [PATCH] ext4: Remove unnecessary [cm]time update of quota file
>
> It is not necessary to update [cm]time of quota file on each quota file
> write and it wastes journal space and IO throughput with inode writes. So
> just remove the updating from ext4_quota_write() and only update times when
> quotas are being turned off. Userspace cannot get anything reliable from
> quota files while they are used by the kernel anyway.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/ext4_jbd2.h |    4 ++--
>  fs/ext4/super.c     |   16 ++++++++++++++--
>  2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index d8b992e..3163583 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -86,8 +86,8 @@
>
>  #ifdef CONFIG_QUOTA
>  /* Amount of blocks needed for quota update - we know that the structure was
> - * allocated so we need to update only inode+data */
> -#define EXT4_QUOTA_TRANS_BLOCKS(sb) (test_opt(sb, QUOTA) ? 2 : 0)
> + * allocated so we need to update only data block */
> +#define EXT4_QUOTA_TRANS_BLOCKS(sb) (test_opt(sb, QUOTA) ? 1 : 0)

Cool :-)
Yongqiang,
Now we only need to account for 2 (user+group) snapshot quota credits for COW.

>  /* Amount of blocks needed for quota insert/delete - we do some block writes
>  * but inode, sb and group updates are done only once */
>  #define EXT4_QUOTA_INIT_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_INIT_ALLOC*\
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 1f0acd4..baee1b4 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4608,11 +4608,24 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
>
>  static int ext4_quota_off(struct super_block *sb, int type)
>  {
> +       struct inode *inode = sb_dqopt(sb)->files[type];
> +       handle_t *handle;
> +
>        /* Force all delayed allocation blocks to be allocated.
>         * Caller already holds s_umount sem */
>        if (test_opt(sb, DELALLOC))
>                sync_filesystem(sb);
>
> +       /* Update modification times of quota files when userspace can
> +        * start looking at them */
> +       handle = ext4_journal_start(inode, 1);
> +       if (IS_ERR(handle))
> +               goto out;
> +       inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> +       ext4_mark_inode_dirty(handle, inode);
> +       ext4_journal_stop(handle);
> +
> +out:
>        return dquot_quota_off(sb, type);
>  }
>
> @@ -4708,9 +4721,8 @@ out:
>        if (inode->i_size < off + len) {
>                i_size_write(inode, off + len);
>                EXT4_I(inode)->i_disksize = inode->i_size;
> +               ext4_mark_inode_dirty(handle, inode);
>        }
> -       inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> -       ext4_mark_inode_dirty(handle, inode);
>        mutex_unlock(&inode->i_mutex);
>        return len;
>  }
> --
> 1.7.1
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Lukas Czerner April 1, 2011, 8:05 p.m. UTC | #2
On Fri, 1 Apr 2011, Jan Kara wrote:

> On Thu 31-03-11 22:59:24, Lukas Czerner wrote:
> > On Thu, 31 Mar 2011, Jan Kara wrote:
> > 
> > > On Wed 30-03-11 17:32:06, Lukas Czerner wrote:
> > > > Hello,
> > > > 
> > > > I have hit BUG_ON while running xfstest 234 in the loop
> > > > on the ext4. Backtrace below .
> > > > 
> > > > I thought that this problem was solved with
> > > > 67eeb5685d2a211c0252ac7884142e503c759500 however it is still present.
> > > > I might be a bit hard to hit, but once you run
> > > > 
> > > > while true; do sync; sleep 0.3; done
> > > > 
> > > > concurrently it is reproducible almost all the time. I have tried to
> > > > understand what is going on but only thing I can see is this course
> > > > of action:
> > > > 
> > > > ext4_write_dquot
> > > >   ext4_journal_start <- h_buffer_credits = 2
> > > >   dquot_commit
> > > >     v2_write_dquot
> > > >       qtree_write_dquot
> > > >         ext4_quota_write
> > > >           ext4_handle_dirty_metadata	<- this takes one block reserved
> > > > 	  				   for journal
> > > > 					   h_buffer_credits--
> > > >     v2_write_file_info
> > > >       ext4_quota_write
> > > >         ext4_handle_dirty_metadata	<- this takes second block reserved
> > > > 	  				   for journal
> > > > 					   h_buffer_credits--
> > > >   ext4_journal_stop
> > > > 
> > > > However apparently I am missing something, because according to the
> > > > backtrace the second call of jbd2_journal_dirty_metadata() may end up
> > > > with BUG_ON -> J_ASSERT_JH(jh, handle->h_buffer_credits > 0);
> > > > 
> > > > So someone else is touching our handle apparently, but I go not exacly
> > > > know what is going on. From my simple debugging printk information I
> > > > have put with before ext4_handle_dirty_metadata() in ext4_quota_write()
> > > > we can see:
> > > > 
> > > > [  226.450263] ext4_quota_write: len 48, off 5136, type 0 h_ref 1
> > > > credits 2
> > > > [  226.458950] ext4_quota_write: len 24, off 8, type 0 h_ref 1 credits 0
> > > > 
> > > > It is clear someone changed handle between v2_write_dquot() and
> > > > v2_write_file_info() does anyone have idea what is going on ?
> > >   It's simpler than that I believe. ext4_write_dquot() does also
> > >         inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> > >         ext4_mark_inode_dirty(handle, inode);
> > > and that eats another credit. Looking at the comment before
> > > EXT4_QUOTA_TRANS_BLOCKS, we in fact counted with inode and data update but
> > > didn't count with the info update. It's actually a race that we ended up
> > > doing info update in our transaction - someone marked info dirty and before
> > > he managed to write it, we went in, saw the dirty flag and wrote it for
> > > him. So the attached patch should fix the problem for you.
> > > 
> > 
> > The patch fixes the problem, you can add
> > 
> > Reported-and-tested-by: Lukas Czerner <lczerner@redhat.com>
> > 
> > Ted, could you pick this up ASAP ? Also I think this needs to go into
> > stable as well.
>   I'll merge it from my tree next week, OK? I don't think there's so much
> hurry as the bug has been there for years and you're the first one to
> report it :).

Of course :) sometimes I am a bit hasty.

> 
> BTW, I've also created the patch below which can slightly improve
> performance of quotas on ext4. The modification of the inode was
> unnecessary as well... Ted, can you merge this patch? It's independent from
> the previous one.
> 
> 								Honza
> --
> From 51741760fe3682ea37bd264471f897bda9d5bd6d Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Thu, 31 Mar 2011 23:12:14 +0200
> Subject: [PATCH] ext4: Remove unnecessary [cm]time update of quota file
> 
> It is not necessary to update [cm]time of quota file on each quota file
> write and it wastes journal space and IO throughput with inode writes. So
> just remove the updating from ext4_quota_write() and only update times when
> quotas are being turned off. Userspace cannot get anything reliable from
> quota files while they are used by the kernel anyway.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/ext4/ext4_jbd2.h |    4 ++--
>  fs/ext4/super.c     |   16 ++++++++++++++--
>  2 files changed, 16 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
> index d8b992e..3163583 100644
> --- a/fs/ext4/ext4_jbd2.h
> +++ b/fs/ext4/ext4_jbd2.h
> @@ -86,8 +86,8 @@
>  
>  #ifdef CONFIG_QUOTA
>  /* Amount of blocks needed for quota update - we know that the structure was
> - * allocated so we need to update only inode+data */
> -#define EXT4_QUOTA_TRANS_BLOCKS(sb) (test_opt(sb, QUOTA) ? 2 : 0)
> + * allocated so we need to update only data block */
> +#define EXT4_QUOTA_TRANS_BLOCKS(sb) (test_opt(sb, QUOTA) ? 1 : 0)
>  /* Amount of blocks needed for quota insert/delete - we do some block writes
>   * but inode, sb and group updates are done only once */
>  #define EXT4_QUOTA_INIT_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_INIT_ALLOC*\
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 1f0acd4..baee1b4 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4608,11 +4608,24 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id,
>  
>  static int ext4_quota_off(struct super_block *sb, int type)
>  {
> +	struct inode *inode = sb_dqopt(sb)->files[type];
> +	handle_t *handle;
> +
>  	/* Force all delayed allocation blocks to be allocated.
>  	 * Caller already holds s_umount sem */
>  	if (test_opt(sb, DELALLOC))
>  		sync_filesystem(sb);
>  
> +	/* Update modification times of quota files when userspace can
> +	 * start looking at them */
> +	handle = ext4_journal_start(inode, 1);
> +	if (IS_ERR(handle))
> +		goto out;
> +	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> +	ext4_mark_inode_dirty(handle, inode);
> +	ext4_journal_stop(handle);
> +
> +out:
>  	return dquot_quota_off(sb, type);
>  }
>  
> @@ -4708,9 +4721,8 @@ out:
>  	if (inode->i_size < off + len) {
>  		i_size_write(inode, off + len);
>  		EXT4_I(inode)->i_disksize = inode->i_size;
> +		ext4_mark_inode_dirty(handle, inode);
>  	}
> -	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
> -	ext4_mark_inode_dirty(handle, inode);
>  	mutex_unlock(&inode->i_mutex);
>  	return len;
>  }
>
Theodore Ts'o April 4, 2011, 5:25 p.m. UTC | #3
On Fri, Apr 01, 2011 at 12:23:06AM +0200, Jan Kara wrote:
> > Ted, could you pick this up ASAP ? Also I think this needs to go into
> > stable as well.
>   I'll merge it from my tree next week, OK? I don't think there's so much
> hurry as the bug has been there for years and you're the first one to
> report it :).

Agreed, your first patch was quota specific and I'd much rather have
it go through the quota tree.

> BTW, I've also created the patch below which can slightly improve
> performance of quotas on ext4. The modification of the inode was
> unnecessary as well... Ted, can you merge this patch? It's independent from
> the previous one.

Yup, I've added to the ext4 patch queue.

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From 51741760fe3682ea37bd264471f897bda9d5bd6d Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 31 Mar 2011 23:12:14 +0200
Subject: [PATCH] ext4: Remove unnecessary [cm]time update of quota file

It is not necessary to update [cm]time of quota file on each quota file
write and it wastes journal space and IO throughput with inode writes. So
just remove the updating from ext4_quota_write() and only update times when
quotas are being turned off. Userspace cannot get anything reliable from
quota files while they are used by the kernel anyway.

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/ext4_jbd2.h |    4 ++--
 fs/ext4/super.c     |   16 ++++++++++++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4_jbd2.h b/fs/ext4/ext4_jbd2.h
index d8b992e..3163583 100644
--- a/fs/ext4/ext4_jbd2.h
+++ b/fs/ext4/ext4_jbd2.h
@@ -86,8 +86,8 @@ 
 
 #ifdef CONFIG_QUOTA
 /* Amount of blocks needed for quota update - we know that the structure was
- * allocated so we need to update only inode+data */
-#define EXT4_QUOTA_TRANS_BLOCKS(sb) (test_opt(sb, QUOTA) ? 2 : 0)
+ * allocated so we need to update only data block */
+#define EXT4_QUOTA_TRANS_BLOCKS(sb) (test_opt(sb, QUOTA) ? 1 : 0)
 /* Amount of blocks needed for quota insert/delete - we do some block writes
  * but inode, sb and group updates are done only once */
 #define EXT4_QUOTA_INIT_BLOCKS(sb) (test_opt(sb, QUOTA) ? (DQUOT_INIT_ALLOC*\
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 1f0acd4..baee1b4 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4608,11 +4608,24 @@  static int ext4_quota_on(struct super_block *sb, int type, int format_id,
 
 static int ext4_quota_off(struct super_block *sb, int type)
 {
+	struct inode *inode = sb_dqopt(sb)->files[type];
+	handle_t *handle;
+
 	/* Force all delayed allocation blocks to be allocated.
 	 * Caller already holds s_umount sem */
 	if (test_opt(sb, DELALLOC))
 		sync_filesystem(sb);
 
+	/* Update modification times of quota files when userspace can
+	 * start looking at them */
+	handle = ext4_journal_start(inode, 1);
+	if (IS_ERR(handle))
+		goto out;
+	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
+	ext4_mark_inode_dirty(handle, inode);
+	ext4_journal_stop(handle);
+
+out:
 	return dquot_quota_off(sb, type);
 }
 
@@ -4708,9 +4721,8 @@  out:
 	if (inode->i_size < off + len) {
 		i_size_write(inode, off + len);
 		EXT4_I(inode)->i_disksize = inode->i_size;
+		ext4_mark_inode_dirty(handle, inode);
 	}
-	inode->i_mtime = inode->i_ctime = CURRENT_TIME;
-	ext4_mark_inode_dirty(handle, inode);
 	mutex_unlock(&inode->i_mutex);
 	return len;
 }
-- 
1.7.1