Patchwork Bug#615998: linux-image-2.6.32-5-xen-amd64: Repeatable "kernel BUG at fs/jbd2/commit.c:534" from Postfix on ext4

login
register
mail settings
Submitter Jan Kara
Date Sept. 1, 2011, 3:17 p.m.
Message ID <20110901151744.GA2070@quack.suse.cz>
Download mbox | patch
Permalink /patch/112924/
State Not Applicable
Headers show

Comments

Jan Kara - Sept. 1, 2011, 3:17 p.m.
On Tue 30-08-11 19:26:22, Moffett, Kyle D wrote:
> On Aug 30, 2011, at 18:12, Jan Kara wrote:
> >> I can still trigger it on my VM snapshot very easily, so if you have anything
> >> you think I should test I would be very happy to give it a shot.
> > 
> >  OK, so in the meantime I found a bug in data=journal code which could be
> > related to your problem. It is fixed by commit
> > 2d859db3e4a82a365572592d57624a5f996ed0ec which is in 3.1-rc1. Have you
> > tried that or newer kernel as well?
> > 
> > If the problem still is not fixed, I can provide some debugging patch to
> > you. We spoke with Josef Bacik how errors like yours could happen so I have
> > some places to watch...
> 
> I have not tried anything more recent; I'm actually a bit reluctant to move
> away from the Debian squeeze official kernels since I do need the security
> updates.
> 
> I took a quick look and I can't find that function in 2.6.32, so I assume it
> would be a rather nontrivial back-port.  It looks like the relevant code
> used to be in ext4_clear_inode somewhere?
  It's not that hard - untested patch attached.

> Out of curiosity, what would happen in data=journal mode if you unlinked a
> file which still had buffers pending?  That case does not seem to be handled
> by that commit you mentioned, was it already handled elsewhere?
  Once the file is deleted, it's OK to discard its data after a
transaction doing delete commits. The current code in JBD2 handles this
case fine - the problem was that for not-deleted files we cannot discard
dirty data after a transaction commits ;)

								Honza
Kyle Moffett - Dec. 6, 2011, 9:26 p.m.
Hello again!

I know it's been ages, but I finally got some time to get that patch
tested out and try additional debugging.

On Sep 01, 2011, at 11:17, Jan Kara wrote:
> On Tue 30-08-11 19:26:22, Moffett, Kyle D wrote:
>> On Aug 30, 2011, at 18:12, Jan Kara wrote:
>>>> I can still trigger it on my VM snapshot very easily, so if you have anything
>>>> you think I should test I would be very happy to give it a shot.
>>> 
>>> OK, so in the meantime I found a bug in data=journal code which could be
>>> related to your problem. It is fixed by commit
>>> 2d859db3e4a82a365572592d57624a5f996ed0ec which is in 3.1-rc1. Have you
>>> tried that or newer kernel as well?
>>> 
>>> If the problem still is not fixed, I can provide some debugging patch to
>>> you. We spoke with Josef Bacik how errors like yours could happen so I have
>>> some places to watch...
>> 
>> I have not tried anything more recent; I'm actually a bit reluctant to move
>> away from the Debian squeeze official kernels since I do need the security
>> updates.
>> 
>> I took a quick look and I can't find that function in 2.6.32, so I assume it
>> would be a rather nontrivial back-port.  It looks like the relevant code
>> used to be in ext4_clear_inode somewhere?
> It's not that hard - untested patch attached.

So this applied mostly cleanly (with one minor context-only conflict in
the 2.6.32.17 patch), unfortunately it didn't resolve the problem.
Just as a sanity check, I upgraded to the Debian 3.1.0-1-amd64 kernel,
based on kernel version 3.1.1 and the problem still occurs there too
(additional info at the end of the email).

Looking at the issue again, I don't think it has anything to do with
file deletion at all.

Specifically, there are a grand total of 4 files in that filesystem
(alongside an empty "lost+found" directory):
  master.lock
  prng_exch
  smtpd_scache.db
  smtp_scache.db

As far as I can tell, none of those is ever deleted during normal
operation.

The crash occurs very quickly after starting postfix.  It connects to
the external email server (using TLS) and begins to flush queued mail.

At that point, the "tlsmgr" daemon tries to update the "smtp_scache.db"
file, which is a Berkeley DB about 40k in size.  Somewhere in there,
the Berkeley DB does an fdatasync().

The "fdatasync()" apparently triggers the bad behavior from the "jbd2"
thread, which then oopses in fs/jbd2/commit.c:485 (which appears to be
the same same BUG_ON() as before).

The stack looks something like this:
  jbd_journal_commit_transaction+0x4ea/0x1053 [jbd2]
  kjournald2+0xc0/0x20a [jbd2]
  add_wait_queue+0x3c/0x3c
  commit_timeout+0x5/0x5 [jbd2]
  kthread+0x76/0x7e

Cheers,
Kyle Moffett

--
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff -rupX /crypted/home/jack/.kerndiffexclude linux-2.6.32.orig//fs/ext4/inode.c linux-2.6.32-ext4_fix//fs/ext4/inode.c
--- linux-2.6.32.orig//fs/ext4/inode.c	2009-12-03 04:51:21.000000000 +0100
+++ linux-2.6.32-ext4_fix//fs/ext4/inode.c	2011-09-01 17:15:39.742361528 +0200
@@ -1794,6 +1794,7 @@  static int ext4_journalled_write_end(str
 	if (new_i_size > inode->i_size)
 		i_size_write(inode, pos+copied);
 	EXT4_I(inode)->i_state |= EXT4_STATE_JDATA;
+	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
 	if (new_i_size > EXT4_I(inode)->i_disksize) {
 		ext4_update_i_disksize(inode, new_i_size);
 		ret2 = ext4_mark_inode_dirty(handle, inode);
@@ -2630,6 +2631,7 @@  static int __ext4_journalled_writepage(s
 				write_end_fn);
 	if (ret == 0)
 		ret = err;
+	EXT4_I(inode)->i_datasync_tid = handle->h_transaction->t_tid;
 	err = ext4_journal_stop(handle);
 	if (!ret)
 		ret = err;
diff -rupX /crypted/home/jack/.kerndiffexclude linux-2.6.32.orig//fs/ext4/super.c linux-2.6.32-ext4_fix//fs/ext4/super.c
--- linux-2.6.32.orig//fs/ext4/super.c	2009-12-03 04:51:21.000000000 +0100
+++ linux-2.6.32-ext4_fix//fs/ext4/super.c	2011-09-01 17:13:55.379363889 +0200
@@ -759,6 +759,40 @@  static void ext4_clear_inode(struct inod
 				       &EXT4_I(inode)->jinode);
 }
 
+static void ext4_drop_inode(struct inode *inode)
+{
+	if (inode->i_nlink) {
+		/*
+		 * When journalling data dirty buffers are tracked only in the
+		 * journal. So although mm thinks everything is clean and
+		 * ready for reaping the inode might still have some pages to
+		 * write in the running transaction or waiting to be
+		 * checkpointed. Thus calling jbd2_journal_invalidatepage()
+		 * (via truncate_inode_pages()) to discard these buffers can
+		 * cause data loss. Also even if we did not discard these
+		 * buffers, we would have no way to find them after the inode
+		 * is reaped and thus user could see stale data if he tries to
+		 * read them before the transaction is checkpointed. So be
+		 * careful and force everything to disk here... We use
+		 * ei->i_datasync_tid to store the newest transaction
+		 * containing inode's data.
+		 *
+		 * Note that directories do not have this problem because they
+		 * don't use page cache.
+		 */
+		if (ext4_should_journal_data(inode) &&
+		    (S_ISLNK(inode->i_mode) || S_ISREG(inode->i_mode))) {
+			journal_t *journal = EXT4_SB(inode->i_sb)->s_journal;
+			tid_t commit_tid = EXT4_I(inode)->i_datasync_tid;
+
+			jbd2_log_start_commit(journal, commit_tid);
+			jbd2_log_wait_commit(journal, commit_tid);
+			filemap_write_and_wait(&inode->i_data);
+		}
+	}
+	generic_drop_inode(inode);
+}
+
 static inline void ext4_show_quota_options(struct seq_file *seq,
 					   struct super_block *sb)
 {
@@ -1029,6 +1063,7 @@  static const struct super_operations ext
 	.statfs		= ext4_statfs,
 	.remount_fs	= ext4_remount,
 	.clear_inode	= ext4_clear_inode,
+	.drop_inode	= ext4_drop_inode,
 	.show_options	= ext4_show_options,
 #ifdef CONFIG_QUOTA
 	.quota_read	= ext4_quota_read,