diff mbox

crash in __jbd2_journal_file_buffer

Message ID 20130813103416.GA12197@quack.suse.cz
State Not Applicable, archived
Headers show

Commit Message

Jan Kara Aug. 13, 2013, 10:34 a.m. UTC
On Mon 12-08-13 11:13:06, Sage Weil wrote:
> Full dmesg is attached.
  Hum, nothing interesting in there...

> Our QA seems to hit this with some regularity.  Let me know if there's 
> some combination of patches that would help shed more light!
  If they can run with attached debug patch it could maybe sched some more
light. Please send also your System.map file together with the dmesg of the
kernel when the crash happens so that I can map addresses to function
names... Thanks!

								Honza

Comments

Sage Weil Aug. 22, 2013, 11:35 p.m. UTC | #1
On Tue, 13 Aug 2013, Jan Kara wrote:
> On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > Full dmesg is attached.
>   Hum, nothing interesting in there...
> 
> > Our QA seems to hit this with some regularity.  Let me know if there's 
> > some combination of patches that would help shed more light!
>   If they can run with attached debug patch it could maybe sched some more
> light. Please send also your System.map file together with the dmesg of the
> kernel when the crash happens so that I can map addresses to function
> names... Thanks!

Okay, finally hit it again:

<6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
<3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
<4>[77877.441200] ------------[ cut here ]------------
<4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()

<4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
<4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
<4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
<4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
<4>[77877.535756] Call Trace:
<4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
<4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
<4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
<4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
<4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
<4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
<4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
<4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
<4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
<4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
<4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
<4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
<4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
<4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
<4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
<4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
<4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
<4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
<4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
<4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
<4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
<2>[77877.674126] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
<3>[77877.692983] Aborting journal on device sda1-8.
<2>[77877.721561] EXT4-fs (sda1): Remounting filesystem read-only
<0>[77877.721657] journal commit I/O error
<0>[77877.721706] journal commit I/O error
<0>[77877.721707] journal commit I/O error
<2>[77877.727300] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.727338] EXT4-fs (sda1): Remounting filesystem read-only
<2>[77877.727613] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.727618] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.727625] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
<2>[77877.778239] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
<3>[77877.786051] Dirtying buffer without jh at 4302720332: state 10c029,jh added from 0xffffffff8127eb88 at 4302720326, removed from 0xffffffff8127b5b0 at 4302720274
<4>[77877.800516] ------------[ cut here ]------------
<4>[77877.805156] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()

<4>[77877.856583] CPU: 7 PID: 26045 Comm: ceph-osd Tainted: G        W    3.11.0-rc5-ceph-00061-g546140d #1
<4>[77877.865896] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
<4>[77877.873475]  0000000000000564 ffff880131ca19b8 ffffffff81642d85 ffff8802272ef290
<4>[77877.880954]  0000000000000000 ffff880131ca19f8 ffffffff8104985c ffff880131ca1a20
<4>[77877.888488]  ffff8801c499be58 0000000000000000 ffff880029010000 ffff880029010c30
<4>[77877.895962] Call Trace:
<4>[77877.898484]  [<ffffffff81642d85>] dump_stack+0x46/0x58
<4>[77877.903643]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
<4>[77877.909724]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
<4>[77877.915577]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
<4>[77877.922443]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
<4>[77877.928353]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
<4>[77877.935165]  [<ffffffff8122ca3e>] ext4_mark_iloc_dirty+0x40e/0x660
<4>[77877.941421]  [<ffffffff8126a465>] ext4_xattr_set_handle+0x265/0x4a0
<4>[77877.947766]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
<4>[77877.953358]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
<4>[77877.959354]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
<4>[77877.965034]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
<4>[77877.971289]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
<4>[77877.976621]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
<4>[77877.981837]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
<4>[77877.987830]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77877.993944]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
<4>[77877.999417]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
<4>[77878.005530]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
<4>[77878.011349]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
<4>[77878.016856]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
<4>[77878.022941] ---[ end trace bb7933908cd5a32b ]---
[7]kdb> rd
ax: ffff88020aadbf20  bx: ffff8800290100a0  cx: 0000000000000000
dx: ffff8800290100a0  si: ffff8800290100a0  di: ffff8800290100a0
bp: ffff880131ca1a28  sp: ffff880131ca1978  r8: 0000000000000002
r9: 0000000000000000  r10: 0000000000000001  r11: 0000000000000000
r12: ffff880029010000  r13: 0000000000000000  r14: ffff880029010c30
r15: 00000000ffffff8b  ip: ffffffff81279f84  flags: 00010286  cs: 00000010
ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018
[7]kdb> bt
Stack traceback for pid 26045
0xffff88020aadbf20    26045    25958  1    7   R  0xffff88020aadc3a8 *ceph-osd
 ffff880131ca1978 0000000000000018 ffff880131ca1978 ffff880131ca1998
 bb7933908cd5a32b 0000000000000000 0000000000000001 0000000000000002
 0000000000000000 ffff880131ca19b8 0000000000000000 ffff880131ca19f8
Call Trace:
 [<ffffffff8104986f>] ? warn_slowpath_common+0x9f/0xc0
 [<ffffffff81257308>] ? __ext4_journal_stop+0x78/0xa0
 [<ffffffff812578dc>] ? __ext4_handle_dirty_metadata+0xbc/0x140
 [<ffffffff8122ca3e>] ? ext4_mark_iloc_dirty+0x40e/0x660
 [<ffffffff8126a465>] ? ext4_xattr_set_handle+0x265/0x4a0
 [<ffffffff8126a765>] ? ext4_xattr_set+0xc5/0x140
 [<ffffffff8126b177>] ? ext4_xattr_user_set+0x47/0x50
 [<ffffffff811a3fee>] ? generic_setxattr+0x6e/0x90
 [<ffffffff811a48eb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
 [<ffffffff811a4af4>] ? vfs_setxattr+0xc4/0xd0
 [<ffffffff811a4c3e>] ? setxattr+0x13e/0x1e0
 [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
 [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
 [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
 [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
 [<ffffffff811a50de>] ? SyS_fsetxattr+0xbe/0x100
 [<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b

Let me know if there is anything else I can gather from this machine that 
will help!

sage
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kara Aug. 23, 2013, 9:54 a.m. UTC | #2
On Thu 22-08-13 16:35:15, Sage Weil wrote:
> On Tue, 13 Aug 2013, Jan Kara wrote:
> > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > Full dmesg is attached.
> >   Hum, nothing interesting in there...
> > 
> > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > some combination of patches that would help shed more light!
> >   If they can run with attached debug patch it could maybe sched some more
> > light. Please send also your System.map file together with the dmesg of the
> > kernel when the crash happens so that I can map addresses to function
> > names... Thanks!
> 
> Okay, finally hit it again:
> 
> <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
  Great! Can you please send me /proc/kallsyms from the machine?

								Honza

> <4>[77877.441200] ------------[ cut here ]------------
> <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> 
> <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> <4>[77877.535756] Call Trace:
> <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> <2>[77877.674126] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> <3>[77877.692983] Aborting journal on device sda1-8.
> <2>[77877.721561] EXT4-fs (sda1): Remounting filesystem read-only
> <0>[77877.721657] journal commit I/O error
> <0>[77877.721706] journal commit I/O error
> <0>[77877.721707] journal commit I/O error
> <2>[77877.727300] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.727338] EXT4-fs (sda1): Remounting filesystem read-only
> <2>[77877.727613] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.727618] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.727625] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> <2>[77877.778239] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> <3>[77877.786051] Dirtying buffer without jh at 4302720332: state 10c029,jh added from 0xffffffff8127eb88 at 4302720326, removed from 0xffffffff8127b5b0 at 4302720274
> <4>[77877.800516] ------------[ cut here ]------------
> <4>[77877.805156] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> 
> <4>[77877.856583] CPU: 7 PID: 26045 Comm: ceph-osd Tainted: G        W    3.11.0-rc5-ceph-00061-g546140d #1
> <4>[77877.865896] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[77877.873475]  0000000000000564 ffff880131ca19b8 ffffffff81642d85 ffff8802272ef290
> <4>[77877.880954]  0000000000000000 ffff880131ca19f8 ffffffff8104985c ffff880131ca1a20
> <4>[77877.888488]  ffff8801c499be58 0000000000000000 ffff880029010000 ffff880029010c30
> <4>[77877.895962] Call Trace:
> <4>[77877.898484]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> <4>[77877.903643]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> <4>[77877.909724]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> <4>[77877.915577]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> <4>[77877.922443]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> <4>[77877.928353]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[77877.935165]  [<ffffffff8122ca3e>] ext4_mark_iloc_dirty+0x40e/0x660
> <4>[77877.941421]  [<ffffffff8126a465>] ext4_xattr_set_handle+0x265/0x4a0
> <4>[77877.947766]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> <4>[77877.953358]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> <4>[77877.959354]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> <4>[77877.965034]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[77877.971289]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> <4>[77877.976621]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> <4>[77877.981837]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> <4>[77877.987830]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.993944]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> <4>[77877.999417]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77878.005530]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> <4>[77878.011349]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> <4>[77878.016856]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> <4>[77878.022941] ---[ end trace bb7933908cd5a32b ]---
> [7]kdb> rd
> ax: ffff88020aadbf20  bx: ffff8800290100a0  cx: 0000000000000000
> dx: ffff8800290100a0  si: ffff8800290100a0  di: ffff8800290100a0
> bp: ffff880131ca1a28  sp: ffff880131ca1978  r8: 0000000000000002
> r9: 0000000000000000  r10: 0000000000000001  r11: 0000000000000000
> r12: ffff880029010000  r13: 0000000000000000  r14: ffff880029010c30
> r15: 00000000ffffff8b  ip: ffffffff81279f84  flags: 00010286  cs: 00000010
> ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018
> [7]kdb> bt
> Stack traceback for pid 26045
> 0xffff88020aadbf20    26045    25958  1    7   R  0xffff88020aadc3a8 *ceph-osd
>  ffff880131ca1978 0000000000000018 ffff880131ca1978 ffff880131ca1998
>  bb7933908cd5a32b 0000000000000000 0000000000000001 0000000000000002
>  0000000000000000 ffff880131ca19b8 0000000000000000 ffff880131ca19f8
> Call Trace:
>  [<ffffffff8104986f>] ? warn_slowpath_common+0x9f/0xc0
>  [<ffffffff81257308>] ? __ext4_journal_stop+0x78/0xa0
>  [<ffffffff812578dc>] ? __ext4_handle_dirty_metadata+0xbc/0x140
>  [<ffffffff8122ca3e>] ? ext4_mark_iloc_dirty+0x40e/0x660
>  [<ffffffff8126a465>] ? ext4_xattr_set_handle+0x265/0x4a0
>  [<ffffffff8126a765>] ? ext4_xattr_set+0xc5/0x140
>  [<ffffffff8126b177>] ? ext4_xattr_user_set+0x47/0x50
>  [<ffffffff811a3fee>] ? generic_setxattr+0x6e/0x90
>  [<ffffffff811a48eb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
>  [<ffffffff811a4af4>] ? vfs_setxattr+0xc4/0xd0
>  [<ffffffff811a4c3e>] ? setxattr+0x13e/0x1e0
>  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
>  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
>  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
>  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
>  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
>  [<ffffffff811a50de>] ? SyS_fsetxattr+0xbe/0x100
>  [<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b
> 
> Let me know if there is anything else I can gather from this machine that 
> will help!
> 
> sage
Sage Weil Aug. 23, 2013, 3:02 p.m. UTC | #3
On Fri, 23 Aug 2013, Jan Kara wrote:
> On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > Full dmesg is attached.
> > >   Hum, nothing interesting in there...
> > > 
> > > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > > some combination of patches that would help shed more light!
> > >   If they can run with attached debug patch it could maybe sched some more
> > > light. Please send also your System.map file together with the dmesg of the
> > > kernel when the crash happens so that I can map addresses to function
> > > names... Thanks!
> > 
> > Okay, finally hit it again:
> > 
> > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
>   Great! Can you please send me /proc/kallsyms from the machine?

Attached!

sage

> 
> 								Honza
> 
> > <4>[77877.441200] ------------[ cut here ]------------
> > <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > 
> > <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> > <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> > <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> > <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> > <4>[77877.535756] Call Trace:
> > <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> > <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> > <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> > <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> > <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
> > <2>[77877.674126] EXT4-fs error (device sda1) in ext4_handle_dirty_xattr_block:167: error 117
> > <3>[77877.692983] Aborting journal on device sda1-8.
> > <2>[77877.721561] EXT4-fs (sda1): Remounting filesystem read-only
> > <0>[77877.721657] journal commit I/O error
> > <0>[77877.721706] journal commit I/O error
> > <0>[77877.721707] journal commit I/O error
> > <2>[77877.727300] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.727338] EXT4-fs (sda1): Remounting filesystem read-only
> > <2>[77877.727613] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.727618] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.727625] EXT4-fs error (device sda1): ext4_journal_check_start:56: Detected aborted journal
> > <2>[77877.778239] EXT4-fs error (device sda1) in ext4_xattr_release_block:558: error 117
> > <3>[77877.786051] Dirtying buffer without jh at 4302720332: state 10c029,jh added from 0xffffffff8127eb88 at 4302720326, removed from 0xffffffff8127b5b0 at 4302720274
> > <4>[77877.800516] ------------[ cut here ]------------
> > <4>[77877.805156] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> > 
> > <4>[77877.856583] CPU: 7 PID: 26045 Comm: ceph-osd Tainted: G        W    3.11.0-rc5-ceph-00061-g546140d #1
> > <4>[77877.865896] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> > <4>[77877.873475]  0000000000000564 ffff880131ca19b8 ffffffff81642d85 ffff8802272ef290
> > <4>[77877.880954]  0000000000000000 ffff880131ca19f8 ffffffff8104985c ffff880131ca1a20
> > <4>[77877.888488]  ffff8801c499be58 0000000000000000 ffff880029010000 ffff880029010c30
> > <4>[77877.895962] Call Trace:
> > <4>[77877.898484]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> > <4>[77877.903643]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> > <4>[77877.909724]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> > <4>[77877.915577]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> > <4>[77877.922443]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> > <4>[77877.928353]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> > <4>[77877.935165]  [<ffffffff8122ca3e>] ext4_mark_iloc_dirty+0x40e/0x660
> > <4>[77877.941421]  [<ffffffff8126a465>] ext4_xattr_set_handle+0x265/0x4a0
> > <4>[77877.947766]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> > <4>[77877.953358]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> > <4>[77877.959354]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> > <4>[77877.965034]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> > <4>[77877.971289]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> > <4>[77877.976621]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> > <4>[77877.981837]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> > <4>[77877.987830]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77877.993944]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> > <4>[77877.999417]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> > <4>[77878.005530]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> > <4>[77878.011349]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> > <4>[77878.016856]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> > <4>[77878.022941] ---[ end trace bb7933908cd5a32b ]---
> > [7]kdb> rd
> > ax: ffff88020aadbf20  bx: ffff8800290100a0  cx: 0000000000000000
> > dx: ffff8800290100a0  si: ffff8800290100a0  di: ffff8800290100a0
> > bp: ffff880131ca1a28  sp: ffff880131ca1978  r8: 0000000000000002
> > r9: 0000000000000000  r10: 0000000000000001  r11: 0000000000000000
> > r12: ffff880029010000  r13: 0000000000000000  r14: ffff880029010c30
> > r15: 00000000ffffff8b  ip: ffffffff81279f84  flags: 00010286  cs: 00000010
> > ss: 00000018  ds: 00000018  es: 00000018  fs: 00000018  gs: 00000018
> > [7]kdb> bt
> > Stack traceback for pid 26045
> > 0xffff88020aadbf20    26045    25958  1    7   R  0xffff88020aadc3a8 *ceph-osd
> >  ffff880131ca1978 0000000000000018 ffff880131ca1978 ffff880131ca1998
> >  bb7933908cd5a32b 0000000000000000 0000000000000001 0000000000000002
> >  0000000000000000 ffff880131ca19b8 0000000000000000 ffff880131ca19f8
> > Call Trace:
> >  [<ffffffff8104986f>] ? warn_slowpath_common+0x9f/0xc0
> >  [<ffffffff81257308>] ? __ext4_journal_stop+0x78/0xa0
> >  [<ffffffff812578dc>] ? __ext4_handle_dirty_metadata+0xbc/0x140
> >  [<ffffffff8122ca3e>] ? ext4_mark_iloc_dirty+0x40e/0x660
> >  [<ffffffff8126a465>] ? ext4_xattr_set_handle+0x265/0x4a0
> >  [<ffffffff8126a765>] ? ext4_xattr_set+0xc5/0x140
> >  [<ffffffff8126b177>] ? ext4_xattr_user_set+0x47/0x50
> >  [<ffffffff811a3fee>] ? generic_setxattr+0x6e/0x90
> >  [<ffffffff811a48eb>] ? __vfs_setxattr_noperm+0x7b/0x1c0
> >  [<ffffffff811a4af4>] ? vfs_setxattr+0xc4/0xd0
> >  [<ffffffff811a4c3e>] ? setxattr+0x13e/0x1e0
> >  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> >  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> >  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> >  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> >  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> >  [<ffffffff811a50de>] ? SyS_fsetxattr+0xbe/0x100
> >  [<ffffffff81653782>] ? system_call_fastpath+0x16/0x1b
> > 
> > Let me know if there is anything else I can gather from this machine that 
> > will help!
> > 
> > sage
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
> 
>
Jan Kara Aug. 23, 2013, 7:52 p.m. UTC | #4
On Fri 23-08-13 08:02:45, Sage Weil wrote:
> On Fri, 23 Aug 2013, Jan Kara wrote:
> > On Thu 22-08-13 16:35:15, Sage Weil wrote:
> > > On Tue, 13 Aug 2013, Jan Kara wrote:
> > > > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > > > Full dmesg is attached.
> > > >   Hum, nothing interesting in there...
> > > > 
> > > > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > > > some combination of patches that would help shed more light!
> > > >   If they can run with attached debug patch it could maybe sched some more
> > > > light. Please send also your System.map file together with the dmesg of the
> > > > kernel when the crash happens so that I can map addresses to function
> > > > names... Thanks!
> > > 
> > > Okay, finally hit it again:
> > > 
> > > <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> > > <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> >   Great! Can you please send me /proc/kallsyms from the machine?
> 
> Attached!
  Sadly not. I think your MUA got confused by the fact that files in /proc
have 0 length and attached empty file.

								Honza
Jan Kara Aug. 23, 2013, 8:48 p.m. UTC | #5
On Thu 22-08-13 16:35:15, Sage Weil wrote:
> On Tue, 13 Aug 2013, Jan Kara wrote:
> > On Mon 12-08-13 11:13:06, Sage Weil wrote:
> > > Full dmesg is attached.
> >   Hum, nothing interesting in there...
> > 
> > > Our QA seems to hit this with some regularity.  Let me know if there's 
> > > some combination of patches that would help shed more light!
> >   If they can run with attached debug patch it could maybe sched some more
> > light. Please send also your System.map file together with the dmesg of the
> > kernel when the crash happens so that I can map addresses to function
> > names... Thanks!
> 
> Okay, finally hit it again:
> 
> <6>[75193.192249] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,user_xattr,user_xattr
> <3>[77877.426658] Dirtying buffer without jh at 4302720297: state 218c029,jh added from 0xffffffff8127ab1d at 4302720297, removed from 0xffffffff8127b5b0 at 4302720296
> <4>[77877.441200] ------------[ cut here ]------------
> <4>[77877.445845] WARNING: CPU: 7 PID: 26045 at /srv/autobuild-ceph/gitbuilder.git/build/fs/jbd2/transaction.c:1380 jbd2_journal_dirty_metadata+0x1f1/0x2e0()
> 
> <4>[77877.497349] CPU: 7 PID: 26045 Comm: ceph-osd Not tainted 3.11.0-rc5-ceph-00061-g546140d #1
> <4>[77877.505649] Hardware name: Dell Inc. PowerEdge R410/01V648, BIOS 1.6.3 02/07/2011
> <4>[77877.513213]  0000000000000564 ffff880131ca1938 ffffffff81642d85 ffff8802272ef290
> <4>[77877.520694]  0000000000000000 ffff880131ca1978 ffffffff8104985c ffff880131ca19a0
> <4>[77877.528218]  ffff88020f695aa0 0000000000000000 ffff880214c48b40 ffff88020be55000
> <4>[77877.535756] Call Trace:
> <4>[77877.538279]  [<ffffffff81642d85>] dump_stack+0x46/0x58
> <4>[77877.543439]  [<ffffffff8104985c>] warn_slowpath_common+0x8c/0xc0
> <4>[77877.549548]  [<ffffffff810498aa>] warn_slowpath_null+0x1a/0x20
> <4>[77877.555413]  [<ffffffff8127adb1>] jbd2_journal_dirty_metadata+0x1f1/0x2e0
> <4>[77877.562288]  [<ffffffff812578c3>] __ext4_handle_dirty_metadata+0xa3/0x140
> <4>[77877.569155]  [<ffffffff81268e23>] ext4_xattr_release_block+0x103/0x1f0
> <4>[77877.575723]  [<ffffffff812692b0>] ext4_xattr_block_set+0x1e0/0x910
> <4>[77877.581990]  [<ffffffff8126a58b>] ext4_xattr_set_handle+0x38b/0x4a0
> <4>[77877.588335]  [<ffffffff810af7cd>] ? trace_hardirqs_on+0xd/0x10
> <4>[77877.594188]  [<ffffffff8126a765>] ext4_xattr_set+0xc5/0x140
> <4>[77877.599837]  [<ffffffff8126b177>] ext4_xattr_user_set+0x47/0x50
> <4>[77877.605779]  [<ffffffff811a3fee>] generic_setxattr+0x6e/0x90
> <4>[77877.611514]  [<ffffffff811a48eb>] __vfs_setxattr_noperm+0x7b/0x1c0
> <4>[77877.617773]  [<ffffffff811a4af4>] vfs_setxattr+0xc4/0xd0
> <4>[77877.623103]  [<ffffffff811a4c3e>] setxattr+0x13e/0x1e0
> <4>[77877.628317]  [<ffffffff81181ec7>] ? __sb_start_write+0xe7/0x1b0
> <4>[77877.634260]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.640428]  [<ffffffff8119cf0c>] ? fget_light+0x3c/0x130
> <4>[77877.645847]  [<ffffffff8119fb98>] ? mnt_want_write_file+0x28/0x60
> <4>[77877.652015]  [<ffffffff8119e902>] ? mnt_clone_write+0x12/0x30
> <4>[77877.657897]  [<ffffffff811a50de>] SyS_fsetxattr+0xbe/0x100
> <4>[77877.663405]  [<ffffffff81653782>] system_call_fastpath+0x16/0x1b
> <4>[77877.669488] ---[ end trace bb7933908cd5a32a ]---
  I was scratching my head for a while how this could happen but I think I
see the race now. Think we have two inodes A and B which share the same
xattr block (so &BHDR(bh)->h_refcount == 2). Now we are changing xattrs for
both inodes in parallel - that can easily happen because xattr locking is
generally per inode (EXT4_I(inode)->xattr_sem). The following race then
happens:
CPU1					CPU2
ext4_xattr_release_block()		ext4_xattr_release_block()
lock_buffer(bh);
/* False */
if (BHDR(bh)->h_refcount == cpu_to_le32(1))
} else {
	le32_add_cpu(&BHDR(bh)->h_refcount, -1);
	unlock_buffer(bh);
					lock_buffer(bh);
					/* True */
					if (BHDR(bh)->h_refcount == cpu_to_le32(1))
						get_bh(bh);
						ext4_free_blocks()
							...
							jbd2_journal_forget()
								jbd2_journal_unfile_buffer()
								-> JH is gone
	error = ext4_handle_dirty_xattr_block(handle, inode, bh);
	-> triggers warning

Now easy fix would be to move ext4_handle_dirty_xattr_block() before
unlock_buffer() but I don't really like that because of locking
constraints. I'll think about it...

								Honza
diff mbox

Patch

From ae473a63b28ee1719cf47a637c73d9d163ddb4db Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Tue, 13 Aug 2013 12:29:35 +0200
Subject: [PATCH] jbd2: Debug journal_head additions and removals

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/jbd2/journal.c           | 4 ++++
 fs/jbd2/transaction.c       | 4 ++++
 include/linux/buffer_head.h | 4 ++++
 3 files changed, 12 insertions(+)

diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 02c7ad9..227a1ab 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -2433,6 +2433,8 @@  repeat:
 		bh->b_private = jh;
 		jh->b_bh = bh;
 		get_bh(bh);
+		bh->added_from = _RET_IP_;
+		bh->added_jiffies = jiffies;
 		BUFFER_TRACE(bh, "added journal_head");
 	}
 	jh->b_jcount++;
@@ -2498,6 +2500,8 @@  void jbd2_journal_put_journal_head(struct journal_head *jh)
 	--jh->b_jcount;
 	if (!jh->b_jcount) {
 		__journal_remove_journal_head(bh);
+		bh->removed_from = _RET_IP_;
+		bh->removed_jiffies = jiffies;
 		jbd_unlock_bh_journal_head(bh);
 		__brelse(bh);
 	} else
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 7aa9a32..aacd1cd 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1275,6 +1275,10 @@  int jbd2_journal_dirty_metadata(handle_t *handle, struct buffer_head *bh)
 	journal = transaction->t_journal;
 	jh = jbd2_journal_grab_journal_head(bh);
 	if (!jh) {
+		printk(KERN_ERR "Dirtying buffer without jh at %lu: state %lx,"
+			"jh added from 0x%lx at %lu, removed from 0x%lx "
+			"at %lu\n", jiffies, bh->b_state, bh->removed_from,
+			bh->removed_jiffies, bh->added_from, bh->added_jiffies);
 		ret = -EUCLEAN;
 		goto out;
 	}
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 91fa9a9..f24073e 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -74,6 +74,10 @@  struct buffer_head {
 	struct address_space *b_assoc_map;	/* mapping this buffer is
 						   associated with */
 	atomic_t b_count;		/* users using this buffer_head */
+	unsigned long removed_from;
+	unsigned long removed_jiffies;
+	unsigned long added_from;
+	unsigned long added_jiffies;
 };
 
 /*
-- 
1.8.1.4