Message ID | 20150625034626.GA21682@thunk.org |
---|---|
State | Accepted, archived |
Headers | show |
Hi Linus, Here's my suggested merge resolution to deal with Al Viro's symlink changes. - Ted diff --cc fs/ext4/symlink.c index ba5bd18,68e915a..0000000 --- a/fs/ext4/symlink.c +++ b/fs/ext4/symlink.c @@@ -35,19 -34,20 +34,17 @@@ static const char *ext4_follow_link(str int res; u32 plen, max_size = inode->i_sb->s_blocksize; - ctx = ext4_get_fname_crypto_ctx(inode, inode->i_sb->s_blocksize); - if (IS_ERR(ctx)) - return ERR_CAST(ctx); - if (!ext4_encrypted_inode(inode)) - return page_follow_link_light(dentry, nd); - + res = ext4_get_encryption_info(inode); + if (res) + return ERR_PTR(res); if (ext4_inode_is_fast_symlink(inode)) { caddr = (char *) EXT4_I(inode)->i_data; max_size = sizeof(EXT4_I(inode)->i_data); } else { cpage = read_mapping_page(inode->i_mapping, 0, NULL); - if (IS_ERR(cpage)) { - ext4_put_fname_crypto_ctx(&ctx); + if (IS_ERR(cpage)) - return cpage; + return ERR_CAST(cpage); - } caddr = kmap(cpage); caddr[size] = 0; } @@@ -77,14 -78,13 +75,12 @@@ /* Null-terminate the name */ if (res <= plen) paddr[res] = '\0'; - ext4_put_fname_crypto_ctx(&ctx); - nd_set_link(nd, paddr); if (cpage) { kunmap(cpage); page_cache_release(cpage); } - return NULL; + return *cookie = paddr; errout: - ext4_put_fname_crypto_ctx(&ctx); if (cpage) { kunmap(cpage); page_cache_release(cpage); -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ I renamed "ext4_follow_link()" to "ext4_encrypted_follow_link()" in the merge resolution, to make it clear that that function is _only_ used for encrypted symlinks. The function doesn't actually work for non-encrypted symlinks at all, and they use the generic helpers - Linus ] Thanks, that was on my todo list as a cleanup patch after the merge window closed... :-) - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jun 24, 2015 at 8:46 PM, Theodore Ts'o <tytso@mit.edu> wrote: > > A very large number of cleanups and bug fixes --- in particular for > the ext4 encryption patches, which is a new feature added in the last > merge window. Also fix a number of long-standing xfstest failures. > (Quota writes failing due to ENOSPC, a race between truncate and > writepage in data=journalled mode that was causing generic/068 to > fail, and other corner cases.) > > Also add support for FALLOC_FL_INSERT_RANGE, and improve jbd2 > performance eliminating locking when a buffer is modified more than > once during a transaction (which is very common for allocation > bitmaps, for example), in which case the state of the journalled > buffer head doesn't need to change. I think this is very broken. I just got this while compiling: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1325! invalid opcode: 0000 [#1] SMP Modules linked in: bnep bluetooth fuse ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 ... CPU: 7 PID: 5509 Comm: gcc Not tainted 4.1.0-10944-g2a298679b411 #1 Hardware name: /DH87RL, BIOS RLH8710H.86A.0327.2014.0924.1645 09/24/2014 task: ffff8803bf866040 ti: ffff880308528000 task.ti: ffff880308528000 RIP: jbd2_journal_dirty_metadata+0x237/0x290 Call Trace: __ext4_handle_dirty_metadata+0x43/0x1f0 ext4_handle_dirty_dirent_node+0xde/0x160 ? jbd2_journal_get_write_access+0x36/0x50 ext4_delete_entry+0x112/0x160 ? __ext4_journal_start_sb+0x52/0xb0 ext4_unlink+0xfa/0x260 vfs_unlink+0xec/0x190 do_unlinkat+0x24a/0x270 SyS_unlink+0x11/0x20 entry_SYSCALL_64_fastpath+0x12/0x6a Code: ff f3 90 48 8b 16 f7 c2 00 00 80 00 75 f3 e9 f4 fe ff ff b8 8b ff ff ff e9 8e fe ff ff 31 c0 e9 3a ff ff ff 31 ff e9 26 ff ff ff <0f> 0b 41 83 7c 24 0c 01 0f 84 e4 fe ff ff 0f 0b 0f 0b 4d 85 c9 RIP jbd2_journal_dirty_metadata+0x237/0x290 ---[ end trace ae033ebde8d080b4 ]--- followed by basically a dead machine (SIGSEGV's, unresponsive X etc). I assume it died with some major jbd2 or ext4 lock held. The most obvious candidate for a culprit would seem to be 2143c1965a76 "jbd2: speedup jbd2_journal_dirty_metadata()" which is the commit that introduced the assert that triggers. Ted? Jan? Nothing particularly odd was going on. I was reading email in a browser while doing an allmodconfig build with "make -j16". That's literally all I ever tend to do. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 26, 2015 at 08:05:48PM -0700, Linus Torvalds wrote: > On Wed, Jun 24, 2015 at 8:46 PM, Theodore Ts'o <tytso@mit.edu> wrote: > > > > A very large number of cleanups and bug fixes --- in particular for > > the ext4 encryption patches, which is a new feature added in the last > > merge window. Also fix a number of long-standing xfstest failures. > > (Quota writes failing due to ENOSPC, a race between truncate and > > writepage in data=journalled mode that was causing generic/068 to > > fail, and other corner cases.) > > > > Also add support for FALLOC_FL_INSERT_RANGE, and improve jbd2 > > performance eliminating locking when a buffer is modified more than > > once during a transaction (which is very common for allocation > > bitmaps, for example), in which case the state of the journalled > > buffer head doesn't need to change. > > I think this is very broken. > > I just got this while compiling: > > ------------[ cut here ]------------ > kernel BUG at fs/jbd2/transaction.c:1325! > ... > > The most obvious candidate for a culprit would seem to be > > 2143c1965a76 "jbd2: speedup jbd2_journal_dirty_metadata()" > > which is the commit that introduced the assert that triggers. Ted? Jan? I would tend to agree. The weird thing though is that I haven't seen this problem myself, despite running multiple regression tests before I sent the pull request, as well as running it on my laptop and doing kernel compiles with make -j16 and reading e-mail (I'm typing this e-mail on a 4.1 kernel merged with the ext4 patches for this merge window, so I have this commit running in my development laptop, and it hasn't triggered for me). All of this being said, my suggestion would be to revert it for now and we'll investigate more closely for the next merge window. This is just a performance improvement, and so better safe than sorry. Jan, do you agree? - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Jun 27, 2015 at 12:02:37AM -0400, Theodore Ts'o wrote: > > I would tend to agree. The weird thing though is that I haven't seen > this problem myself, despite running multiple regression tests before > I sent the pull request, as well as running it on my laptop and doing > kernel compiles with make -j16 and reading e-mail (I'm typing this > e-mail on a 4.1 kernel merged with the ext4 patches for this merge > window, so I have this commit running in my development laptop, and it > hasn't triggered for me). Update: I've been able to reproduce the crash using a post merge kernel (of course, I had to use CONFIG_SLUB since CONFIG_SLAB is busted) a single time. Unfortunately, I've not been able to reproduce it reliably so I can't speak to whether reverting 2143c1965a76 will fix things. But it does seem very likely. - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat 27-06-15 10:07:43, Ted Tso wrote: > On Sat, Jun 27, 2015 at 12:02:37AM -0400, Theodore Ts'o wrote: > > > > I would tend to agree. The weird thing though is that I haven't seen > > this problem myself, despite running multiple regression tests before > > I sent the pull request, as well as running it on my laptop and doing > > kernel compiles with make -j16 and reading e-mail (I'm typing this > > e-mail on a 4.1 kernel merged with the ext4 patches for this merge > > window, so I have this commit running in my development laptop, and it > > hasn't triggered for me). > > Update: I've been able to reproduce the crash using a post merge > kernel (of course, I had to use CONFIG_SLUB since CONFIG_SLAB is > busted) a single time. Unfortunately, I've not been able to reproduce > it reliably so I can't speak to whether reverting 2143c1965a76 will > fix things. But it does seem very likely. Sure, I agree with the revert Linus did. That's the best we can do now. I'm now trying to figure out what could have gone wrong. Honza
On Sat 27-06-15 10:07:43, Ted Tso wrote: > On Sat, Jun 27, 2015 at 12:02:37AM -0400, Theodore Ts'o wrote: > > > > I would tend to agree. The weird thing though is that I haven't seen > > this problem myself, despite running multiple regression tests before > > I sent the pull request, as well as running it on my laptop and doing > > kernel compiles with make -j16 and reading e-mail (I'm typing this > > e-mail on a 4.1 kernel merged with the ext4 patches for this merge > > window, so I have this commit running in my development laptop, and it > > hasn't triggered for me). > > Update: I've been able to reproduce the crash using a post merge > kernel (of course, I had to use CONFIG_SLUB since CONFIG_SLAB is > busted) a single time. Unfortunately, I've not been able to reproduce > it reliably so I can't speak to whether reverting 2143c1965a76 will > fix things. But it does seem very likely. BTW, what did you use to trigger the error for you? The ext4 path where the assertion triggered for Linus is definitely correct so the assertion failure was a false positive. Since we don't hold proper locks when doing the assertion, we likely saw an intermediate state where we raced with __jbd2_journal_refile_buffer() called from jbd2 thread which was doing jh->b_transaction = jh->b_next_transaction; jh->b_next_transaction = NULL; and we saw NULL in b_next_transaction but old content in b_transaction. Originally I didn't think that's really possible but on a second thought don't know how I came to that conclusion :-|. Looking at the disassemly of __jbd2_journal_refile_buffer() I see my compiler compiled the above as: mov 0x30(%rbx),%rax b_next_transaction -> rax movq $0x0,0x30(%rbx) NULL -> b_next_transaction mov %rax,0x28(%rbx) rax -> b_transaction So even the compiler decided to reorder the stores to memory and I'm not even speaking of what the CPU could have done with the stores or loads from the assertion. I will fix the assertion in the same way the second assertion in jbd2_journal_dirty_metadata() was fixed - take the necessary locks if it seems the assertion is violated to make sure we don't see false positives. Honza
On Sat, Jun 27, 2015 at 11:05 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Wed, Jun 24, 2015 at 8:46 PM, Theodore Ts'o <tytso@mit.edu> wrote: >> >> A very large number of cleanups and bug fixes --- in particular for >> the ext4 encryption patches, which is a new feature added in the last >> merge window. Also fix a number of long-standing xfstest failures. >> (Quota writes failing due to ENOSPC, a race between truncate and >> writepage in data=journalled mode that was causing generic/068 to >> fail, and other corner cases.) >> >> Also add support for FALLOC_FL_INSERT_RANGE, and improve jbd2 >> performance eliminating locking when a buffer is modified more than >> once during a transaction (which is very common for allocation >> bitmaps, for example), in which case the state of the journalled >> buffer head doesn't need to change. > > I think this is very broken. > > I just got this while compiling: > > ------------[ cut here ]------------ > kernel BUG at fs/jbd2/transaction.c:1325! > invalid opcode: 0000 [#1] SMP > Modules linked in: bnep bluetooth fuse ip6t_rpfilter ip6t_REJECT > nf_reject_ipv6 nf_conntrack_ipv6 ... > CPU: 7 PID: 5509 Comm: gcc Not tainted 4.1.0-10944-g2a298679b411 #1 > Hardware name: /DH87RL, BIOS > RLH8710H.86A.0327.2014.0924.1645 09/24/2014 > task: ffff8803bf866040 ti: ffff880308528000 task.ti: ffff880308528000 > RIP: jbd2_journal_dirty_metadata+0x237/0x290 > Call Trace: > __ext4_handle_dirty_metadata+0x43/0x1f0 > ext4_handle_dirty_dirent_node+0xde/0x160 > ? jbd2_journal_get_write_access+0x36/0x50 > ext4_delete_entry+0x112/0x160 > ? __ext4_journal_start_sb+0x52/0xb0 > ext4_unlink+0xfa/0x260 > vfs_unlink+0xec/0x190 > do_unlinkat+0x24a/0x270 > SyS_unlink+0x11/0x20 > entry_SYSCALL_64_fastpath+0x12/0x6a > Code: ff f3 90 48 8b 16 f7 c2 00 00 80 00 75 f3 e9 f4 fe ff ff b8 > 8b ff ff ff e9 8e fe ff ff 31 c0 e9 3a ff ff ff 31 ff e9 26 ff ff ff > <0f> 0b 41 83 7c 24 0c 01 0f 84 e4 fe ff ff 0f 0b 0f 0b 4d 85 c9 > RIP jbd2_journal_dirty_metadata+0x237/0x290 > ---[ end trace ae033ebde8d080b4 ]--- > > followed by basically a dead machine (SIGSEGV's, unresponsive X etc). > I assume it died with some major jbd2 or ext4 lock held. > > The most obvious candidate for a culprit would seem to be > > 2143c1965a76 "jbd2: speedup jbd2_journal_dirty_metadata()" > > which is the commit that introduced the assert that triggers. Ted? Jan? > > Nothing particularly odd was going on. I was reading email in a > browser while doing an allmodconfig build with "make -j16". That's > literally all I ever tend to do. BTW, the similar trace stack can be triggered easily by xfstests generic 269, see the following trace, and looks the revert does fix it. [ 2391.200871] run fstests generic/269 at 2015-06-28 16:21:32^M [ 2391.606047] EXT4-fs (sda): mounted filesystem with ordered data mode. Opts: acl,user_xattr^M [ 2395.489941] ------------[ cut here ]------------^M [ 2395.490652] kernel BUG at fs/jbd2/transaction.c:1325!^M [ 2395.490652] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC ^M [ 2395.490652] Dumping ftrace buffer:^M [ 2395.490652] (ftrace buffer empty)^M [ 2395.490652] Modules linked in: nbd ipv6 kvm_intel kvm serio_raw [last unloaded: null_blk]^M [ 2395.490652] CPU: 10 PID: 2635 Comm: fsstress Not tainted 4.1.0-next-20150626 #359^M [ 2395.490652] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011^M [ 2395.490652] task: ffff8800ad84a200 ti: ffff88022cf90000 task.ti: ffff88022cf90000^M [ 2395.490652] RIP: 0010:[<ffffffff811dbceb>] [<ffffffff811dbceb>] jbd2_journal_dirty_metadata+0x51/0x207^M [ 2395.490652] RSP: 0018:ffff88022cf93c98 EFLAGS: 00010207^M [ 2395.490652] RAX: ffff88022972ae00 RBX: ffff8801f8eede10 RCX: 0000000000000000^M [ 2395.490652] RDX: ffff8800bb7da300 RSI: ffff8800a5c82208 RDI: ffff8802345efc00^M [ 2395.490652] RBP: ffff88022cf93cf8 R08: ffff8800a5c82208 R09: 00000000000009af^M [ 2395.490652] R10: 0000000000000000 R11: 000000000000000c R12: ffff8800bb7da300^M [ 2395.490652] R13: ffff8800a5c82208 R14: 0000000000000000 R15: ffff8802345efc00^M [ 2395.490652] FS: 00007f2559cc1740(0000) GS:ffff88023ed40000(0000) knlGS:0000000000000000^M [ 2395.490652] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033^M [ 2395.490652] CR2: 00007f2554109fb8 CR3: 000000021469a000 CR4: 00000000000007e0^M [ 2395.490652] Stack:^M [ 2395.490652] ffff88022cf93cb8 ffff88020895e800 0000000000000023 ffff8802345efc00^M [ 2395.490652] 00000000000009af 0000000000000000 ffff8801f8eede10 ffff8800a5c82208^M [ 2395.490652] ffff88013bf329a8 0000000000000000 ffff8802345efc00 00000000000003d4^M [ 2395.490652] Call Trace:^M [ 2395.490652] [<ffffffff811c7d38>] __ext4_handle_dirty_metadata+0xcf/0x1bf^M [ 2395.490652] [<ffffffff810d83fb>] ? trace_preempt_on+0x12/0x2f^M [ 2395.490652] [<ffffffff811a0a53>] __ext4_new_inode+0x930/0xdd0^M [ 2395.490652] [<ffffffff811acec8>] ext4_mknod+0x8f/0x131^M [ 2395.490652] [<ffffffff81149193>] vfs_mknod+0x7c/0x9c^M [ 2395.490652] [<ffffffff8114bf0c>] ? user_path_create+0x2b/0x32^M [ 2395.490652] [<ffffffff8114c043>] SYSC_mknodat+0x130/0x176^M [ 2395.490652] [<ffffffff8114cd48>] SyS_mknod+0x1d/0x1f^M [ 2395.490652] [<ffffffff814f0917>] entry_SYSCALL_64_fastpath+0x12/0x6a^M [ 2395.490652] Code: 41 89 c6 0f 85 b8 01 00 00 49 8b 45 00 a9 00 00 02 00 0f 84 b1 01 00 00 49 8b 5d 40 48 8b 43 28 4c 39 e0 74 08 4c 39 63 30 74 02 <0f> 0b 83 7b 10 01 75 39 4c 39 e0 0f 85 93 01 00 00 83 7b 0c 01 ^M [ 2395.490652] RIP [<ffffffff811dbceb>] jbd2_journal_dirty_metadata+0x51/0x207^M [ 2395.490652] RSP <ffff88022cf93c98>^M [ 2395.516997] ---[ end trace dc5028a237345935 ]---^M
On Mon, Jun 29, 2015 at 11:04:37AM +0200, Jan Kara wrote: > > BTW, what did you use to trigger the error for you? The ext4 path where the > assertion triggered for Linus is definitely correct so the assertion > failure was a false positive. I've managed to trigger it twice in about 5 or 6 full runs of "./kvm-xfstests -g auto". Once it was with generic/233 in 1k block file system mode, the second time it was generic/233 in ext3 compat mode. i've tried running just generic/233, and it doesn't trigger. "i.e., "./kvm-xfstests -C 10 generic/233" didn't trigger the bug. (I'm currently on vacation, so I've been kicking off the run each morning or evening and then checking to see if it blew up or not.) Cheers, - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html