Message ID | 20171113031502.f6mctmlmgk5psh77@thunk.org |
---|---|
State | Not Applicable, archived |
Headers | show |
Series | [GIT,PULL] ext4 updates for 4.15 | expand |
I forgot to mention, there's a merge conflict when pulling the ext4 and fscrypt trees. The fixup is relatively straightforward: commit daf886f04e60eda3bbc957e79d81d72965afd947 Merge: a0b3bc855374 232530680290 Author: Theodore Ts'o <tytso@mit.edu> Date: Sun Nov 12 22:03:15 2017 -0500 Merge tag 'ext4_for_linus' into test Add support for online resizing of file systems with bigalloc. Fix a two data corruption bugs involving DAX, as well as a corruption bug after a crash during a racing fallocate and delayed allocation. Finally, a number of cleanups and optimizations. diff --cc fs/ext4/inode.c index 617c7feced24,9f836e2ec18c..737c43d724fb --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@@ -4587,15 -4640,10 +4640,13 @@@ void ext4_set_inode_flags(struct inode new_fl |= S_NOATIME; if (flags & EXT4_DIRSYNC_FL) new_fl |= S_DIRSYNC; - if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode) && - !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) && - !(flags & EXT4_ENCRYPT_FL)) + if (ext4_should_use_dax(inode)) new_fl |= S_DAX; + if (flags & EXT4_ENCRYPT_FL) + new_fl |= S_ENCRYPTED; inode_set_flags(inode, new_fl, - S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX); + S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX| + S_ENCRYPTED); } static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
On Mon, Nov 13, 2017 at 8:25 AM, Theodore Ts'o <tytso@mit.edu> wrote: > I forgot to mention, there's a merge conflict when pulling the ext4 > and fscrypt trees. The fixup is relatively straightforward: It doesn't actually look all that straightforward, and in particular, the resolution you sent me doesn't actually seem correct: > new_fl |= S_NOATIME; > if (flags & EXT4_DIRSYNC_FL) > new_fl |= S_DIRSYNC; > - if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode) && > - !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) && > - !(flags & EXT4_ENCRYPT_FL)) > + if (ext4_should_use_dax(inode)) > new_fl |= S_DAX; This now loses the "!(flags & EXT4_ENCRYPT_FL)" test when it sets S_DAX. Yes, in ext4_should_use_dax(), it has this code if (ext4_encrypted_inode(inode)) return false; but that test was what commit 2ee6a576be56 changed in favor of just checking !(flags & EXT4_ENCRYPT_FL). So that suggested merge resolkution actually undoes some of that commit 2ee6a576be56. Of course, (flags & EXT4_ENCRYPT_FL) _should_ be the same as ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT); so It does seem to be harmless, but it's a bit dodgy. I'll do that suggested resolution, but I have to say that the ext4 bit testing is incredibly broken and non-obvious. Just as an example: fs/ext4/ext4.h:#define EXT4_ENCRYPT_FL 0x00000800 /* encrypted file */ fs/ext4/ext4.h: EXT4_INODE_ENCRYPT = 11, /* Encrypted file */ yeah, it's the same bit, but it sure as hell isn't obvious. Why the two totally different ways to define that data? Linus
On Tue, Nov 14, 2017 at 12:59:17PM -0800, Linus Torvalds wrote: > Of course, > > (flags & EXT4_ENCRYPT_FL) > > _should_ be the same as > > ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT); And in the second is the preferred way to do things, actually. > I'll do that suggested resolution, but I have to say that the ext4 bit > testing is incredibly broken and non-obvious. Just as an example: > > fs/ext4/ext4.h:#define EXT4_ENCRYPT_FL 0x00000800 > /* encrypted file */ > fs/ext4/ext4.h: EXT4_INODE_ENCRYPT = 11, /* Encrypted file */ > > yeah, it's the same bit, but it sure as hell isn't obvious. Why the > two totally different ways to define that data? Yes, it's non-obvious and ugly. Sorry about that. We originally used EXT4_*_FL, and we needed to use the bit number encoding so we could use test_bit(). We just never converted all the way over. We do have a way to make sure the two ways of defining a bit position are in sync; see ext4_check_flag_values() and CHECK_FLAG_VALUE in ext4.h. It's a bit gross, and we probably should clean this up, at least in the kernel. The e2fsprogs user space libraries all use EXT4_*_FL, and we can't change that without breaking applications depending on userspace, but we can keep things consistent in the kernel, and that probably means completely converting away from EXT4_*_FL, if possible. - Ted
On Tue, Nov 14, 2017 at 4:56 PM, Theodore Ts'o <tytso@mit.edu> wrote: > > The e2fsprogs user space libraries all use > EXT4_*_FL, and we can't change that without breaking applications > depending on userspace, but we can keep things consistent in the > kernel, and that probably means completely converting away from > EXT4_*_FL, if possible. So I don't think you'd necessarily need to convert from one to the other, but wouldn't it be nice if you at least defined one in terms of the other, ie something like #define EXT4_ENCRYPT_FL (1u << EXT4_INODE_ENCRYPT) so that when you grep for one you see how they are directly related. Now it was much less obvious, and I was nervous because that whole series did introduce _different_ bits that were not in the same space at all, and encoded the same thing (ie that S_ENCRYPTED bit). Maybe this normally doesn't come up, but it was not all that obvious, particularly since there was a lot of indirection: ext4_encrypted_inode() -> ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT) -> EXT4_INODE_BIT_FNS() That EXT4_INODE_BIT_FNS thing was really fascinating to see. So just confirming that yes, ext4_encrypted_inode() is the same thing as EXT4_I(inode)->i_flags & EXT4_ENCRYPT_FL was a real adventure. Making it clear that EXT4_ENCRYPT_FL and EXT4_INODE_ENCRYPT are the same bit would maybe have lessened the confusion at least a tiny bit. Of course, not having five different ways to test the same bit would have been even better. Ok, I'm exaggerating. But there really does seem to be a lot of different ways to check i_flags bits, with some uses checking it directly, the places _setting_ it using ext4_set_inode_flag(), and then other testers using bit-specific helper. And that somewhat confusing model seems to be true of pretty much all the bits. As long as you can keep track of it, I guess it's fine. Linus
Hi, On 11/13/2017, 04:15 AM, Theodore Ts'o wrote: > The following changes since commit 9e66317d3c92ddaab330c125dfe9d06eee268aff: > > Linux 4.14-rc3 (2017-10-01 14:54:54 -0700) > > are available in the git repository at: > > git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus ... > Christoph Hellwig (1): > ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA So I bisected a 32bit hole problem in 4.15 down to this one. tar selftests never finish during build on the 32bit kernel. The stack trace is something like this: tar R running task 0 24780 24590 0x00000000 Call Trace: ? ext4_map_blocks+0x2fa/0x5f0 ? ext4_map_blocks+0x348/0x5f0 ? ext4_iomap_begin+0x39b/0x4d0 ? ext4_evict_inode+0x5c0/0x5c0 ? iomap_apply+0x61/0x160 ? iomap_to_fiemap+0xd0/0xd0 ? iomap_seek_data+0xac/0x110 ? iomap_seek_hole_actor+0x80/0x80 ? ext4_llseek+0x13a/0x140 ? ext4_file_mmap+0x90/0x90 ? SyS_llseek+0x7b/0xc0 ? do_int80_syscall_32+0x51/0x100 ? entry_INT80_32+0x36/0x36 The tar test does this: AT_TAR_CHECK([ AT_DATA([mapfile], [0 =2560 m4_for([i], 1, 999, 1, [10M =2560 ])]) genfile --sparse --file BIGFILE --block-size 4K - < mapfile || AT_SKIP_TEST tar -f - -c --sparse --posix BIGFILE | tar tvf - | awk '{ print $3, $(NF) }' ], I.e. it generates a file with 1000 blocks of 2560*4K bytes. The blocks are 10M from each other, that is 999 10M holes. Tar then tries to read&seek in the file, but the seek above never finishes (or maybe finishes after a long time). Given this happens only on 32bit kernel, I assume some 32bit overflow. But I am unable to see it (yet). thanks,
On 02/12/2018, 11:02 AM, Jiri Slaby wrote: > Given this happens only on 32bit kernel, I assume some 32bit overflow. > But I am unable to see it (yet). Just to add, a diff of strace in good and bad kernels: @@ -655,14 +655,4 @@ _llseek(3, 4275568640, [4286054400], SEEK_DATA) = 0 _llseek(3, 4286054400, [4288675840], SEEK_HOLE) = 0 _llseek(3, 4288675840, [4299161600], SEEK_DATA) = 0 -_llseek(3, 4299161600, [4301783040], SEEK_HOLE) = 0 +_llseek(3, 4299161600, [4299161600], SEEK_HOLE) = 2621440 llseek returns a very invalid value when it comes to 0x100400000. regards,
On 02/12/2018, 01:14 PM, Jiri Slaby wrote: > On 02/12/2018, 11:02 AM, Jiri Slaby wrote: >> Given this happens only on 32bit kernel, I assume some 32bit overflow. >> But I am unable to see it (yet). > > Just to add, a diff of strace in good and bad kernels: > @@ -655,14 +655,4 @@ > _llseek(3, 4275568640, [4286054400], SEEK_DATA) = 0 > _llseek(3, 4286054400, [4288675840], SEEK_HOLE) = 0 > _llseek(3, 4288675840, [4299161600], SEEK_DATA) = 0 > -_llseek(3, 4299161600, [4301783040], SEEK_HOLE) = 0 > +_llseek(3, 4299161600, [4299161600], SEEK_HOLE) = 2621440 > > llseek returns a very invalid value when it comes to 0x100400000. It gets cropped to 0x400000, so: --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -3523,7 +3523,7 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length, iomap->flags |= IOMAP_F_DIRTY; iomap->bdev = inode->i_sb->s_bdev; iomap->dax_dev = sbi->s_daxdev; - iomap->offset = first_block << blkbits; + iomap->offset = (u64)first_block << blkbits; iomap->length = (u64)map.m_len << blkbits; if (ret == 0) { sounds about right? > regards,
On 02/12/2018, 02:34 PM, Jiri Slaby wrote: > @@ -3523,7 +3523,7 @@ static int ext4_iomap_begin(struct inode *inode, > loff_t offset, loff_t length, > iomap->flags |= IOMAP_F_DIRTY; > iomap->bdev = inode->i_sb->s_bdev; > iomap->dax_dev = sbi->s_daxdev; > - iomap->offset = first_block << blkbits; > + iomap->offset = (u64)first_block << blkbits; > iomap->length = (u64)map.m_len << blkbits; > > if (ret == 0) { > > sounds about right? Or even simpler: - iomap->offset = first_block << blkbits; + iomap->offset = offset; >> regards,-- js suse labs
On Mon, Feb 12, 2018 at 01:14:07PM +0100, Jiri Slaby wrote: > On 02/12/2018, 11:02 AM, Jiri Slaby wrote: > > Given this happens only on 32bit kernel, I assume some 32bit overflow. > > But I am unable to see it (yet). > > Just to add, a diff of strace in good and bad kernels: > @@ -655,14 +655,4 @@ > _llseek(3, 4275568640, [4286054400], SEEK_DATA) = 0 > _llseek(3, 4286054400, [4288675840], SEEK_HOLE) = 0 > _llseek(3, 4288675840, [4299161600], SEEK_DATA) = 0 > -_llseek(3, 4299161600, [4301783040], SEEK_HOLE) = 0 > +_llseek(3, 4299161600, [4299161600], SEEK_HOLE) = 2621440 > > llseek returns a very invalid value when it comes to 0x100400000. Thanks for the bugreport! Can you send me the output of "filefrag -v" on the file so I can make sure we can exactly replicate what you're seeing? Thanks, - Ted