[GIT,PULL] ext4 updates for 4.15

Message ID 20171113031502.f6mctmlmgk5psh77@thunk.org
State New

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus

Message

Theodore Ts'o Nov. 13, 2017, 3:15 a.m.
The following changes since commit 9e66317d3c92ddaab330c125dfe9d06eee268aff:

  Linux 4.14-rc3 (2017-10-01 14:54:54 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git tags/ext4_for_linus

for you to fetch changes up to 232530680290ba94ca37852ab10d9556ea28badf:

  ext4: improve smp scalability for inode generation (2017-11-08 22:23:20 -0500)

----------------------------------------------------------------
Add support for online resizing of file systems with bigalloc.  Fix
two data corruption bugs involving DAX, as well as a corruption bug
after a crash during a racing fallocate and delayed allocation.
Finally, a number of cleanups and optimizations.

----------------------------------------------------------------
Andreas Gruenbacher (3):
      iomap: Switch from blkno to disk offset
      iomap: Add IOMAP_F_DATA_INLINE flag
      ext4: Add iomap support for inline data

Christoph Hellwig (1):
      ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA

Kees Cook (2):
      jbd2: convert timers to use timer_setup()
      ext4: convert timers to use timer_setup()

Pavel Machek (1):
      Documentation: fix little inconsistencies

Ross Zwisler (5):
      ext4: prevent data corruption with inline data + DAX
      ext4: prevent data corruption with journaling + DAX
      ext4: add sanity check for encryption + DAX
      ext4: add ext4_should_use_dax()
      ext4: remove duplicate extended attributes defs

Simon Ruderich (1):
      ext4: mention noload when recovering on read-only device

Theodore Ts'o (3):
      ext4: retry allocations conservatively
      ext4: fix interaction between i_size, fallocate, and delalloc after a crash
      ext4: improve smp scalability for inode generation

harshads (1):
      ext4: add support for online resizing with bigalloc

 Documentation/filesystems/ext4.txt |   8 +--
 fs/buffer.c                        |   4 +-
 fs/dax.c                           |   2 +-
 fs/ext2/inode.c                    |   4 +-
 fs/ext4/Kconfig                    |   1 +
 fs/ext4/balloc.c                   |  15 ++--
 fs/ext4/ext4.h                     |  50 ++-----------
 fs/ext4/extents.c                  |   6 +-
 fs/ext4/file.c                     | 263 ++++-----------------------------------------------------------------
 fs/ext4/ialloc.c                   |   4 +-
 fs/ext4/inline.c                   |  43 +++++++++---
 fs/ext4/inode.c                    | 153 ++++++++++++++++++----------------------
 fs/ext4/ioctl.c                    |  30 ++++----
 fs/ext4/mballoc.c                  |  28 ++++----
 fs/ext4/resize.c                   | 104 +++++++++++++++++----------
 fs/ext4/super.c                    |  27 +++----
 fs/iomap.c                         |  13 ++--
 fs/jbd2/journal.c                  |   9 ++-
 fs/nfsd/blocklayout.c              |   4 +-
 fs/xfs/xfs_iomap.c                 |   6 +-
 include/linux/iomap.h              |  15 ++--
 21 files changed, 278 insertions(+), 511 deletions(-)

Comments

Theodore Ts'o Nov. 13, 2017, 4:25 p.m. | #1
I forgot to mention, there's a merge conflict when pulling the ext4
and fscrypt trees.  The fixup is relatively straightforward:

commit daf886f04e60eda3bbc957e79d81d72965afd947
Merge: a0b3bc855374 232530680290
Author: Theodore Ts'o <tytso@mit.edu>
Date:   Sun Nov 12 22:03:15 2017 -0500

    Merge tag 'ext4_for_linus' into test
    
    Add support for online resizing of file systems with bigalloc.  Fix
    two data corruption bugs involving DAX, as well as a corruption bug
    after a crash during a racing fallocate and delayed allocation.
    Finally, a number of cleanups and optimizations.

diff --cc fs/ext4/inode.c
index 617c7feced24,9f836e2ec18c..737c43d724fb
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@@ -4587,15 -4640,10 +4640,13 @@@ void ext4_set_inode_flags(struct inode 
  		new_fl |= S_NOATIME;
  	if (flags & EXT4_DIRSYNC_FL)
  		new_fl |= S_DIRSYNC;
- 	if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode) &&
- 	    !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
- 	    !(flags & EXT4_ENCRYPT_FL))
+ 	if (ext4_should_use_dax(inode))
  		new_fl |= S_DAX;
 +	if (flags & EXT4_ENCRYPT_FL)
 +		new_fl |= S_ENCRYPTED;
  	inode_set_flags(inode, new_fl,
 -			S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX);
 +			S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX|
 +			S_ENCRYPTED);
  }
  
  static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
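The resolved hunk also adds S_ENCRYPTED to the mask passed to inode_set_flags(). That call only touches bits inside the mask. The sketch below is a minimal user-space version of the masked-update idea. It assumes the semantics `new = (old & ~mask) | flags`. The `DEMO_S_*` values and `set_flags_masked()` are hypothetical, and the real VFS helper performs the update atomically on `inode->i_flags`.

```c
#include <assert.h>

/* Hypothetical stand-ins for VFS inode flag bits; values are illustrative only. */
#define DEMO_S_SYNC      (1u << 0)
#define DEMO_S_DAX       (1u << 1)
#define DEMO_S_ENCRYPTED (1u << 2)

/*
 * Masked flag update: bits inside `mask` take their value from `flags`,
 * bits outside `mask` keep their old value. A bit set in `flags` but not
 * in `mask` is treated as a caller bug (this sketch asserts).
 */
static unsigned int set_flags_masked(unsigned int old, unsigned int flags,
                                     unsigned int mask)
{
    assert((flags & ~mask) == 0);
    return (old & ~mask) | flags;
}
```

Two things follow from this sketch. Setting DEMO_S_ENCRYPTED in `flags` without also adding it to `mask` trips the assertion. A bit outside the mask is also never cleared. That is why the `new_fl |= S_ENCRYPTED` line and the wider mask belong together in the resolution.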
Linus Torvalds Nov. 14, 2017, 8:59 p.m. | #2
On Mon, Nov 13, 2017 at 8:25 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> I forgot to mention, there's a merge conflict when pulling the ext4
> and fscrypt trees.  The fixup is relatively straightforward:

It doesn't actually look all that straightforward, and in particular,
the resolution you sent me doesn't actually seem correct:

>                 new_fl |= S_NOATIME;
>         if (flags & EXT4_DIRSYNC_FL)
>                 new_fl |= S_DIRSYNC;
> -       if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode) &&
> -           !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
> -           !(flags & EXT4_ENCRYPT_FL))
> +       if (ext4_should_use_dax(inode))
>                 new_fl |= S_DAX;

This now loses the "!(flags & EXT4_ENCRYPT_FL)" test when it sets S_DAX.

Yes, in ext4_should_use_dax(), it has this code

        if (ext4_encrypted_inode(inode))
                return false;

but that test was what commit 2ee6a576be56 changed in favor of just
checking !(flags & EXT4_ENCRYPT_FL).

So that suggested merge resolution actually undoes some of that
commit 2ee6a576be56.

Of course,

        (flags & EXT4_ENCRYPT_FL)

_should_ be the same as

        ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT);

so it does seem to be harmless, but it's a bit dodgy.
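Linus's "should be the same" claim can be checked in isolation. The sketch below models only the encryption part of the DAX decision. It assumes a hypothetical `struct demo_inode` that holds a single flags word, plus a bit-number test in the style of ext4_test_inode_flag(). All `demo_` and `dax_ok_` names are illustrative. Only the two encryption constants come from the fs/ext4/ext4.h definitions Linus quotes.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as quoted from fs/ext4/ext4.h in this thread. */
#define EXT4_ENCRYPT_FL    0x00000800ul /* mask form */
#define EXT4_INODE_ENCRYPT 11           /* bit-number form */

/* Hypothetical, stripped-down inode: only the flags word. */
struct demo_inode {
    unsigned long i_flags;
};

/* Bit-number test in the style of ext4_test_inode_flag(). */
static bool demo_test_inode_flag(const struct demo_inode *inode, int bit)
{
    return (inode->i_flags >> bit) & 1ul;
}

/* Pre-merge style: the encryption check looks at the `flags` argument. */
static bool dax_ok_old(unsigned long flags)
{
    return !(flags & EXT4_ENCRYPT_FL);
}

/* Post-merge style: the encryption check looks at the inode itself. */
static bool dax_ok_new(const struct demo_inode *inode)
{
    return !demo_test_inode_flag(inode, EXT4_INODE_ENCRYPT);
}

/* Both checks agree whenever `flags` is the inode's own flags word. */
static void check_agree(unsigned long flags)
{
    struct demo_inode inode = { .i_flags = flags };

    assert(dax_ok_old(flags) == dax_ok_new(&inode));
}
```

The two checks agree exactly when the `flags` argument is the inode's own flags word. That assumption is what makes the resolution harmless. A caller passing some other value would see the two checks diverge.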

I'll do that suggested resolution, but I have to say that the ext4 bit
testing is incredibly broken and non-obvious. Just as an example:

  fs/ext4/ext4.h:#define EXT4_ENCRYPT_FL                  0x00000800 /* encrypted file */
  fs/ext4/ext4.h: EXT4_INODE_ENCRYPT      = 11,   /* Encrypted file */

yeah, it's the same bit, but it sure as hell isn't obvious. Why the
two totally different ways to define that data?

            Linus

Theodore Ts'o Nov. 15, 2017, 12:56 a.m. | #3
On Tue, Nov 14, 2017 at 12:59:17PM -0800, Linus Torvalds wrote:
> Of course,
> 
>         (flags & EXT4_ENCRYPT_FL)
> 
> _should_ be the same as
> 
>         ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT);

And the second is the preferred way to do things, actually.

> I'll do that suggested resolution, but I have to say that the ext4 bit
> testing is incredibly broken and non-obvious. Just as an example:
> 
>   fs/ext4/ext4.h:#define EXT4_ENCRYPT_FL                  0x00000800 /* encrypted file */
>   fs/ext4/ext4.h: EXT4_INODE_ENCRYPT      = 11,   /* Encrypted file */
> 
> yeah, it's the same bit, but it sure as hell isn't obvious. Why the
> two totally different ways to define that data?

Yes, it's non-obvious and ugly.  Sorry about that.

We originally used EXT4_*_FL, and we needed to use the bit number
encoding so we could use test_bit().  We just never converted all the
way over.

We do have a way to make sure the two ways of defining a bit position
are in sync; see ext4_check_flag_values() and CHECK_FLAG_VALUE in
ext4.h.  It's a bit gross, and we probably should clean this up, at
least in the kernel.  The e2fsprogs user space libraries all use
EXT4_*_FL, and we can't change that without breaking applications
depending on userspace, but we can keep things consistent in the
kernel, and that probably means completely converting away from
EXT4_*_FL, if possible.
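Ted's CHECK_FLAG_VALUE approach boils down to a compile-time assertion that each mask equals 1 shifted left by its bit number. The sketch below is a user-space analogue using C11 `static_assert`. The `DEMO_CHECK_FLAG_VALUE` macro is hypothetical, and the macros in ext4.h may be spelled differently. Only the ENCRYPT pair is taken from the thread.

```c
#include <assert.h>

/* The two encodings of the same bit, as quoted in the thread. */
#define EXT4_ENCRYPT_FL    0x00000800u
#define EXT4_INODE_ENCRYPT 11

/*
 * Paste the flag name into both spellings and fail the build unless the
 * mask is exactly 1 shifted left by the bit number.
 */
#define DEMO_CHECK_FLAG_VALUE(FLAG)                                   \
    static_assert(EXT4_##FLAG##_FL == (1u << EXT4_INODE_##FLAG),      \
                  "EXT4_" #FLAG "_FL does not match EXT4_INODE_" #FLAG)

DEMO_CHECK_FLAG_VALUE(ENCRYPT);
```

One such line per flag turns a future mismatch into a build failure. It does not, however, make the relationship visible where either name is defined.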

                                - Ted

Linus Torvalds Nov. 15, 2017, 1:11 a.m. | #4
On Tue, Nov 14, 2017 at 4:56 PM, Theodore Ts'o <tytso@mit.edu> wrote:
>
>    The e2fsprogs user space libraries all use
> EXT4_*_FL, and we can't change that without breaking applications
> depending on userspace, but we can keep things consistent in the
> kernel, and that probably means completely converting away from
> EXT4_*_FL, if possible.

So I don't think you'd necessarily need to convert from one to the
other, but wouldn't it be nice if you at least defined one in terms of
the other, ie something like

  #define EXT4_ENCRYPT_FL (1u << EXT4_INODE_ENCRYPT)

so that when you grep for one you see how they are directly related.
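Here is Linus's suggestion as a minimal sketch. The enumerator keeps the bit number for test_bit()-style helpers, and the mask is derived from it. The enum tag `demo_inode_bits` is hypothetical. The value 11 and the mask 0x00000800 come from the definitions quoted earlier in the thread.

```c
#include <assert.h>

/* Bit numbers, usable with test_bit()-style helpers. */
enum demo_inode_bits {
    EXT4_INODE_ENCRYPT = 11, /* Encrypted file */
};

/* The mask is derived from the bit number, so the two cannot drift apart. */
#define EXT4_ENCRYPT_FL (1u << EXT4_INODE_ENCRYPT) /* encrypted file */

/* The numeric value is unchanged from the hand-written 0x00000800. */
static_assert(EXT4_ENCRYPT_FL == 0x00000800u, "mask value must not change");
```

The derived mask still evaluates to 0x00000800, so the numeric value, and hence the on-disk encoding, is unchanged.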

Now it was much less obvious, and I was nervous because that whole
series did introduce _different_ bits that were not in the same space
at all, and encoded the same thing (ie that S_ENCRYPTED bit).

Maybe this normally doesn't come up, but it was not all that obvious,
particularly since there was a lot of indirection:

 ext4_encrypted_inode() ->
    ext4_test_inode_flag(inode, EXT4_INODE_ENCRYPT) ->
        EXT4_INODE_BIT_FNS()

That EXT4_INODE_BIT_FNS thing was really fascinating to see.
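Roughly, EXT4_INODE_BIT_FNS is a macro that stamps out test, set and clear helpers that take a bit number. A hypothetical user-space approximation follows. `DEMO_INODE_BIT_FNS`, `struct demo_inode_info` and every other `demo_` name are illustrative. Plain non-atomic bit operations stand in for the kernel's atomic test_bit(), set_bit() and clear_bit(). The last function walks the same indirection Linus traced and checks the result against the raw mask.

```c
#include <assert.h>
#include <stdbool.h>

/* The two encodings of the encrypt bit, as quoted in the thread. */
#define EXT4_ENCRYPT_FL    0x00000800ul
#define EXT4_INODE_ENCRYPT 11

/* Hypothetical stand-in for the ext4 in-memory inode. */
struct demo_inode_info {
    unsigned long i_flags;
};

/* Generate demo_{test,set,clear}_inode_<name>() helpers on field i_<field>. */
#define DEMO_INODE_BIT_FNS(name, field, offset)                          \
static inline bool demo_test_inode_##name(struct demo_inode_info *ei,    \
                                          int bit)                       \
{                                                                        \
    return (ei->i_##field >> (bit + (offset))) & 1ul;                    \
}                                                                        \
static inline void demo_set_inode_##name(struct demo_inode_info *ei,     \
                                         int bit)                        \
{                                                                        \
    ei->i_##field |= 1ul << (bit + (offset));                            \
}                                                                        \
static inline void demo_clear_inode_##name(struct demo_inode_info *ei,   \
                                           int bit)                      \
{                                                                        \
    ei->i_##field &= ~(1ul << (bit + (offset)));                         \
}

/* Expands to demo_test_inode_flag(), demo_set_inode_flag(), demo_clear_inode_flag(). */
DEMO_INODE_BIT_FNS(flag, flags, 0)

/* Bit-specific wrapper in the style of ext4_encrypted_inode(). */
static inline bool demo_encrypted_inode(struct demo_inode_info *ei)
{
    return demo_test_inode_flag(ei, EXT4_INODE_ENCRYPT);
}

/* Set through the bit-number helper, then observe through the raw mask. */
static void demo_mixed_model(void)
{
    struct demo_inode_info ei = { 0 };

    demo_set_inode_flag(&ei, EXT4_INODE_ENCRYPT);
    assert(demo_encrypted_inode(&ei));
    assert((ei.i_flags & EXT4_ENCRYPT_FL) != 0);

    demo_clear_inode_flag(&ei, EXT4_INODE_ENCRYPT);
    assert(!demo_encrypted_inode(&ei));
    assert((ei.i_flags & EXT4_ENCRYPT_FL) == 0);
}
```

After `demo_set_inode_flag(&ei, EXT4_INODE_ENCRYPT)`, both `demo_encrypted_inode(&ei)` and `ei.i_flags & EXT4_ENCRYPT_FL` see the bit. That is the equivalence Linus had to confirm by hand.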

So just confirming that yes,

   ext4_encrypted_inode()

is the same thing as

   EXT4_I(inode)->i_flags & EXT4_ENCRYPT_FL

was a real adventure.

Making it clear that EXT4_ENCRYPT_FL and EXT4_INODE_ENCRYPT are the
same bit would maybe have lessened the confusion at least a tiny bit.

Of course, not having five different ways to test the same bit would
have been even better.

Ok, I'm exaggerating.

But there really does seem to be a lot of different ways to check
i_flags bits, with some uses checking it directly, the places
_setting_ it using ext4_set_inode_flag(), and then other testers using
bit-specific helpers.

And that somewhat confusing model seems to be true of pretty much all the bits.

As long as you can keep track of it, I guess it's fine.

            Linus