[17/20] ext4: Include encoding information in the superblock

Message ID 20180703170700.9306-18-krisman@collabora.co.uk
State Changes Requested
Headers show
Series
  • EXT4 encoding support
Related show

Commit Message

Gabriel Krisman Bertazi July 3, 2018, 5:06 p.m.
Support for encoding is considered an incompatible feature, since it has
potential to create collisions of file names in existing filesystems.
If the feature flag is not enabled, the entire filesystem will operate
on opaque byte sequences, respecting the original behavior.

The charset data is encoded in a new field in the superblock using a
magic number specific to ext4.  This is the easiest way I found to avoid
writing the name of the charset in the superblock.  The magic number is
mapped to the exact NLS table, but the mapping is specific to ext4.
Since we don't have any commitment to support old encodings, the only
encodings I am supporting right now is utf8n-10.0.0 and ascii, both
using the NLS abstraction.

A mount option that forces the use of an encoding is also provided.
This allows the user to override the superblock information and force
the mount using a specific encoding.  There is little point in doing
that, except for debugging.

The current implementation prevents the user from enabling encoding and
per-directory encryption on the same filesystem at the same time.  The
incompatibility between these features lies in how we do efficient
directory searches when we cannot be sure the encryption of the user
provided fname will match the actual hash stored in the disk without
decrypting every directory entry, because of normalization cases.  My
quickest solution is to simply block the concurrent use of these
features for now, and enable it later, once we have a better solution.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
---
 fs/ext4/ext4.h  |   7 ++-
 fs/ext4/super.c | 119 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 125 insertions(+), 1 deletion(-)

Comments

kbuild test robot July 11, 2018, 9:59 a.m. | #1
Hi Gabriel,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v4.18-rc4 next-20180710]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Gabriel-Krisman-Bertazi/EXT4-encoding-support/20180704-050453
config: mips-malta_defconfig (attached as .config)
compiler: mipsel-linux-gnu-gcc (Debian 7.2.0-11) 7.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=7.2.0 make.cross ARCH=mips 

All errors (new ones prefixed by >>):

   fs/ext4/super.o: In function `ext4_put_super':
>> super.c:(.text+0x3ff0): undefined reference to `unload_nls'
   fs/ext4/super.o: In function `parse_options':
>> super.c:(.text+0x4fb4): undefined reference to `load_nls_version'
   fs/ext4/super.o: In function `ext4_fill_super':
   super.c:(.text+0x75a8): undefined reference to `unload_nls'
   super.c:(.text+0x810c): undefined reference to `load_nls_version'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Theodore Y. Ts'o July 12, 2018, 2:02 a.m. | #2
As the kbuild bot has already reported, you've added an unconditional
dependency on the NLS subsystem, which is not always compiled in.
What we need to do is to add #ifdef's on CONFIG_NLS, and only support
file systems with encoding if the kernel is compiled with CONFIG_NLS.

One thought --- do we really need to use CONFIG_NLS if the encoding is
ASCII?  If some use case (and Android may be one) only is interested
in supporting case-folding for ASCII, and they don't want to have the
overhead of compiling in the NLS subsystem, for ASCII couldn't use
just use strcasecmp() for its comparison function?

The normalization for ASCII is the identify function, so it's kind of
pointless to support ASCII if we ony have case-folding support and not
normalization for now, right?

					- Ted
Gabriel Krisman Bertazi July 12, 2018, 2:13 p.m. | #3
"Theodore Y. Ts'o" <tytso@mit.edu> writes:

> As the kbuild bot has already reported, you've added an unconditional
> dependency on the NLS subsystem, which is not always compiled in.
> What we need to do is to add #ifdef's on CONFIG_NLS, and only support
> file systems with encoding if the kernel is compiled with CONFIG_NLS.

Thanks, I will address this in the next iteration.

> One thought --- do we really need to use CONFIG_NLS if the encoding is
> ASCII?  If some use case (and Android may be one) only is interested
> in supporting case-folding for ASCII, and they don't want to have the
> overhead of compiling in the NLS subsystem, for ASCII couldn't use
> just use strcasecmp() for its comparison function?

My concern here is that the casefold and normalization operations don't
make sense, semantically, when dealing with opaque byte sequences.  We
can assume that no-encoding means ASCII, but this is an arbitrary
decision, that only make sense for english speakers.  I think it is
safer/less confusing to only allow this kind of operation when an
explicit encoding format is in place.

Is the dependency on NLS a problem for Android?  I assumed it already
has to enable it anyway to support fat filesystems.  For use cases other
than Android, I'm trying to reduce the footprint of NLS, by making it
more modular (and we could even make it smaller by reducing the
nls_default.c code and doing the ASCII operations algorithmically
instead of using tables), such that building NLS core + ASCII should
have a really tiny impact on the kernel size and build time (which I
think it already does?)

> The normalization for ASCII is the identify function, so it's kind of
> pointless to support ASCII if we ony have case-folding support and not
> normalization for now, right?

Yes.  the ASCII normalization is boilerplate code, in a sense, since it
is just the identity function, but I'm trying to make the NLS interface
generic enough to minimize how much EXT4 needs to know about encoding.
Ideally, this knowledge should be left for the NLS system, in my
opinion.  Does it make sense?

Thanks,
Theodore Y. Ts'o July 12, 2018, 9:54 p.m. | #4
On Thu, Jul 12, 2018 at 10:13:14AM -0400, Gabriel Krisman Bertazi wrote:
> 
> My concern here is that the casefold and normalization operations don't
> make sense, semantically, when dealing with opaque byte sequences.  We
> can assume that no-encoding means ASCII, but this is an arbitrary
> decision, that only make sense for english speakers.  I think it is
> safer/less confusing to only allow this kind of operation when an
> explicit encoding format is in place.

The real question which we need to answer (and document, so everyone
understands what should happen) is what should we do if we come across
an invalid byte sequence for a particular encoding?  And there are two
versions of this question --- what should we do if a stored name in a
directory is an invalid byte sequence?  What should we do if the user
has passed an invalid byte sequence to a system call?  (And for the
latter, should it be different depending on whether it is a creation,
lookup, deletion, or rename operation?)

We don't have a way of specifying the encoding of a filename being
passed in the system call, so usually people will either assume that
it's some fixed encoding (the native encoding of the system, whatever
that means, which in practice was most commonly ASCII, ISO-Latin-1, or
UTF-8, with the last being the most common in more modern systems).

In your patches, it looks like you aren't actually doing any
processing (either enforcing that the byte sequence is valid, or
normalizing, which I understand is highly controversial and has caused
much hand-wringing in the Apple world recently since the defaults have
changed post-APFS) on filenames when they are passed to ext4 for
creation.  So there will quite possibly be invalid byte characters in
a directory entry.  So we need to be clear how they should be handled.

And even if we did prevent those file names from being created during
the normal course of events, we still need to understand what to do if
we come across one (if the file system was corrupted in some way,
either accidentally or deliberately).

> > The normalization for ASCII is the identify function, so it's kind of
> > pointless to support ASCII if we ony have case-folding support and not
> > normalization for now, right?
> 
> Yes.  the ASCII normalization is boilerplate code, in a sense, since it
> is just the identity function, but I'm trying to make the NLS interface
> generic enough to minimize how much EXT4 needs to know about encoding.
> Ideally, this knowledge should be left for the NLS system, in my
> opinion.  Does it make sense?

It does.  How big does the kernel get if we enable only NLS and ASCII?
If it's small, maybe we don't need to worry all that much.

   	       	     	      	      	    	- Ted
Gabriel Krisman Bertazi July 16, 2018, 8:57 p.m. | #5
"Theodore Y. Ts'o" <tytso@mit.edu> writes:

> On Thu, Jul 12, 2018 at 10:13:14AM -0400, Gabriel Krisman Bertazi wrote:
>> 
>> My concern here is that the casefold and normalization operations don't
>> make sense, semantically, when dealing with opaque byte sequences.  We
>> can assume that no-encoding means ASCII, but this is an arbitrary
>> decision, that only make sense for english speakers.  I think it is
>> safer/less confusing to only allow this kind of operation when an
>> explicit encoding format is in place.
>
> The real question which we need to answer (and document, so everyone
> understands what should happen) is what should we do if we come across
> an invalid byte sequence for a particular encoding?  And there are two
> versions of this question --- what should we do if a stored name in a
> directory is an invalid byte sequence?  What should we do if the user
> has passed an invalid byte sequence to a system call?  (And for the
> latter, should it be different depending on whether it is a creation,
> lookup, deletion, or rename operation?)

Hi Ted,

I gave this more thought over the weekend, and I'm convinced now that we
cannot simply reject invalid filenames, like I wanted to do. There are
many cases where the user won't control the file names. For instance, in
the example of APFS you gave, there was a case where it was failing to
extract data from an archive that had an utf8 invalid sequence file
name.  I'm sure there are much other cases out there of programs that
save files with arbitrary encodings, and we would be breaking them by
blindly rejecting these files.

For the similar reason of not breaking programs, we should be able to
return the exact match file if the user requested it.  This would
provide a way for the user to access files with 'broken' names to fix
them or just use them normally.  The only case where we need to special
handle exact-match files is when dealing with invalid sequences if we
can prevent the possibility of name collisions, for instance, if we
require encoding to only be set at superblock creation time, and only
allow setting the casefold flag if the directory is empty.  Assuming
this is correct, I think it is safe to define for ext4 that invalid
sequences are considered opaque byte sequences, and fallback to treat
them as such.

I don't see why treating invalid names as opaque sequences would be
problematic, for the sake of the argument would it be equivalent to say
that invalid code-points are valid and decompose and fold to themselves?

> We don't have a way of specifying the encoding of a filename being
> passed in the system call, so usually people will either assume that
> it's some fixed encoding (the native encoding of the system, whatever
> that means, which in practice was most commonly ASCII, ISO-Latin-1, or
> UTF-8, with the last being the most common in more modern systems).
>
> In your patches, it looks like you aren't actually doing any
> processing (either enforcing that the byte sequence is valid, or
> normalizing, which I understand is highly controversial and has caused
> much hand-wringing in the Apple world recently since the defaults have
> changed post-APFS) on filenames when they are passed to ext4 for
> creation.  So there will quite possibly be invalid byte characters in
> a directory entry.  So we need to be clear how they should be handled.

I didn't want to save normalization on-disk because I'm thinking of
casefold as parallel to normalization, which just adds the handling of
case.  If we store the normalization/casefold in the disk as the dentry
name, the file system is no longer normalization/casefold-preserving to
what the user requested, and a readdir() would be confusing in the
casefold case.  I'd like to eventually try to store it side-by-side with
the user created name in the future, but it would require changing the
directory entry layout.

In the current patchset, I'm rejecting invalid sequences (or at least I
thought so) as soon as the normalization fails.  The open syscall will
trigger an -EINVAL once it identifies a bad sequence, for instance.

>> > The normalization for ASCII is the identify function, so it's kind of
>> > pointless to support ASCII if we ony have case-folding support and not
>> > normalization for now, right?
>> 
>> Yes.  the ASCII normalization is boilerplate code, in a sense, since it
>> is just the identity function, but I'm trying to make the NLS interface
>> generic enough to minimize how much EXT4 needs to know about encoding.
>> Ideally, this knowledge should be left for the NLS system, in my
>> opinion.  Does it make sense?
>
> It does.  How big does the kernel get if we enable only NLS and ASCII?
> If it's small, maybe we don't need to worry all that much.

looks like bzImage increased by 4k.

$ diff -u without-nls/config with-nls/config
--- without-nls/config	2018-07-16 16:28:21.833479753 -0400
+++ with-nls/config	2018-07-16 16:31:16.466500014 -0400
@@ -1511,7 +1511,58 @@
 CONFIG_9P_FS=y
 CONFIG_9P_FS_POSIX_ACL=y
 CONFIG_9P_FS_SECURITY=y
+CONFIG_NLS=y
+CONFIG_NLS_DEFAULT="ascii"
+CONFIG_NLS_ASCII=y

[trimmed out the 'CONFIG_NLS_* is not set' lines]

$ ls -l without-nls/bzImage with-nls/bzImage
-rw-r--r-- 1 krisman krisman 3477552 Jul 16 16:31 with-nls/bzImage
-rw-r--r-- 1 krisman krisman 3473456 Jul 16 16:28 without-nls/bzImage

Thanks!
Gabriel Krisman Bertazi July 16, 2018, 9:30 p.m. | #6
Gabriel Krisman Bertazi <krisman@collabora.co.uk> writes:


> In the current patchset, I'm rejecting invalid sequences (or at least I
> thought so) as soon as the normalization fails.  The open syscall will
> trigger an -EINVAL once it identifies a bad sequence, for instance.

Hmm, this is true for utf8n but not ASCII.  My bad. It is an error in
the patchset.

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0b127853c584..fb0b70d6eb68 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1298,7 +1298,8 @@  struct ext4_super_block {
 	__le32	s_lpf_ino;		/* Location of the lost+found inode */
 	__le32	s_prj_quota_inum;	/* inode for tracking project quota */
 	__le32	s_checksum_seed;	/* crc32c(uuid) if csum_seed set */
-	__le32	s_reserved[98];		/* Padding to the end of the block */
+	__le32  s_ioencoding;		/* charset encoding */
+	__le32	s_reserved[97];		/* Padding to the end of the block */
 	__le32	s_checksum;		/* crc32c(superblock) */
 };
 
@@ -1372,6 +1373,7 @@  struct ext4_sb_info {
 	struct kobject s_kobj;
 	struct completion s_kobj_unregister;
 	struct super_block *s_sb;
+	struct nls_table *encoding;
 
 	/* Journaling */
 	struct journal_s *s_journal;
@@ -1652,6 +1654,7 @@  static inline void ext4_clear_state_flags(struct ext4_inode_info *ei)
 #define EXT4_FEATURE_INCOMPAT_LARGEDIR		0x4000 /* >2GB or 3-lvl htree */
 #define EXT4_FEATURE_INCOMPAT_INLINE_DATA	0x8000 /* data in inode */
 #define EXT4_FEATURE_INCOMPAT_ENCRYPT		0x10000
+#define EXT4_FEATURE_INCOMPAT_IOENCODING	0x20000
 
 #define EXT4_FEATURE_COMPAT_FUNCS(name, flagname) \
 static inline bool ext4_has_feature_##name(struct super_block *sb) \
@@ -1740,6 +1743,7 @@  EXT4_FEATURE_INCOMPAT_FUNCS(csum_seed,		CSUM_SEED)
 EXT4_FEATURE_INCOMPAT_FUNCS(largedir,		LARGEDIR)
 EXT4_FEATURE_INCOMPAT_FUNCS(inline_data,	INLINE_DATA)
 EXT4_FEATURE_INCOMPAT_FUNCS(encrypt,		ENCRYPT)
+EXT4_FEATURE_INCOMPAT_FUNCS(ioencoding,		IOENCODING)
 
 #define EXT2_FEATURE_COMPAT_SUPP	EXT4_FEATURE_COMPAT_EXT_ATTR
 #define EXT2_FEATURE_INCOMPAT_SUPP	(EXT4_FEATURE_INCOMPAT_FILETYPE| \
@@ -1767,6 +1771,7 @@  EXT4_FEATURE_INCOMPAT_FUNCS(encrypt,		ENCRYPT)
 					 EXT4_FEATURE_INCOMPAT_MMP | \
 					 EXT4_FEATURE_INCOMPAT_INLINE_DATA | \
 					 EXT4_FEATURE_INCOMPAT_ENCRYPT | \
+					 EXT4_FEATURE_INCOMPAT_IOENCODING | \
 					 EXT4_FEATURE_INCOMPAT_CSUM_SEED | \
 					 EXT4_FEATURE_INCOMPAT_LARGEDIR)
 #define EXT4_FEATURE_RO_COMPAT_SUPP	(EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 0c4c2201b3aa..53db9b6c7e33 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -42,6 +42,7 @@ 
 #include <linux/cleancache.h>
 #include <linux/uaccess.h>
 #include <linux/iversion.h>
+#include <linux/nls.h>
 
 #include <linux/kthread.h>
 #include <linux/freezer.h>
@@ -985,6 +986,7 @@  static void ext4_put_super(struct super_block *sb)
 		crypto_free_shash(sbi->s_chksum_driver);
 	kfree(sbi->s_blockgroup_lock);
 	fs_put_dax(sbi->s_daxdev);
+	unload_nls(sbi->encoding);
 	kfree(sbi);
 }
 
@@ -1378,6 +1380,7 @@  enum {
 	Opt_dioread_nolock, Opt_dioread_lock,
 	Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
 	Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
+	Opt_encoding,
 };
 
 static const match_table_t tokens = {
@@ -1460,6 +1463,7 @@  static const match_table_t tokens = {
 	{Opt_noinit_itable, "noinit_itable"},
 	{Opt_max_dir_size_kb, "max_dir_size_kb=%u"},
 	{Opt_test_dummy_encryption, "test_dummy_encryption"},
+	{Opt_encoding, "encoding=%s"},
 	{Opt_nombcache, "nombcache"},
 	{Opt_nombcache, "no_mbcache"},	/* for backward compatibility */
 	{Opt_removed, "check=none"},	/* mount option from ext2/3 */
@@ -1670,9 +1674,58 @@  static const struct mount_opts {
 	{Opt_max_dir_size_kb, 0, MOPT_GTE0},
 	{Opt_test_dummy_encryption, 0, MOPT_GTE0},
 	{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET},
+	{Opt_encoding, 0, MOPT_EXT4_ONLY | MOPT_STRING},
 	{Opt_err, 0, 0}
 };
 
+static const struct ext4_sb_encodings {
+	char *name;
+	char *version;
+} ext4_sb_encoding_map[] = {
+	/* 0x0 */	{"ascii", NULL},
+	/* 0x1 */	{"utf8n", "10.0.0"},
+};
+
+static const struct ext4_sb_encodings *
+ext4_sb_read_encoding(struct ext4_super_block *es)
+{
+	unsigned int magic = le32_to_cpu(es->s_ioencoding);
+
+	if (magic >= ARRAY_SIZE(ext4_sb_encoding_map))
+		return NULL;
+
+	return &ext4_sb_encoding_map[magic];
+}
+
+static const struct ext4_sb_encodings *ext4_parse_encoding_opt(const char *arg)
+{
+	int i, nlen;
+	const struct ext4_sb_encodings *e = NULL;
+	const char version_separator = '-';
+
+	for (i = 0; i < ARRAY_SIZE(ext4_sb_encoding_map); i++) {
+		e = &ext4_sb_encoding_map[i];
+		nlen = strlen(e->name);
+
+		if (strncmp(arg, e->name, nlen))
+			continue;
+
+		/* Encoding doesn't require version */
+		if (!e->version && !arg[nlen])
+			return e;
+
+		if (arg[nlen] != version_separator)
+			continue;
+
+		/* Eat out the separator */
+		nlen += 1;
+
+		if (!strcmp(&arg[nlen], e->version))
+			return e;
+	}
+	return NULL;
+}
+
 static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 			    substring_t *args, unsigned long *journal_devnum,
 			    unsigned int *journal_ioprio, int is_remount)
@@ -1905,6 +1958,40 @@  static int handle_mount_opt(struct super_block *sb, char *opt, int token,
 		sbi->s_mount_opt |= m->mount_opt;
 	} else if (token == Opt_data_err_ignore) {
 		sbi->s_mount_opt &= ~m->mount_opt;
+	} else if (token == Opt_encoding) {
+		const struct ext4_sb_encodings *encoding_info;
+		char *encoding = match_strdup(&args[0]);
+
+		if (!encoding)
+			return -ENOMEM;
+
+		if (ext4_has_feature_encrypt(sb)) {
+			ext4_msg(sb, KERN_ERR,
+				 "Can't mount with both encoding and encryption");
+			goto encoding_fail;
+		}
+
+		encoding_info = ext4_parse_encoding_opt(encoding);
+		if (!encoding_info) {
+			ext4_msg(sb, KERN_ERR,
+				 "Encoding %s not supported by ext4", encoding);
+			goto encoding_fail;
+		}
+
+		sbi->encoding = load_nls_version(encoding_info->name,
+						 encoding_info->version);
+		if (IS_ERR(sbi->encoding)) {
+			ext4_msg(sb, KERN_ERR, "Cannot load encoding: %s",
+				 encoding);
+			goto encoding_fail;
+		}
+
+		kfree(encoding);
+		return 0;
+encoding_fail:
+		sbi->encoding = NULL;
+		kfree(encoding);
+		return -1;
 	} else {
 		if (!args->from)
 			arg = 1;
@@ -3453,6 +3540,8 @@  static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 	int err = 0;
 	unsigned int journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
 	ext4_group_t first_not_zeroed;
+	struct nls_table *encoding;
+	const struct ext4_sb_encodings *encoding_info;
 
 	if ((data && !orig_data) || !sbi)
 		goto out_free_base;
@@ -3625,6 +3714,35 @@  static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 			   &journal_ioprio, 0))
 		goto failed_mount;
 
+	if (ext4_has_feature_ioencoding(sb) && !sbi->encoding) {
+		if (ext4_has_feature_encrypt(sb)) {
+			ext4_msg(sb, KERN_ERR,
+				 "Can't mount with both encoding and encryption");
+			goto failed_mount;
+		}
+
+		encoding_info = ext4_sb_read_encoding(es);
+		if (!encoding_info) {
+			ext4_msg(sb, KERN_ERR,
+				 "Encoding requested by superblock is unknown");
+			goto failed_mount;
+		}
+
+		encoding = load_nls_version(encoding_info->name,
+					    encoding_info->version);
+		if (IS_ERR(encoding)) {
+			ext4_msg(sb, KERN_ERR, "can't mount with superblock charset:"
+				 "%s-%s not supported by the kernel",
+				 encoding_info->name, encoding_info->version);
+			goto failed_mount;
+		}
+		ext4_msg(sb, KERN_INFO,
+			 "Using encoding defined by superblock: %s %s",
+			 encoding_info->name, encoding_info->version);
+
+		sbi->encoding = encoding;
+	}
+
 	if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) {
 		printk_once(KERN_WARNING "EXT4-fs: Warning: mounting "
 			    "with data=journal disables delayed "
@@ -4442,6 +4560,7 @@  static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		brelse(sbi->s_group_desc[i]);
 	kvfree(sbi->s_group_desc);
 failed_mount:
+	unload_nls(sbi->encoding);
 	if (sbi->s_chksum_driver)
 		crypto_free_shash(sbi->s_chksum_driver);
 #ifdef CONFIG_QUOTA