[v2,25/25] docs: ext4.txt: Document encoding and case-insensitive lookups

Message ID 20180815194811.9423-26-krisman@collabora.co.uk
State Not Applicable
Series Ext4 Encoding and Case-insensitive support

Commit Message

Gabriel Krisman Bertazi Aug. 15, 2018, 7:48 p.m. UTC
Introduce the encoding-awareness feature for ext4, explain some of the
design decisions, and document the mount options that enable it.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
---
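Illustrative sketch only, not part of the patch and not the kernel
implementation: a small user-space model of the lookup semantics that
section 2.3 describes.  It assumes UTF-8 as the filesystem encoding and
uses Python's unicodedata NFD normalization and str.casefold() as
stand-ins for whatever normalization and casefold operations the
encoding defines.  The names lookup_key() and same_entry() are
hypothetical.

```python
import unicodedata


def lookup_key(name: bytes, casefold: bool = False) -> bytes:
    """Return a hypothetical comparison key for a directory entry name."""
    try:
        # Strict decoding raises on invalid UTF-8 sequences.
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid name: treat the whole string as one opaque byte
        # sequence, so only an exact byte-for-byte match can find it.
        return name
    # Assumption: NFD stands in for the encoding's normalization.
    text = unicodedata.normalize("NFD", text)
    if casefold:
        # Assumption: str.casefold() stands in for the casefold operation.
        # Normalize again, since casefolding can produce composed forms.
        text = unicodedata.normalize("NFD", text.casefold())
    return text.encode("utf-8")


def same_entry(a: bytes, b: bytes, casefold: bool = False) -> bool:
    """True if two names would refer to the same entry in this model."""
    return lookup_key(a, casefold) == lookup_key(b, casefold)
```

In this model, the precomposed and decomposed spellings of an accented
name match each other, "README" matches "readme" only when casefold is
requested, and a name containing an invalid byte such as 0xff matches
only itself.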
 Documentation/filesystems/ext4.txt | 37 ++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)
Patch

diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 7f628b9f7c4b..57ce78c18b26 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -99,6 +99,8 @@  Note: More extensive information for getting started with ext4 can be
 * large block (up to pagesize) support
 * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
   the ordering)
+* Encoding-aware file names
+* Case-insensitive file name lookups
 
 [1] Filesystems with a block size of 1k may see a limit imposed by the
 directory hash tree having a maximum depth of two.
@@ -122,6 +124,32 @@  grouping of bitmaps and inode tables.  Some test results available here:
  - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-write-2.6.27-rc1.html
  - http://www.bullopensource.org/ext4/20080818-ffsb/ffsb-readwrite-2.6.27-rc1.html
 
+2.3 Encoding-aware file names and case-insensitive lookups
+==========================================================
+
+Ext4 optionally supports filesystem-wide charset knowledge when handling
+file names.  This lets the user look up a file using any
+charset-equivalent form of its name and, optionally, ensures that the
+filesystem holds no invalid names.  Charset encoding awareness is also
+essential for case-insensitive lookups, because the encoding is what
+defines the casefold operation.
+
+The case-insensitive file name lookup feature is supported at a finer
+granularity, on a per-directory basis, allowing the user to mix
+case-insensitive and case-sensitive directories in the same filesystem.
+It is enabled by flipping a file attribute on an empty directory.  For
+the reason stated above, the filesystem must have encoding enabled to
+use this feature.
+
+When we stop treating file names as opaque byte sequences and start
+seeing them as encoded strings, we need to address what happens when a
+program tries to create a file with an invalid name.  The Native
+Language Support (NLS) subsystem within the kernel leaves that decision
+to the filesystem, through the NLS strict mode setting.  When Ext4
+encounters such a string, it falls back to treating the entire string
+as one opaque byte sequence.  The user can still operate on that file,
+but case-insensitive and equivalent-sequence lookups won't work for it.
+
 3. Options
 ==========
 
@@ -388,6 +416,15 @@  dax			Use direct access (no page cache).  See
 			Documentation/filesystems/dax.txt.  Note that
 			this option is incompatible with data=journal.
 
+encoding		Enable a specific encoding for file name lookups.
+			This cannot be used with per-directory encryption and
+			will fail on filesystems that have that flag enabled.
+
+encoding_flags		A bitmask to configure how the encoding-aware mechanism
+			should function. It specifies whether to refuse invalid
+			sequences and the specific normalization and casefold
+			operations to use.
+
 Data Mode
 =========
 There are 3 different data modes: