From patchwork Tue Mar 23 18:31:57 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shreeya Patel X-Patchwork-Id: 1457421 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4F4g3y3pSJz9sTD for ; Wed, 24 Mar 2021 05:33:34 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232018AbhCWSdD (ORCPT ); Tue, 23 Mar 2021 14:33:03 -0400 Received: from bhuna.collabora.co.uk ([46.235.227.227]:53220 "EHLO bhuna.collabora.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231380AbhCWScm (ORCPT ); Tue, 23 Mar 2021 14:32:42 -0400 Received: from localhost.localdomain (unknown [IPv6:2401:4900:5170:240f:f606:c194:2a1c:c147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: shreeya) by bhuna.collabora.co.uk (Postfix) with ESMTPSA id 82FE31F44A66; Tue, 23 Mar 2021 18:32:32 +0000 (GMT) From: Shreeya Patel To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com, kernel test robot Subject: [PATCH v3 1/5] fs: unicode: Use strscpy() instead of strncpy() Date: Wed, 24 Mar 2021 00:01:57 +0530 Message-Id: <20210323183201.812944-2-shreeya.patel@collabora.com> X-Mailer: git-send-email 2.30.1 In-Reply-To: 
<20210323183201.812944-1-shreeya.patel@collabora.com>
References: <20210323183201.812944-1-shreeya.patel@collabora.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

The following warning was reported by the kernel test robot:

In function 'utf8_parse_version',
    inlined from 'utf8_load' at fs/unicode/utf8mod.c:195:7:
>> fs/unicode/utf8mod.c:175:2: warning: 'strncpy' specified bound 12 equals destination size [-Wstringop-truncation]
  175 | strncpy(version_string, version, sizeof(version_string));
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The -Wstringop-truncation warning flags uses of strncpy() that can drop the terminating NUL character of the source string, leaving the destination unterminated.

Unlike strncpy(), strscpy() always NUL-terminates the destination string, hence use strscpy() instead of strncpy().

Fixes: 9d53690f0d4e5 (unicode: implement higher level API for string handling)
Signed-off-by: Shreeya Patel
Reported-by: kernel test robot
Acked-by: Gabriel Krisman Bertazi
---
Changes in v3
  - Return error if strscpy() returns value < 0

Changes in v2
  - Resolve warning of -Wstringop-truncation reported by kernel test robot.
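For illustration only, and not part of the patch: a minimal userspace sketch of the behaviour the commit message relies on. demo_strscpy() is a hypothetical stand-in written for this note. It mimics the documented contract of the kernel's strscpy() (always NUL-terminate, return the copied length, return -E2BIG on truncation) but it is not the kernel implementation.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace stand-in for strscpy(): copy at most size - 1
 * bytes from src, always NUL-terminate dst, and report truncation.
 * Returns the number of characters copied (excluding the NUL), or
 * -E2BIG if src did not fit or size is 0.
 */
static long demo_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	/* Leave room for the terminator in the last byte of dst. */
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If src still has characters left, the copy was truncated. */
	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

With a 12-byte buffer such as version_string, copying "12.1.0" returns 6, while a 12-character source returns -E2BIG and leaves an 11-character, NUL-terminated prefix in the buffer. That negative return is what the new "if (ret < 0) return ret;" check in the patch propagates.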
fs/unicode/utf8-core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c index dc25823bf..706f086bb 100644 --- a/fs/unicode/utf8-core.c +++ b/fs/unicode/utf8-core.c @@ -180,7 +180,10 @@ static int utf8_parse_version(const char *version, unsigned int *maj, {0, NULL} }; - strncpy(version_string, version, sizeof(version_string)); + int ret = strscpy(version_string, version, sizeof(version_string)); + + if (ret < 0) + return ret; if (match_token(version_string, token, args) != 1) return -EINVAL; From patchwork Tue Mar 23 18:31:58 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shreeya Patel X-Patchwork-Id: 1457422 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4F4g3y5frkz9sWF for ; Wed, 24 Mar 2021 05:33:34 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232088AbhCWSdE (ORCPT ); Tue, 23 Mar 2021 14:33:04 -0400 Received: from bhuna.collabora.co.uk ([46.235.227.227]:53240 "EHLO bhuna.collabora.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231833AbhCWScz (ORCPT ); Tue, 23 Mar 2021 14:32:55 -0400 Received: from localhost.localdomain (unknown [IPv6:2401:4900:5170:240f:f606:c194:2a1c:c147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: shreeya) by bhuna.collabora.co.uk (Postfix) with ESMTPSA id 368961F44A70; Tue, 23 Mar 2021 18:32:41 +0000 (GMT) From: Shreeya Patel To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, 
chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com
Subject: [PATCH v3 2/5] fs: Check if utf8 encoding is loaded before calling utf8_unload()
Date: Wed, 24 Mar 2021 00:01:58 +0530
Message-Id: <20210323183201.812944-3-shreeya.patel@collabora.com>
X-Mailer: git-send-email 2.30.1
In-Reply-To: <20210323183201.812944-1-shreeya.patel@collabora.com>
References: <20210323183201.812944-1-shreeya.patel@collabora.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

utf8_unload() is called whenever CONFIG_UNICODE is enabled. The ifdef block does not check whether the utf8 encoding has been loaded before calling utf8_unload(). This is not the expected behavior, since it can lead to unloading utf8 before it was ever loaded.

Hence, add a condition that checks that sb->s_encoding is not NULL before calling utf8_unload().

Signed-off-by: Shreeya Patel
Reviewed-by: Gabriel Krisman Bertazi
---
Changes in v3
  - Add this patch to the series, which checks whether the utf8 encoding was loaded before calling utf8_unload().
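For illustration only, and not part of the patch: a small userspace sketch of the guard pattern described above. All demo_* names are hypothetical and exist only for this note; struct demo_sb stands in for the super block, and demo_load()/demo_unload() stand in for the load/unload pair. The sketch also clears the pointer after unloading, as the f2fs_fill_super() error path does.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct unicode_map and struct super_block. */
struct demo_map {
	int version;
};

struct demo_sb {
	struct demo_map *s_encoding;	/* NULL until an encoding is loaded */
};

/* Counts unload calls so the effect of the guard can be observed. */
static int demo_unload_calls;

static struct demo_map *demo_load(int version)
{
	struct demo_map *um = malloc(sizeof(*um));

	if (um)
		um->version = version;
	return um;
}

static void demo_unload(struct demo_map *um)
{
	demo_unload_calls++;
	free(um);
}

/* Teardown path: only unload an encoding that was actually loaded. */
static void demo_put_super(struct demo_sb *sb)
{
	if (sb->s_encoding) {
		demo_unload(sb->s_encoding);
		sb->s_encoding = NULL;
	}
}
```

Calling demo_put_super() on a structure whose s_encoding was never set does not reach demo_unload() at all, and calling it twice after a successful load unloads only once.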
fs/ext4/super.c | 6 ++++-- fs/f2fs/super.c | 9 ++++++--- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index ad34a3727..e438d14f9 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1259,7 +1259,8 @@ static void ext4_put_super(struct super_block *sb) fs_put_dax(sbi->s_daxdev); fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy); #ifdef CONFIG_UNICODE - utf8_unload(sb->s_encoding); + if (sb->s_encoding) + utf8_unload(sb->s_encoding); #endif kfree(sbi); } @@ -5165,7 +5166,8 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) crypto_free_shash(sbi->s_chksum_driver); #ifdef CONFIG_UNICODE - utf8_unload(sb->s_encoding); + if (sb->s_encoding) + utf8_unload(sb->s_encoding); #endif #ifdef CONFIG_QUOTA diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 706979375..0a04983c2 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1430,7 +1430,8 @@ static void f2fs_put_super(struct super_block *sb) for (i = 0; i < NR_PAGE_TYPE; i++) kvfree(sbi->write_io[i]); #ifdef CONFIG_UNICODE - utf8_unload(sb->s_encoding); + if (sb->s_encoding) + utf8_unload(sb->s_encoding); #endif kfree(sbi); } @@ -4073,8 +4074,10 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) kvfree(sbi->write_io[i]); #ifdef CONFIG_UNICODE - utf8_unload(sb->s_encoding); - sb->s_encoding = NULL; + if (sb->s_encoding) { + utf8_unload(sb->s_encoding); + sb->s_encoding = NULL; + } #endif free_options: #ifdef CONFIG_QUOTA From patchwork Tue Mar 23 18:31:59 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shreeya Patel X-Patchwork-Id: 1457423 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) 
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4F4g4Y31Dkz9sR4 for ; Wed, 24 Mar 2021 05:34:05 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232288AbhCWSde (ORCPT ); Tue, 23 Mar 2021 14:33:34 -0400
Received: from bhuna.collabora.co.uk ([46.235.227.227]:53242 "EHLO bhuna.collabora.co.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231814AbhCWSdE (ORCPT ); Tue, 23 Mar 2021 14:33:04 -0400
Received: from localhost.localdomain (unknown [IPv6:2401:4900:5170:240f:f606:c194:2a1c:c147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: shreeya) by bhuna.collabora.co.uk (Postfix) with ESMTPSA id DA5631F44A67; Tue, 23 Mar 2021 18:32:53 +0000 (GMT)
From: Shreeya Patel
To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com
Subject: [PATCH v3 3/5] fs: unicode: Rename function names from utf8 to unicode
Date: Wed, 24 Mar 2021 00:01:59 +0530
Message-Id: <20210323183201.812944-4-shreeya.patel@collabora.com>
X-Mailer: git-send-email 2.30.1
In-Reply-To: <20210323183201.812944-1-shreeya.patel@collabora.com>
References: <20210323183201.812944-1-shreeya.patel@collabora.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

Rename the functions from utf8_* to unicode_* as a first step towards transforming the utf8-core file into the unicode subsystem layer file.
Signed-off-by: Shreeya Patel Reviewed-by: Gabriel Krisman Bertazi --- fs/ext4/hash.c | 2 +- fs/ext4/namei.c | 12 ++++---- fs/ext4/super.c | 6 ++-- fs/f2fs/dir.c | 12 ++++---- fs/f2fs/super.c | 6 ++-- fs/libfs.c | 6 ++-- fs/unicode/utf8-core.c | 57 +++++++++++++++++++------------------- fs/unicode/utf8-selftest.c | 8 +++--- include/linux/unicode.h | 32 ++++++++++----------- 9 files changed, 70 insertions(+), 71 deletions(-) diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c index a92eb79de..8890a76ab 100644 --- a/fs/ext4/hash.c +++ b/fs/ext4/hash.c @@ -285,7 +285,7 @@ int ext4fs_dirhash(const struct inode *dir, const char *name, int len, if (!buff) return -ENOMEM; - dlen = utf8_casefold(um, &qstr, buff, PATH_MAX); + dlen = unicode_casefold(um, &qstr, buff, PATH_MAX); if (dlen < 0) { kfree(buff); goto opaque_seq; diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 686bf982c..dde5ce795 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1290,9 +1290,9 @@ int ext4_ci_compare(const struct inode *parent, const struct qstr *name, int ret; if (quick) - ret = utf8_strncasecmp_folded(um, name, entry); + ret = unicode_strncasecmp_folded(um, name, entry); else - ret = utf8_strncasecmp(um, name, entry); + ret = unicode_strncasecmp(um, name, entry); if (ret < 0) { /* Handle invalid character sequence as either an error @@ -1324,9 +1324,9 @@ void ext4_fname_setup_ci_filename(struct inode *dir, const struct qstr *iname, if (!cf_name->name) return; - len = utf8_casefold(dir->i_sb->s_encoding, - iname, cf_name->name, - EXT4_NAME_LEN); + len = unicode_casefold(dir->i_sb->s_encoding, + iname, cf_name->name, + EXT4_NAME_LEN); if (len <= 0) { kfree(cf_name->name); cf_name->name = NULL; @@ -2201,7 +2201,7 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry, #ifdef CONFIG_UNICODE if (sb_has_strict_encoding(sb) && IS_CASEFOLDED(dir) && - sb->s_encoding && utf8_validate(sb->s_encoding, &dentry->d_name)) + sb->s_encoding && unicode_validate(sb->s_encoding, &dentry->d_name)) 
return -EINVAL; #endif diff --git a/fs/ext4/super.c b/fs/ext4/super.c index e438d14f9..853aeb294 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1260,7 +1260,7 @@ static void ext4_put_super(struct super_block *sb) fscrypt_free_dummy_policy(&sbi->s_dummy_enc_policy); #ifdef CONFIG_UNICODE if (sb->s_encoding) - utf8_unload(sb->s_encoding); + unicode_unload(sb->s_encoding); #endif kfree(sbi); } @@ -4305,7 +4305,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) goto failed_mount; } - encoding = utf8_load(encoding_info->version); + encoding = unicode_load(encoding_info->version); if (IS_ERR(encoding)) { ext4_msg(sb, KERN_ERR, "can't mount with superblock charset: %s-%s " @@ -5167,7 +5167,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) #ifdef CONFIG_UNICODE if (sb->s_encoding) - utf8_unload(sb->s_encoding); + unicode_unload(sb->s_encoding); #endif #ifdef CONFIG_QUOTA diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c index e6270a867..f160f9dd6 100644 --- a/fs/f2fs/dir.c +++ b/fs/f2fs/dir.c @@ -84,10 +84,10 @@ int f2fs_init_casefolded_name(const struct inode *dir, GFP_NOFS); if (!fname->cf_name.name) return -ENOMEM; - fname->cf_name.len = utf8_casefold(sb->s_encoding, - fname->usr_fname, - fname->cf_name.name, - F2FS_NAME_LEN); + fname->cf_name.len = unicode_casefold(sb->s_encoding, + fname->usr_fname, + fname->cf_name.name, + F2FS_NAME_LEN); if ((int)fname->cf_name.len <= 0) { kfree(fname->cf_name.name); fname->cf_name.name = NULL; @@ -237,7 +237,7 @@ static int f2fs_match_ci_name(const struct inode *dir, const struct qstr *name, entry.len = decrypted_name.len; } - res = utf8_strncasecmp_folded(um, name, &entry); + res = unicode_strncasecmp_folded(um, name, &entry); /* * In strict mode, ignore invalid names. In non-strict mode, * fall back to treating them as opaque byte sequences. 
@@ -246,7 +246,7 @@ static int f2fs_match_ci_name(const struct inode *dir, const struct qstr *name, res = name->len == entry.len && memcmp(name->name, entry.name, name->len) == 0; } else { - /* utf8_strncasecmp_folded returns 0 on match */ + /* unicode_strncasecmp_folded returns 0 on match */ res = (res == 0); } out: diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 0a04983c2..a0cd9bfa4 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -1431,7 +1431,7 @@ static void f2fs_put_super(struct super_block *sb) kvfree(sbi->write_io[i]); #ifdef CONFIG_UNICODE if (sb->s_encoding) - utf8_unload(sb->s_encoding); + unicode_unload(sb->s_encoding); #endif kfree(sbi); } @@ -3561,7 +3561,7 @@ static int f2fs_setup_casefold(struct f2fs_sb_info *sbi) return -EINVAL; } - encoding = utf8_load(encoding_info->version); + encoding = unicode_load(encoding_info->version); if (IS_ERR(encoding)) { f2fs_err(sbi, "can't mount with superblock charset: %s-%s " @@ -4075,7 +4075,7 @@ static int f2fs_fill_super(struct super_block *sb, void *data, int silent) #ifdef CONFIG_UNICODE if (sb->s_encoding) { - utf8_unload(sb->s_encoding); + unicode_unload(sb->s_encoding); sb->s_encoding = NULL; } #endif diff --git a/fs/libfs.c b/fs/libfs.c index e2de5401a..766556165 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -1404,7 +1404,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, * If the dentry name is stored in-line, then it may be concurrently * modified by a rename. If this happens, the VFS will eventually retry * the lookup, so it doesn't matter what ->d_compare() returns. - * However, it's unsafe to call utf8_strncasecmp() with an unstable + * However, it's unsafe to call unicode_strncasecmp() with an unstable * string. Therefore, we have to copy the name into a temporary buffer. 
*/ if (len <= DNAME_INLINE_LEN - 1) { @@ -1414,7 +1414,7 @@ static int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, /* prevent compiler from optimizing out the temporary buffer */ barrier(); } - ret = utf8_strncasecmp(um, name, &qstr); + ret = unicode_strncasecmp(um, name, &qstr); if (ret >= 0) return ret; @@ -1443,7 +1443,7 @@ static int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str) if (!dir || !needs_casefold(dir)) return 0; - ret = utf8_casefold_hash(um, dentry, str); + ret = unicode_casefold_hash(um, dentry, str); if (ret < 0 && sb_has_strict_encoding(sb)) return -EINVAL; return 0; diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c index 706f086bb..686e95e90 100644 --- a/fs/unicode/utf8-core.c +++ b/fs/unicode/utf8-core.c @@ -10,7 +10,7 @@ #include "utf8n.h" -int utf8_validate(const struct unicode_map *um, const struct qstr *str) +int unicode_validate(const struct unicode_map *um, const struct qstr *str) { const struct utf8data *data = utf8nfdi(um->version); @@ -18,10 +18,10 @@ int utf8_validate(const struct unicode_map *um, const struct qstr *str) return -1; return 0; } -EXPORT_SYMBOL(utf8_validate); +EXPORT_SYMBOL(unicode_validate); -int utf8_strncmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2) +int unicode_strncmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) { const struct utf8data *data = utf8nfdi(um->version); struct utf8cursor cur1, cur2; @@ -45,10 +45,10 @@ int utf8_strncmp(const struct unicode_map *um, return 0; } -EXPORT_SYMBOL(utf8_strncmp); +EXPORT_SYMBOL(unicode_strncmp); -int utf8_strncasecmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2) +int unicode_strncasecmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) { const struct utf8data *data = utf8nfdicf(um->version); struct utf8cursor cur1, cur2; @@ -72,14 +72,14 @@ int utf8_strncasecmp(const struct unicode_map *um, return 
0; } -EXPORT_SYMBOL(utf8_strncasecmp); +EXPORT_SYMBOL(unicode_strncasecmp); /* String cf is expected to be a valid UTF-8 casefolded * string. */ -int utf8_strncasecmp_folded(const struct unicode_map *um, - const struct qstr *cf, - const struct qstr *s1) +int unicode_strncasecmp_folded(const struct unicode_map *um, + const struct qstr *cf, + const struct qstr *s1) { const struct utf8data *data = utf8nfdicf(um->version); struct utf8cursor cur1; @@ -100,10 +100,10 @@ int utf8_strncasecmp_folded(const struct unicode_map *um, return 0; } -EXPORT_SYMBOL(utf8_strncasecmp_folded); +EXPORT_SYMBOL(unicode_strncasecmp_folded); -int utf8_casefold(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen) +int unicode_casefold(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) { const struct utf8data *data = utf8nfdicf(um->version); struct utf8cursor cur; @@ -123,10 +123,10 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str, } return -EINVAL; } -EXPORT_SYMBOL(utf8_casefold); +EXPORT_SYMBOL(unicode_casefold); -int utf8_casefold_hash(const struct unicode_map *um, const void *salt, - struct qstr *str) +int unicode_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str) { const struct utf8data *data = utf8nfdicf(um->version); struct utf8cursor cur; @@ -144,10 +144,10 @@ int utf8_casefold_hash(const struct unicode_map *um, const void *salt, str->hash = end_name_hash(hash); return 0; } -EXPORT_SYMBOL(utf8_casefold_hash); +EXPORT_SYMBOL(unicode_casefold_hash); -int utf8_normalize(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen) +int unicode_normalize(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) { const struct utf8data *data = utf8nfdi(um->version); struct utf8cursor cur; @@ -167,11 +167,10 @@ int utf8_normalize(const struct unicode_map *um, const struct qstr *str, } return -EINVAL; } 
+EXPORT_SYMBOL(unicode_normalize); -EXPORT_SYMBOL(utf8_normalize); - -static int utf8_parse_version(const char *version, unsigned int *maj, - unsigned int *min, unsigned int *rev) +static int unicode_parse_version(const char *version, unsigned int *maj, + unsigned int *min, unsigned int *rev) { substring_t args[3]; char version_string[12]; @@ -195,7 +194,7 @@ static int utf8_parse_version(const char *version, unsigned int *maj, return 0; } -struct unicode_map *utf8_load(const char *version) +struct unicode_map *unicode_load(const char *version) { struct unicode_map *um = NULL; int unicode_version; @@ -203,7 +202,7 @@ struct unicode_map *utf8_load(const char *version) if (version) { unsigned int maj, min, rev; - if (utf8_parse_version(version, &maj, &min, &rev) < 0) + if (unicode_parse_version(version, &maj, &min, &rev) < 0) return ERR_PTR(-EINVAL); if (!utf8version_is_supported(maj, min, rev)) @@ -228,12 +227,12 @@ struct unicode_map *utf8_load(const char *version) return um; } -EXPORT_SYMBOL(utf8_load); +EXPORT_SYMBOL(unicode_load); -void utf8_unload(struct unicode_map *um) +void unicode_unload(struct unicode_map *um) { kfree(um); } -EXPORT_SYMBOL(utf8_unload); +EXPORT_SYMBOL(unicode_unload); MODULE_LICENSE("GPL v2"); diff --git a/fs/unicode/utf8-selftest.c b/fs/unicode/utf8-selftest.c index 6fe8af7ed..796c1ed92 100644 --- a/fs/unicode/utf8-selftest.c +++ b/fs/unicode/utf8-selftest.c @@ -235,7 +235,7 @@ static void check_utf8_nfdicf(void) static void check_utf8_comparisons(void) { int i; - struct unicode_map *table = utf8_load("12.1.0"); + struct unicode_map *table = unicode_load("12.1.0"); if (IS_ERR(table)) { pr_err("%s: Unable to load utf8 %d.%d.%d. 
Skipping.\n", @@ -249,7 +249,7 @@ static void check_utf8_comparisons(void) const struct qstr s2 = {.name = nfdi_test_data[i].dec, .len = sizeof(nfdi_test_data[i].dec)}; - test_f(!utf8_strncmp(table, &s1, &s2), + test_f(!unicode_strncmp(table, &s1, &s2), "%s %s comparison mismatch\n", s1.name, s2.name); } @@ -259,11 +259,11 @@ static void check_utf8_comparisons(void) const struct qstr s2 = {.name = nfdicf_test_data[i].ncf, .len = sizeof(nfdicf_test_data[i].ncf)}; - test_f(!utf8_strncasecmp(table, &s1, &s2), + test_f(!unicode_strncasecmp(table, &s1, &s2), "%s %s comparison mismatch\n", s1.name, s2.name); } - utf8_unload(table); + unicode_unload(table); } static void check_supported_versions(void) diff --git a/include/linux/unicode.h b/include/linux/unicode.h index 74484d44c..de23f9ee7 100644 --- a/include/linux/unicode.h +++ b/include/linux/unicode.h @@ -10,27 +10,27 @@ struct unicode_map { int version; }; -int utf8_validate(const struct unicode_map *um, const struct qstr *str); +int unicode_validate(const struct unicode_map *um, const struct qstr *str); -int utf8_strncmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2); +int unicode_strncmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2); -int utf8_strncasecmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2); -int utf8_strncasecmp_folded(const struct unicode_map *um, - const struct qstr *cf, - const struct qstr *s1); +int unicode_strncasecmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2); +int unicode_strncasecmp_folded(const struct unicode_map *um, + const struct qstr *cf, + const struct qstr *s1); -int utf8_normalize(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen); +int unicode_normalize(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen); -int utf8_casefold(const struct unicode_map *um, const struct qstr *str, - 
unsigned char *dest, size_t dlen); +int unicode_casefold(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen); -int utf8_casefold_hash(const struct unicode_map *um, const void *salt, - struct qstr *str); +int unicode_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str); -struct unicode_map *utf8_load(const char *version); -void utf8_unload(struct unicode_map *um); +struct unicode_map *unicode_load(const char *version); +void unicode_unload(struct unicode_map *um); #endif /* _LINUX_UNICODE_H */ From patchwork Tue Mar 23 18:32:00 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shreeya Patel X-Patchwork-Id: 1457424 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4F4g4Y5KNlz9sTD for ; Wed, 24 Mar 2021 05:34:05 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232323AbhCWSdg (ORCPT ); Tue, 23 Mar 2021 14:33:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45662 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232208AbhCWSdO (ORCPT ); Tue, 23 Mar 2021 14:33:14 -0400 Received: from bhuna.collabora.co.uk (bhuna.collabora.co.uk [IPv6:2a00:1098:0:82:1000:25:2eeb:e3e3]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B3FD1C061574; Tue, 23 Mar 2021 11:33:14 -0700 (PDT) Received: from localhost.localdomain (unknown [IPv6:2401:4900:5170:240f:f606:c194:2a1c:c147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated 
sender: shreeya) by bhuna.collabora.co.uk (Postfix) with ESMTPSA id 484801F44A65; Tue, 23 Mar 2021 18:33:04 +0000 (GMT)
From: Shreeya Patel
To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com
Subject: [PATCH v3 4/5] fs: unicode: Rename utf8-core file to unicode-core
Date: Wed, 24 Mar 2021 00:02:00 +0530
Message-Id: <20210323183201.812944-5-shreeya.patel@collabora.com>
X-Mailer: git-send-email 2.30.1
In-Reply-To: <20210323183201.812944-1-shreeya.patel@collabora.com>
References: <20210323183201.812944-1-shreeya.patel@collabora.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

Rename utf8-core to unicode-core as part of transforming the utf8-core file into the unicode subsystem layer file. The new name also makes the file's role easier to understand.
Signed-off-by: Shreeya Patel Acked-by: Gabriel Krisman Bertazi --- fs/unicode/Makefile | 2 +- fs/unicode/{utf8-core.c => unicode-core.c} | 0 2 files changed, 1 insertion(+), 1 deletion(-) rename fs/unicode/{utf8-core.c => unicode-core.c} (100%) diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile index b88aecc86..fbf9a629e 100644 --- a/fs/unicode/Makefile +++ b/fs/unicode/Makefile @@ -3,7 +3,7 @@ obj-$(CONFIG_UNICODE) += unicode.o obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o -unicode-y := utf8-norm.o utf8-core.o +unicode-y := utf8-norm.o unicode-core.o $(obj)/utf8-norm.o: $(obj)/utf8data.h diff --git a/fs/unicode/utf8-core.c b/fs/unicode/unicode-core.c similarity index 100% rename from fs/unicode/utf8-core.c rename to fs/unicode/unicode-core.c From patchwork Tue Mar 23 18:32:01 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shreeya Patel X-Patchwork-Id: 1457425 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Authentication-Results: ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=vger.kernel.org (client-ip=23.128.96.18; helo=vger.kernel.org; envelope-from=linux-ext4-owner@vger.kernel.org; receiver=) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by ozlabs.org (Postfix) with ESMTP id 4F4g4Z0WBjz9sR4 for ; Wed, 24 Mar 2021 05:34:06 +1100 (AEDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232348AbhCWSdi (ORCPT ); Tue, 23 Mar 2021 14:33:38 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45728 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232276AbhCWSdd (ORCPT ); Tue, 23 Mar 2021 14:33:33 -0400 Received: from bhuna.collabora.co.uk (bhuna.collabora.co.uk [IPv6:2a00:1098:0:82:1000:25:2eeb:e3e3]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 40CE9C061574; Tue, 23 Mar 2021 11:33:32 -0700 (PDT) Received: 
from localhost.localdomain (unknown [IPv6:2401:4900:5170:240f:f606:c194:2a1c:c147]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) (Authenticated sender: shreeya) by bhuna.collabora.co.uk (Postfix) with ESMTPSA id 170CF1F44A6A; Tue, 23 Mar 2021 18:33:13 +0000 (GMT)
From: Shreeya Patel
To: tytso@mit.edu, adilger.kernel@dilger.ca, jaegeuk@kernel.org, chao@kernel.org, krisman@collabora.com, ebiggers@google.com, drosen@google.com, ebiggers@kernel.org, yuchao0@huawei.com
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-fsdevel@vger.kernel.org, kernel@collabora.com, andre.almeida@collabora.com
Subject: [PATCH v3 5/5] fs: unicode: Add utf8 module and a unicode layer
Date: Wed, 24 Mar 2021 00:02:01 +0530
Message-Id: <20210323183201.812944-6-shreeya.patel@collabora.com>
X-Mailer: git-send-email 2.30.1
In-Reply-To: <20210323183201.812944-1-shreeya.patel@collabora.com>
References: <20210323183201.812944-1-shreeya.patel@collabora.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-ext4@vger.kernel.org

utf8data.h_shipped contains a large database table, an auto-generated decoding trie used by the unicode normalization functions. It is not necessary to load this large table into the kernel if no file system is using it, hence make the UTF-8 encoding loadable by converting it into a module.

Modify unicode-core so that it acts as a layer for the unicode subsystem. It will load the UTF-8 module and access its functions whenever a filesystem that needs unicode is mounted.

Also, indirect calls through function pointers are exposed to speculative execution attacks, hence use static_call() in unicode.h and unicode-core.c in order to prevent these attacks by making direct calls, which also performs better than calling through function pointers.
Signed-off-by: Shreeya Patel --- Changes in v3 - Correct the conditions to prevent NULL pointer dereference while accessing functions via utf8_ops variable. - Add spinlock to avoid race conditions that could occur if the module is deregistered after checking utf8_ops and before doing the try_module_get() in the following if condition if (!utf8_ops || !try_module_get(utf8_ops->owner) - Use static_call() for preventing speculative execution attacks. - WARN_ON in case utf8_ops is NULL in unicode_unload(). - Rename module file from utf8mod to unicode-utf8. Changes in v2 - Remove the duplicate file utf8-core.c - Make the wrapper functions inline. - Remove msleep and use try_module_get() and module_put() for ensuring that module is loaded correctly and also doesn't get unloaded while in use. fs/unicode/Kconfig | 11 +- fs/unicode/Makefile | 5 +- fs/unicode/unicode-core.c | 268 +++++++++++++------------------------- fs/unicode/unicode-utf8.c | 255 ++++++++++++++++++++++++++++++++++++ include/linux/unicode.h | 99 ++++++++++++-- 5 files changed, 441 insertions(+), 197 deletions(-) create mode 100644 fs/unicode/unicode-utf8.c diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig index 2c27b9a5c..2961b0206 100644 --- a/fs/unicode/Kconfig +++ b/fs/unicode/Kconfig @@ -8,7 +8,16 @@ config UNICODE Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding support. +# UTF-8 encoding can be compiled as a module using UNICODE_UTF8 option. +# Having UTF-8 encoding as a module will avoid carrying large +# database table present in utf8data.h_shipped into the kernel +# by being able to load it only when it is required by the filesystem. 
+config UNICODE_UTF8 + tristate "UTF-8 module" + depends on UNICODE + default m + config UNICODE_NORMALIZATION_SELFTEST tristate "Test UTF-8 normalization support" - depends on UNICODE + depends on UNICODE_UTF8 default n --- a/fs/unicode/Makefile +++ b/fs/unicode/Makefile @@ -1,11 +1,14 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_UNICODE) += unicode.o +obj-$(CONFIG_UNICODE_UTF8) += utf8.o obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o -unicode-y := utf8-norm.o unicode-core.o +unicode-y := unicode-core.o +utf8-y := unicode-utf8.o utf8-norm.o $(obj)/utf8-norm.o: $(obj)/utf8data.h +$(obj)/unicode-utf8.o: $(obj)/utf8-norm.o # In the normal build, the checked-in utf8data.h is just shipped. # --- a/fs/unicode/unicode-core.c +++ b/fs/unicode/unicode-core.c @@ -1,238 +1,144 @@ /* SPDX-License-Identifier: GPL-2.0 */ #include #include -#include #include -#include #include #include -#include +#include -#include "utf8n.h" +DEFINE_SPINLOCK(utf8ops_lock); -int unicode_validate(const struct unicode_map *um, const struct qstr *str) -{ - const struct utf8data *data = utf8nfdi(um->version); - - if (utf8nlen(data, str->name, str->len) < 0) - return -1; - return 0; -} +struct unicode_ops *utf8_ops; +EXPORT_SYMBOL(utf8_ops); + +int _utf8_validate(const struct unicode_map *um, const struct qstr *str) +{ + return 0; +} -EXPORT_SYMBOL(unicode_validate); -int unicode_strncmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2) -{ - const struct utf8data *data = utf8nfdi(um->version); - struct utf8cursor cur1, cur2; - int c1, c2; - - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) - return -EINVAL; - - if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) - return -EINVAL; - - do { - c1 = utf8byte(&cur1); - c2 = utf8byte(&cur2); - - if (c1 < 0 || c2 < 0) - return -EINVAL; - if (c1 != c2) - return 1; - } while (c1); - - return 0; -} +int _utf8_strncmp(const struct unicode_map *um, const struct qstr *s1, + const struct qstr *s2) +{ + return 
0; +} -EXPORT_SYMBOL(unicode_strncmp); -int unicode_strncasecmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2) -{ - const struct utf8data *data = utf8nfdicf(um->version); - struct utf8cursor cur1, cur2; - int c1, c2; - - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) - return -EINVAL; - - if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) - return -EINVAL; - - do { - c1 = utf8byte(&cur1); - c2 = utf8byte(&cur2); - - if (c1 < 0 || c2 < 0) - return -EINVAL; - if (c1 != c2) - return 1; - } while (c1); - - return 0; -} +int _utf8_strncasecmp(const struct unicode_map *um, const struct qstr *s1, + const struct qstr *s2) +{ + return 0; +} -EXPORT_SYMBOL(unicode_strncasecmp); -/* String cf is expected to be a valid UTF-8 casefolded - * string. - */ -int unicode_strncasecmp_folded(const struct unicode_map *um, - const struct qstr *cf, - const struct qstr *s1) -{ - const struct utf8data *data = utf8nfdicf(um->version); - struct utf8cursor cur1; - int c1, c2; - int i = 0; - - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) - return -EINVAL; - - do { - c1 = utf8byte(&cur1); - c2 = cf->name[i++]; - if (c1 < 0) - return -EINVAL; - if (c1 != c2) - return 1; - } while (c1); - - return 0; -} +int _utf8_strncasecmp_folded(const struct unicode_map *um, + const struct qstr *cf, const struct qstr *s1) +{ + return 0; +} -EXPORT_SYMBOL(unicode_strncasecmp_folded); -int unicode_casefold(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen) -{ - const struct utf8data *data = utf8nfdicf(um->version); - struct utf8cursor cur; - size_t nlen = 0; - - if (utf8ncursor(&cur, data, str->name, str->len) < 0) - return -EINVAL; - - for (nlen = 0; nlen < dlen; nlen++) { - int c = utf8byte(&cur); - - dest[nlen] = c; - if (!c) - return nlen; - if (c == -1) - break; - } - return -EINVAL; -} +int _utf8_normalize(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + return 0; +} 
-EXPORT_SYMBOL(unicode_casefold); -int unicode_casefold_hash(const struct unicode_map *um, const void *salt, - struct qstr *str) -{ - const struct utf8data *data = utf8nfdicf(um->version); - struct utf8cursor cur; - int c; - unsigned long hash = init_name_hash(salt); - - if (utf8ncursor(&cur, data, str->name, str->len) < 0) - return -EINVAL; - - while ((c = utf8byte(&cur))) { - if (c < 0) - return -EINVAL; - hash = partial_name_hash((unsigned char)c, hash); - } - str->hash = end_name_hash(hash); - return 0; -} +int _utf8_casefold(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + return 0; +} -EXPORT_SYMBOL(unicode_casefold_hash); -int unicode_normalize(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen) -{ - const struct utf8data *data = utf8nfdi(um->version); - struct utf8cursor cur; - ssize_t nlen = 0; - - if (utf8ncursor(&cur, data, str->name, str->len) < 0) - return -EINVAL; - - for (nlen = 0; nlen < dlen; nlen++) { - int c = utf8byte(&cur); - - dest[nlen] = c; - if (!c) - return nlen; - if (c == -1) - break; - } - return -EINVAL; -} +int _utf8_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str) +{ + return 0; +} + +struct unicode_map *_utf8_load(const char *version) +{ + return NULL; +} -EXPORT_SYMBOL(unicode_normalize); -static int unicode_parse_version(const char *version, unsigned int *maj, - unsigned int *min, unsigned int *rev) -{ - substring_t args[3]; - char version_string[12]; - static const struct match_token token[] = { - {1, "%d.%d.%d"}, - {0, NULL} - }; - - int ret = strscpy(version_string, version, sizeof(version_string)); - - if (ret < 0) - return ret; - - if (match_token(version_string, token, args) != 1) - return -EINVAL; - - if (match_int(&args[0], maj) || match_int(&args[1], min) || - match_int(&args[2], rev)) - return -EINVAL; - - return 0; -} +void _utf8_unload(struct unicode_map *um) +{ + return; +} + 
+DEFINE_STATIC_CALL(utf8_validate, _utf8_validate); +DEFINE_STATIC_CALL(utf8_strncmp, _utf8_strncmp); +DEFINE_STATIC_CALL(utf8_strncasecmp, _utf8_strncasecmp); +DEFINE_STATIC_CALL(utf8_strncasecmp_folded, _utf8_strncasecmp_folded); +DEFINE_STATIC_CALL(utf8_normalize, _utf8_normalize); +DEFINE_STATIC_CALL(utf8_casefold, _utf8_casefold); +DEFINE_STATIC_CALL(utf8_casefold_hash, _utf8_casefold_hash); +DEFINE_STATIC_CALL(utf8_load, _utf8_load); +DEFINE_STATIC_CALL_NULL(utf8_unload, _utf8_unload); +EXPORT_STATIC_CALL(utf8_strncmp); +EXPORT_STATIC_CALL(utf8_strncasecmp); +EXPORT_STATIC_CALL(utf8_strncasecmp_folded); + +static int unicode_load_module(void) +{ + int ret = request_module("utf8"); + + if (ret) { + pr_err("Failed to load UTF-8 module\n"); + return ret; + } + return 0; +} struct unicode_map *unicode_load(const char *version) -{ - struct unicode_map *um = NULL; - int unicode_version; - - if (version) { - unsigned int maj, min, rev; - - if (unicode_parse_version(version, &maj, &min, &rev) < 0) - return ERR_PTR(-EINVAL); - - if (!utf8version_is_supported(maj, min, rev)) - return ERR_PTR(-EINVAL); - - unicode_version = UNICODE_AGE(maj, min, rev); - } else { - unicode_version = utf8version_latest(); - printk(KERN_WARNING"UTF-8 version not specified. 
" - "Assuming latest supported version (%d.%d.%d).", - (unicode_version >> 16) & 0xff, - (unicode_version >> 8) & 0xff, - (unicode_version & 0xff)); - } - - um = kzalloc(sizeof(struct unicode_map), GFP_KERNEL); - if (!um) - return ERR_PTR(-ENOMEM); - - um->charset = "UTF-8"; - um->version = unicode_version; - - return um; -} +{ + int ret = unicode_load_module(); + + if (ret) + return ERR_PTR(ret); + + spin_lock(&utf8ops_lock); + if (!utf8_ops || !try_module_get(utf8_ops->owner)) { + spin_unlock(&utf8ops_lock); + return ERR_PTR(-ENODEV); + } else { + spin_unlock(&utf8ops_lock); + return static_call(utf8_load)(version); + } +} EXPORT_SYMBOL(unicode_load); void unicode_unload(struct unicode_map *um) { - kfree(um); + if (WARN_ON(!utf8_ops)) + return; + + module_put(utf8_ops->owner); + static_call(utf8_unload)(um); } EXPORT_SYMBOL(unicode_unload); +void unicode_register(struct unicode_ops *ops) +{ + spin_lock(&utf8ops_lock); + utf8_ops = ops; + + static_call_update(utf8_validate, utf8_ops->validate); + static_call_update(utf8_strncmp, utf8_ops->strncmp); + static_call_update(utf8_strncasecmp, utf8_ops->strncasecmp); + static_call_update(utf8_strncasecmp_folded, utf8_ops->strncasecmp_folded); + static_call_update(utf8_normalize, utf8_ops->normalize); + static_call_update(utf8_casefold, utf8_ops->casefold); + static_call_update(utf8_casefold_hash, utf8_ops->casefold_hash); + static_call_update(utf8_load, utf8_ops->load); + static_call_update(utf8_unload, utf8_ops->unload); + + spin_unlock(&utf8ops_lock); +} +EXPORT_SYMBOL(unicode_register); + +void unicode_unregister(void) +{ + spin_lock(&utf8ops_lock); + utf8_ops = NULL; + spin_unlock(&utf8ops_lock); +} +EXPORT_SYMBOL(unicode_unregister); + MODULE_LICENSE("GPL v2"); diff --git a/fs/unicode/unicode-utf8.c b/fs/unicode/unicode-utf8.c new file mode 100644 index 000000000..770e60696 --- /dev/null +++ b/fs/unicode/unicode-utf8.c @@ -0,0 +1,255 @@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include 
+#include +#include +#include +#include + +#include "utf8n.h" + +static int utf8_validate(const struct unicode_map *um, const struct qstr *str) +{ + const struct utf8data *data = utf8nfdi(um->version); + + if (utf8nlen(data, str->name, str->len) < 0) + return -1; + return 0; +} + +static int utf8_strncmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) +{ + const struct utf8data *data = utf8nfdi(um->version); + struct utf8cursor cur1, cur2; + int c1, c2; + + if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) + return -EINVAL; + + if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) + return -EINVAL; + + do { + c1 = utf8byte(&cur1); + c2 = utf8byte(&cur2); + + if (c1 < 0 || c2 < 0) + return -EINVAL; + if (c1 != c2) + return 1; + } while (c1); + + return 0; +} + +static int utf8_strncasecmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) +{ + const struct utf8data *data = utf8nfdicf(um->version); + struct utf8cursor cur1, cur2; + int c1, c2; + + if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) + return -EINVAL; + + if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0) + return -EINVAL; + + do { + c1 = utf8byte(&cur1); + c2 = utf8byte(&cur2); + + if (c1 < 0 || c2 < 0) + return -EINVAL; + if (c1 != c2) + return 1; + } while (c1); + + return 0; +} + +/* String cf is expected to be a valid UTF-8 casefolded + * string. 
+ */ +static int utf8_strncasecmp_folded(const struct unicode_map *um, + const struct qstr *cf, + const struct qstr *s1) +{ + const struct utf8data *data = utf8nfdicf(um->version); + struct utf8cursor cur1; + int c1, c2; + int i = 0; + + if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0) + return -EINVAL; + + do { + c1 = utf8byte(&cur1); + c2 = cf->name[i++]; + if (c1 < 0) + return -EINVAL; + if (c1 != c2) + return 1; + } while (c1); + + return 0; +} + +static int utf8_casefold(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + const struct utf8data *data = utf8nfdicf(um->version); + struct utf8cursor cur; + size_t nlen = 0; + + if (utf8ncursor(&cur, data, str->name, str->len) < 0) + return -EINVAL; + + for (nlen = 0; nlen < dlen; nlen++) { + int c = utf8byte(&cur); + + dest[nlen] = c; + if (!c) + return nlen; + if (c == -1) + break; + } + return -EINVAL; +} + +static int utf8_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str) +{ + const struct utf8data *data = utf8nfdicf(um->version); + struct utf8cursor cur; + int c; + unsigned long hash = init_name_hash(salt); + + if (utf8ncursor(&cur, data, str->name, str->len) < 0) + return -EINVAL; + + while ((c = utf8byte(&cur))) { + if (c < 0) + return -EINVAL; + hash = partial_name_hash((unsigned char)c, hash); + } + str->hash = end_name_hash(hash); + return 0; +} + +static int utf8_normalize(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + const struct utf8data *data = utf8nfdi(um->version); + struct utf8cursor cur; + ssize_t nlen = 0; + + if (utf8ncursor(&cur, data, str->name, str->len) < 0) + return -EINVAL; + + for (nlen = 0; nlen < dlen; nlen++) { + int c = utf8byte(&cur); + + dest[nlen] = c; + if (!c) + return nlen; + if (c == -1) + break; + } + return -EINVAL; +} + +static int utf8_parse_version(const char *version, unsigned int *maj, + unsigned int *min, unsigned int *rev) +{ + substring_t 
args[3]; + char version_string[12]; + static const struct match_token token[] = { + {1, "%d.%d.%d"}, + {0, NULL} + }; + + int ret = strscpy(version_string, version, sizeof(version_string)); + + if (ret < 0) + return ret; + + if (match_token(version_string, token, args) != 1) + return -EINVAL; + + if (match_int(&args[0], maj) || match_int(&args[1], min) || + match_int(&args[2], rev)) + return -EINVAL; + + return 0; +} + +static struct unicode_map *utf8_load(const char *version) +{ + struct unicode_map *um = NULL; + int unicode_version; + + if (version) { + unsigned int maj, min, rev; + + if (utf8_parse_version(version, &maj, &min, &rev) < 0) + return ERR_PTR(-EINVAL); + + if (!utf8version_is_supported(maj, min, rev)) + return ERR_PTR(-EINVAL); + + unicode_version = UNICODE_AGE(maj, min, rev); + } else { + unicode_version = utf8version_latest(); + pr_warn("UTF-8 version not specified. Assuming latest supported version (%d.%d.%d).", + (unicode_version >> 16) & 0xff, + (unicode_version >> 8) & 0xff, + (unicode_version & 0xff)); + } + + um = kzalloc(sizeof(*um), GFP_KERNEL); + if (!um) + return ERR_PTR(-ENOMEM); + + um->charset = "UTF-8"; + um->version = unicode_version; + + return um; +} + +void utf8_unload(struct unicode_map *um) +{ + kfree(um); +} + +static struct unicode_ops ops = { + .owner = THIS_MODULE, + .validate = utf8_validate, + .strncmp = utf8_strncmp, + .strncasecmp = utf8_strncasecmp, + .strncasecmp_folded = utf8_strncasecmp_folded, + .casefold = utf8_casefold, + .casefold_hash = utf8_casefold_hash, + .normalize = utf8_normalize, + .load = utf8_load, + .unload = utf8_unload, +}; + +static int __init utf8_init(void) +{ + unicode_register(&ops); + return 0; +} + +static void __exit utf8_exit(void) +{ + unicode_unregister(); +} + +module_init(utf8_init); +module_exit(utf8_exit); + +MODULE_LICENSE("GPL v2"); --- a/include/linux/unicode.h +++ b/include/linux/unicode.h @@ -4,33 +4,104 @@ #include #include +#include + struct unicode_map { const char *charset; 
int version; }; -int unicode_validate(const struct unicode_map *um, const struct qstr *str); +struct unicode_ops { + struct module *owner; + int (*validate)(const struct unicode_map *um, const struct qstr *str); + int (*strncmp)(const struct unicode_map *um, const struct qstr *s1, + const struct qstr *s2); + int (*strncasecmp)(const struct unicode_map *um, const struct qstr *s1, + const struct qstr *s2); + int (*strncasecmp_folded)(const struct unicode_map *um, const struct qstr *cf, + const struct qstr *s1); + int (*normalize)(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen); + int (*casefold)(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen); + int (*casefold_hash)(const struct unicode_map *um, const void *salt, + struct qstr *str); + struct unicode_map* (*load)(const char *version); + void (*unload)(struct unicode_map *um); +}; -int unicode_strncmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2); +extern struct unicode_ops *utf8_ops; -int unicode_strncasecmp(const struct unicode_map *um, - const struct qstr *s1, const struct qstr *s2); -int unicode_strncasecmp_folded(const struct unicode_map *um, - const struct qstr *cf, - const struct qstr *s1); +int _utf8_validate(const struct unicode_map *um, const struct qstr *str); +int _utf8_strncmp(const struct unicode_map *um, const struct qstr *s1, + const struct qstr *s2); +int _utf8_strncasecmp(const struct unicode_map *um, const struct qstr *s1, + const struct qstr *s2); +int _utf8_strncasecmp_folded(const struct unicode_map *um, + const struct qstr *cf, + const struct qstr *s1); +int _utf8_normalize(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen); +int _utf8_casefold(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen); +int _utf8_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str); -int 
unicode_normalize(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen); +DECLARE_STATIC_CALL(utf8_validate, _utf8_validate); +DECLARE_STATIC_CALL(utf8_strncmp, _utf8_strncmp); +DECLARE_STATIC_CALL(utf8_strncasecmp, _utf8_strncasecmp); +DECLARE_STATIC_CALL(utf8_strncasecmp_folded, _utf8_strncasecmp_folded); +DECLARE_STATIC_CALL(utf8_normalize, _utf8_normalize); +DECLARE_STATIC_CALL(utf8_casefold, _utf8_casefold); +DECLARE_STATIC_CALL(utf8_casefold_hash, _utf8_casefold_hash); -int unicode_casefold(const struct unicode_map *um, const struct qstr *str, - unsigned char *dest, size_t dlen); +static inline int unicode_validate(const struct unicode_map *um, const struct qstr *str) +{ + return static_call(utf8_validate)(um, str); +} -int unicode_casefold_hash(const struct unicode_map *um, const void *salt, - struct qstr *str); +static inline int unicode_strncmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) +{ + return static_call(utf8_strncmp)(um, s1, s2); +} + +static inline int unicode_strncasecmp(const struct unicode_map *um, + const struct qstr *s1, const struct qstr *s2) +{ + return static_call(utf8_strncasecmp)(um, s1, s2); +} + +static inline int unicode_strncasecmp_folded(const struct unicode_map *um, + const struct qstr *cf, + const struct qstr *s1) +{ + return static_call(utf8_strncasecmp_folded)(um, cf, s1); +} + +static inline int unicode_normalize(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + return static_call(utf8_normalize)(um, str, dest, dlen); +} + +static inline int unicode_casefold(const struct unicode_map *um, const struct qstr *str, + unsigned char *dest, size_t dlen) +{ + return static_call(utf8_casefold)(um, str, dest, dlen); +} + +static inline int unicode_casefold_hash(const struct unicode_map *um, const void *salt, + struct qstr *str) +{ + return static_call(utf8_casefold_hash)(um, salt, str); +} struct unicode_map 
*unicode_load(const char *version); void unicode_unload(struct unicode_map *um); +void unicode_register(struct unicode_ops *ops); +void unicode_unregister(void); + #endif /* _LINUX_UNICODE_H */
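
For readers who want to experiment with the registration pattern outside the kernel, the following is a minimal userspace sketch, not the patch's code. It assumes a plain function pointer as a stand-in for static_call(), a C11 atomic_flag spinlock for utf8ops_lock, and an integer counter for try_module_get()/module_put(). Every demo_* name is hypothetical. Unlike unicode_unregister() in the patch, this sketch also restores the default stub on unregister; that is a choice made here for illustration.

```c
/* Userspace sketch only. All demo_* names are hypothetical stand-ins:
 * a function pointer for static_call(), an atomic_flag for the spinlock,
 * and an int for the module reference count. */
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_ops {
	int refcount;                    /* stands in for the module refcount */
	int (*validate)(const char *s);  /* one representative operation */
};

/* Default stub used while no backend is registered (like _utf8_validate). */
static int demo_validate_stub(const char *s)
{
	(void)s;
	return 0;
}

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static struct demo_ops *demo_backend;  /* plays the role of utf8_ops */
static int (*demo_validate_call)(const char *) = demo_validate_stub;

static void demo_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
		;  /* spin until the flag is released */
}

static void demo_lock_release(void)
{
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

void demo_register(struct demo_ops *ops)
{
	assert(ops != NULL && ops->validate != NULL);
	demo_lock_acquire();
	demo_backend = ops;
	demo_validate_call = ops->validate;  /* analogous to static_call_update() */
	demo_lock_release();
}

void demo_unregister(void)
{
	demo_lock_acquire();
	demo_backend = NULL;
	demo_validate_call = demo_validate_stub;  /* sketch-only: restore the stub */
	demo_lock_release();
}

/* Check for a backend and pin it under one lock, so it cannot be
 * unregistered between the check and the reference grab. */
int demo_get(void)
{
	int ret = -1;

	demo_lock_acquire();
	if (demo_backend) {
		demo_backend->refcount++;
		ret = 0;
	}
	demo_lock_release();
	return ret;
}

void demo_put(void)
{
	demo_lock_acquire();
	if (demo_backend && demo_backend->refcount > 0)
		demo_backend->refcount--;
	demo_lock_release();
}

/* Wrapper that dispatches to whichever target is currently installed. */
int demo_validate(const char *s)
{
	int (*fn)(const char *);

	demo_lock_acquire();
	fn = demo_validate_call;
	demo_lock_release();
	return fn(s);
}

/* Hypothetical backend: rejects NULL and empty strings. */
static int demo_reject_empty(const char *s)
{
	return (s && s[0]) ? 0 : -1;
}

struct demo_ops demo_sample_ops = {
	.refcount = 0,
	.validate = demo_reject_empty,
};
```

A caller would invoke demo_register(&demo_sample_ops) once, bracket each use with demo_get() and demo_put(), and call demo_validate() in between. Before registration, demo_validate() reaches the stub and demo_get() fails, which mirrors the -ENODEV path in unicode_load().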