Message ID | 20200208013552.241832-2-drosen@google.com |
---|---|
State | Not Applicable |
Series | Support fof Casefolding and Encryption |
On Fri, Feb 07, 2020 at 05:35:45PM -0800, Daniel Rosenberg wrote:
> This function will allow other uses of unicode to act upon a casefolded
> string without needing to allocate their own copy of one.
>
> The actor function can return an nonzero value to exit early.
>
> Signed-off-by: Daniel Rosenberg <drosen@google.com>
> ---
>  fs/unicode/utf8-core.c  | 25 ++++++++++++++++++++++++-
>  include/linux/unicode.h | 10 ++++++++++
>  2 files changed, 34 insertions(+), 1 deletion(-)
>
> diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
> index 2a878b739115d..db050bf59a32b 100644
> --- a/fs/unicode/utf8-core.c
> +++ b/fs/unicode/utf8-core.c
> @@ -122,9 +122,32 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
>  	}
>  	return -EINVAL;
>  }
> -
>  EXPORT_SYMBOL(utf8_casefold);
>
> +int utf8_casefold_iter(const struct unicode_map *um, const struct qstr *str,
> +			struct utf8_itr_context *ctx)
> +{
> +	const struct utf8data *data = utf8nfdicf(um->version);
> +	struct utf8cursor cur;
> +	int c;
> +	int res = 0;
> +	int pos = 0;
> +
> +	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
> +		return -EINVAL;
> +
> +	while ((c = utf8byte(&cur))) {
> +		if (c < 0)
> +			return c;
> +		res = ctx->actor(ctx, c, pos);
> +		pos++;
> +		if (res)
> +			return res;
> +	}
> +	return res;
> +}
> +EXPORT_SYMBOL(utf8_casefold_iter);

Indirect function calls are expensive these days for various reasons, including
Spectre mitigations and CFI. Are you sure it's okay from a performance
perspective to make an indirect call for every byte of the pathname?

> +typedef int (*utf8_itr_actor_t)(struct utf8_itr_context *, int byte, int pos);

The byte argument probably should be 'u8', to avoid confusion about whether it's
a byte or a Unicode codepoint.

- Eric
On Tue, Feb 11, 2020 at 7:38 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Indirect function calls are expensive these days for various reasons, including
> Spectre mitigations and CFI. Are you sure it's okay from a performance
> perspective to make an indirect call for every byte of the pathname?
>
> > +typedef int (*utf8_itr_actor_t)(struct utf8_itr_context *, int byte, int pos);
>
> The byte argument probably should be 'u8', to avoid confusion about whether it's
> a byte or a Unicode codepoint.
>
> - Eric

Gabriel, what do you think here? I could change it to either expose the things
necessary to do the hashing in libfs, or, instead of the general-purpose
iterator, just have a hash function inside of unicode that will compute the
hash given a seed value.

-Daniel
Daniel Rosenberg <drosen@google.com> writes:

> On Tue, Feb 11, 2020 at 7:38 PM Eric Biggers <ebiggers@kernel.org> wrote:
>>
>> Indirect function calls are expensive these days for various reasons, including
>> Spectre mitigations and CFI. Are you sure it's okay from a performance
>> perspective to make an indirect call for every byte of the pathname?
>>
>> > +typedef int (*utf8_itr_actor_t)(struct utf8_itr_context *, int byte, int pos);
>>
>> The byte argument probably should be 'u8', to avoid confusion about whether it's
>> a byte or a Unicode codepoint.
>>

Just for the record, utf8byte returns int because it can also return error
codes, but that is not the case here. It should be u8.

>
> Gabriel, what do you think here? I could change it to either exposing
> the things necessary to do the hashing in libfs, or instead of the
> general purpose iterator, just have a hash function inside of unicode
> that will compute the hash given a seed value.

Sorry for the delay, I'm away on a long vacation and intentionally staying
away from my laptop :)

Eric has a very good point: if not prohibitively, it is at least unnecessarily
expensive for a hot path. Why not expose utf8ncursor and utf8byte to libfs and
implement the hash in libfs?
```diff
diff --git a/fs/unicode/utf8-core.c b/fs/unicode/utf8-core.c
index 2a878b739115d..db050bf59a32b 100644
--- a/fs/unicode/utf8-core.c
+++ b/fs/unicode/utf8-core.c
@@ -122,9 +122,32 @@ int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
 	}
 	return -EINVAL;
 }
-
 EXPORT_SYMBOL(utf8_casefold);
 
+int utf8_casefold_iter(const struct unicode_map *um, const struct qstr *str,
+			struct utf8_itr_context *ctx)
+{
+	const struct utf8data *data = utf8nfdicf(um->version);
+	struct utf8cursor cur;
+	int c;
+	int res = 0;
+	int pos = 0;
+
+	if (utf8ncursor(&cur, data, str->name, str->len) < 0)
+		return -EINVAL;
+
+	while ((c = utf8byte(&cur))) {
+		if (c < 0)
+			return c;
+		res = ctx->actor(ctx, c, pos);
+		pos++;
+		if (res)
+			return res;
+	}
+	return res;
+}
+EXPORT_SYMBOL(utf8_casefold_iter);
+
 int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
 		   unsigned char *dest, size_t dlen)
 {
diff --git a/include/linux/unicode.h b/include/linux/unicode.h
index 990aa97d80496..2ae12f8710ae2 100644
--- a/include/linux/unicode.h
+++ b/include/linux/unicode.h
@@ -10,6 +10,13 @@ struct unicode_map {
 	int version;
 };
 
+struct utf8_itr_context;
+typedef int (*utf8_itr_actor_t)(struct utf8_itr_context *, int byte, int pos);
+
+struct utf8_itr_context {
+	utf8_itr_actor_t actor;
+};
+
 int utf8_validate(const struct unicode_map *um, const struct qstr *str);
 
 int utf8_strncmp(const struct unicode_map *um,
@@ -27,6 +34,9 @@ int utf8_normalize(const struct unicode_map *um, const struct qstr *str,
 int utf8_casefold(const struct unicode_map *um, const struct qstr *str,
 		  unsigned char *dest, size_t dlen);
 
+int utf8_casefold_iter(const struct unicode_map *um, const struct qstr *str,
+			struct utf8_itr_context *ctx);
+
 struct unicode_map *utf8_load(const char *version);
 void utf8_unload(struct unicode_map *um);
 
```
```text
This function will allow other uses of unicode to act upon a casefolded
string without needing to allocate their own copy of one.

The actor function can return a nonzero value to exit early.

Signed-off-by: Daniel Rosenberg <drosen@google.com>
---
 fs/unicode/utf8-core.c  | 25 ++++++++++++++++++++++++-
 include/linux/unicode.h | 10 ++++++++++
 2 files changed, 34 insertions(+), 1 deletion(-)
```