Message ID | 20140510031356.22726.qmail@ns.horizon.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
Hello, On Fri, May 09, 2014 at 11:13:56PM -0400, George Spelvin wrote: > +config GLOB > + tristate > +# (Prompt disabled to reduce kbuild clutter until someone needs it.) > +# prompt "glob_match() function" > + help > + This option provides a glob_match function for performing simple > + text pattern matching. It is primarily used by the ATA code > + to blacklist particular drive models, but other device drivers > + may need similar functionality. > + > + All in-kernel drivers that require this function automatically > + select this option. Say N unless you are compiling an out-of > + tree driver which tells you it depend on it. Just adding glob.o to lib-y should be enough. It will be excluded from linking if unused. > +#ifdef UNITTEST > +/* To do a basic sanity test, "cc -DUNITTEST glob.c" and run a.out. */ > + > +#include <stdbool.h> > +#define __pure __attribute__((pure)) > +#define NOP(x) > +#define EXPORT_SYMBOL NOP /* Two stages to avoid checkpatch complaints */ These things tend to bitrot. Let's please keep testing harness out of tree. > +#else > + > +#include <linux/module.h> > +#include <linux/glob.h> > + > +MODULE_DESCRIPTION("glob(7) matching"); > +MODULE_LICENSE("Dual MIT/GPL"); Do we make library routines separate modules usually? ... > +bool __pure > +glob_match(char const *pat, char const *str) The whole thing fits in a single 80 column line, right? bool __pure glob_match(char const *pat, char const *str) > +{ > + /* > + * Backtrack to previous * on mismatch and retry starting one > + * character later in the string. Because * matches all characters > + * (no exception for /), it can be easily proved that there's > + * never a need to backtrack multiple levels. > + */ > + char const *back_pat = 0, *back_str = back_str; Blank line here. I haven't delved into the actual implementation. Looks sane on the first glance. > +#ifdef UNITTEST > + > +/* Test code */ > +#include <stdio.h> > +#include <stdlib.h> > +struct glob_test { > + char const *pat, *str; > + bool expected; > +}; > + > +static void > +test(struct glob_test const *g) > +{ > + bool match = glob_match(g->pat, g->str); > + > + printf("\"%s\" vs. \"%s\": %s %s\n", g->pat, g->str, > + match ? "match" : "mismatch", > + match == g->expected ? "OK" : "*** ERROR ***"); > + if (match != g->expected) > + exit(1); > +} > + > +static struct glob_test const tests[] = { > + { "a", "a", true }, ... > + { "*ab*cd*", "abcabcabcabcefg", false } > +}; > + > +int > +main(void) > +{ > + size_t i; > + > + for (i = 0; i < sizeof(tests)/sizeof(*tests); i++) > + test(tests + i); > + > + return 0; > +} > + > +#endif /* UNITTEST */ Again, I don't really think the userland testing code belongs here. If you wanna keep them, please make it in-kernel selftesting. We don't really wanna keep code which can't get built and tested in kernel tree proper. Thanks.
Ooh, one more nitpick. On Fri, May 09, 2014 at 11:13:56PM -0400, George Spelvin wrote: > +/** > + * glob_match - Shell-style pattern matching, like !fnmatch(pat, str, 0) > + * @pat: Pattern to match. Metacharacters are ?, *, [ and \. > + * (And, inside character classes, !, - and ].) @ARG lines should be contained in a single line. Just "Pattern to match." should do. With detailed description in the body. Thanks.
Thanks a lot for the feedback! > On Fri, May 09, 2014 at 11:13:56PM -0400, George Spelvin wrote: >> +/** >> + * glob_match - Shell-style pattern matching, like !fnmatch(pat, str, 0) >> + * @pat: Pattern to match. Metacharacters are ?, *, [ and \. >> + * (And, inside character classes, !, - and ].) > @ARG lines should be contained in a single line. Just "Pattern to > match." should do. With detailed description in the body. Huh, Documentation/kernel-doc-nano-HOWTO.txt (lines 57-59, to be precise) implies otherwise pretty strongly. But I can certainly change it. > Just adding glob.o to lib-y should be enough. It will be excluded > from linking if unused. Will that work right if the caller is a module? What will it get linked into, the main kernel binary or the module? A significant and very helpful simplification; I just want to be sure it works right. >> +#ifdef UNITTEST >> +/* To do a basic sanity test, "cc -DUNITTEST glob.c" and run a.out. */ >> + >> +#include <stdbool.h> >> +#define __pure __attribute__((pure)) >> +#define NOP(x) >> +#define EXPORT_SYMBOL NOP /* Two stages to avoid checkpatch complaints */ > These things tend to bitrot. Let's please keep testing harness out of > tree. Damn, when separated it bitrots a lot faster. That's *is* my testing harness, and I wanted to keep it close so it has a chance on hell of being used by someone who updates it. Especially given that the function's interface is quite rigidly defined, do you really think there will be a lot of rot? > Do we make library routines separate modules usually? A large number of files in lib/ are implemented that way (lib/crc-ccitt.c, just for one example), and that's what I copied. But if I just do the obj-y thing, all that goes away >> +bool __pure >> +glob_match(char const *pat, char const *str) > > The whole thing fits in a single 80 column line, right? > > bool __pure glob_match(char const *pat, char const *str) Whoops, a residue of my personal code style. (I like to left-align function names in definitions so they're easy to search for with ^func.) But it's not kernel style. Will fix. >> +{ >> + /* >> + * Backtrack to previous * on mismatch and retry starting one >> + * character later in the string. Because * matches all characters >> + * (no exception for /), it can be easily proved that there's >> + * never a need to backtrack multiple levels. >> + */ >> + char const *back_pat = 0, *back_str = back_str; > Blank line here. I had considered the "/*" start of the following block comment as visually enough separation between variable declarations and statements, but sure, I can add one. > I haven't delved into the actual implementation. Looks sane on the > first glance. That's the part I'm least worried about, actually. > Again, I don't really think the userland testing code belongs here. > If you want to keep them, please make it in-kernel selftesting. We > don't really want to keep code which can't get built and tested in > kernel tree proper. I'll see if I can figure out how to do that. Simple as it is, I hate to throw away regression tests. Thank you very much. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/10/2014 07:03 AM, George Spelvin wrote: > Thanks a lot for the feedback! > >> On Fri, May 09, 2014 at 11:13:56PM -0400, George Spelvin wrote: >>> +/** >>> + * glob_match - Shell-style pattern matching, like !fnmatch(pat, str, 0) >>> + * @pat: Pattern to match. Metacharacters are ?, *, [ and \. >>> + * (And, inside character classes, !, - and ].) > >> @ARG lines should be contained in a single line. Just "Pattern to >> match." should do. With detailed description in the body. That's old/historical, not current. > Huh, Documentation/kernel-doc-nano-HOWTO.txt (lines 57-59, to be precise) > implies otherwise pretty strongly. But I can certainly change it. Either way should be OK. >> Just adding glob.o to lib-y should be enough. It will be excluded >> from linking if unused. > > Will that work right if the caller is a module? What will it get linked > into, the main kernel binary or the module? and sometimes we have to use obj-y instead of lib-y. > A significant and very helpful simplification; I just want to be sure > it works right. > >>> +#ifdef UNITTEST >>> +/* To do a basic sanity test, "cc -DUNITTEST glob.c" and run a.out. */ >>> + >>> +#include <stdbool.h> >>> +#define __pure __attribute__((pure)) >>> +#define NOP(x) >>> +#define EXPORT_SYMBOL NOP /* Two stages to avoid checkpatch complaints */ > >> These things tend to bitrot. Let's please keep testing harness out of >> tree. > > Damn, when separated it bitrots a lot faster. That's *is* my testing > harness, and I wanted to keep it close so it has a chance on hell of > being used by someone who updates it. > > Especially given that the function's interface is quite rigidly defined, > do you really think there will be a lot of rot? > >> Do we make library routines separate modules usually? > > A large number of files in lib/ are implemented that way (lib/crc-ccitt.c, > just for one example), and that's what I copied. But if I just do the > obj-y thing, all that goes away > >>> +bool __pure >>> +glob_match(char const *pat, char const *str) >> >> The whole thing fits in a single 80 column line, right? >> >> bool __pure glob_match(char const *pat, char const *str) > > Whoops, a residue of my personal code style. (I like to left-align > function names in definitions so they're easy to search for with ^func.) > But it's not kernel style. Will fix. > >>> +{ >>> + /* >>> + * Backtrack to previous * on mismatch and retry starting one >>> + * character later in the string. Because * matches all characters >>> + * (no exception for /), it can be easily proved that there's >>> + * never a need to backtrack multiple levels. >>> + */ >>> + char const *back_pat = 0, *back_str = back_str; > >> Blank line here. > > I had considered the "/*" start of the following block comment as visually > enough separation between variable declarations and statements, but sure, > I can add one. > >> I haven't delved into the actual implementation. Looks sane on the >> first glance. > > That's the part I'm least worried about, actually. > >> Again, I don't really think the userland testing code belongs here. >> If you want to keep them, please make it in-kernel selftesting. We >> don't really want to keep code which can't get built and tested in >> kernel tree proper. > > I'll see if I can figure out how to do that. Simple as it is, I hate to > throw away regression tests.
On 05/09/2014 08:13 PM, George Spelvin wrote: > This is a helper function from drivers/ata/libata_core.c, where it is used > to blacklist particular device models. It's being moved to lib/ so other > drivers may use it for the same purpose. > > This implementation in non-recursive, so is safe for the kernel stack. > > Signed-off-by: George Spelvin <linux@horizon.com> > --- > Finally rescued this from the back burner. The code size will go back > down in the second patch which removes the old implementation, although > the number of source line reflects more comments and a test driver. > > The infrastructure parts of this, such as the module name and Kconfig > declarations, are something I haven't done before, and comments are > appreciated. > > Tejun Heo <htejun@gmail.com> wrote: >> On Wed, Mar 12, 2014 at 06:52:44PM -0400, George Spelvin wrote: >>> Sure; I'll prepare some patches. May I feed it through you, or >>> is there a lib/ maintainer I need to go through? >> >> Please keep me cc'd but you'd probably also want to cc Linus, Andrew >> Morton and Ingo. > > include/linux/glob.h | 10 +++ > lib/Kconfig | 14 ++++ > lib/Makefile | 2 + > lib/glob.c | 225 +++++++++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 251 insertions(+) > create mode 100644 include/linux/glob.h > create mode 100644 lib/glob.c > diff --git a/lib/Kconfig b/lib/Kconfig > index 991c98b..5333d10 100644 > --- a/lib/Kconfig > +++ b/lib/Kconfig > @@ -373,6 +373,20 @@ config CPU_RMAP > config DQL > bool > > +config GLOB > + tristate > +# (Prompt disabled to reduce kbuild clutter until someone needs it.) > +# prompt "glob_match() function" > + help > + This option provides a glob_match function for performing simple > + text pattern matching. It is primarily used by the ATA code > + to blacklist particular drive models, but other device drivers > + may need similar functionality. > + > + All in-kernel drivers that require this function automatically I would drop "automatically". It has to be coded. > + select this option. Say N unless you are compiling an out-of > + tree driver which tells you it depend on it. To support out-of-tree drivers, I'm pretty sure that you will need to use obj- instead of lib-. lib- drops unused code, like Tejun said. > + > # > # Netlink attribute parsing support is select'ed if needed > #
On Sat, 10 May 2014 08:21:38 -0400, Tejun Heo wrote: > On Fri, May 09, 2014 at 11:13:56PM -0400, George Spelvin wrote: >> +config GLOB >> + tristate >> +# (Prompt disabled to reduce kbuild clutter until someone needs it.) >> +# prompt "glob_match() function" >> + help >> + This option provides a glob_match function for performing simple >> + text pattern matching. It is primarily used by the ATA code >> + to blacklist particular drive models, but other device drivers >> + may need similar functionality. >> + >> + All in-kernel drivers that require this function automatically >> + select this option. Say N unless you are compiling an out-of >> + tree driver which tells you it depend on it. > Just adding glob.o to lib-y should be enough. It will be excluded > from linking if unused. But, I just confirmed, it will also be excluded from linking if CONFIG_ATA=m. See for example commit b4d3ba3346f0 where some helpers had to be moved from lib-y to obj-y to fix module load errors. (Thanks to Randy Dunlap for this example.) >> +#else >> + >> +#include <linux/module.h> >> +#include <linux/glob.h> >> + >> +MODULE_DESCRIPTION("glob(7) matching"); >> +MODULE_LICENSE("Dual MIT/GPL"); > Do we make library routines separate modules usually? There are basically three options I can see: 1) Make it non-configurable and always include the code in the kernel using oby-y, and just accept the wasted space if CONFIG_ATA=n 2) Make it a boolean option, so CONFIG_ATA=m will force CONFIG_GLOB=y and it will be included in the static kernel but unused until the module is loaded. 3) Make it a tristate option, which compiles into the kernel if CONFIG_ATA=y (the common case), and is its own module if CONFIG_ATA=m. An awful lot of the files in lib/ take this approach. Do you have a preference? Option 3 is the current status. (Revised patch will come when I have the self-test converted.) -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, May 10, 2014 at 10:29:18AM -0700, Randy Dunlap wrote: > > + select this option. Say N unless you are compiling an out-of > > + tree driver which tells you it depend on it. > > To support out-of-tree drivers, I'm pretty sure that you will need > to use obj- instead of lib-. lib- drops unused code, like Tejun said. I don't think we need to worry about out-of-tree drivers from the beginning. Thanks.
On Sat, 10 May 2014 08:21:38 -0400 Tejun Heo <tj@kernel.org> wrote: > > +int > > +main(void) > > +{ > > + size_t i; > > + > > + for (i = 0; i < sizeof(tests)/sizeof(*tests); i++) > > + test(tests + i); > > + > > + return 0; > > +} > > + > > +#endif /* UNITTEST */ > > Again, I don't really think the userland testing code belongs here. > If you wanna keep them, please make it in-kernel selftesting. We > don't really wanna keep code which can't get built and tested in > kernel tree proper. If the tests don't measurably increase boot times we could run them unconditionally at boot. Mark everything __init/__initdata and the costs are very low. lib/plist.c does this. -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/include/linux/glob.h b/include/linux/glob.h new file mode 100644 index 0000000..c990b7f --- /dev/null +++ b/include/linux/glob.h @@ -0,0 +1,10 @@ +#ifndef _LINUX_GLOB_H +#define _LINUX_GLOB_H + +#include <linux/types.h> /* For bool */ +#include <linux/compiler.h> /* For __pure */ + +bool __pure glob_match(char const *pat, char const *str); + +#endif /* _LINUX_GLOB_H */ + diff --git a/lib/Kconfig b/lib/Kconfig index 991c98b..5333d10 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -373,6 +373,20 @@ config CPU_RMAP config DQL bool +config GLOB + tristate +# (Prompt disabled to reduce kbuild clutter until someone needs it.) +# prompt "glob_match() function" + help + This option provides a glob_match function for performing simple + text pattern matching. It is primarily used by the ATA code + to blacklist particular drive models, but other device drivers + may need similar functionality. + + All in-kernel drivers that require this function automatically + select this option. Say N unless you are compiling an out-of + tree driver which tells you it depend on it. + # # Netlink attribute parsing support is select'ed if needed # diff --git a/lib/Makefile b/lib/Makefile index 48140e3..c859f81 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -133,6 +133,8 @@ obj-$(CONFIG_CORDIC) += cordic.o obj-$(CONFIG_DQL) += dynamic_queue_limits.o +obj-$(CONFIG_GLOB) += glob.o + obj-$(CONFIG_MPILIB) += mpi/ obj-$(CONFIG_SIGNATURE) += digsig.o diff --git a/lib/glob.c b/lib/glob.c new file mode 100644 index 0000000..b71ca59 --- /dev/null +++ b/lib/glob.c @@ -0,0 +1,225 @@ +#ifdef UNITTEST +/* To do a basic sanity test, "cc -DUNITTEST glob.c" and run a.out. */ + +#include <stdbool.h> +#define __pure __attribute__((pure)) +#define NOP(x) +#define EXPORT_SYMBOL NOP /* Two stages to avoid checkpatch complaints */ + +#else + +#include <linux/module.h> +#include <linux/glob.h> + +MODULE_DESCRIPTION("glob(7) matching"); +MODULE_LICENSE("Dual MIT/GPL"); + +#endif + +/** + * glob_match - Shell-style pattern matching, like !fnmatch(pat, str, 0) + * @pat: Pattern to match. Metacharacters are ?, *, [ and \. + * (And, inside character classes, !, - and ].) + * @str: String to match. the pattern must match the entire string. + * + * Perform shell-style glob matching, returning true (1) if the match + * succeeds, or false (0) if it fails. + * + * This is small and simple implementation intended for device blacklists + * where a string is matched against a number of patterns. Thus, it + * does not preprocess the patterns. It is non-recursive, and run-time + * is at most quadratic: strlen(@str)*strlen(@pat). + * + * An example of the worst case is glob_match("*aaaaa", "aaaaaaaaaa"); + * it takes 6 passes over the pattern before matching the string. + * + * Like !fnmatch(@pat, @str, 0) and unlike the shell, this does NOT + * treat / or leading . specially; it isn't actually used for pathnames. + * + * Note that according to glob(7) (and unlike bash), character classes + * are complemented by a leading !; this does not support the regex-style + * [^a-z] syntax. + * + * An opening bracket without a matching close is matched literally. + */ +bool __pure +glob_match(char const *pat, char const *str) +{ + /* + * Backtrack to previous * on mismatch and retry starting one + * character later in the string. Because * matches all characters + * (no exception for /), it can be easily proved that there's + * never a need to backtrack multiple levels. + */ + char const *back_pat = 0, *back_str = back_str; + /* + * Loop over each token (character or class) in pat, matching + * it against the remaining unmatched tail of str. Return false + * on mismatch, or true after matching the trailing nul bytes. + */ + for (;;) { + unsigned char c = *str++; + unsigned char d = *pat++; + + switch (d) { + case '?': /* Wildcard: anything but nul */ + if (c == '\0') + return false; + break; + case '*': /* Any-length wildcard */ + if (*pat == '\0') /* Optimize trailing * case */ + return true; + back_pat = pat; + back_str = --str; /* Allow zero-length match */ + break; + case '[': { /* Character class */ + bool match = false, inverted = (*pat == '!'); + char const *class = pat + inverted; + unsigned char a = *class++; + + /* + * Iterate over each span in the character class. + * A span is either a single character a, or a + * range a-b. + */ + do { + unsigned char b = a; + + if (a == '\0') /* Malformed */ + goto literal; + + if (class[0] == '-' && class[1] != ']') { + b = class[1]; + + if (b == '\0') + goto literal; + + class += 2; + /* Any special action if a > b? */ + } + match |= (a <= c && c <= b); + } while ((a = *class++) != ']'); + + if (match == inverted) + goto backtrack; + pat = class; + } + break; + case '\\': + d = *pat++; + /* FALLLTHROUGH */ + default: /* Literal character */ +literal: + if (c == d) { + if (d == '\0') + return true; + break; + } +backtrack: + if (c == '\0' || !back_pat) + return false; /* No point continuing */ + /* Try again from last *, one character later in str. */ + pat = back_pat; + str = ++back_str; + break; + } + } +} +EXPORT_SYMBOL(glob_match); + + +#ifdef UNITTEST + +/* Test code */ +#include <stdio.h> +#include <stdlib.h> +struct glob_test { + char const *pat, *str; + bool expected; +}; + +static void +test(struct glob_test const *g) +{ + bool match = glob_match(g->pat, g->str); + + printf("\"%s\" vs. \"%s\": %s %s\n", g->pat, g->str, + match ? "match" : "mismatch", + match == g->expected ? "OK" : "*** ERROR ***"); + if (match != g->expected) + exit(1); +} + +static struct glob_test const tests[] = { + { "a", "a", true }, + { "a", "b", false }, + { "a", "aa", false }, + { "a", "", false }, + { "", "", true }, + { "", "a", false }, + { "[a]", "a", true }, + { "[!a]", "a", false }, + { "[!a]", "b", true }, + { "[ab]", "a", true }, + { "[ab]", "b", true }, + { "[ab]", "c", false }, + { "[!ab]", "c", true }, + { "[a-c]", "b", true }, + { "[a-c]", "d", false }, + { "[]a-ceg-ik[]", "a", true }, + { "[]a-ceg-ik[]", "]", true }, + { "[]a-ceg-ik[]", "[", true }, + { "[]a-ceg-ik[]", "h", true }, + { "[]a-ceg-ik[]", "f", false }, + { "[!]a-ceg-ik[]", "h", false }, + { "[!]a-ceg-ik[]", "]", false }, + { "[!]a-ceg-ik[]", "f", true }, + { "[a-b-]", "-", true }, + { "?", "a", true }, + { "?", "aa", false }, + { "??", "a", false }, + { "?x?", "axb", true }, + { "?x?", "abx", false }, + { "?x?", "xab", false }, + { "*??", "a", false }, + { "*??", "ab", true }, + { "*??", "abc", true }, + { "*??", "abcd", true }, + { "??*", "a", false }, + { "??*", "ab", true }, + { "??*", "abc", true }, + { "??*", "abcd", true }, + { "?*?", "a", false }, + { "?*?", "ab", true }, + { "?*?", "abc", true }, + { "?*?", "abcd", true }, + { "*b", "b", true }, + { "*b", "ab", true }, + { "*b", "aab", true }, + { "*bc", "abbc", true }, + { "*bc", "bc", true }, + { "*bc", "bbc", true }, + { "*bc", "bcbc", true }, + { "*ac*", "abacadaeafag", true }, + { "*ac*ae*ag*", "abacadaeafag", true }, + { "*a*b*[bc]*[ef]*g*", "abacadaeafag", true }, + { "*a*b*[ef]*[cd]*g*", "abacadaeafag", false }, + { "*abcd*", "abcabcabcabcdefg", true }, + { "*ab*cd*", "abcabcabcabcdefg", true }, + { "*abcd*abcdef*", "abcabcdabcdeabcdefg", true }, + { "*abcd*", "abcabcabcabcefg", false }, + { "*ab*cd*", "abcabcabcabcefg", false } +}; + +int +main(void) +{ + size_t i; + + for (i = 0; i < sizeof(tests)/sizeof(*tests); i++) + test(tests + i); + + return 0; +} + +#endif /* UNITTEST */