Message ID | 1460587532-5278-1-git-send-email-vapier@gentoo.org |
---|---|
State | New |
Please also allow ISO 30112 categories.

best regards
keld

On Wed, Apr 13, 2016 at 06:45:32PM -0400, Mike Frysinger wrote:
> Currently localedef accepts any value for the category keyword.  This has
> allowed bad values to propagate to the vast majority of locales (~90%).
> Add some logic to only accept the 1993 POSIX and 2002 ISO-14652 standards.
>
> 2016-04-13  Mike Frysinger  <vapier@gentoo.org>
>
>     * locale/programs/ld-identification.c (identification_finish): Check
>     that the values in identification->category are only posix:1993 or
>     i18n:2002.
> ---
>  locale/programs/ld-identification.c | 42 ++++++++++++++++++++++++++++++-------
>  1 file changed, 35 insertions(+), 7 deletions(-)
>
> diff --git a/locale/programs/ld-identification.c b/locale/programs/ld-identification.c
> index 1e8fa84..eccb388 100644
> --- a/locale/programs/ld-identification.c
> +++ b/locale/programs/ld-identification.c
> @@ -164,14 +164,42 @@ No definition for %s category found"), "LC_IDENTIFICATION"));
>    TEST_ELEM (date);
>
>    for (num = 0; num < __LC_LAST; ++num)
> -    if (num != LC_ALL && identification->category[num] == NULL)
> -      {
> -        if (verbose && ! nothing)
> -          WITH_CUR_LOCALE (error (0, 0, _("\
> +    {
> +      /* We don't accept/parse this category, so skip it early.  */
> +      if (num == LC_ALL)
> +        continue;
> +
> +      if (identification->category[num] == NULL)
> +        {
> +          if (verbose && ! nothing)
> +            WITH_CUR_LOCALE (error (0, 0, _("\
>  %s: no identification for category `%s'"),
> -                                  "LC_IDENTIFICATION", category_name[num]));
> -        identification->category[num] = "";
> -      }
> +                                    "LC_IDENTIFICATION", category_name[num]));
> +          identification->category[num] = "";
> +        }
> +      else
> +        {
> +          /* Only list the standards we care about.  */
> +          static const char * const standards[] =
> +            {
> +              "posix:1993",
> +              "i18n:2002",
> +            };
> +          size_t i;
> +          bool matched = false;
> +
> +          for (i = 0; i < sizeof (standards) / sizeof (standards[0]); ++i)
> +            if (strcmp (identification->category[num], standards[i]) == 0)
> +              matched = true;
> +
> +          if (matched != true)
> +            WITH_CUR_LOCALE (error (0, 0, _("\
> +%s: unknown standard `%s' for category `%s'"),
> +                                    "LC_IDENTIFICATION",
> +                                    identification->category[num],
> +                                    category_name[num]));
> +        }
> +    }
>  }
>
>
> --
> 2.7.4
Actually the standards 14652/30112 were set up so you could declare
what version of the locale category was used for the data.
POSIX is different from 14652 and again different from 30112.
30112 is the one that most closely corresponds to glibc implementations.

I also think that POSIX allows for more categories than the ones that the
9945 standard defines, and in that way 14652 and 30112 are compatible
with POSIX. I would advise that this still be allowed, but then declared
in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
"non-standard" to indicate that.

I would advise using the values for the locale versions
given in 30112. The values defined in 30112 are:
i18n:2004
i18n:2012
posix:1993

Best regards
Keld

On Thu, Apr 14, 2016 at 10:59:19AM +0200, keld@keldix.com wrote:
> Please also allow ISO 30112 categories.
>
> best regards
> keld
>
> On Wed, Apr 13, 2016 at 06:45:32PM -0400, Mike Frysinger wrote:
> > Currently localedef accepts any value for the category keyword.  This has
> > allowed bad values to propagate to the vast majority of locales (~90%).
> > Add some logic to only accept the 1993 POSIX and 2002 ISO-14652 standards.
> >
> > 2016-04-13  Mike Frysinger  <vapier@gentoo.org>
> >
> >     * locale/programs/ld-identification.c (identification_finish): Check
> >     that the values in identification->category are only posix:1993 or
> >     i18n:2002.
On 14 Apr 2016 11:26, keld@keldix.com wrote:
> Actually the standards 14652/30112 were set up so you could declare
> what version of the locale category was used for the data.
> POSIX is different from 14652 and again different from 30112.
> 30112 is the one that most closely corresponds to glibc implementations.

in general, for standards that are stuck behind ISO's dumb paywall (they
want to charge CHF198 for the pleasure of downloading what should be in
the public), you'll have to tell me what values to plug in, and/or what
it says.

although i have found this link:
http://www.open-std.org/JTC1/SC35/WG5/docs/30112d10.pdf
is that the same ?

if it is, i would highlight that the examples provided in the spec do
not seem to line up with the spec itself ;).  the Danish example that
is embedded in the file tries to use "i18n:2000", and it doesn't use
double quotes like the spec says it should.

> I also think that POSIX allows for more categories than the ones that the
> 9945 standard defines, and in that way 14652 and 30112 are compatible

looks like ISO 9945 is just the combined POSIX standard (2003 edition).
the public 2004 edition [1] and 2013 edition [2] do not define the category
LC_IDENTIFICATION, so they wouldn't have anything to say here.  also,
even if those allow for defining of arbitrary categories, that's kind
of orthogonal to glibc's localedef needs isn't it ?  the utility has
been rejecting all unknown categories for basically ever at this point.
[1] http://pubs.opengroup.org/onlinepubs/009695399/
[2] http://pubs.opengroup.org/onlinepubs/9699919799/

if you try to do:
    LC_FOO
    ...
    END LC_FOO
localedef will reject it as a syntax error.

if you try to do:
    LC_IDENTIFICATION
    ...
    category "en_US:2000";LC_FOO
    ...
    END LC_IDENTIFICATION
localedef will reject it as a syntax error (ignoring the standard part).

are you referring to something else ?

> with POSIX. I would advise that this still be allowed, but then declared
> in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
> "non-standard" to indicate that.

why do we need to support that ?  we're talking about what localedef
will accept, and localedef is entirely a glibc-specific utility.  the
binary format it produces is internal glibc ABI.  seems like accepting
other random values isn't useful to us.

> I would advise using the values for the locale versions
> given in 30112. The values defined in 30112 are:
> i18n:2004
> i18n:2012
> posix:1993

OK.  shall i update all the locale files then to use i18n:2012 ?
-mike
On Thu, Apr 14, 2016 at 09:50:33AM -0400, Mike Frysinger wrote:
> On 14 Apr 2016 11:26, keld@keldix.com wrote:
> > Actually the standards 14652/30112 were set up so you could declare
> > what version of the locale category was used for the data.
> > POSIX is different from 14652 and again different from 30112.
> > 30112 is the one that most closely corresponds to glibc implementations.
>
> in general, for standards that are stuck behind ISO's dumb paywall (they
> want to charge CHF198 for the pleasure of downloading what should be in
> the public), you'll have to tell me what values to plug in, and/or what
> it says.

I agree.

> although i have found this link:
> http://www.open-std.org/JTC1/SC35/WG5/docs/30112d10.pdf
> is that the same ?

It is a new Working Draft for the revision of 30112, so it contains all of
the approved TR 30112 from 2014, plus some. But it is not a standard, it is
work in progress. That is why we are allowed to have it publicly available.

> if it is, i would highlight that the examples provided in the spec do
> not seem to line up with the spec itself ;).  the Danish example that
> is embedded in the file tries to use "i18n:2000", and it doesn't use
> double quotes like the spec says it should.

There are errors everywhere. This is a draft, and not supposed to be
error-free. Anyway, the same inconsistency was probably in the approved TR.
I will see to it that this is corrected. Probably it should be marked with
the new standard's identifying value.

> > I also think that POSIX allows for more categories than the ones that the
> > 9945 standard defines, and in that way 14652 and 30112 are compatible
>
> looks like ISO 9945 is just the combined POSIX standard (2003 edition).
> the public 2004 edition [1] and 2013 edition [2] do not define the category
> LC_IDENTIFICATION, so they wouldn't have anything to say here.  also,
> even if those allow for defining of arbitrary categories, that's kind
> of orthogonal to glibc's localedef needs isn't it ?  the utility has
> been rejecting all unknown categories for basically ever at this point.
> [1] http://pubs.opengroup.org/onlinepubs/009695399/
> [2] http://pubs.opengroup.org/onlinepubs/9699919799/

Well, yes, LC_IDENTIFICATION is a novelty of 14652.
But 9945 - POSIX does allow implementation defined categories AFAIK.
There is one new category in 30112, namely LC_KEYBOARD. I am not sure whether
glibc supports LC_XLITERATE either, or whether the functionality is present
only in LC_CTYPE.

> if you try to do:
>     LC_FOO
>     ...
>     END LC_FOO
> localedef will reject it as a syntax error.
>
> if you try to do:
>     LC_IDENTIFICATION
>     ...
>     category "en_US:2000";LC_FOO
>     ...
>     END LC_IDENTIFICATION
> localedef will reject it as a syntax error (ignoring the standard part).
>
> are you referring to something else ?

No. I would like your last example not to be an error; it could issue a
warning, or at least LC_KEYBOARD could be accepted.
In that way one could use localedef to test new functionality.

> > with POSIX. I would advise that this still be allowed, but then declared
> > in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
> > "non-standard" to indicate that.
>
> why do we need to support that ?  we're talking about what localedef
> will accept, and localedef is entirely a glibc-specific utility.  the
> binary format it produces is internal glibc ABI.  seems like accepting
> other random values isn't useful to us.

Localedef is specified in POSIX,
http://pubs.opengroup.org/onlinepubs/009696699/utilities/localedef.html

> > I would advise using the values for the locale versions
> > given in 30112. The values defined in 30112 are:
> > i18n:2004
> > i18n:2012
> > posix:1993
>
> OK.  shall i update all the locale files then to use i18n:2012 ?

Yes, I think that this is the most appropriate.

Best regards
Keld
On 14 Apr 2016 17:04, keld@keldix.com wrote:
> On Thu, Apr 14, 2016 at 09:50:33AM -0400, Mike Frysinger wrote:
> > On 14 Apr 2016 11:26, keld@keldix.com wrote:
> > > I also think that POSIX allows for more categories than the ones that the
> > > 9945 standard defines, and in that way 14652 and 30112 are compatible
> >
> > looks like ISO 9945 is just the combined POSIX standard (2003 edition).
> > the public 2004 edition [1] and 2013 edition [2] do not define the category
> > LC_IDENTIFICATION, so they wouldn't have anything to say here.  also,
> > even if those allow for defining of arbitrary categories, that's kind
> > of orthogonal to glibc's localedef needs isn't it ?  the utility has
> > been rejecting all unknown categories for basically ever at this point.
> > [1] http://pubs.opengroup.org/onlinepubs/009695399/
> > [2] http://pubs.opengroup.org/onlinepubs/9699919799/
>
> Well, yes, LC_IDENTIFICATION is a novelty of 14652.
> But 9945 - POSIX does allow implementation defined categories AFAIK.

sure -- see below

> There is one new category in 30112, namely LC_KEYBOARD. I am not sure whether
> glibc supports LC_XLITERATE either, or whether the functionality is present
> only in LC_CTYPE.

we don't support LC_KEYBOARD or LC_XLITERATE today.  i think any new
categories would need to be proposed including why glibc should carry
them at all.  i haven't read the standard, so i can't speak to either.

> > if you try to do:
> >     LC_FOO
> >     ...
> >     END LC_FOO
> > localedef will reject it as a syntax error.
> >
> > if you try to do:
> >     LC_IDENTIFICATION
> >     ...
> >     category "en_US:2000";LC_FOO
> >     ...
> >     END LC_IDENTIFICATION
> > localedef will reject it as a syntax error (ignoring the standard part).
> >
> > are you referring to something else ?
>
> No. I would like your last example not to be an error; it could issue a
> warning, or at least LC_KEYBOARD could be accepted.
> In that way one could use localedef to test new functionality.

we can have it warn.  localedef has precedent w/not warning about many
things or being fatal by default, but adding -v makes it more strict.
this seems to fall into that bucket.  i'm not keen on -v/--verbose being
a hidden alias to also "exit non-zero in many more cases", but that's a
different topic :).

> > > with POSIX. I would advise that this still be allowed, but then declared
> > > in the LC_IDENTIFICATION section. Maybe we should use a specific version value like
> > > "non-standard" to indicate that.
> >
> > why do we need to support that ?  we're talking about what localedef
> > will accept, and localedef is entirely a glibc-specific utility.  the
> > binary format it produces is internal glibc ABI.  seems like accepting
> > other random values isn't useful to us.
>
> Localedef is specified in POSIX,
> http://pubs.opengroup.org/onlinepubs/009696699/utilities/localedef.html

on the frontend sure.  i was thinking of its output format which is not
specified by POSIX but is an internal glibc ABI detail.  it even says:
    The localedef utility shall convert source definitions for locale
    categories into a format usable by the functions and utilities ...
i.e. it doesn't specify that output format.

back to the frontend, what POSIX specifically says is:
    In addition, the input may contain source for implementation-defined
    categories.
so glibc's localedef is free to support as many or as few extra categories
as it sees fit.  that includes outright rejecting unknown ones.

also, if we want to speak strictly about POSIX, it also says:
    -u code_set_name
        Specify the name of a codeset used as the target mapping of
        character symbols and collating element symbols whose encoding
        values are defined in terms of the ISO/IEC 10646-1:2000 standard
        position constant values.
pretty sure that says we aren't even permitted to support a newer
standard there.  whether it matters in practice i'm not sure (haven't
done a diff on the different versions/standards).
-mike
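As a rough sketch of the behavior discussed in this message (stay quiet by default, and warn and affect the exit status only when -v is given), something like the following could work. The `diag_state` structure and both function names are hypothetical, made up for this illustration; they do not correspond to localedef internals, and the exit-status mapping is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical diagnostic state for this sketch only; glibc's localedef
   keeps track of this differently.  */
struct diag_state
{
  bool verbose;           /* True when -v/--verbose was given.  */
  unsigned int warnings;  /* Number of warnings emitted so far.  */
};

/* Report an unrecognized standard value for CATEGORY.  In this sketch the
   warning is printed, and counted, only in verbose mode.  */
static void
warn_unknown_standard (struct diag_state *st, const char *standard,
                       const char *category)
{
  assert (st != NULL);

  if (!st->verbose)
    return;

  fprintf (stderr,
           "LC_IDENTIFICATION: unknown standard `%s' for category `%s'\n",
           standard, category);
  ++st->warnings;
}

/* Assumed mapping for illustration: any counted warning gives a non-zero
   exit status.  */
static int
exit_status (const struct diag_state *st)
{
  return st->warnings > 0 ? 1 : 0;
}
```

With this shape, the same input yields a zero status without -v and a non-zero status with it, which is the coupling between verbosity and strictness that the message above is uneasy about.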
diff --git a/locale/programs/ld-identification.c b/locale/programs/ld-identification.c
index 1e8fa84..eccb388 100644
--- a/locale/programs/ld-identification.c
+++ b/locale/programs/ld-identification.c
@@ -164,14 +164,42 @@ No definition for %s category found"), "LC_IDENTIFICATION"));
   TEST_ELEM (date);

   for (num = 0; num < __LC_LAST; ++num)
-    if (num != LC_ALL && identification->category[num] == NULL)
-      {
-        if (verbose && ! nothing)
-          WITH_CUR_LOCALE (error (0, 0, _("\
+    {
+      /* We don't accept/parse this category, so skip it early.  */
+      if (num == LC_ALL)
+        continue;
+
+      if (identification->category[num] == NULL)
+        {
+          if (verbose && ! nothing)
+            WITH_CUR_LOCALE (error (0, 0, _("\
 %s: no identification for category `%s'"),
-                                  "LC_IDENTIFICATION", category_name[num]));
-        identification->category[num] = "";
-      }
+                                    "LC_IDENTIFICATION", category_name[num]));
+          identification->category[num] = "";
+        }
+      else
+        {
+          /* Only list the standards we care about.  */
+          static const char * const standards[] =
+            {
+              "posix:1993",
+              "i18n:2002",
+            };
+          size_t i;
+          bool matched = false;
+
+          for (i = 0; i < sizeof (standards) / sizeof (standards[0]); ++i)
+            if (strcmp (identification->category[num], standards[i]) == 0)
+              matched = true;
+
+          if (matched != true)
+            WITH_CUR_LOCALE (error (0, 0, _("\
+%s: unknown standard `%s' for category `%s'"),
+                                    "LC_IDENTIFICATION",
+                                    identification->category[num],
+                                    category_name[num]));
+        }
+    }
 }
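For illustration only: the review thread suggests the accepted values may need to include the ISO/IEC TR 30112 ones (i18n:2004, i18n:2012, posix:1993) rather than just the two in this patch. A minimal standalone sketch of the same allow-list check, factored into a helper, might look like the following. The helper name `is_known_standard` and the exact contents of `known_standards` are assumptions made for this sketch and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The two values from the patch plus the i18n values named in the review.
   This combined list is an assumption for illustration, not what was
   committed.  */
static const char *const known_standards[] =
  {
    "posix:1993",
    "i18n:2002",
    "i18n:2004",
    "i18n:2012",
  };

/* Hypothetical helper: return true if VALUE exactly matches one of the
   known standard identifiers.  */
static bool
is_known_standard (const char *value)
{
  assert (value != NULL);

  for (size_t i = 0;
       i < sizeof (known_standards) / sizeof (known_standards[0]); ++i)
    if (strcmp (value, known_standards[i]) == 0)
      return true;

  return false;
}
```

Returning as soon as a match is found avoids the separate `matched` flag used in the patch; the comparison is exact, so a value such as "i18n:2000" from the draft's Danish example would not be accepted.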