Message ID | 201810281058.AA04073@tamuki.linet.gr.jp |
---|---|
State | New |
Series | [v2] Improve the width of alternate representation for year in strftime [BZ #23758] |
TAMUKI Shoichi wrote:
> Since only one Japanese era name is used by each emperor's reign, it
> is rare that the year ends in one digit or lasts more than three
> digits.

Rare recently, but over the long term about 75% of Japanese imperial years have been single-digit years: since 701 AD there have been 989 single-digit years but only 329 two-digit years. (This calculation is approximate, but it's close enough; see attached shell script for how I did the calculation.) Although Japan is more stable now than it was centuries ago, the long reigns since 1868 are a historical aberration and it should not be surprising if the fraction of single-digit years reverts closer to historical levels in the not-too-distant future.

Although I'm no expert in Japanese, as I understand it the most common style for formatting imperial dates in plain text uses no spaces anywhere, e.g., "平成2年3月4日" for Heisei 2 March 4. It's far less common to see spaces to make things line up, presumably for tables.

Since glibc is already defaulting to space padding for month and day-of-month, it makes sense for glibc to also default to space padding for imperial year. However, this change should be announced more clearly. The ChangeLog entry should say what's going on at a high level, and give an example call to strftime with the before-and-after output, along with how to generate imperial dates with no spaces; and (more important) the glibc documentation for strftime should contain similar examples.
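The shell script mentioned above is not reproduced on this page. As a rough illustration of the kind of count it describes, here is a minimal sketch in C. It assumes that an era lasting n calendar years contributes era years 1 through n; the names digit_counts, count_era_year_digits, and single_digit_percent are hypothetical and are not taken from the script or from glibc.

```c
#include <stddef.h>

/* Hypothetical tally of era years by number of digits.  */
struct digit_counts
{
  long one_digit;   /* era years 1..9 */
  long two_digit;   /* era years 10..99 */
  long more;        /* era years 100 and up */
};

/* Tally era years for eras whose lengths, in calendar years, are given
   in LENGTHS.  Assumption: year numbering within an era starts at 1, so
   an era of length n contributes the era years 1..n.  */
static struct digit_counts
count_era_year_digits (const int *lengths, size_t n)
{
  struct digit_counts c = { 0, 0, 0 };
  for (size_t i = 0; i < n; i++)
    {
      int len = lengths[i];
      if (len <= 0)
        continue;
      c.one_digit += len < 9 ? len : 9;
      if (len > 9)
        c.two_digit += (len < 99 ? len : 99) - 9;
      if (len > 99)
        c.more += len - 99;
    }
  return c;
}

/* Share of single-digit years among one- and two-digit years, as a
   percentage rounded down.  */
static int
single_digit_percent (long one_digit, long two_digit)
{
  long total = one_digit + two_digit;
  return total == 0 ? 0 : (int) (one_digit * 100 / total);
}
```

With the totals quoted in the message, single_digit_percent (989, 329) gives 75, which matches the "about 75%" figure. Feeding in only three long modern eras (lengths 45, 15, and 64) gives 27 single-digit years against 97 two-digit years, which illustrates why the recent record looks so different from the long-term one.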
Hello Paul,

Thank you for your review.

From: Paul Eggert <eggert@cs.ucla.edu>
Subject: Re: [PATCH v2] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Sun, 28 Oct 2018 14:06:46 -0700

> TAMUKI Shoichi wrote:
> > Since only one Japanese era name is used by each emperor's reign, it
> > is rare that the year ends in one digit or lasts more than three
> > digits.
>
> Rare recently, but over the long term about 75% of Japanese imperial
> years have been single-digit years: since 701 AD there have been 989
> single-digit years but only 329 two-digit years. (This calculation is
> approximate, but it's close enough; see attached shell script for how
> I did the calculation.) Although Japan is more stable now than it was
> centuries ago, the long reigns since 1868 are a historical aberration
> and it should not be surprising if the fraction of single-digit years
> reverts closer to historical levels in the not-too-distant future.

As you mentioned, there were many short eras before the Meiji era (1868). However, the lunisolar calendar was used instead of the Gregorian calendar before the Meiji era, so it is difficult to represent those dates accurately in the current glibc scheme, and I think that we do not have to care about them from a practical point of view. In fact, the Japanese locale data in glibc defines no era entries before the Meiji era; AD and BC are defined for that period instead.

It is also interesting to speculate that future eras might be as short as the earlier ones. However, that does not guarantee that every era will stay within single-digit years. I think that it is reasonable to change the default zero-padded width of %Ey to 2 so that it keeps a constant width across the past and the future.

Regarding the commit message, I will change the wording as follows.

| Since only one Japanese era name is recently used by each emperor's
| reign, it is rare that the year ends in one digit or lasts more than
| three digits.

> Although I'm no expert in Japanese, as I understand it the most common
> style for formatting imperial dates in plain text uses no spaces
> anywhere,

The most common style for formatting Japanese calendar dates in plain text is not necessarily the one without spaces.

> It's far less common to see spaces to make things line up, presumably
> for tables.

I think that each style is used where it suits the application. Both the regular representation (%c, %x, %X) and the alternate representation (%Ec, %Ex, %EX) in the Japanese locale of glibc default to zero padding. This suits width-sensitive output such as business forms. Next, space padding is easy for humans to read while keeping the same width, but it is not suitable for splitting fields on space delimiters. Finally, a format without padding suits input for applications that produce typeset-quality output, such as TeX.

> Since glibc is already defaulting to space padding for month and day-
> of-month, it makes sense for glibc to also default to space padding
> for imperial year. However, this change should be announced more
> clearly. The ChangeLog entry should say what's going on at a high
> level, and give an example call to strftime with the before-and-after
> output, along with how to generate imperial dates with no spaces; and
> (more important) the glibc documentation should for strftime should
> contain similar examples.

As mentioned above, the Japanese locale of glibc defaults to zero padding, so it is natural to zero-pad the year in the Japanese calendar as well. The glibc documentation for strftime says:

| The default action is to pad the number with zeros to keep it a
| constant width.

The change from zero to space padding may cause backward compatibility in the Japanese locale, so I think that it is OK as it is.

This change gives sane handling of the display width of one-digit years in the Japanese calendar, a case that has not been encountered directly since the Japanese locale was added to glibc. I therefore think that, for now, it is unnecessary to add new documentation that is specific to the Japanese locale.

Regards,
TAMUKI Shoichi
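The three padding styles discussed in this message (zero padding by default, space padding with the _ flag, no padding with the - flag) can be tried with a small helper. The following is a minimal sketch and not part of the patch: format_in_locale and heisei_2 are hypothetical names, the _ and - flags are GNU extensions to strftime, and the result for ja_JP.UTF-8 depends on whether that locale is installed and on the glibc version in use.

```c
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Format TP with FMT under the LC_TIME locale LOCALE_NAME.
   Returns the number of bytes written, or 0 if the locale is not
   available or the result does not fit in BUF.  (Hypothetical helper,
   not a glibc function.)  */
static size_t
format_in_locale (const char *locale_name, const char *fmt,
                  const struct tm *tp, char *buf, size_t size)
{
  if (size == 0)
    return 0;
  buf[0] = '\0';
  if (setlocale (LC_TIME, locale_name) == NULL)
    return 0;
  return strftime (buf, size, fmt, tp);
}

/* 1990-04-01, the { 1, 3, 90 } entry of the test table in the patch,
   which the test's reference data treats as era year 2.  */
static struct tm
heisei_2 (void)
{
  struct tm tm;
  memset (&tm, 0, sizeof tm);
  tm.tm_mday = 1;
  tm.tm_mon = 3;
  tm.tm_year = 90;
  return tm;
}
```

Calling format_in_locale with "ja_JP.UTF-8" and each of "%EY", "%_EY", and "%-EY" corresponds to the j == 0, 1, and 2 cases of mkreftable in the patch's tst-strftime2.c, whose reference strings for this date format the year number with "%02d", "%2d", and "%d" respectively.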
Hello,

Sorry, my previous mail contained several typos.

> Subject: Re: [PATCH] Improve the width of alternate representation for year in strftime [BZ #23758]

s/PATCH/& v2/

> The change from zero to space padding may cause backward compatibility
> in the Japanese locale, so I think that it is OK as it is.

s/cause/break/

Regards,
TAMUKI Shoichi
diff --git a/time/Makefile b/time/Makefile
index ec3e39dcea..6dc2acceaa 100644
--- a/time/Makefile
+++ b/time/Makefile
@@ -43,7 +43,7 @@ tests := test_time clocktest tst-posixtz tst-strptime tst_wcsftime \
            tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \
            tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \
            tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \
-           tst-tzname tst-y2039
+           tst-tzname tst-y2039 tst-strftime2
 
 include ../Rules
 
diff --git a/time/strftime_l.c b/time/strftime_l.c
index c71f9f47a9..8797341ea5 100644
--- a/time/strftime_l.c
+++ b/time/strftime_l.c
@@ -434,7 +434,7 @@ static CHAR_T const month_name[][10] =
 #endif
 
 static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *,
-                                   const struct tm *, bool *
+                                   const struct tm *, int *, bool *
                                    ut_argument_spec
                                    LOCALE_PARAM) __THROW;
 
@@ -456,8 +456,9 @@ my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T *format,
   tmcopy = *tp;
   tp = &tmcopy;
 #endif
+  int yr_spec = 0;              /* Override padding for %Ey.  */
   bool tzset_called = false;
-  return __strftime_internal (s, maxsize, format, tp, &tzset_called
+  return __strftime_internal (s, maxsize, format, tp, &yr_spec, &tzset_called
                               ut_argument LOCALE_ARG);
 }
 #ifdef _LIBC
@@ -466,7 +467,7 @@ libc_hidden_def (my_strftime)
 
 static size_t
 __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
-                     const struct tm *tp, bool *tzset_called
+                     const struct tm *tp, int *yr_spec, bool *tzset_called
                      ut_argument_spec LOCALE_PARAM)
 {
 #if defined _LIBC && defined USE_IN_EXTENDED_LOCALE_MODEL
@@ -820,7 +821,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
           if (modifier == L_('O'))
             goto bad_format;
 #ifdef _NL_CURRENT
-          if (! (modifier == 'E'
+          if (! (modifier == L_('E')
                  && (*(subfmt =
                        (const CHAR_T *) _NL_CURRENT (LC_TIME,
                                                      NLW(ERA_D_T_FMT)))
@@ -838,11 +839,12 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
         {
           CHAR_T *old_start = p;
           size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
-                                            tp, tzset_called ut_argument
-                                            LOCALE_ARG);
+                                            tp, yr_spec, tzset_called
+                                            ut_argument LOCALE_ARG);
           add (len, __strftime_internal (p, maxsize - i, subfmt,
-                                         tp, tzset_called ut_argument
-                                         LOCALE_ARG));
+                                         tp, yr_spec, tzset_called
+                                         ut_argument LOCALE_ARG));
+          *yr_spec = 0;
 
           if (to_uppcase)
             while (old_start < p)
@@ -917,7 +919,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 #ifdef _NL_CURRENT
           if (! (modifier == L_('E')
                  && (*(subfmt =
-                       (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
+                       (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
                      != L_('\0'))))
             subfmt = (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(D_FMT));
           goto subformat;
@@ -1262,7 +1264,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
           DO_NUMBER (1, tp->tm_wday);
 
         case L_('Y'):
-          if (modifier == 'E')
+          if (modifier == L_('E'))
             {
 #if HAVE_STRUCT_ERA_ENTRY
               struct era_entry *era = _nl_get_era_entry (tp HELPER_LOCALE_ARG);
@@ -1273,6 +1275,8 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 # else
                   subfmt = era->era_format;
 # endif
+                  if (pad != 0)
+                    *yr_spec = pad;
                   goto subformat;
                 }
 #else
@@ -1294,7 +1298,9 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
               if (era)
                 {
                   int delta = tp->tm_year - era->start_date[0];
-                  DO_NUMBER (1, (era->offset
+                  if (*yr_spec != 0)
+                    pad = *yr_spec;
+                  DO_NUMBER (2, (era->offset
                                  + delta * era->absolute_direction));
                 }
 #else
diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
new file mode 100644
index 0000000000..3e7ddfe9ea
--- /dev/null
+++ b/time/tst-strftime2.c
@@ -0,0 +1,134 @@
+/* Verify the behavior of strftime on alternate representation for year.
+
+   Copyright (C) 2013-2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <locale.h>
+#include <time.h>
+#include <stdio.h>
+#include <string.h>
+
+static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8", "th_TH.UTF-8" };
+#define nlocales (sizeof (locales) / sizeof (locales[0]))
+
+static const char *formats[] = { "%EY", "%_EY", "%-EY" };
+#define nformats (sizeof (formats) / sizeof (formats[0]))
+
+static const struct
+{
+  const int d, m, y;
+} dates[] =
+  {
+    { 1, 3, 88 },
+    { 7, 0, 89 },
+    { 8, 0, 89 },
+    { 1, 3, 90 },
+    { 1, 3, 97 },
+    { 1, 3, 98 }
+  };
+#define ndates (sizeof (dates) / sizeof (dates[0]))
+
+static char ref[nlocales][nformats][ndates][100];
+
+static void
+mkreftable (void)
+{
+  int i, j, k;
+  char era[10];
+  static const int yrj[] = { 63, 64, 1, 2, 9, 10 };
+  static const int yrb[] = { 2531, 2532, 2532, 2533, 2540, 2541 };
+
+  for (i = 0; i < nlocales; i++)
+    for (j = 0; j < nformats; j++)
+      for (k = 0; k < ndates; k++)
+        {
+          if (i == 0)
+            {
+              sprintf (era, "%s", (k < 2) ? "\xe6\x98\xad\xe5\x92\x8c"
+                                          : "\xe5\xb9\xb3\xe6\x88\x90");
+              if (yrj[k] == 1)
+                sprintf (ref[i][j][k], "%s\xe5\x85\x83\xe5\xb9\xb4", era);
+              else
+                {
+                  if (j == 0)
+                    sprintf (ref[i][j][k], "%s%02d\xe5\xb9\xb4", era, yrj[k]);
+                  else if (j == 1)
+                    sprintf (ref[i][j][k], "%s%2d\xe5\xb9\xb4", era, yrj[k]);
+                  else
+                    sprintf (ref[i][j][k], "%s%d\xe5\xb9\xb4", era, yrj[k]);
+                }
+            }
+          else if (i == 1)
+            {
+              sprintf (era, "\xe0\xba\x9e\x2e\xe0\xba\xaa\x2e ");
+              sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+            }
+          else
+            {
+              sprintf (era, "\xe0\xb8\x9e\x2e\xe0\xb8\xa8\x2e ");
+              sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+            }
+        }
+}
+
+static int
+do_test (void)
+{
+  int i, j, k, result = 0;
+  struct tm ttm;
+  char date[11], buf[100];
+  size_t r, e;
+
+  mkreftable ();
+  for (i = 0; i < nlocales; i++)
+    {
+      if (setlocale (LC_ALL, locales[i]) == NULL)
+        {
+          printf ("locale %s does not exist, skipping...\n", locales[i]);
+          continue;
+        }
+      printf ("[%s]\n", locales[i]);
+      for (j = 0; j < nformats; j++)
+        {
+          for (k = 0; k < ndates; k++)
+            {
+              ttm.tm_mday = dates[k].d;
+              ttm.tm_mon = dates[k].m;
+              ttm.tm_year = dates[k].y;
+              strftime (date, sizeof (date), "%F", &ttm);
+              r = strftime (buf, sizeof (buf), formats[j], &ttm);
+              e = strlen (ref[i][j][k]);
+              printf ("%s\t\"%s\"\t\"%s\"", date, formats[j], buf);
+              if (strcmp (buf, ref[i][j][k]) != 0)
+                {
+                  printf ("\tshould be \"%s\"", ref[i][j][k]);
+                  if (r != e)
+                    printf ("\tgot: %zu, expected: %zu", r, e);
+                  result = 1;
+                }
+              else
+                printf ("\tOK");
+              putchar ('\n');
+            }
+          putchar ('\n');
+        }
+    }
+  return result;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"
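The yr_spec mechanism in the strftime_l.c hunks can be hard to follow from the diff alone: a padding flag given on %EY is stored through the pointer, picked up by %Ey while the era sub-format is expanded recursively, and cleared again once the sub-format has been processed. The following toy formatter is a minimal sketch of that control flow only. It is not glibc code: toy_format, its single-letter directives, and the sub-format "H%y" (a literal stand-in for an era name followed by the year) are all hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for the recursive formatter.  Supported directives:
     %y   year number, zero-padded to width 2 by default
     %Y   expands to the hypothetical sub-format "H%y"
   A '_' (space padding) or '-' (no padding) flag may precede either.
   *YR_SPEC carries a flag given on %Y down to the inner %y.
   Returns the number of bytes written, excluding the terminating NUL.  */
static size_t
toy_format (char *out, size_t size, const char *fmt, int year, int *yr_spec)
{
  size_t used = 0;

  if (size == 0)
    return 0;
  out[0] = '\0';

  while (*fmt != '\0')
    {
      if (*fmt != '%')
        {
          /* Copy a literal character while there is room.  */
          if (used + 1 >= size)
            break;
          out[used++] = *fmt++;
          out[used] = '\0';
          continue;
        }

      fmt++;                    /* Skip '%'.  */
      int pad = 0;
      if (*fmt == '_' || *fmt == '-')
        pad = *fmt++;

      if (*fmt == 'Y')
        {
          /* Remember an explicit flag for the nested %y.  */
          if (pad != 0)
            *yr_spec = pad;
          used += toy_format (out + used, size - used, "H%y", year, yr_spec);
          /* Clear it so later directives see the default again.  */
          *yr_spec = 0;
        }
      else if (*fmt == 'y')
        {
          /* An override from an enclosing %Y wins over the local flag.  */
          if (*yr_spec != 0)
            pad = *yr_spec;
          const char *numfmt
            = pad == '_' ? "%2d" : pad == '-' ? "%d" : "%02d";
          int n = snprintf (out + used, size - used, numfmt, year);
          if (n < 0 || (size_t) n >= size - used)
            {
              out[used] = '\0'; /* Drop a truncated number.  */
              break;
            }
          used += (size_t) n;
        }
      else
        break;                  /* Unknown or missing directive.  */
      fmt++;
    }
  return used;
}
```

With year 2 and *yr_spec initially 0, the formats "%Y", "%_Y", and "%-Y" produce "H02", "H 2", and "H2". The format "%-Y/%y" produces "H2/02", which shows why the reset after the sub-format matters: without it, the - flag would leak into the second directive.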