[v2] Improve the width of alternate representation for year in strftime [BZ #23758]

Message ID 201810281058.AA04073@tamuki.linet.gr.jp
State New

Commit Message

TAMUKI Shoichi Oct. 28, 2018, 10:58 a.m. UTC
The Japanese era name is scheduled to change on May 1, 2019.
Ahead of that, change the alternate representation for year in
strftime to pad the number with zeros so that it keeps a constant
width.  This prevents the trouble we saw in the past from reappearing
from the year after the era name changes onward.

Since only one Japanese era name is used by each emperor's reign, it
is rare that the year ends in one digit or lasts more than three
digits.  In addition, the width of month, day, hour, minute, and
second is 2, so giving the year the same width improves the overall
balance of the display.  Therefore, it is reasonable to change the
default zero-padded width of %Ey to 2.

Currently in glibc, besides the ja_JP locale, the locales that use
this conversion specifier are lo_LA (Laos) and th_TH (Thailand).
These locales use the Buddhist era.  A Buddhist era year is obtained
by adding 543 to the Christian era year, so it already has four digits
and is not affected by this change to the conversion specifier %Ey.
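
As a worked check of the arithmetic above, the sketch below converts a
tm_year value to a Buddhist era year and zero-pads it to a minimum
width of 2.  The helper names buddhist_year and format_min_width2 are
hypothetical and are not part of glibc; the point is only that a
four-digit year is unchanged by a minimum width of 2.

```c
#include <assert.h>
#include <stdio.h>

/* Offset taken from the commit message: Buddhist era = Christian era + 543.  */
enum { BUDDHIST_ERA_OFFSET = 543 };

/* Hypothetical helper: TM_YEAR counts years since 1900, as in struct tm.  */
int
buddhist_year (int tm_year)
{
  return tm_year + 1900 + BUDDHIST_ERA_OFFSET;
}

/* Hypothetical helper: zero-pad YEAR to a minimum width of 2, which is
   the new default described for %Ey.  Returns the number of characters
   written, as snprintf does.  */
int
format_min_width2 (char *buf, size_t size, int year)
{
  assert (buf != NULL && size > 0);
  return snprintf (buf, size, "%02d", year);
}
```

For example, buddhist_year (88) yields 2531, and format_min_width2
writes "2531" for that value, the same string as without padding.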

Furthermore, an optional flag can be given to the conversion specifier
%EY so that the current non-padded format remains available and the
padding can be controlled.  To achieve this, when an optional flag is
given to the conversion specifier %EY, the %Ey included in the
combined conversion specifier is interpreted as if decorated with the
appropriate flag.
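
The intended effect of the flags can be modelled as follows.  This is
a minimal sketch of the described behaviour, not the code in
strftime_l.c; format_era_year is a hypothetical helper, and the flag
is passed as a plain character (0 for no flag).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the patched %Ey padding rules.
   FLAG is 0   for the default (zero padding to width 2),
           '_' for space padding to width 2,
           '-' for no padding at all.
   Returns the number of characters written, as snprintf does.  */
int
format_era_year (char *buf, size_t size, int era_year, char flag)
{
  assert (buf != NULL && size > 0);
  switch (flag)
    {
    case '_':
      return snprintf (buf, size, "%2d", era_year);
    case '-':
      return snprintf (buf, size, "%d", era_year);
    default:
      return snprintf (buf, size, "%02d", era_year);
    }
}
```

Under this model, era year 2 is rendered as "02" with no flag, " 2"
with '_', and "2" with '-'.  Two-digit years such as 63 come out the
same in all three cases.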

ChangeLog:

	[BZ #23758]
	* time/Makefile (tests): Add tst-strftime2.
	* time/strftime_l.c (__strftime_internal): Add argument
	yr_spec to override padding for %Ey.
	Change the default zero-padded width of %Ey to 2.
	If an optional flag ('_' or '-') is given to %EY, the %Ey
	in the subformat is interpreted as if decorated with the
	appropriate flag.
	* time/tst-strftime2.c: New file.
---
 Changes since v1:
 - change to be handled by not rewriting the format string
 - slightly modify the commit message

 time/Makefile        |   2 +-
 time/strftime_l.c    |  28 ++++++-----
 time/tst-strftime2.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 152 insertions(+), 12 deletions(-)
 create mode 100644 time/tst-strftime2.c

Comments

Paul Eggert Oct. 28, 2018, 9:06 p.m. UTC | #1
TAMUKI Shoichi wrote:
> Since only one Japanese era name is used by each emperor's reign, it
> is rare that the year ends in one digit or lasts more than three
> digits.

Rare recently, but over the long term about 75% of Japanese imperial years have 
been single-digit years: since 701 AD there have been 989 single-digit years but 
only 329 two-digit years. (This calculation is approximate, but it's close 
enough; see attached shell script for how I did the calculation.) Although Japan 
is more stable now than it was centuries ago, the long reigns since 1868 are a 
historical aberration and it should not be surprising if the fraction of 
single-digit years reverts closer to historical levels in the not-too-distant 
future.

Although I'm no expert in Japanese, as I understand it the most common style for
formatting imperial dates in plain text uses no spaces anywhere, e.g., "平成2年3月4日"
for Heisei 2 March 4. It's far less common to see spaces to make things line up,
presumably for tables.

Since glibc is already defaulting to space padding for month and day-of-month, 
it makes sense for glibc to also default to space padding for imperial year. 
However, this change should be announced more clearly. The ChangeLog entry 
should say what's going on at a high level, and give an example call to strftime 
with the before-and-after output, along with how to generate imperial dates with 
no spaces; and (more important) the glibc documentation for strftime should
contain similar examples.
TAMUKI Shoichi Oct. 29, 2018, 10:08 p.m. UTC | #2
Hello Paul,

Thank you for your review.

From: Paul Eggert <eggert@cs.ucla.edu>
Subject: Re: [PATCH v2] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Sun, 28 Oct 2018 14:06:46 -0700

> TAMUKI Shoichi wrote:
> > Since only one Japanese era name is used by each emperor's reign, it
> > is rare that the year ends in one digit or lasts more than three
> > digits.
> 
> Rare recently, but over the long term about 75% of Japanese imperial
> years have been single-digit years: since 701 AD there have been 989
> single-digit years but only 329 two-digit years.  (This calculation is
> approximate, but it's close enough; see attached shell script for how
> I did the calculation.)  Although Japan is more stable now than it was
> centuries ago, the long reigns since 1868 are a historical aberration
> and it should not be surprising if the fraction of single-digit years
> reverts closer to historical levels in the not-too-distant future.

As you mentioned, before the Meiji era (1868) there were many short
eras.  However, because the lunisolar calendar was used instead of the
Gregorian calendar before the Meiji era, it is difficult to represent
those dates accurately in the current glibc scheme, and I think that
we do not have to care about them from a practical point of view.  In
fact, the Japanese locale data in glibc has no era entries before the
Meiji era; it defines only AD and BC for that period.  Also, it is
interesting to speculate that future eras might be shorter, as they
were before.  However, that does not guarantee that every era will
have only single-digit years.  I think that it is reasonable to change
the default zero-padded width of %Ey to 2 so as to keep a constant
width across the past and the future.

Regarding the commit message, I will change the expression as follows.

| Since only one Japanese era name is recently used by each emperor's
| reign, it is rare that the year ends in one digit or lasts more than
| three digits.

> Although I'm no expert in Japanese, as I understand it the most common
> style for formatting imperial dates in plain text uses no spaces
> anywhere,

The most common style for formatting the Japanese calendar dates in
plain text is not necessarily without spaces.

> It's far less common to see spaces to make things line up, presumably
> for tables.

I think that each style is used as appropriate for the application.
Both the regular representation (%c, %x, %X) and the alternate
representation (%Ec, %Ex, %EX) in the Japanese locale of glibc default
to zero padding.  This is suitable for width-sensitive output, such as
business forms.  Next, padding with spaces is easy for humans to read
while keeping the same width, but it is not suitable when fields are
split on space delimiters.  Finally, a format without padding is
suitable as input to applications that produce typeset-quality output,
such as TeX.
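
The three styles can be compared on ordinary numeric fields.  The
sketch below assumes glibc's '_' and '-' flag extensions to strftime
and uses %m and %d rather than the era conversions, so that it does
not depend on an installed Japanese locale; format_march_4 is a
hypothetical helper.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: format March 4, 1990 with FMT and return the
   result of strftime (the number of bytes written, excluding the
   terminating null byte, or 0 if the buffer is too small).  */
size_t
format_march_4 (char *buf, size_t size, const char *fmt)
{
  struct tm tm = { 0 };

  assert (buf != NULL && fmt != NULL);
  tm.tm_year = 90;	/* Years since 1900.  */
  tm.tm_mon = 2;	/* Months since January.  */
  tm.tm_mday = 4;
  return strftime (buf, size, fmt, &tm);
}
```

On glibc, "%m/%d" gives "03/04", "%_m/%_d" gives " 3/ 4", and
"%-m/%-d" gives "3/4".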

> Since glibc is already defaulting to space padding for month and day-
> of-month, it makes sense for glibc to also default to space padding
> for imperial year.  However, this change should be announced more
> clearly.  The ChangeLog entry should say what's going on at a high
> level, and give an example call to strftime with the before-and-after
> output, along with how to generate imperial dates with no spaces; and
> (more important) the glibc documentation for strftime should
> contain similar examples.

As mentioned above, the Japanese locale of glibc defaults to zero
padding, so it is also natural to zero-pad the year in the Japanese
calendar.  The strftime section of the glibc documentation says the
following.

| The default action is to pad the number with zeros to keep it a
| constant width.

The change from zero to space padding may cause backward compatibility
in the Japanese locale, so I think that it is OK as it is.

Since this change gives sane handling of the display width of
one-digit years in the Japanese calendar, a case that has not been
encountered directly since the Japanese locale of glibc appeared, I
think for now that it is unnecessary to add new documentation on an
issue specific to the Japanese locale.

Regards,
TAMUKI Shoichi
TAMUKI Shoichi Oct. 31, 2018, 3:25 a.m. UTC | #3
Hello,

Sorry, my previous mail contained several typos.

> Subject: Re: [PATCH] Improve the width of alternate representation for year in strftime [BZ #23758]

s/PATCH/& v2/

> The change from zero to space padding may cause backward compatibility
> in the Japanese locale, so I think that it is OK as it is.

s/cause/break/

Regards,
TAMUKI Shoichi

Patch

diff --git a/time/Makefile b/time/Makefile
index ec3e39dcea..6dc2acceaa 100644
--- a/time/Makefile
+++ b/time/Makefile
@@ -43,7 +43,7 @@  tests	:= test_time clocktest tst-posixtz tst-strptime tst_wcsftime \
 	   tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \
 	   tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \
 	   tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \
-	   tst-tzname tst-y2039
+	   tst-tzname tst-y2039 tst-strftime2
 
 include ../Rules
 
diff --git a/time/strftime_l.c b/time/strftime_l.c
index c71f9f47a9..8797341ea5 100644
--- a/time/strftime_l.c
+++ b/time/strftime_l.c
@@ -434,7 +434,7 @@  static CHAR_T const month_name[][10] =
 #endif
 
 static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *,
-				   const struct tm *, bool *
+				   const struct tm *, int *, bool *
 				   ut_argument_spec
 				   LOCALE_PARAM) __THROW;
 
@@ -456,8 +456,9 @@  my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T *format,
   tmcopy = *tp;
   tp = &tmcopy;
 #endif
+  int yr_spec = 0;		/* Override padding for %Ey.  */
   bool tzset_called = false;
-  return __strftime_internal (s, maxsize, format, tp, &tzset_called
+  return __strftime_internal (s, maxsize, format, tp, &yr_spec, &tzset_called
 			      ut_argument LOCALE_ARG);
 }
 #ifdef _LIBC
@@ -466,7 +467,7 @@  libc_hidden_def (my_strftime)
 
 static size_t
 __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
-		     const struct tm *tp, bool *tzset_called
+		     const struct tm *tp, int *yr_spec, bool *tzset_called
 		     ut_argument_spec LOCALE_PARAM)
 {
 #if defined _LIBC && defined USE_IN_EXTENDED_LOCALE_MODEL
@@ -820,7 +821,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	  if (modifier == L_('O'))
 	    goto bad_format;
 #ifdef _NL_CURRENT
-	  if (! (modifier == 'E'
+	  if (! (modifier == L_('E')
 		 && (*(subfmt =
 		       (const CHAR_T *) _NL_CURRENT (LC_TIME,
 						     NLW(ERA_D_T_FMT)))
@@ -838,11 +839,12 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	  {
 	    CHAR_T *old_start = p;
 	    size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
-					      tp, tzset_called ut_argument
-					      LOCALE_ARG);
+					      tp, yr_spec, tzset_called
+					      ut_argument LOCALE_ARG);
 	    add (len, __strftime_internal (p, maxsize - i, subfmt,
-					   tp, tzset_called ut_argument
-					   LOCALE_ARG));
+					   tp, yr_spec, tzset_called
+					   ut_argument LOCALE_ARG));
+	    *yr_spec = 0;
 
 	    if (to_uppcase)
 	      while (old_start < p)
@@ -917,7 +919,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 #ifdef _NL_CURRENT
 	  if (! (modifier == L_('E')
 		 && (*(subfmt =
-		       (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
+		       (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
 		     != L_('\0'))))
 	    subfmt = (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(D_FMT));
 	  goto subformat;
@@ -1262,7 +1264,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	  DO_NUMBER (1, tp->tm_wday);
 
 	case L_('Y'):
-	  if (modifier == 'E')
+	  if (modifier == L_('E'))
 	    {
 #if HAVE_STRUCT_ERA_ENTRY
 	      struct era_entry *era = _nl_get_era_entry (tp HELPER_LOCALE_ARG);
@@ -1273,6 +1275,8 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 # else
 		  subfmt = era->era_format;
 # endif
+		  if (pad != 0)
+		    *yr_spec = pad;
 		  goto subformat;
 		}
 #else
@@ -1294,7 +1298,9 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	      if (era)
 		{
 		  int delta = tp->tm_year - era->start_date[0];
-		  DO_NUMBER (1, (era->offset
+		  if (*yr_spec != 0)
+		    pad = *yr_spec;
+		  DO_NUMBER (2, (era->offset
 				 + delta * era->absolute_direction));
 		}
 #else
diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
new file mode 100644
index 0000000000..3e7ddfe9ea
--- /dev/null
+++ b/time/tst-strftime2.c
@@ -0,0 +1,134 @@ 
+/* Verify the behavior of strftime on alternate representation for year.
+
+   Copyright (C) 2013-2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <locale.h>
+#include <time.h>
+#include <stdio.h>
+#include <string.h>
+
+static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8", "th_TH.UTF-8" };
+#define nlocales (sizeof (locales) / sizeof (locales[0]))
+
+static const char *formats[] = { "%EY", "%_EY", "%-EY" };
+#define nformats (sizeof (formats) / sizeof (formats[0]))
+
+static const struct
+{
+  const int d, m, y;
+} dates[] =
+  {
+    { 1, 3, 88 },
+    { 7, 0, 89 },
+    { 8, 0, 89 },
+    { 1, 3, 90 },
+    { 1, 3, 97 },
+    { 1, 3, 98 }
+  };
+#define ndates (sizeof (dates) / sizeof (dates[0]))
+
+static char ref[nlocales][nformats][ndates][100];
+
+static void
+mkreftable (void)
+{
+  int i, j, k;
+  char era[10];
+  static const int yrj[] = { 63, 64, 1, 2, 9, 10 };
+  static const int yrb[] = { 2531, 2532, 2532, 2533, 2540, 2541 };
+
+  for (i = 0; i < nlocales; i++)
+    for (j = 0; j < nformats; j++)
+      for (k = 0; k < ndates; k++)
+	{
+	  if (i == 0)
+	    {
+	      sprintf (era, "%s", (k < 2) ? "\xe6\x98\xad\xe5\x92\x8c"
+					  : "\xe5\xb9\xb3\xe6\x88\x90");
+	      if (yrj[k] == 1)
+		sprintf (ref[i][j][k], "%s\xe5\x85\x83\xe5\xb9\xb4", era);
+	      else
+		{
+		  if (j == 0)
+		    sprintf (ref[i][j][k], "%s%02d\xe5\xb9\xb4", era, yrj[k]);
+		  else if (j == 1)
+		    sprintf (ref[i][j][k], "%s%2d\xe5\xb9\xb4", era, yrj[k]);
+		  else
+		    sprintf (ref[i][j][k], "%s%d\xe5\xb9\xb4", era, yrj[k]);
+		}
+	    }
+	  else if (i == 1)
+	    {
+	      sprintf (era, "\xe0\xba\x9e\x2e\xe0\xba\xaa\x2e ");
+	      sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+	    }
+	  else
+	    {
+	      sprintf (era, "\xe0\xb8\x9e\x2e\xe0\xb8\xa8\x2e ");
+	      sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+	    }
+	}
+}
+
+static int
+do_test (void)
+{
+  int i, j, k, result = 0;
+  struct tm ttm;
+  char date[11], buf[100];
+  size_t r, e;
+
+  mkreftable ();
+  for (i = 0; i < nlocales; i++)
+    {
+      if (setlocale (LC_ALL, locales[i]) == NULL)
+	{
+	  printf ("locale %s does not exist, skipping...\n", locales[i]);
+	  continue;
+	}
+      printf ("[%s]\n", locales[i]);
+      for (j = 0; j < nformats; j++)
+	{
+	  for (k = 0; k < ndates; k++)
+	    {
+	      ttm.tm_mday = dates[k].d;
+	      ttm.tm_mon  = dates[k].m;
+	      ttm.tm_year = dates[k].y;
+	      strftime (date, sizeof (date), "%F", &ttm);
+	      r = strftime (buf, sizeof (buf), formats[j], &ttm);
+	      e = strlen (ref[i][j][k]);
+	      printf ("%s\t\"%s\"\t\"%s\"", date, formats[j], buf);
+	      if (strcmp (buf, ref[i][j][k]) != 0)
+		{
+		  printf ("\tshould be \"%s\"", ref[i][j][k]);
+		  if (r != e)
+		    printf ("\tgot: %zu, expected: %zu", r, e);
+		  result = 1;
+		}
+	      else
+		printf ("\tOK");
+	      putchar ('\n');
+	    }
+	  putchar ('\n');
+	}
+    }
+  return result;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"