Improve the width of alternate representation for year in strftime [BZ #23758]

Message ID 201810112107.AA04060@tamuki.linet.gr.jp
State New
Headers show
Series
  • Improve the width of alternate representation for year in strftime [BZ #23758]
Related show

Commit Message

TAMUKI Shoichi Oct. 11, 2018, 9:07 p.m.
The Japanese era name is scheduled to be changed on May 1, 2019.
Prior to this, change the alternate representation for year in
strftime to pad the number with zero to keep it constant width, so
that prevent the trouble we saw in the past from becoming obvious
again from the year after the era name changes onward.

Since only one Japanese era name is used by each emperor's reign, it
is rare that the year ends in one digit or lasts more than three
digits.  In addition, the width of month, day, hour, minute, and
second is 2, so adjust the width of year the same as them, the whole
display balance is improved.  Therefore, it would be reasonable to
change the width padding with zero of %Ey default to 2.

In glibc, besides ja_JP locale, the locales currently using the
conversion specifier above are lo_LA (Laos) and th_TH (Thailand).  In
these locales, they use the Buddhist era.  The Buddhist era is a value
obtained by adding 543 to the Christian era, so it is not affected by
this change of the conversion specifier %Ey.

Furthermore, for the output string of the conversion specifier %EY,
an optional flag is given to the conversion specifier so that it can
be also used the current non-padding format, and the padding format
can be controlled.  To achieve this, when an optional flag is given to
the conversion specifier %EY, processing is performed as the flag to
be given to %Ey included in the combined conversion specifier.

ChangeLog:

	[BZ #23758]
	* time/Makefile (tests): Add tst-strftime2.
	* time/strftime_l.c (STRCPY, STRSTR, MEMMOVE): New macros.
	(add): Free subfmt2 before return if using malloc.
	(subfmt2, using_malloc): New variables in __strftime_internal.
	(__strftime_internal): Change the width padding with zero of %Ey
	default to 2.
	If an optional flag ('_' or '-') is specified to %EY, delegate
	the flag to %Ey in subformat, and then free subfmt2 after add
	call if using malloc.
	* time/tst-strftime2.c: New file.
---
 time/Makefile        |   2 +-
 time/strftime_l.c    |  53 ++++++++++++++++++--
 time/tst-strftime2.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 183 insertions(+), 6 deletions(-)
 create mode 100644 time/tst-strftime2.c

Comments

Florian Weimer Oct. 23, 2018, 12:54 p.m. | #1
* TAMUKI Shoichi:

> The Japanese era name is scheduled to be changed on May 1, 2019.
> Prior to this, change the alternate representation for year in
> strftime to pad the number with zero to keep it constant width, so
> that prevent the trouble we saw in the past from becoming obvious
> again from the year after the era name changes onward.
>
> Since only one Japanese era name is used by each emperor's reign, it
> is rare that the year ends in one digit or lasts more than three
> digits.  In addition, the width of month, day, hour, minute, and
> second is 2, so adjust the width of year the same as them, the whole
> display balance is improved.  Therefore, it would be reasonable to
> change the width padding with zero of %Ey default to 2.
>
> In glibc, besides ja_JP locale, the locales currently using the
> conversion specifier above are lo_LA (Laos) and th_TH (Thailand).  In
> these locales, they use the Buddhist era.  The Buddhist era is a value
> obtained by adding 543 to the Christian era, so it is not affected by
> this change of the conversion specifier %Ey.
>
> Furthermore, for the output string of the conversion specifier %EY,
> an optional flag is given to the conversion specifier so that it can
> be also used the current non-padding format, and the padding format
> can be controlled.  To achieve this, when an optional flag is given to
> the conversion specifier %EY, processing is performed as the flag to
> be given to %Ey included in the combined conversion specifier.

Thanks for this suggestion.  To you have a copyright assignment for
glibc on file?

It looks like that there are two changes (plus some formatting cleanups)
in this patch: the change in padding:

> -		  DO_NUMBER (1, (era->offset
> +		  DO_NUMBER (2, (era->offset
>  				 + delta * era->absolute_direction));

and adding a way to override the padding, when %EY is exanded using
locale data, rewriting the %Ey contained in the format string.

I think the second part should be handled differently, by not rewriting
the format string, and rather adding strftime arguments (possibly in a
struct) to set the default padding.

This should also avoid the need for new memory allocations.

Thanks,
Florian
Rafal Luzynski Oct. 23, 2018, 8:13 p.m. | #2
23.10.2018 14:54 Florian Weimer <fweimer@redhat.com> wrote:
> [...]
> This should also avoid the need for new memory allocations.
> 
> Thanks,
> Florian

While at this, Florian, I think you tried to eliminate all occurrences
of alloca family functions in the past.  I can see that Tamuki Shoichi's
patch uses alloca family optionally.  Shouldn't this patch just assume
that alloca is never used, and using_malloc is always true and therefore
can be dropped?  The patch would be simpler and IIUC more secure.

Regards,

Rafal
Florian Weimer Oct. 24, 2018, 7:55 a.m. | #3
* Rafal Luzynski:

> 23.10.2018 14:54 Florian Weimer <fweimer@redhat.com> wrote:
>> [...]
>> This should also avoid the need for new memory allocations.
>> 
>> Thanks,
>> Florian
>
> While at this, Florian, I think you tried to eliminate all occurrences
> of alloca family functions in the past.  I can see that Tamuki Shoichi's
> patch uses alloca family optionally.  Shouldn't this patch just assume
> that alloca is never used, and using_malloc is always true and therefore
> can be dropped?  The patch would be simpler and IIUC more secure.

Right, that's why I suggested not to rewrite the format string.  Then
the alloca will be gone because the copy is not needed.

Thanks,
Florian
TAMUKI Shoichi Oct. 24, 2018, 10:54 a.m. | #4
Hello Florian,

Thank you for your review.

From: Florian Weimer <fweimer@redhat.com>
Subject: Re: [PATCH] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Tue, 23 Oct 2018 14:54:04 +0200

> Thanks for this suggestion.  To you have a copyright assignment for
> glibc on file?

Currently in progress.  I sent the information for request-assign.future
to assign@gnu.org over ten days ago.

> It looks like that there are two changes (plus some formatting cleanups)
> in this patch: the change in padding:
> 
> > -		  DO_NUMBER (1, (era->offset
> > +		  DO_NUMBER (2, (era->offset
> >  				 + delta * era->absolute_direction));
> 
> and adding a way to override the padding, when %EY is exanded using
> locale data, rewriting the %Ey contained in the format string.

Yes, that's right.

> I think the second part should be handled differently, by not rewriting
> the format string, and rather adding strftime arguments (possibly in a
> struct) to set the default padding.
> 
> This should also avoid the need for new memory allocations.

Certainly.  I also want to avoid dynamic memory allocation as much as
possible.  To add strftime arguments, which is not the prepared format
string (era_format in locale data) but the modified string (rewriting
the %Ey), anoher allocated separate memory is required.

An alternative plan will be to allocate memory beforehand statically.
In that case, we have to decide the size to be assumed.

Currently, there are 3 locales which use era format.  The maximum
length of era format (including null) is 16 bytes in wchar_t for now.

tamuki@seadog:~/glibc/localedata/locales$ grep -A10 "^\<era\>" *
ja_JP:era	"+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
ja_JP-	"+:1:1989//01//08:1989//12//31:<U5E73><U6210>:%EC<U5143><U5E74>";/
ja_JP-	"+:2:1927//01//01:1989//01//07:<U662D><U548C>:%EC%Ey<U5E74>";/
ja_JP-	"+:1:1926//12//25:1926//12//31:<U662D><U548C>:%EC<U5143><U5E74>";/
ja_JP-	"+:2:1913//01//01:1926//12//24:<U5927><U6B63>:%EC%Ey<U5E74>";/
ja_JP-	"+:2:1912//07//30:1912//12//31:<U5927><U6B63>:%EC<U5143><U5E74>";/
ja_JP-	"+:6:1873//01//01:1912//07//29:<U660E><U6CBB>:%EC%Ey<U5E74>";/
ja_JP-	"+:1:0001//01//01:1872//12//31:<U897F><U66A6>:%EC%Ey<U5E74>";/
ja_JP-	"+:1:-0001//12//31:-*:<U7D00><U5143><U524D>:%EC%Ey<U5E74>"
ja_JP-
ja_JP-era_d_fmt	"%EY%m<U6708>%d<U65E5>"
--
lo_LA:era     "+:1:-543//01//01:+*:<U0E9E>.<U0EAA>.:%EC %Ey"
lo_LA-era_d_fmt       "%e %b %Ey"
lo_LA-era_t_fmt       "%H.%M.%S <U0E99>."
lo_LA-era_d_t_fmt     "<U0EA7><U0EB1><U0E99>%A<U0E97><U0EB5><U0EC8> %e %B %EC %Ey, %H.%M.%S <U0E99>."
lo_LA-% Appropriate date representation (date(1))
lo_LA-date_fmt       "%a %e %b %Ey %H:%M:%S %Z"
lo_LA-week 7;19971130;1
lo_LA-END LC_TIME
lo_LA-
lo_LA-LC_MESSAGES
lo_LA-
--
th_TH:era     "+:1:-543//01//01:+*:<U0E1E>.<U0E28>.:%EC %Ey"
th_TH-era_d_fmt       "%e %b %Ey"
th_TH-era_t_fmt       "%H.%M.%S <U0E19>."
th_TH-era_d_t_fmt     "<U0E27><U0E31><U0E19>%A<U0E17><U0E35><U0E48> %e %B %EC %Ey, %H.%M.%S <U0E19>."
th_TH-% Appropriate date representation (date(1))
th_TH-date_fmt       "%a %e %b %Ey %H:%M:%S %Z"
th_TH-week 7;19971130;1
th_TH-END LC_TIME
th_TH-
th_TH-LC_MESSAGES
th_TH-

So, I think we should prepare an array of 32 bytes with a margin.
What do you think.

Regards,
TAMUKI Shoichi
Florian Weimer Oct. 24, 2018, 11:01 a.m. | #5
* TAMUKI Shoichi:

>> It looks like that there are two changes (plus some formatting cleanups)
>> in this patch: the change in padding:
>> 
>> > -		  DO_NUMBER (1, (era->offset
>> > +		  DO_NUMBER (2, (era->offset
>> >  				 + delta * era->absolute_direction));
>> 
>> and adding a way to override the padding, when %EY is exanded using
>> locale data, rewriting the %Ey contained in the format string.
>
> Yes, that's right.
>
>> I think the second part should be handled differently, by not rewriting
>> the format string, and rather adding strftime arguments (possibly in a
>> struct) to set the default padding.
>> 
>> This should also avoid the need for new memory allocations.
>
> Certainly.  I also want to avoid dynamic memory allocation as much as
> possible.  To add strftime arguments, which is not the prepared format
> string (era_format in locale data) but the modified string (rewriting
> the %Ey), anoher allocated separate memory is required.

Sorry, I don't understand.  I think all we need is a way to change the
default padding, so that the %Ey (unchanged) is interpreted as if
decorated with the appropriate flag.  I think this would avoid the need
to do any dynamically-sized allocations.

> An alternative plan will be to allocate memory beforehand statically.
> In that case, we have to decide the size to be assumed.

That's not possible because users can define their own locales if they
want.

Thanks,
Florian
TAMUKI Shoichi Oct. 24, 2018, 9:38 p.m. | #6
Hello Florian,

From: Florian Weimer <fweimer@redhat.com>
Subject: Re: [PATCH] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Wed, 24 Oct 2018 13:01:09 +0200

> Sorry, I don't understand.  I think all we need is a way to change the
> default padding, so that the %Ey (unchanged) is interpreted as if
> decorated with the appropriate flag.  I think this would avoid the need
> to do any dynamically-sized allocations.

Ah, I understand what you say.  I will revise the patch to be handled
by not rewriting the format string.

Thank you for suggestion.

Regards,
TAMUKI Shoichi

Patch

diff --git a/time/Makefile b/time/Makefile
index ec3e39dcea..6dc2acceaa 100644
--- a/time/Makefile
+++ b/time/Makefile
@@ -43,7 +43,7 @@  tests	:= test_time clocktest tst-posixtz tst-strptime tst_wcsftime \
 	   tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \
 	   tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \
 	   tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \
-	   tst-tzname tst-y2039
+	   tst-tzname tst-y2039 tst-strftime2
 
 include ../Rules
 
diff --git a/time/strftime_l.c b/time/strftime_l.c
index c71f9f47a9..0030b631fa 100644
--- a/time/strftime_l.c
+++ b/time/strftime_l.c
@@ -100,6 +100,9 @@  extern char *tzname[];
 
 # define MEMCPY(d, s, n) __wmemcpy (d, s, n)
 # define STRLEN(s) __wcslen (s)
+# define STRCPY wcscpy
+# define STRSTR wcsstr
+# define MEMMOVE wmemmove
 
 #else
 # define CHAR_T char
@@ -114,6 +117,9 @@  extern char *tzname[];
 #  define MEMCPY(d, s, n) memcpy ((d), (s), (n))
 # endif
 # define STRLEN(s) strlen (s)
+# define STRCPY strcpy
+# define STRSTR strstr
+# define MEMMOVE memmove
 
 # ifdef _LIBC
 #  define MEMPCPY(d, s, n) __mempcpy (d, s, n)
@@ -233,7 +239,11 @@  static const CHAR_T zeroes[16] = /* "0000000000000000" */
       int _delta = width - _n;						      \
       int _incr = _n + (_delta > 0 ? _delta : 0);			      \
       if ((size_t) _incr >= maxsize - i)				      \
-	return 0;							      \
+	{								      \
+	  if (using_malloc)						      \
+	    free (subfmt2);						      \
+	  return 0;							      \
+	}								      \
       if (p)								      \
 	{								      \
 	  if (_delta > 0)						      \
@@ -574,6 +584,8 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
       int to_uppcase = 0;
       int change_case = 0;
       int format_char;
+      CHAR_T *subfmt2;
+      bool using_malloc = false;
 
 #if DO_MULTIBYTE && !defined COMPILE_WIDE
       switch (*f)
@@ -820,7 +832,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	  if (modifier == L_('O'))
 	    goto bad_format;
 #ifdef _NL_CURRENT
-	  if (! (modifier == 'E'
+	  if (! (modifier == L_('E')
 		 && (*(subfmt =
 		       (const CHAR_T *) _NL_CURRENT (LC_TIME,
 						     NLW(ERA_D_T_FMT)))
@@ -843,6 +855,11 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	    add (len, __strftime_internal (p, maxsize - i, subfmt,
 					   tp, tzset_called ut_argument
 					   LOCALE_ARG));
+	    if (using_malloc)
+	      {
+		free (subfmt2);
+		using_malloc = false;
+	      }
 
 	    if (to_uppcase)
 	      while (old_start < p)
@@ -917,7 +934,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 #ifdef _NL_CURRENT
 	  if (! (modifier == L_('E')
 		 && (*(subfmt =
-		       (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
+		       (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
 		     != L_('\0'))))
 	    subfmt = (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(D_FMT));
 	  goto subformat;
@@ -1262,7 +1279,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	  DO_NUMBER (1, tp->tm_wday);
 
 	case L_('Y'):
-	  if (modifier == 'E')
+	  if (modifier == L_('E'))
 	    {
 #if HAVE_STRUCT_ERA_ENTRY
 	      struct era_entry *era = _nl_get_era_entry (tp HELPER_LOCALE_ARG);
@@ -1273,6 +1290,32 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 # else
 		  subfmt = era->era_format;
 # endif
+		  if (pad != 0)
+		    {
+		      CHAR_T *flag;
+
+		      if (__libc_use_alloca ((STRLEN (subfmt) + 2)
+					     * sizeof (CHAR_T)))
+			subfmt2 = (CHAR_T *) alloca ((STRLEN (subfmt) + 2)
+						     * sizeof (CHAR_T));
+		      else
+			{
+			  if ((subfmt2 =
+			      (CHAR_T *) malloc ((STRLEN (subfmt) + 2)
+						 * sizeof (CHAR_T))) == NULL)
+			    goto subformat;
+			  using_malloc = true;
+			}
+		      STRCPY (subfmt2, subfmt);
+		      if ((flag = STRSTR (subfmt2, L_("%Ey"))) != NULL)
+			{
+			  MEMMOVE (flag + 2 * sizeof (CHAR_T),
+				   flag + sizeof (CHAR_T), STRLEN (flag));
+			  *(flag + sizeof (CHAR_T)) = (CHAR_T) pad;
+			  subfmt = subfmt2;
+			  pad = 0;
+			}
+		    }
 		  goto subformat;
 		}
 #else
@@ -1294,7 +1337,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	      if (era)
 		{
 		  int delta = tp->tm_year - era->start_date[0];
-		  DO_NUMBER (1, (era->offset
+		  DO_NUMBER (2, (era->offset
 				 + delta * era->absolute_direction));
 		}
 #else
diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
new file mode 100644
index 0000000000..3e7ddfe9ea
--- /dev/null
+++ b/time/tst-strftime2.c
@@ -0,0 +1,134 @@ 
+/* Verify the behavior of strftime on alternate representation for year.
+
+   Copyright (C) 2013-2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <locale.h>
+#include <time.h>
+#include <stdio.h>
+#include <string.h>
+
+static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8", "th_TH.UTF-8" };
+#define nlocales (sizeof (locales) / sizeof (locales[0]))
+
+static const char *formats[] = { "%EY", "%_EY", "%-EY" };
+#define nformats (sizeof (formats) / sizeof (formats[0]))
+
+static const struct
+{
+  const int d, m, y;
+} dates[] =
+  {
+    { 1, 3, 88 },
+    { 7, 0, 89 },
+    { 8, 0, 89 },
+    { 1, 3, 90 },
+    { 1, 3, 97 },
+    { 1, 3, 98 }
+  };
+#define ndates (sizeof (dates) / sizeof (dates[0]))
+
+static char ref[nlocales][nformats][ndates][100];
+
+static void
+mkreftable (void)
+{
+  int i, j, k;
+  char era[10];
+  static const int yrj[] = { 63, 64, 1, 2, 9, 10 };
+  static const int yrb[] = { 2531, 2532, 2532, 2533, 2540, 2541 };
+
+  for (i = 0; i < nlocales; i++)
+    for (j = 0; j < nformats; j++)
+      for (k = 0; k < ndates; k++)
+	{
+	  if (i == 0)
+	    {
+	      sprintf (era, "%s", (k < 2) ? "\xe6\x98\xad\xe5\x92\x8c"
+					  : "\xe5\xb9\xb3\xe6\x88\x90");
+	      if (yrj[k] == 1)
+		sprintf (ref[i][j][k], "%s\xe5\x85\x83\xe5\xb9\xb4", era);
+	      else
+		{
+		  if (j == 0)
+		    sprintf (ref[i][j][k], "%s%02d\xe5\xb9\xb4", era, yrj[k]);
+		  else if (j == 1)
+		    sprintf (ref[i][j][k], "%s%2d\xe5\xb9\xb4", era, yrj[k]);
+		  else
+		    sprintf (ref[i][j][k], "%s%d\xe5\xb9\xb4", era, yrj[k]);
+		}
+	    }
+	  else if (i == 1)
+	    {
+	      sprintf (era, "\xe0\xba\x9e\x2e\xe0\xba\xaa\x2e ");
+	      sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+	    }
+	  else
+	    {
+	      sprintf (era, "\xe0\xb8\x9e\x2e\xe0\xb8\xa8\x2e ");
+	      sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+	    }
+	}
+}
+
+static int
+do_test (void)
+{
+  int i, j, k, result = 0;
+  struct tm ttm;
+  char date[11], buf[100];
+  size_t r, e;
+
+  mkreftable ();
+  for (i = 0; i < nlocales; i++)
+    {
+      if (setlocale (LC_ALL, locales[i]) == NULL)
+	{
+	  printf ("locale %s does not exist, skipping...\n", locales[i]);
+	  continue;
+	}
+      printf ("[%s]\n", locales[i]);
+      for (j = 0; j < nformats; j++)
+	{
+	  for (k = 0; k < ndates; k++)
+	    {
+	      ttm.tm_mday = dates[k].d;
+	      ttm.tm_mon  = dates[k].m;
+	      ttm.tm_year = dates[k].y;
+	      strftime (date, sizeof (date), "%F", &ttm);
+	      r = strftime (buf, sizeof (buf), formats[j], &ttm);
+	      e = strlen (ref[i][j][k]);
+	      printf ("%s\t\"%s\"\t\"%s\"", date, formats[j], buf);
+	      if (strcmp (buf, ref[i][j][k]) != 0)
+		{
+		  printf ("\tshould be \"%s\"", ref[i][j][k]);
+		  if (r != e)
+		    printf ("\tgot: %zu, expected: %zu", r, e);
+		  result = 1;
+		}
+	      else
+		printf ("\tOK");
+	      putchar ('\n');
+	    }
+	  putchar ('\n');
+	}
+    }
+  return result;
+}
+
+#define TEST_FUNCTION do_test ()
+#include "../test-skeleton.c"