[v4] Improve the width of alternate representation for year in strftime [BZ #23758]
diff mbox series

Message ID 201812070655.AA04129@tamuki.linet.gr.jp
State New
Headers show
Series
  • [v4] Improve the width of alternate representation for year in strftime [BZ #23758]
Related show

Commit Message

TAMUKI Shoichi Dec. 7, 2018, 6:55 a.m. UTC
The Japanese era name is scheduled to be changed on May 1, 2019.
Prior to this, change the alternate representation for year in
strftime to pad the number with zero to keep it constant width, so
that prevent the trouble we saw in the past from becoming obvious
again from the year after the era name changes onward.

Since only one Japanese era name is used by each emperor's reign as
lately, it is rare that the year ends in one digit or lasts more than
three digits.  In addition, the default width of month, day, hour,
minute, and second is 2, so adjust the default width of year the same
as them, and then the whole display balance is improved.  Therefore,
it would be reasonable to change the width padding with zero of %Ey
default to 2.

Currently in glibc, besides ja_JP locale, the locales using the
conversion specifier above are lo_LA (Laos) and th_TH (Thailand).  In
these locales, they use the Buddhist era.  The Buddhist era is a value
obtained by adding 543 to the Christian era, so they are not affected
by the change of the conversion specifier %Ey.

Furthermore, for the output string of the conversion specifier %EY,
an optional flag is given to the conversion specifier so that it can
be also used the current non-padding format, and the padding format
can be controlled.  To achieve this, when an optional flag is given to
the conversion specifier %EY, the %Ey included in the combined
conversion specifier is interpreted as if decorated with the
appropriate flag.

ChangeLog:

	[BZ #23758]
	* NEWS: Mention the change.
	* manual/time.texi (strftime): Document the desctiption for %EC, %Ey,
	and %EY.
	* time/Makefile (tests): Add tst-strftime2.
	* time/strftime_l.c (__strftime_internal): Add argument yr_spec to
	override padding for %Ey.
	Change the width padding with zero of %Ey default to 2.  If an
	optional flag ('_' or '-') is specified to %EY, the %Ey in subformat
	is interpreted as if decorated with the appropriate flag.
	* time/tst-strftime2.c: New file.
---
 Changes since v3:
 - use support/test-driver.c instead of the old-style test-skeleton.c
   in tst-strftime2.c, also correct the copyright year
 - slightly modify the documentation including commit message

 Remarks:
 At last, I have fulfilled all legal requirements, so you can review
 my patch without worrying about the copyright issues.

 In this patch, all items pointed out so far have been fixed.

 My concern is the documentation.  Please review especially wording,
 expression, etc.  (I am not an English native speaker.)

 NEWS                 |   7 +++
 manual/time.texi     |  15 ++++++
 time/Makefile        |   2 +-
 time/strftime_l.c    |  28 ++++++-----
 time/tst-strftime2.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 173 insertions(+), 12 deletions(-)
 create mode 100644 time/tst-strftime2.c

Comments

TAMUKI Shoichi Dec. 14, 2018, 9:02 a.m. UTC | #1
Hello,

Ping - any feedback about it?

https://www.sourceware.org/ml/libc-alpha/2018-12/msg00232.html

I think that this patch has been basically reviewed, and is ready for
the further review especially for documentation, wording, etc.

According to the copyright year, since the Japanese localedata was
introduced to glibc on around 1999 (Heisei year 11), it will be the
first opportunity to encounter the problem when the year becomes one
digit due to the next era change.

So, it would be greatly appreciated if you could review this for 2.29.

Thank you for your cooperation.

Regards,
TAMUKI Shoichi
Rafal Luzynski Dec. 21, 2018, 9:38 a.m. UTC | #2
Hello,

14.12.2018 10:02 TAMUKI Shoichi <tamuki@linet.gr.jp> wrote:
> 
> Hello,
> 
> Ping - any feedback about it?
> 
> https://www.sourceware.org/ml/libc-alpha/2018-12/msg00232.html

Thank you for your patch and for all your contribution, including
reviews of my patches.

One important question: have you signed your FSF copyright assignment?
Without this the reviewers are not even allowed to see your patches.
I can't see your previous commit in glibc source so I guess this may
be your first contribution ever.

Could somebody with the access to the FSF legal documents verify
what we are talking about here?

Regards,

Rafal
TAMUKI Shoichi Dec. 22, 2018, 10:42 a.m. UTC | #3
Hello Rafal,

Thank you for care about my patch.

From: Rafal Luzynski <digitalfreak@lingonborough.com>
Subject: Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Fri, 21 Dec 2018 10:38:17 +0100 (CET)

> > Ping - any feedback about it?
> > 
> > https://www.sourceware.org/ml/libc-alpha/2018-12/msg00232.html
> 
> Thank you for your patch and for all your contribution, including
> reviews of my patches.
> 
> One important question: have you signed your FSF copyright assignment?
> Without this the reviewers are not even allowed to see your patches.

Yes, I have already signed my FSF copyright assignment - glibc.

Regards,
TAMUKI Shoichi
Rafal Luzynski Dec. 28, 2018, 9:51 a.m. UTC | #4
Hello Tamuki Shoichi,

I am sorry for this delayed reply.

Two general remarks:

1. Whatever you contribute here you will also have to port and contribute
to gnulib. [1] The reason is that "date" command line utility uses gnulib's
fprintftime which implements a similar functionality but does not use
strftime.  If you don't contribute to gnulib then you will not see these
new features in "date".

2. As we are short of time, for easier review I suggest splitting this patch
into two patches implementing two features:

* setting the default width of "%Ey" to 2;
* passing the additional flags from "%EY" to "%Ey".

While setting the width of "%Ey" is clear and obvious to me and I understand
it is urgent, I perceive passing the flags correctly as an additional
problem
fixed by the same patch.  Also the implementation of this additional feature
is not clear to me.

Please ignore my suggestion to split if other maintainers review your patch
and accept as it is.

Could you please reword the first line of your commit message to make it
shorter than 72 characters so that if you do "git log --oneline" in a
standard
80 columns terminal the title does not wrap?  Also, it could start with the
name of the function or the module you are correcting.  My suggestion:

strftime: Improve the default width of "%Ey" [BZ #23758].

or maybe better:

strftime: Set the default width of "%Ey" to 2 [BZ #23758].

7.12.2018 07:55 TAMUKI Shoichi <tamuki@linet.gr.jp> wrote:
> 
> The Japanese era name is scheduled to be changed on May 1, 2019.
> Prior to this, change the alternate representation for year in

I am not sure but I think that it should be called "alternative"
rather than "alternate".

> strftime to pad the number with zero to keep it constant width, so
> that prevent the trouble we saw in the past from becoming obvious
> again from the year after the era name changes onward.
> 
> Since only one Japanese era name is used by each emperor's reign as
> lately, it is rare that the year ends in one digit or lasts more than
> three digits.  In addition, the default width of month, day, hour,
> minute, and second is 2, so adjust the default width of year the same
> as them, and then the whole display balance is improved.  Therefore,
> it would be reasonable to change the width padding with zero of %Ey
> default to 2.

I was thinking whether we want to set the default width of "%Ey" to 2
but indeed, using Gregorian calendar we have the default width "%y"
set to 2 and we use zero padding so for the consistency it is fine to
set "%Ey" the same.  My reasoning is different than yours but the result
is the same so I think it is OK.  Now what about "%EY"?  I was not looking
at this before so this piece of code is new to me.  As far as I can see,
"%EY", other than "%Ey" and "%Y", is not a formatting of the year number
but is just replaced by a subformat (so it is similar to "%c" or "%D" etc.)
Therefore there is no question what is the default width of the year
number in "%EY" and whether it should be the same as "%Y" (which has
no default padding).  The subformat substituted for "%EY" may or may not
contain "%Ey".

> Currently in glibc, besides ja_JP locale, the locales using the
> conversion specifier above are lo_LA (Laos) and th_TH (Thailand).  In
> these locales, they use the Buddhist era.  The Buddhist era is a value
> obtained by adding 543 to the Christian era, so they are not affected
> by the change of the conversion specifier %Ey.
> 
> Furthermore, for the output string of the conversion specifier %EY,
> an optional flag is given to the conversion specifier so that it can
> be also used the current non-padding format, and the padding format
> can be controlled.  To achieve this, when an optional flag is given to
> the conversion specifier %EY, the %Ey included in the combined
> conversion specifier is interpreted as if decorated with the
> appropriate flag.

This change is reasonable but, as I said before, it is not related with
the era change so for my review this should belong to the second patch.
Again, this is only if nobody else reviews and if you agree to split.

> [...]
>  Remarks:
>  At last, I have fulfilled all legal requirements, so you can review
>  my patch without worrying about the copyright issues.

I am sorry for not noticing this remark before.  I was not looking carefully
at your patch because I did not know you had signed the copyright
assignment.
On the other hand, I did not know you had signed the assignment because I
was
not looking at your patch. :-/

>  In this patch, all items pointed out so far have been fixed.
> 
>  My concern is the documentation.  Please review especially wording,
>  expression, etc.  (I am not an English native speaker.)

I am not an English native speaker either so I hesitate to proofread.
Please take my feedback about the documentation as incomplete and not
fully reliable.  I provide it only because nobody else did it so far.
On the other hand, I think it is OK if we have to fix the documentation
in the freeze period.

>  NEWS                 |   7 +++
>  manual/time.texi     |  15 ++++++
>  time/Makefile        |   2 +-
>  time/strftime_l.c    |  28 ++++++-----
>  time/tst-strftime2.c | 133
> +++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 173 insertions(+), 12 deletions(-)
>  create mode 100644 time/tst-strftime2.c
> 
> diff --git a/NEWS b/NEWS
> index 8483dcf4928..40e33e228f8 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -41,6 +41,13 @@ Major new features:
>    incosistent mutex state after fork call in multithread environment.
>    In both popen and system there is no direct access to user-defined
> mutexes.
>  
> +* Improve the width of alternate representation for year in strftime.

As I said before, I think it should be s/alternate/alternative.  Also,
please note that the term "alternative" may be ambiguous here because
this may apply to both "%Ey" (alternative era, decimal digits) and "%Oy"
(alternative numbering system but nothing about an alternative era).

> +  For %Ey conversion specifier, the default action is now to pad the
> +  number with zero to keep minimum 2 digits as well as %y.

"to keep the minimum width of 2 digits, same as %y"

(or: "... similar as %y")

>  Also, the
> +  optional flag (either _ or -) can be used for %EY, so that the
> +  internal %Ey is interpreted as if decorated with the appropriate
> +  flag.
> +
> [...]

This would belong to the second patch.  Also, I'm not sure if it needs
to be mentioned.  Just those flag will work as expected.  It could be
perceived as a bug that they did not work but now it is being fixed.
(Again, I am not absolutely sure you should remove it, just asking in
public).

> [...]
> diff --git a/manual/time.texi b/manual/time.texi
> index 4d154452eba..c1d5770f9e0 100644
> --- a/manual/time.texi
> +++ b/manual/time.texi
> @@ -1393,6 +1393,9 @@ The preferred calendar time representation for the
> current locale.
>  The century of the year.  This is equivalent to the greatest integer not
>  greater than the year divided by 100.
>  
> +If the @code{E} modifier is specified (@code{%EC}), the locale's
> +alternate representation for year (the era name) is used instead.

"%C" is a format specifier for a century so I believe that the word
"year" is just a copy/paste typo here.

>  [...]
> +If the @code{E} modifier is specified (@code{%Ey}), the locale's
> +alternate representation for year (the era year) is used instead.  The
> +default action is to pad the number with zero to keep minimum 2 digits
> +as well as @code{%y}.

Again: s/alternate/alternative/ and s/ as well as/, same as/ or
/s/ as well as/, similar to/ or any other better wording.

> +
>  @item %Y
>  The year as a decimal number, using the Gregorian calendar.  Years
>  before the year @code{1} are numbered @code{0}, @code{-1}, and so on.
>  
> +If the @code{E} modifier is specified (@code{%EY}), the locale's
> +alternate representation for year (generally the combination of
> +@code{%EC} and @code{%Ey}) is used instead.

If I understand correctly, the alternative representation for "%EY"
is just a subformat which may contain "%EC" and "%Ey" or may contain
anything the user puts there.  What about just removing the sentence
in brackets and not mentioning what is inside?  My reasoning that it
may be not true and it does not actually matter what is inside as long
as it works as the user wants it to work.

>  In this case, the
> +optional flag (either @code{_} or @code{-}) can be used, so that the
> +internal @code{%Ey} is interpreted as if decorated with the
> +appropriate flag.

Again I'm not sure we should put here the obvious things which work
as expected.  This may be mentioned here if this bug is perceived
as severe by Japanese people but I think we don't mention all bugs
fixed in this section of NEWS.  There is another section which lists
all fixed bugs but on the other hand the bug 23758 is about the
default width of %Ey rather than not working flags.

I am not sure what to do here.  File a separate bug report?  Remove
any mention of the flags now working as too obvious?  Leave as is?

> [...]
> diff --git a/time/Makefile b/time/Makefile
> index 743bd99f182..9f94e07e3bc 100644
> --- a/time/Makefile
> +++ b/time/Makefile
> @@ -43,7 +43,7 @@ tests	:= test_time clocktest tst-posixtz tst-strptime
> tst_wcsftime \
>  	   tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \
>  	   tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \
>  	   tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \
> -	   tst-tzname tst-y2039 bug-mktime4
> +	   tst-tzname tst-y2039 bug-mktime4 tst-strftime2
>  
>  include ../Rules
>  

I think you will have to modify LOCALES in this Makefile to include
the locales you are using.  Please also see below when I discuss the
test case.  Have you checked if your test case actually runs?

> diff --git a/time/strftime_l.c b/time/strftime_l.c
> index c71f9f47a95..8797341ea53 100644
> --- a/time/strftime_l.c
> +++ b/time/strftime_l.c
> @@ -434,7 +434,7 @@ static CHAR_T const month_name[][10] =
>  #endif
>  
>  static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *,
> -				   const struct tm *, bool *
> +				   const struct tm *, int *, bool *

It's not clear to me why this new argument has to be a pointer:
"int *" rather than just "int".  Do you use the value of yr_spec in the
caller after __strftime_internal is executed?

I need an explanation and also if you split your patch this should belong
to the second part.  However, if other maintainers have reviewed and
accept this part you will not have to do anything.

>  				   ut_argument_spec
>  				   LOCALE_PARAM) __THROW;
>  
> @@ -456,8 +456,9 @@ my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T
> *format,
>    tmcopy = *tp;
>    tp = &tmcopy;
>  #endif
> +  int yr_spec = 0;		/* Override padding for %Ey.  */

Same here, do you need yr_spec here as a variable?

>    bool tzset_called = false;
> -  return __strftime_internal (s, maxsize, format, tp, &tzset_called
> +  return __strftime_internal (s, maxsize, format, tp, &yr_spec,
> &tzset_called
>  			      ut_argument LOCALE_ARG);
>  }
>  #ifdef _LIBC
> @@ -466,7 +467,7 @@ libc_hidden_def (my_strftime)
>  
>  static size_t
>  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
> -		     const struct tm *tp, bool *tzset_called
> +		     const struct tm *tp, int *yr_spec, bool *tzset_called

Same here.

>  		     ut_argument_spec LOCALE_PARAM)
>  {
>  #if defined _LIBC && defined USE_IN_EXTENDED_LOCALE_MODEL
> @@ -820,7 +821,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const
> CHAR_T *format,
>  	  if (modifier == L_('O'))
>  	    goto bad_format;
>  #ifdef _NL_CURRENT
> -	  if (! (modifier == 'E'
> +	  if (! (modifier == L_('E')

That's an unrelated change, probably does not actually change anything
but it's good to accept this for the consistency.  If the patch is split,
it should belong to the first part.

> [...]
> @@ -838,11 +839,12 @@ __strftime_internal (CHAR_T *s, size_t maxsize,
> const CHAR_T *format,
>  	  {
>  	    CHAR_T *old_start = p;
>  	    size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
> -					      tp, tzset_called ut_argument
> -					      LOCALE_ARG);
> +					      tp, yr_spec, tzset_called
> +					      ut_argument LOCALE_ARG);
>  	    add (len, __strftime_internal (p, maxsize - i, subfmt,
> -					   tp, tzset_called ut_argument
> -					   LOCALE_ARG));
> +					   tp, yr_spec, tzset_called
> +					   ut_argument LOCALE_ARG));
> +	    *yr_spec = 0;

Whatever was the value of yr_spec after two calls of __strftime_internal
it is now lost and the new value is always 0.  It's not clear to me.
Why do you need a variable if you don't use its value and always set to 0?
What if a caller initializes yr_spec to anything different than 0?

> [...]
> @@ -917,7 +919,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const
> CHAR_T *format,
>  #ifdef _NL_CURRENT
>  	  if (! (modifier == L_('E')
>  		 && (*(subfmt =
> -		       (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
> +		       (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))

Again an unrelated change but good, let's accept it.  This should
be in the first part, if the patch is split.

(I'm sorry if my email client breaks formatting here, please refer
to the original patch in case of doubt).


> [...]
> @@ -1262,7 +1264,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize,
> const CHAR_T *format,
>  	  DO_NUMBER (1, tp->tm_wday);
>  
>  	case L_('Y'):
> -	  if (modifier == 'E')
> +	  if (modifier == L_('E'))

Again unrelated but good, let's accept it.  This should belong to the
first part, if the patch is split.

> [...]
> @@ -1273,6 +1275,8 @@ __strftime_internal (CHAR_T *s, size_t maxsize,
> const CHAR_T *format,
>  # else
>  		  subfmt = era->era_format;
>  # endif
> +		  if (pad != 0)
> +		    *yr_spec = pad;

I guess here is some reason why yr_spec is a pointer but I have not
analyzed this part yet.  If split, this should belong to the second part.

> [...]
> @@ -1294,7 +1298,9 @@ __strftime_internal (CHAR_T *s, size_t maxsize,
> const CHAR_T *format,
>  	      if (era)
>  		{
>  		  int delta = tp->tm_year - era->start_date[0];
> -		  DO_NUMBER (1, (era->offset
> +		  if (*yr_spec != 0)
> +		    pad = *yr_spec;

Again this is something which I don't understand but maybe I should
just read more.  If split, this should belong to the second part.

> +		  DO_NUMBER (2, (era->offset
>  				 + delta * era->absolute_direction));

If I understand correctly, this is the core part of setting the default
width of "%Ey" to 2.  Very good, very desired and very reasonable,
especially
if you scroll few lines down and see that "%y" has the value 2 here as well.

> [...]
> diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
> new file mode 100644
> index 00000000000..6f2cf4ddd81
> --- /dev/null
> +++ b/time/tst-strftime2.c
> @@ -0,0 +1,133 @@
> +/* Verify the behavior of strftime on alternate representation for year.
> +
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <locale.h>
> +#include <time.h>
> +#include <stdio.h>
> +#include <string.h>
> +
> +static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8",
> "th_TH.UTF-8" };

These locales should be added to LOCALES in Makefile to make sure
they are generated before the test is run.  As far as I can see only
"ja_JP.UTF-8" is generated now but please list them all.  We should not
rely on other scripts because they may remove "ja_JP.UTF-8" ever in
future.

> +#define nlocales (sizeof (locales) / sizeof (locales[0]))

I think we already have a macro to calculate the number of elements
in the array but I don't remember its name at the moment.

> +static const char *formats[] = { "%EY", "%_EY", "%-EY" };

So this test case tests the behavior of additional flags passed to "%EY",
is that correct?  Therefore if the patch is split then it should belong
to the second part.  It does not test the width of "%Ey" and whether
"%Ey" works correctly (displays the correct year number) at all, does it?
Do we have such a test already?  Do we need it?  Is it difficult for you
to write it?  Are we able to prepare a test for the future Japanese era
already?

> [...]
> +    {
> +      if (setlocale (LC_ALL, locales[i]) == NULL)
> +	{
> +	  printf ("locale %s does not exist, skipping...\n", locales[i]);
> +	  continue;

I'm afraid that this test skips most of the locales.

> [...]

I don't provide the feedback for the rest of this test case because
I don't feel competent enough.

Other maintainers are welcome to provide their feedback.

Regards,

Rafal


[1] http://git.savannah.gnu.org/gitweb/?p=gnulib.git
Carlos O'Donell Jan. 3, 2019, 2:34 p.m. UTC | #5
On 12/22/18 5:42 AM, TAMUKI Shoichi wrote:
> Hello Rafal,
> 
> Thank you for care about my patch.
> 
> From: Rafal Luzynski <digitalfreak@lingonborough.com>
> Subject: Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]
> Date: Fri, 21 Dec 2018 10:38:17 +0100 (CET)
> 
>>> Ping - any feedback about it?
>>>
>>> https://www.sourceware.org/ml/libc-alpha/2018-12/msg00232.html
>>
>> Thank you for your patch and for all your contribution, including
>> reviews of my patches.
>>
>> One important question: have you signed your FSF copyright assignment?
>> Without this the reviewers are not even allowed to see your patches.
> 
> Yes, I have already signed my FSF copyright assignment - glibc.

As project steward I can confirm that TAMUKI has a copyright 
assignment in place. Thank you for working through that process.
Carlos O'Donell Jan. 3, 2019, 2:43 p.m. UTC | #6
On 12/7/18 1:55 AM, TAMUKI Shoichi wrote:
> The Japanese era name is scheduled to be changed on May 1, 2019.
> Prior to this, change the alternate representation for year in
> strftime to pad the number with zero to keep it constant width, so
> that prevent the trouble we saw in the past from becoming obvious
> again from the year after the era name changes onward.
> 
> Since only one Japanese era name is used by each emperor's reign as
> lately, it is rare that the year ends in one digit or lasts more than
> three digits.  In addition, the default width of month, day, hour,
> minute, and second is 2, so adjust the default width of year the same
> as them, and then the whole display balance is improved.  Therefore,
> it would be reasonable to change the width padding with zero of %Ey
> default to 2.
> 
> Currently in glibc, besides ja_JP locale, the locales using the
> conversion specifier above are lo_LA (Laos) and th_TH (Thailand).  In
> these locales, they use the Buddhist era.  The Buddhist era is a value
> obtained by adding 543 to the Christian era, so they are not affected
> by the change of the conversion specifier %Ey.
> 
> Furthermore, for the output string of the conversion specifier %EY,
> an optional flag is given to the conversion specifier so that it can
> be also used the current non-padding format, and the padding format
> can be controlled.  To achieve this, when an optional flag is given to
> the conversion specifier %EY, the %Ey included in the combined
> conversion specifier is interpreted as if decorated with the
> appropriate flag.

Please split this patch into 2 patches as soon as possible.

The fix for "%_EY"/"%-EY" should be reviewed separately as a bug fix,
and can go into master even if we are in a frozen state (we are currently
frozen for 2.29 release).

I want to review these changes in parallel so we can get them into 2.29
and be prepared for the Era name change.

Cheers,
Carlos.

> ChangeLog:
> 
> 	[BZ #23758]
> 	* NEWS: Mention the change.
> 	* manual/time.texi (strftime): Document the desctiption for %EC, %Ey,
> 	and %EY.
> 	* time/Makefile (tests): Add tst-strftime2.
> 	* time/strftime_l.c (__strftime_internal): Add argument yr_spec to
> 	override padding for %Ey.
> 	Change the width padding with zero of %Ey default to 2.  If an
> 	optional flag ('_' or '-') is specified to %EY, the %Ey in subformat
> 	is interpreted as if decorated with the appropriate flag.
> 	* time/tst-strftime2.c: New file.
> ---
>  Changes since v3:
>  - use support/test-driver.c instead of the old-style test-skeleton.c
>    in tst-strftime2.c, also correct the copyright year
>  - slightly modify the documentation including commit message
> 
>  Remarks:
>  At last, I have fulfilled all legal requirements, so you can review
>  my patch without worrying about the copyright issues.
> 
>  In this patch, all items pointed out so far have been fixed.
> 
>  My concern is the documentation.  Please review especially wording,
>  expression, etc.  (I am not an English native speaker.)
> 
>  NEWS                 |   7 +++
>  manual/time.texi     |  15 ++++++
>  time/Makefile        |   2 +-
>  time/strftime_l.c    |  28 ++++++-----
>  time/tst-strftime2.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 173 insertions(+), 12 deletions(-)
>  create mode 100644 time/tst-strftime2.c
> 
> diff --git a/NEWS b/NEWS
> index 8483dcf4928..40e33e228f8 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -41,6 +41,13 @@ Major new features:
>    incosistent mutex state after fork call in multithread environment.
>    In both popen and system there is no direct access to user-defined mutexes.
>  
> +* Improve the width of alternate representation for year in strftime.
> +  For %Ey conversion specifier, the default action is now to pad the
> +  number with zero to keep minimum 2 digits as well as %y.  Also, the
> +  optional flag (either _ or -) can be used for %EY, so that the
> +  internal %Ey is interpreted as if decorated with the appropriate
> +  flag.
> +
>  Deprecated and removed features, and other changes affecting compatibility:
>  
>  * The glibc.tune tunable namespace has been renamed to glibc.cpu and the
> diff --git a/manual/time.texi b/manual/time.texi
> index 4d154452eba..c1d5770f9e0 100644
> --- a/manual/time.texi
> +++ b/manual/time.texi
> @@ -1393,6 +1393,9 @@ The preferred calendar time representation for the current locale.
>  The century of the year.  This is equivalent to the greatest integer not
>  greater than the year divided by 100.
>  
> +If the @code{E} modifier is specified (@code{%EC}), the locale's
> +alternate representation for year (the era name) is used instead.
> +
>  This format was first standardized by POSIX.2-1992 and by @w{ISO C99}.
>  
>  @item %d
> @@ -1568,10 +1571,22 @@ The preferred time of day representation for the current locale.
>  The year without a century as a decimal number (range @code{00} through
>  @code{99}).  This is equivalent to the year modulo 100.
>  
> +If the @code{E} modifier is specified (@code{%Ey}), the locale's
> +alternate representation for year (the era year) is used instead.  The
> +default action is to pad the number with zero to keep minimum 2 digits
> +as well as @code{%y}.
> +
>  @item %Y
>  The year as a decimal number, using the Gregorian calendar.  Years
>  before the year @code{1} are numbered @code{0}, @code{-1}, and so on.
>  
> +If the @code{E} modifier is specified (@code{%EY}), the locale's
> +alternate representation for year (generally the combination of
> +@code{%EC} and @code{%Ey}) is used instead.  In this case, the
> +optional flag (either @code{_} or @code{-}) can be used, so that the
> +internal @code{%Ey} is interpreted as if decorated with the
> +appropriate flag.
> +
>  @item %z
>  @w{RFC 822}/@w{ISO 8601:1988} style numeric time zone (e.g.,
>  @code{-0600} or @code{+0100}), or nothing if no time zone is
> diff --git a/time/Makefile b/time/Makefile
> index 743bd99f182..9f94e07e3bc 100644
> --- a/time/Makefile
> +++ b/time/Makefile
> @@ -43,7 +43,7 @@ tests	:= test_time clocktest tst-posixtz tst-strptime tst_wcsftime \
>  	   tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \
>  	   tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \
>  	   tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \
> -	   tst-tzname tst-y2039 bug-mktime4
> +	   tst-tzname tst-y2039 bug-mktime4 tst-strftime2
>  
>  include ../Rules
>  
> diff --git a/time/strftime_l.c b/time/strftime_l.c
> index c71f9f47a95..8797341ea53 100644
> --- a/time/strftime_l.c
> +++ b/time/strftime_l.c
> @@ -434,7 +434,7 @@ static CHAR_T const month_name[][10] =
>  #endif
>  
>  static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *,
> -				   const struct tm *, bool *
> +				   const struct tm *, int *, bool *
>  				   ut_argument_spec
>  				   LOCALE_PARAM) __THROW;
>  
> @@ -456,8 +456,9 @@ my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T *format,
>    tmcopy = *tp;
>    tp = &tmcopy;
>  #endif
> +  int yr_spec = 0;		/* Override padding for %Ey.  */
>    bool tzset_called = false;
> -  return __strftime_internal (s, maxsize, format, tp, &tzset_called
> +  return __strftime_internal (s, maxsize, format, tp, &yr_spec, &tzset_called
>  			      ut_argument LOCALE_ARG);
>  }
>  #ifdef _LIBC
> @@ -466,7 +467,7 @@ libc_hidden_def (my_strftime)
>  
>  static size_t
>  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
> -		     const struct tm *tp, bool *tzset_called
> +		     const struct tm *tp, int *yr_spec, bool *tzset_called
>  		     ut_argument_spec LOCALE_PARAM)
>  {
>  #if defined _LIBC && defined USE_IN_EXTENDED_LOCALE_MODEL
> @@ -820,7 +821,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
>  	  if (modifier == L_('O'))
>  	    goto bad_format;
>  #ifdef _NL_CURRENT
> -	  if (! (modifier == 'E'
> +	  if (! (modifier == L_('E')
>  		 && (*(subfmt =
>  		       (const CHAR_T *) _NL_CURRENT (LC_TIME,
>  						     NLW(ERA_D_T_FMT)))
> @@ -838,11 +839,12 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
>  	  {
>  	    CHAR_T *old_start = p;
>  	    size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
> -					      tp, tzset_called ut_argument
> -					      LOCALE_ARG);
> +					      tp, yr_spec, tzset_called
> +					      ut_argument LOCALE_ARG);
>  	    add (len, __strftime_internal (p, maxsize - i, subfmt,
> -					   tp, tzset_called ut_argument
> -					   LOCALE_ARG));
> +					   tp, yr_spec, tzset_called
> +					   ut_argument LOCALE_ARG));
> +	    *yr_spec = 0;
>  
>  	    if (to_uppcase)
>  	      while (old_start < p)
> @@ -917,7 +919,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
>  #ifdef _NL_CURRENT
>  	  if (! (modifier == L_('E')
>  		 && (*(subfmt =
> -		       (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
> +		       (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
>  		     != L_('\0'))))
>  	    subfmt = (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(D_FMT));
>  	  goto subformat;
> @@ -1262,7 +1264,7 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
>  	  DO_NUMBER (1, tp->tm_wday);
>  
>  	case L_('Y'):
> -	  if (modifier == 'E')
> +	  if (modifier == L_('E'))
>  	    {
>  #if HAVE_STRUCT_ERA_ENTRY
>  	      struct era_entry *era = _nl_get_era_entry (tp HELPER_LOCALE_ARG);
> @@ -1273,6 +1275,8 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
>  # else
>  		  subfmt = era->era_format;
>  # endif
> +		  if (pad != 0)
> +		    *yr_spec = pad;
>  		  goto subformat;
>  		}
>  #else
> @@ -1294,7 +1298,9 @@ __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
>  	      if (era)
>  		{
>  		  int delta = tp->tm_year - era->start_date[0];
> -		  DO_NUMBER (1, (era->offset
> +		  if (*yr_spec != 0)
> +		    pad = *yr_spec;
> +		  DO_NUMBER (2, (era->offset
>  				 + delta * era->absolute_direction));
>  		}
>  #else
> diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
> new file mode 100644
> index 00000000000..6f2cf4ddd81
> --- /dev/null
> +++ b/time/tst-strftime2.c
> @@ -0,0 +1,133 @@
> +/* Verify the behavior of strftime on alternate representation for year.
> +
> +   Copyright (C) 2018 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +#include <locale.h>
> +#include <time.h>
> +#include <stdio.h>
> +#include <string.h>
> +
> +static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8", "th_TH.UTF-8" };
> +#define nlocales (sizeof (locales) / sizeof (locales[0]))
> +
> +static const char *formats[] = { "%EY", "%_EY", "%-EY" };
> +#define nformats (sizeof (formats) / sizeof (formats[0]))
> +
> +static const struct
> +{
> +  const int d, m, y;
> +} dates[] =
> +  {
> +    { 1, 3, 88 },
> +    { 7, 0, 89 },
> +    { 8, 0, 89 },
> +    { 1, 3, 90 },
> +    { 1, 3, 97 },
> +    { 1, 3, 98 }
> +  };
> +#define ndates (sizeof (dates) / sizeof (dates[0]))
> +
> +static char ref[nlocales][nformats][ndates][100];
> +
> +static void
> +mkreftable (void)
> +{
> +  int i, j, k;
> +  char era[10];
> +  static const int yrj[] = { 63, 64, 1, 2, 9, 10 };
> +  static const int yrb[] = { 2531, 2532, 2532, 2533, 2540, 2541 };
> +
> +  for (i = 0; i < nlocales; i++)
> +    for (j = 0; j < nformats; j++)
> +      for (k = 0; k < ndates; k++)
> +	{
> +	  if (i == 0)
> +	    {
> +	      sprintf (era, "%s", (k < 2) ? "\xe6\x98\xad\xe5\x92\x8c"
> +					  : "\xe5\xb9\xb3\xe6\x88\x90");
> +	      if (yrj[k] == 1)
> +		sprintf (ref[i][j][k], "%s\xe5\x85\x83\xe5\xb9\xb4", era);
> +	      else
> +		{
> +		  if (j == 0)
> +		    sprintf (ref[i][j][k], "%s%02d\xe5\xb9\xb4", era, yrj[k]);
> +		  else if (j == 1)
> +		    sprintf (ref[i][j][k], "%s%2d\xe5\xb9\xb4", era, yrj[k]);
> +		  else
> +		    sprintf (ref[i][j][k], "%s%d\xe5\xb9\xb4", era, yrj[k]);
> +		}
> +	    }
> +	  else if (i == 1)
> +	    {
> +	      sprintf (era, "\xe0\xba\x9e\x2e\xe0\xba\xaa\x2e ");
> +	      sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
> +	    }
> +	  else
> +	    {
> +	      sprintf (era, "\xe0\xb8\x9e\x2e\xe0\xb8\xa8\x2e ");
> +	      sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
> +	    }
> +	}
> +}
> +
> +static int
> +do_test (void)
> +{
> +  int i, j, k, result = 0;
> +  struct tm ttm;
> +  char date[11], buf[100];
> +  size_t r, e;
> +
> +  mkreftable ();
> +  for (i = 0; i < nlocales; i++)
> +    {
> +      if (setlocale (LC_ALL, locales[i]) == NULL)
> +	{
> +	  printf ("locale %s does not exist, skipping...\n", locales[i]);
> +	  continue;
> +	}
> +      printf ("[%s]\n", locales[i]);
> +      for (j = 0; j < nformats; j++)
> +	{
> +	  for (k = 0; k < ndates; k++)
> +	    {
> +	      ttm.tm_mday = dates[k].d;
> +	      ttm.tm_mon  = dates[k].m;
> +	      ttm.tm_year = dates[k].y;
> +	      strftime (date, sizeof (date), "%F", &ttm);
> +	      r = strftime (buf, sizeof (buf), formats[j], &ttm);
> +	      e = strlen (ref[i][j][k]);
> +	      printf ("%s\t\"%s\"\t\"%s\"", date, formats[j], buf);
> +	      if (strcmp (buf, ref[i][j][k]) != 0)
> +		{
> +		  printf ("\tshould be \"%s\"", ref[i][j][k]);
> +		  if (r != e)
> +		    printf ("\tgot: %zu, expected: %zu", r, e);
> +		  result = 1;
> +		}
> +	      else
> +		printf ("\tOK");
> +	      putchar ('\n');
> +	    }
> +	  putchar ('\n');
> +	}
> +    }
> +  return result;
> +}
> +
> +#include <support/test-driver.c>
>
Rafal Luzynski Jan. 3, 2019, 10:47 p.m. UTC | #7
3.01.2019 15:43 Carlos O'Donell <carlos@redhat.com> wrote:
> [...]
> I want to review these changes in parallel so we can get them into 2.29
> and be prepared for the Era name change.

For the record, you may be interested in the previous reviews:

https://sourceware.org/ml/libc-alpha/2019-01/msg00057.html
https://sourceware.org/ml/libc-alpha/2019-01/msg00037.html
https://sourceware.org/ml/libc-alpha/2018-12/msg00999.html
https://sourceware.org/ml/libc-alpha/2018-10/msg00440.html

Regards,

Rafal
TAMUKI Shoichi Jan. 6, 2019, 6:28 a.m. UTC | #8
Hello Rafal,

Sorry to be late replying.  Thank you for the detailed review on this
patch.

From: Rafal Luzynski <digitalfreak@lingonborough.com>
Subject: Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Fri, 28 Dec 2018 10:51:30 +0100 (CET)

> Two general remarks:
> 
> 1. Whatever you contribute here you will also have to port and contribute
> to gnulib. [1] The reason is that "date" command line utility uses gnulib's
> fprintftime which implements a similar functionality but does not use
> strftime.  If you don't contribute to gnulib then you will not see these
> new features in "date".

I am also aware of this.  I am going to contribute to Gnulib after
committing with consensus of these fixes in Glibc.

> 2. As we are short of time, for easier review I suggest splitting this patch
> into two patches implementing two features:
> 
> * setting the default width of "%Ey" to 2;
> * passing the additional flags from "%EY" to "%Ey".
> 
> it is urgent, I perceive passing the flags correctly as an additional problem
> fixed by the same patch.  Also the implementation of this additional feature
> is not clear to me.
> 
> Please ignore my suggestion to split if other maintainers review your patch
> and accept as it is.

Certainly, the patch's outlook is better and it becomes easier to
understand, so I will try to split the patch by the next update.
Thank you for your suggestion.

> Could you please reword the first line of your commit message to make it
> shorter than 72 characters so that if you do "git log --oneline" in a standard
> 80 columns terminal the title does not wrap?  Also, it could start with the
> name of the function or the module you are correcting.  My suggestion:
> 
> strftime: Improve the default width of "%Ey" [BZ #23758].
> 
> or maybe better:
> 
> strftime: Set the default width of "%Ey" to 2 [BZ #23758].

It was careless that the title exceeded 80 characters.  I will fix
them while referring to the above.

> > The Japanese era name is scheduled to be changed on May 1, 2019.
> > Prior to this, change the alternate representation for year in
> 
> I am not sure but I think that it should be called "alternative"
> rather than "alternate".

There are three words of "alternate" in the current time.texi, and it
matched this.  Based on this proposal, I will fix these to
"alternative".

> > Since only one Japanese era name is used by each emperor's reign as
> > lately, it is rare that the year ends in one digit or lasts more than
> > three digits.  In addition, the default width of month, day, hour,
> > minute, and second is 2, so adjust the default width of year the same
> > as them, and then the whole display balance is improved.  Therefore,
> > it would be reasonable to change the width padding with zero of %Ey
> > default to 2.
> 
> I was thinking whether we want to set the default width of "%Ey" to 2
> but indeed, using Gregorian calendar we have the default width "%y"
> set to 2 and we use zero padding so for the consistency it is fine to
> set "%Ey" the same.  My reasoning is different than yours but the result
> is the same so I think it is OK.  Now what about "%EY"?  I was not looking
> at this before so this piece of code is new to me.  As far as I can see,
> "%EY", other than "%Ey" and "%Y", is not a formatting of the year number
> but is just replaced by a subformat (so it is similar to "%c" or "%D" etc.)
> Therefore there is no question what is the default width of the year
> number in "%EY" and whether it should be the same as "%Y" (which has
> no default padding).  The subformat substituted for "%EY" may or may not
> contain "%Ey".

The reason for setting the default width of "%y" to 2 is because "%y"
is 0 or more and less than 100, so that makes sense.  On the other
hand, although the range of "%Ey" is indeterminate, it is  practically
reasonable to fit in 2 digits.  As you said, I think that it can be
thought of as "%y" the same.

For "%EY", it is replaced with a subformat containing "%Ey" and it can
be regarded as a full format of "%Ey" including the era name "%EC".
Currently, padding of "%Ey" can be controlled freely, but "%EY"
depends on "%Ey" default padding included in the replacing subformats.
This patch is made to be able to control this.

> > Furthermore, for the output string of the conversion specifier %EY,
> > an optional flag is given to the conversion specifier so that it can
> > be also used the current non-padding format, and the padding format
> > can be controlled.  To achieve this, when an optional flag is given to
> > the conversion specifier %EY, the %Ey included in the combined
> > conversion specifier is interpreted as if decorated with the
> > appropriate flag.
> 
> This change is reasonable but, as I said before, it is not related with
> the era change so for my review this should belong to the second patch.
> Again, this is only if nobody else reviews and if you agree to split.

Yes, it belongs to the second patch because this fix is to control
padding of "%EY".

> >  Remarks:
> >  At last, I have fulfilled all legal requirements, so you can review
> >  my patch without worrying about the copyright issues.
> 
> I am sorry for not noticing this remark before.  I was not looking carefully
> at your patch because I did not know you had signed the copyright assignment.
> On the other hand, I did not know you had signed the assignment because I was
> not looking at your patch. :-/

No problem.  Sorry, it is hard to find.

> >  My concern is the documentation.  Please review especially wording,
> >  expression, etc.  (I am not an English native speaker.)
> 
> I am not an English native speaker either so I hesitate to proofread.
> Please take my feedback about the documentation as incomplete and not
> fully reliable.  I provide it only because nobody else did it so far.
> On the other hand, I think it is OK if we have to fix the documentation
> in the freeze period.

Your feedback is very helpful.  Thank you very much.

> > +* Improve the width of alternate representation for year in strftime.
> 
> As I said before, I think it should be s/alternate/alternative.  Also,
> please note that the term "alternative" may be ambiguous here because
> this may apply to both "%Ey" (alternative era, decimal digits) and "%Oy"
> (alternative numbering system but nothing about an alternative era).
> 
> > +  For %Ey conversion specifier, the default action is now to pad the
> > +  number with zero to keep minimum 2 digits as well as %y.
> 
> "to keep the minimum width of 2 digits, same as %y"
> 
> (or: "... similar as %y")

I will fix them as above.

> > +  optional flag (either _ or -) can be used for %EY, so that the
> > +  internal %Ey is interpreted as if decorated with the appropriate
> > +  flag.
> 
> This would belong to the second patch.  Also, I'm not sure if it needs
> to be mentioned.  Just those flag will work as expected.  It could be
> perceived as a bug that they did not work but now it is being fixed.
> (Again, I am not absolutely sure you should remove it, just asking in
> public).

Yes, it belongs to the second patch.

> > +If the @code{E} modifier is specified (@code{%EC}), the locale's
> > +alternate representation for year (the era name) is used instead.
> 
> "%C" is a format specifier for a century so I believe that the word
> "year" is just a copy/paste typo here.

"%EC" is a format specifier that replaces an era name instead of a
century, and the era name is a component of the locale's alternative
representation of the full format year, so it was written as above.

> > +If the @code{E} modifier is specified (@code{%Ey}), the locale's
> > +alternate representation for year (the era year) is used instead.  The
> > +default action is to pad the number with zero to keep minimum 2 digits
> > +as well as @code{%y}.
> 
> Again: s/alternate/alternative/ and s/ as well as/, same as/ or
> /s/ as well as/, similar to/ or any other better wording.

I will fix them as above.

> > +If the @code{E} modifier is specified (@code{%EY}), the locale's
> > +alternate representation for year (generally the combination of
> > +@code{%EC} and @code{%Ey}) is used instead.
> 
> If I understand correctly, the alternative representation for "%EY"
> is just a subformat which may contain "%EC" and "%Ey" or may contain
> anything the user puts there.  What about just removing the sentence
> in brackets and not mentioning what is inside?  My reasoning that it
> may be not true and it does not actually matter what is inside as long
> as it works as the user wants it to work.

That's right.  "%EY" is the full format of "%Ey" with "%EC" and "%Ey"
as a component.  Users can also define formats that do not include
both.  Especially in the Japanese locale, we use that situation in the
first year when the era changes (i.e. 1926, 1989, 2019, etc.).

> > +optional flag (either @code{_} or @code{-}) can be used, so that the
> > +internal @code{%Ey} is interpreted as if decorated with the
> > +appropriate flag.
> 
> Again I'm not sure we should put here the obvious things which work
> as expected.  This may be mentioned here if this bug is perceived
> as severe by Japanese people but I think we don't mention all bugs
> fixed in this section of NEWS.  There is another section which lists
> all fixed bugs but on the other hand the bug 23758 is about the
> default width of %Ey rather than not working flags.
> 
> I am not sure what to do here.  File a separate bug report?  Remove
> any mention of the flags now working as too obvious?  Leave as is?

As mentioned above, currently, padding of "%Ey" can be controlled
freely, but "%EY" depends on "%Ey" default padding included in the
replacing subformats.  This patch is made to be able to control this.

> > -	   tst-tzname tst-y2039 bug-mktime4
> > +	   tst-tzname tst-y2039 bug-mktime4 tst-strftime2
> 
> I think you will have to modify LOCALES in this Makefile to include
> the locales you are using.  Please also see below when I discuss the
> test case.  Have you checked if your test case actually runs?

I was running the test case directly in the local environment to check
the behavior.  Certainly, we need to add the locales to be used for
LOCALES of Makefile.  Thank you for pointing out.

> > -				   const struct tm *, bool *
> > +				   const struct tm *, int *, bool *
> 
> It's not clear to me why this new argument has to be a pointer:
> "int *" rather than just "int".  Do you use the value of yr_spec in the
> caller after __strftime_internal is executed?
> 
> I need an explanation and also if you split your patch this should belong
> to the second part.  However, if other maintainers have reviewed and
> accept this part you will not have to do anything.
> 
> > +  int yr_spec = 0;		/* Override padding for %Ey.  */
> 
> Same here, do you need yr_spec here as a variable?
> 
> > -  return __strftime_internal (s, maxsize, format, tp, &tzset_called
> > +  return __strftime_internal (s, maxsize, format, tp, &yr_spec, &tzset_called
> 
> > -		     const struct tm *tp, bool *tzset_called
> > +		     const struct tm *tp, int *yr_spec, bool *tzset_called
> 
> Same here.

Since the value of yr_spec in the caller is used after
__strftime_internal is executed, this new argument has to be a
pointer.

> > -	  if (! (modifier == 'E'
> > +	  if (! (modifier == L_('E')
> 
> That's an unrelated change, probably does not actually change anything
> but it's good to accept this for the consistency.  If the patch is split,
> it should belong to the first part.

For clarity, I will include unrelated changes in another patch.

> >  	    size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
> > -					      tp, tzset_called ut_argument
> > -					      LOCALE_ARG);
> > +					      tp, yr_spec, tzset_called
> > +					      ut_argument LOCALE_ARG);
> >  	    add (len, __strftime_internal (p, maxsize - i, subfmt,
> > -					   tp, tzset_called ut_argument
> > -					   LOCALE_ARG));
> > +					   tp, yr_spec, tzset_called
> > +					   ut_argument LOCALE_ARG));
> > +	    *yr_spec = 0;
> 
> Whatever was the value of yr_spec after two calls of __strftime_internal
> it is now lost and the new value is always 0.  It's not clear to me.
> Why do you need a variable if you don't use its value and always set to 0?
> What if a caller initializes yr_spec to anything different than 0?

yr_spec is a variable introduced to pass information such as the flags
of "%Ey" included in the subformat to replace "%EY".  The reason for
setting yr_spec to 0 in the above is to finish the passing of
information such as flags and restore the subsequent behavior to the
default.

> > -		       (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
> > +		       (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
> 
> Again an unrelated change but good, let's accept it.  This should
> be in the first part, if the patch is split.
> 
> > -	  if (modifier == 'E')
> > +	  if (modifier == L_('E'))
> 
> Again unrelated but good, let's accept it.  This should belong to the
> first part, if the patch is split.

Likewise, I will include them in another patch.

> > +		  if (pad != 0)
> > +		    *yr_spec = pad;
> 
> I guess here is some reason why yr_spec is a pointer but I have not
> analyzed this part yet.  If split, this should belong to the second part.
> 
> > -		  DO_NUMBER (1, (era->offset
> > +		  if (*yr_spec != 0)
> > +		    pad = *yr_spec;
> 
> Again this is something which I don't understand but maybe I should
> just read more.  If split, this should belong to the second part.

According to the value of "pad", DO_NUMBER macro will prepare the
appropriate padding.  Yes, it belongs to the second patch.

> > +		  DO_NUMBER (2, (era->offset
> 
> If I understand correctly, this is the core part of setting the default
> width of "%Ey" to 2.  Very good, very desired and very reasonable, especially
> if you scroll few lines down and see that "%y" has the value 2 here as well.

Yes, that's right.

> > +static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8", "th_TH.UTF-8" };
> 
> These locales should be added to LOCALES in Makefile to make sure
> they are generated before the test is run.  As far as I can see only
> "ja_JP.UTF-8" is generated now but please list them all.  We should not
> rely on other scripts because they may remove "ja_JP.UTF-8" ever in
> future.

As you pointed out, I will add "ja_JP.UTF-8", "lo_LA.UTF-8", and
"th_TH.UTF-8" to LOCALES in Makefile.

> > +#define nlocales (sizeof (locales) / sizeof (locales[0]))
> 
> I think we already have a macro to calculate the number of elements
> in the array but I don't remember its name at the moment.

As a result of searching the Glibc source tree as follows, it seems
that there are many cases that we prepare the current situation on our
own.  I could not find a generic macro.

tamuki@seadog:~/glibc$ grep -nr "^#define .*sizeof .* / sizeof .*[0].*"

> > +static const char *formats[] = { "%EY", "%_EY", "%-EY" };
> 
> So this test case tests the behavior of additional flags passed to "%EY",
> is that correct?  Therefore if the patch is split then it should belong
> to the second part.  It does not test the width of "%Ey" and whether
> "%Ey" works correctly (displays the correct year number) at all, does it?
> Do we have such a test already?  Do we need it?  Is it difficult for you
> to write it?  Are we able to prepare a test for the future Japanese era
> already?

In this test case, the test for the width of "%Ey" and whether "%Ey"
works correctly (displays the correct year number) are confirmed by
"%Ey" included in the subformat which replaces "%EY".

> > +	  printf ("locale %s does not exist, skipping...\n", locales[i]);
> 
> I'm afraid that this test skips most of the locales.

The locales to be tested are "ja_JP.UTF-8", "lo_LA.UTF-8", and
"th_TH.UTF-8" which use an alternative era.  As long as I add these to
LOCALES in Makefile, I think that the test will be executed.

> I don't provide the feedback for the rest of this test case because
> I don't feel competent enough.

It tests the width of "%Ey" and whether "%Ey" works correctly
(displays the correct year number) while changing the flags of "%Ey"
and the year, month, and day near the boundary of the era.

Regards,
TAMUKI Shoichi
Joseph Myers Jan. 7, 2019, 4:23 p.m. UTC | #9
On Sun, 6 Jan 2019, TAMUKI Shoichi wrote:

> > I think we already have a macro to calculate the number of elements
> > in the array but I don't remember its name at the moment.
> 
> As a result of searching the Glibc source tree as follows, it seems
> that there are many cases that we prepare the current situation on our
> own.  I could not find a generic macro.

It's array_length in array_length.h.
TAMUKI Shoichi Jan. 8, 2019, 9:06 a.m. UTC | #10
Hello Joseph,

From: Joseph Myers <joseph@codesourcery.com>
Subject: Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Mon, 7 Jan 2019 16:23:48 +0000

> > > I think we already have a macro to calculate the number of elements
> > > in the array but I don't remember its name at the moment.
> > 
> > As a result of searching the Glibc source tree as follows, it seems
> > that there are many cases that we prepare the current situation on our
> > own.  I could not find a generic macro.
> 
> It's array_length in array_length.h.

Thank you for the information.

Using the array_length macro, I confirmed the testsuite succesfully
runs.  I will apply this to the next patch.

Regards,
TAMUKI Shoichi
TAMUKI Shoichi Jan. 12, 2019, 1:33 p.m. UTC | #11
Hello Carlos and all,

From: Carlos O'Donell <carlos@redhat.com>
Subject: Re: [PATCH v4] Improve the width of alternate representation for year in strftime [BZ #23758]
Date: Thu, 3 Jan 2019 09:43:21 -0500

> Please split this patch into 2 patches as soon as possible.

I split the patch into 2 patches. [1][2][3]

[1] https://sourceware.org/ml/libc-alpha/2019-01/msg00239.html
[2] https://sourceware.org/ml/libc-alpha/2019-01/msg00240.html
[3] https://sourceware.org/ml/libc-alpha/2019-01/msg00241.html

> The fix for "%_EY"/"%-EY" should be reviewed separately as a bug fix,
> and can go into master even if we are in a frozen state (we are currently
> frozen for 2.29 release).
> 
> I want to review these changes in parallel so we can get them into 2.29
> and be prepared for the Era name change.

So, we need to review these patches so that they can go into master.

Regards,
TAMUKI Shoichi

Patch
diff mbox series

diff --git a/NEWS b/NEWS
index 8483dcf4928..40e33e228f8 100644
--- a/NEWS
+++ b/NEWS
@@ -41,6 +41,13 @@  Major new features:
   incosistent mutex state after fork call in multithread environment.
   In both popen and system there is no direct access to user-defined mutexes.
 
+* Improve the width of alternate representation for year in strftime.
+  For %Ey conversion specifier, the default action is now to pad the
+  number with zero to keep minimum 2 digits as well as %y.  Also, the
+  optional flag (either _ or -) can be used for %EY, so that the
+  internal %Ey is interpreted as if decorated with the appropriate
+  flag.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The glibc.tune tunable namespace has been renamed to glibc.cpu and the
diff --git a/manual/time.texi b/manual/time.texi
index 4d154452eba..c1d5770f9e0 100644
--- a/manual/time.texi
+++ b/manual/time.texi
@@ -1393,6 +1393,9 @@  The preferred calendar time representation for the current locale.
 The century of the year.  This is equivalent to the greatest integer not
 greater than the year divided by 100.
 
+If the @code{E} modifier is specified (@code{%EC}), the locale's
+alternate representation for year (the era name) is used instead.
+
 This format was first standardized by POSIX.2-1992 and by @w{ISO C99}.
 
 @item %d
@@ -1568,10 +1571,22 @@  The preferred time of day representation for the current locale.
 The year without a century as a decimal number (range @code{00} through
 @code{99}).  This is equivalent to the year modulo 100.
 
+If the @code{E} modifier is specified (@code{%Ey}), the locale's
+alternate representation for year (the era year) is used instead.  The
+default action is to pad the number with zero to keep minimum 2 digits
+as well as @code{%y}.
+
 @item %Y
 The year as a decimal number, using the Gregorian calendar.  Years
 before the year @code{1} are numbered @code{0}, @code{-1}, and so on.
 
+If the @code{E} modifier is specified (@code{%EY}), the locale's
+alternate representation for year (generally the combination of
+@code{%EC} and @code{%Ey}) is used instead.  In this case, the
+optional flag (either @code{_} or @code{-}) can be used, so that the
+internal @code{%Ey} is interpreted as if decorated with the
+appropriate flag.
+
 @item %z
 @w{RFC 822}/@w{ISO 8601:1988} style numeric time zone (e.g.,
 @code{-0600} or @code{+0100}), or nothing if no time zone is
diff --git a/time/Makefile b/time/Makefile
index 743bd99f182..9f94e07e3bc 100644
--- a/time/Makefile
+++ b/time/Makefile
@@ -43,7 +43,7 @@  tests	:= test_time clocktest tst-posixtz tst-strptime tst_wcsftime \
 	   tst-getdate tst-mktime tst-mktime2 tst-ftime_l tst-strftime \
 	   tst-mktime3 tst-strptime2 bug-asctime bug-asctime_r bug-mktime1 \
 	   tst-strptime3 bug-getdate1 tst-strptime-whitespace tst-ftime \
-	   tst-tzname tst-y2039 bug-mktime4
+	   tst-tzname tst-y2039 bug-mktime4 tst-strftime2
 
 include ../Rules
 
diff --git a/time/strftime_l.c b/time/strftime_l.c
index c71f9f47a95..8797341ea53 100644
--- a/time/strftime_l.c
+++ b/time/strftime_l.c
@@ -434,7 +434,7 @@  static CHAR_T const month_name[][10] =
 #endif
 
 static size_t __strftime_internal (CHAR_T *, size_t, const CHAR_T *,
-				   const struct tm *, bool *
+				   const struct tm *, int *, bool *
 				   ut_argument_spec
 				   LOCALE_PARAM) __THROW;
 
@@ -456,8 +456,9 @@  my_strftime (CHAR_T *s, size_t maxsize, const CHAR_T *format,
   tmcopy = *tp;
   tp = &tmcopy;
 #endif
+  int yr_spec = 0;		/* Override padding for %Ey.  */
   bool tzset_called = false;
-  return __strftime_internal (s, maxsize, format, tp, &tzset_called
+  return __strftime_internal (s, maxsize, format, tp, &yr_spec, &tzset_called
 			      ut_argument LOCALE_ARG);
 }
 #ifdef _LIBC
@@ -466,7 +467,7 @@  libc_hidden_def (my_strftime)
 
 static size_t
 __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
-		     const struct tm *tp, bool *tzset_called
+		     const struct tm *tp, int *yr_spec, bool *tzset_called
 		     ut_argument_spec LOCALE_PARAM)
 {
 #if defined _LIBC && defined USE_IN_EXTENDED_LOCALE_MODEL
@@ -820,7 +821,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	  if (modifier == L_('O'))
 	    goto bad_format;
 #ifdef _NL_CURRENT
-	  if (! (modifier == 'E'
+	  if (! (modifier == L_('E')
 		 && (*(subfmt =
 		       (const CHAR_T *) _NL_CURRENT (LC_TIME,
 						     NLW(ERA_D_T_FMT)))
@@ -838,11 +839,12 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	  {
 	    CHAR_T *old_start = p;
 	    size_t len = __strftime_internal (NULL, (size_t) -1, subfmt,
-					      tp, tzset_called ut_argument
-					      LOCALE_ARG);
+					      tp, yr_spec, tzset_called
+					      ut_argument LOCALE_ARG);
 	    add (len, __strftime_internal (p, maxsize - i, subfmt,
-					   tp, tzset_called ut_argument
-					   LOCALE_ARG));
+					   tp, yr_spec, tzset_called
+					   ut_argument LOCALE_ARG));
+	    *yr_spec = 0;
 
 	    if (to_uppcase)
 	      while (old_start < p)
@@ -917,7 +919,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 #ifdef _NL_CURRENT
 	  if (! (modifier == L_('E')
 		 && (*(subfmt =
-		       (const CHAR_T *)_NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
+		       (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(ERA_D_FMT)))
 		     != L_('\0'))))
 	    subfmt = (const CHAR_T *) _NL_CURRENT (LC_TIME, NLW(D_FMT));
 	  goto subformat;
@@ -1262,7 +1264,7 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	  DO_NUMBER (1, tp->tm_wday);
 
 	case L_('Y'):
-	  if (modifier == 'E')
+	  if (modifier == L_('E'))
 	    {
 #if HAVE_STRUCT_ERA_ENTRY
 	      struct era_entry *era = _nl_get_era_entry (tp HELPER_LOCALE_ARG);
@@ -1273,6 +1275,8 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 # else
 		  subfmt = era->era_format;
 # endif
+		  if (pad != 0)
+		    *yr_spec = pad;
 		  goto subformat;
 		}
 #else
@@ -1294,7 +1298,9 @@  __strftime_internal (CHAR_T *s, size_t maxsize, const CHAR_T *format,
 	      if (era)
 		{
 		  int delta = tp->tm_year - era->start_date[0];
-		  DO_NUMBER (1, (era->offset
+		  if (*yr_spec != 0)
+		    pad = *yr_spec;
+		  DO_NUMBER (2, (era->offset
 				 + delta * era->absolute_direction));
 		}
 #else
diff --git a/time/tst-strftime2.c b/time/tst-strftime2.c
new file mode 100644
index 00000000000..6f2cf4ddd81
--- /dev/null
+++ b/time/tst-strftime2.c
@@ -0,0 +1,133 @@ 
+/* Verify the behavior of strftime on alternate representation for year.
+
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <locale.h>
+#include <time.h>
+#include <stdio.h>
+#include <string.h>
+
+static const char *locales[] = { "ja_JP.UTF-8", "lo_LA.UTF-8", "th_TH.UTF-8" };
+#define nlocales (sizeof (locales) / sizeof (locales[0]))
+
+static const char *formats[] = { "%EY", "%_EY", "%-EY" };
+#define nformats (sizeof (formats) / sizeof (formats[0]))
+
+static const struct
+{
+  const int d, m, y;
+} dates[] =
+  {
+    { 1, 3, 88 },
+    { 7, 0, 89 },
+    { 8, 0, 89 },
+    { 1, 3, 90 },
+    { 1, 3, 97 },
+    { 1, 3, 98 }
+  };
+#define ndates (sizeof (dates) / sizeof (dates[0]))
+
+static char ref[nlocales][nformats][ndates][100];
+
+static void
+mkreftable (void)
+{
+  int i, j, k;
+  char era[10];
+  static const int yrj[] = { 63, 64, 1, 2, 9, 10 };
+  static const int yrb[] = { 2531, 2532, 2532, 2533, 2540, 2541 };
+
+  for (i = 0; i < nlocales; i++)
+    for (j = 0; j < nformats; j++)
+      for (k = 0; k < ndates; k++)
+	{
+	  if (i == 0)
+	    {
+	      sprintf (era, "%s", (k < 2) ? "\xe6\x98\xad\xe5\x92\x8c"
+					  : "\xe5\xb9\xb3\xe6\x88\x90");
+	      if (yrj[k] == 1)
+		sprintf (ref[i][j][k], "%s\xe5\x85\x83\xe5\xb9\xb4", era);
+	      else
+		{
+		  if (j == 0)
+		    sprintf (ref[i][j][k], "%s%02d\xe5\xb9\xb4", era, yrj[k]);
+		  else if (j == 1)
+		    sprintf (ref[i][j][k], "%s%2d\xe5\xb9\xb4", era, yrj[k]);
+		  else
+		    sprintf (ref[i][j][k], "%s%d\xe5\xb9\xb4", era, yrj[k]);
+		}
+	    }
+	  else if (i == 1)
+	    {
+	      sprintf (era, "\xe0\xba\x9e\x2e\xe0\xba\xaa\x2e ");
+	      sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+	    }
+	  else
+	    {
+	      sprintf (era, "\xe0\xb8\x9e\x2e\xe0\xb8\xa8\x2e ");
+	      sprintf (ref[i][j][k], "%s%d", era, yrb[k]);
+	    }
+	}
+}
+
+static int
+do_test (void)
+{
+  int i, j, k, result = 0;
+  struct tm ttm;
+  char date[11], buf[100];
+  size_t r, e;
+
+  mkreftable ();
+  for (i = 0; i < nlocales; i++)
+    {
+      if (setlocale (LC_ALL, locales[i]) == NULL)
+	{
+	  printf ("locale %s does not exist, skipping...\n", locales[i]);
+	  continue;
+	}
+      printf ("[%s]\n", locales[i]);
+      for (j = 0; j < nformats; j++)
+	{
+	  for (k = 0; k < ndates; k++)
+	    {
+	      ttm.tm_mday = dates[k].d;
+	      ttm.tm_mon  = dates[k].m;
+	      ttm.tm_year = dates[k].y;
+	      strftime (date, sizeof (date), "%F", &ttm);
+	      r = strftime (buf, sizeof (buf), formats[j], &ttm);
+	      e = strlen (ref[i][j][k]);
+	      printf ("%s\t\"%s\"\t\"%s\"", date, formats[j], buf);
+	      if (strcmp (buf, ref[i][j][k]) != 0)
+		{
+		  printf ("\tshould be \"%s\"", ref[i][j][k]);
+		  if (r != e)
+		    printf ("\tgot: %zu, expected: %zu", r, e);
+		  result = 1;
+		}
+	      else
+		printf ("\tOK");
+	      putchar ('\n');
+	    }
+	  putchar ('\n');
+	}
+    }
+  return result;
+}
+
+#include <support/test-driver.c>