diff mbox series

Add verbose comments to 'era' in ja_JP locale.

Message ID ce089856-d470-6699-7c72-0b1c3644b85a@redhat.com
State New
Series Add verbose comments to 'era' in ja_JP locale.

Commit Message

Carlos O'Donell March 28, 2019, 6:09 p.m. UTC
Rafal,

While reviewing DJ's new test I went through all the dates, years,
and names, and figured I'd put them into a verbose comment in ja_JP
to make this easier to maintain in the future.

What do you think of this for master?

8< --- 8< ---- 8<
---
  localedata/locales/ja_JP | 23 +++++++++++++++++++++++
  1 file changed, 23 insertions(+)

Comments

Carlos O'Donell March 28, 2019, 8:19 p.m. UTC | #1
On 3/28/19 2:09 PM, Carlos O'Donell wrote:
> Rafal,
> 
> While reviewing DJ's new test I went through all the dates, years,
> and names, and figured I'd put them into a verbose comment in ja_JP
> to make this easier to maintain in the future.
> 
> What do you think of this for master?
> 
> 8< --- 8< ---- 8<
> ---
>   localedata/locales/ja_JP | 23 +++++++++++++++++++++++
>   1 file changed, 23 insertions(+)
> 
> diff --git a/localedata/locales/ja_JP b/localedata/locales/ja_JP
> index 9bfbb2bb9b..74ef9e39f3 100644
> --- a/localedata/locales/ja_JP
> +++ b/localedata/locales/ja_JP
> @@ -14946,6 +14946,29 @@ am_pm    "<U5348><U524D>";"<U5348><U5F8C>"
> 
>   t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
> 
> +# The era names are laid out in groups of 2 to account for the desire
> +# to avoid using '1' for the first era year.  Instead of 1 we use '元'
> +# <U5143> or "gan" as the first era year.
> +#
> +# The following dates and their names are recorded below in descending
> +# date order (note that '年' <U5E74> or "year" follows each date).
> +#
> +# Offset: Start date:    End date:    Era name:    Using "gan":
> +# (Y)     (YYYY-MM-DD)
> +# 2       1990-01-01    +*        平成 (Heisei)    No
> +# 1       1989-01-08    1989-12-31    平成 (Heisei)    Yes
> +# 2       1927-01-01    1989-01-07    昭和 (Shōwa)    No
> +# 1       1926-12-25    1926-12-31    昭和 (Shōwa)    Yes
> +# 2       1913-01-01    1926-12-24    大正 (Taishō)    No
> +# 1       1912-07-30    1912-12-31    大正 (Taishō)    Yes
> +# 6       1873-01-01    1912-07-29    明治 (Meiji)    No
> +# 1       0001-01-01    1872-12-31    西暦 (C.E)    No
> +# 1       -0000-12-31    -*        紀元前 (B.C.E.)    No

This should read "-0001-12-31" here. Fixed locally.

> +#
> +# Note:
> +# - The last entry 紀元前 means pre-era/B.C./B.C.E.
> +# - The second-to-last entry 西暦 means C.E.
> +#
>   era    "+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
>       "+:1:1989//01//08:1989//12//31:<U5E73><U6210>:%EC<U5143><U5E74>";/
>       "+:2:1927//01//01:1989//01//07:<U662D><U548C>:%EC%Ey<U5E74>";/

Rafal Luzynski March 28, 2019, 10:35 p.m. UTC | #2
28.03.2019 19:09 Carlos O'Donell <codonell@redhat.com> wrote:
> 
> Rafal,
> 
> While reviewing DJ's new test I went through all the dates, years,
> and names, and figured I'd put them into a verbose comment in ja_JP
> to make this easier to maintain in the future.
> 
> What do you think of this for master?

Sadly, I don't have enough knowledge about the Japanese calendar to verify
whether your comments are correct.  Fortunately I can see Tamuki
Shoichi on the CC: list, so I hope to read some feedback from him.

A few remarks below, though:

> 8< --- 8< ---- 8<
> ---
>   localedata/locales/ja_JP | 23 +++++++++++++++++++++++
>   1 file changed, 23 insertions(+)
> 
> diff --git a/localedata/locales/ja_JP b/localedata/locales/ja_JP
> index 9bfbb2bb9b..74ef9e39f3 100644
> --- a/localedata/locales/ja_JP
> +++ b/localedata/locales/ja_JP
> @@ -14946,6 +14946,29 @@ am_pm	"<U5348><U524D>";"<U5348><U5F8C>"
> [...]
> +# Offset: Start date:	End date:	Era name:	Using "gan":
> +# (Y)     (YYYY-MM-DD)
> +# 2       1990-01-01	+*		平成 (Heisei)	No

What does the symbol "+*" mean?  Am I the only one confused?  Does
it maybe mean "infinity"?  Can we use anything different, like "+inf"?

> +# 1       1989-01-08	1989-12-31	平成 (Heisei)	Yes
> +# 2       1927-01-01	1989-01-07	昭和 (Shōwa)	No
> +# 1       1926-12-25	1926-12-31	昭和 (Shōwa)	Yes
> +# 2       1913-01-01	1926-12-24	大正 (Taishō)	No
> +# 1       1912-07-30	1912-12-31	大正 (Taishō)	Yes
> +# 6       1873-01-01	1912-07-29	明治 (Meiji)	No
> +# 1       0001-01-01	1872-12-31	西暦 (C.E)	No
> +# 1       -0000-12-31	-*		紀元前 (B.C.E.)	No

I was going to complain that the column with "Yes" and "No" is badly
unaligned.  It appears bad in my email client but it became aligned
when I clicked "reply".  Just please make sure all columns are aligned.
Unfortunately, this time "-*" (again, is this anything like "-inf"?)
looks shifted too much to the right and pushes the following columns.

Hm... if it means "-inf" then shouldn't the columns be swapped, I mean
"Start date: -inf, End date: -0001-12-31"?

(Yes, I read your other email as well.)

> +#
> +# Note:
> +# - The last entry 紀元前 means pre-era/B.C./B.C.E.
> +# - The second-to-last entry 西暦 means C.E.

Aren't the terms "B.C.", "B.C.E.", and "C.E." reserved for the
Christian calendar?  I'm sorry about my ignorance.

I think I need further explanation before I give an opinion on
this patch.  Of course, I'd appreciate it if other people gave more
valuable feedback.

Regards,

Rafal

TAMUKI Shoichi March 29, 2019, 6:53 a.m. UTC | #3
Hello Carlos-san,

From: Carlos O'Donell <codonell@redhat.com>
Subject: [PATCH] Add verbose comments to 'era' in ja_JP locale.
Date: Thu, 28 Mar 2019 14:09:36 -0400

> While reviewing DJ's new test I went through all the dates, years,
> and names, and figured I'd put them into a verbose comment in ja_JP
> to make this easier to maintain in the future.
> 
> What do you think of this for master?

Thank you for the new text.

Sorry, I am not happy to put the information in ja_JP locale data.
Since it is necessary to describe similar information in other locale
data such as *_TW, and also it becomes rather troublesome to maintain,
it would be better to include the information in a documentation named
"The locale definition source file format", that is expected to be
created in Glibc.

This documentation looks something like this:

http://pubs.opengroup.org/onlinepubs/7908799/xbd/locale.html

Regards,
TAMUKI Shoichi

TAMUKI Shoichi March 29, 2019, 8:43 a.m. UTC | #4
Hello Carlos-san,

From: TAMUKI Shoichi <tamuki@linet.gr.jp>
Subject: Re: [PATCH] Add verbose comments to 'era' in ja_JP locale.
Date: Fri, 29 Mar 2019 15:53:08 +0900

> Sorry, I am not happy to put the information in ja_JP locale data.
> Since it is necessary to describe similar information in other locale
> data such as *_TW, and also it becomes rather troublesome to maintain,
> it would be better to include the information in a documentation named
> "The locale definition source file format", that is expected to be
> created in Glibc.

If we add only the text shown below, it does not affect maintenance of
the era data, so there should be no problem.  The same text can be used
in *_TW as well.

diff --git a/localedata/locales/ja_JP b/localedata/locales/ja_JP
index 9bfbb2bb9b..983b866650 100644
--- a/localedata/locales/ja_JP
+++ b/localedata/locales/ja_JP
@@ -14946,6 +14946,12 @@ am_pm	"<U5348><U524D>";"<U5348><U5F8C>"
 
 t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
 
+% The era names are laid out in groups of 2 to account for the desire
+% to avoid using '1' for the first era year.  Instead of '1' we use
+% <U5143> or "origin" as the first era year.
+%
+% Note that <U5E74> or "year" follows each year number.
+%
 era	"+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
 	"+:1:1989//01//08:1989//12//31:<U5E73><U6210>:%EC<U5143><U5E74>";/
 	"+:2:1927//01//01:1989//01//07:<U662D><U548C>:%EC%Ey<U5E74>";/

The rest of the information is independent of the specific locale data,
so it is more appropriate to include it in a separate document.

Please be aware that over the next several days we will be adding an
entry for the new Japanese era to the ja_JP locale data.

Regards,
TAMUKI Shoichi

Rafal Luzynski March 29, 2019, 11:04 a.m. UTC | #5
29.03.2019 07:53 TAMUKI Shoichi <tamuki@linet.gr.jp> wrote:
> [...]
> Sorry, I am not happy to put the information in ja_JP locale data.

That was my first thought as well.  But after a while I found a reason
in Carlos' patch.  Yes, we should not explain in the locale data file
how era format works but we should explain how the rules have been
applied to implement this particular locale data file.  It's like
a comment in a source code file: it should not explain how the language
works but how this particular solution had been implemented and what
this piece of code meant.

In short: explaining the format of the era field - no; explaining how
and why it has been provided for ja_JP - yes.

> [...]
> it would be better to include the information in a documentation named
> "The locale definition source file format", that is expected to be
> created in Glibc.

Sadly, that document does not exist now.  As far as I remember the
previous documentation was so outdated that it was better to remove it.
The current guidelines say that in order to create a new locale file
you should take any existing (and working) one and change it to your
needs.

Which of course is not an excuse to put the general description of the
era field in a locale data file.

> This documentation looks something like this:
> 
> http://pubs.opengroup.org/onlinepubs/7908799/xbd/locale.html

This is definitely good but Glibc has many extensions which means
that a potential Glibc version should include this information plus
have many additional pieces.

Regards,

Rafal

Carlos O'Donell March 29, 2019, 2:57 p.m. UTC | #6
On 3/29/19 7:04 AM, Rafal Luzynski wrote:
> 29.03.2019 07:53 TAMUKI Shoichi <tamuki@linet.gr.jp> wrote:
>> [...]
>> Sorry, I am not happy to put the information in ja_JP locale data.
> 
> That was my first thought as well.  But after a while I found a reason
> in Carlos' patch.  Yes, we should not explain in the locale data file
> how era format works but we should explain how the rules have been
> applied to implement this particular locale data file.  It's like
> a comment in a source code file: it should not explain how the language
> works but how this particular solution had been implemented and what
> this piece of code meant.
> 
> In short: explaining the format of the era field - no; explaining how
> and why it has been provided for ja_JP - yes.
> 
>> [...]
>> it would be better to include the information in a documentation named
>> "The locale definition source file format", that is expected to be
>> created in Glibc.
> 
> Sadly, that document does not exist now.  As far as I remember the
> previous documentation was so outdated that it was better to remove it.
> The current guidelines say that in order to create a new locale file
> you should take any existing (and working) one and change it to your
> needs.
> 
> Which of course is not an excuse to put the general description of the
> era field in a locale data file.
> 
>> This documentation looks something like this:
>>
>> http://pubs.opengroup.org/onlinepubs/7908799/xbd/locale.html
> 
> This is definitely good but Glibc has many extensions which means
> that a potential Glibc version should include this information plus
> have many additional pieces.

I agree with Rafal on all points.

Perfect is the enemy of the good.

We certainly need a document describing how to write, edit, and
compile locales, and what formats are available.

Such a document is a huge undertaking. The intent of my patch, as
Rafal points out, is to add source-code comments to the ja_JP
locale to make it easier for me to review.

It seems like Rafal does not object to the patch.

I'll see if I can get consensus from TAMUKI-san in the other email.

Rafal Luzynski March 29, 2019, 8:46 p.m. UTC | #7
29.03.2019 15:57 Carlos O'Donell <codonell@redhat.com> wrote:
> [...]
> It seems like Rafal does not object to the patch.

True, I don't object, which means I can't see any error, but I'd like
to hear the final word from TAMUKI-san due to my poor knowledge of
the Japanese calendar.

> I'll see if I can get consensus from TAMUKI-san in the other email.

That's what I mean.

Regards,

Rafal

TAMUKI Shoichi March 30, 2019, 12:50 p.m. UTC | #8
Hello Rafal-san,

From: Rafal Luzynski <digitalfreak@lingonborough.com>
Subject: Re: [PATCH] Add verbose comments to 'era' in ja_JP locale.
Date: Fri, 29 Mar 2019 12:04:18 +0100 (CET)

> > Sorry, I am not happy to put the information in ja_JP locale data.
> 
> That was my first thought as well.  But after a while I found a reason
> in Carlos' patch.  Yes, we should not explain in the locale data file
> how era format works but we should explain how the rules have been
> applied to implement this particular locale data file.  It's like
> a comment in a source code file: it should not explain how the language
> works but how this particular solution had been implemented and what
> this piece of code meant.
> 
> In short: explaining the format of the era field - no; explaining how
> and why it has been provided for ja_JP - yes.

OK.  I got it.

> > it would be better to include the information in a documentation named
> > "The locale definition source file format", that is expected to be
> > created in Glibc.
> 
> Sadly, that document does not exist now.  As far as I remember the
> previous documentation was so outdated that it was better to remove it.
> The current guidelines say that in order to create a new locale file
> you should take any existing (and working) one and change it to your
> needs.
> 
> Which of course is not an excuse to put the general description of the
> era field in a locale data file.

Was the document deleted?  Oh my goodness.

As you know, it was pointed out that there is a bug in the direction
of BC in Bugzilla.  If there was a proper manual in Glibc, there would
be no problem.

https://sourceware.org/bugzilla/show_bug.cgi?id=24162#c6

> > This documentation looks something like this:
> > 
> > http://pubs.opengroup.org/onlinepubs/7908799/xbd/locale.html
> 
> This is definitely good but Glibc has many extensions which means
> that a potential Glibc version should include this information plus
> have many additional pieces.

Certainly, it will take considerable effort and time to rebuild it.

By the way, Glibc does not support abbreviated era names, so we cannot
use expressions commonly used in Japan, like "H31.03.30" (today).
I want to introduce "abera" to Glibc in the future.

# H -> Heisei
# S -> Showa
# T -> Taisho
# M -> Meiji
# ROC -> Minguo (in Taiwan)

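For illustration only: a minimal Python sketch of how the abbreviations
listed above would produce a date such as "H31.03.30".  The table and
the function name format_abbreviated are hypothetical; glibc has no such
field today ("abera" is only a proposal in this thread), and the thread
does not say whether single-digit era years would be zero-padded, so the
sketch leaves the year unpadded.

```python
# Hypothetical abbreviation table copied from the list in this message.
# This is not a glibc interface; "abera" does not exist yet.
ERA_ABBREVIATIONS = {
    "Heisei": "H",
    "Showa": "S",
    "Taisho": "T",
    "Meiji": "M",
}


def format_abbreviated(era_name: str, era_year: int, month: int, day: int) -> str:
    """Format an era date in the abbreviated style, e.g. H31.03.30."""
    abbreviation = ERA_ABBREVIATIONS[era_name]
    # Assumption: month and day are two digits, the era year is unpadded.
    return f"{abbreviation}{era_year}.{month:02d}.{day:02d}"


print(format_abbreviated("Heisei", 31, 3, 30))  # prints H31.03.30
```
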
Regards,
TAMUKI Shoichi

TAMUKI Shoichi March 30, 2019, 12:52 p.m. UTC | #9
Hello Carlos-san,

From: Carlos O'Donell <codonell@redhat.com>
Subject: Re: [PATCH] Add verbose comments to 'era' in ja_JP locale.
Date: Fri, 29 Mar 2019 10:57:14 -0400

> I agree with Rafal on all points.
> 
> Perfect is the enemy of the good.
> 
> We certainly need a document describing how to write, edit, and
> compile locales, and what formats are available.
> 
> Such a document is a huge undertaking. The intent of my patch, as
> Rafal points out, is to add source-code comments to the ja_JP
> locale to make it easier for me to review.

Agreed.  We certainly need such a document for Glibc; however, it will
take considerable effort and time to rebuild it.  Since era is
particularly complex in format, it is a good idea to put descriptions
in the ja_JP locale.

However, I have some suggestions.

I would like the comment in the current patch to be shorter, as its
lines are long.  In particular, since the table duplicates the content
of the era description segments, it is not necessary to put it in the
comment.  Instead, how about adding an explanation of the format of an
era description segment?

Next, I would like to avoid putting kanji characters in the locale
data.  The locale data of Glibc can be customized by users using
localedef.  As the ja_JP locale data does not depend on encodings, the
text may appear garbled to users in either a ja_JP.eucJP or ja_JP.SJIS
environment, who would then be unable to edit it correctly.  It is
better to write <GAN>, <NEN>, etc., following the other existing
comment lines of the ja_JP locale data.

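For illustration only: the era lines themselves already avoid raw kanji
by using the <Uxxxx> notation, which is plain ASCII and therefore safe
in any encoding.  A minimal Python sketch of that conversion follows;
the function name to_ucs_notation is hypothetical, and the sketch
assumes that only characters up to U+FFFF are needed.

```python
def to_ucs_notation(text: str) -> str:
    """Rewrite each character as an ASCII <Uxxxx> symbol."""
    parts = []
    for ch in text:
        code_point = ord(ch)
        # Assumption: only four-hex-digit code points are handled here.
        if code_point > 0xFFFF:
            raise ValueError(f"code point U+{code_point:X} is outside this sketch")
        parts.append(f"<U{code_point:04X}>")
    return "".join(parts)


print(to_ucs_notation("平成"))  # prints <U5E73><U6210>
```

The result for the era name matches the era_name field of the existing
Heisei entry in the patch context.
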
Also, it is better to use "%" instead of "#" at the beginning of the
comment line of locale data.

How about the following explanation?

% The era names are laid out in groups of 2 to account for the desire
% to avoid using '1' for the first era year.  Instead of '1' we use
% <U5143> or <GAN> as the first era year.
%
% The following dates and their names are recorded below in descending
% date order (note that <U5E74> or <NEN> follows each date).
% <HEISEI> -> <SHOWA> -> <TAISHO> -> <MEIJI> -> <AD> -> <BC>
%
% Each string is an era description segment with the format:
% "direction:offset:start_date:end_date:era_name:era_format"
%
% Note:
% - The '+*' entry in end_date means "forever going forward"
% - The '-*' entry in end_date means "forever going backwards" count up.
% - Negative year number in start_date is prior to AD 1 (BC) counting up.
% - The last entry <U7D00><U5143><U524D> in era_name means BC.
% - The second-to-last entry <U897F><U66A6> in era_name means AD.
%

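For illustration only: a minimal Python sketch that splits one era
description segment into the six fields named above.  The class and
function names are hypothetical, and the sketch assumes that "//" in the
locale source is the escaped form of "/".  It is not how localedef
parses the field.

```python
from dataclasses import dataclass


@dataclass
class EraSegment:
    direction: str    # "+" or "-"
    offset: int       # era year number of the first year in the segment
    start_date: str   # "YYYY/MM/DD"; a leading "-" marks a year before AD 1
    end_date: str     # "YYYY/MM/DD", or "+*" / "-*" for an open-ended range
    era_name: str
    era_format: str


def parse_era_segment(segment: str) -> EraSegment:
    """Split one era description segment into its six fields."""
    # maxsplit=5 keeps any ':' inside era_format intact.
    fields = segment.split(":", 5)
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {segment!r}")
    direction, offset, start, end, name, fmt = fields
    if direction not in ("+", "-"):
        raise ValueError(f"bad direction: {direction!r}")
    # Assumption: '//' is the escaped form of '/' in the locale source.
    return EraSegment(direction, int(offset), start.replace("//", "/"),
                      end.replace("//", "/"), name, fmt)


heisei = parse_era_segment("+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>")
print(heisei)
```
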
Regards,
TAMUKI Shoichi

Patch

diff --git a/localedata/locales/ja_JP b/localedata/locales/ja_JP
index 9bfbb2bb9b..74ef9e39f3 100644
--- a/localedata/locales/ja_JP
+++ b/localedata/locales/ja_JP
@@ -14946,6 +14946,29 @@  am_pm	"<U5348><U524D>";"<U5348><U5F8C>"
  
  t_fmt_ampm "%p%I<U6642>%M<U5206>%S<U79D2>"
  
+# The era names are laid out in groups of 2 to account for the desire
+# to avoid using '1' for the first era year.  Instead of 1 we use '元'
+# <U5143> or "gan" as the first era year.
+#
+# The following dates and their names are recorded below in descending
+# date order (note that '年' <U5E74> or "year" follows each date).
+#
+# Offset: Start date:	End date:	Era name:	Using "gan":
+# (Y)     (YYYY-MM-DD)
+# 2       1990-01-01	+*		平成 (Heisei)	No
+# 1       1989-01-08	1989-12-31	平成 (Heisei)	Yes
+# 2       1927-01-01	1989-01-07	昭和 (Shōwa)	No
+# 1       1926-12-25	1926-12-31	昭和 (Shōwa)	Yes
+# 2       1913-01-01	1926-12-24	大正 (Taishō)	No
+# 1       1912-07-30	1912-12-31	大正 (Taishō)	Yes
+# 6       1873-01-01	1912-07-29	明治 (Meiji)	No
+# 1       0001-01-01	1872-12-31	西暦 (C.E)	No
+# 1       -0000-12-31	-*		紀元前 (B.C.E.)	No
+#
+# Note:
+# - The last entry 紀元前 means pre-era/B.C./B.C.E.
+# - The second-to-last entry 西暦 means C.E.
+#
  era	"+:2:1990//01//01:+*:<U5E73><U6210>:%EC%Ey<U5E74>";/
  	"+:1:1989//01//08:1989//12//31:<U5E73><U6210>:%EC<U5143><U5E74>";/
  	"+:2:1927//01//01:1989//01//07:<U662D><U548C>:%EC%Ey<U5E74>";/
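
For illustration only: the table in the proposed comment can be read as
a lookup procedure.  The minimal Python sketch below copies the Japanese
era rows from that table and assumes that the era year is the row's
offset plus the number of calendar years since the row's start date.
The names ERA_ROWS and lookup_era are hypothetical, and the C.E. and
B.C.E. rows are left out.

```python
from datetime import date

# (offset, start date, end date or None for "+*", era name, uses "gan")
# Rows copied from the table in the proposed comment, newest first.
ERA_ROWS = [
    (2, date(1990, 1, 1), None, "Heisei", False),
    (1, date(1989, 1, 8), date(1989, 12, 31), "Heisei", True),
    (2, date(1927, 1, 1), date(1989, 1, 7), "Shōwa", False),
    (1, date(1926, 12, 25), date(1926, 12, 31), "Shōwa", True),
    (2, date(1913, 1, 1), date(1926, 12, 24), "Taishō", False),
    (1, date(1912, 7, 30), date(1912, 12, 31), "Taishō", True),
    (6, date(1873, 1, 1), date(1912, 7, 29), "Meiji", False),
]


def lookup_era(day: date):
    """Return (era name, era year, uses gan), or None if no row matches."""
    for offset, start, end, name, uses_gan in ERA_ROWS:
        if start <= day and (end is None or day <= end):
            # Assumption: the era year advances with the calendar year.
            return name, offset + (day.year - start.year), uses_gan
    return None


print(lookup_era(date(2019, 3, 30)))  # prints ('Heisei', 31, False)
```

Each era needs two rows because the first year is written with "gan"
instead of the number 1, which is why the rows come in groups of 2.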