diff mbox series

Fix posix/tst-regex by using ISO-8859 encoding for ChangeLog.8.

Message ID 6b0df162-d17a-ec1d-0fc0-728dc3a8ef9b@linux.ibm.com
State New
Headers show
Series Fix posix/tst-regex by using ISO-8859 encoding for ChangeLog.8. | expand

Commit Message

Stefan Liebler Aug. 26, 2019, 2:41 p.m. UTC
Hi,

the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
Unfortunately the test posix/tst-regex assumes the former encoding.

This patch just changes the encoding back to ISO-8859.
Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
which leads to two further matches in the first call to test_expr.

Bye
Stefan

ChangeLog:

	* ChangeLog.old/ChangeLog.8: Use ISO-8859 encoding.

Comments

Zack Weinberg Aug. 26, 2019, 3:06 p.m. UTC | #1
On Mon, Aug 26, 2019 at 10:41 AM Stefan Liebler <stli@linux.ibm.com> wrote:
> the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
> changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
> Unfortunately the test posix/tst-regex assumes the former encoding.
>
> This patch just changes the encoding back to ISO-8859.
> Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
> which leads to two further matches in the first call to test_expr.

Better the test should be fixed to assume UTF-8, or, even better, not
to rely on ChangeLog files for test data.

zw
Stefan Liebler Aug. 27, 2019, 10:18 a.m. UTC | #2
On 8/26/19 5:06 PM, Zack Weinberg wrote:
> On Mon, Aug 26, 2019 at 10:41 AM Stefan Liebler <stli@linux.ibm.com> wrote:
>> the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
>> changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
>> Unfortunately the test posix/tst-regex assumes the former encoding.
>>
>> This patch just changes the encoding back to ISO-8859.
>> Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
>> which leads to two further matches in the first call to test_expr.
> 
> Better the test should be fixed to assume UTF-8, or, even better, not
> to rely on ChangeLog files for test data.
> 
> zw
> 

Sure. Sounds fair.
I've copied the old ChangeLog.8 file and use it as a dedicated 
input-file for tst-regex. See attached patch.

Bye
Stefan
Paul Eggert Aug. 27, 2019, 10:54 a.m. UTC | #3
Stefan Liebler wrote:
>>
>> Better the test should be fixed to assume UTF-8, or, even better, not
>> to rely on ChangeLog files for test data.
>>
>> zw
>>
> 
> Sure. Sounds fair.
> I've copied the old ChangeLog.8 file and use it as a dedicated input-file for 
> tst-regex. See attached patch.

Better to do both, i.e., to not have the test cases rely on ChangeLog files, and 
to use UTF-8 in source files. Proposed patch attached. This patch has mixed 
encoding since it changes a file's encoding from Latin-1 to UTF-8, so it's 
compressed to prevent my email client from mangling it.
Carlos O'Donell Aug. 27, 2019, 6:41 p.m. UTC | #4
On 8/26/19 10:41 AM, Stefan Liebler wrote:
> Hi,
> 
> the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
> changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
> Unfortunately the test posix/tst-regex assumes the former encoding.
> 
> This patch just changes the encoding back to ISO-8859.
> Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
> which leads to two further matches in the first call to test_expr.
> 
> Bye
> Stefan
> 
> ChangeLog:
> 
>     * ChangeLog.old/ChangeLog.8: Use ISO-8859 encoding.
Stefan, Thank you for trying to fix this. I think we should handle
it slightly different and set a precedent here.

Please copy ChangeLog.8 to tst-regex.in and use a test-specific input
file. I find it silly that we've been using ChangeLog.8 for anything.

To be clear I think we should:
* Leave the ChangeLog in UTF-8. -- This is for attribution only.
* Add a new tst-regex.in  -- This is for testing only.

Likewise if you find anything else like this I'd say we split out the
inputs into test-specific inputs.

Any time we mix attribution and testing is bound to go wrong.
Carlos O'Donell Aug. 27, 2019, 6:42 p.m. UTC | #5
On 8/27/19 6:18 AM, Stefan Liebler wrote:
> commit a6af326cb423f6f2ae7475b1619807e83c7e9ae6
> Author: Stefan Liebler <stli@linux.ibm.com>
> Date:   Tue Aug 27 09:11:32 2019 +0200
> 
>     Fix posix/tst-regex by using a dedicated input-file.
>     
>     The recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
>     changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
>     Unfortunately the test posix/tst-regex assumes the former encoding.
>     Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
>     which would lead to two further matches in the first call to test_expr.
>     
>     This patch just copies the former ChangeLog.8 file to tst-regex.input
>     and adjusts the test in order to use this new input file.
>     
>     ChangeLog:
>     
>             * posix/tst-regex.c (do_test): Use tst-regex.input as input file.
>             * posix/tst-regex.input: New file.
> 
> diff --git a/posix/tst-regex.c b/posix/tst-regex.c
> index fe05da9b63..c5d802625a 100644
> --- a/posix/tst-regex.c
> +++ b/posix/tst-regex.c
> @@ -67,7 +67,7 @@ do_test (void)
>    mtrace ();
>  
>    /* Make the content of the file available in memory.  */
> -  file = "../ChangeLog.old/ChangeLog.8";
> +  file = "./tst-regex.input";
>    fd = open (file, O_RDONLY);
>    if (fd == -1)
>      error (EXIT_FAILURE, errno, "cannot open %s", basename (file));

Sorry, I missed Zack's feedback on this already. This looks good.

Good for master.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Carlos O'Donell Aug. 27, 2019, 6:45 p.m. UTC | #6
On 8/27/19 6:54 AM, Paul Eggert wrote:
> Stefan Liebler wrote:
>>> 
>>> Better the test should be fixed to assume UTF-8, or, even better,
>>> not to rely on ChangeLog files for test data.
>>> 
>>> zw
>>> 
>> 
>> Sure. Sounds fair. I've copied the old ChangeLog.8 file and use it
>> as a dedicated input-file for tst-regex. See attached patch.
> 
> Better to do both, i.e., to not have the test cases rely on ChangeLog
> files, and to use UTF-8 in source files. Proposed patch attached.
> This patch has mixed encoding since it changes a file's encoding from
> Latin-1 to UTF-8, so it's compressed to prevent my email client from
> mangling it.

I don't know what you did with your attachments but whatever you did
makes it impossible for me to review them :-(

Normally my MUA will inline display text-style patches so I can comment
on them in a broad reply.

I've already ACK'd Stefan's other patch, since it splits out the test
input. The rest is orthogonal IMO.

FAOD I think you should have consensus to just clean all of this up
and switch to UTF-8. But they can go in after his change.
Stefan Liebler Aug. 28, 2019, 8:35 a.m. UTC | #7
On 8/27/19 8:42 PM, Carlos O'Donell wrote:
> On 8/27/19 6:18 AM, Stefan Liebler wrote:
>> commit a6af326cb423f6f2ae7475b1619807e83c7e9ae6
>> Author: Stefan Liebler <stli@linux.ibm.com>
>> Date:   Tue Aug 27 09:11:32 2019 +0200
>>
>>      Fix posix/tst-regex by using a dedicated input-file.
>>      
>>      The recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
>>      changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
>>      Unfortunately the test posix/tst-regex assumes the former encoding.
>>      Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
>>      which would lead to two further matches in the first call to test_expr.
>>      
>>      This patch just copies the former ChangeLog.8 file to tst-regex.input
>>      and adjusts the test in order to use this new input file.
>>      
>>      ChangeLog:
>>      
>>              * posix/tst-regex.c (do_test): Use tst-regex.input as input file.
>>              * posix/tst-regex.input: New file.
>>
>> diff --git a/posix/tst-regex.c b/posix/tst-regex.c
>> index fe05da9b63..c5d802625a 100644
>> --- a/posix/tst-regex.c
>> +++ b/posix/tst-regex.c
>> @@ -67,7 +67,7 @@ do_test (void)
>>     mtrace ();
>>   
>>     /* Make the content of the file available in memory.  */
>> -  file = "../ChangeLog.old/ChangeLog.8";
>> +  file = "./tst-regex.input";
>>     fd = open (file, O_RDONLY);
>>     if (fd == -1)
>>       error (EXIT_FAILURE, errno, "cannot open %s", basename (file));
> 
> Sorry, I missed Zack's feedback on this already. This looks good.
> 
> Good for master.
> 
> Reviewed-by: Carlos O'Donell <carlos@redhat.com>
> 

I've just committed this patch.
Paul, please proceed with the switch to UTF-8.

Thanks.
Stefan
Rafal Luzynski Aug. 28, 2019, 4:46 p.m. UTC | #8
28.08.2019 10:35 Stefan Liebler <stli@linux.ibm.com> wrote:
> I've just committed this patch.
> [...]

Thanks, I confirm it works fine.

Regards,

Rafal
diff mbox series

Patch

commit 695be39994979a372e7643eabf90578de6246e20
Author: Stefan Liebler <stli@linux.ibm.com>
Date:   Mon Aug 26 13:47:21 2019 +0200

    Fix posix/tst-regex by using ISO-8859 encoding for ChangeLog.8
    
    The recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0
    changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8.
    Unfortunately the test posix/tst-regex assumes the former encoding.
    
    This patch just changes the encoding back to ISO-8859.
    Furthermore Francesco Potortì is now written with 'ì' instead of 'i`'
    which leads to two further matches in the first call to test_expr.
    
    ChangeLog:
    
            * ChangeLog.old/ChangeLog.8: Use ISO-8859 encoding.

diff --git a/ChangeLog.old/ChangeLog.8 b/ChangeLog.old/ChangeLog.8
index c48660d23a..12988d982a 100644
--- a/ChangeLog.old/ChangeLog.8
+++ b/ChangeLog.old/ChangeLog.8
@@ -6025,7 +6025,7 @@ 
 	(Host Address Functions): Use uint32_t consequently and add a
 	number of clarifications for IPv4/IPv6, classless addresses.
 	(Internet Namespace): Added some paragraphs about IPv6.
-	Based on suggestions by Francesco Potortì <F.Potorti@cnuce.cnr.it>.
+	Based on suggestions by Francesco Potortì <F.Potorti@cnuce.cnr.it>.
 
 1998-04-05  Philip Blundell  <Philip.Blundell@pobox.com>
 
@@ -6565,7 +6565,7 @@ 
 	* manual/examples/mkfsock.c (make_named_socket): Removed blank
 	lines for clarification.
 	(make_named_socket): Use strncpy instead of strcpy.
-	Reported by Francesco Potortì <F.Potorti@cnuce.cnr.it>.
+	Reported by Francesco Potortì <F.Potorti@cnuce.cnr.it>.
 
 1998-03-30 13:28  Ulrich Drepper  <drepper@cygnus.com>
 
@@ -8314,7 +8314,7 @@ 
 1998-02-27  Ulrich Drepper  <drepper@cygnus.com>
 
 	* misc/efgcvt_r.c (APPEND): Handle printing of 0.0 correctly.
-	Reported by Göran Uddeborg <goeran@uddeborg.pp.se>.
+	Reported by Göran Uddeborg <goeran@uddeborg.pp.se>.
 
 	* misc/tst-efgcvt.c (ecvt_tests): Add new test case for reported
 	bug.
@@ -8322,7 +8322,7 @@ 
 1998-02-25  Andreas Jaeger  <aj@arthur.rhein-neckar.de>
 
 	* manual/arith.texi (Old-style number conversion): Correct
-	typo. Reported by Göran Uddeborg <goeran@uddeborg.pp.se>.
+	typo. Reported by Göran Uddeborg <goeran@uddeborg.pp.se>.
 
 1998-02-27  Ulrich Drepper  <drepper@cygnus.com>
 
diff --git a/posix/tst-regex.c b/posix/tst-regex.c
index fe05da9b63..995ad38c7f 100644
--- a/posix/tst-regex.c
+++ b/posix/tst-regex.c
@@ -124,7 +124,7 @@  do_test (void)
 
   /* Run the actual tests.  All tests are run in a single-byte and a
      multi-byte locale.  */
-  result = test_expr ("[äáàâéèêíìîñöóòôüúùû]", 2, 2);
+  result = test_expr ("[äáàâéèêíìîñöóòôüúùû]", 4, 4);
   result |= test_expr ("G.ran", 2, 3);
   result |= test_expr ("G.\\{1\\}ran", 2, 3);
   result |= test_expr ("G.*ran", 3, 44);