Message ID | 6b0df162-d17a-ec1d-0fc0-728dc3a8ef9b@linux.ibm.com |
---|---|
State | New |
Headers | show |
Series | Fix posix/tst-regex by using ISO-8859 encoding for ChangeLog.8. | expand |
On Mon, Aug 26, 2019 at 10:41 AM Stefan Liebler <stli@linux.ibm.com> wrote: > the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0 > changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8. > Unfortunately the test posix/tst-regex assumes the former encoding. > > This patch just changes the encoding back to ISO-8859. > Furthermore Francesco Potortì is now written with 'ì' instead of 'i`' > which leads to two further matches in the first call to test_expr. Better the test should be fixed to assume UTF-8, or, even better, not to rely on ChangeLog files for test data. zw
On 8/26/19 5:06 PM, Zack Weinberg wrote: > On Mon, Aug 26, 2019 at 10:41 AM Stefan Liebler <stli@linux.ibm.com> wrote: >> the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0 >> changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8. >> Unfortunately the test posix/tst-regex assumes the former encoding. >> >> This patch just changes the encoding back to ISO-8859. >> Furthermore Francesco Potortì is now written with 'ì' instead of 'i`' >> which leads to two further matches in the first call to test_expr. > > Better the test should be fixed to assume UTF-8, or, even better, not > to rely on ChangeLog files for test data. > > zw > Sure. Sounds fair. I've copied the old ChangeLog.8 file and use it as a dedicated input-file for tst-regex. See attached patch. Bye Stefan
Stefan Liebler wrote: >> >> Better the test should be fixed to assume UTF-8, or, even better, not >> to rely on ChangeLog files for test data. >> >> zw >> > > Sure. Sounds fair. > I've copied the old ChangeLog.8 file and use it as a dedicated input-file for > tst-regex. See attached patch. Better to do both, i.e., to not have the test cases rely on ChangeLog files, and to use UTF-8 in source files. Proposed patch attached. This patch has mixed encoding since it changes a file's encoding from Latin-1 to UTF-8, so it's compressed to prevent my email client from mangling it.
On 8/26/19 10:41 AM, Stefan Liebler wrote: > Hi, > > the recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0 > changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8. > Unfortunately the test posix/tst-regex assumes the former encoding. > > This patch just changes the encoding back to ISO-8859. > Furthermore Francesco Potortì is now written with 'ì' instead of 'i`' > which leads to two further matches in the first call to test_expr. > > Bye > Stefan > > ChangeLog: > > * ChangeLog.old/ChangeLog.8: Use ISO-8859 encoding. Stefan, Thank you for trying to fix this. I think we should handle it slightly different and set a precedent here. Please copy ChangeLog.8 to tst-regex.in and use a test-specific input file. I find it silly that we've been using ChangeLog.8 for anything. To be clear I think we should: * Leave the ChangeLog in UTF-8. -- This is for attribution only. * Add a new tst-regex.in -- This is for testing only. Likewise if you find anything else like this I'd say we split out the inputs into test-specific inputs. Any time we mix attribution and testing is bound to go wrong.
On 8/27/19 6:18 AM, Stefan Liebler wrote: > commit a6af326cb423f6f2ae7475b1619807e83c7e9ae6 > Author: Stefan Liebler <stli@linux.ibm.com> > Date: Tue Aug 27 09:11:32 2019 +0200 > > Fix posix/tst-regex by using a dedicated input-file. > > The recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0 > changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8. > Unfortunately the test posix/tst-regex assumes the former encoding. > Furthermore Francesco Potortì is now written with 'ì' instead of 'i`' > which would lead to two further matches in the first call to test_expr. > > This patch just copies the former ChangeLog.8 file to tst-regex.input > and adjusts the test in order to use this new input file. > > ChangeLog: > > * posix/tst-regex.c (do_test): Use tst-regex.input as input file. > * posix/tst-regex.input: New file. > > diff --git a/posix/tst-regex.c b/posix/tst-regex.c > index fe05da9b63..c5d802625a 100644 > --- a/posix/tst-regex.c > +++ b/posix/tst-regex.c > @@ -67,7 +67,7 @@ do_test (void) > mtrace (); > > /* Make the content of the file available in memory. */ > - file = "../ChangeLog.old/ChangeLog.8"; > + file = "./tst-regex.input"; > fd = open (file, O_RDONLY); > if (fd == -1) > error (EXIT_FAILURE, errno, "cannot open %s", basename (file)); Sorry, I missed Zack's feedback on this already. This looks good. Good for master. Reviewed-by: Carlos O'Donell <carlos@redhat.com>
On 8/27/19 6:54 AM, Paul Eggert wrote: > Stefan Liebler wrote: >>> >>> Better the test should be fixed to assume UTF-8, or, even better, >>> not to rely on ChangeLog files for test data. >>> >>> zw >>> >> >> Sure. Sounds fair. I've copied the old ChangeLog.8 file and use it >> as a dedicated input-file for tst-regex. See attached patch. > > Better to do both, i.e., to not have the test cases rely on ChangeLog > files, and to use UTF-8 in source files. Proposed patch attached. > This patch has mixed encoding since it changes a file's encoding from > Latin-1 to UTF-8, so it's compressed to prevent my email client from > mangling it. I don't know what you did with your attachments but whatever you did makes it impossible for me to review them :-( Normally my MUA will inline display text-style patches so I can comment on them in a broad reply. I've already ACK'd Stefan's other patch, since it splits out the test input. The rest is orthogonal IMO. FAOD I think you should have consensus to just clean all of this up and switch to UTF-8. But they can go in after his change.
On 8/27/19 8:42 PM, Carlos O'Donell wrote: > On 8/27/19 6:18 AM, Stefan Liebler wrote: >> commit a6af326cb423f6f2ae7475b1619807e83c7e9ae6 >> Author: Stefan Liebler <stli@linux.ibm.com> >> Date: Tue Aug 27 09:11:32 2019 +0200 >> >> Fix posix/tst-regex by using a dedicated input-file. >> >> The recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0 >> changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8. >> Unfortunately the test posix/tst-regex assumes the former encoding. >> Furthermore Francesco Potortì is now written with 'ì' instead of 'i`' >> which would lead to two further matches in the first call to test_expr. >> >> This patch just copies the former ChangeLog.8 file to tst-regex.input >> and adjusts the test in order to use this new input file. >> >> ChangeLog: >> >> * posix/tst-regex.c (do_test): Use tst-regex.input as input file. >> * posix/tst-regex.input: New file. >> >> diff --git a/posix/tst-regex.c b/posix/tst-regex.c >> index fe05da9b63..c5d802625a 100644 >> --- a/posix/tst-regex.c >> +++ b/posix/tst-regex.c >> @@ -67,7 +67,7 @@ do_test (void) >> mtrace (); >> >> /* Make the content of the file available in memory. */ >> - file = "../ChangeLog.old/ChangeLog.8"; >> + file = "./tst-regex.input"; >> fd = open (file, O_RDONLY); >> if (fd == -1) >> error (EXIT_FAILURE, errno, "cannot open %s", basename (file)); > > Sorry, I missed Zack's feedback on this already. This looks good. > > Good for master. > > Reviewed-by: Carlos O'Donell <carlos@redhat.com> > I've just committed this patch. Paul, please proceed with the switch to UTF-8. Thanks. Stefan
28.08.2019 10:35 Stefan Liebler <stli@linux.ibm.com> wrote: > I've just committed this patch. > [...] Thanks, I confirm it works fine. Regards, Rafal
commit 695be39994979a372e7643eabf90578de6246e20 Author: Stefan Liebler <stli@linux.ibm.com> Date: Mon Aug 26 13:47:21 2019 +0200 Fix posix/tst-regex by using ISO-8859 encoding for ChangeLog.8 The recent commit e6855a3bdfe147c52b29b5e7d70a95a8aa22ece0 changed the encoding of ChangeLog.old/ChangeLog.8 from ISO-8859 to UTF-8. Unfortunately the test posix/tst-regex assumes the former encoding. This patch just changes the encoding back to ISO-8859. Furthermore Francesco Potortì is now written with 'ì' instead of 'i`' which leads to two further matches in the first call to test_expr. ChangeLog: * ChangeLog.old/ChangeLog.8: Use ISO-8859 encoding. diff --git a/ChangeLog.old/ChangeLog.8 b/ChangeLog.old/ChangeLog.8 index c48660d23a..12988d982a 100644 --- a/ChangeLog.old/ChangeLog.8 +++ b/ChangeLog.old/ChangeLog.8 @@ -6025,7 +6025,7 @@ (Host Address Functions): Use uint32_t consequently and add a number of clarifications for IPv4/IPv6, classless addresses. (Internet Namespace): Added some paragraphs about IPv6. - Based on suggestions by Francesco Potortì <F.Potorti@cnuce.cnr.it>. + Based on suggestions by Francesco Potortì <F.Potorti@cnuce.cnr.it>. 1998-04-05 Philip Blundell <Philip.Blundell@pobox.com> @@ -6565,7 +6565,7 @@ * manual/examples/mkfsock.c (make_named_socket): Removed blank lines for clarification. (make_named_socket): Use strncpy instead of strcpy. - Reported by Francesco Potortì <F.Potorti@cnuce.cnr.it>. + Reported by Francesco Potortì <F.Potorti@cnuce.cnr.it>. 1998-03-30 13:28 Ulrich Drepper <drepper@cygnus.com> @@ -8314,7 +8314,7 @@ 1998-02-27 Ulrich Drepper <drepper@cygnus.com> * misc/efgcvt_r.c (APPEND): Handle printing of 0.0 correctly. - Reported by Göran Uddeborg <goeran@uddeborg.pp.se>. + Reported by Göran Uddeborg <goeran@uddeborg.pp.se>. * misc/tst-efgcvt.c (ecvt_tests): Add new test case for reported bug. @@ -8322,7 +8322,7 @@ 1998-02-25 Andreas Jaeger <aj@arthur.rhein-neckar.de> * manual/arith.texi (Old-style number conversion): Correct - typo. Reported by Göran Uddeborg <goeran@uddeborg.pp.se>. + typo. Reported by Göran Uddeborg <goeran@uddeborg.pp.se>. 1998-02-27 Ulrich Drepper <drepper@cygnus.com> diff --git a/posix/tst-regex.c b/posix/tst-regex.c index fe05da9b63..995ad38c7f 100644 --- a/posix/tst-regex.c +++ b/posix/tst-regex.c @@ -124,7 +124,7 @@ do_test (void) /* Run the actual tests. All tests are run in a single-byte and a multi-byte locale. */ - result = test_expr ("[äáàâéèêíìîñöóòôüúùû]", 2, 2); + result = test_expr ("[äáàâéèêíìîñöóòôüúùû]", 4, 4); result |= test_expr ("G.ran", 2, 3); result |= test_expr ("G.\\{1\\}ran", 2, 3); result |= test_expr ("G.*ran", 3, 44);