Message ID | 619cade7e73dc33184bf4247b739d54cd9d7d8b3.1652994079.git.fweimer@redhat.com |
---|---|
State | New |
Series | Assume UTF-8 encoding for localedef input files |
On 5/19/22 17:06, Florian Weimer via Libc-alpha wrote:
> The array lr->buf contains characters, which can be signed.  A 0xff
> byte in the input could be incorrectly reported as EOF.  More
> importantly, get_string in linereader.c converts a signed input byte
> to a Unicode code point using ADDWC ((uint32_t) ch), under the
> assumption that this decodes the ISO-8859-1 input encoding.  If char
> is signed, this does not give the correct result.  This means that
> ISO-8859-1 input files for localedef are not actually supported,
> contrary to the comment in get_string.  This is a happy accident because
> we can therefore change the file encoding to UTF-8 without impacting
> backwards compatibility.

LGTM.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>

>
> While at it, remove the \32 check for MS-DOS end-of-file character (^Z).

OK. We don't need this, files should have the correct EOF.

> ---
>  locale/programs/linereader.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/locale/programs/linereader.h b/locale/programs/linereader.h
> index 0fb10ec833..653a71d2d1 100644
> --- a/locale/programs/linereader.h
> +++ b/locale/programs/linereader.h
> @@ -134,7 +134,7 @@ lr_getc (struct linereader *lr)
>        return EOF;
>      }
>
> -  return lr->buf[lr->idx] == '\32' ? EOF : lr->buf[lr->idx++];
> +  return lr->buf[lr->idx++] & 0xff;

OK. Agreed, this should not be sign extended. It's a byte in the buffer not EOF.
With the original MS-DOS checking it might have been *needed* to return -1.

> }
>
>
diff --git a/locale/programs/linereader.h b/locale/programs/linereader.h
index 0fb10ec833..653a71d2d1 100644
--- a/locale/programs/linereader.h
+++ b/locale/programs/linereader.h
@@ -134,7 +134,7 @@ lr_getc (struct linereader *lr)
       return EOF;
     }

-  return lr->buf[lr->idx] == '\32' ? EOF : lr->buf[lr->idx++];
+  return lr->buf[lr->idx++] & 0xff;
 }
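
The sign-extension problem described in the commit message can be reproduced outside glibc. The following is only a minimal sketch, not the localedef code: `struct demo_reader`, `demo_getc_unmasked`, `demo_getc_masked`, `demo_codepoint` and `demo_check` are hypothetical names invented for illustration, and the buffer is declared as `signed char` on purpose so that the behaviour does not depend on whether plain `char` is signed on the build target. It assumes a two's-complement platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>   /* EOF */

/* Hypothetical stand-in for the reader state; this is NOT glibc's
   struct linereader.  signed char models a target where char is signed.  */
struct demo_reader
{
  const signed char *buf;
  size_t idx;
  size_t len;
};

/* Returns the next byte without masking: the signed char is promoted
   to int, so bytes >= 0x80 come back as negative values.  */
static int
demo_getc_unmasked (struct demo_reader *r)
{
  if (r->idx == r->len)
    return EOF;
  return r->buf[r->idx++];
}

/* Returns the next byte masked to 0..255, as in the patched line.  */
static int
demo_getc_masked (struct demo_reader *r)
{
  if (r->idx == r->len)
    return EOF;
  return r->buf[r->idx++] & 0xff;
}

/* Models the (uint32_t) ch conversion mentioned for get_string.  */
static uint32_t
demo_codepoint (int ch)
{
  return (uint32_t) ch;
}

/* Walks both readers over the bytes 'a', 0xe9, 0xff.  */
static void
demo_check (void)
{
  static const signed char data[] = { 'a', -23 /* 0xe9 */, -1 /* 0xff */ };
  struct demo_reader r1 = { data, 0, sizeof data };
  struct demo_reader r2 = { data, 0, sizeof data };

  /* Unmasked: 0xe9 turns into a huge "code point", 0xff looks like EOF.  */
  assert (demo_getc_unmasked (&r1) == 'a');
  assert (demo_codepoint (demo_getc_unmasked (&r1)) == 0xffffffe9u);
  assert (demo_getc_unmasked (&r1) < 0);

  /* Masked: both bytes survive, and EOF only appears at the real end.  */
  assert (demo_getc_masked (&r2) == 'a');
  assert (demo_codepoint (demo_getc_masked (&r2)) == 0xe9u);
  assert (demo_getc_masked (&r2) == 0xff);
  assert (demo_getc_masked (&r2) == EOF);
}
```

Calling `demo_check ()` from a test program exercises both variants. Without the mask, the byte 0xff is returned as a negative value that a caller cannot tell apart from end of input when `EOF` is -1, and 0xe9 converts to 0xffffffe9 rather than U+00E9, which matches the observation that ISO-8859-1 input never decoded correctly with a signed `char`.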