Message ID | 20210108151632.277015-1-f4bug@amsat.org |
---|---|
State | New |
Headers | show |
Series | decodetree: Open files with encoding='utf-8' | expand |
On Fri, Jan 8, 2021 at 7:18 AM Philippe Mathieu-Daudé <f4bug@amsat.org> wrote: > > When decodetree.py was added in commit 568ae7efae7, QEMU was > using Python 2 which happily reads UTF-8 files in text mode. > Python 3 requires either UTF-8 locale or an explicit encoding > passed to open(). Now that Python 3 is required, explicit > UTF-8 encoding for decodetree sources. > > This fixes: > > $ /usr/bin/python3 scripts/decodetree.py test.decode > Traceback (most recent call last): > File "scripts/decodetree.py", line 1397, in <module> > main() > File "scripts/decodetree.py", line 1308, in main > parse_file(f, toppat) > File "scripts/decodetree.py", line 994, in parse_file > for line in f: > File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode > return codecs.ascii_decode(input, self.errors)[0] > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: > ordinal not in range(128) > > Reported-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> > --- > scripts/decodetree.py | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/scripts/decodetree.py b/scripts/decodetree.py > index 47aa9caf6d1..fa40903cff1 100644 > --- a/scripts/decodetree.py > +++ b/scripts/decodetree.py > @@ -1304,7 +1304,7 @@ def main(): > > for filename in args: > input_file = filename > - f = open(filename, 'r') > + f = open(filename, 'r', encoding='utf-8') > parse_file(f, toppat) > f.close() > > -- > 2.26.2 > > Reviewed-by: Yonggang Luo <l <peter.maydell@linaro.org>uoyonggang@gmail.com > -- 此致 礼 罗勇刚 Yours sincerely, Yonggang Luo
On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote: > > When decodetree.py was added in commit 568ae7efae7, QEMU was > using Python 2 which happily reads UTF-8 files in text mode. > Python 3 requires either UTF-8 locale or an explicit encoding > passed to open(). Now that Python 3 is required, explicit > UTF-8 encoding for decodetree sources. > > This fixes: > > $ /usr/bin/python3 scripts/decodetree.py test.decode > Traceback (most recent call last): > File "scripts/decodetree.py", line 1397, in <module> > main() > File "scripts/decodetree.py", line 1308, in main > parse_file(f, toppat) > File "scripts/decodetree.py", line 994, in parse_file > for line in f: > File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode > return codecs.ascii_decode(input, self.errors)[0] > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: > ordinal not in range(128) > > Reported-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> > --- > scripts/decodetree.py | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/scripts/decodetree.py b/scripts/decodetree.py > index 47aa9caf6d1..fa40903cff1 100644 > --- a/scripts/decodetree.py > +++ b/scripts/decodetree.py > @@ -1304,7 +1304,7 @@ def main(): > > for filename in args: > input_file = filename > - f = open(filename, 'r') > + f = open(filename, 'r', encoding='utf-8') > parse_file(f, toppat) > f.close() Should we also be opening the output file explicitly as utf-8 ? (How do we say "write to sys.stdout as utf-8" for the case where we're doing that?) thanks -- PMM
On Sat, Jan 9, 2021 at 12:05 AM Peter Maydell <peter.maydell@linaro.org> wrote: > > On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote: > > > > When decodetree.py was added in commit 568ae7efae7, QEMU was > > using Python 2 which happily reads UTF-8 files in text mode. > > Python 3 requires either UTF-8 locale or an explicit encoding > > passed to open(). Now that Python 3 is required, explicit > > UTF-8 encoding for decodetree sources. > > > > This fixes: > > > > $ /usr/bin/python3 scripts/decodetree.py test.decode > > Traceback (most recent call last): > > File "scripts/decodetree.py", line 1397, in <module> > > main() > > File "scripts/decodetree.py", line 1308, in main > > parse_file(f, toppat) > > File "scripts/decodetree.py", line 994, in parse_file > > for line in f: > > File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode > > return codecs.ascii_decode(input, self.errors)[0] > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: > > ordinal not in range(128) > > > > Reported-by: Peter Maydell <peter.maydell@linaro.org> > > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> > > --- > > scripts/decodetree.py | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/scripts/decodetree.py b/scripts/decodetree.py > > index 47aa9caf6d1..fa40903cff1 100644 > > --- a/scripts/decodetree.py > > +++ b/scripts/decodetree.py > > @@ -1304,7 +1304,7 @@ def main(): > > > > for filename in args: > > input_file = filename > > - f = open(filename, 'r') > > + f = open(filename, 'r', encoding='utf-8') > > parse_file(f, toppat) > > f.close() > > Should we also be opening the output file explicitly as > utf-8 ? (How do we say "write to sys.stdout as utf-8" for > the case where we're doing that?) Can be done with ``` sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf8", errors="ignore") ``` > > thanks > -- PMM > -- 此致 礼 罗勇刚 Yours sincerely, Yonggang Luo
On Sat, Jan 09, 2021 at 12:13:31AM +0800, 罗勇刚(Yonggang Luo) wrote: > On Sat, Jan 9, 2021 at 12:05 AM Peter Maydell <peter.maydell@linaro.org> > wrote: > > > > On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org> > wrote: > > > > > > When decodetree.py was added in commit 568ae7efae7, QEMU was > > > using Python 2 which happily reads UTF-8 files in text mode. > > > Python 3 requires either UTF-8 locale or an explicit encoding > > > passed to open(). Now that Python 3 is required, explicit > > > UTF-8 encoding for decodetree sources. > > > > > > This fixes: > > > > > > $ /usr/bin/python3 scripts/decodetree.py test.decode > > > Traceback (most recent call last): > > > File "scripts/decodetree.py", line 1397, in <module> > > > main() > > > File "scripts/decodetree.py", line 1308, in main > > > parse_file(f, toppat) > > > File "scripts/decodetree.py", line 994, in parse_file > > > for line in f: > > > File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode > > > return codecs.ascii_decode(input, self.errors)[0] > > > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position > 80: > > > ordinal not in range(128) > > > > > > Reported-by: Peter Maydell <peter.maydell@linaro.org> > > > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> > > > --- > > > scripts/decodetree.py | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/scripts/decodetree.py b/scripts/decodetree.py > > > index 47aa9caf6d1..fa40903cff1 100644 > > > --- a/scripts/decodetree.py > > > +++ b/scripts/decodetree.py > > > @@ -1304,7 +1304,7 @@ def main(): > > > > > > for filename in args: > > > input_file = filename > > > - f = open(filename, 'r') > > > + f = open(filename, 'r', encoding='utf-8') > > > parse_file(f, toppat) > > > f.close() > > > > Should we also be opening the output file explicitly as > > utf-8 ? (How do we say "write to sys.stdout as utf-8" for > > the case where we're doing that?) > > Can be done with > ``` > sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf8", > errors="ignore") > ``` In the specific case of decodetree, just assigning this to `output_fd` is enough, and less hacky than overwriting `sys.stdout`.
On 1/8/21 4:38 PM, Peter Maydell wrote: > On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote: >> >> When decodetree.py was added in commit 568ae7efae7, QEMU was >> using Python 2 which happily reads UTF-8 files in text mode. >> Python 3 requires either UTF-8 locale or an explicit encoding >> passed to open(). Now that Python 3 is required, explicit >> UTF-8 encoding for decodetree sources. >> >> This fixes: >> >> $ /usr/bin/python3 scripts/decodetree.py test.decode >> Traceback (most recent call last): >> File "scripts/decodetree.py", line 1397, in <module> >> main() >> File "scripts/decodetree.py", line 1308, in main >> parse_file(f, toppat) >> File "scripts/decodetree.py", line 994, in parse_file >> for line in f: >> File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode >> return codecs.ascii_decode(input, self.errors)[0] >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: >> ordinal not in range(128) >> >> Reported-by: Peter Maydell <peter.maydell@linaro.org> >> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> >> --- >> scripts/decodetree.py | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/scripts/decodetree.py b/scripts/decodetree.py >> index 47aa9caf6d1..fa40903cff1 100644 >> --- a/scripts/decodetree.py >> +++ b/scripts/decodetree.py >> @@ -1304,7 +1304,7 @@ def main(): >> >> for filename in args: >> input_file = filename >> - f = open(filename, 'r') >> + f = open(filename, 'r', encoding='utf-8') >> parse_file(f, toppat) >> f.close() > > Should we also be opening the output file explicitly as > utf-8 ? (How do we say "write to sys.stdout as utf-8" for > the case where we're doing that?) I have been wondering about it, but the content written in the output file is plain C code using only ASCII, which any locale is able to process. But indeed maybe we prefer ignore the user locale... I'm not sure.
On Fri, 8 Jan 2021 at 16:44, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote: > > On 1/8/21 4:38 PM, Peter Maydell wrote: > > Should we also be opening the output file explicitly as > > utf-8 ? (How do we say "write to sys.stdout as utf-8" for > > the case where we're doing that?) > > I have been wondering about it, but the content written > in the output file is plain C code using only ASCII, > which any locale is able to process. But indeed maybe > we prefer ignore the user locale... I'm not sure. I'm not a python expert so I don't know what the usual thing is here, but it seems to me better to insulate our build process from what the user's locale happens to be set to, even if it happens that we currently only output ASCII. thanks -- PMM
On 1/8/21 6:14 PM, Peter Maydell wrote: > On Fri, 8 Jan 2021 at 16:44, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote: >> >> On 1/8/21 4:38 PM, Peter Maydell wrote: >>> Should we also be opening the output file explicitly as >>> utf-8 ? (How do we say "write to sys.stdout as utf-8" for >>> the case where we're doing that?) >> >> I have been wondering about it, but the content written >> in the output file is plain C code using only ASCII, >> which any locale is able to process. But indeed maybe >> we prefer ignore the user locale... I'm not sure. > > I'm not a python expert so I don't know what the usual thing > is here, but it seems to me better to insulate our build > process from what the user's locale happens to be set to, > even if it happens that we currently only output ASCII. OK, I'll respin.
I had a similar issue in the past with the acceptance tests. Some VMs send UTF-8 output in their console and the acceptance test script would bail out if the locale was not UTF-8. I sent a patch on the ml but it probably got lost: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg06086.html I can re-spin it if you guys are interested On 1/8/2021 10:16 AM, Philippe Mathieu-Daudé wrote: > When decodetree.py was added in commit 568ae7efae7, QEMU was > using Python 2 which happily reads UTF-8 files in text mode. > Python 3 requires either UTF-8 locale or an explicit encoding > passed to open(). Now that Python 3 is required, explicit > UTF-8 encoding for decodetree sources. > > This fixes: > > $ /usr/bin/python3 scripts/decodetree.py test.decode > Traceback (most recent call last): > File "scripts/decodetree.py", line 1397, in <module> > main() > File "scripts/decodetree.py", line 1308, in main > parse_file(f, toppat) > File "scripts/decodetree.py", line 994, in parse_file > for line in f: > File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode > return codecs.ascii_decode(input, self.errors)[0] > UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: > ordinal not in range(128) > > Reported-by: Peter Maydell <peter.maydell@linaro.org> > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> > --- > scripts/decodetree.py | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/scripts/decodetree.py b/scripts/decodetree.py > index 47aa9caf6d1..fa40903cff1 100644 > --- a/scripts/decodetree.py > +++ b/scripts/decodetree.py > @@ -1304,7 +1304,7 @@ def main(): > > for filename in args: > input_file = filename > - f = open(filename, 'r') > + f = open(filename, 'r', encoding='utf-8') > parse_file(f, toppat) > f.close() >
[CCing John, Wainer] On Fri, Jan 08, 2021 at 05:51:41PM -0500, Daniele Buono wrote: > I had a similar issue in the past with the acceptance tests. > Some VMs send UTF-8 output in their console and the acceptance test > script would bail out if the locale was not UTF-8. > > I sent a patch on the ml but it probably got lost: > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg06086.html > > I can re-spin it if you guys are interested The mbox at https://lore.kernel.org/qemu-devel/20200721125522.20511-1-dbuono@linux.vnet.ibm.com/ can still be applied cleanly, I don't think you need to resubmit. However, we have no owner for tests/acceptance/avocado_qemu in MAINTAINERS. Is anybody currently taking care of tests/acceptance patches and making sure they are merged?
On 1/12/21 4:11 PM, Eduardo Habkost wrote: > [CCing John, Wainer] > > On Fri, Jan 08, 2021 at 05:51:41PM -0500, Daniele Buono wrote: >> I had a similar issue in the past with the acceptance tests. >> Some VMs send UTF-8 output in their console and the acceptance test >> script would bail out if the locale was not UTF-8. >> >> I sent a patch on the ml but it probably got lost: >> https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg06086.html >> >> I can re-spin it if you guys are interested > > The mbox at > https://lore.kernel.org/qemu-devel/20200721125522.20511-1-dbuono@linux.vnet.ibm.com/ > can still be applied cleanly, I don't think you need to resubmit. > > However, we have no owner for tests/acceptance/avocado_qemu in > MAINTAINERS. Is anybody currently taking care of > tests/acceptance patches and making sure they are merged? > I touch these tests sometimes, but I know very little about avocado overall, so I don't think it's going to be me taking point here. (I don't mind taking a reviewer stanza for something like *.py, though.) Acceptance (Integration) Testing with the Avocado framework W: https://trello.com/b/6Qi1pxVn/avocado-qemu R: Cleber Rosa <crosa@redhat.com> R: Philippe Mathieu-Daudé <philmd@redhat.com> R: Wainer dos Santos Moschetta <wainersm@redhat.com> S: Odd Fixes F: tests/acceptance/ Why is this only "Odd Fixes"? Isn't it new within the last ~2y? The avocado_qemu module itself was largely written by Cleber, Philippe and Caio. --js
On 1/13/21 12:35 AM, John Snow wrote: > On 1/12/21 4:11 PM, Eduardo Habkost wrote: >> [CCing John, Wainer] >> >> On Fri, Jan 08, 2021 at 05:51:41PM -0500, Daniele Buono wrote: >>> I had a similar issue in the past with the acceptance tests. >>> Some VMs send UTF-8 output in their console and the acceptance test >>> script would bail out if the locale was not UTF-8. >>> >>> I sent a patch on the ml but it probably got lost: >>> https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg06086.html >>> >>> I can re-spin it if you guys are interested >> >> The mbox at >> https://lore.kernel.org/qemu-devel/20200721125522.20511-1-dbuono@linux.vnet.ibm.com/ >> >> can still be applied cleanly, I don't think you need to resubmit. >> >> However, we have no owner for tests/acceptance/avocado_qemu in >> MAINTAINERS. Is anybody currently taking care of >> tests/acceptance patches and making sure they are merged? [1] The answer to this question is below in [2]... > I touch these tests sometimes, but I know very little about avocado > overall, so I don't think it's going to be me taking point here. > > (I don't mind taking a reviewer stanza for something like *.py, though.) > > Acceptance (Integration) Testing with the Avocado framework > W: https://trello.com/b/6Qi1pxVn/avocado-qemu > R: Cleber Rosa <crosa@redhat.com> > R: Philippe Mathieu-Daudé <philmd@redhat.com> > R: Wainer dos Santos Moschetta <wainersm@redhat.com> > S: Odd Fixes > F: tests/acceptance/ > > Why is this only "Odd Fixes"? Isn't it new within the last ~2y? The > avocado_qemu module itself was largely written by Cleber, Philippe and > Caio. [2] The answer to this question is above in [1] :) > > --js > >
diff --git a/scripts/decodetree.py b/scripts/decodetree.py index 47aa9caf6d1..fa40903cff1 100644 --- a/scripts/decodetree.py +++ b/scripts/decodetree.py @@ -1304,7 +1304,7 @@ def main(): for filename in args: input_file = filename - f = open(filename, 'r') + f = open(filename, 'r', encoding='utf-8') parse_file(f, toppat) f.close()
When decodetree.py was added in commit 568ae7efae7, QEMU was using Python 2 which happily reads UTF-8 files in text mode. Python 3 requires either UTF-8 locale or an explicit encoding passed to open(). Now that Python 3 is required, explicit UTF-8 encoding for decodetree sources. This fixes: $ /usr/bin/python3 scripts/decodetree.py test.decode Traceback (most recent call last): File "scripts/decodetree.py", line 1397, in <module> main() File "scripts/decodetree.py", line 1308, in main parse_file(f, toppat) File "scripts/decodetree.py", line 994, in parse_file for line in f: File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode return codecs.ascii_decode(input, self.errors)[0] UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80: ordinal not in range(128) Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> --- scripts/decodetree.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)