decodetree: Open files with encoding='utf-8'

Message ID	20210108151632.277015-1-f4bug@amsat.org
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: =?utf-8?q?Philippe_Mathieu-Daud=C3=A9?= <f4bug@amsat.org> To: qemu-devel@nongnu.org Subject: [PATCH] decodetree: Open files with encoding='utf-8' Date: Fri, 8 Jan 2021 16:16:32 +0100 Message-Id: <20210108151632.277015-1-f4bug@amsat.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=2a00:1450:4864:20::436; envelope-from=philippe.mathieu.daude@gmail.com; helo=mail-wr1-x436.google.com X-Spam_score_int: -14 X-Spam_score: -1.5 X-Spam_bar: - X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.249, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.248, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action Precedence: list Cc: Peter Maydell <peter.maydell@linaro.org>, Eduardo Habkost <ehabkost@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, =?utf-8?q?Philippe_Mathie?= =?utf-8?q?u-Daud=C3=A9?= <f4bug@amsat.org>, Cleber Rosa <crosa@redhat.com>, Paolo Bonzini <pbonzini@redhat.com> Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org>
Series	decodetree: Open files with encoding='utf-8' \| expand decodetree: Open files with encoding='utf-8'

Philippe Mathieu-Daudé Jan. 8, 2021, 3:16 p.m. UTC

When decodetree.py was added in commit 568ae7efae7, QEMU was
using Python 2 which happily reads UTF-8 files in text mode.
Python 3 requires either UTF-8 locale or an explicit encoding
passed to open(). Now that Python 3 is required, explicit
UTF-8 encoding for decodetree sources.

This fixes:

  $ /usr/bin/python3 scripts/decodetree.py test.decode
  Traceback (most recent call last):
    File "scripts/decodetree.py", line 1397, in <module>
      main()
    File "scripts/decodetree.py", line 1308, in main
      parse_file(f, toppat)
    File "scripts/decodetree.py", line 994, in parse_file
      for line in f:
    File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
      return codecs.ascii_decode(input, self.errors)[0]
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
  ordinal not in range(128)

Reported-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
---
 scripts/decodetree.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Yonggang Luo Jan. 8, 2021, 3:25 p.m. UTC | #1

On Fri, Jan 8, 2021 at 7:18 AM Philippe Mathieu-Daudé <f4bug@amsat.org>
wrote:
>
> When decodetree.py was added in commit 568ae7efae7, QEMU was
> using Python 2 which happily reads UTF-8 files in text mode.
> Python 3 requires either UTF-8 locale or an explicit encoding
> passed to open(). Now that Python 3 is required, explicit
> UTF-8 encoding for decodetree sources.
>
> This fixes:
>
>   $ /usr/bin/python3 scripts/decodetree.py test.decode
>   Traceback (most recent call last):
>     File "scripts/decodetree.py", line 1397, in <module>
>       main()
>     File "scripts/decodetree.py", line 1308, in main
>       parse_file(f, toppat)
>     File "scripts/decodetree.py", line 994, in parse_file
>       for line in f:
>     File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>       return codecs.ascii_decode(input, self.errors)[0]
>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
>   ordinal not in range(128)
>
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>  scripts/decodetree.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 47aa9caf6d1..fa40903cff1 100644
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -1304,7 +1304,7 @@ def main():
>
>      for filename in args:
>          input_file = filename
> -        f = open(filename, 'r')
> +        f = open(filename, 'r', encoding='utf-8')
>          parse_file(f, toppat)
>          f.close()
>
> --
> 2.26.2
>
>


Reviewed-by:  Yonggang Luo <l <peter.maydell@linaro.org>uoyonggang@gmail.com
>

--
         此致
礼
罗勇刚
Yours
    sincerely,
Yonggang Luo

Peter Maydell Jan. 8, 2021, 3:38 p.m. UTC | #2

On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>
> When decodetree.py was added in commit 568ae7efae7, QEMU was
> using Python 2 which happily reads UTF-8 files in text mode.
> Python 3 requires either UTF-8 locale or an explicit encoding
> passed to open(). Now that Python 3 is required, explicit
> UTF-8 encoding for decodetree sources.
>
> This fixes:
>
>   $ /usr/bin/python3 scripts/decodetree.py test.decode
>   Traceback (most recent call last):
>     File "scripts/decodetree.py", line 1397, in <module>
>       main()
>     File "scripts/decodetree.py", line 1308, in main
>       parse_file(f, toppat)
>     File "scripts/decodetree.py", line 994, in parse_file
>       for line in f:
>     File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>       return codecs.ascii_decode(input, self.errors)[0]
>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
>   ordinal not in range(128)
>
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>  scripts/decodetree.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 47aa9caf6d1..fa40903cff1 100644
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -1304,7 +1304,7 @@ def main():
>
>      for filename in args:
>          input_file = filename
> -        f = open(filename, 'r')
> +        f = open(filename, 'r', encoding='utf-8')
>          parse_file(f, toppat)
>          f.close()

Should we also be opening the output file explicitly as
utf-8 ? (How do we say "write to sys.stdout as utf-8" for
the case where we're doing that?)

thanks
-- PMM

Yonggang Luo Jan. 8, 2021, 4:13 p.m. UTC | #3

On Sat, Jan 9, 2021 at 12:05 AM Peter Maydell <peter.maydell@linaro.org>
wrote:
>
> On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org>
wrote:
> >
> > When decodetree.py was added in commit 568ae7efae7, QEMU was
> > using Python 2 which happily reads UTF-8 files in text mode.
> > Python 3 requires either UTF-8 locale or an explicit encoding
> > passed to open(). Now that Python 3 is required, explicit
> > UTF-8 encoding for decodetree sources.
> >
> > This fixes:
> >
> >   $ /usr/bin/python3 scripts/decodetree.py test.decode
> >   Traceback (most recent call last):
> >     File "scripts/decodetree.py", line 1397, in <module>
> >       main()
> >     File "scripts/decodetree.py", line 1308, in main
> >       parse_file(f, toppat)
> >     File "scripts/decodetree.py", line 994, in parse_file
> >       for line in f:
> >     File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> >       return codecs.ascii_decode(input, self.errors)[0]
> >   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
80:
> >   ordinal not in range(128)
> >
> > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> > ---
> >  scripts/decodetree.py | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> > index 47aa9caf6d1..fa40903cff1 100644
> > --- a/scripts/decodetree.py
> > +++ b/scripts/decodetree.py
> > @@ -1304,7 +1304,7 @@ def main():
> >
> >      for filename in args:
> >          input_file = filename
> > -        f = open(filename, 'r')
> > +        f = open(filename, 'r', encoding='utf-8')
> >          parse_file(f, toppat)
> >          f.close()
>
> Should we also be opening the output file explicitly as
> utf-8 ? (How do we say "write to sys.stdout as utf-8" for
> the case where we're doing that?)

Can be done with
```
        sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf8",
errors="ignore")
```

>
> thanks
> -- PMM
>


--
         此致
礼
罗勇刚
Yours
    sincerely,
Yonggang Luo

Eduardo Habkost Jan. 8, 2021, 4:43 p.m. UTC | #4

On Sat, Jan 09, 2021 at 12:13:31AM +0800, 罗勇刚(Yonggang Luo) wrote:
> On Sat, Jan 9, 2021 at 12:05 AM Peter Maydell <peter.maydell@linaro.org>
> wrote:
> >
> > On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org>
> wrote:
> > >
> > > When decodetree.py was added in commit 568ae7efae7, QEMU was
> > > using Python 2 which happily reads UTF-8 files in text mode.
> > > Python 3 requires either UTF-8 locale or an explicit encoding
> > > passed to open(). Now that Python 3 is required, explicit
> > > UTF-8 encoding for decodetree sources.
> > >
> > > This fixes:
> > >
> > >   $ /usr/bin/python3 scripts/decodetree.py test.decode
> > >   Traceback (most recent call last):
> > >     File "scripts/decodetree.py", line 1397, in <module>
> > >       main()
> > >     File "scripts/decodetree.py", line 1308, in main
> > >       parse_file(f, toppat)
> > >     File "scripts/decodetree.py", line 994, in parse_file
> > >       for line in f:
> > >     File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> > >       return codecs.ascii_decode(input, self.errors)[0]
> > >   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position
> 80:
> > >   ordinal not in range(128)
> > >
> > > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> > > ---
> > >  scripts/decodetree.py | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> > > index 47aa9caf6d1..fa40903cff1 100644
> > > --- a/scripts/decodetree.py
> > > +++ b/scripts/decodetree.py
> > > @@ -1304,7 +1304,7 @@ def main():
> > >
> > >      for filename in args:
> > >          input_file = filename
> > > -        f = open(filename, 'r')
> > > +        f = open(filename, 'r', encoding='utf-8')
> > >          parse_file(f, toppat)
> > >          f.close()
> >
> > Should we also be opening the output file explicitly as
> > utf-8 ? (How do we say "write to sys.stdout as utf-8" for
> > the case where we're doing that?)
> 
> Can be done with
> ```
>         sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf8",
> errors="ignore")
> ```

In the specific case of decodetree, just assigning this to
`output_fd` is enough, and less hacky than overwriting
`sys.stdout`.

Philippe Mathieu-Daudé Jan. 8, 2021, 4:44 p.m. UTC | #5

On 1/8/21 4:38 PM, Peter Maydell wrote:
> On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>>
>> When decodetree.py was added in commit 568ae7efae7, QEMU was
>> using Python 2 which happily reads UTF-8 files in text mode.
>> Python 3 requires either UTF-8 locale or an explicit encoding
>> passed to open(). Now that Python 3 is required, explicit
>> UTF-8 encoding for decodetree sources.
>>
>> This fixes:
>>
>>   $ /usr/bin/python3 scripts/decodetree.py test.decode
>>   Traceback (most recent call last):
>>     File "scripts/decodetree.py", line 1397, in <module>
>>       main()
>>     File "scripts/decodetree.py", line 1308, in main
>>       parse_file(f, toppat)
>>     File "scripts/decodetree.py", line 994, in parse_file
>>       for line in f:
>>     File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>>       return codecs.ascii_decode(input, self.errors)[0]
>>   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
>>   ordinal not in range(128)
>>
>> Reported-by: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
>> ---
>>  scripts/decodetree.py | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
>> index 47aa9caf6d1..fa40903cff1 100644
>> --- a/scripts/decodetree.py
>> +++ b/scripts/decodetree.py
>> @@ -1304,7 +1304,7 @@ def main():
>>
>>      for filename in args:
>>          input_file = filename
>> -        f = open(filename, 'r')
>> +        f = open(filename, 'r', encoding='utf-8')
>>          parse_file(f, toppat)
>>          f.close()
> 
> Should we also be opening the output file explicitly as
> utf-8 ? (How do we say "write to sys.stdout as utf-8" for
> the case where we're doing that?)

I have been wondering about it, but the content written
in the output file is plain C code using only ASCII,
which any locale is able to process. But indeed maybe
we prefer ignore the user locale... I'm not sure.

Peter Maydell Jan. 8, 2021, 5:14 p.m. UTC | #6

On Fri, 8 Jan 2021 at 16:44, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>
> On 1/8/21 4:38 PM, Peter Maydell wrote:
> > Should we also be opening the output file explicitly as
> > utf-8 ? (How do we say "write to sys.stdout as utf-8" for
> > the case where we're doing that?)
>
> I have been wondering about it, but the content written
> in the output file is plain C code using only ASCII,
> which any locale is able to process. But indeed maybe
> we prefer ignore the user locale... I'm not sure.

I'm not a python expert so I don't know what the usual thing
is here, but it seems to me better to insulate our build
process from what the user's locale happens to be set to,
even if it happens that we currently only output ASCII.

thanks
-- PMM

Philippe Mathieu-Daudé Jan. 8, 2021, 6:02 p.m. UTC | #7

On 1/8/21 6:14 PM, Peter Maydell wrote:
> On Fri, 8 Jan 2021 at 16:44, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
>>
>> On 1/8/21 4:38 PM, Peter Maydell wrote:
>>> Should we also be opening the output file explicitly as
>>> utf-8 ? (How do we say "write to sys.stdout as utf-8" for
>>> the case where we're doing that?)
>>
>> I have been wondering about it, but the content written
>> in the output file is plain C code using only ASCII,
>> which any locale is able to process. But indeed maybe
>> we prefer ignore the user locale... I'm not sure.
> 
> I'm not a python expert so I don't know what the usual thing
> is here, but it seems to me better to insulate our build
> process from what the user's locale happens to be set to,
> even if it happens that we currently only output ASCII.

OK, I'll respin.

Daniele Buono Jan. 8, 2021, 10:51 p.m. UTC | #8

I had a similar issue in the past with the acceptance tests.
Some VMs send UTF-8 output in their console and the acceptance test
script would bail out if the locale was not UTF-8.

I sent a patch on the ml but it probably got lost:
https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg06086.html

I can re-spin it if you guys are interested


On 1/8/2021 10:16 AM, Philippe Mathieu-Daudé wrote:
> When decodetree.py was added in commit 568ae7efae7, QEMU was
> using Python 2 which happily reads UTF-8 files in text mode.
> Python 3 requires either UTF-8 locale or an explicit encoding
> passed to open(). Now that Python 3 is required, explicit
> UTF-8 encoding for decodetree sources.
> 
> This fixes:
> 
>    $ /usr/bin/python3 scripts/decodetree.py test.decode
>    Traceback (most recent call last):
>      File "scripts/decodetree.py", line 1397, in <module>
>        main()
>      File "scripts/decodetree.py", line 1308, in main
>        parse_file(f, toppat)
>      File "scripts/decodetree.py", line 994, in parse_file
>        for line in f:
>      File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>        return codecs.ascii_decode(input, self.errors)[0]
>    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
>    ordinal not in range(128)
> 
> Reported-by: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> ---
>   scripts/decodetree.py | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> index 47aa9caf6d1..fa40903cff1 100644
> --- a/scripts/decodetree.py
> +++ b/scripts/decodetree.py
> @@ -1304,7 +1304,7 @@ def main():
> 
>       for filename in args:
>           input_file = filename
> -        f = open(filename, 'r')
> +        f = open(filename, 'r', encoding='utf-8')
>           parse_file(f, toppat)
>           f.close()
>

Eduardo Habkost Jan. 12, 2021, 9:11 p.m. UTC | #9

[CCing John, Wainer]

On Fri, Jan 08, 2021 at 05:51:41PM -0500, Daniele Buono wrote:
> I had a similar issue in the past with the acceptance tests.
> Some VMs send UTF-8 output in their console and the acceptance test
> script would bail out if the locale was not UTF-8.
> 
> I sent a patch on the ml but it probably got lost:
> https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg06086.html
> 
> I can re-spin it if you guys are interested

The mbox at
https://lore.kernel.org/qemu-devel/20200721125522.20511-1-dbuono@linux.vnet.ibm.com/
can still be applied cleanly, I don't think you need to resubmit.

However, we have no owner for tests/acceptance/avocado_qemu in
MAINTAINERS.  Is anybody currently taking care of
tests/acceptance patches and making sure they are merged?

John Snow Jan. 12, 2021, 11:35 p.m. UTC | #10

On 1/12/21 4:11 PM, Eduardo Habkost wrote:
> [CCing John, Wainer]
> 
> On Fri, Jan 08, 2021 at 05:51:41PM -0500, Daniele Buono wrote:
>> I had a similar issue in the past with the acceptance tests.
>> Some VMs send UTF-8 output in their console and the acceptance test
>> script would bail out if the locale was not UTF-8.
>>
>> I sent a patch on the ml but it probably got lost:
>> https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg06086.html
>>
>> I can re-spin it if you guys are interested
> 
> The mbox at
> https://lore.kernel.org/qemu-devel/20200721125522.20511-1-dbuono@linux.vnet.ibm.com/
> can still be applied cleanly, I don't think you need to resubmit.
> 
> However, we have no owner for tests/acceptance/avocado_qemu in
> MAINTAINERS.  Is anybody currently taking care of
> tests/acceptance patches and making sure they are merged?
> 

I touch these tests sometimes, but I know very little about avocado 
overall, so I don't think it's going to be me taking point here.

(I don't mind taking a reviewer stanza for something like *.py, though.)

Acceptance (Integration) Testing with the Avocado framework
W: https://trello.com/b/6Qi1pxVn/avocado-qemu
R: Cleber Rosa <crosa@redhat.com>
R: Philippe Mathieu-Daudé <philmd@redhat.com>
R: Wainer dos Santos Moschetta <wainersm@redhat.com>
S: Odd Fixes
F: tests/acceptance/

Why is this only "Odd Fixes"? Isn't it new within the last ~2y? The 
avocado_qemu module itself was largely written by Cleber, Philippe and Caio.

--js

Philippe Mathieu-Daudé Jan. 12, 2021, 11:44 p.m. UTC | #11

On 1/13/21 12:35 AM, John Snow wrote:
> On 1/12/21 4:11 PM, Eduardo Habkost wrote:
>> [CCing John, Wainer]
>>
>> On Fri, Jan 08, 2021 at 05:51:41PM -0500, Daniele Buono wrote:
>>> I had a similar issue in the past with the acceptance tests.
>>> Some VMs send UTF-8 output in their console and the acceptance test
>>> script would bail out if the locale was not UTF-8.
>>>
>>> I sent a patch on the ml but it probably got lost:
>>> https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg06086.html
>>>
>>> I can re-spin it if you guys are interested
>>
>> The mbox at
>> https://lore.kernel.org/qemu-devel/20200721125522.20511-1-dbuono@linux.vnet.ibm.com/
>>
>> can still be applied cleanly, I don't think you need to resubmit.
>>
>> However, we have no owner for tests/acceptance/avocado_qemu in
>> MAINTAINERS.  Is anybody currently taking care of
>> tests/acceptance patches and making sure they are merged?

[1] The answer to this question is below in [2]...

> I touch these tests sometimes, but I know very little about avocado
> overall, so I don't think it's going to be me taking point here.
> 
> (I don't mind taking a reviewer stanza for something like *.py, though.)
> 
> Acceptance (Integration) Testing with the Avocado framework
> W: https://trello.com/b/6Qi1pxVn/avocado-qemu
> R: Cleber Rosa <crosa@redhat.com>
> R: Philippe Mathieu-Daudé <philmd@redhat.com>
> R: Wainer dos Santos Moschetta <wainersm@redhat.com>
> S: Odd Fixes
> F: tests/acceptance/
> 
> Why is this only "Odd Fixes"? Isn't it new within the last ~2y? The
> avocado_qemu module itself was largely written by Cleber, Philippe and
> Caio.

[2] The answer to this question is above in [1] :)

> 
> --js
> 
>

decodetree: Open files with encoding='utf-8'

Commit Message

Comments

Patch