diff mbox series

[v3,5/7] support/testing/infra/emulator.py: update encoding when calling qemu

Message ID 20210923155726.87851-6-kory.maincent@bootlin.com
State Accepted
Headers show
Series Add support for ISO9660 image compatible with Legacy and EFI BIOS | expand

Commit Message

Köry Maincent Sept. 23, 2021, 3:57 p.m. UTC
With UTF-8 got issue with wrong character returned by Qemu when using EFI
BIOS. This breaks the test process with the following error. Update to
ISO-8859-1 encoding to avoid it.

    emulator.login()
  File "/home/kmaincent/Documents/projects/ariane-groupe/buildroot/support/testing/infra/emulator.py", line 89, in login
    index = self.qemu.expect(["buildroot login:", pexpect.TIMEOUT],
  File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 340, in expect
    return self.expect_list(compiled_pattern_list,
  File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 369, in expect_list
    return exp.expect_loop(timeout)
  File "/usr/lib/python3/dist-packages/pexpect/expect.py", line 111, in expect_loop
    incoming = spawn.read_nonblocking(spawn.maxread, timeout)
  File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 485, in read_nonblocking
    return super(spawn, self).read_nonblocking(size)
  File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 178, in read_nonblocking
    s = self._decoder.decode(s, final=False)
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
---
 support/testing/infra/emulator.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Yann E. MORIN Sept. 30, 2021, 8:28 p.m. UTC | #1
Köry, All,

+Arnout, +Petr for their python3 expertise

On 2021-09-23 17:57 +0200, Kory Maincent spake thusly:
> With UTF-8 got issue with wrong character returned by Qemu when using EFI
> BIOS. This breaks the test process with the following error. Update to
> ISO-8859-1 encoding to avoid it.

I am always skeptical about switching away from UTF-8. Rather, I'd like
to undersand why that is the case.

Is it specific to just the edk2 tests you added later in the series?
What if you set your locale to 'C' (LC_ALL=C LANG=C) when runing the
test-suite? Or to 'C.UTF-8'?

>     emulator.login()
>   File "/home/kmaincent/Documents/projects/ariane-groupe/buildroot/support/testing/infra/emulator.py", line 89, in login
>     index = self.qemu.expect(["buildroot login:", pexpect.TIMEOUT],
>   File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 340, in expect
>     return self.expect_list(compiled_pattern_list,
>   File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 369, in expect_list
>     return exp.expect_loop(timeout)
>   File "/usr/lib/python3/dist-packages/pexpect/expect.py", line 111, in expect_loop
>     incoming = spawn.read_nonblocking(spawn.maxread, timeout)
>   File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 485, in read_nonblocking
>     return super(spawn, self).read_nonblocking(size)
>   File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 178, in read_nonblocking
>     s = self._decoder.decode(s, final=False)
>   File "/usr/lib/python3.8/codecs.py", line 322, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte

0xda in iso8859-15 is Ú (LATIN CAPITAL LETTER U WITH ACUTE). This is a
weird character to see... And his is the first byte... Is edk2 just
spitting actual binary?

So I'd really like to understand why that is... 

Regards,
Yann E. MORIN.

> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
> ---
>  support/testing/infra/emulator.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/support/testing/infra/emulator.py b/support/testing/infra/emulator.py
> index 0a77eb80fc..1fab9caad8 100644
> --- a/support/testing/infra/emulator.py
> +++ b/support/testing/infra/emulator.py
> @@ -76,7 +76,7 @@ class Emulator(object):
>          self.logfile.write("> starting qemu with '%s'\n" % " ".join(qemu_cmd))
>          self.qemu = pexpect.spawn(qemu_cmd[0], qemu_cmd[1:],
>                                    timeout=5 * self.timeout_multiplier,
> -                                  encoding='utf-8',
> +                                  encoding='ISO-8859-1',
>                                    env={"QEMU_AUDIO_DRV": "none"})
>          # We want only stdout into the log to avoid double echo
>          self.qemu.logfile_read = self.logfile
> -- 
> 2.25.1
> 
> _______________________________________________
> buildroot mailing list
> buildroot@lists.buildroot.org
> https://lists.buildroot.org/mailman/listinfo/buildroot
Yann E. MORIN Oct. 2, 2021, 8:28 p.m. UTC | #2
Köry, All,

On 2021-09-30 22:28 +0200, Yann E. MORIN spake thusly:
> On 2021-09-23 17:57 +0200, Kory Maincent spake thusly:
> > With UTF-8 got issue with wrong character returned by Qemu when using EFI
> > BIOS. This breaks the test process with the following error. Update to
> > ISO-8859-1 encoding to avoid it.
> I am always skeptical about switching away from UTF-8. Rather, I'd like
> to undersand why that is the case.
[--SNIP--]
> 0xda in iso8859-15 is Ú (LATIN CAPITAL LETTER U WITH ACUTE). This is a
> weird character to see... And his is the first byte... Is edk2 just
> spitting actual binary?

It's not the first byte; and it's not edk2, but grub2:

    00000150  6d 1b 5b 34 30 6d 1b 5b  30 32 3b 33 30 48 47 4e  |m.[40m.[02;30HGN|
    00000160  55 20 47 52 55 42 20 20  76 65 72 73 69 6f 6e 20  |U GRUB  version |
    00000170  32 2e 30 34 0a 0d 0a 0d  1b 5b 30 34 3b 30 32 48  |2.04.....[04;02H|
    00000180  da c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 c4 c4 c4  |................|
    00000190  c4 c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 c4 c4 c4  |................|
    000001a0  c4 c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 c4 c4 c4  |................|
    000001b0  c4 c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 c4 c4 c4  |................|
    000001c0  c4 c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 bf 1b 5b  |...............[|
    000001d0  30 35 3b 30 32 48 b3 1b  5b 30 35 3b 37 39 48 b3  |05;02H..[05;79H.|

So, what is this "da c4...c4 bf" sequence? It looks a bit more obvious
when looking at the folowing lines:

    000001d0  30 35 3b 30 32 48 b3 1b  5b 30 35 3b 37 39 48 b3  |05;02H..[05;79H.|
    000001e0  1b 5b 30 36 3b 30 32 48  b3 1b 5b 30 36 3b 37 39  |.[06;02H..[06;79|
    000001f0  48 b3 1b 5b 30 37 3b 30  32 48 b3 1b 5b 30 37 3b  |H..[07;02H..[07;|
    00000200  37 39 48 b3 1b 5b 30 38  3b 30 32 48 b3 1b 5b 30  |79H..[08;02H..[0|
    00000210  38 3b 37 39 48 b3 1b 5b  30 39 3b 30 32 48 b3 1b  |8;79H..[09;02H..|
    00000220  5b 30 39 3b 37 39 48 b3  1b 5b 31 30 3b 30 32 48  |[09;79H..[10;02H|
    00000230  b3 1b 5b 31 30 3b 37 39  48 b3 1b 5b 31 31 3b 30  |..[10;79H..[11;0|
    00000240  32 48 b3 1b 5b 31 31 3b  37 39 48 b3 1b 5b 31 32  |2H..[11;79H..[12|
    00000250  3b 30 32 48 b3 1b 5b 31  32 3b 37 39 48 b3 1b 5b  |;02H..[12;79H..[|
    00000260  31 33 3b 30 32 48 b3 1b  5b 31 33 3b 37 39 48 b3  |13;02H..[13;79H.|
    00000270  1b 5b 31 34 3b 30 32 48  b3 1b 5b 31 34 3b 37 39  |.[14;02H..[14;79|
    00000280  48 b3 1b 5b 31 35 3b 30  32 48 b3 1b 5b 31 35 3b  |H..[15;02H..[15;|
    00000290  37 39 48 b3 1b 5b 31 36  3b 30 32 48 b3 1b 5b 31  |79H..[16;02H..[1|
    000002a0  36 3b 37 39 48 b3 1b 5b  31 37 3b 30 32 48 b3 1b  |6;79H..[17;02H..|
    000002b0  5b 31 37 3b 37 39 48 b3  1b 5b 31 38 3b 30 32 48  |[17;79H..[18;02H|
    000002c0  c0 c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 c4 c4 c4  |................|
    000002d0  c4 c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 c4 c4 c4  |................|
    000002e0  c4 c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 c4 c4 c4  |................|
    000002f0  c4 c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 c4 c4 c4  |................|
    00000300  c4 c4 c4 c4 c4 c4 c4 c4  c4 c4 c4 c4 c4 d9 1b 5b  |...............[|
    00000310  31 39 3b 30 32 48 1b 5b  32 30 3b 30 32 48 20 20  |19;02H.[20;02H  |

OK, so there are a lot of ANSI Escape Sequences, which are CSI commands:

    1b 5b 30 35 3b 37 39 48     - ESC [ 05 ; 79 H

This is CSI for "Cursor Position", i.e. move sursor to row 5, column 79.
OK, it is drawing a box! da is top-left corner, c4 is horizontal line,
bf is top-right corner, b3 is vertical line, c0 id lower-left corner,
and d9 is lower-right corner.

This is definitely not ISO-8859-15; it is CP437 [0]:
    https://en.wikipedia.org/wiki/Code_page_437

So, the solution to switch over to iso-8859-15 is not the proper
solution. We don't want to switch to CP437 either!

What I suggest, instead, is that we tell pexpect.spawn to just fix any
non-decodable character, by passing the parameter:
    codec_errors='replace'

At least, it makes the two EFI run-test succeed!

If that is OK, I can do that when applying, and fix the commit log
accordingly.

Regards,
Yann E. MORIN.

[0] Oh boy, does that bring fond memories... ;-)
Thomas Petazzoni Oct. 3, 2021, 9:09 a.m. UTC | #3
On Sat, 2 Oct 2021 22:28:05 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> OK, so there are a lot of ANSI Escape Sequences, which are CSI commands:
> 
>     1b 5b 30 35 3b 37 39 48     - ESC [ 05 ; 79 H
> 
> This is CSI for "Cursor Position", i.e. move sursor to row 5, column 79.
> OK, it is drawing a box! da is top-left corner, c4 is horizontal line,
> bf is top-right corner, b3 is vertical line, c0 id lower-left corner,
> and d9 is lower-right corner.
> 
> This is definitely not ISO-8859-15; it is CP437 [0]:
>     https://en.wikipedia.org/wiki/Code_page_437
> 
> So, the solution to switch over to iso-8859-15 is not the proper
> solution. We don't want to switch to CP437 either!

Wow, thanks for the investigation!

> What I suggest, instead, is that we tell pexpect.spawn to just fix any
> non-decodable character, by passing the parameter:
>     codec_errors='replace'
> 
> At least, it makes the two EFI run-test succeed!
> 
> If that is OK, I can do that when applying, and fix the commit log
> accordingly.

This indeed looks like a better option.

Thanks again!

Thomas
Yann E. MORIN Oct. 3, 2021, 12:47 p.m. UTC | #4
Köry, All,

On 2021-09-23 17:57 +0200, Kory Maincent spake thusly:
> With UTF-8 got issue with wrong character returned by Qemu when using EFI
> BIOS. This breaks the test process with the following error. Update to
> ISO-8859-1 encoding to avoid it.
> 
>     emulator.login()
>   File "/home/kmaincent/Documents/projects/ariane-groupe/buildroot/support/testing/infra/emulator.py", line 89, in login
>     index = self.qemu.expect(["buildroot login:", pexpect.TIMEOUT],
>   File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 340, in expect
>     return self.expect_list(compiled_pattern_list,
>   File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 369, in expect_list
>     return exp.expect_loop(timeout)
>   File "/usr/lib/python3/dist-packages/pexpect/expect.py", line 111, in expect_loop
>     incoming = spawn.read_nonblocking(spawn.maxread, timeout)
>   File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 485, in read_nonblocking
>     return super(spawn, self).read_nonblocking(size)
>   File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 178, in read_nonblocking
>     s = self._decoder.decode(s, final=False)
>   File "/usr/lib/python3.8/codecs.py", line 322, in decode
>     (result, consumed) = self._buffer_decode(data, self.errors, final)
> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte
> 
> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
> ---
>  support/testing/infra/emulator.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/support/testing/infra/emulator.py b/support/testing/infra/emulator.py
> index 0a77eb80fc..1fab9caad8 100644
> --- a/support/testing/infra/emulator.py
> +++ b/support/testing/infra/emulator.py
> @@ -76,7 +76,7 @@ class Emulator(object):
>          self.logfile.write("> starting qemu with '%s'\n" % " ".join(qemu_cmd))
>          self.qemu = pexpect.spawn(qemu_cmd[0], qemu_cmd[1:],
>                                    timeout=5 * self.timeout_multiplier,
> -                                  encoding='utf-8',
> +                                  encoding='ISO-8859-1',

After review from Thomas, I've indeed reverted to using utf-8, but using
codecerrors='replace' to fix invalid utf-8 sequences.

I've also reworded the commit log accordingly.

Applied to master, thanks.

Regards,
Yann E. MORIN.

>                                    env={"QEMU_AUDIO_DRV": "none"})
>          # We want only stdout into the log to avoid double echo
>          self.qemu.logfile_read = self.logfile
> -- 
> 2.25.1
> 
> _______________________________________________
> buildroot mailing list
> buildroot@lists.buildroot.org
> https://lists.buildroot.org/mailman/listinfo/buildroot
Köry Maincent Oct. 4, 2021, 7:47 a.m. UTC | #5
Hello Yann,

On Sun, 3 Oct 2021 14:47:19 +0200
"Yann E. MORIN" <yann.morin.1998@free.fr> wrote:

> After review from Thomas, I've indeed reverted to using utf-8, but using
> codecerrors='replace' to fix invalid utf-8 sequences.
> 
> I've also reworded the commit log accordingly.
> 
> Applied to master, thanks.

Thanks for this investigation on the utf8 character issue and for completing the
merging process :)

Regards,
Köry
Peter Korsgaard Oct. 6, 2021, 2:59 p.m. UTC | #6
>>>>> "Yann" == Yann E MORIN <yann.morin.1998@free.fr> writes:

 > Köry, All,
 > On 2021-09-23 17:57 +0200, Kory Maincent spake thusly:
 >> With UTF-8 got issue with wrong character returned by Qemu when using EFI
 >> BIOS. This breaks the test process with the following error. Update to
 >> ISO-8859-1 encoding to avoid it.
 >> 
 >> emulator.login()
 >> File "/home/kmaincent/Documents/projects/ariane-groupe/buildroot/support/testing/infra/emulator.py", line 89, in login
 >> index = self.qemu.expect(["buildroot login:", pexpect.TIMEOUT],
 >> File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 340, in expect
 >> return self.expect_list(compiled_pattern_list,
 >> File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 369, in expect_list
 >> return exp.expect_loop(timeout)
 >> File "/usr/lib/python3/dist-packages/pexpect/expect.py", line 111, in expect_loop
 >> incoming = spawn.read_nonblocking(spawn.maxread, timeout)
 >> File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 485, in read_nonblocking
 >> return super(spawn, self).read_nonblocking(size)
 >> File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 178, in read_nonblocking
 >> s = self._decoder.decode(s, final=False)
 >> File "/usr/lib/python3.8/codecs.py", line 322, in decode
 >> (result, consumed) = self._buffer_decode(data, self.errors, final)
 >> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte
 >> 
 >> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com>
 >> ---
 >> support/testing/infra/emulator.py | 2 +-
 >> 1 file changed, 1 insertion(+), 1 deletion(-)
 >> 
 >> diff --git a/support/testing/infra/emulator.py b/support/testing/infra/emulator.py
 >> index 0a77eb80fc..1fab9caad8 100644
 >> --- a/support/testing/infra/emulator.py
 >> +++ b/support/testing/infra/emulator.py
 >> @@ -76,7 +76,7 @@ class Emulator(object):
 >> self.logfile.write("> starting qemu with '%s'\n" % " ".join(qemu_cmd))
 >> self.qemu = pexpect.spawn(qemu_cmd[0], qemu_cmd[1:],
 >> timeout=5 * self.timeout_multiplier,
 >> -                                  encoding='utf-8',
 >> +                                  encoding='ISO-8859-1',

 > After review from Thomas, I've indeed reverted to using utf-8, but using
 > codecerrors='replace' to fix invalid utf-8 sequences.

 > I've also reworded the commit log accordingly.

 > Applied to master, thanks.

Committed to 2021.02.x, 2021.05.x and 2021.08.x, thanks.
diff mbox series

Patch

diff --git a/support/testing/infra/emulator.py b/support/testing/infra/emulator.py
index 0a77eb80fc..1fab9caad8 100644
--- a/support/testing/infra/emulator.py
+++ b/support/testing/infra/emulator.py
@@ -76,7 +76,7 @@  class Emulator(object):
         self.logfile.write("> starting qemu with '%s'\n" % " ".join(qemu_cmd))
         self.qemu = pexpect.spawn(qemu_cmd[0], qemu_cmd[1:],
                                   timeout=5 * self.timeout_multiplier,
-                                  encoding='utf-8',
+                                  encoding='ISO-8859-1',
                                   env={"QEMU_AUDIO_DRV": "none"})
         # We want only stdout into the log to avoid double echo
         self.qemu.logfile_read = self.logfile