Message ID | 20210923155726.87851-6-kory.maincent@bootlin.com |
---|---|
State | Accepted |
Headers | show |
Series | Add support for ISO9660 image compatible with Legacy and EFI BIOS | expand |
Köry, All, +Arnout, +Petr for their python3 expertise On 2021-09-23 17:57 +0200, Kory Maincent spake thusly: > With UTF-8 got issue with wrong character returned by Qemu when using EFI > BIOS. This breaks the test process with the following error. Update to > ISO-8859-1 encoding to avoid it. I am always skeptical about switching away from UTF-8. Rather, I'd like to undersand why that is the case. Is it specific to just the edk2 tests you added later in the series? What if you set your locale to 'C' (LC_ALL=C LANG=C) when runing the test-suite? Or to 'C.UTF-8'? > emulator.login() > File "/home/kmaincent/Documents/projects/ariane-groupe/buildroot/support/testing/infra/emulator.py", line 89, in login > index = self.qemu.expect(["buildroot login:", pexpect.TIMEOUT], > File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 340, in expect > return self.expect_list(compiled_pattern_list, > File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 369, in expect_list > return exp.expect_loop(timeout) > File "/usr/lib/python3/dist-packages/pexpect/expect.py", line 111, in expect_loop > incoming = spawn.read_nonblocking(spawn.maxread, timeout) > File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 485, in read_nonblocking > return super(spawn, self).read_nonblocking(size) > File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 178, in read_nonblocking > s = self._decoder.decode(s, final=False) > File "/usr/lib/python3.8/codecs.py", line 322, in decode > (result, consumed) = self._buffer_decode(data, self.errors, final) > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte 0xda in iso8859-15 is Ú (LATIN CAPITAL LETTER U WITH ACUTE). This is a weird character to see... And his is the first byte... Is edk2 just spitting actual binary? So I'd really like to understand why that is... Regards, Yann E. MORIN. > Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> > --- > support/testing/infra/emulator.py | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/support/testing/infra/emulator.py b/support/testing/infra/emulator.py > index 0a77eb80fc..1fab9caad8 100644 > --- a/support/testing/infra/emulator.py > +++ b/support/testing/infra/emulator.py > @@ -76,7 +76,7 @@ class Emulator(object): > self.logfile.write("> starting qemu with '%s'\n" % " ".join(qemu_cmd)) > self.qemu = pexpect.spawn(qemu_cmd[0], qemu_cmd[1:], > timeout=5 * self.timeout_multiplier, > - encoding='utf-8', > + encoding='ISO-8859-1', > env={"QEMU_AUDIO_DRV": "none"}) > # We want only stdout into the log to avoid double echo > self.qemu.logfile_read = self.logfile > -- > 2.25.1 > > _______________________________________________ > buildroot mailing list > buildroot@lists.buildroot.org > https://lists.buildroot.org/mailman/listinfo/buildroot
Köry, All, On 2021-09-30 22:28 +0200, Yann E. MORIN spake thusly: > On 2021-09-23 17:57 +0200, Kory Maincent spake thusly: > > With UTF-8 got issue with wrong character returned by Qemu when using EFI > > BIOS. This breaks the test process with the following error. Update to > > ISO-8859-1 encoding to avoid it. > I am always skeptical about switching away from UTF-8. Rather, I'd like > to undersand why that is the case. [--SNIP--] > 0xda in iso8859-15 is Ú (LATIN CAPITAL LETTER U WITH ACUTE). This is a > weird character to see... And his is the first byte... Is edk2 just > spitting actual binary? It's not the first byte; and it's not edk2, but grub2: 00000150 6d 1b 5b 34 30 6d 1b 5b 30 32 3b 33 30 48 47 4e |m.[40m.[02;30HGN| 00000160 55 20 47 52 55 42 20 20 76 65 72 73 69 6f 6e 20 |U GRUB version | 00000170 32 2e 30 34 0a 0d 0a 0d 1b 5b 30 34 3b 30 32 48 |2.04.....[04;02H| 00000180 da c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 |................| 00000190 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 |................| 000001a0 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 |................| 000001b0 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 |................| 000001c0 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 bf 1b 5b |...............[| 000001d0 30 35 3b 30 32 48 b3 1b 5b 30 35 3b 37 39 48 b3 |05;02H..[05;79H.| So, what is this "da c4...c4 bf" sequence? It looks a bit more obvious when looking at the folowing lines: 000001d0 30 35 3b 30 32 48 b3 1b 5b 30 35 3b 37 39 48 b3 |05;02H..[05;79H.| 000001e0 1b 5b 30 36 3b 30 32 48 b3 1b 5b 30 36 3b 37 39 |.[06;02H..[06;79| 000001f0 48 b3 1b 5b 30 37 3b 30 32 48 b3 1b 5b 30 37 3b |H..[07;02H..[07;| 00000200 37 39 48 b3 1b 5b 30 38 3b 30 32 48 b3 1b 5b 30 |79H..[08;02H..[0| 00000210 38 3b 37 39 48 b3 1b 5b 30 39 3b 30 32 48 b3 1b |8;79H..[09;02H..| 00000220 5b 30 39 3b 37 39 48 b3 1b 5b 31 30 3b 30 32 48 |[09;79H..[10;02H| 00000230 b3 1b 5b 31 30 3b 37 39 48 b3 1b 5b 31 31 3b 30 |..[10;79H..[11;0| 00000240 32 48 b3 1b 5b 31 31 3b 37 39 48 b3 1b 5b 31 32 |2H..[11;79H..[12| 00000250 3b 30 32 48 b3 1b 5b 31 32 3b 37 39 48 b3 1b 5b |;02H..[12;79H..[| 00000260 31 33 3b 30 32 48 b3 1b 5b 31 33 3b 37 39 48 b3 |13;02H..[13;79H.| 00000270 1b 5b 31 34 3b 30 32 48 b3 1b 5b 31 34 3b 37 39 |.[14;02H..[14;79| 00000280 48 b3 1b 5b 31 35 3b 30 32 48 b3 1b 5b 31 35 3b |H..[15;02H..[15;| 00000290 37 39 48 b3 1b 5b 31 36 3b 30 32 48 b3 1b 5b 31 |79H..[16;02H..[1| 000002a0 36 3b 37 39 48 b3 1b 5b 31 37 3b 30 32 48 b3 1b |6;79H..[17;02H..| 000002b0 5b 31 37 3b 37 39 48 b3 1b 5b 31 38 3b 30 32 48 |[17;79H..[18;02H| 000002c0 c0 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 |................| 000002d0 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 |................| 000002e0 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 |................| 000002f0 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 |................| 00000300 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 c4 d9 1b 5b |...............[| 00000310 31 39 3b 30 32 48 1b 5b 32 30 3b 30 32 48 20 20 |19;02H.[20;02H | OK, so there are a lot of ANSI Escape Sequences, which are CSI commands: 1b 5b 30 35 3b 37 39 48 - ESC [ 05 ; 79 H This is CSI for "Cursor Position", i.e. move sursor to row 5, column 79. OK, it is drawing a box! da is top-left corner, c4 is horizontal line, bf is top-right corner, b3 is vertical line, c0 id lower-left corner, and d9 is lower-right corner. This is definitely not ISO-8859-15; it is CP437 [0]: https://en.wikipedia.org/wiki/Code_page_437 So, the solution to switch over to iso-8859-15 is not the proper solution. We don't want to switch to CP437 either! What I suggest, instead, is that we tell pexpect.spawn to just fix any non-decodable character, by passing the parameter: codec_errors='replace' At least, it makes the two EFI run-test succeed! If that is OK, I can do that when applying, and fix the commit log accordingly. Regards, Yann E. MORIN. [0] Oh boy, does that bring fond memories... ;-)
On Sat, 2 Oct 2021 22:28:05 +0200 "Yann E. MORIN" <yann.morin.1998@free.fr> wrote: > OK, so there are a lot of ANSI Escape Sequences, which are CSI commands: > > 1b 5b 30 35 3b 37 39 48 - ESC [ 05 ; 79 H > > This is CSI for "Cursor Position", i.e. move sursor to row 5, column 79. > OK, it is drawing a box! da is top-left corner, c4 is horizontal line, > bf is top-right corner, b3 is vertical line, c0 id lower-left corner, > and d9 is lower-right corner. > > This is definitely not ISO-8859-15; it is CP437 [0]: > https://en.wikipedia.org/wiki/Code_page_437 > > So, the solution to switch over to iso-8859-15 is not the proper > solution. We don't want to switch to CP437 either! Wow, thanks for the investigation! > What I suggest, instead, is that we tell pexpect.spawn to just fix any > non-decodable character, by passing the parameter: > codec_errors='replace' > > At least, it makes the two EFI run-test succeed! > > If that is OK, I can do that when applying, and fix the commit log > accordingly. This indeed looks like a better option. Thanks again! Thomas
Köry, All, On 2021-09-23 17:57 +0200, Kory Maincent spake thusly: > With UTF-8 got issue with wrong character returned by Qemu when using EFI > BIOS. This breaks the test process with the following error. Update to > ISO-8859-1 encoding to avoid it. > > emulator.login() > File "/home/kmaincent/Documents/projects/ariane-groupe/buildroot/support/testing/infra/emulator.py", line 89, in login > index = self.qemu.expect(["buildroot login:", pexpect.TIMEOUT], > File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 340, in expect > return self.expect_list(compiled_pattern_list, > File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 369, in expect_list > return exp.expect_loop(timeout) > File "/usr/lib/python3/dist-packages/pexpect/expect.py", line 111, in expect_loop > incoming = spawn.read_nonblocking(spawn.maxread, timeout) > File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 485, in read_nonblocking > return super(spawn, self).read_nonblocking(size) > File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 178, in read_nonblocking > s = self._decoder.decode(s, final=False) > File "/usr/lib/python3.8/codecs.py", line 322, in decode > (result, consumed) = self._buffer_decode(data, self.errors, final) > UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte > > Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> > --- > support/testing/infra/emulator.py | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/support/testing/infra/emulator.py b/support/testing/infra/emulator.py > index 0a77eb80fc..1fab9caad8 100644 > --- a/support/testing/infra/emulator.py > +++ b/support/testing/infra/emulator.py > @@ -76,7 +76,7 @@ class Emulator(object): > self.logfile.write("> starting qemu with '%s'\n" % " ".join(qemu_cmd)) > self.qemu = pexpect.spawn(qemu_cmd[0], qemu_cmd[1:], > timeout=5 * self.timeout_multiplier, > - encoding='utf-8', > + encoding='ISO-8859-1', After review from Thomas, I've indeed reverted to using utf-8, but using codecerrors='replace' to fix invalid utf-8 sequences. I've also reworded the commit log accordingly. Applied to master, thanks. Regards, Yann E. MORIN. > env={"QEMU_AUDIO_DRV": "none"}) > # We want only stdout into the log to avoid double echo > self.qemu.logfile_read = self.logfile > -- > 2.25.1 > > _______________________________________________ > buildroot mailing list > buildroot@lists.buildroot.org > https://lists.buildroot.org/mailman/listinfo/buildroot
Hello Yann, On Sun, 3 Oct 2021 14:47:19 +0200 "Yann E. MORIN" <yann.morin.1998@free.fr> wrote: > After review from Thomas, I've indeed reverted to using utf-8, but using > codecerrors='replace' to fix invalid utf-8 sequences. > > I've also reworded the commit log accordingly. > > Applied to master, thanks. Thanks for this investigation on the utf8 character issue and for completing the merging process :) Regards, Köry
>>>>> "Yann" == Yann E MORIN <yann.morin.1998@free.fr> writes: > Köry, All, > On 2021-09-23 17:57 +0200, Kory Maincent spake thusly: >> With UTF-8 got issue with wrong character returned by Qemu when using EFI >> BIOS. This breaks the test process with the following error. Update to >> ISO-8859-1 encoding to avoid it. >> >> emulator.login() >> File "/home/kmaincent/Documents/projects/ariane-groupe/buildroot/support/testing/infra/emulator.py", line 89, in login >> index = self.qemu.expect(["buildroot login:", pexpect.TIMEOUT], >> File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 340, in expect >> return self.expect_list(compiled_pattern_list, >> File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 369, in expect_list >> return exp.expect_loop(timeout) >> File "/usr/lib/python3/dist-packages/pexpect/expect.py", line 111, in expect_loop >> incoming = spawn.read_nonblocking(spawn.maxread, timeout) >> File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 485, in read_nonblocking >> return super(spawn, self).read_nonblocking(size) >> File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 178, in read_nonblocking >> s = self._decoder.decode(s, final=False) >> File "/usr/lib/python3.8/codecs.py", line 322, in decode >> (result, consumed) = self._buffer_decode(data, self.errors, final) >> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte >> >> Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> >> --- >> support/testing/infra/emulator.py | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/support/testing/infra/emulator.py b/support/testing/infra/emulator.py >> index 0a77eb80fc..1fab9caad8 100644 >> --- a/support/testing/infra/emulator.py >> +++ b/support/testing/infra/emulator.py >> @@ -76,7 +76,7 @@ class Emulator(object): >> self.logfile.write("> starting qemu with '%s'\n" % " ".join(qemu_cmd)) >> self.qemu = pexpect.spawn(qemu_cmd[0], qemu_cmd[1:], >> timeout=5 * self.timeout_multiplier, >> - encoding='utf-8', >> + encoding='ISO-8859-1', > After review from Thomas, I've indeed reverted to using utf-8, but using > codecerrors='replace' to fix invalid utf-8 sequences. > I've also reworded the commit log accordingly. > Applied to master, thanks. Committed to 2021.02.x, 2021.05.x and 2021.08.x, thanks.
diff --git a/support/testing/infra/emulator.py b/support/testing/infra/emulator.py index 0a77eb80fc..1fab9caad8 100644 --- a/support/testing/infra/emulator.py +++ b/support/testing/infra/emulator.py @@ -76,7 +76,7 @@ class Emulator(object): self.logfile.write("> starting qemu with '%s'\n" % " ".join(qemu_cmd)) self.qemu = pexpect.spawn(qemu_cmd[0], qemu_cmd[1:], timeout=5 * self.timeout_multiplier, - encoding='utf-8', + encoding='ISO-8859-1', env={"QEMU_AUDIO_DRV": "none"}) # We want only stdout into the log to avoid double echo self.qemu.logfile_read = self.logfile
With UTF-8 got issue with wrong character returned by Qemu when using EFI BIOS. This breaks the test process with the following error. Update to ISO-8859-1 encoding to avoid it. emulator.login() File "/home/kmaincent/Documents/projects/ariane-groupe/buildroot/support/testing/infra/emulator.py", line 89, in login index = self.qemu.expect(["buildroot login:", pexpect.TIMEOUT], File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 340, in expect return self.expect_list(compiled_pattern_list, File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 369, in expect_list return exp.expect_loop(timeout) File "/usr/lib/python3/dist-packages/pexpect/expect.py", line 111, in expect_loop incoming = spawn.read_nonblocking(spawn.maxread, timeout) File "/usr/lib/python3/dist-packages/pexpect/pty_spawn.py", line 485, in read_nonblocking return super(spawn, self).read_nonblocking(size) File "/usr/lib/python3/dist-packages/pexpect/spawnbase.py", line 178, in read_nonblocking s = self._decoder.decode(s, final=False) File "/usr/lib/python3.8/codecs.py", line 322, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xda in position 0: invalid continuation byte Signed-off-by: Kory Maincent <kory.maincent@bootlin.com> --- support/testing/infra/emulator.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)