[PULL,7/8] tests/acceptance: Add test of NeXTcube framebuffer using OCR

Message ID 20190907154744.4136-8-huth@tuxfamily.org
State New
Series [PULL,1/8] m68k: Add NeXTcube framebuffer device emulation

Commit Message

Thomas Huth Sept. 7, 2019, 3:47 p.m. UTC
From: Philippe Mathieu-Daudé <f4bug@amsat.org>

Add a test of the NeXTcube framebuffer using the Tesseract OCR
engine on a screenshot of the framebuffer device.

The test is very quick:

  $ avocado --show=app,console run tests/acceptance/machine_m68k_nextcube.py
  JOB ID     : 78844a92424cc495bd068c3874d542d1e20f24bc
  JOB LOG    : /home/phil/avocado/job-results/job-2019-08-13T13.16-78844a9/job.log
   (1/3) tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_size: PASS (2.16 s)
   (2/3) tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_ocr_with_tesseract_v3: -
  ue r pun Honl'flx ; 5‘ 55‘
  avg ncaaaaa 25 MHZ, memary jag m
  Backplane slat «a
  Ethernet address a a r a r3 2
  Memgry sackets aea canflqured far 16MB Darlly page made stMs but have 16MB page made stMs )nstalled
  Memgry sackets a and 1 canflqured far 16MB Darlly page made stMs but have 16MB page made stMs )nstalled
  [...]
  Yestlnq the rpu, 5::
  system test raneg Errar egge 51
  Egg: cammand
  Default pggc devlce nut fauna
  NEXY>I
  PASS (2.64 s)
   (3/3) tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_ocr_with_tesseract_v4: SKIP: tesseract v4 OCR tool not available
  RESULTS    : PASS 2 | ERROR 0 | FAIL 0 | SKIP 1 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 5.35 s

Documentation on how to install tesseract:
  https://github.com/tesseract-ocr/tesseract/wiki#installation

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20190813134921.30602-2-philmd@redhat.com>
Signed-off-by: Thomas Huth <huth@tuxfamily.org>
---
 tests/acceptance/machine_m68k_nextcube.py | 121 ++++++++++++++++++++++
 1 file changed, 121 insertions(+)
 create mode 100644 tests/acceptance/machine_m68k_nextcube.py

Comments

Peter Maydell Sept. 10, 2019, 12:02 p.m. UTC | #1
On Sat, 7 Sep 2019 at 16:47, Thomas Huth <huth@tuxfamily.org> wrote:
>
> From: Philippe Mathieu-Daudé <f4bug@amsat.org>
>
> Add a test of the NeXTcube framebuffer using the Tesseract OCR
> engine on a screenshot of the framebuffer device.
>
> The test is very quick:
>
>   $ avocado --show=app,console run tests/acceptance/machine_m68k_nextcube.py
>   JOB ID     : 78844a92424cc495bd068c3874d542d1e20f24bc
>   JOB LOG    : /home/phil/avocado/job-results/job-2019-08-13T13.16-78844a9/job.log
>    (1/3) tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_size: PASS (2.16 s)
>    (2/3) tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_ocr_with_tesseract_v3: -
>   ue r pun Honl'flx ; 5‘ 55‘
>   avg ncaaaaa 25 MHZ, memary jag m
>   Backplane slat «a
>   Ethernet address a a r a r3 2
>   Memgry sackets aea canflqured far 16MB Darlly page made stMs but have 16MB page made stMs )nstalled

By the way, do we know why the output from this test case is
garbled like this ? It suggests that something's not right
somewhere...

thanks
-- PMM
Thomas Huth Sept. 10, 2019, 12:07 p.m. UTC | #2
On 10/09/2019 14.02, Peter Maydell wrote:
> On Sat, 7 Sep 2019 at 16:47, Thomas Huth <huth@tuxfamily.org> wrote:
>>
>> From: Philippe Mathieu-Daudé <f4bug@amsat.org>
>>
>> Add a test of the NeXTcube framebuffer using the Tesseract OCR
>> engine on a screenshot of the framebuffer device.
>>
>> The test is very quick:
>>
>>   $ avocado --show=app,console run tests/acceptance/machine_m68k_nextcube.py
>>   JOB ID     : 78844a92424cc495bd068c3874d542d1e20f24bc
>>   JOB LOG    : /home/phil/avocado/job-results/job-2019-08-13T13.16-78844a9/job.log
>>    (1/3) tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_size: PASS (2.16 s)
>>    (2/3) tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_ocr_with_tesseract_v3: -
>>   ue r pun Honl'flx ; 5‘ 55‘
>>   avg ncaaaaa 25 MHZ, memary jag m
>>   Backplane slat «a
>>   Ethernet address a a r a r3 2
>>   Memgry sackets aea canflqured far 16MB Darlly page made stMs but have 16MB page made stMs )nstalled
> 
> By the way, do we know why the output from this test case is
> garbled like this ? It suggests that something's not right
> somewhere...

The text is created from the framebuffer with the OCR-tool Tesseract -
which is just not good enough to recognize all words properly here.

 Thomas
Philippe Mathieu-Daudé Sept. 10, 2019, 12:58 p.m. UTC | #3
On 9/10/19 2:07 PM, Thomas Huth wrote:
> On 10/09/2019 14.02, Peter Maydell wrote:
>> On Sat, 7 Sep 2019 at 16:47, Thomas Huth <huth@tuxfamily.org> wrote:
>>>
>>> From: Philippe Mathieu-Daudé <f4bug@amsat.org>
>>>
>>> Add a test of the NeXTcube framebuffer using the Tesseract OCR
>>> engine on a screenshot of the framebuffer device.
>>>
>>> The test is very quick:
>>>
>>>   $ avocado --show=app,console run tests/acceptance/machine_m68k_nextcube.py
>>>   JOB ID     : 78844a92424cc495bd068c3874d542d1e20f24bc
>>>   JOB LOG    : /home/phil/avocado/job-results/job-2019-08-13T13.16-78844a9/job.log
>>>    (1/3) tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_size: PASS (2.16 s)
>>>    (2/3) tests/acceptance/machine_m68k_nextcube.py:NextCubeMachine.test_bootrom_framebuffer_ocr_with_tesseract_v3: -
>>>   ue r pun Honl'flx ; 5‘ 55‘
>>>   avg ncaaaaa 25 MHZ, memary jag m
>>>   Backplane slat «a
>>>   Ethernet address a a r a r3 2
>>>   Memgry sackets aea canflqured far 16MB Darlly page made stMs but have 16MB page made stMs )nstalled
>>
>> By the way, do we know why the output from this test case is
>> garbled like this ? It suggests that something's not right
>> somewhere...

I got better results by tuning a few options, but later noticed that
they differ between Fedora and Ubuntu.
Tesseract v4 gives better results, but it is alpha and requires
installing trained data. It is not that big, 15MiB:
https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata
I preferred to keep the simplest tests with acceptable results: we are
not interested in fully understandable text output, we only want to know
that the framebuffer model works. Reading "Ethernet address" is good and
quick enough.
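
To illustrate that trade-off, here is a minimal sketch, not code from the patch. The sample lines are copied from the v3 OCR output in the commit message, and `missing_keywords` is a hypothetical helper that reports which expected strings are absent from the OCR text.

```python
# Illustrative only: `missing_keywords` is a hypothetical helper, not part
# of the patch. The sample text is copied from the v3 OCR output above.
OCR_TEXT = """\
avg ncaaaaa 25 MHZ, memary jag m
Backplane slat «a
Ethernet address a a r a r3 2
"""


def missing_keywords(text, keywords):
    """Return the keywords that do not appear verbatim in the OCR text."""
    return [kw for kw in keywords if kw not in text]


# The strings the v3 test asserts on survive the OCR noise...
print(missing_keywords(OCR_TEXT, ['Backplane', 'Ethernet address']))
# ...while a stricter expectation such as 'memory' would not.
print(missing_keywords(OCR_TEXT, ['memory']))
```

The first call prints an empty list and the second prints `['memory']`, which is why the v3 test only asserts on the few strings that come through intact.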

> The text is created from the framebuffer with the OCR-tool Tesseract -
> which is just not good enough to recognize all words properly here.
> 
>  Thomas
>

Patch

diff --git a/tests/acceptance/machine_m68k_nextcube.py b/tests/acceptance/machine_m68k_nextcube.py
new file mode 100644
index 0000000000..e09cab9f20
--- /dev/null
+++ b/tests/acceptance/machine_m68k_nextcube.py
@@ -0,0 +1,121 @@ 
+# Functional test that boots a VM and runs OCR on the framebuffer
+#
+# Copyright (c) Philippe Mathieu-Daudé <f4bug@amsat.org>
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+import os
+import re
+import time
+import logging
+import distutils.spawn
+
+from avocado_qemu import Test
+from avocado import skipUnless
+from avocado.utils import process
+from avocado.utils.path import find_command, CmdNotFoundError
+
+PIL_AVAILABLE = True
+try:
+    from PIL import Image
+except ImportError:
+    PIL_AVAILABLE = False
+
+
+def tesseract_available(expected_version):
+    try:
+        find_command('tesseract')
+    except CmdNotFoundError:
+        return False
+    res = process.run('tesseract --version')
+    # Depending on the release, tesseract prints its version banner on
+    # stdout or on stderr, so search both streams.
+    output = res.stdout_text + '\n' + res.stderr_text
+
+    # The banner starts with e.g. 'tesseract 3.05.02'; only the major
+    # version number matters here.
+    match = re.search(r'tesseract\s+(\d+)', output)
+    if match is None:
+        return False
+    # now this is guaranteed to be a digit string
+    return int(match.group(1)) == expected_version
+
+
+class NextCubeMachine(Test):
+
+    timeout = 15
+
+    def check_bootrom_framebuffer(self, screenshot_path):
+        rom_url = ('http://www.nextcomputers.org/NeXTfiles/Software/ROM_Files/'
+                   '68040_Non-Turbo_Chipset/Rev_2.5_v66.BIN')
+        rom_hash = 'b3534796abae238a0111299fc406a9349f7fee24'
+        rom_path = self.fetch_asset(rom_url, asset_hash=rom_hash)
+
+        self.vm.set_machine('next-cube')
+        self.vm.add_args('-bios', rom_path)
+        self.vm.launch()
+
+        self.log.info('VM launched, waiting for display')
+        # TODO: Use avocado.utils.wait.wait_for to catch the
+        #       'displaysurface_create 1120x832' trace-event.
+        time.sleep(2)
+
+        self.vm.command('human-monitor-command',
+                        command_line='screendump %s' % screenshot_path)
+
+    @skipUnless(PIL_AVAILABLE, 'Python PIL not installed')
+    def test_bootrom_framebuffer_size(self):
+        """
+        :avocado: tags=arch:m68k
+        :avocado: tags=machine:next_cube
+        :avocado: tags=device:framebuffer
+        """
+        screenshot_path = os.path.join(self.workdir, "dump.png")
+        self.check_bootrom_framebuffer(screenshot_path)
+
+        width, height = Image.open(screenshot_path).size
+        self.assertEqual(width, 1120)
+        self.assertEqual(height, 832)
+
+    @skipUnless(tesseract_available(3), 'tesseract v3 OCR tool not available')
+    def test_bootrom_framebuffer_ocr_with_tesseract_v3(self):
+        """
+        :avocado: tags=arch:m68k
+        :avocado: tags=machine:next_cube
+        :avocado: tags=device:framebuffer
+        """
+        screenshot_path = os.path.join(self.workdir, "dump.png")
+        self.check_bootrom_framebuffer(screenshot_path)
+
+        console_logger = logging.getLogger('console')
+        text = process.run("tesseract %s stdout" % screenshot_path).stdout_text
+        for line in text.split('\n'):
+            if len(line):
+                console_logger.debug(line)
+        self.assertIn('Backplane', text)
+        self.assertIn('Ethernet address', text)
+
+    # Tesseract 4 adds a new OCR engine based on LSTM neural networks. The
+    # new version is faster and more accurate than version 3. The drawback is
+    # that it is still alpha-level software.
+    @skipUnless(tesseract_available(4), 'tesseract v4 OCR tool not available')
+    def test_bootrom_framebuffer_ocr_with_tesseract_v4(self):
+        """
+        :avocado: tags=arch:m68k
+        :avocado: tags=machine:next_cube
+        :avocado: tags=device:framebuffer
+        """
+        screenshot_path = os.path.join(self.workdir, "dump.png")
+        self.check_bootrom_framebuffer(screenshot_path)
+
+        console_logger = logging.getLogger('console')
+        proc = process.run("tesseract --oem 1 %s stdout" % screenshot_path)
+        text = proc.stdout_text
+        for line in text.split('\n'):
+            if len(line):
+                console_logger.debug(line)
+        self.assertIn('Testing the FPU, SCC', text)
+        self.assertIn('System test failed. Error code 51', text)
+        self.assertIn('Boot command', text)
+        self.assertIn('Next>', text)
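
The TODO in check_bootrom_framebuffer() suggests replacing the fixed time.sleep(2) with a wait on the 'displaysurface_create 1120x832' trace-event via avocado.utils.wait.wait_for. The following is only a rough sketch of that idea using the standard library. The local `wait_for` is a stand-in written for this example, not the avocado function, and `display_created` with its `trace_log_path` argument is hypothetical: it assumes the trace events end up in a text file, which the patch does not set up.

```python
import time


def wait_for(func, timeout, step=0.1):
    """Call func() every `step` seconds until it returns a truthy value
    or `timeout` seconds have elapsed. Return that value, else None."""
    deadline = time.monotonic() + timeout
    while True:
        result = func()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(step)


def display_created(trace_log_path, marker='displaysurface_create 1120x832'):
    """Return True once the (hypothetical) trace log contains the marker."""
    try:
        with open(trace_log_path) as log:
            return marker in log.read()
    except FileNotFoundError:
        # The log may not exist yet while the VM is still starting up.
        return False
```

A test could then call `wait_for(lambda: display_created(path), timeout=10)` and fail if the result is None. That way the test would stop waiting as soon as the display surface exists rather than always sleeping for two seconds.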