Acceptance Test BootLinuxAarch64.test_virt_tcg execution times

Message ID 20200806193553.GA1463846@localhost.localdomain

Commit Message

Cleber Rosa Aug. 6, 2020, 7:35 p.m. UTC
TL;DR: This is a followup from an IRC chat about the
tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg test
taking many orders of magnitude longer than other acceptance (and even
similar boot) tests.  I could not find an easy way for this specific
test (aarch64+tcg) to have a significant execution time improvement.
The best solution may be to filter out tests that are known to be
slow, and create a specific "test job" that includes them.
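
To make the filtering idea concrete, here is a simplified model of
tag-based selection.  It is not Avocado's implementation: the "slow"
tag, the second test name and the select() helper are all hypothetical,
and only the arch/accel tags appear in the commands further down.

```python
def select(tests, include=(), exclude=()):
    """Return names of tests carrying every include tag and no exclude tag."""
    include, exclude = set(include), set(exclude)
    return [name for name, tags in tests.items()
            if include <= tags and not (exclude & tags)]


# Hypothetical tag assignments, for illustration only.
TESTS = {
    "BootLinuxAarch64.test_virt_tcg": {"arch:aarch64", "accel:tcg", "slow"},
    "ExampleFast.test_boot": {"arch:x86_64", "accel:tcg"},
}

if __name__ == "__main__":
    print("default job:", select(TESTS, exclude=["slow"]))
    print("slow job:   ", select(TESTS, include=["slow"]))
```

The default job would then exclude anything tagged as slow, while a
separate job would include only those tests.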

First, in case it's not clear, this specific test runs QEMU with TCG,
boots a Fedora 31 "cloud image", and waits until the cloud-init agent
notifies the test that the boot is over.

Out of the four architectures tested with the same approach under
"tests/acceptance/boot_linux.py", aarch64 was special, in the sense
that many Linux "cloud images" got stuck very late in the boot
process.  If my memory serves me right, the cause seemed to be disk
activity within the guest that made the kernel drain its random number
sources.  Giving the machine an RNG device fixed it.  This can still be
verified today if you comment out the virtio-rng lines in the aarch64
test.
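
For reference, giving the guest an RNG device amounts to two extra QEMU
arguments.  A minimal sketch, assuming a host entropy backend exposed
through virtio; the rng_args() helper, the "rng0" id and the
/dev/urandom source are illustrative choices, so check boot_linux.py
for the exact lines:

```python
def rng_args(rng_id="rng0", source="/dev/urandom"):
    """Return QEMU arguments that give the guest a virtio RNG device.

    The -object option defines a host-side entropy backend and the
    -device option exposes it to the guest.  The id and the source
    file are assumptions made for this sketch.
    """
    return [
        "-object", f"rng-random,id={rng_id},filename={source}",
        "-device", f"virtio-rng-pci,rng={rng_id}",
    ]


if __name__ == "__main__":
    print(" ".join(rng_args()))
```

In the test these would be passed along with the other arguments, for
example through self.vm.add_args().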

So, even with the RNG device and the boot process not getting stuck, a
lot of the test time is spent with QEMU actively consuming CPU time on
behalf of the guest boot process.  This may or may not be the cause
of the slowness.

One approach to shortening the test time would be to reduce the
things that happen during the guest boot process.  Choosing a minimal
guest, such as CirrOS, would be an example of such a solution, but:

* With fewer things happening during the guest boot, fewer things
  get tested within QEMU;

* CirrOS cannot make use of the same cloud-init boot configuration
  and boot verification the test currently uses.

So that leaves other non-minimal Linux "cloud images" as options.  But
still, the following things are required or nice to have:

* Support for cloud-init;

* Support for as many architectures as possible;

* Wide user base;

* Be thoroughly tested with this "boot_linux.py" test.

So in the end, I picked Fedora 31, which was available and behaved
well for four different architectures with and without KVM.  Today, I
checked whether switching distros would provide an "easy fix", but the
results were negative.  Any ideas on how to improve the test execution
times are appreciated.

For the record, one of the ways we're trying to improve the overall
test experience is to allow tests to run in parallel (expected to be
fully supported on the upcoming version 81.0).

For those interested, these are the numbers I got, and how I tested
with other distros.  I'm using QEMU e1d322c405 with a vanilla
configure on an x86_64 Fedora 32 host.

Fedora 31 (baseline):
====================

  $ make check-venv
  $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
  JOB ID     : 14802f9d5016a44d2937ed7b1fec63b2eaa06e89
  JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T13.19-14802f9/job.log
   (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (153.12 s)
   (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (149.57 s)
   (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (154.45 s)
   (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (148.97 s)
   (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (150.70 s)
  RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 757.50 s

Fedora 32:
==========

1. Tweak version and image hash:

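---
diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
index 0055dc7cee..44c62bd4a2 100644
--- a/tests/acceptance/boot_linux.py
+++ b/tests/acceptance/boot_linux.py
@@ -48,7 +48,7 @@ class BootLinuxBase(Test):
             image_arch = 'ppc64le'
         try:
             boot = vmimage.get(
-                'fedora', arch=image_arch, version='31',
+                'fedora', arch=image_arch, version='32',
                 checksum=self.chksum,
                 algorithm='sha256',
                 cache_dir=self.cache_dirs[0],
@@ -160,7 +160,7 @@ class BootLinuxAarch64(BootLinux):
     :avocado: tags=machine:gic-version=2
     """
 
-    chksum = '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
+    chksum = 'b367755c664a2d7a26955bbfff985855adfa2ca15e908baf15b4b176d68d3967'
 
     def add_common_args(self):
         self.vm.add_args('-bios',
---

2. Download the image before the test:

  $ ./tests/venv/bin/avocado vmimage get --distro=fedora --arch aarch64 --distro-version=32
  The image was downloaded:
  Provider Version Architecture File
  fedora   32      aarch64      /tmp/data/cache/by_location/7049001631a4b2eabf5766cc110e66d486e09821/Fedora-Cloud-Base-32-1.6.aarch64.qcow2

3. Run the tests:

  $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
  JOB ID     : 09e740a41dc400f9fcbb9253f613734597fe0efc
  JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T13.53-09e740a/job.log
   (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (162.06 s)
   (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (167.78 s)
   (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (166.98 s)
   (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (171.13 s)
   (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (167.43 s)
  RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 836.05 s

Ubuntu 20.04:
=============

1. Tweak version and image hash:

---
diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
index 0055dc7cee..03c0e1bee9 100644
--- a/tests/acceptance/boot_linux.py
+++ b/tests/acceptance/boot_linux.py
@@ -48,7 +48,7 @@ class BootLinuxBase(Test):
             image_arch = 'ppc64le'
         try:
             boot = vmimage.get(
-                'fedora', arch=image_arch, version='31',
+                'ubuntu', arch=image_arch, version='20.04',
                 checksum=self.chksum,
                 algorithm='sha256',
                 cache_dir=self.cache_dirs[0],
@@ -160,7 +160,7 @@ class BootLinuxAarch64(BootLinux):
     :avocado: tags=machine:gic-version=2
     """
 
-    chksum = '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
+    chksum = '1d9e50f3381145835b11911adf611f455d674a570814086b7d6581ecc0718770'
 
     def add_common_args(self):
         self.vm.add_args('-bios',
---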

2. Download the image before the test:

  $ ./tests/venv/bin/avocado vmimage get --distro=ubuntu --arch aarch64 --distro-version=20.04
  The image was downloaded:
  Provider Version Architecture File
  ubuntu   20.04   arm64        /tmp/data/cache/by_location/19db8c6d910a3f2660c4109ffb85d73d43e5cdf2/ubuntu-20.04-server-cloudimg-arm64.img

3. Run the tests:

  $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
  JOB ID     : 92a1bdbb5e933e6dff8b882808a191f1de3c2600
  JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T12.13-92a1bdb/job.log
   (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (341.40 s)
   (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (345.82 s)
   (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (335.91 s)
   (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (320.32 s)
   (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (319.79 s)
  RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
  JOB TIME   : 1663.92 s
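
To compare the runs at a glance, the per-test times can be averaged and
expressed relative to the Fedora 31 baseline.  This is only a sketch:
the numbers are copied from the job outputs in this thread, and the
summarize() helper is a name made up for the example.

```python
from statistics import mean

# Per-run times in seconds, copied from the job outputs in this thread.
RUNS = {
    "Fedora 31": [153.12, 149.57, 154.45, 148.97, 150.70],
    "Fedora 32": [162.06, 167.78, 166.98, 171.13, 167.43],
    "Ubuntu 20.04": [341.40, 345.82, 335.91, 320.32, 319.79],
}


def summarize(runs, baseline="Fedora 31"):
    """Return {distro: (mean seconds, ratio to the baseline mean)}."""
    base = mean(runs[baseline])
    return {name: (mean(times), mean(times) / base)
            for name, times in runs.items()}


if __name__ == "__main__":
    for name, (avg, ratio) in summarize(RUNS).items():
        print(f"{name:<13} mean {avg:7.2f} s  ({ratio:.2f}x baseline)")
```

With these numbers, Fedora 32 comes out roughly 10% slower than the
baseline and Ubuntu 20.04 about 2.2 times slower, which is why neither
counts as an easy fix.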

Comments

Philippe Mathieu-Daudé Aug. 12, 2020, 12:19 p.m. UTC | #1
On 8/6/20 9:35 PM, Cleber Rosa wrote:
> TL;DR: This is a followup from an IRC chat about the
> tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg test
> taking many orders of magnitude longer than other acceptance (and even
> similar boot) tests.  I could not find an easy way for this specific
> test (aarch64+tcg) to have a significant execution time improvement.
> The best solution may be to filter out tests that are known to be
> slow, and create a specific "test job" that includes them.
> 
> First, if it's not clear, this specific test runs QEMU with TCG and
> boots a Fedora 31 "cloud image" and waits until the cloud-init agent
> notifies the test that the boot is over.
> 
> Out of the four architectures tested with the same approach under
> "tests/acceptance/boot_linux.py", aarch64 was special, in the sense
> that many Linux "cloud images" got stuck very late in the boot
> process.  What seemed to be a disk activity within the guest that
> seemed to make the kernel drain its random number sources if my memory
> serves me right.  Giving the machine a RNG device fixed it.  This can
> still be verified Today if you comment the virtio-rng lines in the
> aarch64 test.
> 
> So, even with the RNG device and the boot process not getting stuck, a
> lot of the test time is spent with QEMU actively using CPU time
> produced by the guest boot process.  This may or may not be the cause
> for the slowness.

I believe this is related to the issue Richard recently addressed:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg729216.html

> 
> One approach to have a shorter test time, would be to reduce the
> things that happen during the guest boot process.  Choosing a minimal
> guest, such as CirrOS, would be an example of such a solution, but:
> 
> * With less things happening during the guest boot, less things
>   get tested within QEMU;
> 
> * CirrOS can not make use of the same boot cloud-init configuration
>   and boot verification the test currently uses;
> 
> So that leaves other non-minimal Linux "cloud images" as options.  But
> still, the following things are required or nice to have:
> 
> * Support for cloud-init;
> 
> * Support for as many as possible architectures;
> 
> * Wide user base;
> 
> * Be thoroughly tested with this "boot_linux.py" test
> 
> So in the end, I picked Fedora 31, which was available and behaved
> well for four different architectures with and without KVM. Today, I
> verified if switching distros would provide an "easy fix", but the
> results were negative.  Any ideas on how to improve the test execution
> times are appreciated.
> 
> For the record, one of the ways we're trying to improve the overall
> test experience is to allow tests to run in parallel (expected to be
> fully supported on the upcoming version 81.0).
> 
> For those interested, these are the numbers I got, and how I tested
> with other distros.  I'm using QEMU e1d322c405 with a vanilla
> configure under a x86_64 Fedora 32 host.
> 
> Fedora 31 (baseline):
> ====================
> 
>   $ make check-venv
>   $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
>   JOB ID     : 14802f9d5016a44d2937ed7b1fec63b2eaa06e89
>   JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T13.19-14802f9/job.log
>    (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (153.12 s)
>    (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (149.57 s)
>    (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (154.45 s)
>    (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (148.97 s)
>    (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (150.70 s)
>   RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>   JOB TIME   : 757.50 s
> 
> Fedora 32:
> ==========
> 
> 1. Tweak version and image hash:
> 
> ---
> diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
> index 0055dc7cee..44c62bd4a2 100644
> --- a/tests/acceptance/boot_linux.py
> +++ b/tests/acceptance/boot_linux.py
> @@ -48,7 +48,7 @@ class BootLinuxBase(Test):
>              image_arch = 'ppc64le'
>          try:
>              boot = vmimage.get(
> -                'fedora', arch=image_arch, version='31',
> +                'fedora', arch=image_arch, version='32',
>                  checksum=self.chksum,
>                  algorithm='sha256',
>                  cache_dir=self.cache_dirs[0],
> @@ -160,7 +160,7 @@ class BootLinuxAarch64(BootLinux):
>      :avocado: tags=machine:gic-version=2
>      """
>  
> -    chksum = '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
> +    chksum = 'b367755c664a2d7a26955bbfff985855adfa2ca15e908baf15b4b176d68d3967'
>  
>      def add_common_args(self):
>          self.vm.add_args('-bios',
> ---
> 
> 2. Download the image before the test:
> 
>   $ ./tests/venv/bin/avocado vmimage get --distro=fedora --arch aarch64 --distro-version=32
>   The image was downloaded:
>   Provider Version Architecture File
>   fedora   32      aarch64      /tmp/data/cache/by_location/7049001631a4b2eabf5766cc110e66d486e09821/Fedora-Cloud-Base-32-1.6.aarch64.qcow2
> 
> 3. Run the tests:
> 
>   $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
> JOB ID     : 09e740a41dc400f9fcbb9253f613734597fe0efc
> JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T13.53-09e740a/job.log
>  (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (162.06 s)
>  (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (167.78 s)
>  (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (166.98 s)
>  (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (171.13 s)
>  (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (167.43 s)
> RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
> JOB TIME   : 836.05 s
> 
> Ubuntu 20.04:
> =============
> 
> 1. Tweak version and image hash:
> 
> ---
> diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
> index 0055dc7cee..03c0e1bee9 100644
> --- a/tests/acceptance/boot_linux.py
> +++ b/tests/acceptance/boot_linux.py
> @@ -48,7 +48,7 @@ class BootLinuxBase(Test):
>              image_arch = 'ppc64le'
>          try:
>              boot = vmimage.get(
> -                'fedora', arch=image_arch, version='31',
> +                'ubuntu', arch=image_arch, version='20.04',
>                  checksum=self.chksum,
>                  algorithm='sha256',
>                  cache_dir=self.cache_dirs[0],
> @@ -160,7 +160,7 @@ class BootLinuxAarch64(BootLinux):
>      :avocado: tags=machine:gic-version=2
>      """
>  
> -    chksum = '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
> +    chksum = '1d9e50f3381145835b11911adf611f455d674a570814086b7d6581ecc0718770'
>  
>      def add_common_args(self):
>          self.vm.add_args('-bios',
> ---
> 
> 2. Download the image before the test:
> 
>   $ ./tests/venv/bin/avocado vmimage get --distro=ubuntu --arch aarch64 --distro-version=20.04
>   The image was downloaded:
>   Provider Version Architecture File
>   ubuntu   20.04   arm64        /tmp/data/cache/by_location/19db8c6d910a3f2660c4109ffb85d73d43e5cdf2/ubuntu-20.04-server-cloudimg-arm64.img
> 
> 3. Run the tests:
> 
>   $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
>   JOB ID     : 92a1bdbb5e933e6dff8b882808a191f1de3c2600
>   JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T12.13-92a1bdb/job.log
>    (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (341.40 s)
>    (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (345.82 s)
>    (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (335.91 s)
>    (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (320.32 s)
>    (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (319.79 s)
>   RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>   JOB TIME   : 1663.92 s
>

Patch

diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
index 0055dc7cee..44c62bd4a2 100644
--- a/tests/acceptance/boot_linux.py
+++ b/tests/acceptance/boot_linux.py
@@ -48,7 +48,7 @@  class BootLinuxBase(Test):
             image_arch = 'ppc64le'
         try:
             boot = vmimage.get(
-                'fedora', arch=image_arch, version='31',
+                'fedora', arch=image_arch, version='32',
                 checksum=self.chksum,
                 algorithm='sha256',
                 cache_dir=self.cache_dirs[0],
@@ -160,7 +160,7 @@  class BootLinuxAarch64(BootLinux):
     :avocado: tags=machine:gic-version=2
     """
 
-    chksum = '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
+    chksum = 'b367755c664a2d7a26955bbfff985855adfa2ca15e908baf15b4b176d68d3967'
 
     def add_common_args(self):
         self.vm.add_args('-bios',
---

2. Download the image before the test:

  $ ./tests/venv/bin/avocado vmimage get --distro=fedora --arch aarch64 --distro-version=32
  The image was downloaded:
  Provider Version Architecture File
  fedora   32      aarch64      /tmp/data/cache/by_location/7049001631a4b2eabf5766cc110e66d486e09821/Fedora-Cloud-Base-32-1.6.aarch64.qcow2

3. Run the tests:

  $ ./tests/venv/bin/avocado run -t arch:aarch64,accel:tcg --keep-tmp on -- tests/acceptance/boot_linux.py{,,,,}
JOB ID     : 09e740a41dc400f9fcbb9253f613734597fe0efc
JOB LOG    : /home/cleber/avocado/job-results/job-2020-08-06T13.53-09e740a/job.log
 (1/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (162.06 s)
 (2/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (167.78 s)
 (3/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (166.98 s)
 (4/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (171.13 s)
 (5/5) tests/acceptance/boot_linux.py:BootLinuxAarch64.test_virt_tcg: PASS (167.43 s)
RESULTS    : PASS 5 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 836.05 s

Ubuntu 20.04:
=============

1. Tweak version and image hash:

---
diff --git a/tests/acceptance/boot_linux.py b/tests/acceptance/boot_linux.py
index 0055dc7cee..03c0e1bee9 100644
--- a/tests/acceptance/boot_linux.py
+++ b/tests/acceptance/boot_linux.py
@@ -48,7 +48,7 @@  class BootLinuxBase(Test):
             image_arch = 'ppc64le'
         try:
             boot = vmimage.get(
-                'fedora', arch=image_arch, version='31',
+                'ubuntu', arch=image_arch, version='20.04',
                 checksum=self.chksum,
                 algorithm='sha256',
                 cache_dir=self.cache_dirs[0],
@@ -160,7 +160,7 @@  class BootLinuxAarch64(BootLinux):
     :avocado: tags=machine:gic-version=2
     """
 
-    chksum = '1e18d9c0cf734940c4b5d5ec592facaed2af0ad0329383d5639c997fdf16fe49'
+    chksum = '1d9e50f3381145835b11911adf611f455d674a570814086b7d6581ecc0718770'
 
     def add_common_args(self):
         self.vm.add_args('-bios',