[03/10] tests/avocado/intel_iommu.py: increase timeout

Message ID 20231208190911.102879-4-crosa@redhat.com
State New
Series for-8.3 tests/avocado: prep for Avocado 103.0 LTS

Commit Message

Cleber Rosa Dec. 8, 2023, 7:09 p.m. UTC
Based on many runs, the average run time for these 4 tests is around
250 seconds, with 320 seconds being the ceiling.  In any case, the
default 120 seconds timeout is inappropriate in my experience.

Let's increase the timeout so these tests get a chance to complete.

Signed-off-by: Cleber Rosa <crosa@redhat.com>
---
 tests/avocado/intel_iommu.py | 2 ++
 1 file changed, 2 insertions(+)

Comments

Alex Bennée Dec. 11, 2023, 5:01 p.m. UTC | #1
Cleber Rosa <crosa@redhat.com> writes:

> Based on many runs, the average run time for these 4 tests is around
> 250 seconds, with 320 seconds being the ceiling.  In any case, the
> default 120 seconds timeout is inappropriate in my experience.

I would rather see these tests updated to fix:

 - Don't use such an old Fedora 31 image
 - Avoid updating image packages (when will RH stop serving them?)
 - The "test" is a fairly basic check of dmesg/sysfs output

I think building a buildroot image with the tools pre-installed (with
perhaps more testing) would be a better use of our limited test time.

FWIW the runtime on my machine is:

➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
 (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
 (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
 (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
 (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
JOB TIME   : 255.43 s
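
As a rough way to compare the per-test times above with a given timeout, the result lines can be parsed with a few lines of Python. This is only a sketch: `parse_durations` and `over_limit` are hypothetical helpers, not part of Avocado, and it assumes the timeout applies to each test individually rather than to the whole job.

```python
import re

# Result lines copied from the avocado run quoted above.
AVOCADO_OUTPUT = """\
 (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
 (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
 (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
 (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
"""

# Matches "(n/m) <test id>: <STATUS> (<seconds> s)".
RESULT_LINE = re.compile(
    r"\(\d+/\d+\) (?P<test>\S+): (?P<status>[A-Z]+) \((?P<secs>\d+(?:\.\d+)?) s\)"
)


def parse_durations(text):
    """Hypothetical helper: map each test method name to its runtime in seconds."""
    durations = {}
    for match in RESULT_LINE.finditer(text):
        # Keep only the method name after the last dot of the test id.
        name = match.group("test").rsplit(".", 1)[-1]
        durations[name] = float(match.group("secs"))
    return durations


def over_limit(durations, timeout):
    """Hypothetical helper: names of tests whose runtime exceeds `timeout` seconds."""
    return sorted(name for name, secs in durations.items() if secs > timeout)


if __name__ == "__main__":
    durations = parse_durations(AVOCADO_OUTPUT)
    for limit in (120, 360):  # default timeout vs. the value proposed in this patch
        print(f"timeout {limit:>3} s: over limit = {over_limit(durations, limit)}")
    print(f"sum of per-test times: {sum(durations.values()):.2f} s")
```

With the numbers above, no single test exceeds the default 120 seconds on this machine, while the four tests together take about 255 seconds, which is close to the 250 seconds average mentioned in the commit message.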
Akihiko Odaki Dec. 12, 2023, 8:18 a.m. UTC | #2
On 2023/12/12 2:01, Alex Bennée wrote:
> Cleber Rosa <crosa@redhat.com> writes:
> 
>> Based on many runs, the average run time for these 4 tests is around
>> 250 seconds, with 320 seconds being the ceiling.  In any case, the
>> default 120 seconds timeout is inappropriate in my experience.
> 
> I would rather see these tests updated to fix:
> 
>   - Don't use such an old Fedora 31 image
>   - Avoid updating image packages (when will RH stop serving them?)
>   - The "test" is a fairly basic check of dmesg/sysfs output
> 
> I think building a buildroot image with the tools pre-installed (with
> perhaps more testing) would be a better use of our limited test time.

That's what tests/avocado/netdev-ethtool.py does, but I don't like it
much because building a buildroot image takes a long time and results in
a somewhat big binary blob.

I would rather have a script that runs mkosi[1] to make an image; it
downloads packages from the distributor, so it will take much less time
than using buildroot. The CI system can run the script and cache the image.

[1] https://github.com/systemd/mkosi
Alex Bennée Dec. 12, 2023, 11:27 a.m. UTC | #3
Akihiko Odaki <akihiko.odaki@daynix.com> writes:

> On 2023/12/12 2:01, Alex Bennée wrote:
>> Cleber Rosa <crosa@redhat.com> writes:
>> 
>>> Based on many runs, the average run time for these 4 tests is around
>>> 250 seconds, with 320 seconds being the ceiling.  In any case, the
>>> default 120 seconds timeout is inappropriate in my experience.
>> I would rather see these tests updated to fix:
>>   - Don't use such an old Fedora 31 image
>>   - Avoid updating image packages (when will RH stop serving them?)
>>   - The "test" is a fairly basic check of dmesg/sysfs output
>> I think building a buildroot image with the tools pre-installed (with
>> perhaps more testing) would be a better use of our limited test time.
>
> That's what tests/avocado/netdev-ethtool.py does, but I don't like it
> much because building a buildroot image takes a long time and results in
> a somewhat big binary blob.
>
> I would rather have a script that runs mkosi[1] to make an image; it
> downloads packages from the distributor, so it will take much less time
> than using buildroot. The CI system can run the script and cache the
> image.

I'm all for smaller, more directed test cases and I'm less worried about
exactly how things are built. I only use buildroot personally because
I'm familiar with it and it makes it easy to build testcases for
multiple architectures.

> [1] https://github.com/systemd/mkosi

If that works for you go for it ;-)
Cleber Rosa Dec. 13, 2023, 8:08 p.m. UTC | #4
Alex Bennée <alex.bennee@linaro.org> writes:

> Cleber Rosa <crosa@redhat.com> writes:
>
>> Based on many runs, the average run time for these 4 tests is around
>> 250 seconds, with 320 seconds being the ceiling.  In any case, the
>> default 120 seconds timeout is inappropriate in my experience.
>
> I would rather see these tests updated to fix:
>
>  - Don't use such an old Fedora 31 image

I remember proposing a bump in Fedora version used by default in
avocado_qemu.LinuxTest (which would propagate to tests such as
boot_linux.py and others), but that was not well accepted.  I can
definitely work on such a version bump again.

>  - Avoid updating image packages (when will RH stop serving them?)

IIUC the only reason for updating the packages is to test the network
from the guest, and could/should be done another way.

Eric, could you confirm this?

>  - The "test" is a fairly basic check of dmesg/sysfs output

Maybe the network is also an implicit check here.  Let's see what Eric
has to say.

>
> I think building a buildroot image with the tools pre-installed (with
> perhaps more testing) would be a better use of our limited test time.
>
> FWIW the runtime on my machine is:
>
> ➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
> JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
> JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>  (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>  (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>  (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>  (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
> RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
> JOB TIME   : 255.43 s
>

Yes, I've also seen similar runtimes in other environments... so it
looks like it depends a lot on the "dnf -y install numactl-devel".  If
that can be removed, the tests would have much more predictable runtimes.
Eric Auger Dec. 14, 2023, 7:24 a.m. UTC | #5
Hi Cleber,

On 12/13/23 21:08, Cleber Rosa wrote:
> Alex Bennée <alex.bennee@linaro.org> writes:
>
>> Cleber Rosa <crosa@redhat.com> writes:
>>
>>> Based on many runs, the average run time for these 4 tests is around
>>> 250 seconds, with 320 seconds being the ceiling.  In any case, the
>>> default 120 seconds timeout is inappropriate in my experience.
>> I would rather see these tests updated to fix:
>>
>>  - Don't use such an old Fedora 31 image
> I remember proposing a bump in Fedora version used by default in
> avocado_qemu.LinuxTest (which would propagate to tests such as
> boot_linux.py and others), but that was not well accepted.  I can
> definitely work on such a version bump again.
>
>>  - Avoid updating image packages (when will RH stop serving them?)
> IIUC the only reason for updating the packages is to test the network
> from the guest, and could/should be done another way.
>
> Eric, could you confirm this?
Sorry for the delay. Yes, effectively I used the dnf install to stress
the viommu. In the past I was able to trigger viommu bugs that way,
whereas just getting an IP address for the guest succeeded.
>
>>  - The "test" is a fairly basic check of dmesg/sysfs output
> Maybe the network is also an implicit check here.  Let's see what Eric
> has to say.

To be honest I do not remember how avocado does the check in itself; my
guess is that if the dnf install does not complete you get a timeout and
the test fails. But you may be more knowledgeable on this than me ;-)

Thanks

Eric
>
>> I think building a buildroot image with the tools pre-installed (with
>> perhaps more testing) would be a better use of our limited test time.
>>
>> FWIW the runtime on my machine is:
>>
>> ➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
>> JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
>> JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>>  (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>>  (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>>  (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>>  (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
>> RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>> JOB TIME   : 255.43 s
>>
> Yes, I've also seen similar runtimes in other environments... so it
> looks like it depends a lot on the "dnf -y install numactl-devel".  If
> that can be removed, the tests would have much more predictable runtimes.
>
Alex Bennée Dec. 14, 2023, 9:41 a.m. UTC | #6
Eric Auger <eric.auger@redhat.com> writes:

> Hi Cleber,
>
> On 12/13/23 21:08, Cleber Rosa wrote:
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> Cleber Rosa <crosa@redhat.com> writes:
>>>
>>>> Based on many runs, the average run time for these 4 tests is around
>>>> 250 seconds, with 320 seconds being the ceiling.  In any case, the
>>>> default 120 seconds timeout is inappropriate in my experience.
>>> I would rather see these tests updated to fix:
>>>
>>>  - Don't use such an old Fedora 31 image
>> I remember proposing a bump in Fedora version used by default in
>> avocado_qemu.LinuxTest (which would propagate to tests such as
>> boot_linux.py and others), but that was not well accepted.  I can
>> definitely work on such a version bump again.
>>
>>>  - Avoid updating image packages (when will RH stop serving them?)
>> IIUC the only reason for updating the packages is to test the network
>> from the guest, and could/should be done another way.
>>
>> Eric, could you confirm this?
> Sorry for the delay. Yes, effectively I used the dnf install to stress
> the viommu. In the past I was able to trigger viommu bugs that way,
> whereas just getting an IP address for the guest succeeded.
>>
>>>  - The "test" is a fairly basic check of dmesg/sysfs output
>> Maybe the network is also an implicit check here.  Let's see what Eric
>> has to say.
>
> To be honest I do not remember how avocado does the check in itself; my
> guess is that if the dnf install does not complete you get a timeout and
> the test fails. But you may be more knowledgeable on this than me ;-)

I guess the problem is that relying on external infrastructure can lead
to unpredictable results. However, it's a lot easier to configure user
mode networking just to pull something off the internet than to have a
local netperf or some such setup to generate local traffic.

I guess there is no loopback-like setup which would sufficiently
exercise the code?

>
> Thanks
>
> Eric
>>
>>> I think building a buildroot image with the tools pre-installed (with
>>> perhaps more testing) would be a better use of our limited test time.
>>>
>>> FWIW the runtime on my machine is:
>>>
>>> ➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
>>> JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
>>> JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>>>  (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>>>  (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>>>  (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>>>  (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
>>> RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>>> JOB TIME   : 255.43 s
>>>
>> Yes, I've also seen similar runtimes in other environments... so it
>> looks like it depends a lot on the "dnf -y install numactl-devel".  If
>> that can be removed, the tests would have much more predictable runtimes.
>>
Philippe Mathieu-Daudé Dec. 14, 2023, 9:41 a.m. UTC | #7
On 14/12/23 08:24, Eric Auger wrote:
> Hi Cleber,
> 
> On 12/13/23 21:08, Cleber Rosa wrote:
>> Alex Bennée <alex.bennee@linaro.org> writes:
>>
>>> Cleber Rosa <crosa@redhat.com> writes:
>>>
>>>> Based on many runs, the average run time for these 4 tests is around
>>>> 250 seconds, with 320 seconds being the ceiling.  In any case, the
>>>> default 120 seconds timeout is inappropriate in my experience.
>>> I would rather see these tests updated to fix:
>>>
>>>   - Don't use such an old Fedora 31 image
>> I remember proposing a bump in Fedora version used by default in
>> avocado_qemu.LinuxTest (which would propagate to tests such as
>> boot_linux.py and others), but that was not well accepted.  I can
>> definitely work on such a version bump again.
>>
>>>   - Avoid updating image packages (when will RH stop serving them?)
>> IIUC the only reason for updating the packages is to test the network
>> from the guest, and could/should be done another way.
>>
>> Eric, could you confirm this?
> Sorry for the delay. Yes, effectively I used the dnf install to stress
> the viommu. In the past I was able to trigger viommu bugs that way,
> whereas just getting an IP address for the guest succeeded.

Maybe this test would be useful as what Daniel described as "Tier 2" [*]:
tests that maintainers run locally but that don't need to gate CI? That
would save us some resources there.

[*] https://lore.kernel.org/qemu-devel/20200427152036.GI1244803@redhat.com/
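
One way such a "Tier 2" split could be expressed is an environment opt-in, similar in spirit to the QEMU_TEST_FLAKY_TESTS=1 setting used in the run earlier in this thread. Below is a minimal sketch using only the standard unittest module; the variable name QEMU_TEST_TIER2 and the test class are made up for illustration and do not exist in QEMU.

```python
import os
import unittest

# Hypothetical opt-in variable, invented for this sketch.
TIER2_ENV = "QEMU_TEST_TIER2"


class Tier2Example(unittest.TestCase):
    """Stand-in for a slow, network-dependent test kept out of gating CI."""

    def setUp(self):
        # Checked at run time, so the decision follows the current environment.
        if not os.getenv(TIER2_ENV):
            self.skipTest(f"Tier 2 test, set {TIER2_ENV}=1 to run it")

    def test_stress(self):
        # Placeholder for the real workload (e.g. the package install).
        self.assertTrue(True)


def run_example():
    """Run the example test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Tier2Example)
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_example()
    print(f"run={outcome.testsRun} skipped={len(outcome.skipped)}")
```

A gating CI job would simply leave the variable unset, so the test is reported as skipped rather than silently dropped.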

>>
>>>   - The "test" is a fairly basic check of dmesg/sysfs output
>> Maybe the network is also an implicit check here.  Let's see what Eric
>> has to say.
> 
> To be honest I do not remember how avocado does the check in itself; my
> guess is that if the dnf install does not complete you get a timeout and
> the test fails. But you may be more knowledgeable on this than me ;-)
> 
> Thanks
> 
> Eric
Eric Auger Dec. 14, 2023, 1:26 p.m. UTC | #8
On 12/14/23 10:41, Alex Bennée wrote:
> Eric Auger <eric.auger@redhat.com> writes:
>
>> Hi Cleber,
>>
>> On 12/13/23 21:08, Cleber Rosa wrote:
>>> Alex Bennée <alex.bennee@linaro.org> writes:
>>>
>>>> Cleber Rosa <crosa@redhat.com> writes:
>>>>
>>>>> Based on many runs, the average run time for these 4 tests is around
>>>>> 250 seconds, with 320 seconds being the ceiling.  In any case, the
>>>>> default 120 seconds timeout is inappropriate in my experience.
>>>> I would rather see these tests updated to fix:
>>>>
>>>>  - Don't use such an old Fedora 31 image
>>> I remember proposing a bump in Fedora version used by default in
>>> avocado_qemu.LinuxTest (which would propagate to tests such as
>>> boot_linux.py and others), but that was not well accepted.  I can
>>> definitely work on such a version bump again.
>>>
>>>>  - Avoid updating image packages (when will RH stop serving them?)
>>> IIUC the only reason for updating the packages is to test the network
>>> from the guest, and could/should be done another way.
>>>
>>> Eric, could you confirm this?
>> Sorry for the delay. Yes, effectively I used the dnf install to stress
>> the viommu. In the past I was able to trigger viommu bugs that way,
>> whereas just getting an IP address for the guest succeeded.
>>>>  - The "test" is a fairly basic check of dmesg/sysfs output
>>> Maybe the network is also an implicit check here.  Let's see what Eric
>>> has to say.
>> To be honest I do not remember how avocado does the check in itself; my
>> guess is that if the dnf install does not complete you get a timeout and
>> the test fails. But you may be more knowledgeable on this than me ;-)
> I guess the problem is that relying on external infrastructure can lead
> to unpredictable results. However, it's a lot easier to configure user
> mode networking just to pull something off the internet than to have a
> local netperf or some such setup to generate local traffic.
>
> I guess there is no loopback-like setup which would sufficiently
> exercise the code?

I don't think so. This test is a reproducer for a bug I encountered and
fixed in the past.
Besides, I am totally fine with moving the test out of the gating CI and
just keeping it as a tier2 test, as suggested by Phil.

Thanks

Eric
>
>> Thanks
>>
>> Eric
>>>> I think building a buildroot image with the tools pre-installed (with
>>>> perhaps more testing) would be a better use of our limited test time.
>>>>
>>>> FWIW the runtime on my machine is:
>>>>
>>>> ➜  env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado run ./tests/avocado/intel_iommu.py
>>>> JOB ID     : 5c582ccf274f3aee279c2208f969a7af8ceb9943
>>>> JOB LOG    : /home/alex/avocado/job-results/job-2023-12-11T16.53-5c582cc/job.log
>>>>  (1/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu: PASS (44.21 s)
>>>>  (2/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict: PASS (78.60 s)
>>>>  (3/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_strict_cm: PASS (65.57 s)
>>>>  (4/4) ./tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt: PASS (66.63 s)
>>>> RESULTS    : PASS 4 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
>>>> JOB TIME   : 255.43 s
>>>>
>>> Yes, I've also seen similar runtimes in other environments... so it
>>> looks like it depends a lot on the "dnf -y install numactl-devel".  If
>>> that can be removed, the tests would have much more predictable runtimes.
>>>
Patch

diff --git a/tests/avocado/intel_iommu.py b/tests/avocado/intel_iommu.py
index f04ee1cf9d..24bfad0756 100644
--- a/tests/avocado/intel_iommu.py
+++ b/tests/avocado/intel_iommu.py
@@ -25,6 +25,8 @@  class IntelIOMMU(LinuxTest):
     :avocado: tags=flaky
     """
 
+    timeout = 360
+
     IOMMU_ADDON = ',iommu_platform=on,disable-modern=off,disable-legacy=on'
     kernel_path = None
     initrd_path = None