[6/6] mtest2make: stop disabling meson test timeouts

Message ID	20230601163123.1805282-7-berrange@redhat.com
State	New
Series	tests: enable meson test timeouts to improve debuggability
  [0/6] tests: enable meson test timeouts to improve debuggability
  [1/6] qtest: bump min meson timeout to 60 seconds
  [2/6] qtest: bump migration-test timeout to 5 minutes
  [3/6] qtest: bump qom-test timeout to 7 minutes
  [4/6] qtest: bump aspeed_smc-test timeout to 2 minutes
  [5/6] qtest: bump bios-table-test timeout to 6 minutes
  [6/6] mtest2make: stop disabling meson test timeouts

Headers

From: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= <berrange@redhat.com>
To: qemu-devel@nongnu.org
Cc: =?utf-8?q?Alex_Benn=C3=A9e?= <alex.bennee@linaro.org>,
 Cleber Rosa <crosa@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>,
 John Snow <jsnow@redhat.com>, Laurent Vivier <lvivier@redhat.com>,
 Thomas Huth <thuth@redhat.com>,
 =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= <berrange@redhat.com>
Subject: [PATCH 6/6] mtest2make: stop disabling meson test timeouts
Date: Thu,  1 Jun 2023 17:31:23 +0100
Message-Id: <20230601163123.1805282-7-berrange@redhat.com>
In-Reply-To: <20230601163123.1805282-1-berrange@redhat.com>
References: <20230601163123.1805282-1-berrange@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Received-SPF: pass client-ip=170.10.133.124;
 envelope-from=berrange@redhat.com;
 helo=us-smtp-delivery-124.mimecast.com
X-Spam_score_int: -22
X-Spam_score: -2.3
X-Spam_bar: --
X-Spam_report: (-2.3 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.166,
 DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1,
 RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001,
 T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no
X-Spam_action: no action
X-BeenThere: qemu-devel@nongnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <https://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
 <mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Sender: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org

Commit Message

Daniel P. Berrangé June 1, 2023, 4:31 p.m. UTC

The mtest2make.py script passes the arg '-t 0' to 'meson test' which
disables all test timeouts. This is a major source of pain when running
in GitLab CI and a test gets stuck. It will stall until GitLab kills the
CI job. This leaves us with little easily consumable information about
the stalled test. The TAP format doesn't show the test name until it is
completed, and TAP output from multiple tests is interleaved. So we
have to analyse the log to figure out what tests had unfinished TAP
output present and thus infer which test case caused the hang. This is
very time-consuming and error-prone.

By allowing meson to kill stalled tests, we get a direct display of what
test program got stuck, which lets us more directly focus in on what
specific test case within the test program hung.

The other issue with disabling meson test timeouts by default is that it
makes it more likely that maintainers inadvertently introduce slowdowns.
For example, a recent-ish change accidentally made migrate-test
take 15-20 minutes instead of around 1 minute.

The main risk of this change is that the individual test timeouts might
be too short to allow completion in high load scenarios. Thus, there is
likely to be some short term pain where we have to bump the timeouts for
certain tests to make them reliable enough. The preceding few patches
raised the timeouts for all failures that were immediately apparent
in GitLab CI.

Even with the possible short term instability, this should still be a
net win for debuggability of failed CI pipelines over the long term.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 scripts/mtest2make.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Thomas Huth June 1, 2023, 7:15 p.m. UTC | #1

On 01/06/2023 18.31, Daniel P. Berrangé wrote:
> The mtest2make.py script passes the arg '-t 0' to 'meson test' which
> disables all test timeouts. This is a major source of pain when running
> in GitLab CI and a test gets stuck. It will stall until GitLab kills the
> CI job. This leaves us with little easily consumable information about
> the stalled test. The TAP format doesn't show the test name until it is
> completed, and TAP output from multiple tests is interleaved. So we
> have to analyse the log to figure out what tests had unfinished TAP
> output present and thus infer which test case caused the hang. This is
> very time-consuming and error-prone.
> 
> By allowing meson to kill stalled tests, we get a direct display of what
> test program got stuck, which lets us more directly focus in on what
> specific test case within the test program hung.
> 
> The other issue with disabling meson test timeouts by default is that it
> makes it more likely that maintainers inadvertently introduce slowdowns.
> For example, a recent-ish change accidentally made migrate-test
> take 15-20 minutes instead of around 1 minute.
> 
> The main risk of this change is that the individual test timeouts might
> be too short to allow completion in high load scenarios. Thus, there is
> likely to be some short term pain where we have to bump the timeouts for
> certain tests to make them reliable enough. The preceding few patches
> raised the timeouts for all failures that were immediately apparent
> in GitLab CI.
> 
> Even with the possible short term instability, this should still be a
> net win for debuggability of failed CI pipelines over the long term.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   scripts/mtest2make.py | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/scripts/mtest2make.py b/scripts/mtest2make.py
> index 179dd54871..eb01a05ddb 100644
> --- a/scripts/mtest2make.py
> +++ b/scripts/mtest2make.py
> @@ -27,7 +27,8 @@ def names(self, base):
>   .speed.slow = $(foreach s,$(sort $(filter-out %-thorough, $1)), --suite $s)
>   .speed.thorough = $(foreach s,$(sort $1), --suite $s)
>   
> -.mtestargs = --no-rebuild -t 0
> +TIMEOUT_MULTIPLIER = 1
> +.mtestargs = --no-rebuild -t $(TIMEOUT_MULTIPLIER)
>   ifneq ($(SPEED), quick)
>   .mtestargs += --setup $(SPEED)
>   endif

Basically Ack, but could you please double-check that "make check
-j$(nproc)" still works if configure has been run with "--enable-debug"?
Maybe we need to adjust the multiplier in that case...

  Thomas
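
One way to act on the question above, sketched in Python under stated assumptions: the patch assigns TIMEOUT_MULTIPLIER with a plain '=', so a value given on the make command line should take precedence, and a slower debug build could be run with a larger multiplier without editing any file. The helper name build_check_command and the multiplier value 2 are illustrative only. Nothing in this thread establishes what value, if any, an --enable-debug build actually needs.

```python
import shlex
from typing import List


def build_check_command(jobs: int, timeout_multiplier: int = 1) -> List[str]:
    """Return an argv list for 'make check' with an overridden multiplier.

    Hypothetical helper for illustration; it is not part of QEMU.
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    if timeout_multiplier < 0:
        raise ValueError("timeout_multiplier must not be negative")
    # A variable set on the make command line overrides the plain
    # 'TIMEOUT_MULTIPLIER = 1' assignment introduced by this patch.
    return ["make", "check", f"-j{jobs}",
            f"TIMEOUT_MULTIPLIER={timeout_multiplier}"]


if __name__ == "__main__":
    # Example: 8 parallel jobs, timeouts doubled (assumed value) for a
    # slow debug build.
    print(shlex.join(build_check_command(8, 2)))
```

Passing 0 as the multiplier would bring back the old behaviour of running with no timeouts at all.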

diff --git a/scripts/mtest2make.py b/scripts/mtest2make.py
index 179dd54871..eb01a05ddb 100644
--- a/scripts/mtest2make.py
+++ b/scripts/mtest2make.py
@@ -27,7 +27,8 @@  def names(self, base):
 .speed.slow = $(foreach s,$(sort $(filter-out %-thorough, $1)), --suite $s)
 .speed.thorough = $(foreach s,$(sort $1), --suite $s)
 
-.mtestargs = --no-rebuild -t 0
+TIMEOUT_MULTIPLIER = 1
+.mtestargs = --no-rebuild -t $(TIMEOUT_MULTIPLIER)
 ifneq ($(SPEED), quick)
 .mtestargs += --setup $(SPEED)
 endif
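
As a rough illustration of what the change to '-t' means in practice: meson multiplies each test's own timeout by the value passed to '-t' (--timeout-multiplier), and a value of zero or less disables timeouts, which is what the old '-t 0' did. The Python sketch below models that rule; it is not meson's code. The function name effective_timeout is made up, and the base timeouts are the ones named in the titles of patches 2-5 of this series.

```python
from typing import Optional

# Per-test timeouts in seconds, taken from the patch titles in this series.
SERIES_TIMEOUTS = {
    "migration-test": 5 * 60,
    "qom-test": 7 * 60,
    "aspeed_smc-test": 2 * 60,
    "bios-table-test": 6 * 60,
}


def effective_timeout(base_seconds: int, multiplier: float) -> Optional[float]:
    """Model the time limit applied to one test.

    Returns None when no limit applies, i.e. when the multiplier (or the
    test's own timeout) is zero or negative.
    """
    if multiplier <= 0 or base_seconds <= 0:
        return None
    return base_seconds * multiplier


if __name__ == "__main__":
    for multiplier in (0, 1, 2):
        print(f"TIMEOUT_MULTIPLIER={multiplier}")
        for name, base in SERIES_TIMEOUTS.items():
            limit = effective_timeout(base, multiplier)
            shown = "no timeout" if limit is None else f"{limit:.0f}s"
            print(f"  {name}: {shown}")
```

With the default multiplier of 1 each test keeps the timeout declared for it, so a hung test is killed and reported by name instead of stalling until the CI job itself is killed.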
