diff mbox series

[1/5] tests: wait max 120 seconds for migration test status changes

Message ID 20220628105434.295905-2-berrange@redhat.com
State New
Headers show
Series tests: improve reliability of migration test | expand

Commit Message

Daniel P. Berrangé June 28, 2022, 10:54 a.m. UTC
Currently the wait_for_migration_fail and wait_for_migration_complete
functions will spin in an infinite loop checking query-migrate status
to detect a specific change/goal. This is fine when everything goes
to plan, but when the unusual happens, these will hang the test suite
forever.

Any normally executing migration test case normally takes < 1 second
for a state change, with exception of the autoconverge test which
takes about 5 seconds. Taking into account possibility of people
running tests inside TCG, allowing a factor of x20 slowdown gives
a reasonable worst case of 120 seconds. Anything taking longer than
this is a strong sign that the test has hung, or the test should be
rewritten to be faster.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-helpers.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Laurent Vivier June 28, 2022, 12:47 p.m. UTC | #1
On 28/06/2022 12:54, Daniel P. Berrangé wrote:
> Currently the wait_for_migration_fail and wait_for_migration_complete
> functions will spin in an infinite loop checking query-migrate status
> to detect a specific change/goal. This is fine when everything goes
> to plan, but when the unusual happens, these will hang the test suite
> forever.
> 
> Any normally executing migration test case normally takes < 1 second
> for a state change, with exception of the autoconverge test which
> takes about 5 seconds. Taking into account possibility of people
> running tests inside TCG, allowing a factor of x20 slowdown gives
> a reasonable worst case of 120 seconds. Anything taking longer than
> this is a strong sign that the test has hung, or the test should be
> rewritten to be faster.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/migration-helpers.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index a6aa59e4e6..e81e831c85 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -15,6 +15,14 @@
>   
>   #include "migration-helpers.h"
>   
> +/*
> + * Number of seconds we wait when looking for migration
> + * status changes, to avoid test suite hanging forever
> + * when things go wrong. Needs to be higher enough to
> + * avoid false positives on loaded hosts.
> + */
> +#define MIGRATION_STATUS_WAIT_TIMEOUT 120
> +
>   bool got_stop;
>   
>   static void check_stop_event(QTestState *who)
> @@ -166,8 +174,11 @@ static bool check_migration_status(QTestState *who, const char *goal,
>   void wait_for_migration_status(QTestState *who,
>                                  const char *goal, const char **ungoals)
>   {
> +    g_test_timer_start();
>       while (!check_migration_status(who, goal, ungoals)) {
>           usleep(1000);
> +
> +        g_assert(g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT);
>       }
>   }
>   
> @@ -178,6 +189,7 @@ void wait_for_migration_complete(QTestState *who)
>   
>   void wait_for_migration_fail(QTestState *from, bool allow_active)
>   {
> +    g_test_timer_start();
>       QDict *rsp_return;
>       char *status;
>       bool failed;
> @@ -193,6 +205,8 @@ void wait_for_migration_fail(QTestState *from, bool allow_active)
>           g_assert(result);
>           failed = !strcmp(status, "failed");
>           g_free(status);
> +
> +        g_assert(g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT);
>       } while (!failed);
>   
>       /* Is the machine currently running? */

Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Thomas Huth June 28, 2022, 12:49 p.m. UTC | #2
On 28/06/2022 12.54, Daniel P. Berrangé wrote:
> Currently the wait_for_migration_fail and wait_for_migration_complete
> functions will spin in an infinite loop checking query-migrate status
> to detect a specific change/goal. This is fine when everything goes
> to plan, but when the unusual happens, these will hang the test suite
> forever.
> 
> Any normally executing migration test case normally takes < 1 second
> for a state change, with exception of the autoconverge test which
> takes about 5 seconds. Taking into account possibility of people
> running tests inside TCG, allowing a factor of x20 slowdown gives
> a reasonable worst case of 120 seconds. Anything taking longer than
> this is a strong sign that the test has hung, or the test should be
> rewritten to be faster.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>   tests/qtest/migration-helpers.c | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index a6aa59e4e6..e81e831c85 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -15,6 +15,14 @@
>   
>   #include "migration-helpers.h"
>   
> +/*
> + * Number of seconds we wait when looking for migration
> + * status changes, to avoid test suite hanging forever
> + * when things go wrong. Needs to be higher enough to
> + * avoid false positives on loaded hosts.
> + */
> +#define MIGRATION_STATUS_WAIT_TIMEOUT 120

Reviewed-by: Thomas Huth <thuth@redhat.com>

... but I think I'd suggest to use an even longer timeout - sometimes people 
try to run the tests with TCI, or on heavy loaded CI machines, and then you 
can get really really long timeouts. Maybe 240 or even 300 seconds?
diff mbox series

Patch

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index a6aa59e4e6..e81e831c85 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -15,6 +15,14 @@ 
 
 #include "migration-helpers.h"
 
+/*
+ * Number of seconds we wait when looking for migration
+ * status changes, to avoid test suite hanging forever
+ * when things go wrong. Needs to be higher enough to
+ * avoid false positives on loaded hosts.
+ */
+#define MIGRATION_STATUS_WAIT_TIMEOUT 120
+
 bool got_stop;
 
 static void check_stop_event(QTestState *who)
@@ -166,8 +174,11 @@  static bool check_migration_status(QTestState *who, const char *goal,
 void wait_for_migration_status(QTestState *who,
                                const char *goal, const char **ungoals)
 {
+    g_test_timer_start();
     while (!check_migration_status(who, goal, ungoals)) {
         usleep(1000);
+
+        g_assert(g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT);
     }
 }
 
@@ -178,6 +189,7 @@  void wait_for_migration_complete(QTestState *who)
 
 void wait_for_migration_fail(QTestState *from, bool allow_active)
 {
+    g_test_timer_start();
     QDict *rsp_return;
     char *status;
     bool failed;
@@ -193,6 +205,8 @@  void wait_for_migration_fail(QTestState *from, bool allow_active)
         g_assert(result);
         failed = !strcmp(status, "failed");
         g_free(status);
+
+        g_assert(g_test_timer_elapsed() < MIGRATION_STATUS_WAIT_TIMEOUT);
     } while (!failed);
 
     /* Is the machine currently running? */