diff mbox series

[PULL,v2,01/16] tests/9pfs: fix test dir for parallel tests

Message ID fd3237f7b005b1b73c954ccd5c6011e8228e19a3.1604067568.git.qemu_oss@crudebyte.com
State New
Headers show
Series [PULL,v2,01/16] tests/9pfs: fix test dir for parallel tests | expand

Commit Message

Christian Schoenebeck Oct. 30, 2020, 12:07 p.m. UTC
Use mkdtemp() to generate a unique directory for the 9p 'local' tests.

This fixes occasional 9p test failures when running 'make check -jN' if
QEMU was compiled for multiple target architectures, because the individual
architecture's test suites would run in parallel and interfere with each
other's data as the test directory was previously hard coded and hence the
same directory was used by all of them simultaniously.

This also requires a change how the test directory is created and deleted:
As the test path is now randomized and virtio_9p_register_nodes() being
called in a somewhat undeterministic way, that's no longer an appropriate
place to create and remove the test directory. Use a constructor and
destructor function for creating and removing the test directory instead.
Unfortunately libqos currently does not support setup/teardown callbacks
to handle this more cleanly.

Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Tested-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <a37fbc713614f7615b11d0a3cb8d9adc3b8fba4b.1604061839.git.qemu_oss@crudebyte.com>
Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
---
 tests/qtest/libqos/virtio-9p.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

Comments

Christian Schoenebeck Oct. 31, 2020, 1:20 p.m. UTC | #1
On Freitag, 30. Oktober 2020 13:07:03 CET Christian Schoenebeck wrote:
> Use mkdtemp() to generate a unique directory for the 9p 'local' tests.
> 
> This fixes occasional 9p test failures when running 'make check -jN' if
> QEMU was compiled for multiple target architectures, because the individual
> architecture's test suites would run in parallel and interfere with each
> other's data as the test directory was previously hard coded and hence the
> same directory was used by all of them simultaniously.
> 
> This also requires a change how the test directory is created and deleted:
> As the test path is now randomized and virtio_9p_register_nodes() being
> called in a somewhat undeterministic way, that's no longer an appropriate
> place to create and remove the test directory. Use a constructor and
> destructor function for creating and removing the test directory instead.
> Unfortunately libqos currently does not support setup/teardown callbacks
> to handle this more cleanly.

Peter, please ignore this PR. This patch needs rework:

ERROR:../tests/qtest/test-x86-cpuid-compat.c:208:test_plus_minus: stdout of
child process (/x86/cpuid/parsing-plus-minus/subprocess [34856]) failed to
match: 

stdout was:

# mkdir('/home/travis/build/cschoenebeck/qemu/build/qtest-9p-local-PwY2nQ')
failed: File exists

ERROR qtest-x86_64/test-x86-cpuid-compat - Bail out! ERROR:../tests/qtest/
test-x86-cpuid-compat.c:208:test_plus_minus: stdout of child process (/x86/
cpuid/parsing-plus-minus/subprocess [34856]) failed to match:

make: *** [Makefile.mtest:1793: run-test-222] Error 1

https://travis-ci.org/github/cschoenebeck/qemu/jobs/740199494

> 
> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> Tested-by: Greg Kurz <groug@kaod.org>
> Reviewed-by: Greg Kurz <groug@kaod.org>
> Message-Id:
> <a37fbc713614f7615b11d0a3cb8d9adc3b8fba4b.1604061839.git.qemu_oss@crudebyte
> .com> Signed-off-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
> ---
>  tests/qtest/libqos/virtio-9p.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
> index d43647b3b7..6b22fa0e9a 100644
> --- a/tests/qtest/libqos/virtio-9p.c
> +++ b/tests/qtest/libqos/virtio-9p.c
> @@ -35,7 +35,12 @@ static char *concat_path(const char* a, const char* b)
>  static void init_local_test_path(void)
>  {
>      char *pwd = g_get_current_dir();
> -    local_test_path = concat_path(pwd, "qtest-9p-local");
> +    char *template = concat_path(pwd, "qtest-9p-local-XXXXXX");
> +    local_test_path = mkdtemp(template);
> +    if (!local_test_path) {
> +        g_test_message("mkdtemp('%s') failed: %s", template,
> strerror(errno)); +    }
> +    g_assert(local_test_path);
>      g_free(pwd);
>  }
> 
> @@ -246,11 +251,6 @@ static void virtio_9p_register_nodes(void)
>      const char *str_simple = "fsdev=fsdev0,mount_tag=" MOUNT_TAG;
>      const char *str_addr = "fsdev=fsdev0,addr=04.0,mount_tag=" MOUNT_TAG;
> 
> -    /* make sure test dir for the 'local' tests exists and is clean */
> -    init_local_test_path();
> -    remove_local_test_dir();
> -    create_local_test_dir();
> -
>      QPCIAddress addr = {
>          .devfn = QPCI_DEVFN(4, 0),
>      };
> @@ -278,3 +278,16 @@ static void virtio_9p_register_nodes(void)
>  }
> 
>  libqos_init(virtio_9p_register_nodes);
> +
> +static void __attribute__((constructor)) construct_virtio_9p(void)
> +{
> +    /* make sure test dir for the 'local' tests exists */
> +    init_local_test_path();
> +    create_local_test_dir();
> +}

I'm not sure yet what happens there exactly. One problem that I can see is 
that this constructor function is executed before main() is entered, it 
generates a random test path as global variable, creates that test directory, 
then main() is entered and qost-test.c might now call g_test_trap_subprocess() 
(depending on the config) which would fork the process for the individual test 
cases, probably ending up with a shared test dir path unintentionally. So this 
constructor solution is probably wrong anyway.

But that does not explain what I'm seeing in above's CI error: that error 
message suggests that create_local_test_dir() was called in parallel with the 
same test path. A GCC constructor function being called again on fork() is not 
an expected behaviour to me, but assuming it does on some system, the address 
space should have deviated at this point and hence each subprocess should end 
up with overwriting the global test path variable with their own test 
directory path in this case.

It's too late for touching libqos for this, so I hope we can find a temporary 
workaround for this problem for 5.2 release. For instance by using a TLS 
(__thread) variable instead of a regular global variable for the test path, 
but I would still like to understand the precise (mis)behaviour observed 
there.

> +
> +static void __attribute__((destructor)) destruct_virtio_9p(void)
> +{
> +    /* remove previously created test dir when test suite completed */
> +    remove_local_test_dir();
> +}
Peter Maydell Oct. 31, 2020, 8:34 p.m. UTC | #2
On Sat, 31 Oct 2020 at 13:20, Christian Schoenebeck
<qemu_oss@crudebyte.com> wrote:
>
> On Freitag, 30. Oktober 2020 13:07:03 CET Christian Schoenebeck wrote:
> > Use mkdtemp() to generate a unique directory for the 9p 'local' tests.
> >
> > This fixes occasional 9p test failures when running 'make check -jN' if
> > QEMU was compiled for multiple target architectures, because the individual
> > architecture's test suites would run in parallel and interfere with each
> > other's data as the test directory was previously hard coded and hence the
> > same directory was used by all of them simultaniously.
> >
> > This also requires a change how the test directory is created and deleted:
> > As the test path is now randomized and virtio_9p_register_nodes() being
> > called in a somewhat undeterministic way, that's no longer an appropriate
> > place to create and remove the test directory. Use a constructor and
> > destructor function for creating and removing the test directory instead.
> > Unfortunately libqos currently does not support setup/teardown callbacks
> > to handle this more cleanly.
>
> Peter, please ignore this PR. This patch needs rework:

OK. As a general rule you need to make "please drop this PR"
requests as replies to the top level cover letter, though --
otherwise it's pot luck whether I happen to notice them or not.

thanks
-- PMM
Christian Schoenebeck Oct. 31, 2020, 9:15 p.m. UTC | #3
On Samstag, 31. Oktober 2020 21:34:31 CET Peter Maydell wrote:
> On Sat, 31 Oct 2020 at 13:20, Christian Schoenebeck
> 
> <qemu_oss@crudebyte.com> wrote:
> > On Freitag, 30. Oktober 2020 13:07:03 CET Christian Schoenebeck wrote:
> > > Use mkdtemp() to generate a unique directory for the 9p 'local' tests.
> > > 
> > > This fixes occasional 9p test failures when running 'make check -jN' if
> > > QEMU was compiled for multiple target architectures, because the
> > > individual
> > > architecture's test suites would run in parallel and interfere with each
> > > other's data as the test directory was previously hard coded and hence
> > > the
> > > same directory was used by all of them simultaniously.
> > > 
> > > This also requires a change how the test directory is created and
> > > deleted:
> > > As the test path is now randomized and virtio_9p_register_nodes() being
> > > called in a somewhat undeterministic way, that's no longer an
> > > appropriate
> > > place to create and remove the test directory. Use a constructor and
> > > destructor function for creating and removing the test directory
> > > instead.
> > > Unfortunately libqos currently does not support setup/teardown callbacks
> > > to handle this more cleanly.
> > 
> > Peter, please ignore this PR. This patch needs rework:
> OK. As a general rule you need to make "please drop this PR"
> requests as replies to the top level cover letter, though --
> otherwise it's pot luck whether I happen to notice them or not.
> 
> thanks
> -- PMM

Okay, got it.

Best regards,
Christian Schoenebeck
Christian Schoenebeck Nov. 1, 2020, 11:37 a.m. UTC | #4
On Samstag, 31. Oktober 2020 14:20:27 CET Christian Schoenebeck wrote:
> On Freitag, 30. Oktober 2020 13:07:03 CET Christian Schoenebeck wrote:
> > Use mkdtemp() to generate a unique directory for the 9p 'local' tests.
> > 
> > This fixes occasional 9p test failures when running 'make check -jN' if
> > QEMU was compiled for multiple target architectures, because the
> > individual
> > architecture's test suites would run in parallel and interfere with each
> > other's data as the test directory was previously hard coded and hence the
> > same directory was used by all of them simultaniously.
> > 
> > This also requires a change how the test directory is created and deleted:
> > As the test path is now randomized and virtio_9p_register_nodes() being
> > called in a somewhat undeterministic way, that's no longer an appropriate
> > place to create and remove the test directory. Use a constructor and
> > destructor function for creating and removing the test directory instead.
> > Unfortunately libqos currently does not support setup/teardown callbacks
> > to handle this more cleanly.
> 
> Peter, please ignore this PR. This patch needs rework:
> 
> ERROR:../tests/qtest/test-x86-cpuid-compat.c:208:test_plus_minus: stdout of
> child process (/x86/cpuid/parsing-plus-minus/subprocess [34856]) failed to
> match:
> 
> stdout was:
> 
> # mkdir('/home/travis/build/cschoenebeck/qemu/build/qtest-9p-local-PwY2nQ')
> failed: File exists
> 
> ERROR qtest-x86_64/test-x86-cpuid-compat - Bail out! ERROR:../tests/qtest/
> test-x86-cpuid-compat.c:208:test_plus_minus: stdout of child process (/x86/
> cpuid/parsing-plus-minus/subprocess [34856]) failed to match:
> 
> make: *** [Makefile.mtest:1793: run-test-222] Error 1
> 
> https://travis-ci.org/github/cschoenebeck/qemu/jobs/740199494

Ok, I found a solution: by moving constructor & destructor functions from 
virtio-9p.c to virtio-9p-test.c:
https://github.com/cschoenebeck/qemu/commit/b4c72149f087d5a

The problem was that the constructor function was executed when libqos was 
loaded, which included completely unrelated test suites that just link to 
libqos.

In conjunction with Peter Xu's two migration patches (fixing occasional 
lockups of migration tests) overall situation appears to be smooth now:
https://lore.kernel.org/qemu-devel/20201030135350.GA588069@xz-x1/

There is now only one test failure left concerning macOS Xcode builds, but 
that seems to be completely unrelated to our 9pfs patches:
https://github.com/cschoenebeck/qemu/runs/1338011297

missing object type 'vhost-user-gpu'
Broken pipe
../tests/qtest/libqtest.c:176: kill_qemu() detected QEMU death from signal 6 
(Abort trap: 6)
ERROR qtest-aarch64/device-introspect-test - too few tests run (expected 6, 
got 5)
gmake: *** [Makefile.mtest:905: run-test-111] Error 1

I prepare updated patches for review.

Best regards,
Christian Schoenebeck
diff mbox series

Patch

diff --git a/tests/qtest/libqos/virtio-9p.c b/tests/qtest/libqos/virtio-9p.c
index d43647b3b7..6b22fa0e9a 100644
--- a/tests/qtest/libqos/virtio-9p.c
+++ b/tests/qtest/libqos/virtio-9p.c
@@ -35,7 +35,12 @@  static char *concat_path(const char* a, const char* b)
 static void init_local_test_path(void)
 {
     char *pwd = g_get_current_dir();
-    local_test_path = concat_path(pwd, "qtest-9p-local");
+    char *template = concat_path(pwd, "qtest-9p-local-XXXXXX");
+    local_test_path = mkdtemp(template);
+    if (!local_test_path) {
+        g_test_message("mkdtemp('%s') failed: %s", template, strerror(errno));
+    }
+    g_assert(local_test_path);
     g_free(pwd);
 }
 
@@ -246,11 +251,6 @@  static void virtio_9p_register_nodes(void)
     const char *str_simple = "fsdev=fsdev0,mount_tag=" MOUNT_TAG;
     const char *str_addr = "fsdev=fsdev0,addr=04.0,mount_tag=" MOUNT_TAG;
 
-    /* make sure test dir for the 'local' tests exists and is clean */
-    init_local_test_path();
-    remove_local_test_dir();
-    create_local_test_dir();
-
     QPCIAddress addr = {
         .devfn = QPCI_DEVFN(4, 0),
     };
@@ -278,3 +278,16 @@  static void virtio_9p_register_nodes(void)
 }
 
 libqos_init(virtio_9p_register_nodes);
+
+static void __attribute__((constructor)) construct_virtio_9p(void)
+{
+    /* make sure test dir for the 'local' tests exists */
+    init_local_test_path();
+    create_local_test_dir();
+}
+
+static void __attribute__((destructor)) destruct_virtio_9p(void)
+{
+    /* remove previously created test dir when test suite completed */
+    remove_local_test_dir();
+}