Message ID | 20221025105520.3016-1-quintela@redhat.com |
---|---|
State | New |
Headers | show |
Series | tests: Create fifo for test-io-channel-command | expand |
Juan Quintela <quintela@redhat.com> writes: > Previous commit removed the creation of the fifo. Without it, I get > random failure during tests with high load, please consider > reintroduce it. > > My guess is that there is a race between the two socats when we leave > them to create the channel, better return to the previous behavior. > > I can't reproduce the problem when I run ./test-io-channel-command > test alone, I need to do the make check. And any (unrelated) change > can make it dissapear. I was chasing a similar problem with this test although I don't see it timeout while running (I don't think our unit tests time out). I'm provisionally queuing this to testing/next unless anyone objects. > > commit 76f5148c21b4543e62a6ad605ac4b44133421401 > Author: Marc-AndrΓ© Lureau <marcandre.lureau@redhat.com> > Date: Thu Oct 6 15:36:57 2022 +0400 > > tests/unit: make test-io-channel-command work on win32 > > This has been tested under msys2 & windows 11. I haven't tried to make > it work with other environments yet, but that should be enough to > validate the channel-command implementation anyway. > > Here are the changes: > - drop tests/ from fifo/pipe path, to avoid directory issues > - use g_find_program() to lookup the socat executable (otherwise we > would need to change ChanneCommand to use G_SPAWN_SEARCH_PATH, and deal > with missing socat differently) > - skip the "echo" test when socat is missing as well > > Signed-off-by: Marc-AndrΓ© Lureau <marcandre.lureau@redhat.com> > Reviewed-by: Daniel P. BerrangΓ© <berrange@redhat.com> > Message-Id: <20221006113657.2656108-7-marcandre.lureau@redhat.com> > > Failure: > > [178/178] π qemu:unit / test-io-channel-command > [178/178] π qemu:unit / test-io-channel-command > [178/178] π qemu:unit / test-io-channel-command > [178/178] π qemu:unit / test-io-channel-command > [178/178] π qemu:unit / test-io-channel-command > [178/178] π qemu:unit / test-io-channel-command > [178/178] π qemu:unit / test-io-channel-command > [178/178] π qemu:unit / test-io-channel-command > [178/178] π qemu:unit / test-io-channel-command > ^CWARNING: Received SIGTERM, exiting > 178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM >>>> MALLOC_PERTURB_=149 G_TEST_BUILDDIR=/scratch/qemu/multifd/x64/tests/unit G_TEST_SRCDIR=/mnt/code/qemu/multifd/tests/unit /scratch/qemu/multifd/x64/tests/unit/test-io-channel-command --tap -k > βββββββββββββββββββββββββββββββββββββ β βββββββββββββββββββββββββββββββββββββ > stderr: > 2022/10/25 12:32:48 socat[463140] E mkfifo(test-io-channel-command.fifo, 438): File exists > > TAP parsing error: Too few tests run (expected 4, got 0) > ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ > > Summary of Failures: > > 178/178 qemu:unit / test-io-channel-command INTERRUPT 1127.75s killed by signal 15 SIGTERM > > Signed-off-by: Juan Quintela <quintela@redhat.com> > --- > tests/unit/test-io-channel-command.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/tests/unit/test-io-channel-command.c b/tests/unit/test-io-channel-command.c > index 7eee939c07..7e75f960f4 100644 > --- a/tests/unit/test-io-channel-command.c > +++ b/tests/unit/test-io-channel-command.c > @@ -48,6 +48,9 @@ static void test_io_channel_command_fifo(bool async) > } > > unlink(TEST_FIFO); > + if (mkfifo(TEST_FIFO, 0600) < 0) { > + abort(); > + } > src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv, > O_WRONLY, > &error_abort));
CC'ing Marc-AndrΓ© as original author of the change On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote: > > Juan Quintela <quintela@redhat.com> writes: > > > Previous commit removed the creation of the fifo. Without it, I get > > random failure during tests with high load, please consider > > reintroduce it. > > > > My guess is that there is a race between the two socats when we leave > > them to create the channel, better return to the previous behavior. > > > > I can't reproduce the problem when I run ./test-io-channel-command > > test alone, I need to do the make check. And any (unrelated) change > > can make it dissapear. > > I was chasing a similar problem with this test although I don't see it > timeout while running (I don't think our unit tests time out). I'm > provisionally queuing this to testing/next unless anyone objects. It won't build on Win32 since that platform lacks mkfifo. The test normally works since socat will call mknod to create the fifo. I think the problem is that we have a race condition where the client socat runs before the server socat, and so won't see the fifo. This will be where high load triggers problems. With regards, Daniel
Daniel P. BerrangΓ© <berrange@redhat.com> writes: > CC'ing Marc-AndrΓ© as original author of the change > > On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote: >> >> Juan Quintela <quintela@redhat.com> writes: >> >> > Previous commit removed the creation of the fifo. Without it, I get >> > random failure during tests with high load, please consider >> > reintroduce it. >> > >> > My guess is that there is a race between the two socats when we leave >> > them to create the channel, better return to the previous behavior. >> > >> > I can't reproduce the problem when I run ./test-io-channel-command >> > test alone, I need to do the make check. And any (unrelated) change >> > can make it dissapear. >> >> I was chasing a similar problem with this test although I don't see it >> timeout while running (I don't think our unit tests time out). I'm >> provisionally queuing this to testing/next unless anyone objects. > > It won't build on Win32 since that platform lacks mkfifo. > > The test normally works since socat will call mknod to create > the fifo. > > I think the problem is that we have a race condition where the > client socat runs before the server socat, and so won't see the > fifo. This will be where high load triggers problems. Ok I shall drop the patch from testing/next - we need a better solution.
On 26/10/2022 18.18, Alex BennΓ©e wrote: > > Daniel P. BerrangΓ© <berrange@redhat.com> writes: > >> CC'ing Marc-AndrΓ© as original author of the change >> >> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote: >>> >>> Juan Quintela <quintela@redhat.com> writes: >>> >>>> Previous commit removed the creation of the fifo. Without it, I get >>>> random failure during tests with high load, please consider >>>> reintroduce it. >>>> >>>> My guess is that there is a race between the two socats when we leave >>>> them to create the channel, better return to the previous behavior. >>>> >>>> I can't reproduce the problem when I run ./test-io-channel-command >>>> test alone, I need to do the make check. And any (unrelated) change >>>> can make it dissapear. >>> >>> I was chasing a similar problem with this test although I don't see it >>> timeout while running (I don't think our unit tests time out). I'm >>> provisionally queuing this to testing/next unless anyone objects. >> >> It won't build on Win32 since that platform lacks mkfifo. >> >> The test normally works since socat will call mknod to create >> the fifo. >> >> I think the problem is that we have a race condition where the >> client socat runs before the server socat, and so won't see the >> fifo. This will be where high load triggers problems. > > Ok I shall drop the patch from testing/next - we need a better solution. Could we maybe at least revert the patch that introduced the problem? ... the failing test is annoying ... Thomas
Thomas Huth <thuth@redhat.com> writes: > On 26/10/2022 18.18, Alex BennΓ©e wrote: >> Daniel P. BerrangΓ© <berrange@redhat.com> writes: >> >>> CC'ing Marc-AndrΓ© as original author of the change >>> >>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote: >>>> >>>> Juan Quintela <quintela@redhat.com> writes: >>>> >>>>> Previous commit removed the creation of the fifo. Without it, I get >>>>> random failure during tests with high load, please consider >>>>> reintroduce it. >>>>> >>>>> My guess is that there is a race between the two socats when we leave >>>>> them to create the channel, better return to the previous behavior. >>>>> >>>>> I can't reproduce the problem when I run ./test-io-channel-command >>>>> test alone, I need to do the make check. And any (unrelated) change >>>>> can make it dissapear. >>>> >>>> I was chasing a similar problem with this test although I don't see it >>>> timeout while running (I don't think our unit tests time out). I'm >>>> provisionally queuing this to testing/next unless anyone objects. >>> >>> It won't build on Win32 since that platform lacks mkfifo. >>> >>> The test normally works since socat will call mknod to create >>> the fifo. >>> >>> I think the problem is that we have a race condition where the >>> client socat runs before the server socat, and so won't see the >>> fifo. This will be where high load triggers problems. >> Ok I shall drop the patch from testing/next - we need a better >> solution. > > Could we maybe at least revert the patch that introduced the problem? > ... the failing test is annoying ... I'm trying to fix it now but I haven't been able to get one of the socat's to not race with the other: --8<---------------cut here---------------start------------->8--- modified tests/unit/test-io-channel-command.c @@ -19,6 +19,7 @@ */ #include "qemu/osdep.h" +#include <glib/gstdio.h> #include "io/channel-command.h" #include "io-channel-helpers.h" #include "qapi/error.h" @@ -33,25 +34,26 @@ static char *socat = NULL; static void test_io_channel_command_fifo(bool async) { + g_autofree gchar *tmpdir = g_dir_make_tmp("qemu-test-io-channel.XXXXXX", NULL); + g_autofree gchar *fifo = g_strdup_printf("%s/%s", tmpdir, TEST_FIFO); + g_autoptr(GString) srcargs = g_string_new(socat); + g_autoptr(GString) dstargs = g_string_new(socat); + g_auto(GStrv) srcargv; + g_auto(GStrv) dstargv; QIOChannel *src, *dst; QIOChannelTest *test; - const char *srcargv[] = { - socat, "-", SOCAT_SRC, NULL, - }; - const char *dstargv[] = { - socat, SOCAT_DST, "-", NULL, - }; - if (!socat) { - g_test_skip("socat is not found in PATH"); - return; - } + g_string_append_printf(srcargs, " - PIPE:%s,wronly,creat=1,excl=1", fifo); + g_string_append_printf(dstargs, " PIPE:%s,rdonly,creat=0 -", fifo); + + srcargv = g_strsplit(srcargs->str, " ", -1); + dstargv = g_strsplit(dstargs->str, " ", -1); - unlink(TEST_FIFO); - src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv, + src = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) srcargv, O_WRONLY, &error_abort)); - dst = QIO_CHANNEL(qio_channel_command_new_spawn(dstargv, + + dst = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) dstargv, O_RDONLY, &error_abort)); @@ -62,17 +64,27 @@ static void test_io_channel_command_fifo(bool async) object_unref(OBJECT(src)); object_unref(OBJECT(dst)); - unlink(TEST_FIFO); + g_rmdir(tmpdir); } static void test_io_channel_command_fifo_async(void) { + if (!socat) { + g_test_skip("socat is not found in PATH"); + return; + } + test_io_channel_command_fifo(true); } static void test_io_channel_command_fifo_sync(void) { + if (!socat) { + g_test_skip("socat is not found in PATH"); + return; + } + test_io_channel_command_fifo(false); } --8<---------------cut here---------------end--------------->8--- > > Thomas
On Thu, Oct 27, 2022 at 02:59:04PM +0100, Alex BennΓ©e wrote: > > Thomas Huth <thuth@redhat.com> writes: > > > On 26/10/2022 18.18, Alex BennΓ©e wrote: > >> Daniel P. BerrangΓ© <berrange@redhat.com> writes: > >> > >>> CC'ing Marc-AndrΓ© as original author of the change > >>> > >>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote: > >>>> > >>>> Juan Quintela <quintela@redhat.com> writes: > >>>> > >>>>> Previous commit removed the creation of the fifo. Without it, I get > >>>>> random failure during tests with high load, please consider > >>>>> reintroduce it. > >>>>> > >>>>> My guess is that there is a race between the two socats when we leave > >>>>> them to create the channel, better return to the previous behavior. > >>>>> > >>>>> I can't reproduce the problem when I run ./test-io-channel-command > >>>>> test alone, I need to do the make check. And any (unrelated) change > >>>>> can make it dissapear. > >>>> > >>>> I was chasing a similar problem with this test although I don't see it > >>>> timeout while running (I don't think our unit tests time out). I'm > >>>> provisionally queuing this to testing/next unless anyone objects. > >>> > >>> It won't build on Win32 since that platform lacks mkfifo. > >>> > >>> The test normally works since socat will call mknod to create > >>> the fifo. > >>> > >>> I think the problem is that we have a race condition where the > >>> client socat runs before the server socat, and so won't see the > >>> fifo. This will be where high load triggers problems. > >> Ok I shall drop the patch from testing/next - we need a better > >> solution. > > > > Could we maybe at least revert the patch that introduced the problem? > > ... the failing test is annoying ... > > I'm trying to fix it now but I haven't been able to get one of the > socat's to not race with the other: My other thought was to run a socat first, with 'unlink-close=0' and make it exit immediately, simply to get the fifo created. I'm not sure how to make ti exit immediately though, with a guarantee that its had a chance to create the fifo. With regards, Daniel
diff --git a/tests/unit/test-io-channel-command.c b/tests/unit/test-io-channel-command.c index 7eee939c07..7e75f960f4 100644 --- a/tests/unit/test-io-channel-command.c +++ b/tests/unit/test-io-channel-command.c @@ -48,6 +48,9 @@ static void test_io_channel_command_fifo(bool async) } unlink(TEST_FIFO); + if (mkfifo(TEST_FIFO, 0600) < 0) { + abort(); + } src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv, O_WRONLY, &error_abort));