diff mbox series

tests: Create fifo for test-io-channel-command

Message ID 20221025105520.3016-1-quintela@redhat.com
State New
Headers show
Series tests: Create fifo for test-io-channel-command | expand

Commit Message

Juan Quintela Oct. 25, 2022, 10:55 a.m. UTC
Previous commit removed the creation of the fifo.  Without it, I get
random failure during tests with high load, please consider
reintroduce it.

My guess is that there is a race between the two socats when we leave
them to create the channel, better return to the previous behavior.

I can't reproduce the problem when I run ./test-io-channel-command
test alone, I need to do the make check.  And any (unrelated) change
can make it dissapear.

commit 76f5148c21b4543e62a6ad605ac4b44133421401
Author: Marc-AndrΓ© Lureau <marcandre.lureau@redhat.com>
Date:   Thu Oct 6 15:36:57 2022 +0400

    tests/unit: make test-io-channel-command work on win32

    This has been tested under msys2 & windows 11. I haven't tried to make
    it work with other environments yet, but that should be enough to
    validate the channel-command implementation anyway.

    Here are the changes:
    - drop tests/ from fifo/pipe path, to avoid directory issues
    - use g_find_program() to lookup the socat executable (otherwise we
    would need to change ChanneCommand to use G_SPAWN_SEARCH_PATH, and deal
    with missing socat differently)
    - skip the "echo" test when socat is missing as well

    Signed-off-by: Marc-AndrΓ© Lureau <marcandre.lureau@redhat.com>
    Reviewed-by: Daniel P. BerrangΓ© <berrange@redhat.com>
    Message-Id: <20221006113657.2656108-7-marcandre.lureau@redhat.com>

Failure:

[178/178] πŸŒ“ qemu:unit / test-io-channel-command
[178/178] πŸŒ” qemu:unit / test-io-channel-command
[178/178] πŸŒ• qemu:unit / test-io-channel-command
[178/178] πŸŒ– qemu:unit / test-io-channel-command
[178/178] πŸŒ— qemu:unit / test-io-channel-command
[178/178] 🌘 qemu:unit / test-io-channel-command
[178/178] πŸŒ‘ qemu:unit / test-io-channel-command
[178/178] πŸŒ’ qemu:unit / test-io-channel-command
[178/178] πŸŒ“ qemu:unit / test-io-channel-command
^CWARNING: Received SIGTERM, exiting
178/178 qemu:unit / test-io-channel-command                                    INTERRUPT      1127.75s   killed by signal 15 SIGTERM
>>> MALLOC_PERTURB_=149 G_TEST_BUILDDIR=/scratch/qemu/multifd/x64/tests/unit G_TEST_SRCDIR=/mnt/code/qemu/multifd/tests/unit /scratch/qemu/multifd/x64/tests/unit/test-io-channel-command --tap -k
――――――――――――――――――――――――――――――――――――― βœ€  ―――――――――――――――――――――――――――――――――――――
stderr:
2022/10/25 12:32:48 socat[463140] E mkfifo(test-io-channel-command.fifo, 438): File exists

TAP parsing error: Too few tests run (expected 4, got 0)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

Summary of Failures:

178/178 qemu:unit / test-io-channel-command                           INTERRUPT      1127.75s   killed by signal 15 SIGTERM

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 tests/unit/test-io-channel-command.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Alex BennΓ©e Oct. 25, 2022, 12:57 p.m. UTC | #1
Juan Quintela <quintela@redhat.com> writes:

> Previous commit removed the creation of the fifo.  Without it, I get
> random failure during tests with high load, please consider
> reintroduce it.
>
> My guess is that there is a race between the two socats when we leave
> them to create the channel, better return to the previous behavior.
>
> I can't reproduce the problem when I run ./test-io-channel-command
> test alone, I need to do the make check.  And any (unrelated) change
> can make it dissapear.

I was chasing a similar problem with this test although I don't see it
timeout while running (I don't think our unit tests time out). I'm
provisionally queuing this to testing/next unless anyone objects.

>
> commit 76f5148c21b4543e62a6ad605ac4b44133421401
> Author: Marc-AndrΓ© Lureau <marcandre.lureau@redhat.com>
> Date:   Thu Oct 6 15:36:57 2022 +0400
>
>     tests/unit: make test-io-channel-command work on win32
>
>     This has been tested under msys2 & windows 11. I haven't tried to make
>     it work with other environments yet, but that should be enough to
>     validate the channel-command implementation anyway.
>
>     Here are the changes:
>     - drop tests/ from fifo/pipe path, to avoid directory issues
>     - use g_find_program() to lookup the socat executable (otherwise we
>     would need to change ChanneCommand to use G_SPAWN_SEARCH_PATH, and deal
>     with missing socat differently)
>     - skip the "echo" test when socat is missing as well
>
>     Signed-off-by: Marc-AndrΓ© Lureau <marcandre.lureau@redhat.com>
>     Reviewed-by: Daniel P. BerrangΓ© <berrange@redhat.com>
>     Message-Id: <20221006113657.2656108-7-marcandre.lureau@redhat.com>
>
> Failure:
>
> [178/178] πŸŒ“ qemu:unit / test-io-channel-command
> [178/178] πŸŒ” qemu:unit / test-io-channel-command
> [178/178] πŸŒ• qemu:unit / test-io-channel-command
> [178/178] πŸŒ– qemu:unit / test-io-channel-command
> [178/178] πŸŒ— qemu:unit / test-io-channel-command
> [178/178] 🌘 qemu:unit / test-io-channel-command
> [178/178] πŸŒ‘ qemu:unit / test-io-channel-command
> [178/178] πŸŒ’ qemu:unit / test-io-channel-command
> [178/178] πŸŒ“ qemu:unit / test-io-channel-command
> ^CWARNING: Received SIGTERM, exiting
> 178/178 qemu:unit / test-io-channel-command                                    INTERRUPT      1127.75s   killed by signal 15 SIGTERM
>>>> MALLOC_PERTURB_=149 G_TEST_BUILDDIR=/scratch/qemu/multifd/x64/tests/unit G_TEST_SRCDIR=/mnt/code/qemu/multifd/tests/unit /scratch/qemu/multifd/x64/tests/unit/test-io-channel-command --tap -k
> ――――――――――――――――――――――――――――――――――――― βœ€  ―――――――――――――――――――――――――――――――――――――
> stderr:
> 2022/10/25 12:32:48 socat[463140] E mkfifo(test-io-channel-command.fifo, 438): File exists
>
> TAP parsing error: Too few tests run (expected 4, got 0)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> Summary of Failures:
>
> 178/178 qemu:unit / test-io-channel-command                           INTERRUPT      1127.75s   killed by signal 15 SIGTERM
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  tests/unit/test-io-channel-command.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/tests/unit/test-io-channel-command.c b/tests/unit/test-io-channel-command.c
> index 7eee939c07..7e75f960f4 100644
> --- a/tests/unit/test-io-channel-command.c
> +++ b/tests/unit/test-io-channel-command.c
> @@ -48,6 +48,9 @@ static void test_io_channel_command_fifo(bool async)
>      }
>  
>      unlink(TEST_FIFO);
> +    if (mkfifo(TEST_FIFO, 0600) < 0) {
> +        abort();
> +    }
>      src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv,
>                                                      O_WRONLY,
>                                                      &error_abort));
Daniel P. BerrangΓ© Oct. 25, 2022, 1:08 p.m. UTC | #2
CC'ing Marc-AndrΓ© as original author of the change

On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote:
> 
> Juan Quintela <quintela@redhat.com> writes:
> 
> > Previous commit removed the creation of the fifo.  Without it, I get
> > random failure during tests with high load, please consider
> > reintroduce it.
> >
> > My guess is that there is a race between the two socats when we leave
> > them to create the channel, better return to the previous behavior.
> >
> > I can't reproduce the problem when I run ./test-io-channel-command
> > test alone, I need to do the make check.  And any (unrelated) change
> > can make it dissapear.
> 
> I was chasing a similar problem with this test although I don't see it
> timeout while running (I don't think our unit tests time out). I'm
> provisionally queuing this to testing/next unless anyone objects.

It won't build on Win32 since that platform lacks mkfifo.

The test normally works since socat will call mknod to create
the fifo.

I think the problem is that we have a race condition where the
client socat runs before the server socat, and so won't see the
fifo. This will be where high load triggers problems.


With regards,
Daniel
Alex BennΓ©e Oct. 26, 2022, 4:18 p.m. UTC | #3
Daniel P. BerrangΓ© <berrange@redhat.com> writes:

> CC'ing Marc-AndrΓ© as original author of the change
>
> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote:
>> 
>> Juan Quintela <quintela@redhat.com> writes:
>> 
>> > Previous commit removed the creation of the fifo.  Without it, I get
>> > random failure during tests with high load, please consider
>> > reintroduce it.
>> >
>> > My guess is that there is a race between the two socats when we leave
>> > them to create the channel, better return to the previous behavior.
>> >
>> > I can't reproduce the problem when I run ./test-io-channel-command
>> > test alone, I need to do the make check.  And any (unrelated) change
>> > can make it dissapear.
>> 
>> I was chasing a similar problem with this test although I don't see it
>> timeout while running (I don't think our unit tests time out). I'm
>> provisionally queuing this to testing/next unless anyone objects.
>
> It won't build on Win32 since that platform lacks mkfifo.
>
> The test normally works since socat will call mknod to create
> the fifo.
>
> I think the problem is that we have a race condition where the
> client socat runs before the server socat, and so won't see the
> fifo. This will be where high load triggers problems.

Ok I shall drop the patch from testing/next - we need a better solution.
Thomas Huth Oct. 27, 2022, 12:25 p.m. UTC | #4
On 26/10/2022 18.18, Alex BennΓ©e wrote:
> 
> Daniel P. BerrangΓ© <berrange@redhat.com> writes:
> 
>> CC'ing Marc-AndrΓ© as original author of the change
>>
>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote:
>>>
>>> Juan Quintela <quintela@redhat.com> writes:
>>>
>>>> Previous commit removed the creation of the fifo.  Without it, I get
>>>> random failure during tests with high load, please consider
>>>> reintroduce it.
>>>>
>>>> My guess is that there is a race between the two socats when we leave
>>>> them to create the channel, better return to the previous behavior.
>>>>
>>>> I can't reproduce the problem when I run ./test-io-channel-command
>>>> test alone, I need to do the make check.  And any (unrelated) change
>>>> can make it dissapear.
>>>
>>> I was chasing a similar problem with this test although I don't see it
>>> timeout while running (I don't think our unit tests time out). I'm
>>> provisionally queuing this to testing/next unless anyone objects.
>>
>> It won't build on Win32 since that platform lacks mkfifo.
>>
>> The test normally works since socat will call mknod to create
>> the fifo.
>>
>> I think the problem is that we have a race condition where the
>> client socat runs before the server socat, and so won't see the
>> fifo. This will be where high load triggers problems.
> 
> Ok I shall drop the patch from testing/next - we need a better solution.

Could we maybe at least revert the patch that introduced the problem? ... 
the failing test is annoying ...

  Thomas
Alex BennΓ©e Oct. 27, 2022, 1:59 p.m. UTC | #5
Thomas Huth <thuth@redhat.com> writes:

> On 26/10/2022 18.18, Alex BennΓ©e wrote:
>> Daniel P. BerrangΓ© <berrange@redhat.com> writes:
>> 
>>> CC'ing Marc-AndrΓ© as original author of the change
>>>
>>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote:
>>>>
>>>> Juan Quintela <quintela@redhat.com> writes:
>>>>
>>>>> Previous commit removed the creation of the fifo.  Without it, I get
>>>>> random failure during tests with high load, please consider
>>>>> reintroduce it.
>>>>>
>>>>> My guess is that there is a race between the two socats when we leave
>>>>> them to create the channel, better return to the previous behavior.
>>>>>
>>>>> I can't reproduce the problem when I run ./test-io-channel-command
>>>>> test alone, I need to do the make check.  And any (unrelated) change
>>>>> can make it dissapear.
>>>>
>>>> I was chasing a similar problem with this test although I don't see it
>>>> timeout while running (I don't think our unit tests time out). I'm
>>>> provisionally queuing this to testing/next unless anyone objects.
>>>
>>> It won't build on Win32 since that platform lacks mkfifo.
>>>
>>> The test normally works since socat will call mknod to create
>>> the fifo.
>>>
>>> I think the problem is that we have a race condition where the
>>> client socat runs before the server socat, and so won't see the
>>> fifo. This will be where high load triggers problems.
>> Ok I shall drop the patch from testing/next - we need a better
>> solution.
>
> Could we maybe at least revert the patch that introduced the problem?
> ... the failing test is annoying ...

I'm trying to fix it now but I haven't been able to get one of the
socat's to not race with the other:

--8<---------------cut here---------------start------------->8---
modified   tests/unit/test-io-channel-command.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include <glib/gstdio.h>
 #include "io/channel-command.h"
 #include "io-channel-helpers.h"
 #include "qapi/error.h"
@@ -33,25 +34,26 @@ static char *socat = NULL;
 
 static void test_io_channel_command_fifo(bool async)
 {
+    g_autofree gchar *tmpdir = g_dir_make_tmp("qemu-test-io-channel.XXXXXX", NULL);
+    g_autofree gchar *fifo = g_strdup_printf("%s/%s", tmpdir, TEST_FIFO);
+    g_autoptr(GString) srcargs = g_string_new(socat);
+    g_autoptr(GString) dstargs = g_string_new(socat);
+    g_auto(GStrv) srcargv;
+    g_auto(GStrv) dstargv;
     QIOChannel *src, *dst;
     QIOChannelTest *test;
-    const char *srcargv[] = {
-        socat, "-", SOCAT_SRC, NULL,
-    };
-    const char *dstargv[] = {
-        socat, SOCAT_DST, "-", NULL,
-    };
 
-    if (!socat) {
-        g_test_skip("socat is not found in PATH");
-        return;
-    }
+    g_string_append_printf(srcargs, " - PIPE:%s,wronly,creat=1,excl=1", fifo);
+    g_string_append_printf(dstargs, " PIPE:%s,rdonly,creat=0 -", fifo);
+
+    srcargv = g_strsplit(srcargs->str, " ", -1);
+    dstargv = g_strsplit(dstargs->str, " ", -1);
 
-    unlink(TEST_FIFO);
-    src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv,
+    src = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) srcargv,
                                                     O_WRONLY,
                                                     &error_abort));
-    dst = QIO_CHANNEL(qio_channel_command_new_spawn(dstargv,
+
+    dst = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) dstargv,
                                                     O_RDONLY,
                                                     &error_abort));
 
@@ -62,17 +64,27 @@ static void test_io_channel_command_fifo(bool async)
     object_unref(OBJECT(src));
     object_unref(OBJECT(dst));
 
-    unlink(TEST_FIFO);
+    g_rmdir(tmpdir);
 }
 
 
 static void test_io_channel_command_fifo_async(void)
 {
+    if (!socat) {
+        g_test_skip("socat is not found in PATH");
+        return;
+    }
+
     test_io_channel_command_fifo(true);
 }
 
 static void test_io_channel_command_fifo_sync(void)
 {
+    if (!socat) {
+        g_test_skip("socat is not found in PATH");
+        return;
+    }
+
     test_io_channel_command_fifo(false);
 }
--8<---------------cut here---------------end--------------->8---


>
>  Thomas
Daniel P. BerrangΓ© Oct. 27, 2022, 2:26 p.m. UTC | #6
On Thu, Oct 27, 2022 at 02:59:04PM +0100, Alex BennΓ©e wrote:
> 
> Thomas Huth <thuth@redhat.com> writes:
> 
> > On 26/10/2022 18.18, Alex BennΓ©e wrote:
> >> Daniel P. BerrangΓ© <berrange@redhat.com> writes:
> >> 
> >>> CC'ing Marc-AndrΓ© as original author of the change
> >>>
> >>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex BennΓ©e wrote:
> >>>>
> >>>> Juan Quintela <quintela@redhat.com> writes:
> >>>>
> >>>>> Previous commit removed the creation of the fifo.  Without it, I get
> >>>>> random failure during tests with high load, please consider
> >>>>> reintroduce it.
> >>>>>
> >>>>> My guess is that there is a race between the two socats when we leave
> >>>>> them to create the channel, better return to the previous behavior.
> >>>>>
> >>>>> I can't reproduce the problem when I run ./test-io-channel-command
> >>>>> test alone, I need to do the make check.  And any (unrelated) change
> >>>>> can make it dissapear.
> >>>>
> >>>> I was chasing a similar problem with this test although I don't see it
> >>>> timeout while running (I don't think our unit tests time out). I'm
> >>>> provisionally queuing this to testing/next unless anyone objects.
> >>>
> >>> It won't build on Win32 since that platform lacks mkfifo.
> >>>
> >>> The test normally works since socat will call mknod to create
> >>> the fifo.
> >>>
> >>> I think the problem is that we have a race condition where the
> >>> client socat runs before the server socat, and so won't see the
> >>> fifo. This will be where high load triggers problems.
> >> Ok I shall drop the patch from testing/next - we need a better
> >> solution.
> >
> > Could we maybe at least revert the patch that introduced the problem?
> > ... the failing test is annoying ...
> 
> I'm trying to fix it now but I haven't been able to get one of the
> socat's to not race with the other:

My other thought was to run a socat first, with 'unlink-close=0' and
make it exit immediately, simply to get the fifo created.  I'm not
sure how to make ti exit immediately though, with a guarantee that
its had a chance to create the fifo.


With regards,
Daniel
diff mbox series

Patch

diff --git a/tests/unit/test-io-channel-command.c b/tests/unit/test-io-channel-command.c
index 7eee939c07..7e75f960f4 100644
--- a/tests/unit/test-io-channel-command.c
+++ b/tests/unit/test-io-channel-command.c
@@ -48,6 +48,9 @@  static void test_io_channel_command_fifo(bool async)
     }
 
     unlink(TEST_FIFO);
+    if (mkfifo(TEST_FIFO, 0600) < 0) {
+        abort();
+    }
     src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv,
                                                     O_WRONLY,
                                                     &error_abort));