tests: Create fifo for test-io-channel-command

Message ID	20221025105520.3016-1-quintela@redhat.com
State	New
Headers	show Return-Path: <qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org> From: Juan Quintela <quintela@redhat.com> To: qemu-devel@nongnu.org Cc: =?utf-8?q?Daniel_P=2E_Berrang=C3=A9?= <berrange@redhat.com>, Juan Quintela <quintela@redhat.com> Subject: [PATCH] tests: Create fifo for test-io-channel-command Date: Tue, 25 Oct 2022 12:55:20 +0200 Message-Id: <20221025105520.3016-1-quintela@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Received-SPF: pass client-ip=170.10.133.124; envelope-from=quintela@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -25 X-Spam_score: -2.6 X-Spam_bar: -- X-Spam_report: (-2.6 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.517, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Sender: "Qemu-devel" <qemu-devel-bounces@nongnu.org> Errors-To: qemu-devel-bounces+incoming=patchwork.ozlabs.org@nongnu.org
Series	tests: Create fifo for test-io-channel-command \| expand tests: Create fifo for test-io-channel-command

Juan Quintela Oct. 25, 2022, 10:55 a.m. UTC

Previous commit removed the creation of the fifo.  Without it, I get
random failure during tests with high load, please consider
reintroduce it.

My guess is that there is a race between the two socats when we leave
them to create the channel, better return to the previous behavior.

I can't reproduce the problem when I run ./test-io-channel-command
test alone, I need to do the make check.  And any (unrelated) change
can make it dissapear.

commit 76f5148c21b4543e62a6ad605ac4b44133421401
Author: Marc-André Lureau <marcandre.lureau@redhat.com>
Date:   Thu Oct 6 15:36:57 2022 +0400

    tests/unit: make test-io-channel-command work on win32

    This has been tested under msys2 & windows 11. I haven't tried to make
    it work with other environments yet, but that should be enough to
    validate the channel-command implementation anyway.

    Here are the changes:
    - drop tests/ from fifo/pipe path, to avoid directory issues
    - use g_find_program() to lookup the socat executable (otherwise we
    would need to change ChanneCommand to use G_SPAWN_SEARCH_PATH, and deal
    with missing socat differently)
    - skip the "echo" test when socat is missing as well

    Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
    Message-Id: <20221006113657.2656108-7-marcandre.lureau@redhat.com>

Failure:

[178/178] 🌓 qemu:unit / test-io-channel-command
[178/178] 🌔 qemu:unit / test-io-channel-command
[178/178] 🌕 qemu:unit / test-io-channel-command
[178/178] 🌖 qemu:unit / test-io-channel-command
[178/178] 🌗 qemu:unit / test-io-channel-command
[178/178] 🌘 qemu:unit / test-io-channel-command
[178/178] 🌑 qemu:unit / test-io-channel-command
[178/178] 🌒 qemu:unit / test-io-channel-command
[178/178] 🌓 qemu:unit / test-io-channel-command
^CWARNING: Received SIGTERM, exiting
178/178 qemu:unit / test-io-channel-command                                    INTERRUPT      1127.75s   killed by signal 15 SIGTERM
>>> MALLOC_PERTURB_=149 G_TEST_BUILDDIR=/scratch/qemu/multifd/x64/tests/unit G_TEST_SRCDIR=/mnt/code/qemu/multifd/tests/unit /scratch/qemu/multifd/x64/tests/unit/test-io-channel-command --tap -k
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
2022/10/25 12:32:48 socat[463140] E mkfifo(test-io-channel-command.fifo, 438): File exists

TAP parsing error: Too few tests run (expected 4, got 0)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

Summary of Failures:

178/178 qemu:unit / test-io-channel-command                           INTERRUPT      1127.75s   killed by signal 15 SIGTERM

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 tests/unit/test-io-channel-command.c | 3 +++
 1 file changed, 3 insertions(+)

Alex Bennée Oct. 25, 2022, 12:57 p.m. UTC | #1

Juan Quintela <quintela@redhat.com> writes:

> Previous commit removed the creation of the fifo.  Without it, I get
> random failure during tests with high load, please consider
> reintroduce it.
>
> My guess is that there is a race between the two socats when we leave
> them to create the channel, better return to the previous behavior.
>
> I can't reproduce the problem when I run ./test-io-channel-command
> test alone, I need to do the make check.  And any (unrelated) change
> can make it dissapear.

I was chasing a similar problem with this test although I don't see it
timeout while running (I don't think our unit tests time out). I'm
provisionally queuing this to testing/next unless anyone objects.

>
> commit 76f5148c21b4543e62a6ad605ac4b44133421401
> Author: Marc-André Lureau <marcandre.lureau@redhat.com>
> Date:   Thu Oct 6 15:36:57 2022 +0400
>
>     tests/unit: make test-io-channel-command work on win32
>
>     This has been tested under msys2 & windows 11. I haven't tried to make
>     it work with other environments yet, but that should be enough to
>     validate the channel-command implementation anyway.
>
>     Here are the changes:
>     - drop tests/ from fifo/pipe path, to avoid directory issues
>     - use g_find_program() to lookup the socat executable (otherwise we
>     would need to change ChanneCommand to use G_SPAWN_SEARCH_PATH, and deal
>     with missing socat differently)
>     - skip the "echo" test when socat is missing as well
>
>     Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>     Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
>     Message-Id: <20221006113657.2656108-7-marcandre.lureau@redhat.com>
>
> Failure:
>
> [178/178] 🌓 qemu:unit / test-io-channel-command
> [178/178] 🌔 qemu:unit / test-io-channel-command
> [178/178] 🌕 qemu:unit / test-io-channel-command
> [178/178] 🌖 qemu:unit / test-io-channel-command
> [178/178] 🌗 qemu:unit / test-io-channel-command
> [178/178] 🌘 qemu:unit / test-io-channel-command
> [178/178] 🌑 qemu:unit / test-io-channel-command
> [178/178] 🌒 qemu:unit / test-io-channel-command
> [178/178] 🌓 qemu:unit / test-io-channel-command
> ^CWARNING: Received SIGTERM, exiting
> 178/178 qemu:unit / test-io-channel-command                                    INTERRUPT      1127.75s   killed by signal 15 SIGTERM
>>>> MALLOC_PERTURB_=149 G_TEST_BUILDDIR=/scratch/qemu/multifd/x64/tests/unit G_TEST_SRCDIR=/mnt/code/qemu/multifd/tests/unit /scratch/qemu/multifd/x64/tests/unit/test-io-channel-command --tap -k
> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> stderr:
> 2022/10/25 12:32:48 socat[463140] E mkfifo(test-io-channel-command.fifo, 438): File exists
>
> TAP parsing error: Too few tests run (expected 4, got 0)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> Summary of Failures:
>
> 178/178 qemu:unit / test-io-channel-command                           INTERRUPT      1127.75s   killed by signal 15 SIGTERM
>
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  tests/unit/test-io-channel-command.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/tests/unit/test-io-channel-command.c b/tests/unit/test-io-channel-command.c
> index 7eee939c07..7e75f960f4 100644
> --- a/tests/unit/test-io-channel-command.c
> +++ b/tests/unit/test-io-channel-command.c
> @@ -48,6 +48,9 @@ static void test_io_channel_command_fifo(bool async)
>      }
>  
>      unlink(TEST_FIFO);
> +    if (mkfifo(TEST_FIFO, 0600) < 0) {
> +        abort();
> +    }
>      src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv,
>                                                      O_WRONLY,
>                                                      &error_abort));

Daniel P. Berrangé Oct. 25, 2022, 1:08 p.m. UTC | #2

CC'ing Marc-André as original author of the change

On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote:
> 
> Juan Quintela <quintela@redhat.com> writes:
> 
> > Previous commit removed the creation of the fifo.  Without it, I get
> > random failure during tests with high load, please consider
> > reintroduce it.
> >
> > My guess is that there is a race between the two socats when we leave
> > them to create the channel, better return to the previous behavior.
> >
> > I can't reproduce the problem when I run ./test-io-channel-command
> > test alone, I need to do the make check.  And any (unrelated) change
> > can make it dissapear.
> 
> I was chasing a similar problem with this test although I don't see it
> timeout while running (I don't think our unit tests time out). I'm
> provisionally queuing this to testing/next unless anyone objects.

It won't build on Win32 since that platform lacks mkfifo.

The test normally works since socat will call mknod to create
the fifo.

I think the problem is that we have a race condition where the
client socat runs before the server socat, and so won't see the
fifo. This will be where high load triggers problems.


With regards,
Daniel

Alex Bennée Oct. 26, 2022, 4:18 p.m. UTC | #3

Daniel P. Berrangé <berrange@redhat.com> writes:

> CC'ing Marc-André as original author of the change
>
> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote:
>> 
>> Juan Quintela <quintela@redhat.com> writes:
>> 
>> > Previous commit removed the creation of the fifo.  Without it, I get
>> > random failure during tests with high load, please consider
>> > reintroduce it.
>> >
>> > My guess is that there is a race between the two socats when we leave
>> > them to create the channel, better return to the previous behavior.
>> >
>> > I can't reproduce the problem when I run ./test-io-channel-command
>> > test alone, I need to do the make check.  And any (unrelated) change
>> > can make it dissapear.
>> 
>> I was chasing a similar problem with this test although I don't see it
>> timeout while running (I don't think our unit tests time out). I'm
>> provisionally queuing this to testing/next unless anyone objects.
>
> It won't build on Win32 since that platform lacks mkfifo.
>
> The test normally works since socat will call mknod to create
> the fifo.
>
> I think the problem is that we have a race condition where the
> client socat runs before the server socat, and so won't see the
> fifo. This will be where high load triggers problems.

Ok I shall drop the patch from testing/next - we need a better solution.

Thomas Huth Oct. 27, 2022, 12:25 p.m. UTC | #4

On 26/10/2022 18.18, Alex Bennée wrote:
> 
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
>> CC'ing Marc-André as original author of the change
>>
>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote:
>>>
>>> Juan Quintela <quintela@redhat.com> writes:
>>>
>>>> Previous commit removed the creation of the fifo.  Without it, I get
>>>> random failure during tests with high load, please consider
>>>> reintroduce it.
>>>>
>>>> My guess is that there is a race between the two socats when we leave
>>>> them to create the channel, better return to the previous behavior.
>>>>
>>>> I can't reproduce the problem when I run ./test-io-channel-command
>>>> test alone, I need to do the make check.  And any (unrelated) change
>>>> can make it dissapear.
>>>
>>> I was chasing a similar problem with this test although I don't see it
>>> timeout while running (I don't think our unit tests time out). I'm
>>> provisionally queuing this to testing/next unless anyone objects.
>>
>> It won't build on Win32 since that platform lacks mkfifo.
>>
>> The test normally works since socat will call mknod to create
>> the fifo.
>>
>> I think the problem is that we have a race condition where the
>> client socat runs before the server socat, and so won't see the
>> fifo. This will be where high load triggers problems.
> 
> Ok I shall drop the patch from testing/next - we need a better solution.

Could we maybe at least revert the patch that introduced the problem? ... 
the failing test is annoying ...

  Thomas

Alex Bennée Oct. 27, 2022, 1:59 p.m. UTC | #5

Thomas Huth <thuth@redhat.com> writes:

> On 26/10/2022 18.18, Alex Bennée wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>>> CC'ing Marc-André as original author of the change
>>>
>>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote:
>>>>
>>>> Juan Quintela <quintela@redhat.com> writes:
>>>>
>>>>> Previous commit removed the creation of the fifo.  Without it, I get
>>>>> random failure during tests with high load, please consider
>>>>> reintroduce it.
>>>>>
>>>>> My guess is that there is a race between the two socats when we leave
>>>>> them to create the channel, better return to the previous behavior.
>>>>>
>>>>> I can't reproduce the problem when I run ./test-io-channel-command
>>>>> test alone, I need to do the make check.  And any (unrelated) change
>>>>> can make it dissapear.
>>>>
>>>> I was chasing a similar problem with this test although I don't see it
>>>> timeout while running (I don't think our unit tests time out). I'm
>>>> provisionally queuing this to testing/next unless anyone objects.
>>>
>>> It won't build on Win32 since that platform lacks mkfifo.
>>>
>>> The test normally works since socat will call mknod to create
>>> the fifo.
>>>
>>> I think the problem is that we have a race condition where the
>>> client socat runs before the server socat, and so won't see the
>>> fifo. This will be where high load triggers problems.
>> Ok I shall drop the patch from testing/next - we need a better
>> solution.
>
> Could we maybe at least revert the patch that introduced the problem?
> ... the failing test is annoying ...

I'm trying to fix it now but I haven't been able to get one of the
socat's to not race with the other:

--8<---------------cut here---------------start------------->8---
modified   tests/unit/test-io-channel-command.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include <glib/gstdio.h>
 #include "io/channel-command.h"
 #include "io-channel-helpers.h"
 #include "qapi/error.h"
@@ -33,25 +34,26 @@ static char *socat = NULL;
 
 static void test_io_channel_command_fifo(bool async)
 {
+    g_autofree gchar *tmpdir = g_dir_make_tmp("qemu-test-io-channel.XXXXXX", NULL);
+    g_autofree gchar *fifo = g_strdup_printf("%s/%s", tmpdir, TEST_FIFO);
+    g_autoptr(GString) srcargs = g_string_new(socat);
+    g_autoptr(GString) dstargs = g_string_new(socat);
+    g_auto(GStrv) srcargv;
+    g_auto(GStrv) dstargv;
     QIOChannel *src, *dst;
     QIOChannelTest *test;
-    const char *srcargv[] = {
-        socat, "-", SOCAT_SRC, NULL,
-    };
-    const char *dstargv[] = {
-        socat, SOCAT_DST, "-", NULL,
-    };
 
-    if (!socat) {
-        g_test_skip("socat is not found in PATH");
-        return;
-    }
+    g_string_append_printf(srcargs, " - PIPE:%s,wronly,creat=1,excl=1", fifo);
+    g_string_append_printf(dstargs, " PIPE:%s,rdonly,creat=0 -", fifo);
+
+    srcargv = g_strsplit(srcargs->str, " ", -1);
+    dstargv = g_strsplit(dstargs->str, " ", -1);
 
-    unlink(TEST_FIFO);
-    src = QIO_CHANNEL(qio_channel_command_new_spawn(srcargv,
+    src = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) srcargv,
                                                     O_WRONLY,
                                                     &error_abort));
-    dst = QIO_CHANNEL(qio_channel_command_new_spawn(dstargv,
+
+    dst = QIO_CHANNEL(qio_channel_command_new_spawn((const char**) dstargv,
                                                     O_RDONLY,
                                                     &error_abort));
 
@@ -62,17 +64,27 @@ static void test_io_channel_command_fifo(bool async)
     object_unref(OBJECT(src));
     object_unref(OBJECT(dst));
 
-    unlink(TEST_FIFO);
+    g_rmdir(tmpdir);
 }
 
 
 static void test_io_channel_command_fifo_async(void)
 {
+    if (!socat) {
+        g_test_skip("socat is not found in PATH");
+        return;
+    }
+
     test_io_channel_command_fifo(true);
 }
 
 static void test_io_channel_command_fifo_sync(void)
 {
+    if (!socat) {
+        g_test_skip("socat is not found in PATH");
+        return;
+    }
+
     test_io_channel_command_fifo(false);
 }
--8<---------------cut here---------------end--------------->8---


>
>  Thomas

Daniel P. Berrangé Oct. 27, 2022, 2:26 p.m. UTC | #6

On Thu, Oct 27, 2022 at 02:59:04PM +0100, Alex Bennée wrote:
> 
> Thomas Huth <thuth@redhat.com> writes:
> 
> > On 26/10/2022 18.18, Alex Bennée wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> 
> >>> CC'ing Marc-André as original author of the change
> >>>
> >>> On Tue, Oct 25, 2022 at 01:57:23PM +0100, Alex Bennée wrote:
> >>>>
> >>>> Juan Quintela <quintela@redhat.com> writes:
> >>>>
> >>>>> Previous commit removed the creation of the fifo.  Without it, I get
> >>>>> random failure during tests with high load, please consider
> >>>>> reintroduce it.
> >>>>>
> >>>>> My guess is that there is a race between the two socats when we leave
> >>>>> them to create the channel, better return to the previous behavior.
> >>>>>
> >>>>> I can't reproduce the problem when I run ./test-io-channel-command
> >>>>> test alone, I need to do the make check.  And any (unrelated) change
> >>>>> can make it dissapear.
> >>>>
> >>>> I was chasing a similar problem with this test although I don't see it
> >>>> timeout while running (I don't think our unit tests time out). I'm
> >>>> provisionally queuing this to testing/next unless anyone objects.
> >>>
> >>> It won't build on Win32 since that platform lacks mkfifo.
> >>>
> >>> The test normally works since socat will call mknod to create
> >>> the fifo.
> >>>
> >>> I think the problem is that we have a race condition where the
> >>> client socat runs before the server socat, and so won't see the
> >>> fifo. This will be where high load triggers problems.
> >> Ok I shall drop the patch from testing/next - we need a better
> >> solution.
> >
> > Could we maybe at least revert the patch that introduced the problem?
> > ... the failing test is annoying ...
> 
> I'm trying to fix it now but I haven't been able to get one of the
> socat's to not race with the other:

My other thought was to run a socat first, with 'unlink-close=0' and
make it exit immediately, simply to get the fifo created.  I'm not
sure how to make ti exit immediately though, with a guarantee that
its had a chance to create the fifo.


With regards,
Daniel

tests: Create fifo for test-io-channel-command

Commit Message

Comments

Patch