
[ovs-dev] ovsdb-idl.at: Wait for all servers to join the cluster.

Message ID: 20200610234538.962020-1-fbl@sysclose.org
State: Changes Requested
Series: [ovs-dev] ovsdb-idl.at: Wait for all servers to join the cluster.

Commit Message

Flavio Leitner June 10, 2020, 11:45 p.m. UTC
The test 'Check Python IDL reconnects to leader - Python3
(leader only)' sometimes fails when the first ovsdb-server
gets killed before the others have joined the cluster.

Fix the function ovsdb_cluster_start_idltest to wait for
them to join the cluster.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
---
 tests/ovsdb-idl.at | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Flavio Leitner June 10, 2020, 11:49 p.m. UTC | #1
Hi,

It would be nice to have this applied to branch-2.13 as well.
fbl

On Wed, Jun 10, 2020 at 08:45:38PM -0300, Flavio Leitner wrote:
> The test 'Check Python IDL reconnects to leader - Python3
> (leader only)' fails sometimes when the first ovsdb-server
> gets killed before the others had joined the cluster.
> 
> Fix the function ovsdb_cluster_start_idltest to wait them
> to join the cluster.
> 
> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> ---
>  tests/ovsdb-idl.at | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index b5cbee7d9..c045e9264 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -29,6 +29,17 @@ ovsdb_cluster_start_idltest () {
>       ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
>     done
>     on_exit 'kill `cat s*.pid`'
> +   for i in `seq $n`; do
> +     for d in `seq 1 "$OVS_CTL_TIMEOUT"`; do
> +       if ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +         break
> +       fi
> +       sleep 1
> +     done
> +     if ! ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +       return 1
> +     fi
> +   done
>  }
>  
>  # ovsdb_cluster_leader [REMOTES] [DATABASE]
> -- 
> 2.26.2
> 
Dumitru Ceara June 22, 2020, 8:51 a.m. UTC | #2
On 6/11/20 1:45 AM, Flavio Leitner wrote:
> The test 'Check Python IDL reconnects to leader - Python3
> (leader only)' fails sometimes when the first ovsdb-server
> gets killed before the others had joined the cluster.
> 
> Fix the function ovsdb_cluster_start_idltest to wait them
> to join the cluster.
> 
> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> ---
>  tests/ovsdb-idl.at | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index b5cbee7d9..c045e9264 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -29,6 +29,17 @@ ovsdb_cluster_start_idltest () {
>       ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
>     done
>     on_exit 'kill `cat s*.pid`'
> +   for i in `seq $n`; do
> +     for d in `seq 1 "$OVS_CTL_TIMEOUT"`; do
> +       if ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +         break
> +       fi
> +       sleep 1
> +     done
> +     if ! ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +       return 1
> +     fi
> +   done
>  }
>  
>  # ovsdb_cluster_leader [REMOTES] [DATABASE]
> 

Hi Flavio,

Looks good to me.

Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks,
Dumitru
Flavio Leitner Aug. 11, 2020, 2:55 p.m. UTC | #3
Hi,

This fixes the testsuite, which breaks 50% of the time inside a VM.
It would be great to have it included in master, 2.14 and older
branches.

fbl

On Wed, Jun 10, 2020 at 08:45:38PM -0300, Flavio Leitner wrote:
> The test 'Check Python IDL reconnects to leader - Python3
> (leader only)' fails sometimes when the first ovsdb-server
> gets killed before the others had joined the cluster.
> 
> Fix the function ovsdb_cluster_start_idltest to wait them
> to join the cluster.
> 
> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> ---
>  tests/ovsdb-idl.at | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index b5cbee7d9..c045e9264 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -29,6 +29,17 @@ ovsdb_cluster_start_idltest () {
>       ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
>     done
>     on_exit 'kill `cat s*.pid`'
> +   for i in `seq $n`; do
> +     for d in `seq 1 "$OVS_CTL_TIMEOUT"`; do
> +       if ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +         break
> +       fi
> +       sleep 1
> +     done
> +     if ! ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +       return 1
> +     fi
> +   done
>  }
>  
>  # ovsdb_cluster_leader [REMOTES] [DATABASE]
> -- 
> 2.26.2
> 
Ilya Maximets Sept. 3, 2020, 9:20 p.m. UTC | #4
On 6/11/20 1:45 AM, Flavio Leitner wrote:
> The test 'Check Python IDL reconnects to leader - Python3
> (leader only)' fails sometimes when the first ovsdb-server
> gets killed before the others had joined the cluster.
> 
> Fix the function ovsdb_cluster_start_idltest to wait them
> to join the cluster.

Hi, Flavio.  Thanks for the fix, and sorry for the delays.

The patch seems OK, but I'm not very comfortable with the code duplication
between this function and the OVS_WAIT_UNTIL macro.  Have you considered
converting the ovsdb_cluster_start_idltest() function into an m4_define()
macro so that we could easily use OVS_WAIT_UNTIL inside of it?
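
A minimal sketch of what that wait could look like inside such a macro:

   for i in $(seq $n); do
       OVS_WAIT_UNTIL([ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
                                   | grep -q 'Status: cluster member'])
   done

OVS_WAIT_UNTIL takes care of the polling, the timeout and the hard failure
on its own (the failure log further down in this thread shows it giving up
after 30 seconds), so the hand-rolled sleep loop and the trailing re-check
from the patch would go away.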

Best regards, Ilya Maximets.

> 
> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> ---
>  tests/ovsdb-idl.at | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index b5cbee7d9..c045e9264 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -29,6 +29,17 @@ ovsdb_cluster_start_idltest () {
>       ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
>     done
>     on_exit 'kill `cat s*.pid`'
> +   for i in `seq $n`; do
> +     for d in `seq 1 "$OVS_CTL_TIMEOUT"`; do
> +       if ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +         break
> +       fi
> +       sleep 1
> +     done
> +     if ! ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +       return 1
> +     fi
> +   done
>  }
>  
>  # ovsdb_cluster_leader [REMOTES] [DATABASE]
>
Flavio Leitner Sept. 3, 2020, 10:05 p.m. UTC | #5
On Thu, Sep 03, 2020 at 11:20:56PM +0200, Ilya Maximets wrote:
> On 6/11/20 1:45 AM, Flavio Leitner wrote:
> > The test 'Check Python IDL reconnects to leader - Python3
> > (leader only)' fails sometimes when the first ovsdb-server
> > gets killed before the others had joined the cluster.
> > 
> > Fix the function ovsdb_cluster_start_idltest to wait them
> > to join the cluster.
> 
> Hi, Flavio.  Thanks for the fix and sorry for delays.
> 
> Patch seems OK, but I'm not very comfortable with the code duplication
> between this function and OVS_WAIT_UNTIL macro.  Have you considered
> conversion of ovsdb_cluster_start_idltest() function into m4_define()
> macro so we could easily use OVS_WAIT_UNTIL inside of it?

I tried, but I ran into issues and I am not familiar with m4.

fbl


> 
> Best regards, Ilya Maximets.
> 
> > 
> > Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> > Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> > ---
> >  tests/ovsdb-idl.at | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> > index b5cbee7d9..c045e9264 100644
> > --- a/tests/ovsdb-idl.at
> > +++ b/tests/ovsdb-idl.at
> > @@ -29,6 +29,17 @@ ovsdb_cluster_start_idltest () {
> >       ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
> >     done
> >     on_exit 'kill `cat s*.pid`'
> > +   for i in `seq $n`; do
> > +     for d in `seq 1 "$OVS_CTL_TIMEOUT"`; do
> > +       if ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> > +         break
> > +       fi
> > +       sleep 1
> > +     done
> > +     if ! ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> > +       return 1
> > +     fi
> > +   done
> >  }
> >  
> >  # ovsdb_cluster_leader [REMOTES] [DATABASE]
> > 
>
Ilya Maximets Sept. 3, 2020, 10:54 p.m. UTC | #6
On 9/4/20 12:05 AM, Flavio Leitner wrote:
> On Thu, Sep 03, 2020 at 11:20:56PM +0200, Ilya Maximets wrote:
>> On 6/11/20 1:45 AM, Flavio Leitner wrote:
>>> The test 'Check Python IDL reconnects to leader - Python3
>>> (leader only)' fails sometimes when the first ovsdb-server
>>> gets killed before the others had joined the cluster.
>>>
>>> Fix the function ovsdb_cluster_start_idltest to wait them
>>> to join the cluster.
>>
>> Hi, Flavio.  Thanks for the fix and sorry for delays.
>>
>> Patch seems OK, but I'm not very comfortable with the code duplication
>> between this function and OVS_WAIT_UNTIL macro.  Have you considered
>> conversion of ovsdb_cluster_start_idltest() function into m4_define()
>> macro so we could easily use OVS_WAIT_UNTIL inside of it?
> 
> I tried, but I ran into issues and I am not familiar with m4.

Could you try the following diff:

diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 789ae23a9..215f0a2d0 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -12,25 +12,6 @@ ovsdb_start_idltest () {
     on_exit 'kill `cat ovsdb-server.pid`'
 }
 
-# ovsdb_cluster_start_idltest [REMOTE] [SCHEMA]
-#
-# Creates a database using SCHEMA (default: idltest.ovsschema) and
-# starts a database cluster listening on punix:socket and REMOTE (if
-# specified).
-ovsdb_cluster_start_idltest () {
-   local n=$1
-   ovsdb-tool create-cluster s1.db $abs_srcdir/idltest.ovsschema unix:s1.raft || return $?
-   cid=`ovsdb-tool db-cid s1.db`
-   schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
-   for i in `seq 2 $n`; do
-     ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft || return $?
-   done
-   for i in `seq $n`; do
-     ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
-   done
-   on_exit 'kill `cat s*.pid`'
-}
-
 # ovsdb_cluster_leader [REMOTES] [DATABASE]
 #
 # Returns the leader of the DATABASE cluster.
@@ -48,6 +29,34 @@ ovsdb_cluster_leader () {
    done
 }])
 
+# OVSDB_CLUSTER_START_IDLTEST([N], [REMOTE])
+#
+# Creates a clustered database using idltest.ovsschema and starts a database
+# cluster of N servers listening on punix:socket and REMOTE (if specified).
+m4_define([OVSDB_CLUSTER_START_IDLTEST],
+  [n=$1
+   AT_CHECK([ovsdb-tool create-cluster s1.db \
+                        $abs_srcdir/idltest.ovsschema unix:s1.raft])
+   cid=$(ovsdb-tool db-cid s1.db)
+   schema_name=$(ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema)
+   for i in $(seq 2 $n); do
+     AT_CHECK([ovsdb-tool join-cluster s$i.db \
+                          $schema_name unix:s$i.raft unix:s1.raft])
+   done
+   for i in $(seq $n); do
+     AT_CHECK([ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
+                   --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
+                   --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db])
+   done
+   on_exit 'kill $(cat s*.pid)'
+
+   for i in $(seq $n); do
+       OVS_WAIT_UNTIL([ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
+                                           | grep -q 'Status: cluster member'])
+   done
+])
+
+
 # OVSDB_CHECK_IDL_C(TITLE, [PRE-IDL-TXN], TRANSACTIONS, OUTPUT, [KEYWORDS],
 #                   [FILTER])
 #
@@ -1813,7 +1822,7 @@ m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
    AT_SKIP_IF([test "$IS_ARM64" = "yes"])
    AT_KEYWORDS([ovsdb server idl Python leader_only with tcp socket])
    m4_define([LPBK],[127.0.0.1])
-   AT_CHECK([ovsdb_cluster_start_idltest $2 "ptcp:0:"LPBK])
+   OVSDB_CLUSTER_START_IDLTEST([$2], ["ptcp:0:"LPBK])
    PARSE_LISTENING_PORT([s2.log], [TCP_PORT_1])
    PARSE_LISTENING_PORT([s3.log], [TCP_PORT_2])
    PARSE_LISTENING_PORT([s1.log], [TCP_PORT_3])
@@ -1836,7 +1845,7 @@ m4_define([OVSDB_CHECK_CLUSTER_IDL_C],
   [AT_SETUP([$1 - C - tcp])
    AT_KEYWORDS([ovsdb server idl positive tcp socket $5])
    m4_define([LPBK],[127.0.0.1])
-   AT_CHECK([ovsdb_cluster_start_idltest $2 "ptcp:0:"LPBK])
+   OVSDB_CLUSTER_START_IDLTEST([$2], ["ptcp:0:"LPBK])
    PARSE_LISTENING_PORT([s1.log], [TCP_PORT_1])
    PARSE_LISTENING_PORT([s2.log], [TCP_PORT_2])
    PARSE_LISTENING_PORT([s3.log], [TCP_PORT_3])
Flavio Leitner Sept. 4, 2020, 3:09 p.m. UTC | #7
On Fri, Sep 04, 2020 at 12:54:39AM +0200, Ilya Maximets wrote:
> On 9/4/20 12:05 AM, Flavio Leitner wrote:
> > On Thu, Sep 03, 2020 at 11:20:56PM +0200, Ilya Maximets wrote:
> >> On 6/11/20 1:45 AM, Flavio Leitner wrote:
> >>> The test 'Check Python IDL reconnects to leader - Python3
> >>> (leader only)' fails sometimes when the first ovsdb-server
> >>> gets killed before the others had joined the cluster.
> >>>
> >>> Fix the function ovsdb_cluster_start_idltest to wait them
> >>> to join the cluster.
> >>
> >> Hi, Flavio.  Thanks for the fix and sorry for delays.
> >>
> >> Patch seems OK, but I'm not very comfortable with the code duplication
> >> between this function and OVS_WAIT_UNTIL macro.  Have you considered
> >> conversion of ovsdb_cluster_start_idltest() function into m4_define()
> >> macro so we could easily use OVS_WAIT_UNTIL inside of it?
> > 
> > I tried, but I ran into issues and I am not familiar with m4.
> 
> Could you try following diff:

I tried, but it fails. The test logs are here:
http://people.redhat.com/~fleitner/2127.tar.bz2


#                             -*- compilation -*-
2127. ovsdb-idl.at:1840: testing Check Python IDL reconnects to leader - Python3 (leader only) ...
./ovsdb-idl.at:1840: ovsdb-tool create-cluster s1.db \
                        $abs_srcdir/idltest.ovsschema unix:s1.raft
./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
                          $schema_name unix:s$i.raft unix:s1.raft
./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
                          $schema_name unix:s$i.raft unix:s1.raft
./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
                   --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
                   --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
                   --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
                   --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
                   --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
                   --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
                                           | grep -q 'Status: cluster member'...
ovsdb-idl.at:1840: wait succeeded immediately
ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
                                           | grep -q 'Status: cluster member'...
ovsdb-idl.at:1840: wait succeeded after 1 seconds
ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
                                           | grep -q 'Status: cluster member'...
ovsdb-idl.at:1840: wait succeeded immediately
ovsdb-idl.at:1840: waiting until TCP_PORT_1=`sed -n 's/.*0:.*: listening on port \([0-9]*\)$/\1/p' "s2.log"` && test X != X"$TCP_PORT_1"...
ovsdb-idl.at:1840: wait failed after 30 seconds
./ovs-macros.at:241: hard failure
2127. ovsdb-idl.at:1840: 2127. Check Python IDL reconnects to leader - Python3 (leader only) (ovsdb-idl.at:1840): FAILED (ovs-macros.at:241)
Ilya Maximets Sept. 4, 2020, 6:07 p.m. UTC | #8
On 9/4/20 5:09 PM, Flavio Leitner wrote:
> On Fri, Sep 04, 2020 at 12:54:39AM +0200, Ilya Maximets wrote:
>> On 9/4/20 12:05 AM, Flavio Leitner wrote:
>>> On Thu, Sep 03, 2020 at 11:20:56PM +0200, Ilya Maximets wrote:
>>>> On 6/11/20 1:45 AM, Flavio Leitner wrote:
>>>>> The test 'Check Python IDL reconnects to leader - Python3
>>>>> (leader only)' fails sometimes when the first ovsdb-server
>>>>> gets killed before the others had joined the cluster.
>>>>>
>>>>> Fix the function ovsdb_cluster_start_idltest to wait them
>>>>> to join the cluster.
>>>>
>>>> Hi, Flavio.  Thanks for the fix and sorry for delays.
>>>>
>>>> Patch seems OK, but I'm not very comfortable with the code duplication
>>>> between this function and OVS_WAIT_UNTIL macro.  Have you considered
>>>> conversion of ovsdb_cluster_start_idltest() function into m4_define()
>>>> macro so we could easily use OVS_WAIT_UNTIL inside of it?
>>>
>>> I tried, but I ran into issues and I am not familiar with m4.
>>
>> Could you try following diff:
> 
> I tried, but it fails. The test logs are here:
> http://people.redhat.com/~fleitner/2127.tar.bz2

Oh, sorry, my fault.  We can't use shell expansions over m4 definitions.
Should work with the following change:

diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 075250a9c..30e896e3e 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -46,7 +46,8 @@ m4_define([OVSDB_CLUSTER_START_IDLTEST],
    for i in $(seq $n); do
      AT_CHECK([ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
-                   --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db])
+                   --remote=punix:s$i.ovsdb                           \
+                   m4_if([$2], [], [], [--remote=$2]) s$i.db])
    done
    on_exit 'kill $(cat s*.pid)'
 
---
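
For what it's worth, what went wrong with the shell form: m4 replaces a
bare $2 in the macro body with the macro's second argument, but the
leading ${2 in ${2:+...} is not an m4 argument reference (the $ is
followed by a brace), so the shell later expands its own, empty,
positional parameter and the whole ${2:+...} collapses to nothing.  That
matches the failure log quoted below: the servers start with
${2:+--remote="ptcp:0:"127.0.0.1}, no TCP remote is actually added, and
the 'listening on port' wait times out.  m4_if([$2], [], [], [--remote=$2])
makes the decision at macro expansion time instead; roughly, eliding the
other server options:

   # OVSDB_CLUSTER_START_IDLTEST([3])
   #     ovsdb-server ... --remote=punix:s$i.ovsdb s$i.db
   # OVSDB_CLUSTER_START_IDLTEST([3], ["ptcp:0:"LPBK])
   #     ovsdb-server ... --remote=punix:s$i.ovsdb --remote="ptcp:0:"127.0.0.1 s$i.db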

> 
> 
> #                             -*- compilation -*-
> 2127. ovsdb-idl.at:1840: testing Check Python IDL reconnects to leader - Python3 (leader only) ...
> ./ovsdb-idl.at:1840: ovsdb-tool create-cluster s1.db \
>                         $abs_srcdir/idltest.ovsschema unix:s1.raft
> ./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
>                           $schema_name unix:s$i.raft unix:s1.raft
> ./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
>                           $schema_name unix:s$i.raft unix:s1.raft
> ./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
>                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
>                    --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
> ./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
>                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
>                    --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
> ./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
>                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
>                    --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
> ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
>                                            | grep -q 'Status: cluster member'...
> ovsdb-idl.at:1840: wait succeeded immediately
> ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
>                                            | grep -q 'Status: cluster member'...
> ovsdb-idl.at:1840: wait succeeded after 1 seconds
> ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
>                                            | grep -q 'Status: cluster member'...
> ovsdb-idl.at:1840: wait succeeded immediately
> ovsdb-idl.at:1840: waiting until TCP_PORT_1=`sed -n 's/.*0:.*: listening on port \([0-9]*\)$/\1/p' "s2.log"` && test X != X"$TCP_PORT_1"...
> ovsdb-idl.at:1840: wait failed after 30 seconds
> ./ovs-macros.at:241: hard failure
> 2127. ovsdb-idl.at:1840: 2127. Check Python IDL reconnects to leader - Python3 (leader only) (ovsdb-idl.at:1840): FAILED (ovs-macros.at:241)
>
Flavio Leitner Sept. 4, 2020, 7:25 p.m. UTC | #9
On Fri, Sep 04, 2020 at 08:07:41PM +0200, Ilya Maximets wrote:
> On 9/4/20 5:09 PM, Flavio Leitner wrote:
> > On Fri, Sep 04, 2020 at 12:54:39AM +0200, Ilya Maximets wrote:
> >> On 9/4/20 12:05 AM, Flavio Leitner wrote:
> >>> On Thu, Sep 03, 2020 at 11:20:56PM +0200, Ilya Maximets wrote:
> >>>> On 6/11/20 1:45 AM, Flavio Leitner wrote:
> >>>>> The test 'Check Python IDL reconnects to leader - Python3
> >>>>> (leader only)' fails sometimes when the first ovsdb-server
> >>>>> gets killed before the others had joined the cluster.
> >>>>>
> >>>>> Fix the function ovsdb_cluster_start_idltest to wait them
> >>>>> to join the cluster.
> >>>>
> >>>> Hi, Flavio.  Thanks for the fix and sorry for delays.
> >>>>
> >>>> Patch seems OK, but I'm not very comfortable with the code duplication
> >>>> between this function and OVS_WAIT_UNTIL macro.  Have you considered
> >>>> conversion of ovsdb_cluster_start_idltest() function into m4_define()
> >>>> macro so we could easily use OVS_WAIT_UNTIL inside of it?
> >>>
> >>> I tried, but I ran into issues and I am not familiar with m4.
> >>
> >> Could you try following diff:
> > 
> > I tried, but it fails. The test logs are here:
> > http://people.redhat.com/~fleitner/2127.tar.bz2
> 
> Oh, sorry, my fault.  We can't use shell expansions over m4 definitions.
> Should work with following change:

No worries!

The test passes every time with the fix below.
Thanks for improving the fix!
fbl


> 
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index 075250a9c..30e896e3e 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -46,7 +46,8 @@ m4_define([OVSDB_CLUSTER_START_IDLTEST],
>     for i in $(seq $n); do
>       AT_CHECK([ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
>                     --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
> -                   --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db])
> +                   --remote=punix:s$i.ovsdb                           \
> +                   m4_if([$2], [], [], [--remote=$2]) s$i.db])
>     done
>     on_exit 'kill $(cat s*.pid)'
>  
> ---
> 
> > 
> > 
> > #                             -*- compilation -*-
> > 2127. ovsdb-idl.at:1840: testing Check Python IDL reconnects to leader - Python3 (leader only) ...
> > ./ovsdb-idl.at:1840: ovsdb-tool create-cluster s1.db \
> >                         $abs_srcdir/idltest.ovsschema unix:s1.raft
> > ./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
> >                           $schema_name unix:s$i.raft unix:s1.raft
> > ./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
> >                           $schema_name unix:s$i.raft unix:s1.raft
> > ./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
> >                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
> >                    --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
> > ./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
> >                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
> >                    --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
> > ./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
> >                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
> >                    --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
> > ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
> >                                            | grep -q 'Status: cluster member'...
> > ovsdb-idl.at:1840: wait succeeded immediately
> > ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
> >                                            | grep -q 'Status: cluster member'...
> > ovsdb-idl.at:1840: wait succeeded after 1 seconds
> > ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
> >                                            | grep -q 'Status: cluster member'...
> > ovsdb-idl.at:1840: wait succeeded immediately
> > ovsdb-idl.at:1840: waiting until TCP_PORT_1=`sed -n 's/.*0:.*: listening on port \([0-9]*\)$/\1/p' "s2.log"` && test X != X"$TCP_PORT_1"...
> > ovsdb-idl.at:1840: wait failed after 30 seconds
> > ./ovs-macros.at:241: hard failure
> > 2127. ovsdb-idl.at:1840: 2127. Check Python IDL reconnects to leader - Python3 (leader only) (ovsdb-idl.at:1840): FAILED (ovs-macros.at:241)
> > 
> 
Ilya Maximets Sept. 7, 2020, 11:53 a.m. UTC | #10
On 9/4/20 9:25 PM, Flavio Leitner wrote:
> On Fri, Sep 04, 2020 at 08:07:41PM +0200, Ilya Maximets wrote:
>> On 9/4/20 5:09 PM, Flavio Leitner wrote:
>>> On Fri, Sep 04, 2020 at 12:54:39AM +0200, Ilya Maximets wrote:
>>>> On 9/4/20 12:05 AM, Flavio Leitner wrote:
>>>>> On Thu, Sep 03, 2020 at 11:20:56PM +0200, Ilya Maximets wrote:
>>>>>> On 6/11/20 1:45 AM, Flavio Leitner wrote:
>>>>>>> The test 'Check Python IDL reconnects to leader - Python3
>>>>>>> (leader only)' fails sometimes when the first ovsdb-server
>>>>>>> gets killed before the others had joined the cluster.
>>>>>>>
>>>>>>> Fix the function ovsdb_cluster_start_idltest to wait them
>>>>>>> to join the cluster.
>>>>>>
>>>>>> Hi, Flavio.  Thanks for the fix and sorry for delays.
>>>>>>
>>>>>> Patch seems OK, but I'm not very comfortable with the code duplication
>>>>>> between this function and OVS_WAIT_UNTIL macro.  Have you considered
>>>>>> conversion of ovsdb_cluster_start_idltest() function into m4_define()
>>>>>> macro so we could easily use OVS_WAIT_UNTIL inside of it?
>>>>>
>>>>> I tried, but I ran into issues and I am not familiar with m4.
>>>>
>>>> Could you try following diff:
>>>
>>> I tried, but it fails. The test logs are here:
>>> http://people.redhat.com/~fleitner/2127.tar.bz2
>>
>> Oh, sorry, my fault.  We can't use shell expansions over m4 definitions.
>> Should work with following change:
> 
> No worries!
> 
> The test passes all times with the fix below.

Great!  Would you mind sending v2?

Best regards, Ilya Maximets.


> Thanks for improving the fix!
> fbl
> 
> 
>>
>> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
>> index 075250a9c..30e896e3e 100644
>> --- a/tests/ovsdb-idl.at
>> +++ b/tests/ovsdb-idl.at
>> @@ -46,7 +46,8 @@ m4_define([OVSDB_CLUSTER_START_IDLTEST],
>>     for i in $(seq $n); do
>>       AT_CHECK([ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
>>                     --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
>> -                   --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db])
>> +                   --remote=punix:s$i.ovsdb                           \
>> +                   m4_if([$2], [], [], [--remote=$2]) s$i.db])
>>     done
>>     on_exit 'kill $(cat s*.pid)'
>>  
>> ---
>>
>>>
>>>
>>> #                             -*- compilation -*-
>>> 2127. ovsdb-idl.at:1840: testing Check Python IDL reconnects to leader - Python3 (leader only) ...
>>> ./ovsdb-idl.at:1840: ovsdb-tool create-cluster s1.db \
>>>                         $abs_srcdir/idltest.ovsschema unix:s1.raft
>>> ./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
>>>                           $schema_name unix:s$i.raft unix:s1.raft
>>> ./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
>>>                           $schema_name unix:s$i.raft unix:s1.raft
>>> ./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
>>>                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
>>>                    --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
>>> ./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
>>>                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
>>>                    --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
>>> ./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
>>>                    --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
>>>                    --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
>>> ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
>>>                                            | grep -q 'Status: cluster member'...
>>> ovsdb-idl.at:1840: wait succeeded immediately
>>> ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
>>>                                            | grep -q 'Status: cluster member'...
>>> ovsdb-idl.at:1840: wait succeeded after 1 seconds
>>> ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
>>>                                            | grep -q 'Status: cluster member'...
>>> ovsdb-idl.at:1840: wait succeeded immediately
>>> ovsdb-idl.at:1840: waiting until TCP_PORT_1=`sed -n 's/.*0:.*: listening on port \([0-9]*\)$/\1/p' "s2.log"` && test X != X"$TCP_PORT_1"...
>>> ovsdb-idl.at:1840: wait failed after 30 seconds
>>> ./ovs-macros.at:241: hard failure
>>> 2127. ovsdb-idl.at:1840: 2127. Check Python IDL reconnects to leader - Python3 (leader only) (ovsdb-idl.at:1840): FAILED (ovs-macros.at:241)
>>>
>>
>
Flavio Leitner Sept. 7, 2020, 3:46 p.m. UTC | #11
On Mon, Sep 07, 2020 at 01:53:11PM +0200, Ilya Maximets wrote:
> On 9/4/20 9:25 PM, Flavio Leitner wrote:
> > On Fri, Sep 04, 2020 at 08:07:41PM +0200, Ilya Maximets wrote:
> >> On 9/4/20 5:09 PM, Flavio Leitner wrote:
> >>> On Fri, Sep 04, 2020 at 12:54:39AM +0200, Ilya Maximets wrote:
> >>>> On 9/4/20 12:05 AM, Flavio Leitner wrote:
> >>>>> On Thu, Sep 03, 2020 at 11:20:56PM +0200, Ilya Maximets wrote:
> >>>>>> On 6/11/20 1:45 AM, Flavio Leitner wrote:
> >>>>>>> The test 'Check Python IDL reconnects to leader - Python3
> >>>>>>> (leader only)' fails sometimes when the first ovsdb-server
> >>>>>>> gets killed before the others had joined the cluster.
> >>>>>>>
> >>>>>>> Fix the function ovsdb_cluster_start_idltest to wait them
> >>>>>>> to join the cluster.
> >>>>>>
> >>>>>> Hi, Flavio.  Thanks for the fix and sorry for delays.
> >>>>>>
> >>>>>> Patch seems OK, but I'm not very comfortable with the code duplication
> >>>>>> between this function and OVS_WAIT_UNTIL macro.  Have you considered
> >>>>>> conversion of ovsdb_cluster_start_idltest() function into m4_define()
> >>>>>> macro so we could easily use OVS_WAIT_UNTIL inside of it?
> >>>>>
> >>>>> I tried, but I ran into issues and I am not familiar with m4.
> >>>>
> >>>> Could you try following diff:
> >>>
> >>> I tried, but it fails. The test logs are here:
> >>> http://people.redhat.com/~fleitner/2127.tar.bz2
> >>
> >> Oh, sorry, my fault.  We can't use shell expansions over m4 definitions.
> >> Should work with following change:
> > 
> > No worries!
> > 
> > The test passes all times with the fix below.
> 
> Great!  Would you mind sending v2?

Done:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-September/374572.html

fbl

Patch

diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index b5cbee7d9..c045e9264 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -29,6 +29,17 @@  ovsdb_cluster_start_idltest () {
      ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
    done
    on_exit 'kill `cat s*.pid`'
+   for i in `seq $n`; do
+     for d in `seq 1 "$OVS_CTL_TIMEOUT"`; do
+       if ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
+         break
+       fi
+       sleep 1
+     done
+     if ! ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
+       return 1
+     fi
+   done
 }
 
 # ovsdb_cluster_leader [REMOTES] [DATABASE]