Message ID: 20200610234538.962020-1-fbl@sysclose.org
State:      Changes Requested
Series:     [ovs-dev] ovsdb-idl.at: Wait all servers to join the cluster.
Hi,

It would be nice to have this applied to branch-2.13 as well.

fbl

On Wed, Jun 10, 2020 at 08:45:38PM -0300, Flavio Leitner wrote:
> The test 'Check Python IDL reconnects to leader - Python3
> (leader only)' fails sometimes when the first ovsdb-server
> gets killed before the others had joined the cluster.
>
> Fix the function ovsdb_cluster_start_idltest to wait them
> to join the cluster.
>
> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> ---
>  tests/ovsdb-idl.at | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index b5cbee7d9..c045e9264 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -29,6 +29,17 @@ ovsdb_cluster_start_idltest () {
>          ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
>      done
>      on_exit 'kill `cat s*.pid`'
> +    for i in `seq $n`; do
> +        for d in `seq 1 "$OVS_CTL_TIMEOUT"`; do
> +            if ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +                break
> +            fi
> +            sleep 1
> +        done
> +        if ! ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
> +            return 1
> +        fi
> +    done
>  }
>
>  # ovsdb_cluster_leader [REMOTES] [DATABASE]
> --
> 2.26.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 6/11/20 1:45 AM, Flavio Leitner wrote:
> The test 'Check Python IDL reconnects to leader - Python3
> (leader only)' fails sometimes when the first ovsdb-server
> gets killed before the others had joined the cluster.
>
> Fix the function ovsdb_cluster_start_idltest to wait them
> to join the cluster.
>
> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> [...]

Hi Flavio,

Looks good to me.

Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks,
Dumitru
Hi,

This fixes the testsuite, which breaks 50% of the time inside a VM.
It would be great to have it included in master, 2.14, and older branches.

fbl

On Wed, Jun 10, 2020 at 08:45:38PM -0300, Flavio Leitner wrote:
> The test 'Check Python IDL reconnects to leader - Python3
> (leader only)' fails sometimes when the first ovsdb-server
> gets killed before the others had joined the cluster.
>
> Fix the function ovsdb_cluster_start_idltest to wait them
> to join the cluster.
>
> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> [...]
On 6/11/20 1:45 AM, Flavio Leitner wrote:
> The test 'Check Python IDL reconnects to leader - Python3
> (leader only)' fails sometimes when the first ovsdb-server
> gets killed before the others had joined the cluster.
>
> Fix the function ovsdb_cluster_start_idltest to wait them
> to join the cluster.

Hi, Flavio. Thanks for the fix and sorry for delays.

Patch seems OK, but I'm not very comfortable with the code duplication
between this function and OVS_WAIT_UNTIL macro. Have you considered
conversion of ovsdb_cluster_start_idltest() function into m4_define()
macro so we could easily use OVS_WAIT_UNTIL inside of it?

Best regards, Ilya Maximets.

> Suggested-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Flavio Leitner <fbl@sysclose.org>
> [...]
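Ilya's suggestion hinges on m4 performing textual macro expansion before the shell ever runs, so one macro (such as OVS_WAIT_UNTIL) can be expanded inside another. A toy illustration using GNU m4 directly (`MY_WAIT_UNTIL` is a made-up stand-in, not the real OVS macro; this assumes GNU m4, whose `-P` flag prefixes builtins as `m4_*`):

```shell
#!/bin/sh
# Expand an autoconf-style macro with GNU m4.  -P renames builtins to
# m4_define, m4_changequote, etc.; changequote switches to [ ] quoting.
m4 -P <<'EOF'
m4_changequote(`[', `]')m4_dnl
m4_define([MY_WAIT_UNTIL], [until $1; do sleep 1; done])m4_dnl
MY_WAIT_UNTIL([test -e ready])
EOF
# The macro call expands, before any shell runs, to:
#   until test -e ready; do sleep 1; done
```

A plain shell function cannot do this: by the time the function body executes, the m4 pass over the testsuite is long finished, which is why the original helper had to hand-roll its own polling loop instead of reusing OVS_WAIT_UNTIL.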
On Thu, Sep 03, 2020 at 11:20:56PM +0200, Ilya Maximets wrote:
> On 6/11/20 1:45 AM, Flavio Leitner wrote:
> > The test 'Check Python IDL reconnects to leader - Python3
> > (leader only)' fails sometimes when the first ovsdb-server
> > gets killed before the others had joined the cluster.
> >
> > Fix the function ovsdb_cluster_start_idltest to wait them
> > to join the cluster.
>
> Hi, Flavio. Thanks for the fix and sorry for delays.
>
> Patch seems OK, but I'm not very comfortable with the code duplication
> between this function and OVS_WAIT_UNTIL macro. Have you considered
> conversion of ovsdb_cluster_start_idltest() function into m4_define()
> macro so we could easily use OVS_WAIT_UNTIL inside of it?

I tried, but I ran into issues and I am not familiar with m4.

fbl

> Best regards, Ilya Maximets.
> [...]
On 9/4/20 12:05 AM, Flavio Leitner wrote:
> On Thu, Sep 03, 2020 at 11:20:56PM +0200, Ilya Maximets wrote:
>> Patch seems OK, but I'm not very comfortable with the code duplication
>> between this function and OVS_WAIT_UNTIL macro. Have you considered
>> conversion of ovsdb_cluster_start_idltest() function into m4_define()
>> macro so we could easily use OVS_WAIT_UNTIL inside of it?
>
> I tried, but I ran into issues and I am not familiar with m4.

Could you try following diff:

diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 789ae23a9..215f0a2d0 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -12,25 +12,6 @@ ovsdb_start_idltest () {
     on_exit 'kill `cat ovsdb-server.pid`'
 }
 
-# ovsdb_cluster_start_idltest [REMOTE] [SCHEMA]
-#
-# Creates a database using SCHEMA (default: idltest.ovsschema) and
-# starts a database cluster listening on punix:socket and REMOTE (if
-# specified).
-ovsdb_cluster_start_idltest () {
-    local n=$1
-    ovsdb-tool create-cluster s1.db $abs_srcdir/idltest.ovsschema unix:s1.raft || return $?
-    cid=`ovsdb-tool db-cid s1.db`
-    schema_name=`ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema`
-    for i in `seq 2 $n`; do
-        ovsdb-tool join-cluster s$i.db $schema_name unix:s$i.raft unix:s1.raft || return $?
-    done
-    for i in `seq $n`; do
-        ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
-    done
-    on_exit 'kill `cat s*.pid`'
-}
-
 # ovsdb_cluster_leader [REMOTES] [DATABASE]
 #
 # Returns the leader of the DATABASE cluster.
@@ -48,6 +29,34 @@ ovsdb_cluster_leader () {
     done
 }])
 
+# OVSDB_CLUSTER_START_IDLTEST([N], [REMOTE])
+#
+# Creates a clustered database using idltest.ovsschema and starts a database
+# cluster of N servers listening on punix:socket and REMOTE (if specified).
+m4_define([OVSDB_CLUSTER_START_IDLTEST],
+  [n=$1
+   AT_CHECK([ovsdb-tool create-cluster s1.db \
+                 $abs_srcdir/idltest.ovsschema unix:s1.raft])
+   cid=$(ovsdb-tool db-cid s1.db)
+   schema_name=$(ovsdb-tool schema-name $abs_srcdir/idltest.ovsschema)
+   for i in $(seq 2 $n); do
+       AT_CHECK([ovsdb-tool join-cluster s$i.db \
+                     $schema_name unix:s$i.raft unix:s1.raft])
+   done
+   for i in $(seq $n); do
+       AT_CHECK([ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
+                     --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
+                     --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db])
+   done
+   on_exit 'kill $(cat s*.pid)'
+
+   for i in $(seq $n); do
+       OVS_WAIT_UNTIL([ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
+                           | grep -q 'Status: cluster member'])
+   done
+])
+
+
 # OVSDB_CHECK_IDL_C(TITLE, [PRE-IDL-TXN], TRANSACTIONS, OUTPUT, [KEYWORDS],
 #                   [FILTER])
 #
@@ -1813,7 +1822,7 @@ m4_define([OVSDB_CHECK_IDL_LEADER_ONLY_PY],
     AT_SKIP_IF([test "$IS_ARM64" = "yes"])
     AT_KEYWORDS([ovsdb server idl Python leader_only with tcp socket])
     m4_define([LPBK],[127.0.0.1])
-    AT_CHECK([ovsdb_cluster_start_idltest $2 "ptcp:0:"LPBK])
+    OVSDB_CLUSTER_START_IDLTEST([$2], ["ptcp:0:"LPBK])
     PARSE_LISTENING_PORT([s2.log], [TCP_PORT_1])
     PARSE_LISTENING_PORT([s3.log], [TCP_PORT_2])
     PARSE_LISTENING_PORT([s1.log], [TCP_PORT_3])
@@ -1836,7 +1845,7 @@ m4_define([OVSDB_CHECK_CLUSTER_IDL_C],
   [AT_SETUP([$1 - C - tcp])
    AT_KEYWORDS([ovsdb server idl positive tcp socket $5])
    m4_define([LPBK],[127.0.0.1])
-   AT_CHECK([ovsdb_cluster_start_idltest $2 "ptcp:0:"LPBK])
+   OVSDB_CLUSTER_START_IDLTEST([$2], ["ptcp:0:"LPBK])
    PARSE_LISTENING_PORT([s1.log], [TCP_PORT_1])
    PARSE_LISTENING_PORT([s2.log], [TCP_PORT_2])
    PARSE_LISTENING_PORT([s3.log], [TCP_PORT_3])
On Fri, Sep 04, 2020 at 12:54:39AM +0200, Ilya Maximets wrote:
> On 9/4/20 12:05 AM, Flavio Leitner wrote:
> > I tried, but I ran into issues and I am not familiar with m4.
>
> Could you try following diff:
> [...]

I tried, but it fails. The test logs are here:
http://people.redhat.com/~fleitner/2127.tar.bz2

# -*- compilation -*-
2127. ovsdb-idl.at:1840: testing Check Python IDL reconnects to leader - Python3 (leader only) ...
./ovsdb-idl.at:1840: ovsdb-tool create-cluster s1.db \
        $abs_srcdir/idltest.ovsschema unix:s1.raft
./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
        $schema_name unix:s$i.raft unix:s1.raft
./ovsdb-idl.at:1840: ovsdb-tool join-cluster s$i.db \
        $schema_name unix:s$i.raft unix:s1.raft
./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
        --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
        --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
        --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
        --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
./ovsdb-idl.at:1840: ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
        --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
        --remote=punix:s$i.ovsdb ${2:+--remote="ptcp:0:"127.0.0.1} s$i.db
ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
        | grep -q 'Status: cluster member'...
ovsdb-idl.at:1840: wait succeeded immediately
ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
        | grep -q 'Status: cluster member'...
ovsdb-idl.at:1840: wait succeeded after 1 seconds
ovsdb-idl.at:1840: waiting until ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} \
        | grep -q 'Status: cluster member'...
ovsdb-idl.at:1840: wait succeeded immediately
ovsdb-idl.at:1840: waiting until TCP_PORT_1=`sed -n 's/.*0:.*: listening on port \([0-9]*\)$/\1/p' "s2.log"` && test X != X"$TCP_PORT_1"...
ovsdb-idl.at:1840: wait failed after 30 seconds
./ovs-macros.at:241: hard failure
2127. ovsdb-idl.at:1840: 2127. Check Python IDL reconnects to leader - Python3 (leader only) (ovsdb-idl.at:1840): FAILED (ovs-macros.at:241)
On 9/4/20 5:09 PM, Flavio Leitner wrote:
> On Fri, Sep 04, 2020 at 12:54:39AM +0200, Ilya Maximets wrote:
>> Could you try following diff:
>
> I tried, but it fails. The test logs are here:
> http://people.redhat.com/~fleitner/2127.tar.bz2

Oh, sorry, my fault. We can't use shell expansions over m4 definitions.
Should work with following change:

diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 075250a9c..30e896e3e 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -46,7 +46,8 @@ m4_define([OVSDB_CLUSTER_START_IDLTEST],
    for i in $(seq $n); do
        AT_CHECK([ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
                      --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
-                     --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db])
+                     --remote=punix:s$i.ovsdb \
+                     m4_if([$2], [], [], [--remote=$2]) s$i.db])
    done
    on_exit 'kill $(cat s*.pid)'

---

> [...]
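The failure mode in the log above can be reproduced in plain shell. The sketch below (with illustrative names) shows why `${2:+--remote=$2}` behaves correctly inside a shell function but breaks inside an m4 macro body: m4 substitutes the trailing `$2` (a `$` followed by a digit) but leaves `${2:+` alone, so the shell's own `$2` ends up controlling the expansion.

```shell
#!/bin/sh
# Inside a shell *function*, $2 is the function's second argument, so
# ${2:+--remote=$2} adds the flag only when a remote was passed:
emit_cmd() { echo "ovsdb-server ${2:+--remote=$2} s$1.db"; }
emit_cmd 1                   # no remote argument -> no --remote flag
emit_cmd 1 ptcp:0:127.0.0.1  # remote given -> --remote=ptcp:0:127.0.0.1

# Inside an m4 macro body, m4 rewrites only the bare $2, yielding
# ${2:+--remote=ptcp:0:127.0.0.1} (exactly what the log shows).  The
# *shell's* $2 then decides the expansion, and at the top level of the
# generated test script it is unset, so the flag silently vanishes:
set --                       # clear positional parameters: shell $2 unset
echo "ovsdb-server ${2:+--remote=ptcp:0:127.0.0.1} s1.db"
```

That is why `m4_if([$2], [], [], [--remote=$2])` fixes it: the presence check happens at macro-expansion time, before the shell sees the command at all.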
On Fri, Sep 04, 2020 at 08:07:41PM +0200, Ilya Maximets wrote:
> On 9/4/20 5:09 PM, Flavio Leitner wrote:
>> I tried, but it fails. The test logs are here:
>> http://people.redhat.com/~fleitner/2127.tar.bz2
>
> Oh, sorry, my fault. We can't use shell expansions over m4 definitions.
> Should work with following change:

No worries!

The test passes all times with the fix below.

Thanks for improving the fix!

fbl

> diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
> index 075250a9c..30e896e3e 100644
> --- a/tests/ovsdb-idl.at
> +++ b/tests/ovsdb-idl.at
> @@ -46,7 +46,8 @@ m4_define([OVSDB_CLUSTER_START_IDLTEST],
>     for i in $(seq $n); do
>         AT_CHECK([ovsdb-server -vraft -vconsole:warn --detach --no-chdir \
>                       --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i \
> -                     --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db])
> +                     --remote=punix:s$i.ovsdb \
> +                     m4_if([$2], [], [], [--remote=$2]) s$i.db])
>     done
>     on_exit 'kill $(cat s*.pid)'
> [...]
On 9/4/20 9:25 PM, Flavio Leitner wrote:
> On Fri, Sep 04, 2020 at 08:07:41PM +0200, Ilya Maximets wrote:
>> Oh, sorry, my fault. We can't use shell expansions over m4 definitions.
>> Should work with following change:
>
> No worries!
>
> The test passes all times with the fix below.

Great! Would you mind sending v2?

Best regards, Ilya Maximets.

> Thanks for improving the fix!
> [...]
On Mon, Sep 07, 2020 at 01:53:11PM +0200, Ilya Maximets wrote:
> On 9/4/20 9:25 PM, Flavio Leitner wrote:
>> The test passes all times with the fix below.
>
> Great! Would you mind sending v2?

Done:
https://mail.openvswitch.org/pipermail/ovs-dev/2020-September/374572.html

fbl
diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index b5cbee7d9..c045e9264 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -29,6 +29,17 @@ ovsdb_cluster_start_idltest () {
         ovsdb-server -vraft -vconsole:warn --detach --no-chdir --log-file=s$i.log --pidfile=s$i.pid --unixctl=s$i --remote=punix:s$i.ovsdb ${2:+--remote=$2} s$i.db || return $?
     done
     on_exit 'kill `cat s*.pid`'
+    for i in `seq $n`; do
+        for d in `seq 1 "$OVS_CTL_TIMEOUT"`; do
+            if ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
+                break
+            fi
+            sleep 1
+        done
+        if ! ovs-appctl -t $(pwd)/s$i cluster/status ${schema_name} | grep -q 'Status: cluster member'; then
+            return 1
+        fi
+    done
 }
 
 # ovsdb_cluster_leader [REMOTES] [DATABASE]
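The loop added by the patch is an instance of the common poll-with-timeout pattern: retry a cheap check once per second up to a timeout, then make one final check to decide pass or fail. A generic sketch (the `wait_until` helper and file names below are illustrative, not OVS test-suite helpers):

```shell
#!/bin/sh
# wait_until TIMEOUT_SECS COMMAND...
# Retries COMMAND once per second for up to TIMEOUT_SECS seconds.
wait_until() {
    timeout=$1; shift
    i=0
    while [ "$i" -lt "$timeout" ]; do
        if "$@"; then
            return 0        # condition met
        fi
        sleep 1
        i=$((i + 1))
    done
    "$@"                    # final re-check, like the patch's trailing test
}

# Example: wait (up to 3 s) for a file to appear.
tmp=$(mktemp)
wait_until 3 test -e "$tmp" && echo "condition met"
rm -f "$tmp"
```

In the patch itself the condition is `ovs-appctl ... cluster/status | grep -q 'Status: cluster member'`, run once per server, so the test only proceeds once every ovsdb-server has actually joined the Raft cluster.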
The test 'Check Python IDL reconnects to leader - Python3
(leader only)' fails sometimes when the first ovsdb-server
gets killed before the others had joined the cluster.

Fix the function ovsdb_cluster_start_idltest to wait for them
to join the cluster.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Flavio Leitner <fbl@sysclose.org>
---
 tests/ovsdb-idl.at | 11 +++++++++++
 1 file changed, 11 insertions(+)