diff mbox series

[ovs-dev] ovn-ctl: Add "--no-leader-only" while cluster db init

Message ID 20200309104939.4971-1-taoyunxiang@cmss.chinamobile.com
State Changes Requested
Headers show
Series [ovs-dev] ovn-ctl: Add "--no-leader-only" while cluster db init | expand

Commit Message

taoyunxiang March 9, 2020, 10:49 a.m. UTC
For cluster mode, only the first server(which is the leader) in a cluster
should do init. But, sometimes, the role of the first server would transfer
leadership, so we should add "--no-leader-only" option in case 
the ovn-northd stucked.

Author: Tao YunXiang <taoyunxiang@cmss.chinamobile.com>
Co-authored-by: Liu Chang <liuchang@cmss.chinamobile.com>
Signed-off-by: Tao YunXiang <taoyunxiang@cmss.chinamobile.com>
Signed-off-by: Liu Chang <liuchang@cmss.chinamobile.com>

---
 utilities/ovn-ctl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

0-day Robot March 9, 2020, 10:58 a.m. UTC | #1
Bleep bloop.  Greetings Tao YunXiang, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
error: sha1 information is lacking or useless (utilities/ovn-ctl).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0001 ovn-ctl: Add "--no-leader-only" while cluster db init
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
taoyunxiang March 16, 2020, 1:53 p.m. UTC | #2
Hi numans,
       The issue I discribed in this change is repeated. When I restart the ovn-northd of the first server(which has transfered its leadership), it will be stucked.
       The problem of robot replys seems reasonable. Do I need to take checksum or something for utilities/ovn-ctl ?


Thanks,
Yun
--------------
taoyunxiang@cmss.chinamobile.com
>Bleep bloop.  Greetings Tao YunXiang, I am a robot and I have tried out your patch.
>Thanks for your contribution.
>
>I encountered some error that I wasn't expecting.  See the details below.
>
>
>git-am:
>error: sha1 information is lacking or useless (utilities/ovn-ctl).
>error: could not build fake ancestor
>hint: Use 'git am --show-current-patch' to see the failed patch
>Patch failed at 0001 ovn-ctl: Add "--no-leader-only" while cluster db init
>When you have resolved this problem, run "git am --continue".
>If you prefer to skip this patch, run "git am --skip" instead.
>To restore the original branch and stop patching, run "git am --abort".
>
>
>Please check this out.  If you feel there has been an error, please email aconole@redhat.com
>
>Thanks,
>0-day Robot
Numan Siddique March 16, 2020, 6:40 p.m. UTC | #3
On Mon, Mar 16, 2020 at 7:25 PM taoyunxiang@cmss.chinamobile.com
<taoyunxiang@cmss.chinamobile.com> wrote:
>
> Hi numans,
>        The issue I discribed in this change is repeated. When I restart the ovn-northd of the first server(which has transfered its leadership), it will be stucked.
>        The problem of robot replys seems reasonable. Do I need to take checksum or something for utilities/ovn-ctl ?


Hi Yun,

I think the robot is trying to apply this patch to OVS repo instead of
OVN, since you don't have "ovn" tag i.e "PATCH ovn".
So no need to worry about this.

I think your patch makes sense. But I'd like if Han can take a look at
this patch.

Acked-by: Numan Siddique <numans@ovn.org>

Numan

>
>
> Thanks,
> Yun
> --------------
> taoyunxiang@cmss.chinamobile.com
> >Bleep bloop.  Greetings Tao YunXiang, I am a robot and I have tried out your patch.
> >Thanks for your contribution.
> >
> >I encountered some error that I wasn't expecting.  See the details below.
> >
> >
> >git-am:
> >error: sha1 information is lacking or useless (utilities/ovn-ctl).
> >error: could not build fake ancestor
> >hint: Use 'git am --show-current-patch' to see the failed patch
> >Patch failed at 0001 ovn-ctl: Add "--no-leader-only" while cluster db init
> >When you have resolved this problem, run "git am --continue".
> >If you prefer to skip this patch, run "git am --skip" instead.
> >To restore the original branch and stop patching, run "git am --abort".
> >
> >
> >Please check this out.  If you feel there has been an error, please email aconole@redhat.com
> >
> >Thanks,
> >0-day Robot
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Han Zhou March 16, 2020, 7:09 p.m. UTC | #4
On Mon, Mar 9, 2020 at 3:50 AM Tao YunXiang <
taoyunxiang@cmss.chinamobile.com> wrote:
>
> For cluster mode, only the first server(which is the leader) in a cluster
> should do init. But, sometimes, the role of the first server would
transfer
> leadership, so we should add "--no-leader-only" option in case
> the ovn-northd stucked.
>
Thanks for the patch. This is trickier than it appears.
Let me describe the problem from my understanding. When creating the
cluster, the "init" needs to be executed once. This should be done with the
first server. In this case the script works well, because the
"cluster_remote_addr" is NOT specified when starting the first server,
which is the leader at that time. When another server joins the cluster,
the "cluster_remote_addr" must be specified, so the "init" won't be
executed.
However, when a server needs to be restarted later, no matter if it is
leader or not, it doesn't need to supply the "cluster_remote_addr"
parameter, because that parameter is needed only for joining a cluster.
When a server is restarted by executing this script, if
"cluster_remote_addr" is NOT specified, it will stuck at the "init"
command, because it is not leader and "--no-leader-only" is not specified
here.

So, the implementation doesn't really reflect the comment:
>      # Initialize the database if it's running standalone,
>      # active-passive, or is the first server in a cluster.
It tried to run the init command even when it is not the first server in a
cluster.

To fix the problem, one way is to make sure it does exactly as what the
comment said. However, it seems there is no easy way to distinguish if the
server is the first server in a cluster or it is just a server that has
already joined being restarted.

Alternatively, adding "--no-leader-only" solves the problem as well, as
proposed by this patch, because it is not harmful to run "init" again even
if the DB is already initiated, just like how it is handled when it is
standalone mode.

Based on the above analysis, I am ok with the fix, but I suggest to update
the commit message to clarify the problem more clearly. In addition, for
the comment mentioned above, it should be changed to:

# Initialize the database if it's NOT joining a cluster.

Finally, maybe the if condition can be removed completely, but I haven't
tested and not 100% sure.

Thanks,
Han


> Author: Tao YunXiang <taoyunxiang@cmss.chinamobile.com>
> Co-authored-by: Liu Chang <liuchang@cmss.chinamobile.com>
> Signed-off-by: Tao YunXiang <taoyunxiang@cmss.chinamobile.com>
> Signed-off-by: Liu Chang <liuchang@cmss.chinamobile.com>
>
> ---
>  utilities/ovn-ctl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl
> index 2a337ae27..71bdbc076 100755
> --- a/utilities/ovn-ctl
> +++ b/utilities/ovn-ctl
> @@ -293,7 +293,7 @@ $cluster_remote_port
>      # Initialize the database if it's running standalone,
>      # active-passive, or is the first server in a cluster.
>      if test -z "$cluster_remote_addr"; then
> -        $(echo ovn-${db}ctl | tr _ -) init
> +        $(echo ovn-${db}ctl | tr _ -) --no-leader-only init
>      fi
>
>      if test $mode = cluster; then
> --
> 2.17.1
>
>
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
taoyunxiang March 17, 2020, 3:19 a.m. UTC | #5
Hi Han and Numan,
     Thanks for your review, I have summited v2 change, ovn-ctl: Fix stucked while do cluster db init



Thanks,
Yun
--------------
taoyunxiang@cmss.chinamobile.com
>On Mon, Mar 9, 2020 at 3:50 AM Tao YunXiang <
>taoyunxiang@cmss.chinamobile.com> wrote:
>>
>> For cluster mode, only the first server(which is the leader) in a cluster
>> should do init. But, sometimes, the role of the first server would
>transfer
>> leadership, so we should add "--no-leader-only" option in case
>> the ovn-northd stucked.
>>
>Thanks for the patch. This is trickier than it appears.
>Let me describe the problem from my understanding. When creating the
>cluster, the "init" needs to be executed once. This should be done with the
>first server. In this case the script works well, because the
>"cluster_remote_addr" is NOT specified when starting the first server,
>which is the leader at that time. When another server joins the cluster,
>the "cluster_remote_addr" must be specified, so the "init" won't be
>executed.
>However, when a server needs to be restarted later, no matter if it is
>leader or not, it doesn't need to supply the "cluster_remote_addr"
>parameter, because that parameter is needed only for joining a cluster.
>When a server is restarted by executing this script, if
>"cluster_remote_addr" is NOT specified, it will stuck at the "init"
>command, because it is not leader and "--no-leader-only" is not specified
>here.
>
>So, the implementation doesn't really reflect the comment:
>>      # Initialize the database if it's running standalone,
>>      # active-passive, or is the first server in a cluster.
>It tried to run the init command even when it is not the first server in a
>cluster.
>
>To fix the problem, one way is to make sure it does exactly as what the
>comment said. However, it seems there is no easy way to distinguish if the
>server is the first server in a cluster or it is just a server that has
>already joined being restarted.
>
>Alternatively, adding "--no-leader-only" solves the problem as well, as
>proposed by this patch, because it is not harmful to run "init" again even
>if the DB is already initiated, just like how it is handled when it is
>standalone mode.
>
>Based on the above analysis, I am ok with the fix, but I suggest to update
>the commit message to clarify the problem more clearly. In addition, for
>the comment mentioned above, it should be changed to:
>
># Initialize the database if it's NOT joining a cluster.
>
>Finally, maybe the if condition can be removed completely, but I haven't
>tested and not 100% sure.
>
>Thanks,
>Han
>
>
>> Author: Tao YunXiang <taoyunxiang@cmss.chinamobile.com>
>> Co-authored-by: Liu Chang <liuchang@cmss.chinamobile.com>
>> Signed-off-by: Tao YunXiang <taoyunxiang@cmss.chinamobile.com>
>> Signed-off-by: Liu Chang <liuchang@cmss.chinamobile.com>
>>
>> ---
>>  utilities/ovn-ctl | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl
>> index 2a337ae27..71bdbc076 100755
>> --- a/utilities/ovn-ctl
>> +++ b/utilities/ovn-ctl
>> @@ -293,7 +293,7 @@ $cluster_remote_port
>>      # Initialize the database if it's running standalone,
>>      # active-passive, or is the first server in a cluster.
>>      if test -z "$cluster_remote_addr"; then
>> -        $(echo ovn-${db}ctl | tr _ -) init
>> +        $(echo ovn-${db}ctl | tr _ -) --no-leader-only init
>>      fi
>>
>>      if test $mode = cluster; then
>> --
>> 2.17.1
>>
>>
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/utilities/ovn-ctl b/utilities/ovn-ctl
index 2a337ae27..71bdbc076 100755
--- a/utilities/ovn-ctl
+++ b/utilities/ovn-ctl
@@ -293,7 +293,7 @@  $cluster_remote_port
     # Initialize the database if it's running standalone,
     # active-passive, or is the first server in a cluster.
     if test -z "$cluster_remote_addr"; then
-        $(echo ovn-${db}ctl | tr _ -) init
+        $(echo ovn-${db}ctl | tr _ -) --no-leader-only init
     fi
 
     if test $mode = cluster; then