diff mbox series

[ovs-dev,1/2] Revert "northd: Don't poll ovsdb before the connection is fully established"

Message ID 20210917030538.9773-1-zhewang@nvidia.com
State Changes Requested
Delegated to: Han Zhou
Headers show
Series [ovs-dev,1/2] Revert "northd: Don't poll ovsdb before the connection is fully established" | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Zhen Wang Sept. 17, 2021, 3:05 a.m. UTC
From: zhen wang <zhewang@nvidia.com>

This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
The above commit introduced a bug when multiple ovn-northd instances work in HA
mode. If the SB leader and the active ovn-northd instance are killed by a system
power outage, the standby ovn-northd instance never detects the failure.

Signed-off-by: zhen wang <zhewang@nvidia.com>
---
 northd/northd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Numan Siddique Sept. 17, 2021, 8:31 p.m. UTC | #1
On Thu, Sep 16, 2021 at 11:05 PM Zhen Wang via dev
<ovs-dev@openvswitch.org> wrote:
>
> From: zhen wang <zhewang@nvidia.com>
>
> This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
> Above commit introduced a bug when multiple ovn-northd instances work in HA
> mode. If SB leader and active ovn-northd instance got killed by system power
> outage, standby ovn-northd instance would never detect the failure.
>
> Signed-off-by: zhen wang <zhewang@nvidia.com>

I'm not sure if we can revert this commit.   How would you propose to fix the
issue mentioned in the commit message of the commit 1e59feea933610 ?

I think you should configure a desired probe interval value in NB_Global.options
so that the stand-by northd instance can detect the failure.  With
this configuration and
with your patch 2,  the issue reported by you should be addressed.

Another option is to reset the probe interval from 0 to the default
value - DEFAULT_PROBE_INTERVAL_MSEC
once northd has successfully connected to the Northbound db server if
CMS has not configured the probe
interval in NB_Global.options.

Thanks
Numan
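Numan's second option can be sketched in a few lines of C (an illustrative model only, not the actual northd code; the helper name and the `connected` flag are assumptions): keep the probe interval at 0 while the initial connection is being established, then fall back to DEFAULT_PROBE_INTERVAL_MSEC once connected, unless the CMS has set an explicit value in NB_Global.options.

```c
#define DEFAULT_PROBE_INTERVAL_MSEC 5000

/* Sketch of the fallback Numan describes (illustrative only, not the
 * actual northd implementation).  'configured_msec' is the value read
 * from NB_Global.options, or 0 if the CMS has not set one. */
static int
effective_probe_interval(int configured_msec, int connected)
{
    if (configured_msec > 0) {
        /* A CMS-configured interval always wins. */
        return configured_msec;
    }
    /* No CMS setting: don't probe until the initial download is done,
     * then fall back to the default so a dead server is detected. */
    return connected ? DEFAULT_PROBE_INTERVAL_MSEC : 0;
}
```

With this shape, the "never detects the failure" window is limited to the initial download; after that the default interval applies even if the CMS configured nothing.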



> ---
>  northd/northd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 688a6e4ef..b7e64470f 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
>
>  /* Default probe interval for NB and SB DB connections. */
>  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> -static int northd_probe_interval_nb = 0;
> -static int northd_probe_interval_sb = 0;
> +static int northd_probe_interval_nb = DEFAULT_PROBE_INTERVAL_MSEC;
> +static int northd_probe_interval_sb = DEFAULT_PROBE_INTERVAL_MSEC;
>  #define MAX_OVN_TAGS 4096
>
>  /* Pipeline stages. */
> --
> 2.20.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Han Zhou Sept. 17, 2021, 8:34 p.m. UTC | #2
On Thu, Sep 16, 2021 at 8:05 PM Zhen Wang <zhewang@nvidia.com> wrote:
>
> From: zhen wang <zhewang@nvidia.com>
>
> This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
> Above commit introduced a bug when multiple ovn-northd instances work in
> HA mode. If SB leader and active ovn-northd instance got killed by system
> power outage, standby ovn-northd instance would never detect the failure.
>

Thanks Zhen! I added Renat and Numan, who worked on the reverted commit,
to CC so that they can comment on whether this is OK.

For the commit message, I think it can be decoupled from the HA scenario
that the other patch in this series is supposed to fix. The issue this
patch fixes is that before the initial NB download is complete, northd
will not send probes; so if the DB server goes down (ungracefully) before
northd reads the NB_Global options, northd would never probe, and thus
never reconnect to the new leader. (It is related to RAFT, but whether
there are multiple northds is irrelevant.)

As to the original commit that is reverted by this one:

    northd: Don't poll ovsdb before the connection is fully established

    Set initial SB and NB DBs probe interval to 0 to avoid connection
    flapping.

    Before configured in northd_probe_interval value is actually applied
    to southbound and northbound database connections, both connections
    must be fully established, otherwise ovnnb_db_run() will return
    without retrieving configuration data from northbound DB. In cases
    when southbound database is big enough, default interval of 5 seconds
    will kill and retry the connection before it is fully established, no
    matter what is set in northd_probe_interval. Client reconnect will
    cause even more load to ovsdb-server and cause cascade effect, so
    northd can never stabilise. We have more than 2000 ports in our lab,
    and northd could not start before this patch, holding at 100% CPU
    utilisation both itself and ovsdb-server.

    After connections are established, any value in northd_probe_interval,
    or default DEFAULT_PROBE_INTERVAL_MSEC is applied correctly.

I am not sure how the commit would help. There are at most 3 - 5 northds
(in practice), and even if there are tens or hundreds of ovn-controllers
keeping the SB busy, that is just 3 - 5 more clients retrying to reconnect
to the SB a few times; and if the NB is not that busy (most likely), these
northd clients should get the proper probe settings applied soon without
causing more issues. So I don't think the default 5 sec probe would cause
a cascade effect in the initial period. @Renat @Numan please correct me if
I am wrong.

Thanks,
Han
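The failure mode Han describes can be modeled with a tiny sketch (illustrative names only, not OVS's actual jsonrpc API): with a probe interval of 0 the client never sends an inactivity probe, so no matter how long the server stays silent, the connection is never declared dead and no reconnect to the new leader is attempted.

```c
/* Illustrative model of inactivity probing (not OVS's real jsonrpc
 * code).  Returns 1 if the client should give up on the connection and
 * reconnect, given the configured probe interval and how long the
 * server has been silent. */
static int
should_reconnect(int probe_interval_msec, int silent_msec)
{
    if (probe_interval_msec <= 0) {
        /* Probing disabled: silence is never noticed. */
        return 0;
    }
    /* Roughly: a probe goes out after one interval; no reply within
     * another interval means the peer is considered dead. */
    return silent_msec >= 2 * probe_interval_msec;
}
```

In this model, an interval of 0 (the reverted behavior before NB_Global is read) makes `should_reconnect` false forever, which is exactly the standby-northd hang being discussed.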

> Signed-off-by: zhen wang <zhewang@nvidia.com>
> ---
>  northd/northd.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index 688a6e4ef..b7e64470f 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
>
>  /* Default probe interval for NB and SB DB connections. */
>  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> -static int northd_probe_interval_nb = 0;
> -static int northd_probe_interval_sb = 0;
> +static int northd_probe_interval_nb = DEFAULT_PROBE_INTERVAL_MSEC;
> +static int northd_probe_interval_sb = DEFAULT_PROBE_INTERVAL_MSEC;
>  #define MAX_OVN_TAGS 4096
>
>  /* Pipeline stages. */
> --
> 2.20.1
>
Han Zhou Sept. 17, 2021, 8:36 p.m. UTC | #3
On Fri, Sep 17, 2021 at 1:31 PM Numan Siddique <numans@ovn.org> wrote:
>
> On Thu, Sep 16, 2021 at 11:05 PM Zhen Wang via dev
> <ovs-dev@openvswitch.org> wrote:
> >
> > From: zhen wang <zhewang@nvidia.com>
> >
> > This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
> > Above commit introduced a bug when multiple ovn-northd instances work
> > in HA mode. If SB leader and active ovn-northd instance got killed by
> > system power outage, standby ovn-northd instance would never detect the
> > failure.
> >
> > Signed-off-by: zhen wang <zhewang@nvidia.com>
>
> I'm not sure if we can revert this commit.  How would you propose to fix the
> issue mentioned in the commit message of the commit 1e59feea933610 ?
>
> I think you should configure a desired probe interval value in NB_Global.options
> so that the stand-by northd instance can detect the failure.  With
> this configuration and
> with your patch 2,  the issue reported by you should be addressed.
>
> Another option is to reset the probe interval from 0 to the default
> value - DEFAULT_PROBE_INTERVAL_MSEC
> once northd has successfully connected to the Northbound db server if
> CMS has not configured the probe
> interval in NB_Global.options.
>
> Thanks
> Numan
>

Hi Numan, sorry that I cross replied. Please see my other reply for the
points you mentioned. -Han
>
>
> > ---
> >  northd/northd.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/northd/northd.c b/northd/northd.c
> > index 688a6e4ef..b7e64470f 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
> >
> >  /* Default probe interval for NB and SB DB connections. */
> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> > -static int northd_probe_interval_nb = 0;
> > -static int northd_probe_interval_sb = 0;
> > +static int northd_probe_interval_nb = DEFAULT_PROBE_INTERVAL_MSEC;
> > +static int northd_probe_interval_sb = DEFAULT_PROBE_INTERVAL_MSEC;
> >  #define MAX_OVN_TAGS 4096
> >
> >  /* Pipeline stages. */
> > --
> > 2.20.1
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Renat Nurgaliyev Sept. 17, 2021, 8:47 p.m. UTC | #4
Hello Han,

when I wrote this patch we had an issue with a very big SB database, around
1.5 gigabytes. There were no controllers or northds running, so the
database server was without any load at all. Although OVSDB was idling,
even a single northd process could not fully connect to the database due to
its size, since it could not fetch and process the data in 5 seconds.

Since then many optimizations were made, and the database size with the
same topology reduced to approximately twenty megabytes, so today I
wouldn't be able to reproduce the problem.

However, I am quite sure that it would still cause trouble at a huge
scale, when SB grows to hundreds of megabytes. With the default timeout of
5 seconds, which is implemented in the same thread that also fetches and
processes data, we make an artificial database size limit, which is not so
obvious to troubleshoot.

Regards,
Renat.
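Renat's "artificial database size limit" boils down to a simple relation, which can be sketched as follows (a simplified model, not the actual single-threaded client loop): if fetching and processing the initial snapshot takes longer than the probe interval, the client misses the probe deadline, the connection is torn down and retried, and the cycle repeats indefinitely.

```c
/* Simplified model of Renat's point (not the actual OVSDB client
 * code): the initial sync only completes if probing is disabled or the
 * snapshot can be fetched and processed within one probe interval. */
static int
initial_sync_completes(int fetch_msec, int probe_interval_msec)
{
    if (probe_interval_msec <= 0) {
        return 1;   /* No probe deadline to miss. */
    }
    return fetch_msec < probe_interval_msec;
}
```

Under this model, any database whose initial fetch exceeds the probe interval can never be synced, regardless of what NB_Global.options says, because that setting is only read after a successful sync.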

Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:34:

>
>
> On Thu, Sep 16, 2021 at 8:05 PM Zhen Wang <zhewang@nvidia.com> wrote:
> >
> > From: zhen wang <zhewang@nvidia.com>
> >
> > This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
> > Above commit introduced a bug when multiple ovn-northd instances work in
> > HA mode. If SB leader and active ovn-northd instance got killed by system
> > power outage, standby ovn-northd instance would never detect the failure.
> >
>
> Thanks Zhen! I added the Renat and Numan who worked on the reverted commit
> to CC, so that they can comment if this is ok.
>
> For the commit message, I think it may be decoupled from the HA scenario
> that is supposed to be fixed by the other patch in this series. The issue
> this patch fixes is that before the initial NB downloading is complete the
> northd will not send probe, so if the DB server is down (ungracefully)
> before the northd reads the NB_Global options, the northd would never
> probe, thus never reconnect to the new leader. (it is related to RAFT, but
> whether it is multiple northds is irrelevant)
>
> As to the original commit that is reverted by this one:
>
>     northd: Don't poll ovsdb before the connection is fully established
>
>     Set initial SB and NB DBs probe interval to 0 to avoid connection
>     flapping.
>
>     Before configured in northd_probe_interval value is actually applied
>     to southbound and northbound database connections, both connections
>     must be fully established, otherwise ovnnb_db_run() will return
>     without retrieving configuration data from northbound DB. In cases
>     when southbound database is big enough, default interval of 5 seconds
>     will kill and retry the connection before it is fully established, no
>     matter what is set in northd_probe_interval. Client reconnect will
>     cause even more load to ovsdb-server and cause cascade effect, so
>     northd can never stabilise. We have more than 2000 ports in our lab,
>     and northd could not start before this patch, holding at 100% CPU
>     utilisation both itself and ovsdb-server.
>
>     After connections are established, any value in northd_probe_interval,
>     or default DEFAULT_PROBE_INTERVAL_MSEC is applied correctly.
>
> I am not sure how would the commit help. There are at most 3 - 5 northds
> (in practice), and suppose there are tens or hundreds of ovn-controllers
> that makes SB busy, it is just 3 - 5 more clients retrying reconnect SB for
> several times, and if NB is not that busy (most likely), these northd
> clients should get the proper probe settings applied soon without causing
> more issues at all. So I don't think the default probe 5 sec would cause
> cascade effect for the initial period. @Renat @Numan please correct me if I
> am wrong.
>
> Thanks,
> Han
>
> > Signed-off-by: zhen wang <zhewang@nvidia.com>
> > ---
> >  northd/northd.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/northd/northd.c b/northd/northd.c
> > index 688a6e4ef..b7e64470f 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
> >
> >  /* Default probe interval for NB and SB DB connections. */
> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> > -static int northd_probe_interval_nb = 0;
> > -static int northd_probe_interval_sb = 0;
> > +static int northd_probe_interval_nb = DEFAULT_PROBE_INTERVAL_MSEC;
> > +static int northd_probe_interval_sb = DEFAULT_PROBE_INTERVAL_MSEC;
> >  #define MAX_OVN_TAGS 4096
> >
> >  /* Pipeline stages. */
> > --
> > 2.20.1
> >
>
Han Zhou Sept. 17, 2021, 8:55 p.m. UTC | #5
On Fri, Sep 17, 2021 at 1:48 PM Renat Nurgaliyev <impleman@gmail.com> wrote:
>
> Hello Han,
>
> when I wrote this patch we had an issue with a very big SB database,
> around 1.5 gigabytes. There were no controllers or northds running, so the
> database server was without any load at all. Although OVSDB was idling,
> even a single northd process could not fully connect to the database due to
> its size, since it could not fetch and process the data in 5 seconds.

Hi Renat, thanks for the explanation. However, suppose the SB is still huge:
if the NB is not that big, the probe config in NB_Global will soon be applied
to ovn-northd, which would then probe at the proper interval (a desired
setting with the SB size taken into account) instead of the default 5 sec,
and it should succeed, right?

Thanks,
Han

>
> Since then many optimizations were made, and the database size with the
> same topology reduced to approximately twenty megabytes, so today I
> wouldn't be able to reproduce the problem.
>
> However, I am quite sure that it would still cause troubles with a huge
> scale, when SB grows to hundreds of megabytes. With the default timeout of
> 5 seconds, which is implemented in the same thread that also fetches and
> processes data, we make an artificial database size limit, which is not so
> obvious to troubleshoot.
>
> Regards,
> Renat.
>
> Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:34:
>>
>>
>>
>> On Thu, Sep 16, 2021 at 8:05 PM Zhen Wang <zhewang@nvidia.com> wrote:
>> >
>> > From: zhen wang <zhewang@nvidia.com>
>> >
>> > This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
>> > Above commit introduced a bug when multiple ovn-northd instances work
>> > in HA mode. If SB leader and active ovn-northd instance got killed by
>> > system power outage, standby ovn-northd instance would never detect the
>> > failure.
>> >
>>
>> Thanks Zhen! I added the Renat and Numan who worked on the reverted
>> commit to CC, so that they can comment if this is ok.
>>
>> For the commit message, I think it may be decoupled from the HA scenario
>> that is supposed to be fixed by the other patch in this series. The issue
>> this patch fixes is that before the initial NB downloading is complete the
>> northd will not send probe, so if the DB server is down (ungracefully)
>> before the northd reads the NB_Global options, the northd would never
>> probe, thus never reconnect to the new leader. (it is related to RAFT, but
>> whether it is multiple northds is irrelevant)
>>
>> As to the original commit that is reverted by this one:
>>
>>     northd: Don't poll ovsdb before the connection is fully established
>>
>>     Set initial SB and NB DBs probe interval to 0 to avoid connection
>>     flapping.
>>
>>     Before configured in northd_probe_interval value is actually applied
>>     to southbound and northbound database connections, both connections
>>     must be fully established, otherwise ovnnb_db_run() will return
>>     without retrieving configuration data from northbound DB. In cases
>>     when southbound database is big enough, default interval of 5 seconds
>>     will kill and retry the connection before it is fully established, no
>>     matter what is set in northd_probe_interval. Client reconnect will
>>     cause even more load to ovsdb-server and cause cascade effect, so
>>     northd can never stabilise. We have more than 2000 ports in our lab,
>>     and northd could not start before this patch, holding at 100% CPU
>>     utilisation both itself and ovsdb-server.
>>
>>     After connections are established, any value in northd_probe_interval,
>>     or default DEFAULT_PROBE_INTERVAL_MSEC is applied correctly.
>>
>> I am not sure how would the commit help. There are at most 3 - 5 northds
>> (in practice), and suppose there are tens or hundreds of ovn-controllers
>> that makes SB busy, it is just 3 - 5 more clients retrying reconnect SB for
>> several times, and if NB is not that busy (most likely), these northd
>> clients should get the proper probe settings applied soon without causing
>> more issues at all. So I don't think the default probe 5 sec would cause
>> cascade effect for the initial period. @Renat @Numan please correct me if I
>> am wrong.
>>
>> Thanks,
>> Han
>>
>> > Signed-off-by: zhen wang <zhewang@nvidia.com>
>> > ---
>> >  northd/northd.c | 4 ++--
>> >  1 file changed, 2 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/northd/northd.c b/northd/northd.c
>> > index 688a6e4ef..b7e64470f 100644
>> > --- a/northd/northd.c
>> > +++ b/northd/northd.c
>> > @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
>> >
>> >  /* Default probe interval for NB and SB DB connections. */
>> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
>> > -static int northd_probe_interval_nb = 0;
>> > -static int northd_probe_interval_sb = 0;
>> > +static int northd_probe_interval_nb = DEFAULT_PROBE_INTERVAL_MSEC;
>> > +static int northd_probe_interval_sb = DEFAULT_PROBE_INTERVAL_MSEC;
>> >  #define MAX_OVN_TAGS 4096
>> >
>> >  /* Pipeline stages. */
>> > --
>> > 2.20.1
>> >
Renat Nurgaliyev Sept. 17, 2021, 9:01 p.m. UTC | #6
Hi Han,

yes, I believe you are totally right. But it still feels like a
chicken-and-egg problem to me, storing the database timeout setting inside
the database itself. If there were at least some local command-line argument
to override the timeout value, that would already be amazing, because
currently there is no way to control it before the database connection is
made, and if the connection cannot be made, it is too late to try to control it.

Thanks,
Renat.

Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:55:

>
>
> On Fri, Sep 17, 2021 at 1:48 PM Renat Nurgaliyev <impleman@gmail.com>
> wrote:
> >
> > Hello Han,
> >
> > when I wrote this patch we had an issue with a very big SB database,
> > around 1.5 gigabytes. There were no controllers or northds running, so the
> > database server was without any load at all. Although OVSDB was idling,
> > even a single northd process could not fully connect to the database due to
> > its size, since it could not fetch and process the data in 5 seconds.
>
> Hi Renat, thanks for the explanation. However, suppose SB is still huge,
> if NB is not that big, the probe config in NB_Global will soon be applied
> to ovn-northd, which would probe in proper interval (desired setting with
> the SB size considered) instead of the default 5 sec, and it should
> succeed, right?
>
> Thanks,
> Han
>
> >
> > Since then many optimizations were made, and the database size with the
> > same topology reduced to approximately twenty megabytes, so today I
> > wouldn't be able to reproduce the problem.
> >
> > However, I am quite sure that it would still cause troubles with a huge
> > scale, when SB grows to hundreds of megabytes. With the default timeout of
> > 5 seconds, which is implemented in the same thread that also fetches and
> > processes data, we make an artificial database size limit, which is not so
> > obvious to troubleshoot.
> >
> > Regards,
> > Renat.
> >
> > Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:34:
> >>
> >>
> >>
> >> On Thu, Sep 16, 2021 at 8:05 PM Zhen Wang <zhewang@nvidia.com> wrote:
> >> >
> >> > From: zhen wang <zhewang@nvidia.com>
> >> >
> >> > This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
> >> > Above commit introduced a bug when multiple ovn-northd instances work
> >> > in HA mode. If SB leader and active ovn-northd instance got killed by
> >> > system power outage, standby ovn-northd instance would never detect
> >> > the failure.
> >> >
> >>
> >> Thanks Zhen! I added the Renat and Numan who worked on the reverted
> commit to CC, so that they can comment if this is ok.
> >>
> >> For the commit message, I think it may be decoupled from the HA
> >> scenario that is supposed to be fixed by the other patch in this series.
> >> The issue this patch fixes is that before the initial NB downloading is
> >> complete the northd will not send probe, so if the DB server is down
> >> (ungracefully) before the northd reads the NB_Global options, the northd
> >> would never probe, thus never reconnect to the new leader. (it is related
> >> to RAFT, but whether it is multiple northds is irrelevant)
> >>
> >> As to the original commit that is reverted by this one:
> >>
> >>     northd: Don't poll ovsdb before the connection is fully established
> >>
> >>     Set initial SB and NB DBs probe interval to 0 to avoid connection
> >>     flapping.
> >>
> >>     Before configured in northd_probe_interval value is actually applied
> >>     to southbound and northbound database connections, both connections
> >>     must be fully established, otherwise ovnnb_db_run() will return
> >>     without retrieving configuration data from northbound DB. In cases
> >>     when southbound database is big enough, default interval of 5 seconds
> >>     will kill and retry the connection before it is fully established, no
> >>     matter what is set in northd_probe_interval. Client reconnect will
> >>     cause even more load to ovsdb-server and cause cascade effect, so
> >>     northd can never stabilise. We have more than 2000 ports in our lab,
> >>     and northd could not start before this patch, holding at 100% CPU
> >>     utilisation both itself and ovsdb-server.
> >>
> >>     After connections are established, any value in northd_probe_interval,
> >>     or default DEFAULT_PROBE_INTERVAL_MSEC is applied correctly.
> >>
> >> I am not sure how would the commit help. There are at most 3 - 5
> >> northds (in practice), and suppose there are tens or hundreds of
> >> ovn-controllers that makes SB busy, it is just 3 - 5 more clients retrying
> >> reconnect SB for several times, and if NB is not that busy (most likely),
> >> these northd clients should get the proper probe settings applied soon
> >> without causing more issues at all. So I don't think the default probe 5
> >> sec would cause cascade effect for the initial period. @Renat @Numan please
> >> correct me if I am wrong.
> >>
> >> Thanks,
> >> Han
> >>
> >> > Signed-off-by: zhen wang <zhewang@nvidia.com>
> >> > ---
> >> >  northd/northd.c | 4 ++--
> >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/northd/northd.c b/northd/northd.c
> >> > index 688a6e4ef..b7e64470f 100644
> >> > --- a/northd/northd.c
> >> > +++ b/northd/northd.c
> >> > @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
> >> >
> >> >  /* Default probe interval for NB and SB DB connections. */
> >> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> >> > -static int northd_probe_interval_nb = 0;
> >> > -static int northd_probe_interval_sb = 0;
> >> > +static int northd_probe_interval_nb = DEFAULT_PROBE_INTERVAL_MSEC;
> >> > +static int northd_probe_interval_sb = DEFAULT_PROBE_INTERVAL_MSEC;
> >> >  #define MAX_OVN_TAGS 4096
> >> >
> >> >  /* Pipeline stages. */
> >> > --
> >> > 2.20.1
> >> >
>
Numan Siddique Sept. 17, 2021, 9:04 p.m. UTC | #7
On Fri, Sep 17, 2021 at 5:02 PM Renat Nurgaliyev <impleman@gmail.com> wrote:
>
> Hi Han,
>
> yes, I believe you are totally right. But it still feels like a chicken and
> egg problem to me, storing the database timeout setting inside the database
> itself. If there would be at least some local command line argument to
> override timeout value, it would be already amazing, because currently
> there is no way to control it before the database connection is made, and
> if it cannot be made, it is too late to try to control it.
>

What about the case where the NB database is huge and it takes > 5
seconds to fetch all the contents?

Numan

> Thanks,
> Renat.
>
> Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:55:
>
> >
> >
> > On Fri, Sep 17, 2021 at 1:48 PM Renat Nurgaliyev <impleman@gmail.com>
> > wrote:
> > >
> > > Hello Han,
> > >
> > > when I wrote this patch we had an issue with a very big SB database,
> > > around 1.5 gigabytes. There were no controllers or northds running, so the
> > > database server was without any load at all. Although OVSDB was idling,
> > > even a single northd process could not fully connect to the database due to
> > > its size, since it could not fetch and process the data in 5 seconds.
> >
> > Hi Renat, thanks for the explanation. However, suppose SB is still huge,
> > if NB is not that big, the probe config in NB_Global will soon be applied
> > to ovn-northd, which would probe in proper interval (desired setting with
> > the SB size considered) instead of the default 5 sec, and it should
> > succeed, right?
> >
> > Thanks,
> > Han
> >
> > >
> > > Since then many optimizations were made, and the database size with the
> > > same topology reduced to approximately twenty megabytes, so today I
> > > wouldn't be able to reproduce the problem.
> > >
> > > However, I am quite sure that it would still cause troubles with a huge
> > > scale, when SB grows to hundreds of megabytes. With the default timeout of
> > > 5 seconds, which is implemented in the same thread that also fetches and
> > > processes data, we make an artificial database size limit, which is not so
> > > obvious to troubleshoot.
> > >
> > > Regards,
> > > Renat.
> > >
> > > Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:34:
> > >>
> > >>
> > >>
> > >> On Thu, Sep 16, 2021 at 8:05 PM Zhen Wang <zhewang@nvidia.com> wrote:
> > >> >
> > >> > From: zhen wang <zhewang@nvidia.com>
> > >> >
> > >> > This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
> > >> > Above commit introduced a bug when multiple ovn-northd instances work
> > >> > in HA mode. If SB leader and active ovn-northd instance got killed by
> > >> > system power outage, standby ovn-northd instance would never detect
> > >> > the failure.
> > >> >
> > >>
> > >> Thanks Zhen! I added the Renat and Numan who worked on the reverted
> > >> commit to CC, so that they can comment if this is ok.
> > >>
> > >> For the commit message, I think it may be decoupled from the HA
> > >> scenario that is supposed to be fixed by the other patch in this series.
> > >> The issue this patch fixes is that before the initial NB downloading is
> > >> complete the northd will not send probe, so if the DB server is down
> > >> (ungracefully) before the northd reads the NB_Global options, the northd
> > >> would never probe, thus never reconnect to the new leader. (it is related
> > >> to RAFT, but whether it is multiple northds is irrelevant)
> > >>
> > >> As to the original commit that is reverted by this one:
> > >>
> > >>     northd: Don't poll ovsdb before the connection is fully established
> > >>
> > >>     Set initial SB and NB DBs probe interval to 0 to avoid connection
> > >>     flapping.
> > >>
> > >>     Before configured in northd_probe_interval value is actually applied
> > >>     to southbound and northbound database connections, both connections
> > >>     must be fully established, otherwise ovnnb_db_run() will return
> > >>     without retrieving configuration data from northbound DB. In cases
> > >>     when southbound database is big enough, default interval of 5 seconds
> > >>     will kill and retry the connection before it is fully established, no
> > >>     matter what is set in northd_probe_interval. Client reconnect will
> > >>     cause even more load to ovsdb-server and cause cascade effect, so
> > >>     northd can never stabilise. We have more than 2000 ports in our lab,
> > >>     and northd could not start before this patch, holding at 100% CPU
> > >>     utilisation both itself and ovsdb-server.
> > >>
> > >>     After connections are established, any value in northd_probe_interval,
> > >>     or default DEFAULT_PROBE_INTERVAL_MSEC is applied correctly.
> > >>
> > >> I am not sure how would the commit help. There are at most 3 - 5
> > >> northds (in practice), and suppose there are tens or hundreds of
> > >> ovn-controllers that makes SB busy, it is just 3 - 5 more clients
> > >> retrying reconnect SB for several times, and if NB is not that busy
> > >> (most likely), these northd clients should get the proper probe settings
> > >> applied soon without causing more issues at all. So I don't think the
> > >> default probe 5 sec would cause cascade effect for the initial period.
> > >> @Renat @Numan please correct me if I am wrong.
> > >>
> > >> Thanks,
> > >> Han
> > >>
> > >> > Signed-off-by: zhen wang <zhewang@nvidia.com>
> > >> > ---
> > >> >  northd/northd.c | 4 ++--
> > >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >> >
> > >> > diff --git a/northd/northd.c b/northd/northd.c
> > >> > index 688a6e4ef..b7e64470f 100644
> > >> > --- a/northd/northd.c
> > >> > +++ b/northd/northd.c
> > >> > @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
> > >> >
> > >> >  /* Default probe interval for NB and SB DB connections. */
> > >> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> > >> > -static int northd_probe_interval_nb = 0;
> > >> > -static int northd_probe_interval_sb = 0;
> > >> > +static int northd_probe_interval_nb = DEFAULT_PROBE_INTERVAL_MSEC;
> > >> > +static int northd_probe_interval_sb = DEFAULT_PROBE_INTERVAL_MSEC;
> > >> >  #define MAX_OVN_TAGS 4096
> > >> >
> > >> >  /* Pipeline stages. */
> > >> > --
> > >> > 2.20.1
> > >> >
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Renat Nurgaliyev Sept. 17, 2021, 9:09 p.m. UTC | #8
Hi Numan,

according to our experience, in such a case the connection can never be fully
established, and northd ends up in an endless loop at 100% CPU
utilization.

Numan Siddique <nusiddiq@redhat.com> schrieb am Sa., 18. Sept. 2021, 00:05:

> On Fri, Sep 17, 2021 at 5:02 PM Renat Nurgaliyev <impleman@gmail.com>
> wrote:
> >
> > Hi Han,
> >
> > yes, I believe you are totally right. But it still feels like a chicken and
> > egg problem to me, storing the database timeout setting inside the database
> > itself. If there would be at least some local command line argument to
> > override timeout value, it would be already amazing, because currently
> > there is no way to control it before the database connection is made, and
> > if it cannot be made, it is too late to try to control it.
> >
>
> What about the case where the NB database is huge and it takes > 5
> seconds to fetch all the contents?
>
> Numan
>
> > Thanks,
> > Renat.
> >
> > Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:55:
> >
> > >
> > >
> > > On Fri, Sep 17, 2021 at 1:48 PM Renat Nurgaliyev <impleman@gmail.com>
> > > wrote:
> > > >
> > > > Hello Han,
> > > >
> > > > when I wrote this patch we had an issue with a very big SB database,
> > > > around 1.5 gigabytes. There were no controllers or northds running, so
> > > > the database server was without any load at all. Although OVSDB was
> > > > idling, even a single northd process could not fully connect to the
> > > > database due to its size, since it could not fetch and process the data
> > > > in 5 seconds.
> > >
> > > Hi Renat, thanks for the explanation. However, suppose SB is still
> huge,
> > > if NB is not that big, the probe config in NB_Global will soon be
> applied
> > > to ovn-northd, which would probe in proper interval (desired setting
> with
> > > the SB size considered) instead of the default 5 sec, and it should
> > > succeed, right?
> > >
> > > Thanks,
> > > Han
> > >
> > > >
> > > > Since then many optimizations were made, and the database size with
> the
> > > same topology reduced to approximately twenty megabytes, so today I
> > > wouldn't be able to reproduce the problem.
> > > >
> > > > However, I am quite sure that it would still cause troubles with a
> huge
> > > scale, when SB grows to hundreds of megabytes. With the default
> timeout of
> > > 5 seconds, which is implemented in the same thread that also fetches
> and
> > > processes data, we make an artificial database size limit, which is
> not so
> > > obvious to troubleshoot.
> > > >
> > > > Regards,
> > > > Renat.
> > > >
> > > > Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:34:
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Sep 16, 2021 at 8:05 PM Zhen Wang <zhewang@nvidia.com>
> wrote:
> > > >> >
> > > >> > From: zhen wang <zhewang@nvidia.com>
> > > >> >
> > > >> > This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
> > > >> > Above commit introduced a bug when multiple ovn-northd instances
> work
> > > in HA
> > > >> > mode. If SB leader and active ovn-northd instance got killed by
> > > system power
> > > >> > outage, standby ovn-northd instance would never detect the
> failure.
> > > >> >
> > > >>
> > > >> Thanks Zhen! I added the Renat and Numan who worked on the reverted
> > > commit to CC, so that they can comment if this is ok.
> > > >>
> > > >> For the commit message, I think it may be decoupled from the HA
> > > scenario that is supposed to be fixed by the other patch in this
> series.
> > > The issue this patch fixes is that before the initial NB downloading is
> > > complete the northd will not send probe, so if the DB server is down
> > > (ungracefully) before the northd reads the NB_Global options, the
> northd
> > > would never probe, thus never reconnect to the new leader. (it is
> related
> > > to RAFT, but whether it is multiple northds is irrelevant)
> > > >>
> > > >> As to the original commit that is reverted by this one:
> > > >>
> > > >>     northd: Don't poll ovsdb before the connection is fully
> established
> > > >>
> > > >>     Set initial SB and NB DBs probe interval to 0 to avoid
> connection
> > > >>     flapping.
> > > >>
> > > >>     Before configured in northd_probe_interval value is actually
> applied
> > > >>     to southbound and northbound database connections, both
> connections
> > > >>     must be fully established, otherwise ovnnb_db_run() will return
> > > >>     without retrieving configuration data from northbound DB. In
> cases
> > > >>     when southbound database is big enough, default interval of 5
> > > seconds
> > > >>     will kill and retry the connection before it is fully
> established,
> > > no
> > > >>     matter what is set in northd_probe_interval. Client reconnect
> will
> > > >>     cause even more load to ovsdb-server and cause cascade effect,
> so
> > > >>     northd can never stabilise. We have more than 2000 ports in our
> lab,
> > > >>     and northd could not start before this patch, holding at 100%
> CPU
> > > >>     utilisation both itself and ovsdb-server.
> > > >>
> > > >>     After connections are established, any value in
> > > northd_probe_interval,
> > > >>     or default DEFAULT_PROBE_INTERVAL_MSEC is applied correctly.
> > > >>
> > > >> I am not sure how would the commit help. There are at most 3 - 5
> > > northds (in practice), and suppose there are tens or hundreds of
> > > ovn-controllers that makes SB busy, it is just 3 - 5 more clients
> retrying
> > > reconnect SB for several times, and if NB is not that busy (most
> likely),
> > > these northd clients should get the proper probe settings applied soon
> > > without causing more issues at all. So I don't think the default probe
> 5
> > > sec would cause cascade effect for the initial period. @Renat @Numan
> please
> > > correct me if I am wrong.
> > > >>
> > > >> Thanks,
> > > >> Han
> > > >>
> > > >> > Signed-off-by: zhen wang <zhewang@nvidia.com>
> > > >> > ---
> > > >> >  northd/northd.c | 4 ++--
> > > >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >> >
> > > >> > diff --git a/northd/northd.c b/northd/northd.c
> > > >> > index 688a6e4ef..b7e64470f 100644
> > > >> > --- a/northd/northd.c
> > > >> > +++ b/northd/northd.c
> > > >> > @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
> > > >> >
> > > >> >  /* Default probe interval for NB and SB DB connections. */
> > > >> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> > > >> > -static int northd_probe_interval_nb = 0;
> > > >> > -static int northd_probe_interval_sb = 0;
> > > >> > +static int northd_probe_interval_nb =
> DEFAULT_PROBE_INTERVAL_MSEC;
> > > >> > +static int northd_probe_interval_sb =
> DEFAULT_PROBE_INTERVAL_MSEC;
> > > >> >  #define MAX_OVN_TAGS 4096
> > > >> >
> > > >> >  /* Pipeline stages. */
> > > >> > --
> > > >> > 2.20.1
> > > >> >
> > >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>
>
Han Zhou Sept. 17, 2021, 9:10 p.m. UTC | #9
On Fri, Sep 17, 2021 at 2:04 PM Numan Siddique <nusiddiq@redhat.com> wrote:
>
> On Fri, Sep 17, 2021 at 5:02 PM Renat Nurgaliyev <impleman@gmail.com>
wrote:
> >
> > Hi Han,
> >
> > yes, I believe you are totally right. But it still feels like a chicken
and
> > egg problem to me, storing the database timeout setting inside the
database
> > itself. If there would be at least some local command line argument to
> > override timeout value, it would be already amazing, because currently
> > there is no way to control it before the database connection is made,
and
> > if it cannot be made, it is too late to try to control it.
> >
>
> What about the case where the NB database is huge and it takes > 5
> seconds to fetch
> all the contents ?
>
I think Renat had the answer to this question: using the DB to configure
the probe interval for the DB itself is going to be a problem in certain
cases. It should be fine to use NB to configure the probe interval for SB,
but using NB to configure the probe interval for NB itself definitely
causes problems when the NB is huge. Supporting command-line options may be
the right approach. But in practice, would it be good enough to use 60s as
the default value instead of 5s?

> Numan
>
> > Thanks,
> > Renat.
> >
> > Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:55:
> >
> > >
> > >
> > > On Fri, Sep 17, 2021 at 1:48 PM Renat Nurgaliyev <impleman@gmail.com>
> > > wrote:
> > > >
> > > > Hello Han,
> > > >
> > > > when I wrote this patch we had an issue with a very big SB database,
> > > around 1,5 gigabytes. There were no controllers or northds running,
so the
> > > database server was without any load at all. Although OVSDB was
idling,
> > > even a single northd process could not fully connect to the database
due to
> > > its size, since it could not fetch and process the data in 5 seconds.
> > >
> > > Hi Renat, thanks for the explanation. However, suppose SB is still
huge,
> > > if NB is not that big, the probe config in NB_Global will soon be
applied
> > > to ovn-northd, which would probe in proper interval (desired setting
with
> > > the SB size considered) instead of the default 5 sec, and it should
> > > succeed, right?
> > >
> > > Thanks,
> > > Han
> > >
> > > >
> > > > Since then many optimizations were made, and the database size with
the
> > > same topology reduced to approximately twenty megabytes, so today I
> > > wouldn't be able to reproduce the problem.
> > > >
> > > > However, I am quite sure that it would still cause troubles with a
huge
> > > scale, when SB grows to hundreds of megabytes. With the default
timeout of
> > > 5 seconds, which is implemented in the same thread that also fetches
and
> > > processes data, we make an artificial database size limit, which is
not so
> > > obvoius to troubleshoot.
> > > >
> > > > Regards,
> > > > Renat.
> > > >
> > > > Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:34:
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Sep 16, 2021 at 8:05 PM Zhen Wang <zhewang@nvidia.com>
wrote:
> > > >> >
> > > >> > From: zhen wang <zhewang@nvidia.com>
> > > >> >
> > > >> > This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
> > > >> > Above commit introduced a bug when multiple ovn-northd instances
work
> > > in HA
> > > >> > mode. If SB leader and active ovn-northd instance got killed by
> > > system power
> > > >> > outage, standby ovn-northd instance would never detect the
failure.
> > > >> >
> > > >>
> > > >> Thanks Zhen! I added the Renat and Numan who worked on the reverted
> > > commit to CC, so that they can comment if this is ok.
> > > >>
> > > >> For the commit message, I think it may be decoupled from the HA
> > > scenario that is supposed to be fixed by the other patch in this
series.
> > > The issue this patch fixes is that before the initial NB downloading
is
> > > complete the northd will not send probe, so if the DB server is down
> > > (ungracefully) before the northd reads the NB_Global options, the
northd
> > > would never probe, thus never reconnect to the new leader. (it is
related
> > > to RAFT, but whether it is multiple northds is irrelevant)
> > > >>
> > > >> As to the original commit that is reverted by this one:
> > > >>
> > > >>     northd: Don't poll ovsdb before the connection is fully
established
> > > >>
> > > >>     Set initial SB and NB DBs probe interval to 0 to avoid
connection
> > > >>     flapping.
> > > >>
> > > >>     Before configured in northd_probe_interval value is actually
applied
> > > >>     to southbound and northbound database connections, both
connections
> > > >>     must be fully established, otherwise ovnnb_db_run() will return
> > > >>     without retrieving configuration data from northbound DB. In
cases
> > > >>     when southbound database is big enough, default interval of 5
> > > seconds
> > > >>     will kill and retry the connection before it is fully
established,
> > > no
> > > >>     matter what is set in northd_probe_interval. Client reconnect
will
> > > >>     cause even more load to ovsdb-server and cause cascade effect,
so
> > > >>     northd can never stabilise. We have more than 2000 ports in
our lab,
> > > >>     and northd could not start before this patch, holding at 100%
CPU
> > > >>     utilisation both itself and ovsdb-server.
> > > >>
> > > >>     After connections are established, any value in
> > > northd_probe_interval,
> > > >>     or default DEFAULT_PROBE_INTERVAL_MSEC is applied correctly.
> > > >>
> > > >> I am not sure how would the commit help. There are at most 3 - 5
> > > northds (in practice), and suppose there are tens or hundreds of
> > > ovn-controllers that makes SB busy, it is just 3 - 5 more clients
retrying
> > > reconnect SB for several times, and if NB is not that busy (most
likely),
> > > these northd clients should get the proper probe settings applied soon
> > > without causing more issues at all. So I don't think the default
probe 5
> > > sec would cause cascade effect for the initial period. @Renat @Numan
please
> > > correct me if I am wrong.
> > > >>
> > > >> Thanks,
> > > >> Han
> > > >>
> > > >> > Signed-off-by: zhen wang <zhewang@nvidia.com>
> > > >> > ---
> > > >> >  northd/northd.c | 4 ++--
> > > >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > >> >
> > > >> > diff --git a/northd/northd.c b/northd/northd.c
> > > >> > index 688a6e4ef..b7e64470f 100644
> > > >> > --- a/northd/northd.c
> > > >> > +++ b/northd/northd.c
> > > >> > @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
> > > >> >
> > > >> >  /* Default probe interval for NB and SB DB connections. */
> > > >> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> > > >> > -static int northd_probe_interval_nb = 0;
> > > >> > -static int northd_probe_interval_sb = 0;
> > > >> > +static int northd_probe_interval_nb =
DEFAULT_PROBE_INTERVAL_MSEC;
> > > >> > +static int northd_probe_interval_sb =
DEFAULT_PROBE_INTERVAL_MSEC;
> > > >> >  #define MAX_OVN_TAGS 4096
> > > >> >
> > > >> >  /* Pipeline stages. */
> > > >> > --
> > > >> > 2.20.1
> > > >> >
> > >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>
Numan Siddique Sept. 17, 2021, 10:31 p.m. UTC | #10
On Fri, Sep 17, 2021 at 5:10 PM Han Zhou <hzhou@ovn.org> wrote:
>
>
>
> On Fri, Sep 17, 2021 at 2:04 PM Numan Siddique <nusiddiq@redhat.com> wrote:
> >
> > On Fri, Sep 17, 2021 at 5:02 PM Renat Nurgaliyev <impleman@gmail.com> wrote:
> > >
> > > Hi Han,
> > >
> > > yes, I believe you are totally right. But it still feels like a chicken and
> > > egg problem to me, storing the database timeout setting inside the database
> > > itself. If there would be at least some local command line argument to
> > > override timeout value, it would be already amazing, because currently
> > > there is no way to control it before the database connection is made, and
> > > if it cannot be made, it is too late to try to control it.
> > >
> >
> > What about the case where the NB database is huge and it takes > 5
> > seconds to fetch
> > all the contents ?
> >
> I think Renat had the answer to this question: using the DB to configure the probe interval to the DB is going to be a problem in certain cases. It should be fine to use NB to configure probe interval for SB, but using NB to configure the probe interval for NB itself is definitely causing the problem when the NB is huge. Support command line options may be the right approach. But in practice would it be good enough to use 60s as the default value instead of 5s?

In most large-scale deployments, the CMS has to configure higher probe
intervals anyway, so 60 seconds as the default seems OK to me. Meanwhile,
I think we should try to brainstorm and solve the mentioned problem in a
much better way.

Thanks
Numan

>
> > Numan
> >
> > > Thanks,
> > > Renat.
> > >
> > > Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:55:
> > >
> > > >
> > > >
> > > > On Fri, Sep 17, 2021 at 1:48 PM Renat Nurgaliyev <impleman@gmail.com>
> > > > wrote:
> > > > >
> > > > > Hello Han,
> > > > >
> > > > > when I wrote this patch we had an issue with a very big SB database,
> > > > around 1,5 gigabytes. There were no controllers or northds running, so the
> > > > database server was without any load at all. Although OVSDB was idling,
> > > > even a single northd process could not fully connect to the database due to
> > > > its size, since it could not fetch and process the data in 5 seconds.
> > > >
> > > > Hi Renat, thanks for the explanation. However, suppose SB is still huge,
> > > > if NB is not that big, the probe config in NB_Global will soon be applied
> > > > to ovn-northd, which would probe in proper interval (desired setting with
> > > > the SB size considered) instead of the default 5 sec, and it should
> > > > succeed, right?
> > > >
> > > > Thanks,
> > > > Han
> > > >
> > > > >
> > > > > Since then many optimizations were made, and the database size with the
> > > > same topology reduced to approximately twenty megabytes, so today I
> > > > wouldn't be able to reproduce the problem.
> > > > >
> > > > > However, I am quite sure that it would still cause troubles with a huge
> > > > scale, when SB grows to hundreds of megabytes. With the default timeout of
> > > > 5 seconds, which is implemented in the same thread that also fetches and
> > > > processes data, we make an artificial database size limit, which is not so
> > > > obvious to troubleshoot.
> > > > >
> > > > > Regards,
> > > > > Renat.
> > > > >
> > > > > Han Zhou <hzhou@ovn.org> schrieb am Fr., 17. Sept. 2021, 23:34:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Thu, Sep 16, 2021 at 8:05 PM Zhen Wang <zhewang@nvidia.com> wrote:
> > > > >> >
> > > > >> > From: zhen wang <zhewang@nvidia.com>
> > > > >> >
> > > > >> > This reverts commit 1e59feea933610b28fd4442243162ce35595cfee.
> > > > >> > Above commit introduced a bug when multiple ovn-northd instances work
> > > > in HA
> > > > >> > mode. If SB leader and active ovn-northd instance got killed by
> > > > system power
> > > > >> > outage, standby ovn-northd instance would never detect the failure.
> > > > >> >
> > > > >>
> > > > >> Thanks Zhen! I added the Renat and Numan who worked on the reverted
> > > > commit to CC, so that they can comment if this is ok.
> > > > >>
> > > > >> For the commit message, I think it may be decoupled from the HA
> > > > scenario that is supposed to be fixed by the other patch in this series.
> > > > The issue this patch fixes is that before the initial NB downloading is
> > > > complete the northd will not send probe, so if the DB server is down
> > > > (ungracefully) before the northd reads the NB_Global options, the northd
> > > > would never probe, thus never reconnect to the new leader. (it is related
> > > > to RAFT, but whether it is multiple northds is irrelevant)
> > > > >>
> > > > >> As to the original commit that is reverted by this one:
> > > > >>
> > > > >>     northd: Don't poll ovsdb before the connection is fully established
> > > > >>
> > > > >>     Set initial SB and NB DBs probe interval to 0 to avoid connection
> > > > >>     flapping.
> > > > >>
> > > > >>     Before configured in northd_probe_interval value is actually applied
> > > > >>     to southbound and northbound database connections, both connections
> > > > >>     must be fully established, otherwise ovnnb_db_run() will return
> > > > >>     without retrieving configuration data from northbound DB. In cases
> > > > >>     when southbound database is big enough, default interval of 5
> > > > seconds
> > > > >>     will kill and retry the connection before it is fully established,
> > > > no
> > > > >>     matter what is set in northd_probe_interval. Client reconnect will
> > > > >>     cause even more load to ovsdb-server and cause cascade effect, so
> > > > >>     northd can never stabilise. We have more than 2000 ports in our lab,
> > > > >>     and northd could not start before this patch, holding at 100% CPU
> > > > >>     utilisation both itself and ovsdb-server.
> > > > >>
> > > > >>     After connections are established, any value in
> > > > northd_probe_interval,
> > > > >>     or default DEFAULT_PROBE_INTERVAL_MSEC is applied correctly.
> > > > >>
> > > > >> I am not sure how would the commit help. There are at most 3 - 5
> > > > northds (in practice), and suppose there are tens or hundreds of
> > > > ovn-controllers that makes SB busy, it is just 3 - 5 more clients retrying
> > > > reconnect SB for several times, and if NB is not that busy (most likely),
> > > > these northd clients should get the proper probe settings applied soon
> > > > without causing more issues at all. So I don't think the default probe 5
> > > > sec would cause cascade effect for the initial period. @Renat @Numan please
> > > > correct me if I am wrong.
> > > > >>
> > > > >> Thanks,
> > > > >> Han
> > > > >>
> > > > >> > Signed-off-by: zhen wang <zhewang@nvidia.com>
> > > > >> > ---
> > > > >> >  northd/northd.c | 4 ++--
> > > > >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > >> >
> > > > >> > diff --git a/northd/northd.c b/northd/northd.c
> > > > >> > index 688a6e4ef..b7e64470f 100644
> > > > >> > --- a/northd/northd.c
> > > > >> > +++ b/northd/northd.c
> > > > >> > @@ -74,8 +74,8 @@ static bool use_ct_inv_match = true;
> > > > >> >
> > > > >> >  /* Default probe interval for NB and SB DB connections. */
> > > > >> >  #define DEFAULT_PROBE_INTERVAL_MSEC 5000
> > > > >> > -static int northd_probe_interval_nb = 0;
> > > > >> > -static int northd_probe_interval_sb = 0;
> > > > >> > +static int northd_probe_interval_nb = DEFAULT_PROBE_INTERVAL_MSEC;
> > > > >> > +static int northd_probe_interval_sb = DEFAULT_PROBE_INTERVAL_MSEC;
> > > > >> >  #define MAX_OVN_TAGS 4096
> > > > >> >
> > > > >> >  /* Pipeline stages. */
> > > > >> > --
> > > > >> > 2.20.1
> > > > >> >
> > > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> >

Patch

diff --git a/northd/northd.c b/northd/northd.c
index 688a6e4ef..b7e64470f 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -74,8 +74,8 @@  static bool use_ct_inv_match = true;
 
 /* Default probe interval for NB and SB DB connections. */
 #define DEFAULT_PROBE_INTERVAL_MSEC 5000
-static int northd_probe_interval_nb = 0;
-static int northd_probe_interval_sb = 0;
+static int northd_probe_interval_nb = DEFAULT_PROBE_INTERVAL_MSEC;
+static int northd_probe_interval_sb = DEFAULT_PROBE_INTERVAL_MSEC;
 #define MAX_OVN_TAGS 4096
 
 /* Pipeline stages. */