Message ID | 20171011085233.9645-1-nusiddiq@redhat.com |
---|---|
State | Superseded |
Series | [ovs-dev] ovn pacemaker: Provide the option to configure inactivity probe value |
Hi Andy.  In the IRC meeting today, Numan suggested that you might be an
appropriate reviewer for this patch, so if you agree and you have a
chance to look at this then it would be appreciated.

Thanks,

Ben.

On Wed, Oct 11, 2017 at 02:22:33PM +0530, nusiddiq@redhat.com wrote:
> From: Numan Siddique <nusiddiq@redhat.com>
>
> In the case of OVN HA deployments with openstack, it has been noticed
> that the 5 seconds inactivity probe interval is not enough and ovsdb-servers
> time out.
> This patch
>  - provides an option to configure this value.
>  - creates a connection row in NB/SB dbs and sets the target and
>    inactivity_probe values when the node is promoted to master.
>
> CC: Andy Zhou <azhou@ovn.org>
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
> ---
>  ovn/utilities/ovndb-servers.ocf | 27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
>
> diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
> index fe1207c22..92620af6a 100755
> --- a/ovn/utilities/ovndb-servers.ocf
> +++ b/ovn/utilities/ovndb-servers.ocf
> @@ -8,6 +8,8 @@
>  : ${SB_MASTER_PORT_DEFAULT="6642"}
>  : ${SB_MASTER_PROTO_DEFAULT="tcp"}
>  : ${MANAGE_NORTHD_DEFAULT="no"}
> +: ${INACTIVE_PROBE_DEFAULT="60000"}
> +
>  CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"
>  CRM_ATTR_REPL_INFO="${HA_SBIN_DIR}/crm_attribute --type crm_config --name OVN_REPL_INFO -s ovn_ovsdb_master_server"
>  OVN_CTL=${OCF_RESKEY_ovn_ctl:-${OVN_CTL_DEFAULT}}
> @@ -17,6 +19,7 @@ NB_MASTER_PROTO=${OCF_RESKEY_nb_master_protocol:-${NB_MASTER_PROTO_DEFAULT}}
>  SB_MASTER_PORT=${OCF_RESKEY_sb_master_port:-${SB_MASTER_PORT_DEFAULT}}
>  SB_MASTER_PROTO=${OCF_RESKEY_sb_master_protocol:-${SB_MASTER_PROTO_DEFAULT}}
>  MANAGE_NORTHD=${OCF_RESKEY_manage_northd:-${MANAGE_NORTHD_DEFAULT}}
> +INACTIVE_PROBE=${OCF_RESKEY_inactive_probe_interval:-${INACTIVE_PROBE_DEFAULT}}
>
>  # Invalid IP address is an address that can never exist in the network, as
>  # mentioned in rfc-5737. The ovsdb servers connects to this IP address till
> @@ -101,6 +104,14 @@ ovsdb_server_metadata() {
>    <content type="string" />
>    </parameter>
>
> +  <parameter name="inactive_probe_interval" unique="1">
> +  <longdesc lang="en">
> +  Inactive probe interval to set for ovsdb-server.
> +  </longdesc>
> +  <shortdesc lang="en">Set inactive probe interval</shortdesc>
> +  <content type="string" />
> +  </parameter>
> +
>    </parameters>
>
>    <actions>
> @@ -138,6 +149,22 @@ ovsdb_server_notify() {
>              ${OVN_CTL} --ovn-manage-ovsdb=no start_northd
>          fi
>
> +        conn=`ovn-nbctl get NB_global . connections`
> +        if [ "$conn" == "[]" ]
> +        then
> +            ovn-nbctl -- --id=@conn_uuid create Connection \
> +target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
> +inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid
> +        fi
> +
> +        conn=`ovn-sbctl get SB_global . connections`
> +        if [ "$conn" == "[]" ]
> +        then
> +            ovn-sbctl -- --id=@conn_uuid create Connection \
> +target="p${SB_MASTER_PROTO}\:${SB_MASTER_PORT}\:${MASTER_IP}" \
> +inactivity_probe=$INACTIVE_PROBE -- set SB_Global . connections=@conn_uuid
> +        fi
> +
>      else
>          if [ "$MANAGE_NORTHD" = "yes" ]; then
>              # Stop ovn-northd service. Set --ovn-manage-ovsdb=no so that
> --
> 2.13.5
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
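A minimal dry-run sketch of what the promotion-time hunk in the patch
appears to do, assuming the same variable layout as the patch. The helper
names build_target and print_create_cmd are hypothetical and are not part
of ovndb-servers.ocf, and the address 192.0.2.10 and port 6641 are example
values only. The script prints the command line instead of executing it,
so it runs without OVN installed.

```shell
#!/bin/sh
# Dry-run sketch: prints the command the patch would run; nothing is executed.
# build_target and print_create_cmd are hypothetical helpers for illustration.

# Build the passive listener target, e.g. ptcp\:6641\:192.0.2.10.
# The colons are backslash-escaped, as in the patch.
build_target() {
    # $1 = protocol, $2 = port, $3 = master IP
    printf 'p%s\\:%s\\:%s\n' "$1" "$2" "$3"
}

# Print the create-and-attach command for one database.
print_create_cmd() {
    # $1 = ctl tool, $2 = global table, $3 = target, $4 = probe (ms)
    printf '%s -- --id=@conn_uuid create Connection target="%s" inactivity_probe=%s -- set %s . connections=@conn_uuid\n' \
        "$1" "$3" "$4" "$2"
}

INACTIVE_PROBE=60000    # default value used in the patch (milliseconds)
MASTER_IP=192.0.2.10    # example address only

print_create_cmd ovn-nbctl NB_Global "$(build_target tcp 6641 "$MASTER_IP")" "$INACTIVE_PROBE"
print_create_cmd ovn-sbctl SB_Global "$(build_target tcp 6642 "$MASTER_IP")" "$INACTIVE_PROBE"
```

Printing the command first makes it easy to check the escaping of the
target string before the row is actually created on the master node.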
Sure, I will take a look.

On Thu, Oct 12, 2017 at 10:49 AM, Ben Pfaff <blp@ovn.org> wrote:
> Hi Andy.  In the IRC meeting today, Numan suggested that you might be an
> appropriate reviewer for this patch, so if you agree and you have a
> chance to look at this then it would be appreciated.
>
> Thanks,
>
> Ben.
Hi, Numan,

I am curious why the default 5-second inactivity time does not work. Do
you have more details?

Does the glitch usually happen around the HA switchover? If this
happens during normal operation, then it is not an HA-specific issue,
but an indication of some connectivity issue.
On Fri, Oct 13, 2017 at 6:05 AM, Andy Zhou <azhou@ovn.org> wrote:
> Hi, Numan,
>
> I am curious why the default 5-second inactivity time does not work. Do
> you have more details?
>
> Does the glitch usually happen around the HA switchover? If this
> happens during normal operation, then it is not an HA-specific issue,
> but an indication of some connectivity issue.

Hi Andy. This happens in the OpenStack deployment when the
neutron-server is busy handling lots of API requests.
Normally the deployment has 3 controller nodes, with neutron-server
running on each node. On each controller node, neutron-server starts
around 10 - 12 neutron workers (which are separate processes). The number
of API workers is a configuration option; if it is not configured, the
number of neutron workers normally equals the number of cores.

I have tested in both a physical-node deployment and a virtual deployment
(3 controllers running as VMs on one node). Around 40 connections are
opened to the OVN north ovsdb-server by all the neutron workers in the
physical deployment, and around 15 connections are opened in the virtual
deployment. When neutron-server is loaded with many API requests, I have
noticed that ovsdb-server drops the connections when it doesn't get the
echo reply every 5 seconds. This leads to a lot of reconnections to the
ovsdb-server, and the response from the neutron-server becomes very slow.
With this patch it seems to work fine.

The issue is not caused by the network but by the large number of
connections from the neutron-server workers to the ovsdb-server and the
failure of the idl clients to reply to the echo request every 5 seconds
when the neutron-server is loaded.

I can make the patch provide the configuration option to override the
inactivity probe value so that it doesn't affect others who use the OVN
OCF pacemaker script.

Let me know your comments.

Thanks
Numan
On Fri, Oct 13, 2017 at 8:30 AM, Numan Siddique <nusiddiq@redhat.com> wrote:
> The issue is not because of any network issues but because of lots of
> connections from the neutron-server workers to the ovsdb-server and failure
> by the idl clients to reply to the echo request every 5 seconds when the
> neutron-server is loaded.

We have to disable the inactivity probe everywhere each time we have
done performance testing so far.

> I can make the patch to provide the configuration option to override the
> inactivity probe value so that it doesn't affect others who use the OVN OCF
> pacemaker script.
>
> Let me know your comments.

I think the default through this script should match the normal
default.  It looks like it defaults to 60s in this patch instead of
5s?  I would make it match.  I do like exposing the ability to change
it, though.  We could consider setting a different default through our
OpenStack work.
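One way the suggestion above could look is sketched here: keep the
script's default at the 5-second value discussed in this thread and let
the operator override it. This is only an illustration, not the follow-up
patch. The function name resolve_probe and the validation step are
assumptions; the 5000 value assumes the same millisecond unit as the
60000 default in the patch.

```shell
#!/bin/sh
# Sketch only: resolve_probe is a hypothetical helper, not part of the
# OCF script. Values are in milliseconds, matching the patch's 60000.

# Default that matches the 5-second interval discussed in the thread.
: ${INACTIVE_PROBE_DEFAULT="5000"}

resolve_probe() {
    # $1 = operator-supplied value (may be empty or unset)
    value=${1:-${INACTIVE_PROBE_DEFAULT}}
    case "$value" in
        # Reject anything that is not a plain non-negative integer.
        ''|*[!0-9]*)
            echo "invalid inactivity probe: $value" >&2
            return 1
            ;;
    esac
    echo "$value"
}

resolve_probe ""        # prints 5000 (falls back to the default)
resolve_probe "60000"   # prints 60000 (operator override)
```

In the resource agent the argument would come from
OCF_RESKEY_inactive_probe_interval, as in the patch.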
On Fri, Oct 13, 2017 at 12:06:56PM -0400, Russell Bryant wrote:
> We have to disable the inactivity probe everywhere each time we have
> done performance testing so far.

Really, this seems like a bug (or inadequacy) in ovsdb-server.  It's
pretty sad that ovsdb-server can't reply within 5 seconds (maybe there's
a 2x or 3x multiplier on the response time, I don't recall).  I hope
that the clustered database does better here.

That said, if in the real world we need 60 seconds for now, let's use it
but remember that we should get our act together later.  (Maybe a
comment would be helpful.)
On Sat, Oct 14, 2017 at 2:56 AM, Ben Pfaff <blp@ovn.org> wrote:
> That said, if in the real world we need 60 seconds for now, let's use it
> but remember that we should get our act together later.  (Maybe a
> comment would be helpful.)

Thanks. I will add relevant comments in my next patch.
On Fri, Oct 13, 2017 at 9:36 PM, Russell Bryant <russell@ovn.org> wrote: > On Fri, Oct 13, 2017 at 8:30 AM, Numan Siddique <nusiddiq@redhat.com> > wrote: > > On Fri, Oct 13, 2017 at 6:05 AM, Andy Zhou <azhou@ovn.org> wrote: > > > >> Hi, Numan, > >> > >> I am curious why default 5 seconds inactivity time does not work? Do > >> you have more details? > >> > >> Does the glitch usually happen around the HA switch over? If this > >> happens during normal operation, > >> Then this is not HA specific issue, but an indication of some > >> connectivity issues. > >> > > > > Hi Andy. This happens in the openstack deployment and when the > > neutron-server is busy handling lots of API requests. > > Normally the deployment would be having 3 controller nodes and > > neutron-server would be running in each node. On each controller node, > > neutron-server starts around 10 - 12 neutron workers (which are separate > > processes). Number of API workers is a configuration option and normally > > number of cores = no of neutron works if not configured. > > > > I have tested in both physical nodes deployment and virtual deployment > (3 > > controllers running as vms in a node). Around 40 connections are opened > to > > the OVN north ovsdb-server by all the neutron workers in the physical > > deployment and around 15 connections are opened in the virtual > deployment. > > When neutron-server is loaded with many API requests, I have noticed > that, > > ovsdb-server drops the connections when it doesn't get the echo reply > every > > 5 seconds. This leads to lot of reconnections to the ovsdb-server and the > > response from the neutron-server is very slow and bad. With this patch > it > > seems to work fine. > > > > The issue is not because of any network issues but because of lots of > > connections from the neutron-server workers to the ovsdb-server and > failure > > by the idl clients to reply to the echo request every 5 seconds when the > > neutron-server is loaded. 
> > We have to disable the inactivity probe everywhere each time we have > done performance testing so far. > > > I can make the patch to provide the configuration option to override the > > inactivity probe value so that it doesn't affect others who use the OVN > OCF > > pacemaker script. > > > > Let me know your comments. > > I think the default through this script should match the normal > default. It looks like it defaults to 60s in this patch instead of > 5s? I would make it match. Ack. Will do that in the next patch. Thanks > I do like exposing the ability to change > it, though. We could consider setting a different default through our > OpenStack work. > > > > > Thanks > > Numan > > > > > >> > >> On Thu, Oct 12, 2017 at 11:08 AM, Andy Zhou <azhou@ovn.org> wrote: > >> > Sure, I will take a look. > >> > > >> > On Thu, Oct 12, 2017 at 10:49 AM, Ben Pfaff <blp@ovn.org> wrote: > >> >> Hi Andy. In the IRC meeting today, Numan suggested that you might > be an > >> >> appropriate reviewer for this patch, so if you agree and you have a > >> >> chance to look at this then it would be appreciated. > >> >> > >> >> Thanks, > >> >> > >> >> Ben. > >> >> > >> >> On Wed, Oct 11, 2017 at 02:22:33PM +0530, nusiddiq@redhat.com wrote: > >> >>> From: Numan Siddique <nusiddiq@redhat.com> > >> >>> > >> >>> In the case of OVN HA deployments with openstack, it has been > noticed > >> >>> that the 5 seconds inactivity probe interval is not enough and > >> ovsdb-servers > >> >>> time out. > >> >>> This patch > >> >>> - providdes an option to configure this value. > >> >>> - creates a connection row in NB/SB dbs and sets the target and > >> >>> inactivity_probe values when the node is promoted to master. 
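The fallback Russell asks for can be sketched in a few lines of shell. This is only an illustration of the idea, not the follow-up patch: `resolve_probe` is a hypothetical helper, and the 5000 ms default is an assumption taken from the 5-second value discussed in the thread. `OCF_RESKEY_inactive_probe_interval` is the variable the patch already reads.

```shell
#!/bin/sh
# Sketch only: resolve the inactivity probe interval in milliseconds.
# Assumption: 5000 ms mirrors the 5-second default mentioned in the thread.
: ${INACTIVE_PROBE_DEFAULT="5000"}

# resolve_probe is a hypothetical helper, not part of the patch.
resolve_probe() {
    # Use the pacemaker-supplied parameter when it is set and non-empty,
    # otherwise fall back to the default above.
    echo "${OCF_RESKEY_inactive_probe_interval:-${INACTIVE_PROBE_DEFAULT}}"
}
```

With this shape, deployments that need a longer interval (for example the OpenStack case described above) would set `inactive_probe_interval` on the resource, while everyone else keeps the normal default.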
On Sat, Oct 14, 2017 at 2:56 AM, Ben Pfaff <blp@ovn.org> wrote:

> On Fri, Oct 13, 2017 at 12:06:56PM -0400, Russell Bryant wrote:
> > On Fri, Oct 13, 2017 at 8:30 AM, Numan Siddique <nusiddiq@redhat.com> wrote:
> > > On Fri, Oct 13, 2017 at 6:05 AM, Andy Zhou <azhou@ovn.org> wrote:
> > >
> > >> Hi, Numan,
> > >>
> > >> I am curious why default 5 seconds inactivity time does not work? Do
> > >> you have more details?
> > >>
> > >> Does the glitch usually happen around the HA switch over? If this
> > >> happens during normal operation,
> > >> Then this is not HA specific issue, but an indication of some
> > >> connectivity issues.
> > >>
> > >
> > > Hi Andy. This happens in the openstack deployment and when the
> > > neutron-server is busy handling lots of API requests.
> > > Normally the deployment would be having 3 controller nodes and
> > > neutron-server would be running in each node. On each controller node,
> > > neutron-server starts around 10 - 12 neutron workers (which are separate
> > > processes). Number of API workers is a configuration option and normally
> > > number of cores = no of neutron works if not configured.
> > >
> > > I have tested in both physical nodes deployment and virtual deployment (3
> > > controllers running as vms in a node). Around 40 connections are opened to
> > > the OVN north ovsdb-server by all the neutron workers in the physical
> > > deployment and around 15 connections are opened in the virtual deployment.
> > > When neutron-server is loaded with many API requests, I have noticed that,
> > > ovsdb-server drops the connections when it doesn't get the echo reply every
> > > 5 seconds. This leads to lot of reconnections to the ovsdb-server and the
> > > response from the neutron-server is very slow and bad. With this patch it
> > > seems to work fine.
> > >
> > > The issue is not because of any network issues but because of lots of
> > > connections from the neutron-server workers to the ovsdb-server and failure
> > > by the idl clients to reply to the echo request every 5 seconds when the
> > > neutron-server is loaded.
> >
> > We have to disable the inactivity probe everywhere each time we have
> > done performance testing so far.
>
> Really this seems that it's a bug (or inadequacy) in ovsdb-server. It's
> pretty sad that ovsdb-server can't reply within 5 seconds

It's actually the ovsdb python idl client which is not able to reply within
5 seconds for the echo request from ovsdb-server.

> (maybe there's
> a 2x or 3x multiplier on the response time, I don't recall). I hope
> that the clustered database does better here.
>
> That said, if in the real world we need 60 seconds for now, let's use it
> but remember that we should get our act together later. (Maybe a
> comment would be helpful.)
>
On Mon, Oct 16, 2017 at 02:50:48PM +0530, Numan Siddique wrote:
> On Sat, Oct 14, 2017 at 2:56 AM, Ben Pfaff <blp@ovn.org> wrote:
>
> > On Fri, Oct 13, 2017 at 12:06:56PM -0400, Russell Bryant wrote:
> > > On Fri, Oct 13, 2017 at 8:30 AM, Numan Siddique <nusiddiq@redhat.com> wrote:
> > > > On Fri, Oct 13, 2017 at 6:05 AM, Andy Zhou <azhou@ovn.org> wrote:
> > > >
> > > >> Hi, Numan,
> > > >>
> > > >> I am curious why default 5 seconds inactivity time does not work? Do
> > > >> you have more details?
> > > >>
> > > >> Does the glitch usually happen around the HA switch over? If this
> > > >> happens during normal operation,
> > > >> Then this is not HA specific issue, but an indication of some
> > > >> connectivity issues.
> > > >>
> > > >
> > > > Hi Andy. This happens in the openstack deployment and when the
> > > > neutron-server is busy handling lots of API requests.
> > > > Normally the deployment would be having 3 controller nodes and
> > > > neutron-server would be running in each node. On each controller node,
> > > > neutron-server starts around 10 - 12 neutron workers (which are separate
> > > > processes). Number of API workers is a configuration option and normally
> > > > number of cores = no of neutron works if not configured.
> > > >
> > > > I have tested in both physical nodes deployment and virtual deployment (3
> > > > controllers running as vms in a node). Around 40 connections are opened to
> > > > the OVN north ovsdb-server by all the neutron workers in the physical
> > > > deployment and around 15 connections are opened in the virtual deployment.
> > > > When neutron-server is loaded with many API requests, I have noticed that,
> > > > ovsdb-server drops the connections when it doesn't get the echo reply every
> > > > 5 seconds. This leads to lot of reconnections to the ovsdb-server and the
> > > > response from the neutron-server is very slow and bad. With this patch it
> > > > seems to work fine.
> > > >
> > > > The issue is not because of any network issues but because of lots of
> > > > connections from the neutron-server workers to the ovsdb-server and failure
> > > > by the idl clients to reply to the echo request every 5 seconds when the
> > > > neutron-server is loaded.
> > >
> > > We have to disable the inactivity probe everywhere each time we have
> > > done performance testing so far.
> >
> > Really this seems that it's a bug (or inadequacy) in ovsdb-server. It's
> > pretty sad that ovsdb-server can't reply within 5 seconds
>
>
> It's actually the ovsdb python idl client which is not able to reply within
> 5 seconds for the
> echo request from ovsdb-server.

Oh, I'm surprised that ovsdb-server is doing the echo-requests, I
thought that we generally did them from the client end.
On Mon, Oct 16, 2017 at 10:58:43AM -0700, Ben Pfaff wrote:
> On Mon, Oct 16, 2017 at 02:50:48PM +0530, Numan Siddique wrote:
> > On Sat, Oct 14, 2017 at 2:56 AM, Ben Pfaff <blp@ovn.org> wrote:
> >
> > > On Fri, Oct 13, 2017 at 12:06:56PM -0400, Russell Bryant wrote:
> > > > On Fri, Oct 13, 2017 at 8:30 AM, Numan Siddique <nusiddiq@redhat.com> wrote:
> > > > > On Fri, Oct 13, 2017 at 6:05 AM, Andy Zhou <azhou@ovn.org> wrote:
> > > > >
> > > > >> Hi, Numan,
> > > > >>
> > > > >> I am curious why default 5 seconds inactivity time does not work? Do
> > > > >> you have more details?
> > > > >>
> > > > >> Does the glitch usually happen around the HA switch over? If this
> > > > >> happens during normal operation,
> > > > >> Then this is not HA specific issue, but an indication of some
> > > > >> connectivity issues.
> > > > >>
> > > > >
> > > > > Hi Andy. This happens in the openstack deployment and when the
> > > > > neutron-server is busy handling lots of API requests.
> > > > > Normally the deployment would be having 3 controller nodes and
> > > > > neutron-server would be running in each node. On each controller node,
> > > > > neutron-server starts around 10 - 12 neutron workers (which are separate
> > > > > processes). Number of API workers is a configuration option and normally
> > > > > number of cores = no of neutron works if not configured.
> > > > >
> > > > > I have tested in both physical nodes deployment and virtual deployment (3
> > > > > controllers running as vms in a node). Around 40 connections are opened to
> > > > > the OVN north ovsdb-server by all the neutron workers in the physical
> > > > > deployment and around 15 connections are opened in the virtual deployment.
> > > > > When neutron-server is loaded with many API requests, I have noticed that,
> > > > > ovsdb-server drops the connections when it doesn't get the echo reply every
> > > > > 5 seconds. This leads to lot of reconnections to the ovsdb-server and the
> > > > > response from the neutron-server is very slow and bad. With this patch it
> > > > > seems to work fine.
> > > > >
> > > > > The issue is not because of any network issues but because of lots of
> > > > > connections from the neutron-server workers to the ovsdb-server and failure
> > > > > by the idl clients to reply to the echo request every 5 seconds when the
> > > > > neutron-server is loaded.
> > > >
> > > > We have to disable the inactivity probe everywhere each time we have
> > > > done performance testing so far.
> > >
> > > Really this seems that it's a bug (or inadequacy) in ovsdb-server. It's
> > > pretty sad that ovsdb-server can't reply within 5 seconds
> >
> >
> > It's actually the ovsdb python idl client which is not able to reply within
> > 5 seconds for the
> > echo request from ovsdb-server.
>
> Oh, I'm surprised that ovsdb-server is doing the echo-requests, I
> thought that we generally did them from the client end.

One perfectly acceptable approach might be to simply disable
echo-requests on the server side entirely and do them from the client.
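One way to read Ben's suggestion in terms of the database itself is to set the `inactivity_probe` column of the `Connection` row to 0, which the OVN schema documents as disabling the probe. The sketch below is an assumption-laden illustration, not something proposed in the thread: `set_probe` and `DRY_RUN` are hypothetical names, and the `.` record selector only works when the table holds exactly one `Connection` row. By default it only prints the command it would run.

```shell
#!/bin/sh
# Sketch only: print (or run) the command that sets the server-side
# inactivity probe on the single Connection row of an OVN database.
# DRY_RUN and set_probe are hypothetical names used for illustration.
DRY_RUN=${DRY_RUN:-yes}

set_probe() {
    ctl=$1        # ovn-nbctl or ovn-sbctl
    interval=$2   # milliseconds; 0 disables the server-side probe
    if [ "$DRY_RUN" = "yes" ]; then
        # Show the command instead of touching a live database.
        echo "$ctl set Connection . inactivity_probe=$interval"
    else
        "$ctl" set Connection . inactivity_probe="$interval"
    fi
}
```

Calling `set_probe ovn-nbctl 0` with the default `DRY_RUN=yes` prints the command for the northbound database; clients would then be responsible for their own keepalive probing.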
Acked-By: Miguel Angel Ajo <majopela@redhat.com>

It makes sense to be able to configure the inactivity probe time. I agree
with Ben that disabling the echo requests on the server would also make
sense in a future patch.

On Mon, Oct 16, 2017 at 9:48 PM, Ben Pfaff <blp@ovn.org> wrote:

> On Mon, Oct 16, 2017 at 10:58:43AM -0700, Ben Pfaff wrote:
> > On Mon, Oct 16, 2017 at 02:50:48PM +0530, Numan Siddique wrote:
> > > On Sat, Oct 14, 2017 at 2:56 AM, Ben Pfaff <blp@ovn.org> wrote:
> > >
> > > > On Fri, Oct 13, 2017 at 12:06:56PM -0400, Russell Bryant wrote:
> > > > > On Fri, Oct 13, 2017 at 8:30 AM, Numan Siddique <nusiddiq@redhat.com> wrote:
> > > > > > On Fri, Oct 13, 2017 at 6:05 AM, Andy Zhou <azhou@ovn.org> wrote:
> > > > > >
> > > > > >> Hi, Numan,
> > > > > >>
> > > > > >> I am curious why default 5 seconds inactivity time does not work? Do
> > > > > >> you have more details?
> > > > > >>
> > > > > >> Does the glitch usually happen around the HA switch over? If this
> > > > > >> happens during normal operation,
> > > > > >> Then this is not HA specific issue, but an indication of some
> > > > > >> connectivity issues.
> > > > > >>
> > > > > >
> > > > > > Hi Andy. This happens in the openstack deployment and when the
> > > > > > neutron-server is busy handling lots of API requests.
> > > > > > Normally the deployment would be having 3 controller nodes and
> > > > > > neutron-server would be running in each node. On each controller node,
> > > > > > neutron-server starts around 10 - 12 neutron workers (which are separate
> > > > > > processes). Number of API workers is a configuration option and normally
> > > > > > number of cores = no of neutron works if not configured.
> > > > > >
> > > > > > I have tested in both physical nodes deployment and virtual deployment (3
> > > > > > controllers running as vms in a node). Around 40 connections are opened to
> > > > > > the OVN north ovsdb-server by all the neutron workers in the physical
> > > > > > deployment and around 15 connections are opened in the virtual deployment.
> > > > > > When neutron-server is loaded with many API requests, I have noticed that,
> > > > > > ovsdb-server drops the connections when it doesn't get the echo reply every
> > > > > > 5 seconds. This leads to lot of reconnections to the ovsdb-server and the
> > > > > > response from the neutron-server is very slow and bad. With this patch it
> > > > > > seems to work fine.
> > > > > >
> > > > > > The issue is not because of any network issues but because of lots of
> > > > > > connections from the neutron-server workers to the ovsdb-server and failure
> > > > > > by the idl clients to reply to the echo request every 5 seconds when the
> > > > > > neutron-server is loaded.
> > > > >
> > > > > We have to disable the inactivity probe everywhere each time we have
> > > > > done performance testing so far.
> > > >
> > > > Really this seems that it's a bug (or inadequacy) in ovsdb-server. It's
> > > > pretty sad that ovsdb-server can't reply within 5 seconds
> > >
> > >
> > > It's actually the ovsdb python idl client which is not able to reply within
> > > 5 seconds for the
> > > echo request from ovsdb-server.
> >
> > Oh, I'm surprised that ovsdb-server is doing the echo-requests, I
> > thought that we generally did them from the client end.
>
> One perfectly acceptable approach might be to simply disable
> echo-requests on the server side entirely and do them from the client.
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff --git a/ovn/utilities/ovndb-servers.ocf b/ovn/utilities/ovndb-servers.ocf
index fe1207c22..92620af6a 100755
--- a/ovn/utilities/ovndb-servers.ocf
+++ b/ovn/utilities/ovndb-servers.ocf
@@ -8,6 +8,8 @@
 : ${SB_MASTER_PORT_DEFAULT="6642"}
 : ${SB_MASTER_PROTO_DEFAULT="tcp"}
 : ${MANAGE_NORTHD_DEFAULT="no"}
+: ${INACTIVE_PROBE_DEFAULT="60000"}
+
 CRM_MASTER="${HA_SBIN_DIR}/crm_master -l reboot"
 CRM_ATTR_REPL_INFO="${HA_SBIN_DIR}/crm_attribute --type crm_config --name OVN_REPL_INFO -s ovn_ovsdb_master_server"
 OVN_CTL=${OCF_RESKEY_ovn_ctl:-${OVN_CTL_DEFAULT}}
@@ -17,6 +19,7 @@ NB_MASTER_PROTO=${OCF_RESKEY_nb_master_protocol:-${NB_MASTER_PROTO_DEFAULT}}
 SB_MASTER_PORT=${OCF_RESKEY_sb_master_port:-${SB_MASTER_PORT_DEFAULT}}
 SB_MASTER_PROTO=${OCF_RESKEY_sb_master_protocol:-${SB_MASTER_PROTO_DEFAULT}}
 MANAGE_NORTHD=${OCF_RESKEY_manage_northd:-${MANAGE_NORTHD_DEFAULT}}
+INACTIVE_PROBE=${OCF_RESKEY_inactive_probe_interval:-${INACTIVE_PROBE_DEFAULT}}
 
 # Invalid IP address is an address that can never exist in the network, as
 # mentioned in rfc-5737. The ovsdb servers connects to this IP address till
@@ -101,6 +104,14 @@ ovsdb_server_metadata() {
   <content type="string" />
   </parameter>
 
+  <parameter name="inactive_probe_interval" unique="1">
+  <longdesc lang="en">
+  Inactive probe interval to set for ovsdb-server.
+  </longdesc>
+  <shortdesc lang="en">Set inactive probe interval</shortdesc>
+  <content type="string" />
+  </parameter>
+
   </parameters>
 
   <actions>
@@ -138,6 +149,22 @@ ovsdb_server_notify() {
             ${OVN_CTL} --ovn-manage-ovsdb=no start_northd
         fi
 
+        conn=`ovn-nbctl get NB_global . connections`
+        if [ "$conn" == "[]" ]
+        then
+            ovn-nbctl -- --id=@conn_uuid create Connection \
+target="p${NB_MASTER_PROTO}\:${NB_MASTER_PORT}\:${MASTER_IP}" \
+inactivity_probe=$INACTIVE_PROBE -- set NB_Global . connections=@conn_uuid
+        fi
+
+        conn=`ovn-sbctl get SB_global . connections`
+        if [ "$conn" == "[]" ]
+        then
+            ovn-sbctl -- --id=@conn_uuid create Connection \
+target="p${SB_MASTER_PROTO}\:${SB_MASTER_PORT}\:${MASTER_IP}" \
+inactivity_probe=$INACTIVE_PROBE -- set SB_Global . connections=@conn_uuid
+        fi
+
     else
         if [ "$MANAGE_NORTHD" = "yes" ]; then
             # Stop ovn-northd service. Set --ovn-manage-ovsdb=no so that
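The last hunk repeats the same create-if-empty logic for the northbound and southbound databases. As a rough sketch of how that could be factored, the two helpers below take the control utility and table name as arguments. `passive_target` and `ensure_connection` are hypothetical names that do not appear in the patch; the command arguments mirror the patch, including the backslash-escaped colons in the target.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the ovn-nbctl/ovn-sbctl
# arguments mirror the ones used in the patch.

# passive_target PROTO PORT IP
# Builds the passive listener target with escaped colons, as in the patch.
passive_target() {
    printf '%s\n' "p$1\\:$2\\:$3"
}

# ensure_connection CTL TABLE TARGET PROBE_MS
# Creates a Connection row only when the global table has none yet.
ensure_connection() {
    ctl=$1; table=$2; target=$3; probe=$4
    conn=$("$ctl" get "$table" . connections)
    if [ "$conn" = "[]" ]; then
        "$ctl" -- --id=@conn_uuid create Connection \
            target="$target" inactivity_probe="$probe" \
            -- set "$table" . connections=@conn_uuid
    fi
}
```

Under these assumptions the promotion path would call, for example, `ensure_connection ovn-nbctl NB_Global "$(passive_target "$NB_MASTER_PROTO" "$NB_MASTER_PORT" "$MASTER_IP")" "$INACTIVE_PROBE"` and the equivalent for `ovn-sbctl` and `SB_Global`. The sketch uses the POSIX `=` comparison rather than `==`, which only matters if the script is ever run by a shell other than bash.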