Message ID | 20190712175000.11085-1-nusiddiq@redhat.com |
---|---|
State | Superseded |
Headers | show |
Series | [ovs-dev] ovn-northd: Fix the ovn-northd continous looping | expand |
On 7/12/2019 10:50 AM, nusiddiq@redhat.com wrote: > From: Numan Siddique <nusiddiq@redhat.com> In the subject line s/continous/continuous/ > > ovn-northd wakes up continuously from poll_block(). This issue can be reproduced > in the sandbox with the below commands > > ovn-nbctl lr-add lr0 > ovn-nbctl ls-add public > ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.100/24 > ovn-nbctl lsp-add public public-lr0 > ovn-nbctl lsp-set-type public-lr0 router > ovn-nbctl lsp-set-addresses public-lr0 router > ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public > ovn-nbctl lrp-set-gateway-chassis lr0-public chassis-1 20 > > This issue is seen after the commit [1], which makes use of the function - > sbrec_port_binding_update_nat_addresses_addvalue() to add a value to > Port_Binding.nat_addresses column. > > Looks like the IDL client code is sending the transactions to the ovsdb-server repeatedly > to update the Port_Binding.nat_addresses even though the Southbound DB has updated > the column when this function is used. The actual bug seems to be in the IDL client code > and that needs to be fixed. This patch as a quick fix, fixes ovn-northd's continuous loop > by not using this function, instead making use of sbrec_port_binding_set_nat_addresses(). > > The below messages are seen continuously when the ovn-nortdh debug logs are enabled. > > **** > > 2019-07-12T17:26:13.837Z|74512|jsonrpc|DBG|unix:sb1.ovsdb: received reply, > result=[{},{"count":1},{"count":1}], id=18628 > 2019-07-12T17:26:13.837Z|74513|poll_loop|DBG|wakeup due to 0-ms timeout at ../lib/ovsdb-idl.c:5397 (75% CPU usage) > 2019-07-12T17:26:13.837Z|74514|jsonrpc|DBG|unix:sb1.ovsdb: send request, > method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"}, > {"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"row": > {"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},{"mutations":[["nat_addresses", > "insert",["set",["00:00:20:20:12:13 172.168.0.100 is_chassis_resident(\"cr-lr0-public\")"]]]], > "where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], id=18629 > > 2019-07-12T17:26:13.837Z|74516|jsonrpc|DBG|unix:sb1.ovsdb: received reply, result=[{},{"count":1},{"count":1}], id=18629 > 2019-07-12T17:26:13.837Z|74517|poll_loop|DBG|wakeup due to 0-ms timeout at ../lib/ovsdb-idl.c:5397 (75% CPU usage) > 2019-07-12T17:26:13.837Z|74518|jsonrpc|DBG|unix:sb1.ovsdb: send request, > method="transact", params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"}, > {"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]], > "row":{"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"}, > {"mutations":[["nat_addresses","insert",["set",["00:00:20:20:12:13 172.168.0.100 > is_chassis_resident(\"cr-lr0-public\")"]]]],"where":[["_uuid","==",["uuid", > "56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], id=18630 > 2019-07-12T17:26:13.837Z|74520|jsonrpc|DBG|unix:sb1.ovsdb: received reply, result=[{},{"count":1},{"count":1}], id=18630 > ****** > > The OpenStack CI tests for networking-ovn is frequently failing few tests after this > commit. The failure seems to be related to timing issues as ovn-northd is hogging > the CPU continuously. We are also seeing travis CI test failures after this commit. > > [1] - ed198fb3b92e > > Fixes: ed198fb3b92e("ovn: Send GARP for the router ports with reside-on-redirect-chassis options set") > Signed-off-by: Numan Siddique <nusiddiq@redhat.com> > --- > ovn/northd/ovn-northd.c | 19 ++++++++++--------- > 1 file changed, 10 insertions(+), 9 deletions(-) > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > index ce382ac89..127227712 100644 > --- a/ovn/northd/ovn-northd.c > +++ b/ovn/northd/ovn-northd.c > @@ -2530,13 +2530,6 @@ ovn_port_update_sbrec(struct northd_context *ctx, > } > } > > - sbrec_port_binding_set_nat_addresses(op->sb, > - (const char **) nats, n_nats); > - for (size_t i = 0; i < n_nats; i++) { > - free(nats[i]); > - } > - free(nats); > - > /* Add the router mac and IPv4 addresses to > * Port_Binding.nat_addresses so that GARP is sent for these > * IPs by the ovn-controller on which the distributed gateway > @@ -2578,10 +2571,18 @@ ovn_port_update_sbrec(struct northd_context *ctx, > op->peer->od->l3redirect_port->json_key); > } > > - sbrec_port_binding_update_nat_addresses_addvalue( > - op->sb, ds_cstr(&garp_info)); > + n_nats++; > + nats = xrealloc(nats, (n_nats * sizeof *nats)); > + nats[n_nats - 1] = ds_steal_cstr(&garp_info); > ds_destroy(&garp_info); > } > + > + sbrec_port_binding_set_nat_addresses(op->sb, > + (const char **) nats, n_nats); > + for (size_t i = 0; i < n_nats; i++) { > + free(nats[i]); > + } > + free(nats); > } > > sbrec_port_binding_set_parent_port(op->sb, op->nbsp->parent_name); Works for me. With the subject line fixed up I think it's fine. Tested-by: Greg Rose <gvrose8192@gmail.com> Reviewed-by: Greg Rose <gvrose8192@gmail.com>
On Sat, Jul 13, 2019 at 2:16 AM Gregory Rose <gvrose8192@gmail.com> wrote: > On 7/12/2019 10:50 AM, nusiddiq@redhat.com wrote: > > From: Numan Siddique <nusiddiq@redhat.com> > > In the subject line s/continous/continuous/ > > > > > ovn-northd wakes up continuously from poll_block(). This issue can be > reproduced > > in the sandbox with the below commands > > > > ovn-nbctl lr-add lr0 > > ovn-nbctl ls-add public > > ovn-nbctl lrp-add lr0 lr0-public 00:00:20:20:12:13 172.168.0.100/24 > > ovn-nbctl lsp-add public public-lr0 > > ovn-nbctl lsp-set-type public-lr0 router > > ovn-nbctl lsp-set-addresses public-lr0 router > > ovn-nbctl lsp-set-options public-lr0 router-port=lr0-public > > ovn-nbctl lrp-set-gateway-chassis lr0-public chassis-1 20 > > > > This issue is seen after the commit [1], which makes use of the function > - > > sbrec_port_binding_update_nat_addresses_addvalue() to add a value to > > Port_Binding.nat_addresses column. > > > > Looks like the IDL client code is sending the transactions to the > ovsdb-server repeatedly > > to update the Port_Binding.nat_addresses even though the Southbound DB > has updated > > the column when this function is used. The actual bug seems to be in the > IDL client code > > and that needs to be fixed. This patch as a quick fix, fixes > ovn-northd's continuous loop > > by not using this function, instead making use of > sbrec_port_binding_set_nat_addresses(). > > > > The below messages are seen continuously when the ovn-nortdh debug logs > are enabled. > > > > **** > > > > 2019-07-12T17:26:13.837Z|74512|jsonrpc|DBG|unix:sb1.ovsdb: received > reply, > > result=[{},{"count":1},{"count":1}], id=18628 > > 2019-07-12T17:26:13.837Z|74513|poll_loop|DBG|wakeup due to 0-ms timeout > at ../lib/ovsdb-idl.c:5397 (75% CPU usage) > > 2019-07-12T17:26:13.837Z|74514|jsonrpc|DBG|unix:sb1.ovsdb: send request, > > method="transact", > params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"}, > > > {"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"row": > > > {"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"},{"mutations":[["nat_addresses", > > "insert",["set",["00:00:20:20:12:13 172.168.0.100 > is_chassis_resident(\"cr-lr0-public\")"]]]], > > > "where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], > id=18629 > > > > 2019-07-12T17:26:13.837Z|74516|jsonrpc|DBG|unix:sb1.ovsdb: received > reply, result=[{},{"count":1},{"count":1}], id=18629 > > 2019-07-12T17:26:13.837Z|74517|poll_loop|DBG|wakeup due to 0-ms timeout > at ../lib/ovsdb-idl.c:5397 (75% CPU usage) > > 2019-07-12T17:26:13.837Z|74518|jsonrpc|DBG|unix:sb1.ovsdb: send request, > > method="transact", > params=["OVN_Southbound",{"lock":"ovn_northd","op":"assert"}, > > > {"where":[["_uuid","==",["uuid","56a9eb75-8d3b-4144-b4e7-1bb749645011"]]], > > "row":{"nat_addresses":["set",[]]},"op":"update","table":"Port_Binding"}, > > {"mutations":[["nat_addresses","insert",["set",["00:00:20:20:12:13 > 172.168.0.100 > > > is_chassis_resident(\"cr-lr0-public\")"]]]],"where":[["_uuid","==",["uuid", > > > "56a9eb75-8d3b-4144-b4e7-1bb749645011"]]],"op":"mutate","table":"Port_Binding"}], > id=18630 > > 2019-07-12T17:26:13.837Z|74520|jsonrpc|DBG|unix:sb1.ovsdb: received > reply, result=[{},{"count":1},{"count":1}], id=18630 > > ****** > > > > The OpenStack CI tests for networking-ovn is frequently failing few > tests after this > > commit. The failure seems to be related to timing issues as ovn-northd > is hogging > > the CPU continuously. We are also seeing travis CI test failures after > this commit. > > > > [1] - ed198fb3b92e > > > > Fixes: ed198fb3b92e("ovn: Send GARP for the router ports with > reside-on-redirect-chassis options set") > > Signed-off-by: Numan Siddique <nusiddiq@redhat.com> > > --- > > ovn/northd/ovn-northd.c | 19 ++++++++++--------- > > 1 file changed, 10 insertions(+), 9 deletions(-) > > > > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c > > index ce382ac89..127227712 100644 > > --- a/ovn/northd/ovn-northd.c > > +++ b/ovn/northd/ovn-northd.c > > @@ -2530,13 +2530,6 @@ ovn_port_update_sbrec(struct northd_context *ctx, > > } > > } > > > > - sbrec_port_binding_set_nat_addresses(op->sb, > > - (const char **) nats, > n_nats); > > - for (size_t i = 0; i < n_nats; i++) { > > - free(nats[i]); > > - } > > - free(nats); > > - > > /* Add the router mac and IPv4 addresses to > > * Port_Binding.nat_addresses so that GARP is sent for > these > > * IPs by the ovn-controller on which the distributed > gateway > > @@ -2578,10 +2571,18 @@ ovn_port_update_sbrec(struct northd_context *ctx, > > > op->peer->od->l3redirect_port->json_key); > > } > > > > - sbrec_port_binding_update_nat_addresses_addvalue( > > - op->sb, ds_cstr(&garp_info)); > > + n_nats++; > > + nats = xrealloc(nats, (n_nats * sizeof *nats)); > > + nats[n_nats - 1] = ds_steal_cstr(&garp_info); > > ds_destroy(&garp_info); > > } > > + > > + sbrec_port_binding_set_nat_addresses(op->sb, > > + (const char **) nats, > n_nats); > > + for (size_t i = 0; i < n_nats; i++) { > > + free(nats[i]); > > + } > > + free(nats); > > } > > > > sbrec_port_binding_set_parent_port(op->sb, > op->nbsp->parent_name); > > Works for me. With the subject line fixed up I think it's fine. > Tested-by: Greg Rose <gvrose8192@gmail.com> > Reviewed-by: Greg Rose <gvrose8192@gmail.com> > Thanks for the review. I submitted v2 fixing the typo. Numan Thanks.
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c index ce382ac89..127227712 100644 --- a/ovn/northd/ovn-northd.c +++ b/ovn/northd/ovn-northd.c @@ -2530,13 +2530,6 @@ ovn_port_update_sbrec(struct northd_context *ctx, } } - sbrec_port_binding_set_nat_addresses(op->sb, - (const char **) nats, n_nats); - for (size_t i = 0; i < n_nats; i++) { - free(nats[i]); - } - free(nats); - /* Add the router mac and IPv4 addresses to * Port_Binding.nat_addresses so that GARP is sent for these * IPs by the ovn-controller on which the distributed gateway @@ -2578,10 +2571,18 @@ ovn_port_update_sbrec(struct northd_context *ctx, op->peer->od->l3redirect_port->json_key); } - sbrec_port_binding_update_nat_addresses_addvalue( - op->sb, ds_cstr(&garp_info)); + n_nats++; + nats = xrealloc(nats, (n_nats * sizeof *nats)); + nats[n_nats - 1] = ds_steal_cstr(&garp_info); ds_destroy(&garp_info); } + + sbrec_port_binding_set_nat_addresses(op->sb, + (const char **) nats, n_nats); + for (size_t i = 0; i < n_nats; i++) { + free(nats[i]); + } + free(nats); } sbrec_port_binding_set_parent_port(op->sb, op->nbsp->parent_name);