Message ID | 20211026004218.2206164-1-hzhou@ovn.org |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev] northd.c: Fix north blocking on deleting duplicated SB datapath. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
On Mon, Oct 25, 2021 at 8:42 PM Han Zhou <hzhou@ovn.org> wrote: > > Duplicated datapaths shouldn't be created in the first place, but in > case it is created because of either bug or dirty data, it should be > properly deleted instead of causing permanent failure in northd. > > Currently, when there is a duplicated datapath record and a ip_multicast > record referencing it created, northd tries to delete the duplicated > datapath but the transaction fails due to schema constraint: > > transaction error: {"details":"Deletion of 1 weak reference(s) to deleted (or never-existing) rows from column \"datapath\" in \"IP_Multicast\" row 72bac803-e484-4358-9e48-11911c8aa16f caused this column to become empty, but constraints on this column disallow an empty column.","error":"constraint violation"} > > Northd would be blocked forever until manual deletion of the > ip_multicast record. > > The problem is that in the same transaction only datapath is deleted but > not the ip_multicast record that references the datapath. In > build_ip_mcast() there is logic to delete ip_multicast records with > non-exist datapath, but the logic of ovn_datapath_from_sbrec() can find > the datapath and regard it as valid whenever the > external-ids:logical-switch/router matches. This patch fixes that by > comparing the sbrec of the datapath afterwards, and regards it as valid > only if the sbrec matches, too. This way, both ip_multicast and datapath > records are deleted in the same transaction and won't cause any trouble. > > Reported-by: Vladislav Odintsov <odivlad@gmail.com> > Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388269.html > Signed-off-by: Han Zhou <hzhou@ovn.org> Acked-by: Numan Siddique <numans@ovn.org> Numan > --- > northd/northd.c | 7 ++++++- > tests/ovn-northd.at | 20 ++++++++++++++++++++ > 2 files changed, 26 insertions(+), 1 deletion(-) > > diff --git a/northd/northd.c b/northd/northd.c > index f71121486..da4f9cd14 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -1045,7 +1045,12 @@ ovn_datapath_from_sbrec(struct hmap *datapaths, > !smap_get_uuid(&sb->external_ids, "logical-router", &key)) { > return NULL; > } > - return ovn_datapath_find(datapaths, &key); > + struct ovn_datapath *od = ovn_datapath_find(datapaths, &key); > + if (od && (od->sb == sb)) { > + return od; > + } > + > + return NULL; > } > > static bool > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > index 544820764..e2b9924b6 100644 > --- a/tests/ovn-northd.at > +++ b/tests/ovn-northd.at > @@ -5446,3 +5446,23 @@ wait_row_count Port_binding 1 logical-port=S1-vm2 requested_chassis=$ch2_uuid > > AT_CLEANUP > ]) > + > +# Duplicated datapaths shouldn't be created, but in case it is created because > +# of bug or dirty data, it should be properly deleted instead of causing > +# permanent failure in northd. > +OVN_FOR_EACH_NORTHD([ > +AT_SETUP([handling duplicated datapaths]) > +ovn_start > + > +check ovn-nbctl --wait=sb ls-add ls1 > +ls1_uuid=$(fetch_column nb:Logical_Switch _uuid) > + > +# create a duplicated sb datapath (and an IP_Mulicast record that references > +# it) on purpose. > +AT_CHECK([ovn-sbctl --id=@dp create Datapath_Binding external_ids:logical-switch=$ls1_uuid external_ids:name=ls1 tunnel_key=123 -- create IP_Multicast datapath=@dp], [0], [ignore]) > + > +# northd should delete one of the datapaths in the end > +wait_row_count Datapath_Binding 1 > + > +AT_CLEANUP > +]) > -- > 2.30.2 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev >
On Tue, Oct 26, 2021 at 1:31 PM Numan Siddique <numans@ovn.org> wrote: > > On Mon, Oct 25, 2021 at 8:42 PM Han Zhou <hzhou@ovn.org> wrote: > > > > Duplicated datapaths shouldn't be created in the first place, but in > > case it is created because of either bug or dirty data, it should be > > properly deleted instead of causing permanent failure in northd. > > > > Currently, when there is a duplicated datapath record and a ip_multicast > > record referencing it created, northd tries to delete the duplicated > > datapath but the transaction fails due to schema constraint: > > > > transaction error: {"details":"Deletion of 1 weak reference(s) to deleted (or never-existing) rows from column \"datapath\" in \"IP_Multicast\" row 72bac803-e484-4358-9e48-11911c8aa16f caused this column to become empty, but constraints on this column disallow an empty column.","error":"constraint violation"} > > > > Northd would be blocked forever until manual deletion of the > > ip_multicast record. > > > > The problem is that in the same transaction only datapath is deleted but > > not the ip_multicast record that references the datapath. In > > build_ip_mcast() there is logic to delete ip_multicast records with > > non-exist datapath, but the logic of ovn_datapath_from_sbrec() can find > > the datapath and regard it as valid whenever the > > external-ids:logical-switch/router matches. This patch fixes that by > > comparing the sbrec of the datapath afterwards, and regards it as valid > > only if the sbrec matches, too. This way, both ip_multicast and datapath > > records are deleted in the same transaction and won't cause any trouble. > > > > Reported-by: Vladislav Odintsov <odivlad@gmail.com> > > Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388269.html > > Signed-off-by: Han Zhou <hzhou@ovn.org> > > Acked-by: Numan Siddique <numans@ovn.org> > > Numan > Thanks Numan. I corrected a typo in the title (s/north/northd) and applied to the main, and backported up to branch-20.12. > > --- > > northd/northd.c | 7 ++++++- > > tests/ovn-northd.at | 20 ++++++++++++++++++++ > > 2 files changed, 26 insertions(+), 1 deletion(-) > > > > diff --git a/northd/northd.c b/northd/northd.c > > index f71121486..da4f9cd14 100644 > > --- a/northd/northd.c > > +++ b/northd/northd.c > > @@ -1045,7 +1045,12 @@ ovn_datapath_from_sbrec(struct hmap *datapaths, > > !smap_get_uuid(&sb->external_ids, "logical-router", &key)) { > > return NULL; > > } > > - return ovn_datapath_find(datapaths, &key); > > + struct ovn_datapath *od = ovn_datapath_find(datapaths, &key); > > + if (od && (od->sb == sb)) { > > + return od; > > + } > > + > > + return NULL; > > } > > > > static bool > > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at > > index 544820764..e2b9924b6 100644 > > --- a/tests/ovn-northd.at > > +++ b/tests/ovn-northd.at > > @@ -5446,3 +5446,23 @@ wait_row_count Port_binding 1 logical-port=S1-vm2 requested_chassis=$ch2_uuid > > > > AT_CLEANUP > > ]) > > + > > +# Duplicated datapaths shouldn't be created, but in case it is created because > > +# of bug or dirty data, it should be properly deleted instead of causing > > +# permanent failure in northd. > > +OVN_FOR_EACH_NORTHD([ > > +AT_SETUP([handling duplicated datapaths]) > > +ovn_start > > + > > +check ovn-nbctl --wait=sb ls-add ls1 > > +ls1_uuid=$(fetch_column nb:Logical_Switch _uuid) > > + > > +# create a duplicated sb datapath (and an IP_Mulicast record that references > > +# it) on purpose. > > +AT_CHECK([ovn-sbctl --id=@dp create Datapath_Binding external_ids:logical-switch=$ls1_uuid external_ids:name=ls1 tunnel_key=123 -- create IP_Multicast datapath=@dp], [0], [ignore]) > > + > > +# northd should delete one of the datapaths in the end > > +wait_row_count Datapath_Binding 1 > > + > > +AT_CLEANUP > > +]) > > -- > > 2.30.2 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >
diff --git a/northd/northd.c b/northd/northd.c index f71121486..da4f9cd14 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -1045,7 +1045,12 @@ ovn_datapath_from_sbrec(struct hmap *datapaths, !smap_get_uuid(&sb->external_ids, "logical-router", &key)) { return NULL; } - return ovn_datapath_find(datapaths, &key); + struct ovn_datapath *od = ovn_datapath_find(datapaths, &key); + if (od && (od->sb == sb)) { + return od; + } + + return NULL; } static bool diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at index 544820764..e2b9924b6 100644 --- a/tests/ovn-northd.at +++ b/tests/ovn-northd.at @@ -5446,3 +5446,23 @@ wait_row_count Port_binding 1 logical-port=S1-vm2 requested_chassis=$ch2_uuid AT_CLEANUP ]) + +# Duplicated datapaths shouldn't be created, but in case it is created because +# of bug or dirty data, it should be properly deleted instead of causing +# permanent failure in northd. +OVN_FOR_EACH_NORTHD([ +AT_SETUP([handling duplicated datapaths]) +ovn_start + +check ovn-nbctl --wait=sb ls-add ls1 +ls1_uuid=$(fetch_column nb:Logical_Switch _uuid) + +# create a duplicated sb datapath (and an IP_Mulicast record that references +# it) on purpose. +AT_CHECK([ovn-sbctl --id=@dp create Datapath_Binding external_ids:logical-switch=$ls1_uuid external_ids:name=ls1 tunnel_key=123 -- create IP_Multicast datapath=@dp], [0], [ignore]) + +# northd should delete one of the datapaths in the end +wait_row_count Datapath_Binding 1 + +AT_CLEANUP +])
Duplicated datapaths shouldn't be created in the first place, but in case it is created because of either bug or dirty data, it should be properly deleted instead of causing permanent failure in northd. Currently, when there is a duplicated datapath record and a ip_multicast record referencing it created, northd tries to delete the duplicated datapath but the transaction fails due to schema constraint: transaction error: {"details":"Deletion of 1 weak reference(s) to deleted (or never-existing) rows from column \"datapath\" in \"IP_Multicast\" row 72bac803-e484-4358-9e48-11911c8aa16f caused this column to become empty, but constraints on this column disallow an empty column.","error":"constraint violation"} Northd would be blocked forever until manual deletion of the ip_multicast record. The problem is that in the same transaction only datapath is deleted but not the ip_multicast record that references the datapath. In build_ip_mcast() there is logic to delete ip_multicast records with non-exist datapath, but the logic of ovn_datapath_from_sbrec() can find the datapath and regard it as valid whenever the external-ids:logical-switch/router matches. This patch fixes that by comparing the sbrec of the datapath afterwards, and regards it as valid only if the sbrec matches, too. This way, both ip_multicast and datapath records are deleted in the same transaction and won't cause any trouble. Reported-by: Vladislav Odintsov <odivlad@gmail.com> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388269.html Signed-off-by: Han Zhou <hzhou@ovn.org> --- northd/northd.c | 7 ++++++- tests/ovn-northd.at | 20 ++++++++++++++++++++ 2 files changed, 26 insertions(+), 1 deletion(-)