diff mbox series

[ovs-dev] northd.c: Fix north blocking on deleting duplicated SB datapath.

Message ID 20211026004218.2206164-1-hzhou@ovn.org
State Accepted
Headers show
Series [ovs-dev] northd.c: Fix north blocking on deleting duplicated SB datapath. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Han Zhou Oct. 26, 2021, 12:42 a.m. UTC
Duplicated datapaths shouldn't be created in the first place, but in
case it is created because of either bug or dirty data, it should be
properly deleted instead of causing permanent failure in northd.

Currently, when there is a duplicated datapath record and a ip_multicast
record referencing it created, northd tries to delete the duplicated
datapath but the transaction fails due to schema constraint:

transaction error: {"details":"Deletion of 1 weak reference(s) to deleted (or never-existing) rows from column \"datapath\" in \"IP_Multicast\" row 72bac803-e484-4358-9e48-11911c8aa16f caused this column to become empty, but constraints on this column disallow an empty column.","error":"constraint violation"}

Northd would be blocked forever until manual deletion of the
ip_multicast record.

The problem is that in the same transaction only datapath is deleted but
not the ip_multicast record that references the datapath. In
build_ip_mcast() there is logic to delete ip_multicast records with
non-exist datapath, but the logic of ovn_datapath_from_sbrec() can find
the datapath and regard it as valid whenever the
external-ids:logical-switch/router matches. This patch fixes that by
comparing the sbrec of the datapath afterwards, and regards it as valid
only if the sbrec matches, too. This way, both ip_multicast and datapath
records are deleted in the same transaction and won't cause any trouble.

Reported-by: Vladislav Odintsov <odivlad@gmail.com>
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388269.html
Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 northd/northd.c     |  7 ++++++-
 tests/ovn-northd.at | 20 ++++++++++++++++++++
 2 files changed, 26 insertions(+), 1 deletion(-)

Comments

Numan Siddique Oct. 26, 2021, 8:31 p.m. UTC | #1
On Mon, Oct 25, 2021 at 8:42 PM Han Zhou <hzhou@ovn.org> wrote:
>
> Duplicated datapaths shouldn't be created in the first place, but in
> case it is created because of either bug or dirty data, it should be
> properly deleted instead of causing permanent failure in northd.
>
> Currently, when there is a duplicated datapath record and a ip_multicast
> record referencing it created, northd tries to delete the duplicated
> datapath but the transaction fails due to schema constraint:
>
> transaction error: {"details":"Deletion of 1 weak reference(s) to deleted (or never-existing) rows from column \"datapath\" in \"IP_Multicast\" row 72bac803-e484-4358-9e48-11911c8aa16f caused this column to become empty, but constraints on this column disallow an empty column.","error":"constraint violation"}
>
> Northd would be blocked forever until manual deletion of the
> ip_multicast record.
>
> The problem is that in the same transaction only datapath is deleted but
> not the ip_multicast record that references the datapath. In
> build_ip_mcast() there is logic to delete ip_multicast records with
> non-exist datapath, but the logic of ovn_datapath_from_sbrec() can find
> the datapath and regard it as valid whenever the
> external-ids:logical-switch/router matches. This patch fixes that by
> comparing the sbrec of the datapath afterwards, and regards it as valid
> only if the sbrec matches, too. This way, both ip_multicast and datapath
> records are deleted in the same transaction and won't cause any trouble.
>
> Reported-by: Vladislav Odintsov <odivlad@gmail.com>
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388269.html
> Signed-off-by: Han Zhou <hzhou@ovn.org>

Acked-by: Numan Siddique <numans@ovn.org>

Numan

> ---
>  northd/northd.c     |  7 ++++++-
>  tests/ovn-northd.at | 20 ++++++++++++++++++++
>  2 files changed, 26 insertions(+), 1 deletion(-)
>
> diff --git a/northd/northd.c b/northd/northd.c
> index f71121486..da4f9cd14 100644
> --- a/northd/northd.c
> +++ b/northd/northd.c
> @@ -1045,7 +1045,12 @@ ovn_datapath_from_sbrec(struct hmap *datapaths,
>          !smap_get_uuid(&sb->external_ids, "logical-router", &key)) {
>          return NULL;
>      }
> -    return ovn_datapath_find(datapaths, &key);
> +    struct ovn_datapath *od = ovn_datapath_find(datapaths, &key);
> +    if (od && (od->sb == sb)) {
> +        return od;
> +    }
> +
> +    return NULL;
>  }
>
>  static bool
> diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> index 544820764..e2b9924b6 100644
> --- a/tests/ovn-northd.at
> +++ b/tests/ovn-northd.at
> @@ -5446,3 +5446,23 @@ wait_row_count Port_binding 1 logical-port=S1-vm2 requested_chassis=$ch2_uuid
>
>  AT_CLEANUP
>  ])
> +
> +# Duplicated datapaths shouldn't be created, but in case it is created because
> +# of bug or dirty data, it should be properly deleted instead of causing
> +# permanent failure in northd.
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([handling duplicated datapaths])
> +ovn_start
> +
> +check ovn-nbctl --wait=sb ls-add ls1
> +ls1_uuid=$(fetch_column nb:Logical_Switch _uuid)
> +
> +# create a duplicated sb datapath (and an IP_Mulicast record that references
> +# it) on purpose.
> +AT_CHECK([ovn-sbctl --id=@dp create Datapath_Binding external_ids:logical-switch=$ls1_uuid external_ids:name=ls1 tunnel_key=123 -- create IP_Multicast datapath=@dp], [0], [ignore])
> +
> +# northd should delete one of the datapaths in the end
> +wait_row_count Datapath_Binding 1
> +
> +AT_CLEANUP
> +])
> --
> 2.30.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Han Zhou Oct. 26, 2021, 10:02 p.m. UTC | #2
On Tue, Oct 26, 2021 at 1:31 PM Numan Siddique <numans@ovn.org> wrote:
>
> On Mon, Oct 25, 2021 at 8:42 PM Han Zhou <hzhou@ovn.org> wrote:
> >
> > Duplicated datapaths shouldn't be created in the first place, but in
> > case it is created because of either bug or dirty data, it should be
> > properly deleted instead of causing permanent failure in northd.
> >
> > Currently, when there is a duplicated datapath record and a ip_multicast
> > record referencing it created, northd tries to delete the duplicated
> > datapath but the transaction fails due to schema constraint:
> >
> > transaction error: {"details":"Deletion of 1 weak reference(s) to
deleted (or never-existing) rows from column \"datapath\" in
\"IP_Multicast\" row 72bac803-e484-4358-9e48-11911c8aa16f caused this
column to become empty, but constraints on this column disallow an empty
column.","error":"constraint violation"}
> >
> > Northd would be blocked forever until manual deletion of the
> > ip_multicast record.
> >
> > The problem is that in the same transaction only datapath is deleted but
> > not the ip_multicast record that references the datapath. In
> > build_ip_mcast() there is logic to delete ip_multicast records with
> > non-exist datapath, but the logic of ovn_datapath_from_sbrec() can find
> > the datapath and regard it as valid whenever the
> > external-ids:logical-switch/router matches. This patch fixes that by
> > comparing the sbrec of the datapath afterwards, and regards it as valid
> > only if the sbrec matches, too. This way, both ip_multicast and datapath
> > records are deleted in the same transaction and won't cause any trouble.
> >
> > Reported-by: Vladislav Odintsov <odivlad@gmail.com>
> > Reported-at:
https://mail.openvswitch.org/pipermail/ovs-dev/2021-October/388269.html
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
>
> Acked-by: Numan Siddique <numans@ovn.org>
>
> Numan
>

Thanks Numan. I corrected a typo in the title (s/north/northd) and applied
to the main, and backported up to branch-20.12.

> > ---
> >  northd/northd.c     |  7 ++++++-
> >  tests/ovn-northd.at | 20 ++++++++++++++++++++
> >  2 files changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/northd/northd.c b/northd/northd.c
> > index f71121486..da4f9cd14 100644
> > --- a/northd/northd.c
> > +++ b/northd/northd.c
> > @@ -1045,7 +1045,12 @@ ovn_datapath_from_sbrec(struct hmap *datapaths,
> >          !smap_get_uuid(&sb->external_ids, "logical-router", &key)) {
> >          return NULL;
> >      }
> > -    return ovn_datapath_find(datapaths, &key);
> > +    struct ovn_datapath *od = ovn_datapath_find(datapaths, &key);
> > +    if (od && (od->sb == sb)) {
> > +        return od;
> > +    }
> > +
> > +    return NULL;
> >  }
> >
> >  static bool
> > diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
> > index 544820764..e2b9924b6 100644
> > --- a/tests/ovn-northd.at
> > +++ b/tests/ovn-northd.at
> > @@ -5446,3 +5446,23 @@ wait_row_count Port_binding 1
logical-port=S1-vm2 requested_chassis=$ch2_uuid
> >
> >  AT_CLEANUP
> >  ])
> > +
> > +# Duplicated datapaths shouldn't be created, but in case it is created
because
> > +# of bug or dirty data, it should be properly deleted instead of
causing
> > +# permanent failure in northd.
> > +OVN_FOR_EACH_NORTHD([
> > +AT_SETUP([handling duplicated datapaths])
> > +ovn_start
> > +
> > +check ovn-nbctl --wait=sb ls-add ls1
> > +ls1_uuid=$(fetch_column nb:Logical_Switch _uuid)
> > +
> > +# create a duplicated sb datapath (and an IP_Mulicast record that
references
> > +# it) on purpose.
> > +AT_CHECK([ovn-sbctl --id=@dp create Datapath_Binding
external_ids:logical-switch=$ls1_uuid external_ids:name=ls1 tunnel_key=123
-- create IP_Multicast datapath=@dp], [0], [ignore])
> > +
> > +# northd should delete one of the datapaths in the end
> > +wait_row_count Datapath_Binding 1
> > +
> > +AT_CLEANUP
> > +])
> > --
> > 2.30.2
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
diff mbox series

Patch

diff --git a/northd/northd.c b/northd/northd.c
index f71121486..da4f9cd14 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -1045,7 +1045,12 @@  ovn_datapath_from_sbrec(struct hmap *datapaths,
         !smap_get_uuid(&sb->external_ids, "logical-router", &key)) {
         return NULL;
     }
-    return ovn_datapath_find(datapaths, &key);
+    struct ovn_datapath *od = ovn_datapath_find(datapaths, &key);
+    if (od && (od->sb == sb)) {
+        return od;
+    }
+
+    return NULL;
 }
 
 static bool
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 544820764..e2b9924b6 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -5446,3 +5446,23 @@  wait_row_count Port_binding 1 logical-port=S1-vm2 requested_chassis=$ch2_uuid
 
 AT_CLEANUP
 ])
+
+# Duplicated datapaths shouldn't be created, but in case it is created because
+# of bug or dirty data, it should be properly deleted instead of causing
+# permanent failure in northd.
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([handling duplicated datapaths])
+ovn_start
+
+check ovn-nbctl --wait=sb ls-add ls1
+ls1_uuid=$(fetch_column nb:Logical_Switch _uuid)
+
+# create a duplicated sb datapath (and an IP_Mulicast record that references
+# it) on purpose.
+AT_CHECK([ovn-sbctl --id=@dp create Datapath_Binding external_ids:logical-switch=$ls1_uuid external_ids:name=ls1 tunnel_key=123 -- create IP_Multicast datapath=@dp], [0], [ignore])
+
+# northd should delete one of the datapaths in the end
+wait_row_count Datapath_Binding 1
+
+AT_CLEANUP
+])