diff mbox series

[ovs-dev] ofproto: Use xlate map for uuid lookups

Message ID 20220223184812.2413004-1-grive@u256.net
State Accepted
Commit f77dbc1eb2da2523625cd36922c6fccfcb3f3eb7
Headers show
Series [ovs-dev] ofproto: Use xlate map for uuid lookups | expand

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test success github build: passed

Commit Message

Gaetan Rivet Feb. 23, 2022, 6:48 p.m. UTC
The ofproto map 'all_ofproto_dpifs_by_uuid' does not support
concurrent accesses. It is however read by upcall handler threads
and written by the main thread at the same time.

Additionally, handler threads will change the ams_seq while
an ofproto is being destroyed, triggering crashes with the
following backtrace:

(gdb) bt
  hmap_next (hmap.h:398)
  seq_wake_waiters (seq.c:326)
  seq_change_protected (seq.c:134)
  seq_change (seq.c:144)
  ofproto_dpif_send_async_msg (ofproto_dpif.c:263)
  process_upcall (ofproto_dpif_upcall.c:1782)
  recv_upcalls (ofproto_dpif_upcall.c:1026)
  udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945)
  ovsthread_wrapper (ovs_thread.c:734)

To solve both issues, remove the 'all_ofproto_dpifs_by_uuid'.
Instead, another map already storing ofprotos in xlate can be used.

During an ofproto destruction, its reference is removed from the current
xlate xcfg. Such change is committed only after all threads have quiesced
at least once during xlate_txn_commit(). This wait ensures that the
removal is seen by all threads, rendering impossible for a thread to
still hold a reference while the destruction proceeds.

Furthermore, the xlate maps are copied during updates instead of
being written in place. It is thus correct to read xcfg->xbridges while
inserting or removing from new_xcfg->xbridges.

Finally, now that ofproto_dpifs lookups are done through xcfg->xbridges,
it is important to use a high level of entropy. As it used the ofproto pointer
hashed, fewer bits were random compared to the uuid key used in
'all_ofproto_dpifs_by_uuid'. To solve this, use the ofproto uuid as the key
in xbridges as well, improving entropy.

Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to user action cookie.")
Suggested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Gaetan Rivet <grive@u256.net>
---

Following the discussion on the fix
https://patchwork.ozlabs.org/project/openvswitch/patch/1638530715-44436-1-git-send-email-wangyunjian@huawei.com/

I tested it with Peng's ofproto refcount patch in tree:
https://patchwork.ozlabs.org/project/openvswitch/patch/20220219032607.15757-1-hepeng.0320@bytedance.com/

CI result: https://github.com/grivet/ovs/actions/runs/1889083195

 ofproto/ofproto-dpif-xlate.c | 21 +++++++++++++++++++--
 ofproto/ofproto-dpif-xlate.h |  1 +
 ofproto/ofproto-dpif.c       | 19 +------------------
 3 files changed, 21 insertions(+), 20 deletions(-)

Comments

0-day Robot Feb. 23, 2022, 7:58 p.m. UTC | #1
Bleep bloop.  Greetings Gaetan Rivet, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Unexpected sign-offs from developers who are not authors or co-authors or committers: Yunjian Wang <wangyunjian@huawei.com>
Lines checked: 166, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Alin-Gabriel Serdean Feb. 24, 2022, 12:17 p.m. UTC | #2
I enabled ASan on MSVC
(https://devblogs.microsoft.com/cppblog/addresssanitizer-asan-for-windows-wi
th-msvc/)
and this patch alleviates the issues found by the CI and my local testing.

Tested-by: Alin-Gabriel Serdean <aserdean@ovn.org>
Acked-by: Alin-Gabriel Serdean <aserdean@ovn.org>

-----Original Message-----
From: dev <ovs-dev-bounces@openvswitch.org> On Behalf Of Gaetan Rivet
Sent: Wednesday, February 23, 2022 8:48 PM
To: ovs-dev@openvswitch.org
Subject: [ovs-dev] [PATCH] ofproto: Use xlate map for uuid lookups

The ofproto map 'all_ofproto_dpifs_by_uuid' does not support concurrent
accesses. It is however read by upcall handler threads and written by the
main thread at the same time.

Additionally, handler threads will change the ams_seq while an ofproto is
being destroyed, triggering crashes with the following backtrace:

(gdb) bt
  hmap_next (hmap.h:398)
  seq_wake_waiters (seq.c:326)
  seq_change_protected (seq.c:134)
  seq_change (seq.c:144)
  ofproto_dpif_send_async_msg (ofproto_dpif.c:263)
  process_upcall (ofproto_dpif_upcall.c:1782)
  recv_upcalls (ofproto_dpif_upcall.c:1026)
  udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945)
  ovsthread_wrapper (ovs_thread.c:734)

To solve both issues, remove the 'all_ofproto_dpifs_by_uuid'.
Instead, another map already storing ofprotos in xlate can be used.

During an ofproto destruction, its reference is removed from the current
xlate xcfg. Such change is committed only after all threads have quiesced at
least once during xlate_txn_commit(). This wait ensures that the removal is
seen by all threads, rendering impossible for a thread to still hold a
reference while the destruction proceeds.

Furthermore, the xlate maps are copied during updates instead of being
written in place. It is thus correct to read xcfg->xbridges while inserting
or removing from new_xcfg->xbridges.

Finally, now that ofproto_dpifs lookups are done through xcfg->xbridges, it
is important to use a high level of entropy. As it used the ofproto pointer
hashed, fewer bits were random compared to the uuid key used in
'all_ofproto_dpifs_by_uuid'. To solve this, use the ofproto uuid as the key
in xbridges as well, improving entropy.

Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to user
action cookie.")
Suggested-by: Adrian Moreno <amorenoz@redhat.com>
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: Gaetan Rivet <grive@u256.net>
---
Adrian Moreno Feb. 28, 2022, 10:04 a.m. UTC | #3
On 2/23/22 19:48, Gaetan Rivet wrote:
> The ofproto map 'all_ofproto_dpifs_by_uuid' does not support
> concurrent accesses. It is however read by upcall handler threads
> and written by the main thread at the same time.
> 
> Additionally, handler threads will change the ams_seq while
> an ofproto is being destroyed, triggering crashes with the
> following backtrace:
> 
> (gdb) bt
>    hmap_next (hmap.h:398)
>    seq_wake_waiters (seq.c:326)
>    seq_change_protected (seq.c:134)
>    seq_change (seq.c:144)
>    ofproto_dpif_send_async_msg (ofproto_dpif.c:263)
>    process_upcall (ofproto_dpif_upcall.c:1782)
>    recv_upcalls (ofproto_dpif_upcall.c:1026)
>    udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945)
>    ovsthread_wrapper (ovs_thread.c:734)
> 
> To solve both issues, remove the 'all_ofproto_dpifs_by_uuid'.
> Instead, another map already storing ofprotos in xlate can be used.
> 
> During an ofproto destruction, its reference is removed from the current
> xlate xcfg. Such change is committed only after all threads have quiesced
> at least once during xlate_txn_commit(). This wait ensures that the
> removal is seen by all threads, rendering impossible for a thread to
> still hold a reference while the destruction proceeds.
> 
> Furthermore, the xlate maps are copied during updates instead of
> being written in place. It is thus correct to read xcfg->xbridges while
> inserting or removing from new_xcfg->xbridges.
> 
> Finally, now that ofproto_dpifs lookups are done through xcfg->xbridges,
> it is important to use a high level of entropy. As it used the ofproto pointer
> hashed, fewer bits were random compared to the uuid key used in
> 'all_ofproto_dpifs_by_uuid'. To solve this, use the ofproto uuid as the key
> in xbridges as well, improving entropy.
> 
> Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to user action cookie.")
> Suggested-by: Adrian Moreno <amorenoz@redhat.com>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> Signed-off-by: Gaetan Rivet <grive@u256.net>
> ---
> 
> Following the discussion on the fix
> https://patchwork.ozlabs.org/project/openvswitch/patch/1638530715-44436-1-git-send-email-wangyunjian@huawei.com/
> 
> I tested it with Peng's ofproto refcount patch in tree:
> https://patchwork.ozlabs.org/project/openvswitch/patch/20220219032607.15757-1-hepeng.0320@bytedance.com/
> 
> CI result: https://github.com/grivet/ovs/actions/runs/1889083195
> 
>   ofproto/ofproto-dpif-xlate.c | 21 +++++++++++++++++++--
>   ofproto/ofproto-dpif-xlate.h |  1 +
>   ofproto/ofproto-dpif.c       | 19 +------------------
>   3 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
> index 578cbfe58..4a7388071 100644
> --- a/ofproto/ofproto-dpif-xlate.c
> +++ b/ofproto/ofproto-dpif-xlate.c
> @@ -865,7 +865,7 @@ xlate_xbridge_init(struct xlate_cfg *xcfg, struct xbridge *xbridge)
>       ovs_list_init(&xbridge->xbundles);
>       hmap_init(&xbridge->xports);
>       hmap_insert(&xcfg->xbridges, &xbridge->hmap_node,
> -                hash_pointer(xbridge->ofproto, 0));
> +                uuid_hash(&xbridge->ofproto->uuid));
>   }
>   
>   static void
> @@ -1639,7 +1639,7 @@ xbridge_lookup(struct xlate_cfg *xcfg, const struct ofproto_dpif *ofproto)
>   
>       xbridges = &xcfg->xbridges;
>   
> -    HMAP_FOR_EACH_IN_BUCKET (xbridge, hmap_node, hash_pointer(ofproto, 0),
> +    HMAP_FOR_EACH_IN_BUCKET (xbridge, hmap_node, uuid_hash(&ofproto->uuid),
>                                xbridges) {
>           if (xbridge->ofproto == ofproto) {
>               return xbridge;
> @@ -1661,6 +1661,23 @@ xbridge_lookup_by_uuid(struct xlate_cfg *xcfg, const struct uuid *uuid)
>       return NULL;
>   }
>   
> +struct ofproto_dpif *
> +xlate_ofproto_lookup(const struct uuid *uuid)
> +{
> +    struct xlate_cfg *xcfg = ovsrcu_get(struct xlate_cfg *, &xcfgp);
> +    struct xbridge *xbridge;
> +
> +    if (!xcfg) {
> +        return NULL;
> +    }
> +
> +    xbridge = xbridge_lookup_by_uuid(xcfg, uuid);
> +    if (xbridge != NULL) {
> +        return xbridge->ofproto;
> +    }
> +    return NULL;
> +}
> +
>   static struct xbundle *
>   xbundle_lookup(struct xlate_cfg *xcfg, const struct ofbundle *ofbundle)
>   {
> diff --git a/ofproto/ofproto-dpif-xlate.h b/ofproto/ofproto-dpif-xlate.h
> index 851088d79..2ba90e999 100644
> --- a/ofproto/ofproto-dpif-xlate.h
> +++ b/ofproto/ofproto-dpif-xlate.h
> @@ -176,6 +176,7 @@ void xlate_ofproto_set(struct ofproto_dpif *, const char *name, struct dpif *,
>                          bool forward_bpdu, bool has_in_band,
>                          const struct dpif_backer_support *support);
>   void xlate_remove_ofproto(struct ofproto_dpif *);
> +struct ofproto_dpif *xlate_ofproto_lookup(const struct uuid *uuid);
>   
>   void xlate_bundle_set(struct ofproto_dpif *, struct ofbundle *,
>                         const char *name, enum port_vlan_mode,
> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
> index 8143dd965..7b4a1b3d8 100644
> --- a/ofproto/ofproto-dpif.c
> +++ b/ofproto/ofproto-dpif.c
> @@ -215,10 +215,6 @@ struct shash all_dpif_backers = SHASH_INITIALIZER(&all_dpif_backers);
>   static struct hmap all_ofproto_dpifs_by_name =
>                             HMAP_INITIALIZER(&all_ofproto_dpifs_by_name);
>   
> -/* All existing ofproto_dpif instances, indexed by ->uuid. */
> -static struct hmap all_ofproto_dpifs_by_uuid =
> -                          HMAP_INITIALIZER(&all_ofproto_dpifs_by_uuid);
> -
>   static bool ofproto_use_tnl_push_pop = true;
>   static void ofproto_unixctl_init(void);
>   static void ct_zone_config_init(struct dpif_backer *backer);
> @@ -1720,9 +1716,6 @@ construct(struct ofproto *ofproto_)
>       hmap_insert(&all_ofproto_dpifs_by_name,
>                   &ofproto->all_ofproto_dpifs_by_name_node,
>                   hash_string(ofproto->up.name, 0));
> -    hmap_insert(&all_ofproto_dpifs_by_uuid,
> -                &ofproto->all_ofproto_dpifs_by_uuid_node,
> -                uuid_hash(&ofproto->uuid));
>       memset(&ofproto->stats, 0, sizeof ofproto->stats);
>   
>       ofproto_init_tables(ofproto_, N_TABLES);
> @@ -1820,8 +1813,6 @@ destruct(struct ofproto *ofproto_, bool del)
>   
>       hmap_remove(&all_ofproto_dpifs_by_name,
>                   &ofproto->all_ofproto_dpifs_by_name_node);
> -    hmap_remove(&all_ofproto_dpifs_by_uuid,
> -                &ofproto->all_ofproto_dpifs_by_uuid_node);
>   
>       OFPROTO_FOR_EACH_TABLE (table, &ofproto->up) {
>           CLS_FOR_EACH (rule, up.cr, &table->cls) {
> @@ -5818,15 +5809,7 @@ ofproto_dpif_lookup_by_name(const char *name)
>   struct ofproto_dpif *
>   ofproto_dpif_lookup_by_uuid(const struct uuid *uuid)
>   {
> -    struct ofproto_dpif *ofproto;
> -
> -    HMAP_FOR_EACH_WITH_HASH (ofproto, all_ofproto_dpifs_by_uuid_node,
> -                             uuid_hash(uuid), &all_ofproto_dpifs_by_uuid) {
> -        if (uuid_equals(&ofproto->uuid, uuid)) {
> -            return ofproto;
> -        }
> -    }
> -    return NULL;
> +    return xlate_ofproto_lookup(uuid);
>   }
>   
>   static void

Hi Gaëtan.

Thanks for working on the patch. It perferctly reflects what we've discussed.

Acked-by: Adrian Moreno <amorenoz@redhat.com>
Ilya Maximets March 7, 2022, 7:14 p.m. UTC | #4
On 2/23/22 19:48, Gaetan Rivet wrote:
> The ofproto map 'all_ofproto_dpifs_by_uuid' does not support
> concurrent accesses. It is however read by upcall handler threads
> and written by the main thread at the same time.
> 
> Additionally, handler threads will change the ams_seq while
> an ofproto is being destroyed, triggering crashes with the
> following backtrace:
> 
> (gdb) bt
>   hmap_next (hmap.h:398)
>   seq_wake_waiters (seq.c:326)
>   seq_change_protected (seq.c:134)
>   seq_change (seq.c:144)
>   ofproto_dpif_send_async_msg (ofproto_dpif.c:263)
>   process_upcall (ofproto_dpif_upcall.c:1782)
>   recv_upcalls (ofproto_dpif_upcall.c:1026)
>   udpif_upcall_handler (ofproto/ofproto_dpif_upcall.c:945)
>   ovsthread_wrapper (ovs_thread.c:734)
> 
> To solve both issues, remove the 'all_ofproto_dpifs_by_uuid'.
> Instead, another map already storing ofprotos in xlate can be used.
> 
> During an ofproto destruction, its reference is removed from the current
> xlate xcfg. Such change is committed only after all threads have quiesced
> at least once during xlate_txn_commit(). This wait ensures that the
> removal is seen by all threads, rendering impossible for a thread to
> still hold a reference while the destruction proceeds.
> 
> Furthermore, the xlate maps are copied during updates instead of
> being written in place. It is thus correct to read xcfg->xbridges while
> inserting or removing from new_xcfg->xbridges.
> 
> Finally, now that ofproto_dpifs lookups are done through xcfg->xbridges,
> it is important to use a high level of entropy. As it used the ofproto pointer
> hashed, fewer bits were random compared to the uuid key used in
> 'all_ofproto_dpifs_by_uuid'. To solve this, use the ofproto uuid as the key
> in xbridges as well, improving entropy.
> 
> Fixes: fcb9579be3c7 ("ofproto: Add 'ofproto_uuid' and 'ofp_in_port' to user action cookie.")
> Suggested-by: Adrian Moreno <amorenoz@redhat.com>
> Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
> Signed-off-by: Gaetan Rivet <grive@u256.net>
> ---

Thanks, everyone!  Applied and backported down to 2.13.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 578cbfe58..4a7388071 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -865,7 +865,7 @@  xlate_xbridge_init(struct xlate_cfg *xcfg, struct xbridge *xbridge)
     ovs_list_init(&xbridge->xbundles);
     hmap_init(&xbridge->xports);
     hmap_insert(&xcfg->xbridges, &xbridge->hmap_node,
-                hash_pointer(xbridge->ofproto, 0));
+                uuid_hash(&xbridge->ofproto->uuid));
 }
 
 static void
@@ -1639,7 +1639,7 @@  xbridge_lookup(struct xlate_cfg *xcfg, const struct ofproto_dpif *ofproto)
 
     xbridges = &xcfg->xbridges;
 
-    HMAP_FOR_EACH_IN_BUCKET (xbridge, hmap_node, hash_pointer(ofproto, 0),
+    HMAP_FOR_EACH_IN_BUCKET (xbridge, hmap_node, uuid_hash(&ofproto->uuid),
                              xbridges) {
         if (xbridge->ofproto == ofproto) {
             return xbridge;
@@ -1661,6 +1661,23 @@  xbridge_lookup_by_uuid(struct xlate_cfg *xcfg, const struct uuid *uuid)
     return NULL;
 }
 
+struct ofproto_dpif *
+xlate_ofproto_lookup(const struct uuid *uuid)
+{
+    struct xlate_cfg *xcfg = ovsrcu_get(struct xlate_cfg *, &xcfgp);
+    struct xbridge *xbridge;
+
+    if (!xcfg) {
+        return NULL;
+    }
+
+    xbridge = xbridge_lookup_by_uuid(xcfg, uuid);
+    if (xbridge != NULL) {
+        return xbridge->ofproto;
+    }
+    return NULL;
+}
+
 static struct xbundle *
 xbundle_lookup(struct xlate_cfg *xcfg, const struct ofbundle *ofbundle)
 {
diff --git a/ofproto/ofproto-dpif-xlate.h b/ofproto/ofproto-dpif-xlate.h
index 851088d79..2ba90e999 100644
--- a/ofproto/ofproto-dpif-xlate.h
+++ b/ofproto/ofproto-dpif-xlate.h
@@ -176,6 +176,7 @@  void xlate_ofproto_set(struct ofproto_dpif *, const char *name, struct dpif *,
                        bool forward_bpdu, bool has_in_band,
                        const struct dpif_backer_support *support);
 void xlate_remove_ofproto(struct ofproto_dpif *);
+struct ofproto_dpif *xlate_ofproto_lookup(const struct uuid *uuid);
 
 void xlate_bundle_set(struct ofproto_dpif *, struct ofbundle *,
                       const char *name, enum port_vlan_mode,
diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c
index 8143dd965..7b4a1b3d8 100644
--- a/ofproto/ofproto-dpif.c
+++ b/ofproto/ofproto-dpif.c
@@ -215,10 +215,6 @@  struct shash all_dpif_backers = SHASH_INITIALIZER(&all_dpif_backers);
 static struct hmap all_ofproto_dpifs_by_name =
                           HMAP_INITIALIZER(&all_ofproto_dpifs_by_name);
 
-/* All existing ofproto_dpif instances, indexed by ->uuid. */
-static struct hmap all_ofproto_dpifs_by_uuid =
-                          HMAP_INITIALIZER(&all_ofproto_dpifs_by_uuid);
-
 static bool ofproto_use_tnl_push_pop = true;
 static void ofproto_unixctl_init(void);
 static void ct_zone_config_init(struct dpif_backer *backer);
@@ -1720,9 +1716,6 @@  construct(struct ofproto *ofproto_)
     hmap_insert(&all_ofproto_dpifs_by_name,
                 &ofproto->all_ofproto_dpifs_by_name_node,
                 hash_string(ofproto->up.name, 0));
-    hmap_insert(&all_ofproto_dpifs_by_uuid,
-                &ofproto->all_ofproto_dpifs_by_uuid_node,
-                uuid_hash(&ofproto->uuid));
     memset(&ofproto->stats, 0, sizeof ofproto->stats);
 
     ofproto_init_tables(ofproto_, N_TABLES);
@@ -1820,8 +1813,6 @@  destruct(struct ofproto *ofproto_, bool del)
 
     hmap_remove(&all_ofproto_dpifs_by_name,
                 &ofproto->all_ofproto_dpifs_by_name_node);
-    hmap_remove(&all_ofproto_dpifs_by_uuid,
-                &ofproto->all_ofproto_dpifs_by_uuid_node);
 
     OFPROTO_FOR_EACH_TABLE (table, &ofproto->up) {
         CLS_FOR_EACH (rule, up.cr, &table->cls) {
@@ -5818,15 +5809,7 @@  ofproto_dpif_lookup_by_name(const char *name)
 struct ofproto_dpif *
 ofproto_dpif_lookup_by_uuid(const struct uuid *uuid)
 {
-    struct ofproto_dpif *ofproto;
-
-    HMAP_FOR_EACH_WITH_HASH (ofproto, all_ofproto_dpifs_by_uuid_node,
-                             uuid_hash(uuid), &all_ofproto_dpifs_by_uuid) {
-        if (uuid_equals(&ofproto->uuid, uuid)) {
-            return ofproto;
-        }
-    }
-    return NULL;
+    return xlate_ofproto_lookup(uuid);
 }
 
 static void