[ovs-dev,v2] conntrack: Add rcu support.

Message ID 1537218349-76439-1-git-send-email-dlu998@gmail.com
State Superseded
Series [ovs-dev,v2] conntrack: Add rcu support.

Commit Message

Darrell Ball Sept. 17, 2018, 9:05 p.m. UTC
Add in rcu support for conntrack. At the same time, the array of
hmaps is replaced by a cmap.  Using a single map also simplifies
the handling of nat and allows the removal of the nat_conn map.
There is still some cleanup to do and a few things to check. I'll
probably split out into several patches later

Signed-off-by: Darrell Ball <dlu998@gmail.com>
---

v1->v2: Some synchronization changes; new expiry lists lock.
        Fix the 4 Travis builds that were failing.

 lib/conntrack-icmp.c    |  20 +-
 lib/conntrack-other.c   |  14 +-
 lib/conntrack-private.h | 129 ++++++--
 lib/conntrack-tcp.c     |  33 +-
 lib/conntrack.c         | 853 ++++++++++++++++++------------------------------
 lib/conntrack.h         |  86 +----
 6 files changed, 475 insertions(+), 660 deletions(-)
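
A rough sketch of the pattern the patch introduces (editorial, not part of
the patch): lookups walk a single RCU-protected cmap instead of per-bucket
hmaps, and removal defers the free with ovsrcu_postpone().  The sketch
reuses names from the patch below (struct conn, cm_node, conn_key_cmp(),
conn_key_hash(), delete_conn()) and assumes those static helpers from
lib/conntrack.c are in scope; it is a simplified stand-in for
conn_key_lookup() and conn_clean().

    #include "cmap.h"
    #include "ovs-rcu.h"
    #include "conntrack-private.h"

    /* RCU reader: no per-bucket lock is needed for the lookup itself. */
    static struct conn *
    lookup_sketch(const struct cmap *conns, const struct conn_key *key,
                  uint32_t hash)
    {
        struct conn *conn;

        CMAP_FOR_EACH_WITH_HASH (conn, cm_node, hash, conns) {
            if (!conn_key_cmp(&conn->key, key)) {
                return conn;
            }
        }
        return NULL;
    }

    /* Writer: unlink the node (serialized by the ct lock in the patch),
     * then let RCU free it once all current readers have quiesced. */
    static void
    remove_sketch(struct conntrack *ct, struct cmap *conns, struct conn *conn)
    {
        uint32_t hash = conn_key_hash(&conn->key, ct->hash_basis);

        cmap_remove(conns, &conn->cm_node, hash);
        ovsrcu_postpone(delete_conn, conn);
    }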

Comments

Ben Pfaff Sept. 18, 2018, 4:06 a.m. UTC | #1
On Mon, Sep 17, 2018 at 02:05:49PM -0700, Darrell Ball wrote:
> Add in rcu support for conntrack. At the same time, the array of
> hmaps is replaced by a cmap.  Using a single map also simplifies
> the handling of nat and allows the removal of the nat_conn map.
> There is still some cleanup to do and a few things to check. I'll
> probably split out into several patches later
> 
> Signed-off-by: Darrell Ball <dlu998@gmail.com>
> ---
> 
> v1->v2: Some synchronization changes; new expiry lists lock.
>         Fix the 4 Travis builds that were failing.

Would you mind dropping the ct-specific locking constructs too?  They do
not add any additional abstraction and they will make it harder to debug
because the source file name and line number recorded in the ovs_mutex
will always point to the ct-specific function, not to their callers.

I haven't fully reviewed this (not sure I'm the right person).

Thanks,

Ben.
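
A minimal sketch of the locator behavior Ben describes, not part of the
patch and with the thread-safety annotations omitted: ovs_mutex_lock() is
a macro that expands at its call site and passes that file:line, via
OVS_SOURCE_LOCATOR, to ovs_mutex_lock_at(), which stores it in the mutex
for diagnostics.  Wrapping it therefore makes every recorded locator point
at the wrapper, not at the interesting caller:

    #include "ovs-thread.h"

    static struct ovs_mutex example_mutex = OVS_MUTEX_INITIALIZER;

    /* Stand-in for a ct-specific wrapper such as ct_lock_lock(). */
    static void
    ct_lock_lock_sketch(struct ovs_mutex *mutex)
    {
        ovs_mutex_lock(mutex);   /* OVS_SOURCE_LOCATOR expands to this
                                  * wrapper's file:line, so this is what
                                  * the mutex records. */
    }

    static void
    caller_sketch(void)
    {
        ct_lock_lock_sketch(&example_mutex);  /* The call site a deadlock
                                               * report would ideally show. */
        ovs_mutex_unlock(&example_mutex);
    }
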
Darrell Ball Sept. 18, 2018, 7:52 a.m. UTC | #2
On Mon, Sep 17, 2018 at 9:06 PM, Ben Pfaff <blp@ovn.org> wrote:

> On Mon, Sep 17, 2018 at 02:05:49PM -0700, Darrell Ball wrote:
> > Add in rcu support for conntrack. At the same time, the array of
> > hmaps is replaced by a cmap.  Using a single map also simplifies
> > the handling of nat and allows the removal of the nat_conn map.
> > There is still some cleanup to do and a few things to check. I'll
> > probably split out into several patches later
> >
> > Signed-off-by: Darrell Ball <dlu998@gmail.com>
> > ---
> >
> > v1->v2: Some synchronization changes; new expiry lists lock.
> >         Fix the 4 Travis builds that were failing.
>
> Would you mind dropping the ct-specific locking constructs too?  They do
> not add any additional abstraction and they will make it harder to debug
> because the source file name and line number recorded in the ovs_mutex
> will always point to the ct-specific function, not to their callers.
>

It is a bit of a tradeoff, but sure it is probably better to remove the
wrappers.


>
> I haven't fully reviewed this (not sure I'm the right person).
>

Since you wrote the rcu code, you would be a good candidate :-)

I did some more cleanups; V3 incoming.

Thanks Darrell


>
> Thanks,
>
> Ben.
>
Darrell Ball Sept. 18, 2018, 7:10 p.m. UTC | #3
On Tue, Sep 18, 2018 at 12:52 AM, Darrell Ball <dlu998@gmail.com> wrote:

>
>
> On Mon, Sep 17, 2018 at 9:06 PM, Ben Pfaff <blp@ovn.org> wrote:
>
>> On Mon, Sep 17, 2018 at 02:05:49PM -0700, Darrell Ball wrote:
>> > Add in rcu support for conntrack. At the same time, the array of
>> > hmaps is replaced by a cmap.  Using a single map also simplifies
>> > the handling of nat and allows the removal of the nat_conn map.
>> > There is still some cleanup to do and a few things to check. I'll
>> > probably split out into several patches later
>> >
>> > Signed-off-by: Darrell Ball <dlu998@gmail.com>
>> > ---
>> >
>> > v1->v2: Some synchronization changes; new expiry lists lock.
>> >         Fix the 4 Travis builds that were failing.
>>
>> Would you mind dropping the ct-specific locking constructs too?  They do
>> not add any additional abstraction and they will make it harder to debug
>> because the source file name and line number recorded in the ovs_mutex
>> will always point to the ct-specific function, not to their callers.
>>
>
> It is a bit of a tradeoff, but sure it is probably better to remove the
> wrappers.
>
>
>>
>> I haven't fully reviewed this (not sure I'm the right person).
>>
>
> Since you wrote the rcu code, you would be a good candidate :-)
>
> I did some more cleanups; V3 incoming.
>


V3 will be superseded.
I was later able to optimize by removing the expiry list lock and did some
further cleanups, so I'll send a V4.

Thanks Darrell



>
> Thanks Darrell
>
>
>>
>> Thanks,
>>
>> Ben.
>>
>
>

Patch

diff --git a/lib/conntrack-icmp.c b/lib/conntrack-icmp.c
index 40fd1d8..4c7960b 100644
--- a/lib/conntrack-icmp.c
+++ b/lib/conntrack-icmp.c
@@ -1,5 +1,5 @@ 
 /*
- * Copyright (c) 2015, 2016 Nicira, Inc.
+ * Copyright (c) 2015, 2016, 2017, 2018 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -46,16 +46,17 @@  conn_icmp_cast(const struct conn *conn)
 }
 
 static enum ct_update_res
-icmp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
-                 struct dp_packet *pkt OVS_UNUSED, bool reply, long long now)
+icmp_conn_update(struct conn *conn_, struct dp_packet *pkt OVS_UNUSED,
+                 bool reply, long long now)
 {
     struct conn_icmp *conn = conn_icmp_cast(conn_);
 
-    if (reply && conn->state != ICMPS_REPLY) {
+    conn->state = ICMPS_FIRST;
+    if (reply) {
         conn->state = ICMPS_REPLY;
     }
 
-    conn_update_expiration(ctb, &conn->up, icmp_timeouts[conn->state], now);
+    conn_update_expiration(&conn->up, icmp_timeouts[conn->state], now);
 
     return CT_UPDATE_VALID;
 }
@@ -79,15 +80,12 @@  icmp6_valid_new(struct dp_packet *pkt)
 }
 
 static struct conn *
-icmp_new_conn(struct conntrack_bucket *ctb, struct dp_packet *pkt OVS_UNUSED,
-               long long now)
+icmp_new_conn(struct dp_packet *pkt OVS_UNUSED, long long now)
 {
-    struct conn_icmp *conn;
-
-    conn = xzalloc(sizeof *conn);
+    struct conn_icmp *conn = xzalloc(sizeof *conn);
     conn->state = ICMPS_FIRST;
 
-    conn_init_expiration(ctb, &conn->up, icmp_timeouts[conn->state], now);
+    conn_init_expiration(&conn->up, icmp_timeouts[conn->state], now);
 
     return &conn->up;
 }
diff --git a/lib/conntrack-other.c b/lib/conntrack-other.c
index 2920889..79efb5f 100644
--- a/lib/conntrack-other.c
+++ b/lib/conntrack-other.c
@@ -1,5 +1,5 @@ 
 /*
- * Copyright (c) 2015, 2016 Nicira, Inc.
+ * Copyright (c) 2015, 2016, 2017, 2018 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -43,8 +43,8 @@  conn_other_cast(const struct conn *conn)
 }
 
 static enum ct_update_res
-other_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
-                  struct dp_packet *pkt OVS_UNUSED, bool reply, long long now)
+other_conn_update(struct conn *conn_, struct dp_packet *pkt OVS_UNUSED,
+                  bool reply, long long now)
 {
     struct conn_other *conn = conn_other_cast(conn_);
 
@@ -54,7 +54,7 @@  other_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
         conn->state = OTHERS_MULTIPLE;
     }
 
-    conn_update_expiration(ctb, &conn->up, other_timeouts[conn->state], now);
+    conn_update_expiration(&conn->up, other_timeouts[conn->state], now);
 
     return CT_UPDATE_VALID;
 }
@@ -66,15 +66,13 @@  other_valid_new(struct dp_packet *pkt OVS_UNUSED)
 }
 
 static struct conn *
-other_new_conn(struct conntrack_bucket *ctb, struct dp_packet *pkt OVS_UNUSED,
-               long long now)
+other_new_conn(struct dp_packet *pkt OVS_UNUSED, long long now)
 {
     struct conn_other *conn;
 
     conn = xzalloc(sizeof *conn);
     conn->state = OTHERS_FIRST;
-
-    conn_init_expiration(ctb, &conn->up, other_timeouts[conn->state], now);
+    conn_init_expiration(&conn->up, other_timeouts[conn->state], now);
 
     return &conn->up;
 }
diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h
index a344801..eafe739 100644
--- a/lib/conntrack-private.h
+++ b/lib/conntrack-private.h
@@ -1,5 +1,5 @@ 
 /*
- * Copyright (c) 2015, 2016, 2017 Nicira, Inc.
+ * Copyright (c) 2015, 2016, 2017, 2018 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -21,6 +21,7 @@ 
 #include <netinet/in.h>
 #include <netinet/ip6.h>
 
+#include "cmap.h"
 #include "conntrack.h"
 #include "ct-dpif.h"
 #include "openvswitch/hmap.h"
@@ -57,12 +58,6 @@  struct conn_key {
     uint8_t nw_proto;
 };
 
-struct nat_conn_key_node {
-    struct hmap_node node;
-    struct conn_key key;
-    struct conn_key value;
-};
-
 /* This is used for alg expectations; an expectation is a
  * context created in preparation for establishing a data
  * connection. The expectation is created by the control
@@ -87,18 +82,107 @@  struct alg_exp_node {
     bool nat_rpl_dst;
 };
 
+struct OVS_LOCKABLE ct_ce_lock {
+    struct ovs_mutex lock;
+};
+
+static inline void ct_ce_lock_init(struct ct_ce_lock *ce_lock)
+{
+    ovs_mutex_init_adaptive(&ce_lock->lock);
+}
+
+static inline void ct_ce_lock_lock(struct ct_ce_lock *ce_lock)
+    OVS_ACQUIRES(ce_lock)
+    OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+    ovs_mutex_lock(&ce_lock->lock);
+}
+
+static inline void ct_ce_lock_unlock(struct ct_ce_lock *ce_lock)
+    OVS_RELEASES(ce_lock)
+    OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+    ovs_mutex_unlock(&ce_lock->lock);
+}
+
+static inline void ct_ce_lock_destroy(struct ct_ce_lock *ce_lock)
+{
+    ovs_mutex_destroy(&ce_lock->lock);
+}
+
+struct OVS_LOCKABLE ct_expiry_lock {
+    struct ovs_mutex lock;
+};
+
+static inline void ct_ex_lock_init(struct ct_expiry_lock *ex_lock)
+{
+    ovs_mutex_init_adaptive(&ex_lock->lock);
+}
+
+static inline void ct_ex_lock_lock(struct ct_expiry_lock *ex_lock)
+    OVS_ACQUIRES(ex_lock)
+    OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+    ovs_mutex_lock(&ex_lock->lock);
+}
+
+static inline void ct_ex_lock_unlock(struct ct_expiry_lock *ex_lock)
+    OVS_RELEASES(ex_lock)
+    OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+    ovs_mutex_unlock(&ex_lock->lock);
+}
+
+static inline void ct_ex_lock_destroy(struct ct_expiry_lock *ex_lock)
+{
+    ovs_mutex_destroy(&ex_lock->lock);
+}
+
+struct OVS_LOCKABLE ct_lock {
+    struct ovs_mutex lock;
+};
+
+static inline void ct_lock_init(struct ct_lock *lock)
+{
+    ovs_mutex_init_adaptive(&lock->lock);
+}
+
+static inline void ct_lock_lock(struct ct_lock *lock)
+    OVS_ACQUIRES(lock)
+    OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+    ovs_mutex_lock(&lock->lock);
+}
+
+static inline void ct_lock_unlock(struct ct_lock *lock)
+    OVS_RELEASES(lock)
+    OVS_NO_THREAD_SAFETY_ANALYSIS
+{
+    ovs_mutex_unlock(&lock->lock);
+}
+
+static inline void ct_lock_destroy(struct ct_lock *lock)
+{
+    ovs_mutex_destroy(&lock->lock);
+}
+
+extern struct ct_lock lock;
+
 struct conn {
     struct conn_key key;
     struct conn_key rev_key;
     /* Only used for orig_tuple support. */
     struct conn_key master_key;
+    struct ct_ce_lock lock;
     long long expiration;
     struct ovs_list exp_node;
     struct hmap_node node;
+    struct cmap_node cm_node;
     ovs_u128 label;
     /* XXX: consider flattening. */
     struct nat_action_info_t *nat_info;
     char *alg;
+    struct conn *nat_conn;
     int seq_skew;
     uint32_t mark;
     uint8_t conn_type;
@@ -119,39 +203,42 @@  enum ct_conn_type {
     CT_CONN_TYPE_UN_NAT,
 };
 
+extern struct ct_l4_proto ct_proto_tcp;
+extern struct ct_l4_proto ct_proto_other;
+extern struct ct_l4_proto ct_proto_icmp4;
+extern struct ct_l4_proto ct_proto_icmp6;
+
 struct ct_l4_proto {
-    struct conn *(*new_conn)(struct conntrack_bucket *, struct dp_packet *pkt,
-                             long long now);
+    struct conn *(*new_conn)(struct dp_packet *pkt, long long now);
     bool (*valid_new)(struct dp_packet *pkt);
     enum ct_update_res (*conn_update)(struct conn *conn,
-                                      struct conntrack_bucket *,
                                       struct dp_packet *pkt, bool reply,
                                       long long now);
     void (*conn_get_protoinfo)(const struct conn *,
                                struct ct_dpif_protoinfo *);
 };
 
-extern struct ct_l4_proto ct_proto_tcp;
-extern struct ct_l4_proto ct_proto_other;
-extern struct ct_l4_proto ct_proto_icmp4;
-extern struct ct_l4_proto ct_proto_icmp6;
-
 extern long long ct_timeout_val[];
+extern struct ovs_list cm_exp_lists[N_CT_TM];
+extern struct ct_expiry_lock ex_lock;
 
 static inline void
-conn_init_expiration(struct conntrack_bucket *ctb, struct conn *conn,
-                        enum ct_timeout tm, long long now)
+conn_init_expiration(struct conn *conn, enum ct_timeout tm, long long now)
 {
     conn->expiration = now + ct_timeout_val[tm];
-    ovs_list_push_back(&ctb->exp_lists[tm], &conn->exp_node);
+    ct_ex_lock_lock(&ex_lock);
+    ovs_list_push_back(&cm_exp_lists[tm], &conn->exp_node);
+    ct_ex_lock_unlock(&ex_lock);
 }
 
 static inline void
-conn_update_expiration(struct conntrack_bucket *ctb, struct conn *conn,
-                       enum ct_timeout tm, long long now)
+conn_update_expiration(struct conn *conn, enum ct_timeout tm, long long now)
 {
+    ct_ex_lock_lock(&ex_lock);
     ovs_list_remove(&conn->exp_node);
-    conn_init_expiration(ctb, conn, tm, now);
+    conn->expiration = now + ct_timeout_val[tm];
+    ovs_list_push_back(&cm_exp_lists[tm], &conn->exp_node);
+    ct_ex_lock_unlock(&ex_lock);
 }
 
 static inline uint32_t
diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c
index 86d313d..e0b4d11 100644
--- a/lib/conntrack-tcp.c
+++ b/lib/conntrack-tcp.c
@@ -145,8 +145,8 @@  tcp_get_wscale(const struct tcp_header *tcp)
 }
 
 static enum ct_update_res
-tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
-                struct dp_packet *pkt, bool reply, long long now)
+tcp_conn_update(struct conn *conn_, struct dp_packet *pkt, bool reply,
+                long long now)
 {
     struct conn_tcp *conn = conn_tcp_cast(conn_);
     struct tcp_header *tcp = dp_packet_l4(pkt);
@@ -156,20 +156,23 @@  tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
     struct tcp_peer *dst = &conn->peer[reply ? 0 : 1];
     uint8_t sws = 0, dws = 0;
     uint16_t tcp_flags = TCP_FLAGS(tcp->tcp_ctl);
+    enum ct_update_res rc = CT_UPDATE_VALID;
 
     uint16_t win = ntohs(tcp->tcp_winsz);
     uint32_t ack, end, seq, orig_seq;
     uint32_t p_len = tcp_payload_length(pkt);
 
     if (tcp_invalid_flags(tcp_flags)) {
-        return CT_UPDATE_INVALID;
+        rc = CT_UPDATE_INVALID;
+        goto out;
     }
 
     if (((tcp_flags & (TCP_SYN | TCP_ACK)) == TCP_SYN)
         && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2
         && src->state >= CT_DPIF_TCPS_FIN_WAIT_2) {
         src->state = dst->state = CT_DPIF_TCPS_CLOSED;
-        return CT_UPDATE_NEW;
+        rc = CT_UPDATE_NEW;
+        goto out;
     }
 
     if (src->wscale & CT_WSCALE_FLAG
@@ -317,18 +320,18 @@  tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
 
         if (src->state >= CT_DPIF_TCPS_FIN_WAIT_2
             && dst->state >= CT_DPIF_TCPS_FIN_WAIT_2) {
-            conn_update_expiration(ctb, &conn->up, CT_TM_TCP_CLOSED, now);
+            conn_update_expiration(&conn->up, CT_TM_TCP_CLOSED, now);
         } else if (src->state >= CT_DPIF_TCPS_CLOSING
                    && dst->state >= CT_DPIF_TCPS_CLOSING) {
-            conn_update_expiration(ctb, &conn->up, CT_TM_TCP_FIN_WAIT, now);
+            conn_update_expiration(&conn->up, CT_TM_TCP_FIN_WAIT, now);
         } else if (src->state < CT_DPIF_TCPS_ESTABLISHED
                    || dst->state < CT_DPIF_TCPS_ESTABLISHED) {
-            conn_update_expiration(ctb, &conn->up, CT_TM_TCP_OPENING, now);
+            conn_update_expiration(&conn->up, CT_TM_TCP_OPENING, now);
         } else if (src->state >= CT_DPIF_TCPS_CLOSING
                    || dst->state >= CT_DPIF_TCPS_CLOSING) {
-            conn_update_expiration(ctb, &conn->up, CT_TM_TCP_CLOSING, now);
+            conn_update_expiration(&conn->up, CT_TM_TCP_CLOSING, now);
         } else {
-            conn_update_expiration(ctb, &conn->up, CT_TM_TCP_ESTABLISHED, now);
+            conn_update_expiration(&conn->up, CT_TM_TCP_ESTABLISHED, now);
         }
     } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT
                 || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2
@@ -385,10 +388,12 @@  tcp_conn_update(struct conn *conn_, struct conntrack_bucket *ctb,
             src->state = dst->state = CT_DPIF_TCPS_TIME_WAIT;
         }
     } else {
-        return CT_UPDATE_INVALID;
+        rc = CT_UPDATE_INVALID;
+        goto out;
     }
 
-    return CT_UPDATE_VALID;
+out:
+    return rc;
 }
 
 static bool
@@ -412,8 +417,7 @@  tcp_valid_new(struct dp_packet *pkt)
 }
 
 static struct conn *
-tcp_new_conn(struct conntrack_bucket *ctb, struct dp_packet *pkt,
-             long long now)
+tcp_new_conn(struct dp_packet *pkt, long long now)
 {
     struct conn_tcp* newconn = NULL;
     struct tcp_header *tcp = dp_packet_l4(pkt);
@@ -449,8 +453,7 @@  tcp_new_conn(struct conntrack_bucket *ctb, struct dp_packet *pkt,
     src->state = CT_DPIF_TCPS_SYN_SENT;
     dst->state = CT_DPIF_TCPS_CLOSED;
 
-    conn_init_expiration(ctb, &newconn->up, CT_TM_TCP_FIRST_PACKET,
-                         now);
+    conn_init_expiration(&newconn->up, CT_TM_TCP_FIRST_PACKET, now);
 
     return &newconn->up;
 }
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 974f985..fe4099a 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -1,5 +1,5 @@ 
 /*
- * Copyright (c) 2015, 2016, 2017 Nicira, Inc.
+ * Copyright (c) 2015, 2016, 2017, 2018 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -76,22 +76,25 @@  enum ct_alg_ctl_type {
     CT_ALG_CTL_SIP,
 };
 
+static struct cmap cm_conns;
+struct ct_lock lock;
+struct ct_expiry_lock ex_lock;
+struct ovs_list cm_exp_lists[N_CT_TM];
+
 static bool conn_key_extract(struct conntrack *, struct dp_packet *,
                              ovs_be16 dl_type, struct conn_lookup_ctx *,
                              uint16_t zone);
 static uint32_t conn_key_hash(const struct conn_key *, uint32_t basis);
 static void conn_key_reverse(struct conn_key *);
-static void conn_key_lookup(struct conntrack_bucket *ctb,
-                            struct conn_lookup_ctx *ctx,
-                            long long now);
 static bool valid_new(struct dp_packet *pkt, struct conn_key *);
-static struct conn *new_conn(struct conntrack_bucket *, struct dp_packet *pkt,
-                             struct conn_key *, long long now);
-static void delete_conn(struct conn *);
-static enum ct_update_res conn_update(struct conn *,
-                                      struct conntrack_bucket *ctb,
-                                      struct dp_packet *, bool reply,
+static struct conn *new_conn(struct dp_packet *pkt, struct conn_key *,
+                             long long now);
+static enum ct_update_res conn_update(struct dp_packet *pkt,
+                                      struct conn *conn,
+                                      struct conn_lookup_ctx *ctx,
                                       long long now);
+static void delete_conn(struct conn *);
+static void delete_conn_one(struct conn *conn);
 static bool conn_expired(struct conn *, long long now);
 static void set_mark(struct dp_packet *, struct conn *,
                      uint32_t val, uint32_t mask);
@@ -100,21 +103,6 @@  static void set_label(struct dp_packet *, struct conn *,
                       const struct ovs_key_ct_labels *mask);
 static void *clean_thread_main(void *f_);
 
-static struct nat_conn_key_node *
-nat_conn_keys_lookup(struct hmap *nat_conn_keys,
-                     const struct conn_key *key,
-                     uint32_t basis);
-
-static bool
-nat_conn_keys_insert(struct hmap *nat_conn_keys,
-                     const struct conn *nat_conn,
-                     uint32_t hash_basis);
-
-static void
-nat_conn_keys_remove(struct hmap *nat_conn_keys,
-                     const struct conn_key *key,
-                     uint32_t basis);
-
 static bool
 nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
                        struct conn *nat_conn);
@@ -151,8 +139,7 @@  detect_ftp_ctl_type(const struct conn_lookup_ctx *ctx,
                     struct dp_packet *pkt);
 
 static void
-expectation_clean(struct conntrack *ct, const struct conn_key *master_key,
-                  uint32_t basis);
+expectation_clean(struct conntrack *ct, const struct conn_key *master_key);
 
 static struct ct_l4_proto *l4_protos[] = {
     [IPPROTO_TCP] = &ct_proto_tcp,
@@ -250,6 +237,50 @@  conn_key_cmp(const struct conn_key *key1, const struct conn_key *key2)
 }
 
 static void
+conn_key_lookup(const struct conn_key *key, uint32_t hash, long long now,
+                struct conn **conn_out, bool *reply)
+{
+    struct conn *conn;
+    *conn_out = NULL;
+
+    CMAP_FOR_EACH_WITH_HASH (conn, cm_node, hash, &cm_conns) {
+        if (!conn_key_cmp(&conn->key, key) && !conn_expired(conn, now)) {
+            *conn_out = conn;
+            *reply = false;
+            break;
+        }
+        if (!conn_key_cmp(&conn->rev_key, key) && !conn_expired(conn, now)) {
+            *conn_out = conn;
+            *reply = true;
+            break;
+        }
+    }
+}
+
+static bool
+conn_available(const struct conn_key *key, uint32_t hash, long long now)
+{
+    struct conn *conn;
+    bool found = false;
+
+    CMAP_FOR_EACH_WITH_HASH (conn, cm_node, hash, &cm_conns) {
+        if (!conn_key_cmp(&conn->key, key)
+            && !conn_expired(conn, now)) {
+            found = true;
+            break;
+        }
+
+        if (!conn_key_cmp(&conn->rev_key, key)
+                && !conn_expired(conn, now)) {
+            found = true;
+            break;
+        }
+    }
+
+    return !found;
+}
+
+static void
 ct_print_conn_info(const struct conn *c, const char *log_msg,
                    enum vlog_level vll, bool force, bool rl_on)
 {
@@ -309,31 +340,25 @@  ct_print_conn_info(const struct conn *c, const char *log_msg,
 void
 conntrack_init(struct conntrack *ct)
 {
-    long long now = time_msec();
-
     ct_rwlock_init(&ct->resources_lock);
     ct_rwlock_wrlock(&ct->resources_lock);
-    hmap_init(&ct->nat_conn_keys);
     hmap_init(&ct->alg_expectations);
     hindex_init(&ct->alg_expectation_refs);
     ovs_list_init(&ct->alg_exp_list);
     ct_rwlock_unlock(&ct->resources_lock);
 
-    for (unsigned i = 0; i < CONNTRACK_BUCKETS; i++) {
-        struct conntrack_bucket *ctb = &ct->buckets[i];
+    ct_lock_init(&lock);
+    ct_lock_lock(&lock);
+    cmap_init(&cm_conns);
+    ct_lock_unlock(&lock);
 
-        ct_lock_init(&ctb->lock);
-        ct_lock_lock(&ctb->lock);
-        hmap_init(&ctb->connections);
-        for (unsigned j = 0; j < ARRAY_SIZE(ctb->exp_lists); j++) {
-            ovs_list_init(&ctb->exp_lists[j]);
-        }
-        ct_lock_unlock(&ctb->lock);
-        ovs_mutex_init(&ctb->cleanup_mutex);
-        ovs_mutex_lock(&ctb->cleanup_mutex);
-        ctb->next_cleanup = now + CT_TM_MIN;
-        ovs_mutex_unlock(&ctb->cleanup_mutex);
+    ct_ex_lock_init(&ex_lock);
+    ct_ex_lock_lock(&ex_lock);
+    for (unsigned i = 0; i < ARRAY_SIZE(cm_exp_lists); i++) {
+        ovs_list_init(&cm_exp_lists[i]);
     }
+    ct_ex_lock_unlock(&ex_lock);
+
     ct->hash_basis = random_uint32();
     atomic_count_init(&ct->n_conn, 0);
     atomic_init(&ct->n_conn_limit, DEFAULT_N_CONN_LIMIT);
@@ -341,36 +366,73 @@  conntrack_init(struct conntrack *ct)
     ct->clean_thread = ovs_thread_create("ct_clean", clean_thread_main, ct);
 }
 
+/* Must be called with 'conn' of 'conn_type' CT_CONN_TYPE_DEFAULT.  Also
+ * removes the associated nat 'conn' from the lookup datastructures. */
+static void
+conn_clean(struct conntrack *ct, struct conn *conn)
+    OVS_REQUIRES(lock)
+{
+    ovs_assert(conn->conn_type == CT_CONN_TYPE_DEFAULT);
+
+    if (conn->alg) {
+        expectation_clean(ct, &conn->key);
+    }
+
+    uint32_t hash = conn_key_hash(&conn->key, ct->hash_basis);
+    cmap_remove(&cm_conns, &conn->cm_node, hash);
+    ct_ex_lock_lock(&ex_lock);
+    ovs_list_remove(&conn->exp_node);
+    ct_ex_lock_unlock(&ex_lock);
+    if (conn->nat_conn) {
+        hash = conn_key_hash(&conn->nat_conn->key, ct->hash_basis);
+        cmap_remove(&cm_conns, &conn->nat_conn->cm_node, hash);
+        ct_ex_lock_lock(&ex_lock);
+        ovs_list_remove(&conn->nat_conn->exp_node);
+        ct_ex_lock_unlock(&ex_lock);
+    }
+    ovsrcu_postpone(delete_conn, conn);
+    atomic_count_dec(&ct->n_conn);
+}
+
+static void
+conn_clean_one(struct conntrack *ct, struct conn *conn)
+    OVS_REQUIRES(lock)
+{
+    if (conn->alg) {
+        expectation_clean(ct, &conn->key);
+    }
+
+    uint32_t hash = conn_key_hash(&conn->key, ct->hash_basis);
+    cmap_remove(&cm_conns, &conn->cm_node, hash);
+    ct_ex_lock_lock(&ex_lock);
+    ovs_list_remove(&conn->exp_node);
+    ct_ex_lock_unlock(&ex_lock);
+    if (conn->conn_type == CT_CONN_TYPE_DEFAULT) {
+        atomic_count_dec(&ct->n_conn);
+    }
+    ovsrcu_postpone(delete_conn_one, conn);
+}
+
 /* Destroys the connection tracker 'ct' and frees all the allocated memory. */
 void
 conntrack_destroy(struct conntrack *ct)
 {
+    struct conn *conn;
     latch_set(&ct->clean_thread_exit);
     pthread_join(ct->clean_thread, NULL);
     latch_destroy(&ct->clean_thread_exit);
-    for (unsigned i = 0; i < CONNTRACK_BUCKETS; i++) {
-        struct conntrack_bucket *ctb = &ct->buckets[i];
-        struct conn *conn;
 
-        ovs_mutex_destroy(&ctb->cleanup_mutex);
-        ct_lock_lock(&ctb->lock);
-        HMAP_FOR_EACH_POP (conn, node, &ctb->connections) {
-            if (conn->conn_type == CT_CONN_TYPE_DEFAULT) {
-                atomic_count_dec(&ct->n_conn);
-            }
-            delete_conn(conn);
-        }
-        hmap_destroy(&ctb->connections);
-        ct_lock_unlock(&ctb->lock);
-        ct_lock_destroy(&ctb->lock);
-    }
-    ct_rwlock_wrlock(&ct->resources_lock);
-    struct nat_conn_key_node *nat_conn_key_node;
-    HMAP_FOR_EACH_POP (nat_conn_key_node, node, &ct->nat_conn_keys) {
-        free(nat_conn_key_node);
+    ct_lock_lock(&lock);
+    CMAP_FOR_EACH (conn, cm_node, &cm_conns) {
+        conn_clean_one(ct, conn);
     }
-    hmap_destroy(&ct->nat_conn_keys);
+    cmap_destroy(&cm_conns);
+    ct_lock_unlock(&lock);
+    ct_lock_destroy(&lock);
+
+    ct_ex_lock_destroy(&ex_lock);
 
+    ct_rwlock_wrlock(&ct->resources_lock);
     struct alg_exp_node *alg_exp_node;
     HMAP_FOR_EACH_POP (alg_exp_node, node, &ct->alg_expectations) {
         free(alg_exp_node);
@@ -383,14 +445,6 @@  conntrack_destroy(struct conntrack *ct)
     ct_rwlock_destroy(&ct->resources_lock);
 }
 
-static unsigned hash_to_bucket(uint32_t hash)
-{
-    /* Extracts the most significant bits in hash. The least significant bits
-     * are already used internally by the hmap implementation. */
-    BUILD_ASSERT(CONNTRACK_BUCKETS_SHIFT < 32 && CONNTRACK_BUCKETS_SHIFT >= 1);
-
-    return (hash >> (32 - CONNTRACK_BUCKETS_SHIFT)) % CONNTRACK_BUCKETS;
-}
 
 static void
 write_ct_md(struct dp_packet *pkt, uint16_t zone, const struct conn *conn,
@@ -738,87 +792,19 @@  un_nat_packet(struct dp_packet *pkt, const struct conn *conn,
     }
 }
 
-/* Typical usage of this helper is in non per-packet code;
- * this is because the bucket lock needs to be held for lookup
- * and a hash would have already been needed. Hence, this function
- * is just intended for code clarity. */
-static struct conn *
-conn_lookup(struct conntrack *ct, const struct conn_key *key, long long now)
-{
-    struct conn_lookup_ctx ctx;
-    ctx.conn = NULL;
-    ctx.key = *key;
-    ctx.hash = conn_key_hash(key, ct->hash_basis);
-    unsigned bucket = hash_to_bucket(ctx.hash);
-    conn_key_lookup(&ct->buckets[bucket], &ctx, now);
-    return ctx.conn;
-}
-
 static void
 conn_seq_skew_set(struct conntrack *ct, const struct conn_key *key,
                   long long now, int seq_skew, bool seq_skew_dir)
 {
-    unsigned bucket = hash_to_bucket(conn_key_hash(key, ct->hash_basis));
-    ct_lock_lock(&ct->buckets[bucket].lock);
-    struct conn *conn = conn_lookup(ct, key, now);
+    struct conn *conn;
+    bool reply;
+    uint32_t hash = conn_key_hash(key, ct->hash_basis);
+    conn_key_lookup(key, hash, now, &conn, &reply);
+
     if (conn && seq_skew) {
         conn->seq_skew = seq_skew;
         conn->seq_skew_dir = seq_skew_dir;
     }
-    ct_lock_unlock(&ct->buckets[bucket].lock);
-}
-
-static void
-nat_clean(struct conntrack *ct, struct conn *conn,
-          struct conntrack_bucket *ctb)
-    OVS_REQUIRES(ctb->lock)
-{
-    ct_rwlock_wrlock(&ct->resources_lock);
-    nat_conn_keys_remove(&ct->nat_conn_keys, &conn->rev_key, ct->hash_basis);
-    ct_rwlock_unlock(&ct->resources_lock);
-    ct_lock_unlock(&ctb->lock);
-    unsigned bucket_rev_conn =
-        hash_to_bucket(conn_key_hash(&conn->rev_key, ct->hash_basis));
-    ct_lock_lock(&ct->buckets[bucket_rev_conn].lock);
-    ct_rwlock_wrlock(&ct->resources_lock);
-    long long now = time_msec();
-    struct conn *rev_conn = conn_lookup(ct, &conn->rev_key, now);
-    struct nat_conn_key_node *nat_conn_key_node =
-        nat_conn_keys_lookup(&ct->nat_conn_keys, &conn->rev_key,
-                             ct->hash_basis);
-
-    /* In the unlikely event, rev conn was recreated, then skip
-     * rev_conn cleanup. */
-    if (rev_conn && (!nat_conn_key_node ||
-                     conn_key_cmp(&nat_conn_key_node->value,
-                                  &rev_conn->rev_key))) {
-        hmap_remove(&ct->buckets[bucket_rev_conn].connections,
-                    &rev_conn->node);
-        free(rev_conn);
-    }
-
-    delete_conn(conn);
-    ct_rwlock_unlock(&ct->resources_lock);
-    ct_lock_unlock(&ct->buckets[bucket_rev_conn].lock);
-    ct_lock_lock(&ctb->lock);
-}
-
-static void
-conn_clean(struct conntrack *ct, struct conn *conn,
-           struct conntrack_bucket *ctb)
-    OVS_REQUIRES(ctb->lock)
-{
-    if (conn->alg) {
-        expectation_clean(ct, &conn->key, ct->hash_basis);
-    }
-    ovs_list_remove(&conn->exp_node);
-    hmap_remove(&ctb->connections, &conn->node);
-    atomic_count_dec(&ct->n_conn);
-    if (conn->nat_info) {
-        nat_clean(ct, conn, ctb);
-    } else {
-        delete_conn(conn);
-    }
 }
 
 static bool
@@ -841,17 +827,16 @@  ct_verify_helper(const char *helper, enum ct_alg_ctl_type ct_alg_ctl)
     }
 }
 
-/* This function is called with the bucket lock held. */
 static struct conn *
 conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
                struct conn_lookup_ctx *ctx, bool commit, long long now,
                const struct nat_action_info_t *nat_action_info,
-               struct conn *conn_for_un_nat_copy,
-               const char *helper,
-               const struct alg_exp_node *alg_exp,
+               const char *helper, const struct alg_exp_node *alg_exp,
                enum ct_alg_ctl_type ct_alg_ctl)
+    OVS_REQUIRES(lock)
 {
     struct conn *nc = NULL;
+    struct conn *nat_conn = NULL;
 
     if (!valid_new(pkt, &ctx->key)) {
         pkt->md.ct_state = CS_INVALID;
@@ -873,8 +858,7 @@  conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
             return nc;
         }
 
-        unsigned bucket = hash_to_bucket(ctx->hash);
-        nc = new_conn(&ct->buckets[bucket], pkt, &ctx->key, now);
+        nc = new_conn(pkt, &ctx->key, now);
         ctx->conn = nc;
         nc->rev_key = nc->key;
         conn_key_reverse(&nc->rev_key);
@@ -893,6 +877,8 @@  conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
         if (nat_action_info) {
             nc->nat_info = xmemdup(nat_action_info, sizeof *nc->nat_info);
 
+            nat_conn = xzalloc(sizeof *nat_conn);
+
             if (alg_exp) {
                 if (alg_exp->nat_rpl_dst) {
                     nc->rev_key.dst.addr = alg_exp->alg_nat_repl_addr;
@@ -901,70 +887,60 @@  conn_not_found(struct conntrack *ct, struct dp_packet *pkt,
                     nc->rev_key.src.addr = alg_exp->alg_nat_repl_addr;
                     nc->nat_info->nat_action = NAT_ACTION_DST;
                 }
-                *conn_for_un_nat_copy = *nc;
-                ct_rwlock_wrlock(&ct->resources_lock);
-                bool new_insert = nat_conn_keys_insert(&ct->nat_conn_keys,
-                                                       conn_for_un_nat_copy,
-                                                       ct->hash_basis);
-                ct_rwlock_unlock(&ct->resources_lock);
-                if (!new_insert) {
-                    char *log_msg = xasprintf("Pre-existing alg "
-                                              "nat_conn_key");
-                    ct_print_conn_info(conn_for_un_nat_copy, log_msg, VLL_INFO,
-                                       true, false);
-                    free(log_msg);
-                }
+                *nat_conn = *nc;
             } else {
-                *conn_for_un_nat_copy = *nc;
-                ct_rwlock_wrlock(&ct->resources_lock);
-                bool nat_res = nat_select_range_tuple(ct, nc,
-                                                      conn_for_un_nat_copy);
+                *nat_conn = *nc;
+                bool nat_res = nat_select_range_tuple(ct, nc, nat_conn);
 
                 if (!nat_res) {
                     goto nat_res_exhaustion;
                 }
 
-                /* Update nc with nat adjustments made to
-                 * conn_for_un_nat_copy by nat_select_range_tuple(). */
-                *nc = *conn_for_un_nat_copy;
-                ct_rwlock_unlock(&ct->resources_lock);
+                /* Update nc with nat adjustments. */
+                *nc = *nat_conn;
             }
-            conn_for_un_nat_copy->conn_type = CT_CONN_TYPE_UN_NAT;
-            conn_for_un_nat_copy->nat_info = NULL;
-            conn_for_un_nat_copy->alg = NULL;
             nat_packet(pkt, nc, ctx->icmp_related);
-        }
-        hmap_insert(&ct->buckets[bucket].connections, &nc->node, ctx->hash);
+
+            nat_conn->key = nc->rev_key;
+            nat_conn->rev_key = nc->key;
+            nat_conn->conn_type = CT_CONN_TYPE_UN_NAT;
+            nat_conn->nat_info = NULL;
+            nat_conn->alg = NULL;
+            nat_conn->nat_conn = NULL;
+            uint32_t nat_hash = conn_key_hash(&nat_conn->key,
+                                              ct->hash_basis);
+            cmap_insert(&cm_conns, &nat_conn->cm_node, nat_hash);
+        }
+
+        nc->nat_conn = nat_conn;
+        ct_ce_lock_init(&nc->lock);
+        nc->conn_type = CT_CONN_TYPE_DEFAULT;
+        cmap_insert(&cm_conns, &nc->cm_node, ctx->hash);
         atomic_count_inc(&ct->n_conn);
     }
 
     return nc;
 
-    /* This would be a user error or a DOS attack.
-     * A user error is prevented by allocating enough
-     * combinations of NAT addresses when combined with
-     * ephemeral ports.  A DOS attack should be protected
-     * against with firewall rules or a separate firewall.
-     * Also using zone partitioning can limit DoS impact. */
+    /* This would be a user error or a DOS attack.  A user error is prevented
+     * by allocating enough combinations of NAT addresses when combined with
+     * ephemeral ports.  A DOS attack should be protected against with
+     * firewall rules or a separate firewall.  Also using zone partitioning
+     * can limit DoS impact. */
 nat_res_exhaustion:
+    free(nat_conn);
     ovs_list_remove(&nc->exp_node);
     delete_conn(nc);
-    /* conn_for_un_nat_copy is a local variable in process_one; this
-     * memset() serves to document that conn_for_un_nat_copy is from
-     * this point on unused. */
-    memset(conn_for_un_nat_copy, 0, sizeof *conn_for_un_nat_copy);
-    ct_rwlock_unlock(&ct->resources_lock);
     static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 5);
     VLOG_WARN_RL(&rl, "Unable to NAT due to tuple space exhaustion - "
                  "if DoS attack, use firewalling and/or zone partitioning.");
     return NULL;
 }
 
+/* Reminder: change **conn to *conn. */
 static bool
 conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
-                  struct conn_lookup_ctx *ctx, struct conn **conn,
-                  long long now, unsigned bucket)
-    OVS_REQUIRES(ct->buckets[bucket].lock)
+                  struct conn_lookup_ctx *ctx, struct conn *conn,
+                  long long now)
 {
     bool create_new_conn = false;
 
@@ -974,12 +950,13 @@  conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
             pkt->md.ct_state |= CS_REPLY_DIR;
         }
     } else {
-        if ((*conn)->alg_related) {
+        if (conn->alg_related) {
             pkt->md.ct_state |= CS_RELATED;
         }
 
-        enum ct_update_res res = conn_update(*conn, &ct->buckets[bucket],
-                                             pkt, ctx->reply, now);
+        ct_lock_lock(&lock);
+        enum ct_update_res res = conn_update(pkt, conn, ctx, now);
+        ct_lock_unlock(&lock);
 
         switch (res) {
         case CT_UPDATE_VALID:
@@ -993,7 +970,9 @@  conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
             pkt->md.ct_state = CS_INVALID;
             break;
         case CT_UPDATE_NEW:
-            conn_clean(ct, *conn, &ct->buckets[bucket]);
+            ct_lock_lock(&lock);
+            conn_clean(ct, conn);
+            ct_lock_unlock(&lock);
             create_new_conn = true;
             break;
         default:
@@ -1004,51 +983,6 @@  conn_update_state(struct conntrack *ct, struct dp_packet *pkt,
 }
 
 static void
-create_un_nat_conn(struct conntrack *ct, struct conn *conn_for_un_nat_copy,
-                   long long now, bool alg_un_nat)
-{
-    struct conn *nc = xmemdup(conn_for_un_nat_copy, sizeof *nc);
-    nc->key = conn_for_un_nat_copy->rev_key;
-    nc->rev_key = conn_for_un_nat_copy->key;
-    uint32_t un_nat_hash = conn_key_hash(&nc->key, ct->hash_basis);
-    unsigned un_nat_conn_bucket = hash_to_bucket(un_nat_hash);
-    ct_lock_lock(&ct->buckets[un_nat_conn_bucket].lock);
-    struct conn *rev_conn = conn_lookup(ct, &nc->key, now);
-
-    if (alg_un_nat) {
-        if (!rev_conn) {
-            hmap_insert(&ct->buckets[un_nat_conn_bucket].connections,
-                        &nc->node, un_nat_hash);
-        } else {
-            char *log_msg = xasprintf("Unusual condition for un_nat conn "
-                                      "create for alg: rev_conn %p", rev_conn);
-            ct_print_conn_info(nc, log_msg, VLL_INFO, true, false);
-            free(log_msg);
-            free(nc);
-        }
-    } else {
-        ct_rwlock_rdlock(&ct->resources_lock);
-
-        struct nat_conn_key_node *nat_conn_key_node =
-            nat_conn_keys_lookup(&ct->nat_conn_keys, &nc->key, ct->hash_basis);
-        if (nat_conn_key_node && !conn_key_cmp(&nat_conn_key_node->value,
-            &nc->rev_key) && !rev_conn) {
-            hmap_insert(&ct->buckets[un_nat_conn_bucket].connections,
-                        &nc->node, un_nat_hash);
-        } else {
-            char *log_msg = xasprintf("Unusual condition for un_nat conn "
-                                      "create: nat_conn_key_node/rev_conn "
-                                      "%p/%p", nat_conn_key_node, rev_conn);
-            ct_print_conn_info(nc, log_msg, VLL_INFO, true, false);
-            free(log_msg);
-            free(nc);
-        }
-        ct_rwlock_unlock(&ct->resources_lock);
-    }
-    ct_lock_unlock(&ct->buckets[un_nat_conn_bucket].lock);
-}
-
-static void
 handle_nat(struct dp_packet *pkt, struct conn *conn,
            uint16_t zone, bool reply, bool related)
 {
@@ -1071,9 +1005,8 @@  handle_nat(struct dp_packet *pkt, struct conn *conn,
 static bool
 check_orig_tuple(struct conntrack *ct, struct dp_packet *pkt,
                  struct conn_lookup_ctx *ctx_in, long long now,
-                 unsigned *bucket, struct conn **conn,
+                 struct conn **conn,
                  const struct nat_action_info_t *nat_action_info)
-    OVS_REQUIRES(ct->buckets[*bucket].lock)
 {
     if ((ctx_in->key.dl_type == htons(ETH_TYPE_IP) &&
          !pkt->md.ct_orig_tuple.ipv4.ipv4_proto) ||
@@ -1084,57 +1017,48 @@  check_orig_tuple(struct conntrack *ct, struct dp_packet *pkt,
         return false;
     }
 
-    ct_lock_unlock(&ct->buckets[*bucket].lock);
-    struct conn_lookup_ctx ctx;
-    memset(&ctx, 0 , sizeof ctx);
-    ctx.conn = NULL;
+    struct conn_key key;
+    memset(&key, 0 , sizeof key);
 
     if (ctx_in->key.dl_type == htons(ETH_TYPE_IP)) {
-        ctx.key.src.addr.ipv4_aligned = pkt->md.ct_orig_tuple.ipv4.ipv4_src;
-        ctx.key.dst.addr.ipv4_aligned = pkt->md.ct_orig_tuple.ipv4.ipv4_dst;
+        key.src.addr.ipv4_aligned = pkt->md.ct_orig_tuple.ipv4.ipv4_src;
+        key.dst.addr.ipv4_aligned = pkt->md.ct_orig_tuple.ipv4.ipv4_dst;
 
         if (ctx_in->key.nw_proto == IPPROTO_ICMP) {
-            ctx.key.src.icmp_id = ctx_in->key.src.icmp_id;
-            ctx.key.dst.icmp_id = ctx_in->key.dst.icmp_id;
+            key.src.icmp_id = ctx_in->key.src.icmp_id;
+            key.dst.icmp_id = ctx_in->key.dst.icmp_id;
             uint16_t src_port = ntohs(pkt->md.ct_orig_tuple.ipv4.src_port);
-            ctx.key.src.icmp_type = (uint8_t) src_port;
-            ctx.key.dst.icmp_type = reverse_icmp_type(ctx.key.src.icmp_type);
+            key.src.icmp_type = (uint8_t) src_port;
+            key.dst.icmp_type = reverse_icmp_type(key.src.icmp_type);
         } else {
-            ctx.key.src.port = pkt->md.ct_orig_tuple.ipv4.src_port;
-            ctx.key.dst.port = pkt->md.ct_orig_tuple.ipv4.dst_port;
+            key.src.port = pkt->md.ct_orig_tuple.ipv4.src_port;
+            key.dst.port = pkt->md.ct_orig_tuple.ipv4.dst_port;
         }
-        ctx.key.nw_proto = pkt->md.ct_orig_tuple.ipv4.ipv4_proto;
+        key.nw_proto = pkt->md.ct_orig_tuple.ipv4.ipv4_proto;
     } else {
-        ctx.key.src.addr.ipv6_aligned = pkt->md.ct_orig_tuple.ipv6.ipv6_src;
-        ctx.key.dst.addr.ipv6_aligned = pkt->md.ct_orig_tuple.ipv6.ipv6_dst;
+        key.src.addr.ipv6_aligned = pkt->md.ct_orig_tuple.ipv6.ipv6_src;
+        key.dst.addr.ipv6_aligned = pkt->md.ct_orig_tuple.ipv6.ipv6_dst;
 
         if (ctx_in->key.nw_proto == IPPROTO_ICMPV6) {
-            ctx.key.src.icmp_id = ctx_in->key.src.icmp_id;
-            ctx.key.dst.icmp_id = ctx_in->key.dst.icmp_id;
+            key.src.icmp_id = ctx_in->key.src.icmp_id;
+            key.dst.icmp_id = ctx_in->key.dst.icmp_id;
             uint16_t src_port = ntohs(pkt->md.ct_orig_tuple.ipv6.src_port);
-            ctx.key.src.icmp_type = (uint8_t) src_port;
-            ctx.key.dst.icmp_type = reverse_icmp6_type(ctx.key.src.icmp_type);
+            key.src.icmp_type = (uint8_t) src_port;
+            key.dst.icmp_type = reverse_icmp6_type(key.src.icmp_type);
         } else {
-            ctx.key.src.port = pkt->md.ct_orig_tuple.ipv6.src_port;
-            ctx.key.dst.port = pkt->md.ct_orig_tuple.ipv6.dst_port;
+            key.src.port = pkt->md.ct_orig_tuple.ipv6.src_port;
+            key.dst.port = pkt->md.ct_orig_tuple.ipv6.dst_port;
         }
-        ctx.key.nw_proto = pkt->md.ct_orig_tuple.ipv6.ipv6_proto;
+        key.nw_proto = pkt->md.ct_orig_tuple.ipv6.ipv6_proto;
     }
 
-    ctx.key.dl_type = ctx_in->key.dl_type;
-    ctx.key.zone = pkt->md.ct_zone;
-    ctx.hash = conn_key_hash(&ctx.key, ct->hash_basis);
-    *bucket = hash_to_bucket(ctx.hash);
-    ct_lock_lock(&ct->buckets[*bucket].lock);
-    conn_key_lookup(&ct->buckets[*bucket], &ctx, now);
-    *conn = ctx.conn;
-    return *conn ? true : false;
-}
+    key.dl_type = ctx_in->key.dl_type;
+    key.zone = pkt->md.ct_zone;
+    uint32_t hash = conn_key_hash(&key, ct->hash_basis);
+    bool reply;
+    conn_key_lookup(&key, hash, now, conn, &reply);
 
-static bool
-is_un_nat_conn_valid(const struct conn *un_nat_conn)
-{
-    return un_nat_conn->conn_type == CT_CONN_TYPE_UN_NAT;
+    return *conn ? true : false;
 }
 
 static bool
@@ -1142,24 +1066,25 @@  conn_update_state_alg(struct conntrack *ct, struct dp_packet *pkt,
                       struct conn_lookup_ctx *ctx, struct conn *conn,
                       const struct nat_action_info_t *nat_action_info,
                       enum ct_alg_ctl_type ct_alg_ctl, long long now,
-                      unsigned bucket, bool *create_new_conn)
-    OVS_REQUIRES(ct->buckets[bucket].lock)
+                      bool *create_new_conn)
 {
     if (is_ftp_ctl(ct_alg_ctl)) {
         /* Keep sequence tracking in sync with the source of the
          * sequence skew. */
         if (ctx->reply != conn->seq_skew_dir) {
+            ct_ce_lock_lock(&conn->lock);
             handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
                            !!nat_action_info);
-            *create_new_conn = conn_update_state(ct, pkt, ctx, &conn, now,
-                                                bucket);
+            ct_ce_lock_unlock(&conn->lock);
+            *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now);
         } else {
-            *create_new_conn = conn_update_state(ct, pkt, ctx, &conn, now,
-                                                bucket);
+            *create_new_conn = conn_update_state(ct, pkt, ctx, conn, now);
 
             if (*create_new_conn == false) {
+                ct_ce_lock_lock(&conn->lock);
                 handle_ftp_ctl(ct, ctx, pkt, conn, now, CT_FTP_CTL_OTHER,
                                !!nat_action_info);
+                ct_ce_lock_unlock(&conn->lock);
             }
         }
         return true;
@@ -1170,74 +1095,59 @@  conn_update_state_alg(struct conntrack *ct, struct dp_packet *pkt,
 static void
 process_one(struct conntrack *ct, struct dp_packet *pkt,
             struct conn_lookup_ctx *ctx, uint16_t zone,
-            bool force, bool commit, long long now, const uint32_t *setmark,
+            bool force, bool commit, long long now,
+            const uint32_t *setmark,
             const struct ovs_key_ct_labels *setlabel,
             const struct nat_action_info_t *nat_action_info,
             ovs_be16 tp_src, ovs_be16 tp_dst, const char *helper)
 {
-    struct conn *conn;
-    unsigned bucket = hash_to_bucket(ctx->hash);
-    ct_lock_lock(&ct->buckets[bucket].lock);
-    conn_key_lookup(&ct->buckets[bucket], ctx, now);
-    conn = ctx->conn;
+    bool create_new_conn = false;
+
+    conn_key_lookup(&ctx->key, ctx->hash, now, &ctx->conn, &ctx->reply);
+    struct conn *conn = ctx->conn;
 
     /* Delete found entry if in wrong direction. 'force' implies commit. */
     if (conn && force && ctx->reply) {
-        conn_clean(ct, conn, &ct->buckets[bucket]);
+        ct_lock_lock(&lock);
+        conn_clean(ct, conn);
+        ct_lock_unlock(&lock);
         conn = NULL;
     }
 
     if (OVS_LIKELY(conn)) {
         if (conn->conn_type == CT_CONN_TYPE_UN_NAT) {
-
             ctx->reply = true;
+            struct conn *rev_conn = conn;  /* Save for debugging. */
+            uint32_t hash = conn_key_hash(&conn->rev_key, ct->hash_basis);
+            conn_key_lookup(&ctx->key, hash, now, &conn, &ctx->reply);
 
-            struct conn_lookup_ctx ctx2;
-            ctx2.conn = NULL;
-            ctx2.key = conn->rev_key;
-            ctx2.hash = conn_key_hash(&conn->rev_key, ct->hash_basis);
-
-            ct_lock_unlock(&ct->buckets[bucket].lock);
-            bucket = hash_to_bucket(ctx2.hash);
-
-            ct_lock_lock(&ct->buckets[bucket].lock);
-            conn_key_lookup(&ct->buckets[bucket], &ctx2, now);
-
-            if (ctx2.conn) {
-                conn = ctx2.conn;
-            } else {
-                /* It is a race condition where conn has timed out and removed
-                 * between unlock of the rev_conn and lock of the forward conn;
-                 * nothing to do. */
+            if (!conn) {
                 pkt->md.ct_state |= CS_TRACKED | CS_INVALID;
-                ct_lock_unlock(&ct->buckets[bucket].lock);
+                char *log_msg = xasprintf("Missing master conn %p", rev_conn);
+                ct_print_conn_info(conn, log_msg, VLL_ERR, true, false);
+                free(log_msg);
                 return;
             }
         }
     }
 
-    bool create_new_conn = false;
-    struct conn conn_for_un_nat_copy;
-    conn_for_un_nat_copy.conn_type = CT_CONN_TYPE_DEFAULT;
-
     enum ct_alg_ctl_type ct_alg_ctl = get_alg_ctl_type(pkt, tp_src, tp_dst,
                                                        helper);
 
     if (OVS_LIKELY(conn)) {
         if (OVS_LIKELY(!conn_update_state_alg(ct, pkt, ctx, conn,
                                               nat_action_info,
-                                              ct_alg_ctl, now, bucket,
+                                              ct_alg_ctl, now,
                                               &create_new_conn))) {
-            create_new_conn = conn_update_state(ct, pkt, ctx, &conn, now,
-                                                bucket);
+
+            create_new_conn = conn_update_state(ct, pkt, ctx, conn, now);
         }
         if (nat_action_info && !create_new_conn) {
             handle_nat(pkt, conn, zone, ctx->reply, ctx->icmp_related);
         }
 
-    } else if (check_orig_tuple(ct, pkt, ctx, now, &bucket, &conn,
-                               nat_action_info)) {
-        create_new_conn = conn_update_state(ct, pkt, ctx, &conn, now, bucket);
+    } else if (check_orig_tuple(ct, pkt, ctx, now, &conn, nat_action_info)) {
+        create_new_conn = conn_update_state(ct, pkt, ctx, conn, now);
     } else {
         if (ctx->icmp_related) {
             /* An icmp related conn should always be found; no new
@@ -1252,7 +1162,6 @@  process_one(struct conntrack *ct, struct dp_packet *pkt,
     struct alg_exp_node alg_exp_entry;
 
     if (OVS_UNLIKELY(create_new_conn)) {
-
         ct_rwlock_rdlock(&ct->resources_lock);
         alg_exp = expectation_lookup(&ct->alg_expectations, &ctx->key,
                                      ct->hash_basis,
@@ -1263,9 +1172,10 @@  process_one(struct conntrack *ct, struct dp_packet *pkt,
         }
         ct_rwlock_unlock(&ct->resources_lock);
 
+        ct_lock_lock(&lock);
         conn = conn_not_found(ct, pkt, ctx, commit, now, nat_action_info,
-                              &conn_for_un_nat_copy, helper, alg_exp,
-                              ct_alg_ctl);
+                              helper, alg_exp, ct_alg_ctl);
+        ct_lock_unlock(&lock);
     }
 
     write_ct_md(pkt, zone, conn, &ctx->key, alg_exp);
@@ -1278,23 +1188,12 @@  process_one(struct conntrack *ct, struct dp_packet *pkt,
         set_label(pkt, conn, &setlabel[0], &setlabel[1]);
     }
 
-    struct conn conn_for_expectation;
-    if (OVS_UNLIKELY((ct_alg_ctl != CT_ALG_CTL_NONE) && conn)) {
-        conn_for_expectation = *conn;
-    }
-
-    ct_lock_unlock(&ct->buckets[bucket].lock);
-
-    if (is_un_nat_conn_valid(&conn_for_un_nat_copy)) {
-        create_un_nat_conn(ct, &conn_for_un_nat_copy, now, !!alg_exp);
-    }
-
     handle_alg_ctl(ct, ctx, pkt, ct_alg_ctl, conn, now, !!nat_action_info,
-                   &conn_for_expectation);
+                   conn);
 }
 
 /* Sends the packets in '*pkt_batch' through the connection tracker 'ct'.  All
- * the packets should have the same 'dl_type' (IPv4 or IPv6) and should have
+ * the packets must have the same 'dl_type' (IPv4 or IPv6) and should have
  * the l3 and and l4 offset properly set.
  *
  * If 'commit' is true, the packets are allowed to create new entries in the
@@ -1310,12 +1209,12 @@  conntrack_execute(struct conntrack *ct, struct dp_packet_batch *pkt_batch,
                   const struct nat_action_info_t *nat_action_info,
                   long long now)
 {
-
     struct dp_packet *packet;
     struct conn_lookup_ctx ctx;
 
     DP_PACKET_BATCH_FOR_EACH (i, packet, pkt_batch) {
-        if (!conn_key_extract(ct, packet, dl_type, &ctx, zone)) {
+        if (packet->md.ct_state == CS_INVALID
+            || !conn_key_extract(ct, packet, dl_type, &ctx, zone)) {
             packet->md.ct_state = CS_INVALID;
             write_ct_md(packet, zone, NULL, NULL, NULL);
             continue;
@@ -1373,31 +1272,37 @@  set_label(struct dp_packet *pkt, struct conn *conn,
  * LLONG_MAX if 'ctb' is empty.  The return value might be smaller than 'now',
  * if 'limit' is reached */
 static long long
-sweep_bucket(struct conntrack *ct, struct conntrack_bucket *ctb,
-             long long now, size_t limit)
-    OVS_REQUIRES(ctb->lock)
+ct_sweep(struct conntrack *ct OVS_UNUSED, long long now, size_t limit)
 {
     struct conn *conn, *next;
     long long min_expiration = LLONG_MAX;
     size_t count = 0;
 
+    ct_lock_lock(&lock);
+
     for (unsigned i = 0; i < N_CT_TM; i++) {
-        LIST_FOR_EACH_SAFE (conn, next, exp_node, &ctb->exp_lists[i]) {
+        /* Does not need to be 'safe' anymore. */
+        LIST_FOR_EACH_SAFE (conn, next, exp_node, &cm_exp_lists[i]) {
             if (conn->conn_type == CT_CONN_TYPE_DEFAULT) {
                 if (!conn_expired(conn, now) || count >= limit) {
                     min_expiration = MIN(min_expiration, conn->expiration);
                     if (count >= limit) {
                         /* Do not check other lists. */
                         COVERAGE_INC(conntrack_long_cleanup);
-                        return min_expiration;
+                        goto out;
                     }
                     break;
                 }
-                conn_clean(ct, conn, ctb);
+                conn_clean(ct, conn);
                 count++;
             }
         }
     }
+
+out:
+    VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec", count,
+             time_msec() - now);
+    ct_lock_unlock(&lock);
     return min_expiration;
 }
 
@@ -1410,48 +1315,12 @@  conntrack_clean(struct conntrack *ct, long long now)
 {
     long long next_wakeup = now + CT_TM_MIN;
     unsigned int n_conn_limit;
-    size_t clean_count = 0;
 
     atomic_read_relaxed(&ct->n_conn_limit, &n_conn_limit);
 
-    for (unsigned i = 0; i < CONNTRACK_BUCKETS; i++) {
-        struct conntrack_bucket *ctb = &ct->buckets[i];
-        size_t prev_count;
-        long long min_exp;
-
-        ovs_mutex_lock(&ctb->cleanup_mutex);
-        if (ctb->next_cleanup > now) {
-            goto next_bucket;
-        }
-
-        ct_lock_lock(&ctb->lock);
-        prev_count = hmap_count(&ctb->connections);
-        /* If the connections are well distributed among buckets, we want to
-         * limit to 10% of the global limit equally split among buckets. If
-         * the bucket is busier than the others, we limit to 10% of its
-         * current size. */
-        min_exp = sweep_bucket(ct, ctb, now,
-                MAX(prev_count/10, n_conn_limit/(CONNTRACK_BUCKETS*10)));
-        clean_count += prev_count - hmap_count(&ctb->connections);
-
-        if (min_exp > now) {
-            /* We call hmap_shrink() only if sweep_bucket() managed to delete
-             * every expired connection. */
-            hmap_shrink(&ctb->connections);
-        }
-
-        ct_lock_unlock(&ctb->lock);
-
-        ctb->next_cleanup = MIN(min_exp, now + CT_TM_MIN);
-
-next_bucket:
-        next_wakeup = MIN(next_wakeup, ctb->next_cleanup);
-        ovs_mutex_unlock(&ctb->cleanup_mutex);
-    }
-
-    VLOG_DBG("conntrack cleanup %"PRIuSIZE" entries in %lld msec",
-             clean_count, time_msec() - now);
-
+    long long cm_min_exp =
+        ct_sweep(ct, now, MAX(cmap_count(&cm_conns) / 10, n_conn_limit / 10));
+    next_wakeup = MIN(cm_min_exp, next_wakeup);
     return next_wakeup;
 }
 
@@ -2172,7 +2041,9 @@  nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
 
     uint16_t port = first_port;
     bool all_ports_tried = false;
-    bool original_ports_tried = false;
+    /* For DNAT, we don't try ephemeral ports. */
+    bool ephemeral_ports_tried =
+        conn->nat_info->nat_action & NAT_ACTION_DST ? true : false;
     struct ct_addr first_addr = ct_addr;
 
     while (true) {
@@ -2191,8 +2062,11 @@  nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
             nat_conn->rev_key.src.port = htons(port);
         }
 
-        bool new_insert = nat_conn_keys_insert(&ct->nat_conn_keys, nat_conn,
-                                               ct->hash_basis);
+        uint32_t conn_hash = conn_key_hash(&nat_conn->rev_key,
+                                           ct->hash_basis);
+        bool new_insert = conn_available(&nat_conn->rev_key, conn_hash,
+                                         time_msec());
+
         if (new_insert) {
             return true;
         } else if (!all_ports_tried) {
@@ -2218,8 +2092,8 @@  nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
                 ct_addr = conn->nat_info->min_addr;
             }
             if (!memcmp(&ct_addr, &first_addr, sizeof ct_addr)) {
-                if (!original_ports_tried) {
-                    original_ports_tried = true;
+                if (!ephemeral_ports_tried) {
+                    ephemeral_ports_tried = true;
                     ct_addr = conn->nat_info->min_addr;
                     min_port = MIN_NAT_EPHEMERAL_PORT;
                     max_port = MAX_NAT_EPHEMERAL_PORT;
@@ -2235,94 +2109,6 @@  nat_select_range_tuple(struct conntrack *ct, const struct conn *conn,
     return false;
 }
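
The port-selection fallback above can be reduced to the following
standalone sketch; it is an illustration, not the patch code: the address
walk is omitted, the names (sk_pick_port, port_is_free) are invented, and
the ephemeral range is assumed to be the usual 32768-65535.

#include <stdbool.h>
#include <stdint.h>

#define SK_MIN_EPHEMERAL 32768
#define SK_MAX_EPHEMERAL 65535

/* Walks [min_port, max_port] round-robin starting at 'first_port'.  For
 * source NAT, falls back once to the ephemeral range when the configured
 * range is exhausted; for destination NAT it gives up instead. */
bool
sk_pick_port(uint16_t min_port, uint16_t max_port, uint16_t first_port,
             bool is_snat, bool (*port_is_free)(uint16_t port))
{
    uint16_t port = first_port;
    bool ephemeral_tried = !is_snat;  /* DNAT never tries ephemeral ports. */

    for (;;) {
        if (port_is_free(port)) {
            return true;
        }
        port = port + 1 > max_port ? min_port : port + 1;
        if (port == first_port) {           /* Wrapped around the range. */
            if (ephemeral_tried) {
                return false;
            }
            ephemeral_tried = true;
            min_port = SK_MIN_EPHEMERAL;
            max_port = SK_MAX_EPHEMERAL;
            first_port = port = min_port;
        }
    }
}

Initializing 'ephemeral_tried' from the NAT direction is what the
one-line comment in the hunk above expresses: marking the fallback as
already taken means a DNAT selection only ever considers the configured
range.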
 
-/* This function must be called with the ct->resources lock taken. */
-static struct nat_conn_key_node *
-nat_conn_keys_lookup(struct hmap *nat_conn_keys,
-                     const struct conn_key *key,
-                     uint32_t basis)
-{
-    struct nat_conn_key_node *nat_conn_key_node;
-
-    HMAP_FOR_EACH_WITH_HASH (nat_conn_key_node, node,
-                             conn_key_hash(key, basis), nat_conn_keys) {
-        if (!conn_key_cmp(&nat_conn_key_node->key, key)) {
-            return nat_conn_key_node;
-        }
-    }
-    return NULL;
-}
-
-/* This function must be called with the ct->resources lock taken. */
-static bool
-nat_conn_keys_insert(struct hmap *nat_conn_keys, const struct conn *nat_conn,
-                     uint32_t basis)
-{
-    struct nat_conn_key_node *nat_conn_key_node =
-        nat_conn_keys_lookup(nat_conn_keys, &nat_conn->rev_key, basis);
-
-    if (!nat_conn_key_node) {
-        struct nat_conn_key_node *nat_conn_key = xzalloc(sizeof *nat_conn_key);
-        nat_conn_key->key = nat_conn->rev_key;
-        nat_conn_key->value = nat_conn->key;
-        hmap_insert(nat_conn_keys, &nat_conn_key->node,
-                    conn_key_hash(&nat_conn_key->key, basis));
-        return true;
-    }
-    return false;
-}
-
-/* This function must be called with the ct->resources write lock taken. */
-static void
-nat_conn_keys_remove(struct hmap *nat_conn_keys,
-                     const struct conn_key *key,
-                     uint32_t basis)
-{
-    struct nat_conn_key_node *nat_conn_key_node;
-
-    HMAP_FOR_EACH_WITH_HASH (nat_conn_key_node, node,
-                             conn_key_hash(key, basis), nat_conn_keys) {
-        if (!conn_key_cmp(&nat_conn_key_node->key, key)) {
-            hmap_remove(nat_conn_keys, &nat_conn_key_node->node);
-            free(nat_conn_key_node);
-            return;
-        }
-    }
-}
-
-static void
-conn_key_lookup(struct conntrack_bucket *ctb, struct conn_lookup_ctx *ctx,
-                long long now)
-    OVS_REQUIRES(ctb->lock)
-{
-    uint32_t hash = ctx->hash;
-    struct conn *conn;
-
-    ctx->conn = NULL;
-
-    HMAP_FOR_EACH_WITH_HASH (conn, node, hash, &ctb->connections) {
-        if (!conn_key_cmp(&conn->key, &ctx->key)
-                && !conn_expired(conn, now)) {
-            ctx->conn = conn;
-            ctx->reply = false;
-            break;
-        }
-        if (!conn_key_cmp(&conn->rev_key, &ctx->key)
-                && !conn_expired(conn, now)) {
-            ctx->conn = conn;
-            ctx->reply = true;
-            break;
-        }
-    }
-}
-
-static enum ct_update_res
-conn_update(struct conn *conn, struct conntrack_bucket *ctb,
-            struct dp_packet *pkt, bool reply, long long now)
-{
-    return l4_protos[conn->key.nw_proto]->conn_update(conn, ctb, pkt,
-                                                      reply, now);
-}
-
 static bool
 conn_expired(struct conn *conn, long long now)
 {
@@ -2339,10 +2125,9 @@  valid_new(struct dp_packet *pkt, struct conn_key *key)
 }
 
 static struct conn *
-new_conn(struct conntrack_bucket *ctb, struct dp_packet *pkt,
-         struct conn_key *key, long long now)
+new_conn(struct dp_packet *pkt, struct conn_key *key, long long now)
 {
-    struct conn *newconn = l4_protos[key->nw_proto]->new_conn(ctb, pkt, now);
+    struct conn *newconn = l4_protos[key->nw_proto]->new_conn(pkt, now);
     if (newconn) {
         newconn->key = *key;
     }
@@ -2350,11 +2135,40 @@  new_conn(struct conntrack_bucket *ctb, struct dp_packet *pkt,
     return newconn;
 }
 
+static enum ct_update_res
+conn_update(struct dp_packet *pkt, struct conn *conn,
+            struct conn_lookup_ctx *ctx, long long now)
+{
+    ct_ce_lock_lock(&conn->lock);
+    enum ct_update_res update_res =
+        l4_protos[conn->key.nw_proto]->conn_update(conn, pkt, ctx->reply,
+                                                   now);
+    ct_ce_lock_unlock(&conn->lock);
+    return update_res;
+}
+
 static void
 delete_conn(struct conn *conn)
 {
-    free(conn->nat_info);
-    free(conn->alg);
+    if (conn->conn_type == CT_CONN_TYPE_DEFAULT) {
+        free(conn->nat_info);
+        free(conn->alg);
+        ct_ce_lock_destroy(&conn->lock);
+        free(conn->nat_conn);   /* free() is a no-op on a NULL pointer. */
+        free(conn);
+    }
+}
+
+static void
+delete_conn_one(struct conn *conn)
+{
+    if (conn->conn_type == CT_CONN_TYPE_DEFAULT) {
+        free(conn->nat_info);
+        free(conn->alg);
+        ct_ce_lock_destroy(&conn->lock);
+    }
     free(conn);
 }
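
The two destructors above encode the ownership model that replaces the
old nat_conn_keys map.  A rough, illustrative-only sketch of the
relationship (field names follow the patch, the layout does not):

/* A NATed connection contributes two entries to the connection cmap:
 * the forward entry of type CT_CONN_TYPE_DEFAULT and a reverse-key
 * alias of type CT_CONN_TYPE_UN_NAT. */
struct conn_sketch {
    struct cmap_node cm_node;      /* Both entries live in the same cmap. */
    int conn_type;                 /* DEFAULT (owner) or UN_NAT (alias). */
    struct conn_sketch *nat_conn;  /* Owner's pointer to its alias entry,
                                    * freed by delete_conn(). */
    /* key, rev_key, nat_info, alg, per-connection lock, ... */
};

delete_conn() frees the alias through the owner's pointer, while
delete_conn_one() frees only the entry it is handed.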
 
@@ -2487,46 +2301,30 @@  conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump,
     }
 
     dump->ct = ct;
-    *ptot_bkts = CONNTRACK_BUCKETS;
+    *ptot_bkts = 1; /* Buckets are gone; clean up the callers later. */
     return 0;
 }
 
 int
 conntrack_dump_next(struct conntrack_dump *dump, struct ct_dpif_entry *entry)
 {
-    struct conntrack *ct = dump->ct;
     long long now = time_msec();
 
-    while (dump->bucket < CONNTRACK_BUCKETS) {
-        struct hmap_node *node;
-
-        ct_lock_lock(&ct->buckets[dump->bucket].lock);
-        for (;;) {
-            struct conn *conn;
-
-            node = hmap_at_position(&ct->buckets[dump->bucket].connections,
-                                    &dump->bucket_pos);
-            if (!node) {
-                break;
-            }
-            INIT_CONTAINER(conn, node, node);
-            if ((!dump->filter_zone || conn->key.zone == dump->zone) &&
-                 (conn->conn_type != CT_CONN_TYPE_UN_NAT)) {
-                conn_to_ct_dpif_entry(conn, entry, now, dump->bucket);
-                break;
-            }
-            /* Else continue, until we find an entry in the appropriate zone
-             * or the bucket has been scanned completely. */
+    for (;;) {
+        struct cmap_node *cm_node = cmap_next_position(&cm_conns,
+                                                       &dump->cm_pos);
+        if (!cm_node) {
+            break;
         }
-        ct_lock_unlock(&ct->buckets[dump->bucket].lock);
-
-        if (!node) {
-            memset(&dump->bucket_pos, 0, sizeof dump->bucket_pos);
-            dump->bucket++;
-        } else {
+        struct conn *conn;
+        INIT_CONTAINER(conn, cm_node, cm_node);
+        if ((!dump->filter_zone || conn->key.zone == dump->zone) &&
+             (conn->conn_type != CT_CONN_TYPE_UN_NAT)) {
+            conn_to_ct_dpif_entry(conn, entry, now, 0);
             return 0;
         }
     }
+
     return EOF;
 }
 
@@ -2539,19 +2337,18 @@  conntrack_dump_done(struct conntrack_dump *dump OVS_UNUSED)
 int
 conntrack_flush(struct conntrack *ct, const uint16_t *zone)
 {
-    for (unsigned i = 0; i < CONNTRACK_BUCKETS; i++) {
-        struct conn *conn, *next;
+    struct conn *conn;
 
-        ct_lock_lock(&ct->buckets[i].lock);
-        HMAP_FOR_EACH_SAFE (conn, next, node, &ct->buckets[i].connections) {
-            if ((!zone || *zone == conn->key.zone) &&
-                (conn->conn_type == CT_CONN_TYPE_DEFAULT)) {
-                conn_clean(ct, conn, &ct->buckets[i]);
-            }
+    ct_lock_lock(&lock);
+
+    CMAP_FOR_EACH (conn, cm_node, &cm_conns) {
+        if (!zone || *zone == conn->key.zone) {
+            conn_clean_one(ct, conn);
         }
-        ct_lock_unlock(&ct->buckets[i].lock);
     }
 
+    ct_lock_unlock(&lock);
+
     return 0;
 }
 
@@ -2559,22 +2356,24 @@  int
 conntrack_flush_tuple(struct conntrack *ct, const struct ct_dpif_tuple *tuple,
                       uint16_t zone)
 {
-    struct conn_lookup_ctx ctx;
     int error = 0;
+    struct conn_lookup_ctx ctx;
 
     memset(&ctx, 0, sizeof(ctx));
     tuple_to_conn_key(tuple, zone, &ctx.key);
     ctx.hash = conn_key_hash(&ctx.key, ct->hash_basis);
-    unsigned bucket = hash_to_bucket(ctx.hash);
 
-    ct_lock_lock(&ct->buckets[bucket].lock);
-    conn_key_lookup(&ct->buckets[bucket], &ctx, time_msec());
-    if (ctx.conn) {
-        conn_clean(ct, ctx.conn, &ct->buckets[bucket]);
+    ct_lock_lock(&lock);
+    conn_key_lookup(&ctx.key, ctx.hash, time_msec(), &ctx.conn, &ctx.reply);
+
+    if (ctx.conn && ctx.conn->conn_type == CT_CONN_TYPE_DEFAULT) {
+        conn_clean(ct, ctx.conn);
     } else {
         error = ENOENT;
     }
-    ct_lock_unlock(&ct->buckets[bucket].lock);
+
+    ct_lock_unlock(&lock);
+
     return error;
 }
 
@@ -2674,17 +2473,17 @@  expectation_ref_create(struct hindex *alg_expectation_refs,
 }
 
 static void
-expectation_clean(struct conntrack *ct, const struct conn_key *master_key,
-                  uint32_t basis)
+expectation_clean(struct conntrack *ct, const struct conn_key *master_key)
 {
     ct_rwlock_wrlock(&ct->resources_lock);
 
     struct alg_exp_node *node, *next;
     HINDEX_FOR_EACH_WITH_HASH_SAFE (node, next, node_ref,
-                                    conn_key_hash(master_key, basis),
+                                    conn_key_hash(master_key, ct->hash_basis),
                                     &ct->alg_expectation_refs) {
         if (!conn_key_cmp(&node->master_key, master_key)) {
-            expectation_remove(&ct->alg_expectations, &node->key, basis);
+            expectation_remove(&ct->alg_expectations, &node->key,
+                               ct->hash_basis);
             hindex_remove(&ct->alg_expectation_refs, &node->node_ref);
             free(node);
         }
diff --git a/lib/conntrack.h b/lib/conntrack.h
index e3a5dcc..2ab9ac1 100644
--- a/lib/conntrack.h
+++ b/lib/conntrack.h
@@ -1,5 +1,5 @@ 
 /*
- * Copyright (c) 2015, 2016, 2017 Nicira, Inc.
+ * Copyright (c) 2015, 2016, 2017, 2018 Nicira, Inc.
  *
  * Licensed under the Apache License, Version 2.0 (the "License");
  * you may not use this file except in compliance with the License.
@@ -19,6 +19,7 @@ 
 
 #include <stdbool.h>
 
+#include "cmap.h"
 #include "latch.h"
 #include "odp-netlink.h"
 #include "openvswitch/hmap.h"
@@ -104,6 +105,7 @@  struct conntrack_dump {
     struct conntrack *ct;
     unsigned bucket;
     struct hmap_position bucket_pos;
+    struct cmap_position cm_pos;
     bool filter_zone;
     uint16_t zone;
 };
@@ -123,41 +125,11 @@  int conntrack_set_maxconns(struct conntrack *ct, uint32_t maxconns);
 int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns);
 int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns);
 
-/* 'struct ct_lock' is a wrapper for an adaptive mutex.  It's useful to try
- * different types of locks (e.g. spinlocks) */
-
-struct OVS_LOCKABLE ct_lock {
-    struct ovs_mutex lock;
-};
 
 struct OVS_LOCKABLE ct_rwlock {
     struct ovs_rwlock lock;
 };
 
-static inline void ct_lock_init(struct ct_lock *lock)
-{
-    ovs_mutex_init_adaptive(&lock->lock);
-}
-
-static inline void ct_lock_lock(struct ct_lock *lock)
-    OVS_ACQUIRES(lock)
-    OVS_NO_THREAD_SAFETY_ANALYSIS
-{
-    ovs_mutex_lock(&lock->lock);
-}
-
-static inline void ct_lock_unlock(struct ct_lock *lock)
-    OVS_RELEASES(lock)
-    OVS_NO_THREAD_SAFETY_ANALYSIS
-{
-    ovs_mutex_unlock(&lock->lock);
-}
-
-static inline void ct_lock_destroy(struct ct_lock *lock)
-{
-    ovs_mutex_destroy(&lock->lock);
-}
-
 static inline void ct_rwlock_init(struct ct_rwlock *lock)
 {
     ovs_rwlock_init(&lock->lock);
@@ -222,42 +194,7 @@  enum ct_timeout {
     N_CT_TM
 };
 
-/* Locking:
- *
- * The connections are kept in different buckets, which are completely
- * independent. The connection bucket is determined by the hash of its key.
- *
- * Each bucket has two locks. Acquisition order is, from outermost to
- * innermost:
- *
- *    cleanup_mutex
- *    lock
- *
- * */
-struct conntrack_bucket {
-    /* Protects 'connections' and 'exp_lists'.  Used in the fast path */
-    struct ct_lock lock;
-    /* Contains the connections in the bucket, indexed by 'struct conn_key' */
-    struct hmap connections OVS_GUARDED;
-    /* For each possible timeout we have a list of connections. When the
-     * timeout of a connection is updated, we move it to the back of the list.
-     * Since the connection in a list have the same relative timeout, the list
-     * will be ordered, with the oldest connections to the front. */
-    struct ovs_list exp_lists[N_CT_TM] OVS_GUARDED;
-
-    /* Protects 'next_cleanup'. Used to make sure that there's only one thread
-     * performing the cleanup. */
-    struct ovs_mutex cleanup_mutex;
-    long long next_cleanup OVS_GUARDED;
-};
-
-#define CONNTRACK_BUCKETS_SHIFT 8
-#define CONNTRACK_BUCKETS (1 << CONNTRACK_BUCKETS_SHIFT)
-
 struct conntrack {
-    /* Independent buckets containing the connections */
-    struct conntrack_bucket buckets[CONNTRACK_BUCKETS];
-
     /* Salt for hashing a connection key. */
     uint32_t hash_basis;
 
@@ -273,24 +210,17 @@  struct conntrack {
      * will be accepted. */
     atomic_uint n_conn_limit;
 
-    /* The following resources are referenced during nat connection
-     * creation and deletion. */
-    struct hmap nat_conn_keys OVS_GUARDED;
     /* Hash table for alg expectations. Expectations are created
      * by control connections to help create data connections. */
     struct hmap alg_expectations OVS_GUARDED;
-    /* Used to lookup alg expectations from the control context. */
+    /* Only needed to be able to clean up expectations from a non-control
+     * connection context; otherwise a pointer to the expectation from
+     * the control connection would suffice. */
     struct hindex alg_expectation_refs OVS_GUARDED;
     /* Expiry list for alg expectations. */
     struct ovs_list alg_exp_list OVS_GUARDED;
-    /* This lock is used during NAT connection creation and deletion;
-     * it is taken after a bucket lock and given back before that
-     * bucket unlock.
-     * This lock is similarly used to guard alg_expectations and
-     * alg_expectation_refs. If a bucket lock is also held during
-     * the normal code flow, then is must be taken first and released
-     * last.
-     */
+    /* This lock is used to guard alg_expectations and
+     * alg_expectation_refs. */
     struct ct_rwlock resources_lock;
 
 };