[ovs-dev] About a race condition between handle_upcalls and revalidate

Message ID 40322c88-ed71-8d1c-9a21-8702ce837c71@mellanox.com
State Not Applicable
Headers show

Commit Message

Paul Blakey March 19, 2017, 3:10 p.m.
On 19/03/2017 16:28, Paul Blakey wrote:
> Hi all,
>
> While using out patches for HW offload we've noticed we get a ovs
> assertion at transition ukey, which tries to
> transition the ukey state from EVICTED back to OPERATIONAL.
> With furthur investigation it seem that this can happen without our HW
> offload patches as there might be a race between handle_upcalls and
> revalidate.
>
> The procedure is as such:
>
> handle_upcalls receives a new upcall and creates a new ukey, its state
> is VISIBLE, it then it installs a flow (FLOW_PUT) to the datapat and
> upon success wants to set the ukey state to OPERATIONAL (line 1408). for
> that the handler running handle_upcalls wants to reaquirce the ukey
> lock, but in the meantime revalidators dump (line 2261) the already
> inserted flow and want to delete this flow (line 2328, say because of
> openflow db changes, or aging). The revalidator deletes the flow and
> moves the ukey from
> VISIBLE -> OPERATIONAL (line 2320) -> EVICTING (line 2220) -> EVICTED
> (line 2134)
>
> finally handler succesfuly acquires the flow and now set the state to
> OPERERTIONAL which will cause the assert in transition_ukey.
>
> Line numbers in question are from ofproto/ofproto-dpif-upcall.c
>
> I can provide a test the could show this happening, basicly adding a
> sleep before (writing it now).
>
> Thanks,
> Paul Blakey.
>
>
>


Here's a test:
1) Install the below patch
2) Set max-idle to 500ms, n-handlers to 1, revalidators to 10
3) Create a simple setup (like veth pair in OVS_TRAFFIC_VSWITCHD_START 
traffic testsuite)
4) Ping one of the machines/veths slowly and wait till you get to 
sleeping part and stop (put: 24 print)


Here is my log:
.......
2017-03-19T14:48:18.609Z|00121|ofproto_dpif_upcall(handler12)|INFO|ukey_install__ 
1696 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 0 to VISIBLE
2017-03-19T14:48:18.609Z|00122|ofproto_dpif_upcall(handler12)|INFO|state 
transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 0 -> 1
2017-03-19T14:48:18.609Z|00123|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 1/20 sec 1
2017-03-19T14:48:18.832Z|00001|ofproto_dpif_upcall(revalidator15)|INFO|revalidate 
2376 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 1 to 
OPERATIONAL
2017-03-19T14:48:18.832Z|00002|ofproto_dpif_upcall(revalidator15)|INFO|state 
transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 1 -> 2
2017-03-19T14:48:18.832Z|00003|ofproto_dpif_upcall(revalidator15)|INFO|reval_op_init 
2272 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 2 to 
EVICITING
2017-03-19T14:48:18.832Z|00004|ofproto_dpif_upcall(revalidator15)|INFO|state 
transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 2 -> 3
2017-03-19T14:48:18.832Z|00005|ofproto_dpif_upcall(revalidator15)|INFO|push_dp_ops 
2182 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 3 to EVICTED
2017-03-19T14:48:18.832Z|00006|ofproto_dpif_upcall(revalidator15)|INFO|state 
transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 3 -> 4
2017-03-19T14:48:19.609Z|00124|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 2/20 sec 4
2017-03-19T14:48:20.610Z|00125|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 3/20 sec 4
2017-03-19T14:48:21.610Z|00126|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 4/20 sec 4
2017-03-19T14:48:22.610Z|00127|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 5/20 sec 4
2017-03-19T14:48:23.610Z|00128|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 6/20 sec 4
2017-03-19T14:48:24.610Z|00129|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 7/20 sec 4
2017-03-19T14:48:25.611Z|00130|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 8/20 sec 4
2017-03-19T14:48:26.611Z|00131|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 9/20 sec 4
2017-03-19T14:48:27.611Z|00132|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 10/20 sec 4
2017-03-19T14:48:28.611Z|00133|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 11/20 sec 4
2017-03-19T14:48:29.611Z|00134|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 12/20 sec 4
2017-03-19T14:48:30.611Z|00135|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 13/20 sec 4
2017-03-19T14:48:31.611Z|00136|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 14/20 sec 4
2017-03-19T14:48:32.612Z|00137|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 15/20 sec 4
2017-03-19T14:48:33.612Z|00138|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 16/20 sec 4
2017-03-19T14:48:34.612Z|00139|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 17/20 sec 4
2017-03-19T14:48:35.612Z|00140|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 18/20 sec 4
2017-03-19T14:48:36.612Z|00141|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 19/20 sec 4
2017-03-19T14:48:37.612Z|00142|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 20/20 sec 4
2017-03-19T14:48:38.613Z|00143|ofproto_dpif_upcall(handler12)|INFO|put: 25
2017-03-19T14:48:38.613Z|00144|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls 
1434 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 4 to 
OPERATIONAL
2017-03-19T14:48:38.613Z|00145|ofproto_dpif_upcall(handler12)|INFO|state 
transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 4 -> 2
2017-03-19T14:48:38.613Z|00146|util(handler12)|EMER|ofproto/ofproto-dpif-upcall.c:1716: 
assertion dst >= ukey->state failed in transition_ukey()




commit 4755254db3053bcf7ceac88fedce60fd228c8640 (HEAD)
Author: Paul Blakey <paulb@mellanox.com>
Date:   Sun Mar 19 17:00:30 2017 +0200

     test
     add prints and forcefully slow down some actions (EVICTED->DELETED, 
and VISIBLE -> OPERATIONAL)


+            ds_destroy(&ds);
          }
      }
  }
@@ -1659,6 +1680,10 @@ ukey_install__(struct udpif *udpif, struct 
udpif_key *new_ukey)
      } else {
          ovs_mutex_lock(&new_ukey->mutex);
          cmap_insert(&umap->cmap, &new_ukey->cmap_node, new_ukey->hash);
+        struct ds ds = DS_EMPTY_INITIALIZER;
+        odp_format_ufid(&new_ukey->ufid, &ds);
+        VLOG_INFO("%s %d %s transition from %d to VISIBLE",  __func__, 
__LINE__, ds_cstr(&ds), new_ukey->state);
+        ds_destroy(&ds);
          transition_ukey(new_ukey, UKEY_VISIBLE);
          locked = true;
      }
@@ -1671,6 +1696,12 @@ static void
  transition_ukey(struct udpif_key *ukey, enum ukey_state dst)
      OVS_REQUIRES(ukey->mutex)
  {
+    struct ds ds = DS_EMPTY_INITIALIZER;
+
+    odp_format_ufid(&ukey->ufid, &ds);
+    VLOG_INFO("state transition for ukey %s: %d -> %d", ds_cstr(&ds), 
ukey->state, dst);
+    ds_destroy(&ds);
+
      ovs_assert(dst >= ukey->state);
      if (ukey->state == dst && dst == UKEY_OPERATIONAL) {
          return;
@@ -1699,12 +1730,12 @@ transition_ukey(struct udpif_key *ukey, enum 
ukey_state dst)
                                     dst < UKEY_DELETED)) {
          ukey->state = dst;
      } else {
-        struct ds ds = DS_EMPTY_INITIALIZER;
+        struct ds ds2 = DS_EMPTY_INITIALIZER;

-        odp_format_ufid(&ukey->ufid, &ds);
+        odp_format_ufid(&ukey->ufid, &ds2);
          VLOG_WARN_RL(&rl, "Invalid state transition for ukey %s: %d -> 
%d",
-                     ds_cstr(&ds), ukey->state, dst);
-        ds_destroy(&ds);
+                     ds_cstr(&ds2), ukey->state, dst);
+        ds_destroy(&ds2);
      }
  }

@@ -1797,6 +1828,10 @@ ukey_delete(struct umap *umap, struct udpif_key 
*ukey)
      if (ukey->state < UKEY_DELETED) {
          cmap_remove(&umap->cmap, &ukey->cmap_node, ukey->hash);
          ovsrcu_postpone(ukey_delete__, ukey);
+        struct ds ds = DS_EMPTY_INITIALIZER;
+        odp_format_ufid(&ukey->ufid, &ds);
+        VLOG_INFO("%s %d %s transition from %d to DELETED",  __func__, 
__LINE__, ds_cstr(&ds), ukey->state);
+        ds_destroy(&ds);
          transition_ukey(ukey, UKEY_DELETED);
      }
      ovs_mutex_unlock(&ukey->mutex);
@@ -2131,6 +2166,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op 
*ops, size_t n_ops)

          if (op->ukey) {
              ovs_mutex_lock(&op->ukey->mutex);
+            struct ds ds = DS_EMPTY_INITIALIZER;
+            odp_format_ufid(&op->ukey->ufid, &ds);
+            VLOG_INFO("%s %d %s transition from %d to EVICTED", 
__func__, __LINE__, ds_cstr(&ds), op->ukey->state);
+            ds_destroy(&ds);
              transition_ukey(op->ukey, UKEY_EVICTED);
              push->used = MAX(stats->used, op->ukey->stats.used);
              push->tcp_flags = stats->tcp_flags | 
op->ukey->stats.tcp_flags;
@@ -2217,6 +2256,10 @@ reval_op_init(struct ukey_op *op, enum 
reval_result result,
  {
      if (result == UKEY_DELETE) {
          delete_op_init(udpif, op, ukey);
+        struct ds ds = DS_EMPTY_INITIALIZER;
+        odp_format_ufid(&ukey->ufid, &ds);
+        VLOG_INFO("%s %d %s transition from %d to EVICITING", 
__func__, __LINE__, ds_cstr(&ds), ukey->state);
+        ds_destroy(&ds);
          transition_ukey(ukey, UKEY_EVICTING);
      } else if (result == UKEY_MODIFY) {
          /* Store the new recircs. */
@@ -2317,6 +2360,10 @@ revalidate(struct revalidator *revalidator)
              }

              /* The flow is now confirmed to be in the datapath. */
+            struct ds ds = DS_EMPTY_INITIALIZER;
+            odp_format_ufid(&ukey->ufid, &ds);
+            VLOG_INFO("%s %d %s transition from %d to OPERATIONAL", 
__func__, __LINE__, ds_cstr(&ds), ukey->state);
+            ds_destroy(&ds);
              transition_ukey(ukey, UKEY_OPERATIONAL);

              if (!used) {
@@ -2417,6 +2464,11 @@ revalidator_sweep__(struct revalidator 
*revalidator, bool purge)
              ovs_mutex_unlock(&ukey->mutex);

              if (ukey_state == UKEY_EVICTED) {
+
+                while (global_temp) {
+                    xsleep(1);
+                }
+
                  /* The common flow deletion case involves deletion of 
the flow
                   * during the dump phase and ukey deletion here. */
                  ovs_mutex_lock(&umap->mutex);
@@ -2471,6 +2523,10 @@ dp_purge_cb(void *aux, unsigned pmd_id)
          CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
              if (ukey->pmd_id == pmd_id) {
                  delete_op_init(udpif, &ops[n_ops++], ukey);
+                struct ds ds = DS_EMPTY_INITIALIZER;
+                odp_format_ufid(&ukey->ufid, &ds);
+                VLOG_INFO("%s %d %s transition from %d to EVICITING", 
__func__, __LINE__, ds_cstr(&ds), ukey->state);
+                ds_destroy(&ds);
                  transition_ukey(ukey, UKEY_EVICTING);

                  if (n_ops == REVALIDATE_MAX_BATCH) {

Comments

Paul Blakey March 19, 2017, 3:44 p.m. | #1
On 19/03/2017 17:10, Paul Blakey wrote:
>> Hi all,
>>
>> While using out patches for HW offload we've noticed we get a ovs
>> assertion at transition ukey, which tries to
>> transition the ukey state from EVICTED back to OPERATIONAL.
>> With furthur investigation it seem that this can happen without our HW
>> offload patches as there might be a race between handle_upcalls and
>> revalidate.
>>
>> The procedure is as such:
>>
>> handle_upcalls receives a new upcall and creates a new ukey, its state
>> is VISIBLE, it then it installs a flow (FLOW_PUT) to the datapat and
>> upon success wants to set the ukey state to OPERATIONAL (line 1408). for
>> that the handler running handle_upcalls wants to reaquirce the ukey
>> lock, but in the meantime revalidators dump (line 2261) the already
>> inserted flow and want to delete this flow (line 2328, say because of
>> openflow db changes, or aging). The revalidator deletes the flow and
>> moves the ukey from
>> VISIBLE -> OPERATIONAL (line 2320) -> EVICTING (line 2220) -> EVICTED
>> (line 2134)
>>
>> finally handler succesfuly acquires the flow and now set the state to
>> OPERERTIONAL which will cause the assert in transition_ukey.
>>
>> Line numbers in question are from ofproto/ofproto-dpif-upcall.c
>>
>> I can provide a test the could show this happening, basicly adding a
>> sleep before (writing it now).
>>
>> Thanks,
>> Paul Blakey.
>>
>>
>>
>
>
> Here's a test:
> 1) Install the below patch
> 2) Set max-idle to 500ms, n-handlers to 1, revalidators to 10
> 3) Create a simple setup (like veth pair in OVS_TRAFFIC_VSWITCHD_START
> traffic testsuite)
> 4) Ping one of the machines/veths slowly and wait till you get to
> sleeping part and stop (put: 24 print)
>
>
> Here is my log:
> .......
> 2017-03-19T14:48:18.609Z|00121|ofproto_dpif_upcall(handler12)|INFO|ukey_install__
> 1696 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 0 to VISIBLE
> 2017-03-19T14:48:18.609Z|00122|ofproto_dpif_upcall(handler12)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 0 -> 1
> 2017-03-19T14:48:18.609Z|00123|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 1/20 sec 1
> 2017-03-19T14:48:18.832Z|00001|ofproto_dpif_upcall(revalidator15)|INFO|revalidate
> 2376 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 1 to
> OPERATIONAL
> 2017-03-19T14:48:18.832Z|00002|ofproto_dpif_upcall(revalidator15)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 1 -> 2
> 2017-03-19T14:48:18.832Z|00003|ofproto_dpif_upcall(revalidator15)|INFO|reval_op_init
> 2272 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 2 to
> EVICITING
> 2017-03-19T14:48:18.832Z|00004|ofproto_dpif_upcall(revalidator15)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 2 -> 3
> 2017-03-19T14:48:18.832Z|00005|ofproto_dpif_upcall(revalidator15)|INFO|push_dp_ops
> 2182 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 3 to EVICTED
> 2017-03-19T14:48:18.832Z|00006|ofproto_dpif_upcall(revalidator15)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 3 -> 4
> 2017-03-19T14:48:19.609Z|00124|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 2/20 sec 4
> 2017-03-19T14:48:20.610Z|00125|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 3/20 sec 4
> 2017-03-19T14:48:21.610Z|00126|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 4/20 sec 4
> 2017-03-19T14:48:22.610Z|00127|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 5/20 sec 4
> 2017-03-19T14:48:23.610Z|00128|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 6/20 sec 4
> 2017-03-19T14:48:24.610Z|00129|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 7/20 sec 4
> 2017-03-19T14:48:25.611Z|00130|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 8/20 sec 4
> 2017-03-19T14:48:26.611Z|00131|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 9/20 sec 4
> 2017-03-19T14:48:27.611Z|00132|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 10/20 sec 4
> 2017-03-19T14:48:28.611Z|00133|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 11/20 sec 4
> 2017-03-19T14:48:29.611Z|00134|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 12/20 sec 4
> 2017-03-19T14:48:30.611Z|00135|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 13/20 sec 4
> 2017-03-19T14:48:31.611Z|00136|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 14/20 sec 4
> 2017-03-19T14:48:32.612Z|00137|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 15/20 sec 4
> 2017-03-19T14:48:33.612Z|00138|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 16/20 sec 4
> 2017-03-19T14:48:34.612Z|00139|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 17/20 sec 4
> 2017-03-19T14:48:35.612Z|00140|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 18/20 sec 4
> 2017-03-19T14:48:36.612Z|00141|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 19/20 sec 4
> 2017-03-19T14:48:37.612Z|00142|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 20/20 sec 4
> 2017-03-19T14:48:38.613Z|00143|ofproto_dpif_upcall(handler12)|INFO|put: 25
> 2017-03-19T14:48:38.613Z|00144|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1434 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 4 to
> OPERATIONAL
> 2017-03-19T14:48:38.613Z|00145|ofproto_dpif_upcall(handler12)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 4 -> 2
> 2017-03-19T14:48:38.613Z|00146|util(handler12)|EMER|ofproto/ofproto-dpif-upcall.c:1716:
> assertion dst >= ukey->state failed in transition_ukey()
>
>
>
>
> commit 4755254db3053bcf7ceac88fedce60fd228c8640 (HEAD)
> Author: Paul Blakey <paulb@mellanox.com>
> Date:   Sun Mar 19 17:00:30 2017 +0200
>
>     test
>     add prints and forcefully slow down some actions (EVICTED->DELETED,
> and VISIBLE -> OPERATIONAL)
>
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 660383f..0dfffdb 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -45,6 +45,8 @@
>  #define UPCALL_MAX_BATCH 64
>  #define REVALIDATE_MAX_BATCH 50
>
> +int global_temp = 0;
> +
>  VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);
>
>  COVERAGE_DEFINE(dumped_duplicate_flow);
> @@ -1399,15 +1401,34 @@ handle_upcalls(struct udpif *udpif, struct
> upcall *upcalls,
>      dpif_operate(udpif->dpif, opsp, n_opsp);
>      for (i = 0; i < n_ops; i++) {
>          struct udpif_key *ukey = ops[i].ukey;
> +        static int i = 0;
>
>          if (ukey) {
> +            struct ds ds = DS_EMPTY_INITIALIZER;
> +            odp_format_ufid(&ukey->ufid, &ds);
> +
> +            if (i++%25 == 25-1) {
> +                global_temp = 1;
> +                for (int z = 0; z < 20; z++) {
> +                    VLOG_INFO("%s %d %s sleeping %d/%d sec %d",
> __func__, __LINE__, ds_cstr(&ds), z+1, 20, ukey->state);
> +                    xsleep(1);
> +                }
> +            }
> +            VLOG_INFO("put: %d", i);
> +
>              ovs_mutex_lock(&ukey->mutex);
>              if (ops[i].dop.error) {
> +                VLOG_INFO("%s %d %s transition from %d to EVICTED",
> __func__, __LINE__, ds_cstr(&ds), ukey->state);
>                  transition_ukey(ukey, UKEY_EVICTED);
>              } else {
> +                VLOG_INFO("%s %d %s transition from %d to OPERATIONAL",
>  __func__, __LINE__, ds_cstr(&ds), ukey->state);
>                  transition_ukey(ukey, UKEY_OPERATIONAL);
>              }
>              ovs_mutex_unlock(&ukey->mutex);
> +
> +            global_temp = 0;
> +
> +            ds_destroy(&ds);
>          }
>      }
>  }
> @@ -1659,6 +1680,10 @@ ukey_install__(struct udpif *udpif, struct
> udpif_key *new_ukey)
>      } else {
>          ovs_mutex_lock(&new_ukey->mutex);
>          cmap_insert(&umap->cmap, &new_ukey->cmap_node, new_ukey->hash);
> +        struct ds ds = DS_EMPTY_INITIALIZER;
> +        odp_format_ufid(&new_ukey->ufid, &ds);
> +        VLOG_INFO("%s %d %s transition from %d to VISIBLE",  __func__,
> __LINE__, ds_cstr(&ds), new_ukey->state);
> +        ds_destroy(&ds);
>          transition_ukey(new_ukey, UKEY_VISIBLE);
>          locked = true;
>      }
> @@ -1671,6 +1696,12 @@ static void
>  transition_ukey(struct udpif_key *ukey, enum ukey_state dst)
>      OVS_REQUIRES(ukey->mutex)
>  {
> +    struct ds ds = DS_EMPTY_INITIALIZER;
> +
> +    odp_format_ufid(&ukey->ufid, &ds);
> +    VLOG_INFO("state transition for ukey %s: %d -> %d", ds_cstr(&ds),
> ukey->state, dst);
> +    ds_destroy(&ds);
> +
>      ovs_assert(dst >= ukey->state);
>      if (ukey->state == dst && dst == UKEY_OPERATIONAL) {
>          return;
> @@ -1699,12 +1730,12 @@ transition_ukey(struct udpif_key *ukey, enum
> ukey_state dst)
>                                     dst < UKEY_DELETED)) {
>          ukey->state = dst;
>      } else {
> -        struct ds ds = DS_EMPTY_INITIALIZER;
> +        struct ds ds2 = DS_EMPTY_INITIALIZER;
>
> -        odp_format_ufid(&ukey->ufid, &ds);
> +        odp_format_ufid(&ukey->ufid, &ds2);
>          VLOG_WARN_RL(&rl, "Invalid state transition for ukey %s: %d ->
> %d",
> -                     ds_cstr(&ds), ukey->state, dst);
> -        ds_destroy(&ds);
> +                     ds_cstr(&ds2), ukey->state, dst);
> +        ds_destroy(&ds2);
>      }
>  }
>
> @@ -1797,6 +1828,10 @@ ukey_delete(struct umap *umap, struct udpif_key
> *ukey)
>      if (ukey->state < UKEY_DELETED) {
>          cmap_remove(&umap->cmap, &ukey->cmap_node, ukey->hash);
>          ovsrcu_postpone(ukey_delete__, ukey);
> +        struct ds ds = DS_EMPTY_INITIALIZER;
> +        odp_format_ufid(&ukey->ufid, &ds);
> +        VLOG_INFO("%s %d %s transition from %d to DELETED",  __func__,
> __LINE__, ds_cstr(&ds), ukey->state);
> +        ds_destroy(&ds);
>          transition_ukey(ukey, UKEY_DELETED);
>      }
>      ovs_mutex_unlock(&ukey->mutex);
> @@ -2131,6 +2166,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
> *ops, size_t n_ops)
>
>          if (op->ukey) {
>              ovs_mutex_lock(&op->ukey->mutex);
> +            struct ds ds = DS_EMPTY_INITIALIZER;
> +            odp_format_ufid(&op->ukey->ufid, &ds);
> +            VLOG_INFO("%s %d %s transition from %d to EVICTED",
> __func__, __LINE__, ds_cstr(&ds), op->ukey->state);
> +            ds_destroy(&ds);
>              transition_ukey(op->ukey, UKEY_EVICTED);
>              push->used = MAX(stats->used, op->ukey->stats.used);
>              push->tcp_flags = stats->tcp_flags |
> op->ukey->stats.tcp_flags;
> @@ -2217,6 +2256,10 @@ reval_op_init(struct ukey_op *op, enum
> reval_result result,
>  {
>      if (result == UKEY_DELETE) {
>          delete_op_init(udpif, op, ukey);
> +        struct ds ds = DS_EMPTY_INITIALIZER;
> +        odp_format_ufid(&ukey->ufid, &ds);
> +        VLOG_INFO("%s %d %s transition from %d to EVICITING", __func__,
> __LINE__, ds_cstr(&ds), ukey->state);
> +        ds_destroy(&ds);
>          transition_ukey(ukey, UKEY_EVICTING);
>      } else if (result == UKEY_MODIFY) {
>          /* Store the new recircs. */
> @@ -2317,6 +2360,10 @@ revalidate(struct revalidator *revalidator)
>              }
>
>              /* The flow is now confirmed to be in the datapath. */
> +            struct ds ds = DS_EMPTY_INITIALIZER;
> +            odp_format_ufid(&ukey->ufid, &ds);
> +            VLOG_INFO("%s %d %s transition from %d to OPERATIONAL",
> __func__, __LINE__, ds_cstr(&ds), ukey->state);
> +            ds_destroy(&ds);
>              transition_ukey(ukey, UKEY_OPERATIONAL);
>
>              if (!used) {
> @@ -2417,6 +2464,11 @@ revalidator_sweep__(struct revalidator
> *revalidator, bool purge)
>              ovs_mutex_unlock(&ukey->mutex);
>
>              if (ukey_state == UKEY_EVICTED) {
> +
> +                while (global_temp) {
> +                    xsleep(1);
> +                }
> +
>                  /* The common flow deletion case involves deletion of
> the flow
>                   * during the dump phase and ukey deletion here. */
>                  ovs_mutex_lock(&umap->mutex);
> @@ -2471,6 +2523,10 @@ dp_purge_cb(void *aux, unsigned pmd_id)
>          CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>              if (ukey->pmd_id == pmd_id) {
>                  delete_op_init(udpif, &ops[n_ops++], ukey);
> +                struct ds ds = DS_EMPTY_INITIALIZER;
> +                odp_format_ufid(&ukey->ufid, &ds);
> +                VLOG_INFO("%s %d %s transition from %d to EVICITING",
> __func__, __LINE__, ds_cstr(&ds), ukey->state);
> +                ds_destroy(&ds);
>                  transition_ukey(ukey, UKEY_EVICTING);
>
>                  if (n_ops == REVALIDATE_MAX_BATCH) {


What I also notice happening without the second xsleep (in 
revalidator_sweep__) is that the ukey gets deleted entirely and handler 
is stuck on acquiring the non existent lock.
Paul Blakey March 20, 2017, 12:58 p.m. | #2
On 19/03/2017 17:10, Paul Blakey wrote:
>
>
> On 19/03/2017 16:28, Paul Blakey wrote:
>> Hi all,
>>
>> While using out patches for HW offload we've noticed we get a ovs
>> assertion at transition ukey, which tries to
>> transition the ukey state from EVICTED back to OPERATIONAL.
>> With furthur investigation it seem that this can happen without our HW
>> offload patches as there might be a race between handle_upcalls and
>> revalidate.
>>
>> The procedure is as such:
>>
>> handle_upcalls receives a new upcall and creates a new ukey, its state
>> is VISIBLE, it then it installs a flow (FLOW_PUT) to the datapat and
>> upon success wants to set the ukey state to OPERATIONAL (line 1408). for
>> that the handler running handle_upcalls wants to reaquirce the ukey
>> lock, but in the meantime revalidators dump (line 2261) the already
>> inserted flow and want to delete this flow (line 2328, say because of
>> openflow db changes, or aging). The revalidator deletes the flow and
>> moves the ukey from
>> VISIBLE -> OPERATIONAL (line 2320) -> EVICTING (line 2220) -> EVICTED
>> (line 2134)
>>
>> finally handler succesfuly acquires the flow and now set the state to
>> OPERERTIONAL which will cause the assert in transition_ukey.
>>
>> Line numbers in question are from ofproto/ofproto-dpif-upcall.c
>>
>> I can provide a test the could show this happening, basicly adding a
>> sleep before (writing it now).
>>
>> Thanks,
>> Paul Blakey.
>>
>>
>>
>
>
> Here's a test:
> 1) Install the below patch
> 2) Set max-idle to 500ms, n-handlers to 1, revalidators to 10
> 3) Create a simple setup (like veth pair in OVS_TRAFFIC_VSWITCHD_START
> traffic testsuite)
> 4) Ping one of the machines/veths slowly and wait till you get to
> sleeping part and stop (put: 24 print)
>
>
> Here is my log:
> .......
> 2017-03-19T14:48:18.609Z|00121|ofproto_dpif_upcall(handler12)|INFO|ukey_install__
> 1696 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 0 to VISIBLE
> 2017-03-19T14:48:18.609Z|00122|ofproto_dpif_upcall(handler12)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 0 -> 1
> 2017-03-19T14:48:18.609Z|00123|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 1/20 sec 1
> 2017-03-19T14:48:18.832Z|00001|ofproto_dpif_upcall(revalidator15)|INFO|revalidate
> 2376 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 1 to
> OPERATIONAL
> 2017-03-19T14:48:18.832Z|00002|ofproto_dpif_upcall(revalidator15)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 1 -> 2
> 2017-03-19T14:48:18.832Z|00003|ofproto_dpif_upcall(revalidator15)|INFO|reval_op_init
> 2272 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 2 to
> EVICITING
> 2017-03-19T14:48:18.832Z|00004|ofproto_dpif_upcall(revalidator15)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 2 -> 3
> 2017-03-19T14:48:18.832Z|00005|ofproto_dpif_upcall(revalidator15)|INFO|push_dp_ops
> 2182 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 3 to EVICTED
> 2017-03-19T14:48:18.832Z|00006|ofproto_dpif_upcall(revalidator15)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 3 -> 4
> 2017-03-19T14:48:19.609Z|00124|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 2/20 sec 4
> 2017-03-19T14:48:20.610Z|00125|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 3/20 sec 4
> 2017-03-19T14:48:21.610Z|00126|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 4/20 sec 4
> 2017-03-19T14:48:22.610Z|00127|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 5/20 sec 4
> 2017-03-19T14:48:23.610Z|00128|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 6/20 sec 4
> 2017-03-19T14:48:24.610Z|00129|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 7/20 sec 4
> 2017-03-19T14:48:25.611Z|00130|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 8/20 sec 4
> 2017-03-19T14:48:26.611Z|00131|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 9/20 sec 4
> 2017-03-19T14:48:27.611Z|00132|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 10/20 sec 4
> 2017-03-19T14:48:28.611Z|00133|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 11/20 sec 4
> 2017-03-19T14:48:29.611Z|00134|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 12/20 sec 4
> 2017-03-19T14:48:30.611Z|00135|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 13/20 sec 4
> 2017-03-19T14:48:31.611Z|00136|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 14/20 sec 4
> 2017-03-19T14:48:32.612Z|00137|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 15/20 sec 4
> 2017-03-19T14:48:33.612Z|00138|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 16/20 sec 4
> 2017-03-19T14:48:34.612Z|00139|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 17/20 sec 4
> 2017-03-19T14:48:35.612Z|00140|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 18/20 sec 4
> 2017-03-19T14:48:36.612Z|00141|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 19/20 sec 4
> 2017-03-19T14:48:37.612Z|00142|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 20/20 sec 4
> 2017-03-19T14:48:38.613Z|00143|ofproto_dpif_upcall(handler12)|INFO|put: 25
> 2017-03-19T14:48:38.613Z|00144|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
> 1434 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 4 to
> OPERATIONAL
> 2017-03-19T14:48:38.613Z|00145|ofproto_dpif_upcall(handler12)|INFO|state
> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 4 -> 2
> 2017-03-19T14:48:38.613Z|00146|util(handler12)|EMER|ofproto/ofproto-dpif-upcall.c:1716:
> assertion dst >= ukey->state failed in transition_ukey()
>
>
>
>
> commit 4755254db3053bcf7ceac88fedce60fd228c8640 (HEAD)
> Author: Paul Blakey <paulb@mellanox.com>
> Date:   Sun Mar 19 17:00:30 2017 +0200
>
>     test
>     add prints and forcefully slow down some actions (EVICTED->DELETED,
> and VISIBLE -> OPERATIONAL)
>
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 660383f..0dfffdb 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -45,6 +45,8 @@
>  #define UPCALL_MAX_BATCH 64
>  #define REVALIDATE_MAX_BATCH 50
>
> +int global_temp = 0;
> +
>  VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);
>
>  COVERAGE_DEFINE(dumped_duplicate_flow);
> @@ -1399,15 +1401,34 @@ handle_upcalls(struct udpif *udpif, struct
> upcall *upcalls,
>      dpif_operate(udpif->dpif, opsp, n_opsp);
>      for (i = 0; i < n_ops; i++) {
>          struct udpif_key *ukey = ops[i].ukey;
> +        static int i = 0;
>
>          if (ukey) {
> +            struct ds ds = DS_EMPTY_INITIALIZER;
> +            odp_format_ufid(&ukey->ufid, &ds);
> +
> +            if (i++%25 == 25-1) {
> +                global_temp = 1;
> +                for (int z = 0; z < 20; z++) {
> +                    VLOG_INFO("%s %d %s sleeping %d/%d sec %d",
> __func__, __LINE__, ds_cstr(&ds), z+1, 20, ukey->state);
> +                    xsleep(1);
> +                }
> +            }
> +            VLOG_INFO("put: %d", i);
> +
>              ovs_mutex_lock(&ukey->mutex);
>              if (ops[i].dop.error) {
> +                VLOG_INFO("%s %d %s transition from %d to EVICTED",
> __func__, __LINE__, ds_cstr(&ds), ukey->state);
>                  transition_ukey(ukey, UKEY_EVICTED);
>              } else {
> +                VLOG_INFO("%s %d %s transition from %d to OPERATIONAL",
>  __func__, __LINE__, ds_cstr(&ds), ukey->state);
>                  transition_ukey(ukey, UKEY_OPERATIONAL);
>              }
>              ovs_mutex_unlock(&ukey->mutex);
> +
> +            global_temp = 0;
> +
> +            ds_destroy(&ds);
>          }
>      }
>  }
> @@ -1659,6 +1680,10 @@ ukey_install__(struct udpif *udpif, struct
> udpif_key *new_ukey)
>      } else {
>          ovs_mutex_lock(&new_ukey->mutex);
>          cmap_insert(&umap->cmap, &new_ukey->cmap_node, new_ukey->hash);
> +        struct ds ds = DS_EMPTY_INITIALIZER;
> +        odp_format_ufid(&new_ukey->ufid, &ds);
> +        VLOG_INFO("%s %d %s transition from %d to VISIBLE",  __func__,
> __LINE__, ds_cstr(&ds), new_ukey->state);
> +        ds_destroy(&ds);
>          transition_ukey(new_ukey, UKEY_VISIBLE);
>          locked = true;
>      }
> @@ -1671,6 +1696,12 @@ static void
>  transition_ukey(struct udpif_key *ukey, enum ukey_state dst)
>      OVS_REQUIRES(ukey->mutex)
>  {
> +    struct ds ds = DS_EMPTY_INITIALIZER;
> +
> +    odp_format_ufid(&ukey->ufid, &ds);
> +    VLOG_INFO("state transition for ukey %s: %d -> %d", ds_cstr(&ds),
> ukey->state, dst);
> +    ds_destroy(&ds);
> +
>      ovs_assert(dst >= ukey->state);
>      if (ukey->state == dst && dst == UKEY_OPERATIONAL) {
>          return;
> @@ -1699,12 +1730,12 @@ transition_ukey(struct udpif_key *ukey, enum
> ukey_state dst)
>                                     dst < UKEY_DELETED)) {
>          ukey->state = dst;
>      } else {
> -        struct ds ds = DS_EMPTY_INITIALIZER;
> +        struct ds ds2 = DS_EMPTY_INITIALIZER;
>
> -        odp_format_ufid(&ukey->ufid, &ds);
> +        odp_format_ufid(&ukey->ufid, &ds2);
>          VLOG_WARN_RL(&rl, "Invalid state transition for ukey %s: %d ->
> %d",
> -                     ds_cstr(&ds), ukey->state, dst);
> -        ds_destroy(&ds);
> +                     ds_cstr(&ds2), ukey->state, dst);
> +        ds_destroy(&ds2);
>      }
>  }
>
> @@ -1797,6 +1828,10 @@ ukey_delete(struct umap *umap, struct udpif_key
> *ukey)
>      if (ukey->state < UKEY_DELETED) {
>          cmap_remove(&umap->cmap, &ukey->cmap_node, ukey->hash);
>          ovsrcu_postpone(ukey_delete__, ukey);
> +        struct ds ds = DS_EMPTY_INITIALIZER;
> +        odp_format_ufid(&ukey->ufid, &ds);
> +        VLOG_INFO("%s %d %s transition from %d to DELETED",  __func__,
> __LINE__, ds_cstr(&ds), ukey->state);
> +        ds_destroy(&ds);
>          transition_ukey(ukey, UKEY_DELETED);
>      }
>      ovs_mutex_unlock(&ukey->mutex);
> @@ -2131,6 +2166,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
> *ops, size_t n_ops)
>
>          if (op->ukey) {
>              ovs_mutex_lock(&op->ukey->mutex);
> +            struct ds ds = DS_EMPTY_INITIALIZER;
> +            odp_format_ufid(&op->ukey->ufid, &ds);
> +            VLOG_INFO("%s %d %s transition from %d to EVICTED",
> __func__, __LINE__, ds_cstr(&ds), op->ukey->state);
> +            ds_destroy(&ds);
>              transition_ukey(op->ukey, UKEY_EVICTED);
>              push->used = MAX(stats->used, op->ukey->stats.used);
>              push->tcp_flags = stats->tcp_flags |
> op->ukey->stats.tcp_flags;
> @@ -2217,6 +2256,10 @@ reval_op_init(struct ukey_op *op, enum
> reval_result result,
>  {
>      if (result == UKEY_DELETE) {
>          delete_op_init(udpif, op, ukey);
> +        struct ds ds = DS_EMPTY_INITIALIZER;
> +        odp_format_ufid(&ukey->ufid, &ds);
> +        VLOG_INFO("%s %d %s transition from %d to EVICITING", __func__,
> __LINE__, ds_cstr(&ds), ukey->state);
> +        ds_destroy(&ds);
>          transition_ukey(ukey, UKEY_EVICTING);
>      } else if (result == UKEY_MODIFY) {
>          /* Store the new recircs. */
> @@ -2317,6 +2360,10 @@ revalidate(struct revalidator *revalidator)
>              }
>
>              /* The flow is now confirmed to be in the datapath. */
> +            struct ds ds = DS_EMPTY_INITIALIZER;
> +            odp_format_ufid(&ukey->ufid, &ds);
> +            VLOG_INFO("%s %d %s transition from %d to OPERATIONAL",
> __func__, __LINE__, ds_cstr(&ds), ukey->state);
> +            ds_destroy(&ds);
>              transition_ukey(ukey, UKEY_OPERATIONAL);
>
>              if (!used) {
> @@ -2417,6 +2464,11 @@ revalidator_sweep__(struct revalidator
> *revalidator, bool purge)
>              ovs_mutex_unlock(&ukey->mutex);
>
>              if (ukey_state == UKEY_EVICTED) {
> +
> +                while (global_temp) {
> +                    xsleep(1);
> +                }
> +
>                  /* The common flow deletion case involves deletion of
> the flow
>                   * during the dump phase and ukey deletion here. */
>                  ovs_mutex_lock(&umap->mutex);
> @@ -2471,6 +2523,10 @@ dp_purge_cb(void *aux, unsigned pmd_id)
>          CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>              if (ukey->pmd_id == pmd_id) {
>                  delete_op_init(udpif, &ops[n_ops++], ukey);
> +                struct ds ds = DS_EMPTY_INITIALIZER;
> +                odp_format_ufid(&ukey->ufid, &ds);
> +                VLOG_INFO("%s %d %s transition from %d to EVICITING",
> __func__, __LINE__, ds_cstr(&ds), ukey->state);
> +                ds_destroy(&ds);
>                  transition_ukey(ukey, UKEY_EVICTING);
>
>                  if (n_ops == REVALIDATE_MAX_BATCH) {
>
>

Hi all, oops, wrong patch, I accidently override i with static i, I'll 
retest it again and resend the patch.
Joe Stringer March 20, 2017, 9:09 p.m. | #3
On 19 March 2017 at 08:44, Paul Blakey <paulb@mellanox.com> wrote:
>
>
> On 19/03/2017 17:10, Paul Blakey wrote:
>>>
>>> Hi all,
>>>
>>> While using out patches for HW offload we've noticed we get a ovs
>>> assertion at transition ukey, which tries to
>>> transition the ukey state from EVICTED back to OPERATIONAL.
>>> With furthur investigation it seem that this can happen without our HW
>>> offload patches as there might be a race between handle_upcalls and
>>> revalidate.
>>>
>>> The procedure is as such:
>>>
>>> handle_upcalls receives a new upcall and creates a new ukey, its state
>>> is VISIBLE, it then it installs a flow (FLOW_PUT) to the datapat and
>>> upon success wants to set the ukey state to OPERATIONAL (line 1408). for
>>> that the handler running handle_upcalls wants to reaquirce the ukey
>>> lock, but in the meantime revalidators dump (line 2261) the already
>>> inserted flow and want to delete this flow (line 2328, say because of
>>> openflow db changes, or aging). The revalidator deletes the flow and
>>> moves the ukey from
>>> VISIBLE -> OPERATIONAL (line 2320) -> EVICTING (line 2220) -> EVICTED
>>> (line 2134)
>>>
>>> finally handler succesfuly acquires the flow and now set the state to
>>> OPERERTIONAL which will cause the assert in transition_ukey.
>>>
>>> Line numbers in question are from ofproto/ofproto-dpif-upcall.c
>>>
>>> I can provide a test the could show this happening, basicly adding a
>>> sleep before (writing it now).
>>>
>>> Thanks,
>>> Paul Blakey.
>>>
>>>
>>>
>>
>>
>> Here's a test:
>> 1) Install the below patch
>> 2) Set max-idle to 500ms, n-handlers to 1, revalidators to 10
>> 3) Create a simple setup (like veth pair in OVS_TRAFFIC_VSWITCHD_START
>> traffic testsuite)
>> 4) Ping one of the machines/veths slowly and wait till you get to
>> sleeping part and stop (put: 24 print)
>>
>>
>> Here is my log:
>> .......
>>
>> 2017-03-19T14:48:18.609Z|00121|ofproto_dpif_upcall(handler12)|INFO|ukey_install__
>> 1696 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 0 to
>> VISIBLE
>> 2017-03-19T14:48:18.609Z|00122|ofproto_dpif_upcall(handler12)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 0 -> 1
>>
>> 2017-03-19T14:48:18.609Z|00123|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 1/20 sec 1
>>
>> 2017-03-19T14:48:18.832Z|00001|ofproto_dpif_upcall(revalidator15)|INFO|revalidate
>> 2376 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 1 to
>> OPERATIONAL
>>
>> 2017-03-19T14:48:18.832Z|00002|ofproto_dpif_upcall(revalidator15)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 1 -> 2
>>
>> 2017-03-19T14:48:18.832Z|00003|ofproto_dpif_upcall(revalidator15)|INFO|reval_op_init
>> 2272 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 2 to
>> EVICITING
>>
>> 2017-03-19T14:48:18.832Z|00004|ofproto_dpif_upcall(revalidator15)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 2 -> 3
>>
>> 2017-03-19T14:48:18.832Z|00005|ofproto_dpif_upcall(revalidator15)|INFO|push_dp_ops
>> 2182 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 3 to
>> EVICTED
>>
>> 2017-03-19T14:48:18.832Z|00006|ofproto_dpif_upcall(revalidator15)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 3 -> 4
>>
>> 2017-03-19T14:48:19.609Z|00124|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 2/20 sec 4
>>
>> 2017-03-19T14:48:20.610Z|00125|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 3/20 sec 4
>>
>> 2017-03-19T14:48:21.610Z|00126|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 4/20 sec 4
>>
>> 2017-03-19T14:48:22.610Z|00127|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 5/20 sec 4
>>
>> 2017-03-19T14:48:23.610Z|00128|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 6/20 sec 4
>>
>> 2017-03-19T14:48:24.610Z|00129|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 7/20 sec 4
>>
>> 2017-03-19T14:48:25.611Z|00130|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 8/20 sec 4
>>
>> 2017-03-19T14:48:26.611Z|00131|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 9/20 sec 4
>>
>> 2017-03-19T14:48:27.611Z|00132|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 10/20 sec 4
>>
>> 2017-03-19T14:48:28.611Z|00133|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 11/20 sec 4
>>
>> 2017-03-19T14:48:29.611Z|00134|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 12/20 sec 4
>>
>> 2017-03-19T14:48:30.611Z|00135|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 13/20 sec 4
>>
>> 2017-03-19T14:48:31.611Z|00136|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 14/20 sec 4
>>
>> 2017-03-19T14:48:32.612Z|00137|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 15/20 sec 4
>>
>> 2017-03-19T14:48:33.612Z|00138|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 16/20 sec 4
>>
>> 2017-03-19T14:48:34.612Z|00139|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 17/20 sec 4
>>
>> 2017-03-19T14:48:35.612Z|00140|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 18/20 sec 4
>>
>> 2017-03-19T14:48:36.612Z|00141|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 19/20 sec 4
>>
>> 2017-03-19T14:48:37.612Z|00142|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1423 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 sleeping 20/20 sec 4
>> 2017-03-19T14:48:38.613Z|00143|ofproto_dpif_upcall(handler12)|INFO|put: 25
>>
>> 2017-03-19T14:48:38.613Z|00144|ofproto_dpif_upcall(handler12)|INFO|handle_upcalls
>> 1434 ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1 transition from 4 to
>> OPERATIONAL
>> 2017-03-19T14:48:38.613Z|00145|ofproto_dpif_upcall(handler12)|INFO|state
>> transition for ukey ufid:60dc4fe2-5166-4817-99e9-e62d79d6a7e1: 4 -> 2
>>
>> 2017-03-19T14:48:38.613Z|00146|util(handler12)|EMER|ofproto/ofproto-dpif-upcall.c:1716:
>> assertion dst >= ukey->state failed in transition_ukey()
>>
>>
>>
>>
>> commit 4755254db3053bcf7ceac88fedce60fd228c8640 (HEAD)
>> Author: Paul Blakey <paulb@mellanox.com>
>> Date:   Sun Mar 19 17:00:30 2017 +0200
>>
>>     test
>>     add prints and forcefully slow down some actions (EVICTED->DELETED,
>> and VISIBLE -> OPERATIONAL)
>>
>>
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index 660383f..0dfffdb 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -45,6 +45,8 @@
>>  #define UPCALL_MAX_BATCH 64
>>  #define REVALIDATE_MAX_BATCH 50
>>
>> +int global_temp = 0;
>> +
>>  VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);
>>
>>  COVERAGE_DEFINE(dumped_duplicate_flow);
>> @@ -1399,15 +1401,34 @@ handle_upcalls(struct udpif *udpif, struct
>> upcall *upcalls,
>>      dpif_operate(udpif->dpif, opsp, n_opsp);
>>      for (i = 0; i < n_ops; i++) {
>>          struct udpif_key *ukey = ops[i].ukey;
>> +        static int i = 0;
>>
>>          if (ukey) {
>> +            struct ds ds = DS_EMPTY_INITIALIZER;
>> +            odp_format_ufid(&ukey->ufid, &ds);
>> +
>> +            if (i++%25 == 25-1) {
>> +                global_temp = 1;
>> +                for (int z = 0; z < 20; z++) {
>> +                    VLOG_INFO("%s %d %s sleeping %d/%d sec %d",
>> __func__, __LINE__, ds_cstr(&ds), z+1, 20, ukey->state);
>> +                    xsleep(1);
>> +                }
>> +            }
>> +            VLOG_INFO("put: %d", i);
>> +
>>              ovs_mutex_lock(&ukey->mutex);
>>              if (ops[i].dop.error) {
>> +                VLOG_INFO("%s %d %s transition from %d to EVICTED",
>> __func__, __LINE__, ds_cstr(&ds), ukey->state);
>>                  transition_ukey(ukey, UKEY_EVICTED);
>>              } else {
>> +                VLOG_INFO("%s %d %s transition from %d to OPERATIONAL",
>>  __func__, __LINE__, ds_cstr(&ds), ukey->state);
>>                  transition_ukey(ukey, UKEY_OPERATIONAL);
>>              }
>>              ovs_mutex_unlock(&ukey->mutex);
>> +
>> +            global_temp = 0;
>> +
>> +            ds_destroy(&ds);
>>          }
>>      }
>>  }
>> @@ -1659,6 +1680,10 @@ ukey_install__(struct udpif *udpif, struct
>> udpif_key *new_ukey)
>>      } else {
>>          ovs_mutex_lock(&new_ukey->mutex);
>>          cmap_insert(&umap->cmap, &new_ukey->cmap_node, new_ukey->hash);
>> +        struct ds ds = DS_EMPTY_INITIALIZER;
>> +        odp_format_ufid(&new_ukey->ufid, &ds);
>> +        VLOG_INFO("%s %d %s transition from %d to VISIBLE",  __func__,
>> __LINE__, ds_cstr(&ds), new_ukey->state);
>> +        ds_destroy(&ds);
>>          transition_ukey(new_ukey, UKEY_VISIBLE);
>>          locked = true;
>>      }
>> @@ -1671,6 +1696,12 @@ static void
>>  transition_ukey(struct udpif_key *ukey, enum ukey_state dst)
>>      OVS_REQUIRES(ukey->mutex)
>>  {
>> +    struct ds ds = DS_EMPTY_INITIALIZER;
>> +
>> +    odp_format_ufid(&ukey->ufid, &ds);
>> +    VLOG_INFO("state transition for ukey %s: %d -> %d", ds_cstr(&ds),
>> ukey->state, dst);
>> +    ds_destroy(&ds);
>> +
>>      ovs_assert(dst >= ukey->state);
>>      if (ukey->state == dst && dst == UKEY_OPERATIONAL) {
>>          return;
>> @@ -1699,12 +1730,12 @@ transition_ukey(struct udpif_key *ukey, enum
>> ukey_state dst)
>>                                     dst < UKEY_DELETED)) {
>>          ukey->state = dst;
>>      } else {
>> -        struct ds ds = DS_EMPTY_INITIALIZER;
>> +        struct ds ds2 = DS_EMPTY_INITIALIZER;
>>
>> -        odp_format_ufid(&ukey->ufid, &ds);
>> +        odp_format_ufid(&ukey->ufid, &ds2);
>>          VLOG_WARN_RL(&rl, "Invalid state transition for ukey %s: %d ->
>> %d",
>> -                     ds_cstr(&ds), ukey->state, dst);
>> -        ds_destroy(&ds);
>> +                     ds_cstr(&ds2), ukey->state, dst);
>> +        ds_destroy(&ds2);
>>      }
>>  }
>>
>> @@ -1797,6 +1828,10 @@ ukey_delete(struct umap *umap, struct udpif_key
>> *ukey)
>>      if (ukey->state < UKEY_DELETED) {
>>          cmap_remove(&umap->cmap, &ukey->cmap_node, ukey->hash);
>>          ovsrcu_postpone(ukey_delete__, ukey);
>> +        struct ds ds = DS_EMPTY_INITIALIZER;
>> +        odp_format_ufid(&ukey->ufid, &ds);
>> +        VLOG_INFO("%s %d %s transition from %d to DELETED",  __func__,
>> __LINE__, ds_cstr(&ds), ukey->state);
>> +        ds_destroy(&ds);
>>          transition_ukey(ukey, UKEY_DELETED);
>>      }
>>      ovs_mutex_unlock(&ukey->mutex);
>> @@ -2131,6 +2166,10 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
>> *ops, size_t n_ops)
>>
>>          if (op->ukey) {
>>              ovs_mutex_lock(&op->ukey->mutex);
>> +            struct ds ds = DS_EMPTY_INITIALIZER;
>> +            odp_format_ufid(&op->ukey->ufid, &ds);
>> +            VLOG_INFO("%s %d %s transition from %d to EVICTED",
>> __func__, __LINE__, ds_cstr(&ds), op->ukey->state);
>> +            ds_destroy(&ds);
>>              transition_ukey(op->ukey, UKEY_EVICTED);
>>              push->used = MAX(stats->used, op->ukey->stats.used);
>>              push->tcp_flags = stats->tcp_flags |
>> op->ukey->stats.tcp_flags;
>> @@ -2217,6 +2256,10 @@ reval_op_init(struct ukey_op *op, enum
>> reval_result result,
>>  {
>>      if (result == UKEY_DELETE) {
>>          delete_op_init(udpif, op, ukey);
>> +        struct ds ds = DS_EMPTY_INITIALIZER;
>> +        odp_format_ufid(&ukey->ufid, &ds);
>> +        VLOG_INFO("%s %d %s transition from %d to EVICITING", __func__,
>> __LINE__, ds_cstr(&ds), ukey->state);
>> +        ds_destroy(&ds);
>>          transition_ukey(ukey, UKEY_EVICTING);
>>      } else if (result == UKEY_MODIFY) {
>>          /* Store the new recircs. */
>> @@ -2317,6 +2360,10 @@ revalidate(struct revalidator *revalidator)
>>              }
>>
>>              /* The flow is now confirmed to be in the datapath. */
>> +            struct ds ds = DS_EMPTY_INITIALIZER;
>> +            odp_format_ufid(&ukey->ufid, &ds);
>> +            VLOG_INFO("%s %d %s transition from %d to OPERATIONAL",
>> __func__, __LINE__, ds_cstr(&ds), ukey->state);
>> +            ds_destroy(&ds);
>>              transition_ukey(ukey, UKEY_OPERATIONAL);
>>
>>              if (!used) {
>> @@ -2417,6 +2464,11 @@ revalidator_sweep__(struct revalidator
>> *revalidator, bool purge)
>>              ovs_mutex_unlock(&ukey->mutex);
>>
>>              if (ukey_state == UKEY_EVICTED) {
>> +
>> +                while (global_temp) {
>> +                    xsleep(1);
>> +                }
>> +
>>                  /* The common flow deletion case involves deletion of
>> the flow
>>                   * during the dump phase and ukey deletion here. */
>>                  ovs_mutex_lock(&umap->mutex);
>> @@ -2471,6 +2523,10 @@ dp_purge_cb(void *aux, unsigned pmd_id)
>>          CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>>              if (ukey->pmd_id == pmd_id) {
>>                  delete_op_init(udpif, &ops[n_ops++], ukey);
>> +                struct ds ds = DS_EMPTY_INITIALIZER;
>> +                odp_format_ufid(&ukey->ufid, &ds);
>> +                VLOG_INFO("%s %d %s transition from %d to EVICITING",
>> __func__, __LINE__, ds_cstr(&ds), ukey->state);
>> +                ds_destroy(&ds);
>>                  transition_ukey(ukey, UKEY_EVICTING);
>>
>>                  if (n_ops == REVALIDATE_MAX_BATCH) {
>
>
>
> What I also notice happening without the second xsleep (in
> revalidator_sweep__) is that the ukey gets deleted entirely and handler is
> stuck on acquiring the non existent lock.

Any xsleep is a bit dangerous here because xsleep will quiesce, and
we're relying on the fact that handlers will not quiesce RCU while
installing flows. If it quiesces, then the memory may end up being
properly freed so you would have to perform a fresh lookup to find the
ukey.

Patch

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 660383f..0dfffdb 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -45,6 +45,8 @@ 
  #define UPCALL_MAX_BATCH 64
  #define REVALIDATE_MAX_BATCH 50

+int global_temp = 0;
+
  VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall);

  COVERAGE_DEFINE(dumped_duplicate_flow);
@@ -1399,15 +1401,34 @@  handle_upcalls(struct udpif *udpif, struct 
upcall *upcalls,
      dpif_operate(udpif->dpif, opsp, n_opsp);
      for (i = 0; i < n_ops; i++) {
          struct udpif_key *ukey = ops[i].ukey;
+        static int i = 0;

          if (ukey) {
+            struct ds ds = DS_EMPTY_INITIALIZER;
+            odp_format_ufid(&ukey->ufid, &ds);
+
+            if (i++%25 == 25-1) {
+                global_temp = 1;
+                for (int z = 0; z < 20; z++) {
+                    VLOG_INFO("%s %d %s sleeping %d/%d sec %d", 
__func__, __LINE__, ds_cstr(&ds), z+1, 20, ukey->state);
+                    xsleep(1);
+                }
+            }
+            VLOG_INFO("put: %d", i);
+
              ovs_mutex_lock(&ukey->mutex);
              if (ops[i].dop.error) {
+                VLOG_INFO("%s %d %s transition from %d to EVICTED", 
__func__, __LINE__, ds_cstr(&ds), ukey->state);
                  transition_ukey(ukey, UKEY_EVICTED);
              } else {
+                VLOG_INFO("%s %d %s transition from %d to OPERATIONAL", 
  __func__, __LINE__, ds_cstr(&ds), ukey->state);
                  transition_ukey(ukey, UKEY_OPERATIONAL);
              }
              ovs_mutex_unlock(&ukey->mutex);
+
+            global_temp = 0;
+