diff mbox series

[ovs-dev] ofproto-dpif-upcall: Try lock for umap iteration during sweep.

Message ID 20240315072029.1974541-1-i@liuyulong.me
State Superseded
Headers show
Series [ovs-dev] ofproto-dpif-upcall: Try lock for umap iteration during sweep. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation fail test: fail

Commit Message

LIU Yulong March 15, 2024, 7:20 a.m. UTC
A potential race condition happened with the following 3 threads:
* PMD thread replaced the old_ukey and transitioned the state to
  UKEY_DELETED.
* RCU thread is freeing the old_ukey mutex.
* While the revalidator thread is trying to lock the old_ukey mutex.
Then vswitchd process aborts at the revalidator thread try_lock of
ukey->mutex because of the NULL pointer.

This patch adds the try_lock for the ukeys' basket umap to avoid
the PMD and revalidator access to the same umap for replacing the
ukey and transitioning the ukey state.

More details can be found at:
[1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
[2] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html

Signed-off-by: LIU Yulong <i@liuyulong.me>
---
 ofproto/ofproto-dpif-upcall.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Eelco Chaudron March 15, 2024, 8:56 a.m. UTC | #1
Hi,

Thanks for submitting a patch and researching the problem. I haven't had the chance to review your patch yet, as I still need to dedicate some time to review it, based on the complexity. However, if you've sent a new version in the meantime, please make sure to label it with a version number. For reference, here's an example of someone marking their patch as "v3" in the subject line, with additional context provided in the commit messages following the "---".

https://mail.openvswitch.org/pipermail/ovs-dev/2024-March/412232.html

Talking about the subject line, your patch would probably receive a message from the robot about your Subject line. It's advisable to run the checkpatch tool before sending out the patch, as it will likely identify similar issues. Below is an example of running the check on the last commit in your git repository:

./utilities/checkpatch.py -S -1

Lastly, you may consider revising your commit message to ensure clarity on why the issue is problematic (refer to the comment below).

Awaiting your v3 eagerly.

Thanks,

Eelco


On 15 Mar 2024, at 8:20, LIU Yulong wrote:

> A potential race condition happened with the following 3 threads:
> * PMD thread replaced the old_ukey and transitioned the state to
>   UKEY_DELETED.
> * RCU thread is freeing the old_ukey mutex.
> * While the revalidator thread is trying to lock the old_ukey mutex.

Can you explain why this is a problem in more details? The RCU thread should only free the structure if we go through quiescent state. The sweep is using a CMAP which should be fine. We also hold the ukey->mutex mutex, so the state should not change underneath us. Also taking the umap->mutex lock for the entire sweep phase was not the design goal of using a CMAP.

I hope that if I go over the actual code it will be clear. But the idea of the commit message is to tell the full story, so we can verify the code behaves as described.

> Then vswitchd process aborts at the revalidator thread try_lock of
> ukey->mutex because of the NULL pointer.
>
> This patch adds the try_lock for the ukeys' basket umap to avoid
> the PMD and revalidator access to the same umap for replacing the
> ukey and transitioning the ukey state.
>
> More details can be found at:
> [1] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-March/052973.html
> [2] https://mail.openvswitch.org/pipermail/ovs-discuss/2024-February/052949.html
>
> Signed-off-by: LIU Yulong <i@liuyulong.me>
> ---
>  ofproto/ofproto-dpif-upcall.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 9a5c5c29c..ef13f820a 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -2974,6 +2974,10 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge)
>          struct umap *umap = &udpif->ukeys[i];
>          size_t n_ops = 0;
>
> +        if (ovs_mutex_trylock(&umap->mutex)) {
> +            continue;
> +        }
> +
>          CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>              enum ukey_state ukey_state;
>
> @@ -3013,9 +3017,7 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge)
>              if (ukey_state == UKEY_EVICTED) {
>                  /* The common flow deletion case involves deletion of the flow
>                   * during the dump phase and ukey deletion here. */
> -                ovs_mutex_lock(&umap->mutex);
>                  ukey_delete(umap, ukey);
> -                ovs_mutex_unlock(&umap->mutex);
>              }
>
>              if (n_ops == REVALIDATE_MAX_BATCH) {
> @@ -3025,6 +3027,7 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge)
>                  n_ops = 0;
>              }
>          }
> +        ovs_mutex_unlock(&umap->mutex);
>          if (n_ops) {
>              push_ukey_ops(udpif, umap, ops, n_ops);
> -- 
> 2.27.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 9a5c5c29c..ef13f820a 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -2974,6 +2974,10 @@  revalidator_sweep__(struct revalidator *revalidator, bool purge)
         struct umap *umap = &udpif->ukeys[i];
         size_t n_ops = 0;
 
+        if (ovs_mutex_trylock(&umap->mutex)) {
+            continue;
+        }
+
         CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
             enum ukey_state ukey_state;
 
@@ -3013,9 +3017,7 @@  revalidator_sweep__(struct revalidator *revalidator, bool purge)
             if (ukey_state == UKEY_EVICTED) {
                 /* The common flow deletion case involves deletion of the flow
                  * during the dump phase and ukey deletion here. */
-                ovs_mutex_lock(&umap->mutex);
                 ukey_delete(umap, ukey);
-                ovs_mutex_unlock(&umap->mutex);
             }
 
             if (n_ops == REVALIDATE_MAX_BATCH) {
@@ -3025,6 +3027,7 @@  revalidator_sweep__(struct revalidator *revalidator, bool purge)
                 n_ops = 0;
             }
         }
+        ovs_mutex_unlock(&umap->mutex);
 
         if (n_ops) {
             push_ukey_ops(udpif, umap, ops, n_ops);