diff mbox series

[ovs-dev,ovs-dev,v3,2/4] ofproto-dpif-upcall: fix race condition

Message ID 20220923162920.612492-2-hepeng.0320@bytedance.com
State Rejected
Headers show
Series [ovs-dev,ovs-dev,v3,1/4] ofproto-dpif-upcall: fix push_dp_ops | expand

Commit Message

Peng He Sept. 23, 2022, 4:29 p.m. UTC
There is a race condition between the revalidator threads and
the handler/pmd threads.

revalidator                          PMD threads
push_dp_ops deletes a key and tries
to del the dp magaflow.
                                     does the upcall, generates a new ukey,
                                     and replaces the old ukey, now the old
                                     ukey state is UKEY_DELETED

dp_ops succeeds, tries to change
the old ukey's state into
UKEY_EVICTED, however, the old
ukey's state is already UKEY_DELETED,
so OVS aborts.

I did not observe this in the real environment, as it takes time for
PMDs to finish the upcall and replace the old ukeys. Normally, the
revalidator will change ukey state into UKEY_EVICTED first.
But it's better to cover this case.

Signed-off-by: Peng He <hepeng.0320@bytedance.com>
---
 ofproto/ofproto-dpif-upcall.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Eelco Chaudron Sept. 30, 2022, 2:37 p.m. UTC | #1
On 23 Sep 2022, at 18:29, Peng He wrote:

> There is a race condition between the revalidator threads and
> the handler/pmd threads.
>
> revalidator                          PMD threads
> push_dp_ops deletes a key and tries
> to del the dp magaflow.
>                                      does the upcall, generates a new ukey,
>                                      and replaces the old ukey, now the old
>                                      ukey state is UKEY_DELETED
>
> dp_ops succeeds, tries to change
> the old ukey's state into
> UKEY_EVICTED, however, the old
> ukey's state is already UKEY_DELETED,
> so OVS aborts.

Can you give a call trace example, as try_ukey_replace() should handle this?

If it is a valid state change, it should be handled transition_ukey() not by bypassing the call to it.

> I did not observe this in the real environment, as it takes time for
> PMDs to finish the upcall and replace the old ukeys. Normally, the
> revalidator will change ukey state into UKEY_EVICTED first.
> But it's better to cover this case.
>
> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> ---
>  ofproto/ofproto-dpif-upcall.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 7ea2a30f5..e8bbcfeaf 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops)
>          if (op->dop.error) {
>              if (op->ukey) {
>                  ovs_mutex_lock(&op->ukey->mutex);
> -                transition_ukey(op->ukey, UKEY_EVICTED);
> +                if (op->ukey->state < UKEY_EVICTED) {
> +                    transition_ukey(op->ukey, UKEY_EVICTED);
> +                }
>                  ovs_mutex_unlock(&op->ukey->mutex);
>              }
>              /* if it's a flow_del error, 'stats' is unusable, it's ok
> @@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops)
>
>          if (op->ukey) {
>              ovs_mutex_lock(&op->ukey->mutex);
> -            transition_ukey(op->ukey, UKEY_EVICTED);
> +            if (op->ukey->state < UKEY_EVICTED) {
> +                transition_ukey(op->ukey, UKEY_EVICTED);
> +            }
>              push->used = MAX(stats->used, op->ukey->stats.used);
>              push->tcp_flags = stats->tcp_flags | op->ukey->stats.tcp_flags;
>              push->n_packets = stats->n_packets - op->ukey->stats.n_packets;
> -- 
> 2.25.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Peng He Sept. 30, 2022, 2:46 p.m. UTC | #2
Eelco Chaudron <echaudro@redhat.com> 于2022年9月30日周五 22:37写道:

>
>
> On 23 Sep 2022, at 18:29, Peng He wrote:
>
> > There is a race condition between the revalidator threads and
> > the handler/pmd threads.
> >
> > revalidator                          PMD threads
> > push_dp_ops deletes a key and tries
> > to del the dp magaflow.
> >                                      does the upcall, generates a new
> ukey,
> >                                      and replaces the old ukey, now the
> old
> >                                      ukey state is UKEY_DELETED
> >
> > dp_ops succeeds, tries to change
> > the old ukey's state into
> > UKEY_EVICTED, however, the old
> > ukey's state is already UKEY_DELETED,
> > so OVS aborts.
>
> Can you give a call trace example, as try_ukey_replace() should handle
> this?
>
> If it is a valid state change, it should be handled transition_ukey() not
> by bypassing the call to it.
>

It means if the ukey->state >=  UKEY_EVICTED, it's not needed to do the
state transition.
It's not a state change. since we have actually two threads moving forward
the state machine.

>
> > I did not observe this in the real environment, as it takes time for
> > PMDs to finish the upcall and replace the old ukeys. Normally, the
> > revalidator will change ukey state into UKEY_EVICTED first.
> > But it's better to cover this case.
> >
> > Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> > ---
> >  ofproto/ofproto-dpif-upcall.c | 8 ++++++--
> >  1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/ofproto/ofproto-dpif-upcall.c
> b/ofproto/ofproto-dpif-upcall.c
> > index 7ea2a30f5..e8bbcfeaf 100644
> > --- a/ofproto/ofproto-dpif-upcall.c
> > +++ b/ofproto/ofproto-dpif-upcall.c
> > @@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
> *ops, size_t n_ops)
> >          if (op->dop.error) {
> >              if (op->ukey) {
> >                  ovs_mutex_lock(&op->ukey->mutex);
> > -                transition_ukey(op->ukey, UKEY_EVICTED);
> > +                if (op->ukey->state < UKEY_EVICTED) {
> > +                    transition_ukey(op->ukey, UKEY_EVICTED);
> > +                }
> >                  ovs_mutex_unlock(&op->ukey->mutex);
> >              }
> >              /* if it's a flow_del error, 'stats' is unusable, it's ok
> > @@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
> *ops, size_t n_ops)
> >
> >          if (op->ukey) {
> >              ovs_mutex_lock(&op->ukey->mutex);
> > -            transition_ukey(op->ukey, UKEY_EVICTED);
> > +            if (op->ukey->state < UKEY_EVICTED) {
> > +                transition_ukey(op->ukey, UKEY_EVICTED);
> > +            }
> >              push->used = MAX(stats->used, op->ukey->stats.used);
> >              push->tcp_flags = stats->tcp_flags |
> op->ukey->stats.tcp_flags;
> >              push->n_packets = stats->n_packets -
> op->ukey->stats.n_packets;
> > --
> > 2.25.1
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Eelco Chaudron Sept. 30, 2022, 3:15 p.m. UTC | #3
On 30 Sep 2022, at 16:46, Peng He wrote:

> Eelco Chaudron <echaudro@redhat.com> 于2022年9月30日周五 22:37写道:
>
>>
>>
>> On 23 Sep 2022, at 18:29, Peng He wrote:
>>
>>> There is a race condition between the revalidator threads and
>>> the handler/pmd threads.
>>>
>>> revalidator                          PMD threads
>>> push_dp_ops deletes a key and tries
>>> to del the dp magaflow.
>>>                                      does the upcall, generates a new
>> ukey,
>>>                                      and replaces the old ukey, now the
>> old
>>>                                      ukey state is UKEY_DELETED
>>>
>>> dp_ops succeeds, tries to change
>>> the old ukey's state into
>>> UKEY_EVICTED, however, the old
>>> ukey's state is already UKEY_DELETED,
>>> so OVS aborts.
>>
>> Can you give a call trace example, as try_ukey_replace() should handle
>> this?
>>
>> If it is a valid state change, it should be handled transition_ukey() not
>> by bypassing the call to it.
>>
>
> It means if the ukey->state >=  UKEY_EVICTED, it's not needed to do the
> state transition.
> It's not a state change. since we have actually two threads moving forward
> the state machine.

Well, I think it should be handled through the state machine, as we still take additional actions as if we would have entered this state.

Can you explain what code path is causing this, as try_ukey_replace() will fail if it’s not in UKEY_EVICTED state?

>>
>>> I did not observe this in the real environment, as it takes time for
>>> PMDs to finish the upcall and replace the old ukeys. Normally, the
>>> revalidator will change ukey state into UKEY_EVICTED first.
>>> But it's better to cover this case.
>>>
>>> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
>>> ---
>>>  ofproto/ofproto-dpif-upcall.c | 8 ++++++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/ofproto/ofproto-dpif-upcall.c
>> b/ofproto/ofproto-dpif-upcall.c
>>> index 7ea2a30f5..e8bbcfeaf 100644
>>> --- a/ofproto/ofproto-dpif-upcall.c
>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>> @@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
>> *ops, size_t n_ops)
>>>          if (op->dop.error) {
>>>              if (op->ukey) {
>>>                  ovs_mutex_lock(&op->ukey->mutex);
>>> -                transition_ukey(op->ukey, UKEY_EVICTED);
>>> +                if (op->ukey->state < UKEY_EVICTED) {
>>> +                    transition_ukey(op->ukey, UKEY_EVICTED);
>>> +                }
>>>                  ovs_mutex_unlock(&op->ukey->mutex);
>>>              }
>>>              /* if it's a flow_del error, 'stats' is unusable, it's ok
>>> @@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
>> *ops, size_t n_ops)
>>>
>>>          if (op->ukey) {
>>>              ovs_mutex_lock(&op->ukey->mutex);
>>> -            transition_ukey(op->ukey, UKEY_EVICTED);
>>> +            if (op->ukey->state < UKEY_EVICTED) {
>>> +                transition_ukey(op->ukey, UKEY_EVICTED);
>>> +            }
>>>              push->used = MAX(stats->used, op->ukey->stats.used);
>>>              push->tcp_flags = stats->tcp_flags |
>> op->ukey->stats.tcp_flags;
>>>              push->n_packets = stats->n_packets -
>> op->ukey->stats.n_packets;
>>> --
>>> 2.25.1
>>>
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>
>>
>
> -- 
> hepeng
Peng He Sept. 30, 2022, 3:42 p.m. UTC | #4
Eelco Chaudron <echaudro@redhat.com> 于2022年9月30日周五 23:15写道:

>
>
> On 30 Sep 2022, at 16:46, Peng He wrote:
>
> > Eelco Chaudron <echaudro@redhat.com> 于2022年9月30日周五 22:37写道:
> >
> >>
> >>
> >> On 23 Sep 2022, at 18:29, Peng He wrote:
> >>
> >>> There is a race condition between the revalidator threads and
> >>> the handler/pmd threads.
> >>>
> >>> revalidator                          PMD threads
> >>> push_dp_ops deletes a key and tries
> >>> to del the dp magaflow.
> >>>                                      does the upcall, generates a new
> >> ukey,
> >>>                                      and replaces the old ukey, now the
> >> old
> >>>                                      ukey state is UKEY_DELETED
> >>>
> >>> dp_ops succeeds, tries to change
> >>> the old ukey's state into
> >>> UKEY_EVICTED, however, the old
> >>> ukey's state is already UKEY_DELETED,
> >>> so OVS aborts.
> >>
> >> Can you give a call trace example, as try_ukey_replace() should handle
> >> this?
> >>
> >> If it is a valid state change, it should be handled transition_ukey()
> not
> >> by bypassing the call to it.
> >>
> >
> > It means if the ukey->state >=  UKEY_EVICTED, it's not needed to do the
> > state transition.
> > It's not a state change. since we have actually two threads moving
> forward
> > the state machine.
>
> Well, I think it should be handled through the state machine, as we still
> take additional actions as if we would have entered this state.
>
> Can you explain what code path is causing this, as try_ukey_replace() will
> fail if it’s not in UKEY_EVICTED state?
>
Yes, I have missed the check in try_ukey_replace(), if so, then the patch
is not needed.



>
> >>
> >>> I did not observe this in the real environment, as it takes time for
> >>> PMDs to finish the upcall and replace the old ukeys. Normally, the
> >>> revalidator will change ukey state into UKEY_EVICTED first.
> >>> But it's better to cover this case.
> >>>
> >>> Signed-off-by: Peng He <hepeng.0320@bytedance.com>
> >>> ---
> >>>  ofproto/ofproto-dpif-upcall.c | 8 ++++++--
> >>>  1 file changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/ofproto/ofproto-dpif-upcall.c
> >> b/ofproto/ofproto-dpif-upcall.c
> >>> index 7ea2a30f5..e8bbcfeaf 100644
> >>> --- a/ofproto/ofproto-dpif-upcall.c
> >>> +++ b/ofproto/ofproto-dpif-upcall.c
> >>> @@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
> >> *ops, size_t n_ops)
> >>>          if (op->dop.error) {
> >>>              if (op->ukey) {
> >>>                  ovs_mutex_lock(&op->ukey->mutex);
> >>> -                transition_ukey(op->ukey, UKEY_EVICTED);
> >>> +                if (op->ukey->state < UKEY_EVICTED) {
> >>> +                    transition_ukey(op->ukey, UKEY_EVICTED);
> >>> +                }
> >>>                  ovs_mutex_unlock(&op->ukey->mutex);
> >>>              }
> >>>              /* if it's a flow_del error, 'stats' is unusable, it's ok
> >>> @@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op
> >> *ops, size_t n_ops)
> >>>
> >>>          if (op->ukey) {
> >>>              ovs_mutex_lock(&op->ukey->mutex);
> >>> -            transition_ukey(op->ukey, UKEY_EVICTED);
> >>> +            if (op->ukey->state < UKEY_EVICTED) {
> >>> +                transition_ukey(op->ukey, UKEY_EVICTED);
> >>> +            }
> >>>              push->used = MAX(stats->used, op->ukey->stats.used);
> >>>              push->tcp_flags = stats->tcp_flags |
> >> op->ukey->stats.tcp_flags;
> >>>              push->n_packets = stats->n_packets -
> >> op->ukey->stats.n_packets;
> >>> --
> >>> 2.25.1
> >>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> dev@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >>
> >>
> >
> > --
> > hepeng
>
>
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 7ea2a30f5..e8bbcfeaf 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -2420,7 +2420,9 @@  push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops)
         if (op->dop.error) {
             if (op->ukey) {
                 ovs_mutex_lock(&op->ukey->mutex);
-                transition_ukey(op->ukey, UKEY_EVICTED);
+                if (op->ukey->state < UKEY_EVICTED) {
+                    transition_ukey(op->ukey, UKEY_EVICTED);
+                }
                 ovs_mutex_unlock(&op->ukey->mutex);
             }
             /* if it's a flow_del error, 'stats' is unusable, it's ok
@@ -2441,7 +2443,9 @@  push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops)
 
         if (op->ukey) {
             ovs_mutex_lock(&op->ukey->mutex);
-            transition_ukey(op->ukey, UKEY_EVICTED);
+            if (op->ukey->state < UKEY_EVICTED) {
+                transition_ukey(op->ukey, UKEY_EVICTED);
+            }
             push->used = MAX(stats->used, op->ukey->stats.used);
             push->tcp_flags = stats->tcp_flags | op->ukey->stats.tcp_flags;
             push->n_packets = stats->n_packets - op->ukey->stats.n_packets;