Message ID | 20220923162920.612492-2-hepeng.0320@bytedance.com |
---|---|
State | Rejected |
Headers | show |
Series | [ovs-dev,ovs-dev,v3,1/4] ofproto-dpif-upcall: fix push_dp_ops | expand |
On 23 Sep 2022, at 18:29, Peng He wrote: > There is a race condition between the revalidator threads and > the handler/pmd threads. > > revalidator PMD threads > push_dp_ops deletes a key and tries > to del the dp magaflow. > does the upcall, generates a new ukey, > and replaces the old ukey, now the old > ukey state is UKEY_DELETED > > dp_ops succeeds, tries to change > the old ukey's state into > UKEY_EVICTED, however, the old > ukey's state is already UKEY_DELETED, > so OVS aborts. Can you give a call trace example, as try_ukey_replace() should handle this? If it is a valid state change, it should be handled transition_ukey() not by bypassing the call to it. > I did not observe this in the real environment, as it takes time for > PMDs to finish the upcall and replace the old ukeys. Normally, the > revalidator will change ukey state into UKEY_EVICTED first. > But it's better to cover this case. > > Signed-off-by: Peng He <hepeng.0320@bytedance.com> > --- > ofproto/ofproto-dpif-upcall.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 7ea2a30f5..e8bbcfeaf 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops) > if (op->dop.error) { > if (op->ukey) { > ovs_mutex_lock(&op->ukey->mutex); > - transition_ukey(op->ukey, UKEY_EVICTED); > + if (op->ukey->state < UKEY_EVICTED) { > + transition_ukey(op->ukey, UKEY_EVICTED); > + } > ovs_mutex_unlock(&op->ukey->mutex); > } > /* if it's a flow_del error, 'stats' is unusable, it's ok > @@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops) > > if (op->ukey) { > ovs_mutex_lock(&op->ukey->mutex); > - transition_ukey(op->ukey, UKEY_EVICTED); > + if (op->ukey->state < UKEY_EVICTED) { > + transition_ukey(op->ukey, UKEY_EVICTED); > + } > push->used = MAX(stats->used, op->ukey->stats.used); > push->tcp_flags = stats->tcp_flags | op->ukey->stats.tcp_flags; > push->n_packets = stats->n_packets - op->ukey->stats.n_packets; > -- > 2.25.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Eelco Chaudron <echaudro@redhat.com> 于2022年9月30日周五 22:37写道: > > > On 23 Sep 2022, at 18:29, Peng He wrote: > > > There is a race condition between the revalidator threads and > > the handler/pmd threads. > > > > revalidator PMD threads > > push_dp_ops deletes a key and tries > > to del the dp magaflow. > > does the upcall, generates a new > ukey, > > and replaces the old ukey, now the > old > > ukey state is UKEY_DELETED > > > > dp_ops succeeds, tries to change > > the old ukey's state into > > UKEY_EVICTED, however, the old > > ukey's state is already UKEY_DELETED, > > so OVS aborts. > > Can you give a call trace example, as try_ukey_replace() should handle > this? > > If it is a valid state change, it should be handled transition_ukey() not > by bypassing the call to it. > It means if the ukey->state >= UKEY_EVICTED, it's not needed to do the state transition. It's not a state change. since we have actually two threads moving forward the state machine. > > > I did not observe this in the real environment, as it takes time for > > PMDs to finish the upcall and replace the old ukeys. Normally, the > > revalidator will change ukey state into UKEY_EVICTED first. > > But it's better to cover this case. > > > > Signed-off-by: Peng He <hepeng.0320@bytedance.com> > > --- > > ofproto/ofproto-dpif-upcall.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/ofproto/ofproto-dpif-upcall.c > b/ofproto/ofproto-dpif-upcall.c > > index 7ea2a30f5..e8bbcfeaf 100644 > > --- a/ofproto/ofproto-dpif-upcall.c > > +++ b/ofproto/ofproto-dpif-upcall.c > > @@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op > *ops, size_t n_ops) > > if (op->dop.error) { > > if (op->ukey) { > > ovs_mutex_lock(&op->ukey->mutex); > > - transition_ukey(op->ukey, UKEY_EVICTED); > > + if (op->ukey->state < UKEY_EVICTED) { > > + transition_ukey(op->ukey, UKEY_EVICTED); > > + } > > ovs_mutex_unlock(&op->ukey->mutex); > > } > > /* if it's a flow_del error, 'stats' is unusable, it's ok > > @@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op > *ops, size_t n_ops) > > > > if (op->ukey) { > > ovs_mutex_lock(&op->ukey->mutex); > > - transition_ukey(op->ukey, UKEY_EVICTED); > > + if (op->ukey->state < UKEY_EVICTED) { > > + transition_ukey(op->ukey, UKEY_EVICTED); > > + } > > push->used = MAX(stats->used, op->ukey->stats.used); > > push->tcp_flags = stats->tcp_flags | > op->ukey->stats.tcp_flags; > > push->n_packets = stats->n_packets - > op->ukey->stats.n_packets; > > -- > > 2.25.1 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >
On 30 Sep 2022, at 16:46, Peng He wrote: > Eelco Chaudron <echaudro@redhat.com> 于2022年9月30日周五 22:37写道: > >> >> >> On 23 Sep 2022, at 18:29, Peng He wrote: >> >>> There is a race condition between the revalidator threads and >>> the handler/pmd threads. >>> >>> revalidator PMD threads >>> push_dp_ops deletes a key and tries >>> to del the dp magaflow. >>> does the upcall, generates a new >> ukey, >>> and replaces the old ukey, now the >> old >>> ukey state is UKEY_DELETED >>> >>> dp_ops succeeds, tries to change >>> the old ukey's state into >>> UKEY_EVICTED, however, the old >>> ukey's state is already UKEY_DELETED, >>> so OVS aborts. >> >> Can you give a call trace example, as try_ukey_replace() should handle >> this? >> >> If it is a valid state change, it should be handled transition_ukey() not >> by bypassing the call to it. >> > > It means if the ukey->state >= UKEY_EVICTED, it's not needed to do the > state transition. > It's not a state change. since we have actually two threads moving forward > the state machine. Well, I think it should be handled through the state machine, as we still take additional actions as if we would have entered this state. Can you explain what code path is causing this, as try_ukey_replace() will fail if it’s not in UKEY_EVICTED state? >> >>> I did not observe this in the real environment, as it takes time for >>> PMDs to finish the upcall and replace the old ukeys. Normally, the >>> revalidator will change ukey state into UKEY_EVICTED first. >>> But it's better to cover this case. >>> >>> Signed-off-by: Peng He <hepeng.0320@bytedance.com> >>> --- >>> ofproto/ofproto-dpif-upcall.c | 8 ++++++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/ofproto/ofproto-dpif-upcall.c >> b/ofproto/ofproto-dpif-upcall.c >>> index 7ea2a30f5..e8bbcfeaf 100644 >>> --- a/ofproto/ofproto-dpif-upcall.c >>> +++ b/ofproto/ofproto-dpif-upcall.c >>> @@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op >> *ops, size_t n_ops) >>> if (op->dop.error) { >>> if (op->ukey) { >>> ovs_mutex_lock(&op->ukey->mutex); >>> - transition_ukey(op->ukey, UKEY_EVICTED); >>> + if (op->ukey->state < UKEY_EVICTED) { >>> + transition_ukey(op->ukey, UKEY_EVICTED); >>> + } >>> ovs_mutex_unlock(&op->ukey->mutex); >>> } >>> /* if it's a flow_del error, 'stats' is unusable, it's ok >>> @@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op >> *ops, size_t n_ops) >>> >>> if (op->ukey) { >>> ovs_mutex_lock(&op->ukey->mutex); >>> - transition_ukey(op->ukey, UKEY_EVICTED); >>> + if (op->ukey->state < UKEY_EVICTED) { >>> + transition_ukey(op->ukey, UKEY_EVICTED); >>> + } >>> push->used = MAX(stats->used, op->ukey->stats.used); >>> push->tcp_flags = stats->tcp_flags | >> op->ukey->stats.tcp_flags; >>> push->n_packets = stats->n_packets - >> op->ukey->stats.n_packets; >>> -- >>> 2.25.1 >>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> > > -- > hepeng
Eelco Chaudron <echaudro@redhat.com> 于2022年9月30日周五 23:15写道: > > > On 30 Sep 2022, at 16:46, Peng He wrote: > > > Eelco Chaudron <echaudro@redhat.com> 于2022年9月30日周五 22:37写道: > > > >> > >> > >> On 23 Sep 2022, at 18:29, Peng He wrote: > >> > >>> There is a race condition between the revalidator threads and > >>> the handler/pmd threads. > >>> > >>> revalidator PMD threads > >>> push_dp_ops deletes a key and tries > >>> to del the dp magaflow. > >>> does the upcall, generates a new > >> ukey, > >>> and replaces the old ukey, now the > >> old > >>> ukey state is UKEY_DELETED > >>> > >>> dp_ops succeeds, tries to change > >>> the old ukey's state into > >>> UKEY_EVICTED, however, the old > >>> ukey's state is already UKEY_DELETED, > >>> so OVS aborts. > >> > >> Can you give a call trace example, as try_ukey_replace() should handle > >> this? > >> > >> If it is a valid state change, it should be handled transition_ukey() > not > >> by bypassing the call to it. > >> > > > > It means if the ukey->state >= UKEY_EVICTED, it's not needed to do the > > state transition. > > It's not a state change. since we have actually two threads moving > forward > > the state machine. > > Well, I think it should be handled through the state machine, as we still > take additional actions as if we would have entered this state. > > Can you explain what code path is causing this, as try_ukey_replace() will > fail if it’s not in UKEY_EVICTED state? > Yes, I have missed the check in try_ukey_replace(), if so, then the patch is not needed. > > >> > >>> I did not observe this in the real environment, as it takes time for > >>> PMDs to finish the upcall and replace the old ukeys. Normally, the > >>> revalidator will change ukey state into UKEY_EVICTED first. > >>> But it's better to cover this case. > >>> > >>> Signed-off-by: Peng He <hepeng.0320@bytedance.com> > >>> --- > >>> ofproto/ofproto-dpif-upcall.c | 8 ++++++-- > >>> 1 file changed, 6 insertions(+), 2 deletions(-) > >>> > >>> diff --git a/ofproto/ofproto-dpif-upcall.c > >> b/ofproto/ofproto-dpif-upcall.c > >>> index 7ea2a30f5..e8bbcfeaf 100644 > >>> --- a/ofproto/ofproto-dpif-upcall.c > >>> +++ b/ofproto/ofproto-dpif-upcall.c > >>> @@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op > >> *ops, size_t n_ops) > >>> if (op->dop.error) { > >>> if (op->ukey) { > >>> ovs_mutex_lock(&op->ukey->mutex); > >>> - transition_ukey(op->ukey, UKEY_EVICTED); > >>> + if (op->ukey->state < UKEY_EVICTED) { > >>> + transition_ukey(op->ukey, UKEY_EVICTED); > >>> + } > >>> ovs_mutex_unlock(&op->ukey->mutex); > >>> } > >>> /* if it's a flow_del error, 'stats' is unusable, it's ok > >>> @@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op > >> *ops, size_t n_ops) > >>> > >>> if (op->ukey) { > >>> ovs_mutex_lock(&op->ukey->mutex); > >>> - transition_ukey(op->ukey, UKEY_EVICTED); > >>> + if (op->ukey->state < UKEY_EVICTED) { > >>> + transition_ukey(op->ukey, UKEY_EVICTED); > >>> + } > >>> push->used = MAX(stats->used, op->ukey->stats.used); > >>> push->tcp_flags = stats->tcp_flags | > >> op->ukey->stats.tcp_flags; > >>> push->n_packets = stats->n_packets - > >> op->ukey->stats.n_packets; > >>> -- > >>> 2.25.1 > >>> > >>> _______________________________________________ > >>> dev mailing list > >>> dev@openvswitch.org > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >> > >> > > > > -- > > hepeng > >
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 7ea2a30f5..e8bbcfeaf 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -2420,7 +2420,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops) if (op->dop.error) { if (op->ukey) { ovs_mutex_lock(&op->ukey->mutex); - transition_ukey(op->ukey, UKEY_EVICTED); + if (op->ukey->state < UKEY_EVICTED) { + transition_ukey(op->ukey, UKEY_EVICTED); + } ovs_mutex_unlock(&op->ukey->mutex); } /* if it's a flow_del error, 'stats' is unusable, it's ok @@ -2441,7 +2443,9 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops) if (op->ukey) { ovs_mutex_lock(&op->ukey->mutex); - transition_ukey(op->ukey, UKEY_EVICTED); + if (op->ukey->state < UKEY_EVICTED) { + transition_ukey(op->ukey, UKEY_EVICTED); + } push->used = MAX(stats->used, op->ukey->stats.used); push->tcp_flags = stats->tcp_flags | op->ukey->stats.tcp_flags; push->n_packets = stats->n_packets - op->ukey->stats.n_packets;
There is a race condition between the revalidator threads and the handler/pmd threads. revalidator PMD threads push_dp_ops deletes a key and tries to del the dp magaflow. does the upcall, generates a new ukey, and replaces the old ukey, now the old ukey state is UKEY_DELETED dp_ops succeeds, tries to change the old ukey's state into UKEY_EVICTED, however, the old ukey's state is already UKEY_DELETED, so OVS aborts. I did not observe this in the real environment, as it takes time for PMDs to finish the upcall and replace the old ukeys. Normally, the revalidator will change ukey state into UKEY_EVICTED first. But it's better to cover this case. Signed-off-by: Peng He <hepeng.0320@bytedance.com> --- ofproto/ofproto-dpif-upcall.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)