Message ID | 20221003040220.1153222-2-hepeng.0320@bytedance.com |
---|---|
State | Changes Requested |
Headers | show |
Series | [ovs-dev,v4,1/3] ofproto-dpif-upcall: fix push_dp_ops | expand |
On 3 Oct 2022, at 6:02, Peng He wrote: > push_dp_ops only handles delete ops errors but ignores the modify > ops results. It's better to handle all the dp operation errors in > a consistent way. > > We observe in the production environment that sometimes a megaflow > with wrong actions keep staying in datapath. The coverage command shows > revalidators have dumped several times, however the correct > actions are not set. This implies that the ukey's action does not > equal to the meagaflow's, i.e. revalidators think the underlying > megaflow's actions are correct however they are not. > > We also check the megaflow using the ofproto/trace command, and the > actions are not matched with the ones in the actual magaflow. By > performing a revalidator/purge command, the right actions are set. > > This patch prevents the inconsistency by considering modify failure > in revalidators. > > Signed-off-by: Peng He <hepeng.0320@bytedance.com> > --- > ofproto/ofproto-dpif-upcall.c | 37 ++++++++++++++++++++++++----------- > 1 file changed, 26 insertions(+), 11 deletions(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 7ad728adf..dd1abfdee 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -2416,26 +2416,41 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops) > > for (i = 0; i < n_ops; i++) { > struct ukey_op *op = &ops[i]; > - struct dpif_flow_stats *push, *stats, push_buf; > - > - stats = op->dop.flow_del.stats; > - push = &push_buf; > - > - if (op->dop.type != DPIF_OP_FLOW_DEL) { > - /* Only deleted flows need their stats pushed. */ > - continue; > - } > > if (op->dop.error) { > - /* flow_del error, 'stats' is unusable. */ > if (op->ukey) { > ovs_mutex_lock(&op->ukey->mutex); > - transition_ukey(op->ukey, UKEY_EVICTED); > + if (op->dop.type == DPIF_OP_FLOW_DEL) > + transition_ukey(op->ukey, UKEY_EVICTED); > + else { > + /* modify error has variety reasons, it's easy > + * to transition the state into UKEY_EVICTED, and > + * let revalidator_sweep remove the ukey. > + * > + * In the next round of revalidating, the revalidator > + * can recover the ukey through ukey_create_from_dpif_flow, > + * if recover fails, the revalidator will remove the datapath > + * flow itself. > + */ > + transition_ukey(op->ukey, UKEY_EVICTING); > + transition_ukey(op->ukey, UKEY_EVICTED); > + } > ovs_mutex_unlock(&op->ukey->mutex); > } > + /* if modify or delete fails, there is no need to push stats */ > continue; > } > > + if (op->dop.type != DPIF_OP_FLOW_DEL) { > + /* Only deleted flows need their stats pushed. */ The code looks good, however, what was the conclusion on updating stats on other flows that now got deleted instead of updated? Do they not need their stats pushed? See v3 discussion. > + continue; > + } > + > + struct dpif_flow_stats *push, *stats, push_buf; > + > + stats = op->dop.flow_del.stats; > + push = &push_buf; > + > if (op->ukey) { > ovs_mutex_lock(&op->ukey->mutex); > transition_ukey(op->ukey, UKEY_EVICTED); > -- > 2.25.1
sorry to miss that part, will take some time to think if we need it or not. BTW, looks like the kernel handle_upcalls code path has also this issue. the code is here: for (i = 0; i < n_ops; i++) { struct udpif_key *ukey = ops[i].ukey; if (ukey) { ovs_mutex_lock(&ukey->mutex); if (ops[i].dop.error) { transition_ukey(ukey, UKEY_EVICTED); } else if (ukey->state < UKEY_OPERATIONAL) { transition_ukey(ukey, UKEY_OPERATIONAL); } ovs_mutex_unlock(&ukey->mutex); } } Eelco Chaudron <echaudro@redhat.com> 于2022年10月19日周三 18:44写道: > > > On 3 Oct 2022, at 6:02, Peng He wrote: > > > push_dp_ops only handles delete ops errors but ignores the modify > > ops results. It's better to handle all the dp operation errors in > > a consistent way. > > > > We observe in the production environment that sometimes a megaflow > > with wrong actions keep staying in datapath. The coverage command shows > > revalidators have dumped several times, however the correct > > actions are not set. This implies that the ukey's action does not > > equal to the meagaflow's, i.e. revalidators think the underlying > > megaflow's actions are correct however they are not. > > > > We also check the megaflow using the ofproto/trace command, and the > > actions are not matched with the ones in the actual magaflow. By > > performing a revalidator/purge command, the right actions are set. > > > > This patch prevents the inconsistency by considering modify failure > > in revalidators. > > > > Signed-off-by: Peng He <hepeng.0320@bytedance.com> > > --- > > ofproto/ofproto-dpif-upcall.c | 37 ++++++++++++++++++++++++----------- > > 1 file changed, 26 insertions(+), 11 deletions(-) > > > > diff --git a/ofproto/ofproto-dpif-upcall.c > b/ofproto/ofproto-dpif-upcall.c > > index 7ad728adf..dd1abfdee 100644 > > --- a/ofproto/ofproto-dpif-upcall.c > > +++ b/ofproto/ofproto-dpif-upcall.c > > @@ -2416,26 +2416,41 @@ push_dp_ops(struct udpif *udpif, struct ukey_op > *ops, size_t n_ops) > > > > for (i = 0; i < n_ops; i++) { > > struct ukey_op *op = &ops[i]; > > - struct dpif_flow_stats *push, *stats, push_buf; > > - > > - stats = op->dop.flow_del.stats; > > - push = &push_buf; > > - > > - if (op->dop.type != DPIF_OP_FLOW_DEL) { > > - /* Only deleted flows need their stats pushed. */ > > - continue; > > - } > > > > if (op->dop.error) { > > - /* flow_del error, 'stats' is unusable. */ > > if (op->ukey) { > > ovs_mutex_lock(&op->ukey->mutex); > > - transition_ukey(op->ukey, UKEY_EVICTED); > > + if (op->dop.type == DPIF_OP_FLOW_DEL) > > + transition_ukey(op->ukey, UKEY_EVICTED); > > + else { > > + /* modify error has variety reasons, it's easy > > + * to transition the state into UKEY_EVICTED, and > > + * let revalidator_sweep remove the ukey. > > + * > > + * In the next round of revalidating, the > revalidator > > + * can recover the ukey through > ukey_create_from_dpif_flow, > > + * if recover fails, the revalidator will remove > the datapath > > + * flow itself. > > + */ > > + transition_ukey(op->ukey, UKEY_EVICTING); > > + transition_ukey(op->ukey, UKEY_EVICTED); > > + } > > ovs_mutex_unlock(&op->ukey->mutex); > > } > > + /* if modify or delete fails, there is no need to push > stats */ > > continue; > > } > > > > + if (op->dop.type != DPIF_OP_FLOW_DEL) { > > + /* Only deleted flows need their stats pushed. */ > > The code looks good, however, what was the conclusion on updating stats on > other flows that now got deleted instead of updated? Do they not need their > stats pushed? See v3 discussion. > > > + continue; > > + } > > + > > + struct dpif_flow_stats *push, *stats, push_buf; > > + > > + stats = op->dop.flow_del.stats; > > + push = &push_buf; > > + > > if (op->ukey) { > > ovs_mutex_lock(&op->ukey->mutex); > > transition_ukey(op->ukey, UKEY_EVICTED); > > -- > > 2.25.1 > >
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 7ad728adf..dd1abfdee 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -2416,26 +2416,41 @@ push_dp_ops(struct udpif *udpif, struct ukey_op *ops, size_t n_ops) for (i = 0; i < n_ops; i++) { struct ukey_op *op = &ops[i]; - struct dpif_flow_stats *push, *stats, push_buf; - - stats = op->dop.flow_del.stats; - push = &push_buf; - - if (op->dop.type != DPIF_OP_FLOW_DEL) { - /* Only deleted flows need their stats pushed. */ - continue; - } if (op->dop.error) { - /* flow_del error, 'stats' is unusable. */ if (op->ukey) { ovs_mutex_lock(&op->ukey->mutex); - transition_ukey(op->ukey, UKEY_EVICTED); + if (op->dop.type == DPIF_OP_FLOW_DEL) + transition_ukey(op->ukey, UKEY_EVICTED); + else { + /* modify error has variety reasons, it's easy + * to transition the state into UKEY_EVICTED, and + * let revalidator_sweep remove the ukey. + * + * In the next round of revalidating, the revalidator + * can recover the ukey through ukey_create_from_dpif_flow, + * if recover fails, the revalidator will remove the datapath + * flow itself. + */ + transition_ukey(op->ukey, UKEY_EVICTING); + transition_ukey(op->ukey, UKEY_EVICTED); + } ovs_mutex_unlock(&op->ukey->mutex); } + /* if modify or delete fails, there is no need to push stats */ continue; } + if (op->dop.type != DPIF_OP_FLOW_DEL) { + /* Only deleted flows need their stats pushed. */ + continue; + } + + struct dpif_flow_stats *push, *stats, push_buf; + + stats = op->dop.flow_del.stats; + push = &push_buf; + if (op->ukey) { ovs_mutex_lock(&op->ukey->mutex); transition_ukey(op->ukey, UKEY_EVICTED);
push_dp_ops only handles delete ops errors but ignores the modify ops results. It's better to handle all the dp operation errors in a consistent way. We observe in the production environment that sometimes a megaflow with wrong actions keep staying in datapath. The coverage command shows revalidators have dumped several times, however the correct actions are not set. This implies that the ukey's action does not equal to the meagaflow's, i.e. revalidators think the underlying megaflow's actions are correct however they are not. We also check the megaflow using the ofproto/trace command, and the actions are not matched with the ones in the actual magaflow. By performing a revalidator/purge command, the right actions are set. This patch prevents the inconsistency by considering modify failure in revalidators. Signed-off-by: Peng He <hepeng.0320@bytedance.com> --- ofproto/ofproto-dpif-upcall.c | 37 ++++++++++++++++++++++++----------- 1 file changed, 26 insertions(+), 11 deletions(-)