| Message ID | 1650021957-10477-1-git-send-email-wenx05124561@163.com |
|---|---|
| State | Changes Requested |
| Headers | show |
| Series | [ovs-dev,v2] netdev-offload: make netdev-offload-tc work with flow-restore-wait | expand |
| Context | Check | Description |
|---|---|---|
| ovsrobot/apply-robot | success | apply and check: success |
| ovsrobot/intel-ovs-compilation | success | test: success |
| ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
On 15 Apr 2022, at 13:25, wenx05124561@163.com wrote: > From: wenxu <wenxu@chinatelecom.cn> > > The netdev-offload in tc mode can't work with flow-restore-wait. > When the vswitchd restart with flow-restore-wait, the tc qdisc > will be delete in netdev_set_flow_api_enabled. The netdev flow > api can be enabled after the flow-restore-wait flag removing. > > Signed-off-by: wenxu <wenxu@chinatelecom.cn> > --- > lib/netdev-linux.c | 16 ++++++++-------- > vswitchd/bridge.c | 18 ++++++++++-------- > 2 files changed, 18 insertions(+), 16 deletions(-) > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > index 9d12502..b6315e3 100644 > --- a/lib/netdev-linux.c > +++ b/lib/netdev-linux.c > @@ -2784,17 +2784,17 @@ netdev_linux_set_policing(struct netdev *netdev_, uint32_t kbits_rate, > goto out; > } > > - /* Remove any existing ingress qdisc. */ > - error = tc_add_del_qdisc(ifindex, false, 0, TC_INGRESS); > - if (error) { > - VLOG_WARN_RL(&rl, "%s: removing policing failed: %s", > - netdev_name, ovs_strerror(error)); > - goto out; > - } > - > if (kbits_rate || kpkts_rate) { > const char *cls_name = "matchall"; > > + /* Remove any existing ingress qdisc. */ > + error = tc_add_del_qdisc(ifindex, false, 0, TC_INGRESS); > + if (error) { > + VLOG_WARN_RL(&rl, "%s: removing policing failed: %s", > + netdev_name, ovs_strerror(error)); > + goto out; > + } Are we sure we are not breaking a corner case here where something might already have been configured, and we are not cleaning it up? I.e. what if we configured some rate limiting, and now we unconfigure it, the values are all zero, and it will not be removed? > + > error = tc_add_del_qdisc(ifindex, true, 0, TC_INGRESS); > if (error) { > VLOG_WARN_RL(&rl, "%s: adding policing qdisc failed: %s", > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > index e328d8e..d8843a7 100644 > --- a/vswitchd/bridge.c > +++ b/vswitchd/bridge.c > @@ -3288,8 +3288,17 @@ bridge_run(void) > } > cfg = ovsrec_open_vswitch_first(idl); > > + /* Once the value of flow-restore-wait is false, we no longer should > + * check its value from the database. */ > + if (cfg && ofproto_get_flow_restore_wait()) { > + ofproto_set_flow_restore_wait(smap_get_bool(&cfg->other_config, > + "flow-restore-wait", false)); > + } > + > if (cfg) { > - netdev_set_flow_api_enabled(&cfg->other_config); > + if (!ofproto_get_flow_restore_wait()) { > + netdev_set_flow_api_enabled(&cfg->other_config); See my previous comment here, this needs more input: However, the main problem with this patch might be the following statement in the documentation: “The default value is <code>false</code>. Changing this value requires restarting the daemon” I’ve dealt with a case in the past that was failing because I enabled offload but did not restart. Unfortunately, I can not remember the details, I think it had something to do with feature detection. I’ve copied on some more folks who might know more details about this requirement. > + } > dpdk_init(&cfg->other_config); > userspace_tso_init(&cfg->other_config); > } > @@ -3300,13 +3309,6 @@ bridge_run(void) > * returns immediately. */ > bridge_init_ofproto(cfg); > > - /* Once the value of flow-restore-wait is false, we no longer should > - * check its value from the database. */ > - if (cfg && ofproto_get_flow_restore_wait()) { > - ofproto_set_flow_restore_wait(smap_get_bool(&cfg->other_config, > - "flow-restore-wait", false)); > - } > - > bridge_run__(); > > /* Re-configure SSL. We do this on every trip through the main loop,
On Fri, Apr 22, 2022 at 1:41 AM Eelco Chaudron <echaudro@redhat.com> wrote: > > > > On 15 Apr 2022, at 13:25, wenx05124561@163.com wrote: > > > From: wenxu <wenxu@chinatelecom.cn> > > > > The netdev-offload in tc mode can't work with flow-restore-wait. > > When the vswitchd restart with flow-restore-wait, the tc qdisc > > will be delete in netdev_set_flow_api_enabled. The netdev flow > > api can be enabled after the flow-restore-wait flag removing. > > > > Signed-off-by: wenxu <wenxu@chinatelecom.cn> Hi, I found this patch useful, but it seems inactive for a long time. I hope we can revive and update it. Regardless of the issues pointed out by Eelco, this patch works well for traffic not going through tunnels, but for tunnelled traffic, e.g. geneve traffic, I found that even with the patch, when OVS starts, the ingress qdisc for the genev_sys_6081 device is deleted, so the traffic is still broken even with flow-restore-wait set. I didn't find yet in the code where it could be deleted. Any hint/insight would be appreciated. > > --- > > lib/netdev-linux.c | 16 ++++++++-------- > > vswitchd/bridge.c | 18 ++++++++++-------- > > 2 files changed, 18 insertions(+), 16 deletions(-) > > > > diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c > > index 9d12502..b6315e3 100644 > > --- a/lib/netdev-linux.c > > +++ b/lib/netdev-linux.c > > @@ -2784,17 +2784,17 @@ netdev_linux_set_policing(struct netdev *netdev_, uint32_t kbits_rate, > > goto out; > > } > > > > - /* Remove any existing ingress qdisc. */ > > - error = tc_add_del_qdisc(ifindex, false, 0, TC_INGRESS); > > - if (error) { > > - VLOG_WARN_RL(&rl, "%s: removing policing failed: %s", > > - netdev_name, ovs_strerror(error)); > > - goto out; > > - } > > - > > if (kbits_rate || kpkts_rate) { > > const char *cls_name = "matchall"; > > > > + /* Remove any existing ingress qdisc. */ > > + error = tc_add_del_qdisc(ifindex, false, 0, TC_INGRESS); > > + if (error) { > > + VLOG_WARN_RL(&rl, "%s: removing policing failed: %s", > > + netdev_name, ovs_strerror(error)); > > + goto out; > > + } > > Are we sure we are not breaking a corner case here where something might already have been configured, and we are not cleaning it up? > > I.e. what if we configured some rate limiting, and now we unconfigure it, the values are all zero, and it will not be removed? > Agree. This is incorrect. I think it is better to use the approach from v1, to avoid configuring qos when flow-restore-wait=true, hopefully good enough for the short period before flow restore is done. > > + > > error = tc_add_del_qdisc(ifindex, true, 0, TC_INGRESS); > > if (error) { > > VLOG_WARN_RL(&rl, "%s: adding policing qdisc failed: %s", > > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > > index e328d8e..d8843a7 100644 > > --- a/vswitchd/bridge.c > > +++ b/vswitchd/bridge.c > > @@ -3288,8 +3288,17 @@ bridge_run(void) > > } > > cfg = ovsrec_open_vswitch_first(idl); > > > > + /* Once the value of flow-restore-wait is false, we no longer should > > + * check its value from the database. */ > > + if (cfg && ofproto_get_flow_restore_wait()) { > > + ofproto_set_flow_restore_wait(smap_get_bool(&cfg->other_config, > > + "flow-restore-wait", false)); > > + } > > + > > if (cfg) { > > - netdev_set_flow_api_enabled(&cfg->other_config); > > + if (!ofproto_get_flow_restore_wait()) { > > + netdev_set_flow_api_enabled(&cfg->other_config); > > See my previous comment here, this needs more input: > > However, the main problem with this patch might be the following statement in the documentation: > > “The default value is <code>false</code>. Changing this value requires restarting the daemon” > > I’ve dealt with a case in the past that was failing because I enabled offload but did not restart. Unfortunately, I can not remember the details, I think it had something to do with feature detection. I’ve copied on some more folks who might know more details about this requirement. > > Based on my test, this seems to be fine, but I am not sure if I am just lucky with my environment. Thanks, Han > > > + } > > dpdk_init(&cfg->other_config); > > userspace_tso_init(&cfg->other_config); > > } > > @@ -3300,13 +3309,6 @@ bridge_run(void) > > * returns immediately. */ > > bridge_init_ofproto(cfg); > > > > - /* Once the value of flow-restore-wait is false, we no longer should > > - * check its value from the database. */ > > - if (cfg && ofproto_get_flow_restore_wait()) { > > - ofproto_set_flow_restore_wait(smap_get_bool(&cfg->other_config, > > - "flow-restore-wait", false)); > > - } > > - > > bridge_run__(); > > > > /* Re-configure SSL. We do this on every trip through the main loop, > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 3/14/24 06:23, Han Zhou wrote: > > > On Fri, Apr 22, 2022 at 1:41 AM Eelco Chaudron <echaudro@redhat.com <mailto:echaudro@redhat.com>> wrote: >> >> >> >> On 15 Apr 2022, at 13:25, wenx05124561@163.com <mailto:wenx05124561@163.com> wrote: >> >> > From: wenxu <wenxu@chinatelecom.cn <mailto:wenxu@chinatelecom.cn>> >> > >> > The netdev-offload in tc mode can't work with flow-restore-wait. >> > When the vswitchd restart with flow-restore-wait, the tc qdisc >> > will be delete in netdev_set_flow_api_enabled. The netdev flow >> > api can be enabled after the flow-restore-wait flag removing. >> > >> > Signed-off-by: wenxu <wenxu@chinatelecom.cn <mailto:wenxu@chinatelecom.cn>> > > Hi, I found this patch useful, but it seems inactive for a long time. I hope we can revive and update it. > > Regardless of the issues pointed out by Eelco, this patch works well for traffic not going through tunnels, but for tunnelled traffic, e.g. geneve traffic, I found that even with the patch, when OVS starts, the ingress qdisc for the genev_sys_6081 device is deleted, so the traffic is still broken even with flow-restore-wait set. I didn't find yet in the code where it could be deleted. Any hint/insight would be appreciated. I'm not sure what is gong on here, but the tunnel offload is special as it works via egress qdisc on a bridge port. I wonder if this one is getting messed with on restart. Just a guess. Best regards, Ilya Maximets.
On Thu, Mar 21, 2024 at 10:05 AM Ilya Maximets <i.maximets@ovn.org> wrote: > > On 3/14/24 06:23, Han Zhou wrote: > > > > > > On Fri, Apr 22, 2022 at 1:41 AM Eelco Chaudron <echaudro@redhat.com <mailto:echaudro@redhat.com>> wrote: > >> > >> > >> > >> On 15 Apr 2022, at 13:25, wenx05124561@163.com <mailto: wenx05124561@163.com> wrote: > >> > >> > From: wenxu <wenxu@chinatelecom.cn <mailto:wenxu@chinatelecom.cn>> > >> > > >> > The netdev-offload in tc mode can't work with flow-restore-wait. > >> > When the vswitchd restart with flow-restore-wait, the tc qdisc > >> > will be delete in netdev_set_flow_api_enabled. The netdev flow > >> > api can be enabled after the flow-restore-wait flag removing. > >> > > >> > Signed-off-by: wenxu <wenxu@chinatelecom.cn <mailto: wenxu@chinatelecom.cn>> > > > > Hi, I found this patch useful, but it seems inactive for a long time. I hope we can revive and update it. > > > > Regardless of the issues pointed out by Eelco, this patch works well for traffic not going through tunnels, but for tunnelled traffic, e.g. geneve traffic, I found that even with the patch, when OVS starts, the ingress qdisc for the genev_sys_6081 device is deleted, so the traffic is still broken even with flow-restore-wait set. I didn't find yet in the code where it could be deleted. Any hint/insight would be appreciated. > > I'm not sure what is gong on here, but the tunnel offload is special > as it works via egress qdisc on a bridge port. I wonder if this one > is getting messed with on restart. Just a guess. > Thanks Ilya for the hints at the last OVN meeting. I checked again and the reason why it didn't work for the tunnelled traffic was clear. The egress qdisc on the bridge port was preserved. It was only because of the genev_sys_6081, which was recreated when OVS starts, because of in my test environment it was using an old version of OVS which didn't have the patch: b5313a8ceca8 ("ofproto: Fix re-creation of tunnel backing interfaces on restart.") After applying the patch, the genev_sys_6081 is not recreated and the ingress qdisc was preserved for flow-restore-wait, and the tunnelled traffic kept working during OVS restart. So, Ilya, what's your feedback to this patch and the comments? Thanks, Han > Best regards, Ilya Maximets.
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c index 9d12502..b6315e3 100644 --- a/lib/netdev-linux.c +++ b/lib/netdev-linux.c @@ -2784,17 +2784,17 @@ netdev_linux_set_policing(struct netdev *netdev_, uint32_t kbits_rate, goto out; } - /* Remove any existing ingress qdisc. */ - error = tc_add_del_qdisc(ifindex, false, 0, TC_INGRESS); - if (error) { - VLOG_WARN_RL(&rl, "%s: removing policing failed: %s", - netdev_name, ovs_strerror(error)); - goto out; - } - if (kbits_rate || kpkts_rate) { const char *cls_name = "matchall"; + /* Remove any existing ingress qdisc. */ + error = tc_add_del_qdisc(ifindex, false, 0, TC_INGRESS); + if (error) { + VLOG_WARN_RL(&rl, "%s: removing policing failed: %s", + netdev_name, ovs_strerror(error)); + goto out; + } + error = tc_add_del_qdisc(ifindex, true, 0, TC_INGRESS); if (error) { VLOG_WARN_RL(&rl, "%s: adding policing qdisc failed: %s", diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index e328d8e..d8843a7 100644 --- a/vswitchd/bridge.c +++ b/vswitchd/bridge.c @@ -3288,8 +3288,17 @@ bridge_run(void) } cfg = ovsrec_open_vswitch_first(idl); + /* Once the value of flow-restore-wait is false, we no longer should + * check its value from the database. */ + if (cfg && ofproto_get_flow_restore_wait()) { + ofproto_set_flow_restore_wait(smap_get_bool(&cfg->other_config, + "flow-restore-wait", false)); + } + if (cfg) { - netdev_set_flow_api_enabled(&cfg->other_config); + if (!ofproto_get_flow_restore_wait()) { + netdev_set_flow_api_enabled(&cfg->other_config); + } dpdk_init(&cfg->other_config); userspace_tso_init(&cfg->other_config); } @@ -3300,13 +3309,6 @@ bridge_run(void) * returns immediately. */ bridge_init_ofproto(cfg); - /* Once the value of flow-restore-wait is false, we no longer should - * check its value from the database. */ - if (cfg && ofproto_get_flow_restore_wait()) { - ofproto_set_flow_restore_wait(smap_get_bool(&cfg->other_config, - "flow-restore-wait", false)); - } - bridge_run__(); /* Re-configure SSL. We do this on every trip through the main loop,