Message ID | 20200420134342.17502-1-gmuthukr@redhat.com |
---|---|
State | Accepted |
Commit | e3ca911fc116714466e06de25901978cba8f9718 |
Headers | show |
Series | [ovs-dev] ofproto: report coverage on hitting datapath flow limit | expand |
I see this change fixing a commit b4c632526b pushed a few years back, but still there is room for instrumentation like below. Could this patch be reviewed ?. Thanks, Gowrishankar On Mon, Apr 20, 2020 at 7:13 PM Gowrishankar Muthukrishnan < gmuthukr@redhat.com> wrote: > Whenever the number of flows in the datapath crosses above > the flow limit set/autoconfigured, it is helpful to report > this event through coverage counter for an operator/devops > engineer to know and take proactive corrections in the > switch configuration. > > Today, these events are reported in ovs vswitch log when > a new flow can not be inserted in upcall processing in which > case ovs writes a warning, otherwise an auto correction > made by ovs to flush old flows without any intimation at all. > > Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com> > --- > ofproto/ofproto-dpif-upcall.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 5e08ef10d..a76532ec7 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall); > COVERAGE_DEFINE(upcall_ukey_contention); > COVERAGE_DEFINE(upcall_ukey_replace); > COVERAGE_DEFINE(revalidate_missed_dp_flow); > +COVERAGE_DEFINE(upcall_flow_limit_hit); > > /* A thread that reads upcalls from dpif, forwards each upcall's packet, > * and possibly sets up a kernel flow as a cache. */ > @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct > upcall *upcall) > > atomic_read_relaxed(&udpif->flow_limit, &flow_limit); > if (udpif_get_n_flows(udpif) >= flow_limit) { > + COVERAGE_INC(upcall_flow_limit_hit); > VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached"); > return false; > } > @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator) > * datapath flows, so we will recover before all the flows > are > * gone.) */ > n_dp_flows = udpif_get_n_flows(udpif); > + if (n_dp_flows >= flow_limit) { > + COVERAGE_INC(upcall_flow_limit_hit); > + } > + > kill_them_all = n_dp_flows > flow_limit * 2; > max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle; > > -- > 2.21.1 > >
On Mon, Apr 20, 2020 at 07:13:42PM +0530, Gowrishankar Muthukrishnan wrote: > Whenever the number of flows in the datapath crosses above > the flow limit set/autoconfigured, it is helpful to report > this event through coverage counter for an operator/devops > engineer to know and take proactive corrections in the > switch configuration. > > Today, these events are reported in ovs vswitch log when > a new flow can not be inserted in upcall processing in which > case ovs writes a warning, otherwise an auto correction > made by ovs to flush old flows without any intimation at all. > > Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com> > --- Thanks, the patch looks good to me. I thought logging to ovs-vswitchd.log is good enough, because that's usually the first file we look, then if necessary we check the coverage log. Just curious, do you have some case where you keep seeing the flow_limit_hit frequently? Acked-by: William Tu <u9012063@gmail.com> > ofproto/ofproto-dpif-upcall.c | 6 ++++++ > 1 file changed, 6 insertions(+) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 5e08ef10d..a76532ec7 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall); > COVERAGE_DEFINE(upcall_ukey_contention); > COVERAGE_DEFINE(upcall_ukey_replace); > COVERAGE_DEFINE(revalidate_missed_dp_flow); > +COVERAGE_DEFINE(upcall_flow_limit_hit); > > /* A thread that reads upcalls from dpif, forwards each upcall's packet, > * and possibly sets up a kernel flow as a cache. */ > @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct upcall *upcall) > > atomic_read_relaxed(&udpif->flow_limit, &flow_limit); > if (udpif_get_n_flows(udpif) >= flow_limit) { > + COVERAGE_INC(upcall_flow_limit_hit); > VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached"); > return false; > } > @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator) > * datapath flows, so we will recover before all the flows are > * gone.) */ > n_dp_flows = udpif_get_n_flows(udpif); > + if (n_dp_flows >= flow_limit) { > + COVERAGE_INC(upcall_flow_limit_hit); > + } > + > kill_them_all = n_dp_flows > flow_limit * 2; > max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle; > > -- > 2.21.1 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Thu, May 14, 2020 at 8:06 PM William Tu <u9012063@gmail.com> wrote: > On Mon, Apr 20, 2020 at 07:13:42PM +0530, Gowrishankar Muthukrishnan wrote: > > Whenever the number of flows in the datapath crosses above > > the flow limit set/autoconfigured, it is helpful to report > > this event through coverage counter for an operator/devops > > engineer to know and take proactive corrections in the > > switch configuration. > > > > Today, these events are reported in ovs vswitch log when > > a new flow can not be inserted in upcall processing in which > > case ovs writes a warning, otherwise an auto correction > > made by ovs to flush old flows without any intimation at all. > > > > Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com> > > --- > > Thanks, the patch looks good to me. > > I thought logging to ovs-vswitchd.log is good enough, because > that's usually the first file we look, then if necessary we check > the coverage log. Just curious, do you have some case where you > keep seeing the flow_limit_hit frequently? > > I have not had a chance to see this event in deployments, but with a coverage counter we can check *quickly* going forward. Thanks for accepting the patch. Regards, Gowrishankar > Acked-by: William Tu <u9012063@gmail.com> > > > ofproto/ofproto-dpif-upcall.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/ofproto/ofproto-dpif-upcall.c > b/ofproto/ofproto-dpif-upcall.c > > index 5e08ef10d..a76532ec7 100644 > > --- a/ofproto/ofproto-dpif-upcall.c > > +++ b/ofproto/ofproto-dpif-upcall.c > > @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall); > > COVERAGE_DEFINE(upcall_ukey_contention); > > COVERAGE_DEFINE(upcall_ukey_replace); > > COVERAGE_DEFINE(revalidate_missed_dp_flow); > > +COVERAGE_DEFINE(upcall_flow_limit_hit); > > > > /* A thread that reads upcalls from dpif, forwards each upcall's packet, > > * and possibly sets up a kernel flow as a cache. */ > > @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct > upcall *upcall) > > > > atomic_read_relaxed(&udpif->flow_limit, &flow_limit); > > if (udpif_get_n_flows(udpif) >= flow_limit) { > > + COVERAGE_INC(upcall_flow_limit_hit); > > VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached"); > > return false; > > } > > @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator) > > * datapath flows, so we will recover before all the > flows are > > * gone.) */ > > n_dp_flows = udpif_get_n_flows(udpif); > > + if (n_dp_flows >= flow_limit) { > > + COVERAGE_INC(upcall_flow_limit_hit); > > + } > > + > > kill_them_all = n_dp_flows > flow_limit * 2; > > max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle; > > > > -- > > 2.21.1 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >
Hi, Any more ack(s) required for this patch to be merged ?. Regards, Gowrishankar On Thu, May 14, 2020 at 8:06 PM William Tu <u9012063@gmail.com> wrote: > On Mon, Apr 20, 2020 at 07:13:42PM +0530, Gowrishankar Muthukrishnan wrote: > > Whenever the number of flows in the datapath crosses above > > the flow limit set/autoconfigured, it is helpful to report > > this event through coverage counter for an operator/devops > > engineer to know and take proactive corrections in the > > switch configuration. > > > > Today, these events are reported in ovs vswitch log when > > a new flow can not be inserted in upcall processing in which > > case ovs writes a warning, otherwise an auto correction > > made by ovs to flush old flows without any intimation at all. > > > > Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com> > > --- > > Thanks, the patch looks good to me. > > I thought logging to ovs-vswitchd.log is good enough, because > that's usually the first file we look, then if necessary we check > the coverage log. Just curious, do you have some case where you > keep seeing the flow_limit_hit frequently? > > Acked-by: William Tu <u9012063@gmail.com> > > > ofproto/ofproto-dpif-upcall.c | 6 ++++++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/ofproto/ofproto-dpif-upcall.c > b/ofproto/ofproto-dpif-upcall.c > > index 5e08ef10d..a76532ec7 100644 > > --- a/ofproto/ofproto-dpif-upcall.c > > +++ b/ofproto/ofproto-dpif-upcall.c > > @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall); > > COVERAGE_DEFINE(upcall_ukey_contention); > > COVERAGE_DEFINE(upcall_ukey_replace); > > COVERAGE_DEFINE(revalidate_missed_dp_flow); > > +COVERAGE_DEFINE(upcall_flow_limit_hit); > > > > /* A thread that reads upcalls from dpif, forwards each upcall's packet, > > * and possibly sets up a kernel flow as a cache. */ > > @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct > upcall *upcall) > > > > atomic_read_relaxed(&udpif->flow_limit, &flow_limit); > > if (udpif_get_n_flows(udpif) >= flow_limit) { > > + COVERAGE_INC(upcall_flow_limit_hit); > > VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached"); > > return false; > > } > > @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator) > > * datapath flows, so we will recover before all the > flows are > > * gone.) */ > > n_dp_flows = udpif_get_n_flows(udpif); > > + if (n_dp_flows >= flow_limit) { > > + COVERAGE_INC(upcall_flow_limit_hit); > > + } > > + > > kill_them_all = n_dp_flows > flow_limit * 2; > > max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle; > > > > -- > > 2.21.1 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > >
On Mon, Apr 20, 2020 at 07:13:42PM +0530, Gowrishankar Muthukrishnan wrote: > Whenever the number of flows in the datapath crosses above > the flow limit set/autoconfigured, it is helpful to report > this event through coverage counter for an operator/devops > engineer to know and take proactive corrections in the > switch configuration. > > Today, these events are reported in ovs vswitch log when > a new flow can not be inserted in upcall processing in which > case ovs writes a warning, otherwise an auto correction > made by ovs to flush old flows without any intimation at all. > > Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com> Applied to master, thanks William
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 5e08ef10d..a76532ec7 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall); COVERAGE_DEFINE(upcall_ukey_contention); COVERAGE_DEFINE(upcall_ukey_replace); COVERAGE_DEFINE(revalidate_missed_dp_flow); +COVERAGE_DEFINE(upcall_flow_limit_hit); /* A thread that reads upcalls from dpif, forwards each upcall's packet, * and possibly sets up a kernel flow as a cache. */ @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct upcall *upcall) atomic_read_relaxed(&udpif->flow_limit, &flow_limit); if (udpif_get_n_flows(udpif) >= flow_limit) { + COVERAGE_INC(upcall_flow_limit_hit); VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached"); return false; } @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator) * datapath flows, so we will recover before all the flows are * gone.) */ n_dp_flows = udpif_get_n_flows(udpif); + if (n_dp_flows >= flow_limit) { + COVERAGE_INC(upcall_flow_limit_hit); + } + kill_them_all = n_dp_flows > flow_limit * 2; max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle;
Whenever the number of flows in the datapath crosses above the flow limit set/autoconfigured, it is helpful to report this event through coverage counter for an operator/devops engineer to know and take proactive corrections in the switch configuration. Today, these events are reported in ovs vswitch log when a new flow can not be inserted in upcall processing in which case ovs writes a warning, otherwise an auto correction made by ovs to flush old flows without any intimation at all. Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com> --- ofproto/ofproto-dpif-upcall.c | 6 ++++++ 1 file changed, 6 insertions(+)