diff mbox series

[ovs-dev] ofproto: report coverage on hitting datapath flow limit

Message ID 20200420134342.17502-1-gmuthukr@redhat.com
State Accepted
Commit e3ca911fc116714466e06de25901978cba8f9718
Headers show
Series [ovs-dev] ofproto: report coverage on hitting datapath flow limit | expand

Commit Message

Gowrishankar Muthukrishnan April 20, 2020, 1:43 p.m. UTC
Whenever the number of flows in the datapath crosses above
the flow limit set/autoconfigured, it is helpful to report
this event through coverage counter for an operator/devops
engineer to know and take proactive corrections in the
switch configuration.

Today, these events are reported in ovs vswitch log when
a new flow can not be inserted in upcall processing in which
case ovs writes a warning, otherwise an auto correction
made by ovs to flush old flows without any intimation at all.

Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
---
 ofproto/ofproto-dpif-upcall.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Gowrishankar Muthukrishnan May 14, 2020, 12:15 p.m. UTC | #1
I see this change fixing a commit b4c632526b pushed a few years back, but
still there is room for instrumentation like below. Could this patch be
reviewed ?.

Thanks,
Gowrishankar

On Mon, Apr 20, 2020 at 7:13 PM Gowrishankar Muthukrishnan <
gmuthukr@redhat.com> wrote:

> Whenever the number of flows in the datapath crosses above
> the flow limit set/autoconfigured, it is helpful to report
> this event through coverage counter for an operator/devops
> engineer to know and take proactive corrections in the
> switch configuration.
>
> Today, these events are reported in ovs vswitch log when
> a new flow can not be inserted in upcall processing in which
> case ovs writes a warning, otherwise an auto correction
> made by ovs to flush old flows without any intimation at all.
>
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
> ---
>  ofproto/ofproto-dpif-upcall.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 5e08ef10d..a76532ec7 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall);
>  COVERAGE_DEFINE(upcall_ukey_contention);
>  COVERAGE_DEFINE(upcall_ukey_replace);
>  COVERAGE_DEFINE(revalidate_missed_dp_flow);
> +COVERAGE_DEFINE(upcall_flow_limit_hit);
>
>  /* A thread that reads upcalls from dpif, forwards each upcall's packet,
>   * and possibly sets up a kernel flow as a cache. */
> @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct
> upcall *upcall)
>
>      atomic_read_relaxed(&udpif->flow_limit, &flow_limit);
>      if (udpif_get_n_flows(udpif) >= flow_limit) {
> +        COVERAGE_INC(upcall_flow_limit_hit);
>          VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached");
>          return false;
>      }
> @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator)
>           *       datapath flows, so we will recover before all the flows
> are
>           *       gone.) */
>          n_dp_flows = udpif_get_n_flows(udpif);
> +        if (n_dp_flows >= flow_limit) {
> +            COVERAGE_INC(upcall_flow_limit_hit);
> +        }
> +
>          kill_them_all = n_dp_flows > flow_limit * 2;
>          max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle;
>
> --
> 2.21.1
>
>
William Tu May 14, 2020, 2:36 p.m. UTC | #2
On Mon, Apr 20, 2020 at 07:13:42PM +0530, Gowrishankar Muthukrishnan wrote:
> Whenever the number of flows in the datapath crosses above
> the flow limit set/autoconfigured, it is helpful to report
> this event through coverage counter for an operator/devops
> engineer to know and take proactive corrections in the
> switch configuration.
> 
> Today, these events are reported in ovs vswitch log when
> a new flow can not be inserted in upcall processing in which
> case ovs writes a warning, otherwise an auto correction
> made by ovs to flush old flows without any intimation at all.
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
> ---

Thanks, the patch looks good to me.

I thought logging to ovs-vswitchd.log is good enough, because
that's usually the first file we look, then if necessary we check
the coverage log. Just curious, do you have some case where you
keep seeing the flow_limit_hit frequently?

Acked-by: William Tu <u9012063@gmail.com>

>  ofproto/ofproto-dpif-upcall.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index 5e08ef10d..a76532ec7 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall);
>  COVERAGE_DEFINE(upcall_ukey_contention);
>  COVERAGE_DEFINE(upcall_ukey_replace);
>  COVERAGE_DEFINE(revalidate_missed_dp_flow);
> +COVERAGE_DEFINE(upcall_flow_limit_hit);
>  
>  /* A thread that reads upcalls from dpif, forwards each upcall's packet,
>   * and possibly sets up a kernel flow as a cache. */
> @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct upcall *upcall)
>  
>      atomic_read_relaxed(&udpif->flow_limit, &flow_limit);
>      if (udpif_get_n_flows(udpif) >= flow_limit) {
> +        COVERAGE_INC(upcall_flow_limit_hit);
>          VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached");
>          return false;
>      }
> @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator)
>           *       datapath flows, so we will recover before all the flows are
>           *       gone.) */
>          n_dp_flows = udpif_get_n_flows(udpif);
> +        if (n_dp_flows >= flow_limit) {
> +            COVERAGE_INC(upcall_flow_limit_hit);
> +        }
> +
>          kill_them_all = n_dp_flows > flow_limit * 2;
>          max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle;
>  
> -- 
> 2.21.1
> 
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Gowrishankar Muthukrishnan May 15, 2020, 11:53 a.m. UTC | #3
On Thu, May 14, 2020 at 8:06 PM William Tu <u9012063@gmail.com> wrote:

> On Mon, Apr 20, 2020 at 07:13:42PM +0530, Gowrishankar Muthukrishnan wrote:
> > Whenever the number of flows in the datapath crosses above
> > the flow limit set/autoconfigured, it is helpful to report
> > this event through coverage counter for an operator/devops
> > engineer to know and take proactive corrections in the
> > switch configuration.
> >
> > Today, these events are reported in ovs vswitch log when
> > a new flow can not be inserted in upcall processing in which
> > case ovs writes a warning, otherwise an auto correction
> > made by ovs to flush old flows without any intimation at all.
> >
> > Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
> > ---
>
> Thanks, the patch looks good to me.
>
> I thought logging to ovs-vswitchd.log is good enough, because
> that's usually the first file we look, then if necessary we check
> the coverage log. Just curious, do you have some case where you
> keep seeing the flow_limit_hit frequently?
>
> I have not had a chance to see this event in deployments, but with a
coverage counter we can check *quickly* going forward.
Thanks for accepting the patch.

Regards,
Gowrishankar

> Acked-by: William Tu <u9012063@gmail.com>
>
> >  ofproto/ofproto-dpif-upcall.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/ofproto/ofproto-dpif-upcall.c
> b/ofproto/ofproto-dpif-upcall.c
> > index 5e08ef10d..a76532ec7 100644
> > --- a/ofproto/ofproto-dpif-upcall.c
> > +++ b/ofproto/ofproto-dpif-upcall.c
> > @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall);
> >  COVERAGE_DEFINE(upcall_ukey_contention);
> >  COVERAGE_DEFINE(upcall_ukey_replace);
> >  COVERAGE_DEFINE(revalidate_missed_dp_flow);
> > +COVERAGE_DEFINE(upcall_flow_limit_hit);
> >
> >  /* A thread that reads upcalls from dpif, forwards each upcall's packet,
> >   * and possibly sets up a kernel flow as a cache. */
> > @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct
> upcall *upcall)
> >
> >      atomic_read_relaxed(&udpif->flow_limit, &flow_limit);
> >      if (udpif_get_n_flows(udpif) >= flow_limit) {
> > +        COVERAGE_INC(upcall_flow_limit_hit);
> >          VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached");
> >          return false;
> >      }
> > @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator)
> >           *       datapath flows, so we will recover before all the
> flows are
> >           *       gone.) */
> >          n_dp_flows = udpif_get_n_flows(udpif);
> > +        if (n_dp_flows >= flow_limit) {
> > +            COVERAGE_INC(upcall_flow_limit_hit);
> > +        }
> > +
> >          kill_them_all = n_dp_flows > flow_limit * 2;
> >          max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle;
> >
> > --
> > 2.21.1
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
Gowrishankar Muthukrishnan July 9, 2020, 6:45 a.m. UTC | #4
Hi,
Any more ack(s) required for this patch to be merged ?.

Regards,
Gowrishankar

On Thu, May 14, 2020 at 8:06 PM William Tu <u9012063@gmail.com> wrote:

> On Mon, Apr 20, 2020 at 07:13:42PM +0530, Gowrishankar Muthukrishnan wrote:
> > Whenever the number of flows in the datapath crosses above
> > the flow limit set/autoconfigured, it is helpful to report
> > this event through coverage counter for an operator/devops
> > engineer to know and take proactive corrections in the
> > switch configuration.
> >
> > Today, these events are reported in ovs vswitch log when
> > a new flow can not be inserted in upcall processing in which
> > case ovs writes a warning, otherwise an auto correction
> > made by ovs to flush old flows without any intimation at all.
> >
> > Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
> > ---
>
> Thanks, the patch looks good to me.
>
> I thought logging to ovs-vswitchd.log is good enough, because
> that's usually the first file we look, then if necessary we check
> the coverage log. Just curious, do you have some case where you
> keep seeing the flow_limit_hit frequently?
>
> Acked-by: William Tu <u9012063@gmail.com>
>
> >  ofproto/ofproto-dpif-upcall.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> >
> > diff --git a/ofproto/ofproto-dpif-upcall.c
> b/ofproto/ofproto-dpif-upcall.c
> > index 5e08ef10d..a76532ec7 100644
> > --- a/ofproto/ofproto-dpif-upcall.c
> > +++ b/ofproto/ofproto-dpif-upcall.c
> > @@ -56,6 +56,7 @@ COVERAGE_DEFINE(handler_duplicate_upcall);
> >  COVERAGE_DEFINE(upcall_ukey_contention);
> >  COVERAGE_DEFINE(upcall_ukey_replace);
> >  COVERAGE_DEFINE(revalidate_missed_dp_flow);
> > +COVERAGE_DEFINE(upcall_flow_limit_hit);
> >
> >  /* A thread that reads upcalls from dpif, forwards each upcall's packet,
> >   * and possibly sets up a kernel flow as a cache. */
> > @@ -1281,6 +1282,7 @@ should_install_flow(struct udpif *udpif, struct
> upcall *upcall)
> >
> >      atomic_read_relaxed(&udpif->flow_limit, &flow_limit);
> >      if (udpif_get_n_flows(udpif) >= flow_limit) {
> > +        COVERAGE_INC(upcall_flow_limit_hit);
> >          VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached");
> >          return false;
> >      }
> > @@ -2624,6 +2626,10 @@ revalidate(struct revalidator *revalidator)
> >           *       datapath flows, so we will recover before all the
> flows are
> >           *       gone.) */
> >          n_dp_flows = udpif_get_n_flows(udpif);
> > +        if (n_dp_flows >= flow_limit) {
> > +            COVERAGE_INC(upcall_flow_limit_hit);
> > +        }
> > +
> >          kill_them_all = n_dp_flows > flow_limit * 2;
> >          max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle;
> >
> > --
> > 2.21.1
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
William Tu July 13, 2020, 11:15 p.m. UTC | #5
On Mon, Apr 20, 2020 at 07:13:42PM +0530, Gowrishankar Muthukrishnan wrote:
> Whenever the number of flows in the datapath crosses above
> the flow limit set/autoconfigured, it is helpful to report
> this event through coverage counter for an operator/devops
> engineer to know and take proactive corrections in the
> switch configuration.
> 
> Today, these events are reported in ovs vswitch log when
> a new flow can not be inserted in upcall processing in which
> case ovs writes a warning, otherwise an auto correction
> made by ovs to flush old flows without any intimation at all.
> 
> Signed-off-by: Gowrishankar Muthukrishnan <gmuthukr@redhat.com>
Applied to master, thanks
William
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 5e08ef10d..a76532ec7 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -56,6 +56,7 @@  COVERAGE_DEFINE(handler_duplicate_upcall);
 COVERAGE_DEFINE(upcall_ukey_contention);
 COVERAGE_DEFINE(upcall_ukey_replace);
 COVERAGE_DEFINE(revalidate_missed_dp_flow);
+COVERAGE_DEFINE(upcall_flow_limit_hit);
 
 /* A thread that reads upcalls from dpif, forwards each upcall's packet,
  * and possibly sets up a kernel flow as a cache. */
@@ -1281,6 +1282,7 @@  should_install_flow(struct udpif *udpif, struct upcall *upcall)
 
     atomic_read_relaxed(&udpif->flow_limit, &flow_limit);
     if (udpif_get_n_flows(udpif) >= flow_limit) {
+        COVERAGE_INC(upcall_flow_limit_hit);
         VLOG_WARN_RL(&rl, "upcall: datapath flow limit reached");
         return false;
     }
@@ -2624,6 +2626,10 @@  revalidate(struct revalidator *revalidator)
          *       datapath flows, so we will recover before all the flows are
          *       gone.) */
         n_dp_flows = udpif_get_n_flows(udpif);
+        if (n_dp_flows >= flow_limit) {
+            COVERAGE_INC(upcall_flow_limit_hit);
+        }
+
         kill_them_all = n_dp_flows > flow_limit * 2;
         max_idle = n_dp_flows > flow_limit ? 100 : ofproto_max_idle;