[ovs-dev] timeval: expose unreasonably long poll counter

Message ID 20230830175055.791445-1-aconole@redhat.com
State Changes Requested
Series [ovs-dev] timeval: expose unreasonably long poll counter

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Aaron Conole Aug. 30, 2023, 5:50 p.m. UTC
Martin Kennelly observes that even though this data is available to
humans via the journal/log files, these aren't exactly easy for a
developer to make any kind of behavioral inferences.  This kind of
log and counter would be useful when checking on system health to
let us know that an Open vSwitch component is noticing some kind of
system level hiccup.

Add a new coverage counter to track information on these events, and
let a developer or system engineer know how long these events have
occurred with some historical context.

Reported-at: https://lists.linuxfoundation.org/pipermail/ovs-discuss/2023-June/052523.html
Reported-by: Martin Kennelly <mkennell@redhat.com>
Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/timeval.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Eelco Chaudron Aug. 31, 2023, 7:11 a.m. UTC | #1
On 30 Aug 2023, at 19:50, Aaron Conole wrote:

> Martin Kennelly observes that even though this data is available to
> humans via the journal/log files, these aren't exactly easy for a
> developer to make any kind of behavioral inferences.  This kind of
> log and counter would be useful when checking on system health to
> let us know that an Open vSwitch component is noticing some kind of
> system level hiccup.
>
> Add a new coverage counter to track information on these events, and
> let a developer or system engineer know how long these events have
> occurred with some historical context.
>
> Reported-at: https://lists.linuxfoundation.org/pipermail/ovs-discuss/2023-June/052523.html
> Reported-by: Martin Kennelly <mkennell@redhat.com>
> Signed-off-by: Aaron Conole <aconole@redhat.com>

Thanks for adding this, Aaron! I think it makes sense to be able to see that this has happened even after the log rotates, etc.

Acked-by: Eelco Chaudron <echaudro@redhat.com>

Ilya Maximets Sept. 8, 2023, 10:38 p.m. UTC | #2
On 8/30/23 19:50, Aaron Conole wrote:
> Martin Kennelly observes that even though this data is available to
> humans via the journal/log files, these aren't exactly easy for a
> developer to make any kind of behavioral inferences.  This kind of
> log and counter would be useful when checking on system health to
> let us know that an Open vSwitch component is noticing some kind of
> system level hiccup.
> 
> Add a new coverage counter to track information on these events, and
> let a developer or system engineer know how long these events have
> occurred with some historical context.
> 
> Reported-at: https://lists.linuxfoundation.org/pipermail/ovs-discuss/2023-June/052523.html
> Reported-by: Martin Kennelly <mkennell@redhat.com>
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> ---
>  lib/timeval.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/lib/timeval.c b/lib/timeval.c
> index 193c7bab17..00e5f2a74d 100644
> --- a/lib/timeval.c
> +++ b/lib/timeval.c
> @@ -40,6 +40,7 @@
>  #include "openvswitch/vlog.h"
>  
>  VLOG_DEFINE_THIS_MODULE(timeval);
> +COVERAGE_DEFINE(long_poll_interval);
>  
>  #if !defined(HAVE_CLOCK_GETTIME)
>  typedef unsigned int clockid_t;
> @@ -645,6 +646,8 @@ log_poll_interval(long long int last_wakeup)
>          struct rusage rusage;
>  
>          if (!getrusage_thread(&rusage)) {
> +            COVERAGE_INC(long_poll_interval);

Not counting when getrusage_thread() fails seems strange: it's still
a long poll interval, and it's still going to be logged.

Best regards, Ilya Maximets.
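
One way to address the review comment above might be to bump the counter before the getrusage_thread() check, so that every long poll interval is counted whether or not resource usage is available. The following is only a self-contained sketch under stated assumptions, not the actual follow-up patch: COVERAGE_INC_SKETCH, getrusage_thread_stub, and log_poll_interval_sketch are hypothetical stand-ins for the real OVS macros and functions, and printf stands in for VLOG_WARN.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for COVERAGE_DEFINE/COVERAGE_INC; the real macros
 * live in the OVS coverage library and do considerably more. */
static unsigned long long long_poll_interval_count;
#define COVERAGE_INC_SKETCH() (long_poll_interval_count++)

/* Hypothetical stub: returns 0 on success and nonzero on failure, which is
 * the convention implied by the "!getrusage_thread(&rusage)" test. */
static int
getrusage_thread_stub(bool fail)
{
    return fail ? -1 : 0;
}

static void
log_poll_interval_sketch(long long interval_ms, bool rusage_fails)
{
    /* Count first, so the counter does not depend on getrusage. */
    COVERAGE_INC_SKETCH();

    if (!getrusage_thread_stub(rusage_fails)) {
        /* Resource usage is available: the real code logs extra detail. */
        printf("Unreasonably long %lldms poll interval (with rusage)\n",
               interval_ms);
    } else {
        /* Still a long poll interval, and still logged. */
        printf("Unreasonably long %lldms poll interval\n", interval_ms);
    }
}
```

With this ordering, the counter and the number of log messages stay in step on both branches.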

Patch

diff --git a/lib/timeval.c b/lib/timeval.c
index 193c7bab17..00e5f2a74d 100644
--- a/lib/timeval.c
+++ b/lib/timeval.c
@@ -40,6 +40,7 @@ 
 #include "openvswitch/vlog.h"
 
 VLOG_DEFINE_THIS_MODULE(timeval);
+COVERAGE_DEFINE(long_poll_interval);
 
 #if !defined(HAVE_CLOCK_GETTIME)
 typedef unsigned int clockid_t;
@@ -645,6 +646,8 @@  log_poll_interval(long long int last_wakeup)
         struct rusage rusage;
 
         if (!getrusage_thread(&rusage)) {
+            COVERAGE_INC(long_poll_interval);
+
             VLOG_WARN("Unreasonably long %lldms poll interval"
                       " (%lldms user, %lldms system)",
                       interval,