diff mbox series

[ovs-dev,v10] ofproto-dpif-upcall: Don't set statistics to 0 when they jump back

Message ID 20230523131115.193932-1-bnemeth@redhat.com
State Changes Requested
Headers show
Series [ovs-dev,v10] ofproto-dpif-upcall: Don't set statistics to 0 when they jump back | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Balazs Nemeth May 23, 2023, 1:11 p.m. UTC
The only way that stats->{n_packets,n_bytes} would decrease is due to an
overflow, or if there are bugs in how statistics are handled. In the
past, there were multiple issues that caused a jump backward. A
workaround was in place to set the statistics to 0 in that case. When
this happened while the revalidator was under heavy load, the workaround
had an unintended side effect where should_revalidate returned false
causing the flow to be removed because the metric it calculated was
based on a bogus value. Since many of those bugs have now been
identified and resolved, there is no need to set the statistics to 0. In
addition, the (unlikely) overflow still needs to be handled
appropriately. If an unexpected jump does happen, just log it as a
warning.

Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
---
 ofproto/ofproto-dpif-upcall.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

Comments

Eelco Chaudron May 24, 2023, 10:10 a.m. UTC | #1
On 23 May 2023, at 15:11, Balazs Nemeth wrote:

> The only way that stats->{n_packets,n_bytes} would decrease is due to an
> overflow, or if there are bugs in how statistics are handled. In the
> past, there were multiple issues that caused a jump backward. A
> workaround was in place to set the statistics to 0 in that case. When
> this happened while the revalidator was under heavy load, the workaround
> had an unintended side effect where should_revalidate returned false
> causing the flow to be removed because the metric it calculated was
> based on a bogus value. Since many of those bugs have now been
> identified and resolved, there is no need to set the statistics to 0. In
> addition, the (unlikely) overflow still needs to be handled
> appropriately. If an unexpected jump does happen, just log it as a
> warning.
>
> Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>

Thanks for all the re-work, this revision looks good to me.

Acked-by: Eelco Chaudron <echaudro@redhat.com>

Cheers,

Eelco
Ilya Maximets May 24, 2023, 7:05 p.m. UTC | #2
On 5/23/23 15:11, Balazs Nemeth wrote:
> The only way that stats->{n_packets,n_bytes} would decrease is due to an
> overflow, or if there are bugs in how statistics are handled. In the
> past, there were multiple issues that caused a jump backward. A
> workaround was in place to set the statistics to 0 in that case. When
> this happened while the revalidator was under heavy load, the workaround
> had an unintended side effect where should_revalidate returned false
> causing the flow to be removed because the metric it calculated was
> based on a bogus value. Since many of those bugs have now been
> identified and resolved, there is no need to set the statistics to 0. In
> addition, the (unlikely) overflow still needs to be handled
> appropriately. If an unexpected jump does happen, just log it as a
> warning.
> 
> Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
> ---
>  ofproto/ofproto-dpif-upcall.c | 26 ++++++++++++++++++--------
>  1 file changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
> index cd57fdbd9..b06315d22 100644
> --- a/ofproto/ofproto-dpif-upcall.c
> +++ b/ofproto/ofproto-dpif-upcall.c
> @@ -2368,24 +2368,34 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>      enum reval_result result = UKEY_DELETE;
>      struct dpif_flow_stats push;
>  
> -    ofpbuf_clear(odp_actions);
> -
>      push.used = stats->used;
>      push.tcp_flags = stats->tcp_flags;
> -    push.n_packets = (stats->n_packets > ukey->stats.n_packets
> -                      ? stats->n_packets - ukey->stats.n_packets
> -                      : 0);
> -    push.n_bytes = (stats->n_bytes > ukey->stats.n_bytes
> -                    ? stats->n_bytes - ukey->stats.n_bytes
> -                    : 0);
> +    push.n_packets = stats->n_packets - ukey->stats.n_packets;
> +    push.n_bytes = stats->n_bytes - ukey->stats.n_bytes;
>  
>      if (stats->n_packets < ukey->stats.n_packets &&
>          ukey->stats.n_packets < UINT64_THREE_QUARTERS) {
> +        static struct vlog_rate_limit rll = VLOG_RATE_LIMIT_INIT(1, 5);
> +        struct ds ds = DS_EMPTY_INITIALIZER;
> +
>          /* Report cases where the packet counter is lower than the previous
>           * instance, but exclude the potential wrapping of an uint64_t. */
>          COVERAGE_INC(ukey_invalid_stat_reset);
> +
> +        odp_flow_key_format(ukey->key, ukey->key_len, &ds);
> +        ds_put_cstr(&ds, ", actions:");
> +        format_odp_actions(&ds, odp_actions->data, odp_actions->size, NULL);

The 'odp_actions' here is a scratch pad for the processing below,
it doesn't contain any actual actions for this ukey.
We should get real actions from the ukey->actions with rcu_get()
and print them out instead, e.g.:

        actions = ovsrcu_get(struct ofpbuf *, &ukey->actions);
        format_odp_actions(&ds, actions->data, actions->size, NULL);

> +        odp_format_ufid(&ukey->ufid, &ds);
> +        ds_put_cstr(&ds, " ");

Above two lines should be in the opposite order, otherwise ufid is
glued to the actions.

> +
> +        VLOG_WARN_RL(&rll, "Unexpected jump in packet stats from %"PRIu64
> +                " to %"PRIu64" when handling ukey %s",
> +                stats->n_packets, ukey->stats.n_packets, ds_cstr(&ds));
> +        ds_destroy(&ds);
>      }
>  
> +    ofpbuf_clear(odp_actions);

No need to move this.

Best regards, Ilya Maximets.
Eelco Chaudron May 24, 2023, 7:11 p.m. UTC | #3
On 24 May 2023, at 21:05, Ilya Maximets wrote:

> On 5/23/23 15:11, Balazs Nemeth wrote:
>> The only way that stats->{n_packets,n_bytes} would decrease is due to an
>> overflow, or if there are bugs in how statistics are handled. In the
>> past, there were multiple issues that caused a jump backward. A
>> workaround was in place to set the statistics to 0 in that case. When
>> this happened while the revalidator was under heavy load, the workaround
>> had an unintended side effect where should_revalidate returned false
>> causing the flow to be removed because the metric it calculated was
>> based on a bogus value. Since many of those bugs have now been
>> identified and resolved, there is no need to set the statistics to 0. In
>> addition, the (unlikely) overflow still needs to be handled
>> appropriately. If an unexpected jump does happen, just log it as a
>> warning.
>>
>> Signed-off-by: Balazs Nemeth <bnemeth@redhat.com>
>> ---
>>  ofproto/ofproto-dpif-upcall.c | 26 ++++++++++++++++++--------
>>  1 file changed, 18 insertions(+), 8 deletions(-)
>>
>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>> index cd57fdbd9..b06315d22 100644
>> --- a/ofproto/ofproto-dpif-upcall.c
>> +++ b/ofproto/ofproto-dpif-upcall.c
>> @@ -2368,24 +2368,34 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>>      enum reval_result result = UKEY_DELETE;
>>      struct dpif_flow_stats push;
>>
>> -    ofpbuf_clear(odp_actions);
>> -
>>      push.used = stats->used;
>>      push.tcp_flags = stats->tcp_flags;
>> -    push.n_packets = (stats->n_packets > ukey->stats.n_packets
>> -                      ? stats->n_packets - ukey->stats.n_packets
>> -                      : 0);
>> -    push.n_bytes = (stats->n_bytes > ukey->stats.n_bytes
>> -                    ? stats->n_bytes - ukey->stats.n_bytes
>> -                    : 0);
>> +    push.n_packets = stats->n_packets - ukey->stats.n_packets;
>> +    push.n_bytes = stats->n_bytes - ukey->stats.n_bytes;
>>
>>      if (stats->n_packets < ukey->stats.n_packets &&
>>          ukey->stats.n_packets < UINT64_THREE_QUARTERS) {
>> +        static struct vlog_rate_limit rll = VLOG_RATE_LIMIT_INIT(1, 5);
>> +        struct ds ds = DS_EMPTY_INITIALIZER;
>> +
>>          /* Report cases where the packet counter is lower than the previous
>>           * instance, but exclude the potential wrapping of an uint64_t. */
>>          COVERAGE_INC(ukey_invalid_stat_reset);
>> +
>> +        odp_flow_key_format(ukey->key, ukey->key_len, &ds);
>> +        ds_put_cstr(&ds, ", actions:");
>> +        format_odp_actions(&ds, odp_actions->data, odp_actions->size, NULL);
>
> The 'odp_actions' here is a scratch pad for the processing below,
> it doesn't contain any actual actions for this ukey.
> We should get real actions from the ukey->actions with rcu_get()
> and print them out instead, e.g.:
>
>         actions = ovsrcu_get(struct ofpbuf *, &ukey->actions);
>         format_odp_actions(&ds, actions->data, actions->size, NULL);
>
>> +        odp_format_ufid(&ukey->ufid, &ds);
>> +        ds_put_cstr(&ds, " ");
>
> Above two lines should be in the opposite order, otherwise ufid is
> glued to the actions.

I guess this should be before the odp_flow_key_format() function, missed it when reviewing :( The same as other logs do.

>> +
>> +        VLOG_WARN_RL(&rll, "Unexpected jump in packet stats from %"PRIu64
>> +                " to %"PRIu64" when handling ukey %s",
>> +                stats->n_packets, ukey->stats.n_packets, ds_cstr(&ds));
>> +        ds_destroy(&ds);
>>      }
>>
>> +    ofpbuf_clear(odp_actions);
>
> No need to move this.
>
> Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index cd57fdbd9..b06315d22 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -2368,24 +2368,34 @@  revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
     enum reval_result result = UKEY_DELETE;
     struct dpif_flow_stats push;
 
-    ofpbuf_clear(odp_actions);
-
     push.used = stats->used;
     push.tcp_flags = stats->tcp_flags;
-    push.n_packets = (stats->n_packets > ukey->stats.n_packets
-                      ? stats->n_packets - ukey->stats.n_packets
-                      : 0);
-    push.n_bytes = (stats->n_bytes > ukey->stats.n_bytes
-                    ? stats->n_bytes - ukey->stats.n_bytes
-                    : 0);
+    push.n_packets = stats->n_packets - ukey->stats.n_packets;
+    push.n_bytes = stats->n_bytes - ukey->stats.n_bytes;
 
     if (stats->n_packets < ukey->stats.n_packets &&
         ukey->stats.n_packets < UINT64_THREE_QUARTERS) {
+        static struct vlog_rate_limit rll = VLOG_RATE_LIMIT_INIT(1, 5);
+        struct ds ds = DS_EMPTY_INITIALIZER;
+
         /* Report cases where the packet counter is lower than the previous
          * instance, but exclude the potential wrapping of an uint64_t. */
         COVERAGE_INC(ukey_invalid_stat_reset);
+
+        odp_flow_key_format(ukey->key, ukey->key_len, &ds);
+        ds_put_cstr(&ds, ", actions:");
+        format_odp_actions(&ds, odp_actions->data, odp_actions->size, NULL);
+        odp_format_ufid(&ukey->ufid, &ds);
+        ds_put_cstr(&ds, " ");
+
+        VLOG_WARN_RL(&rll, "Unexpected jump in packet stats from %"PRIu64
+                " to %"PRIu64" when handling ukey %s",
+                stats->n_packets, ukey->stats.n_packets, ds_cstr(&ds));
+        ds_destroy(&ds);
     }
 
+    ofpbuf_clear(odp_actions);
+
     if (need_revalidate) {
         if (should_revalidate(udpif, ukey, push.n_packets)) {
             if (!ukey->xcache) {