diff mbox series

[ovs-dev] ofproto-dpif-upcall: Fix ukey installation failure logs and counters.

Message ID 20240404120938.2207251-1-i.maximets@ovn.org
State Accepted
Commit ed379a810ac81bfa7d86543d7b1914b17d2c0063
Headers show
Series [ovs-dev] ofproto-dpif-upcall: Fix ukey installation failure logs and counters. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/intel-ovs-compilation success test: success

Commit Message

Ilya Maximets April 4, 2024, 12:09 p.m. UTC
ukey_install() returns boolean signaling if the ukey was installed
or not.  Installation may fail for a few reasons:

 1. Conflicting ukey.
 2. Mutex contention while trying to replace existing ukey.
 3. The same ukey already exists and active.

Only the first case here signals an actual problem.  Third one is
a little odd for userspace datapath, but harmless.  Second is the
most common one that can easily happen during normal operation
since other threads like revalidators may be currently working on
this ukey preventing an immediate access.

Since only the first case is actually worth logging and it already
has its own log message, removing the 'upcall installation fails'
warning from the upcall_cb().  This should fix most of the random
failures of userspace system tests in CI.

While at it, also fixing coverage counters.  Mutex contention was
mistakenly counted as a duplicate upcall.  ukey contention for
revalidators was counted only in one of two places.

New counter added for the ukey contention on replace.  We should
not re-use existing upcall_ukey_contention counter for this, since
it may lead to double counting.

Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.")
Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.")
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 ofproto/ofproto-dpif-upcall.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

Comments

Eelco Chaudron April 5, 2024, 12:33 p.m. UTC | #1
On 4 Apr 2024, at 14:09, Ilya Maximets wrote:

> ukey_install() returns boolean signaling if the ukey was installed
> or not.  Installation may fail for a few reasons:
>
>  1. Conflicting ukey.
>  2. Mutex contention while trying to replace existing ukey.
>  3. The same ukey already exists and active.
>
> Only the first case here signals an actual problem.  Third one is
> a little odd for userspace datapath, but harmless.  Second is the
> most common one that can easily happen during normal operation
> since other threads like revalidators may be currently working on
> this ukey preventing an immediate access.
>
> Since only the first case is actually worth logging and it already
> has its own log message, removing the 'upcall installation fails'
> warning from the upcall_cb().  This should fix most of the random
> failures of userspace system tests in CI.
>
> While at it, also fixing coverage counters.  Mutex contention was
> mistakenly counted as a duplicate upcall.  ukey contention for
> revalidators was counted only in one of two places.
>
> New counter added for the ukey contention on replace.  We should
> not re-use existing upcall_ukey_contention counter for this, since
> it may lead to double counting.
>
> Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.")
> Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.")
> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
> ---

Thanks for looking into this, and the patch looks good to me.

Maybe we should have another patch fixing some of the namings?

  upcall_ukey_replace -> ukey_replace
  handler_duplicate_upcall -> duplicate_upcall
  upcall_ukey_contention -> revalidate_ukey_contention -> ukey_contention

Cheers,

Eelco

Acked-by: Eelco Chaudron <echaudro@redhat.com>
Ilya Maximets April 5, 2024, 1:04 p.m. UTC | #2
On 4/5/24 14:33, Eelco Chaudron wrote:
> 
> 
> On 4 Apr 2024, at 14:09, Ilya Maximets wrote:
> 
>> ukey_install() returns boolean signaling if the ukey was installed
>> or not.  Installation may fail for a few reasons:
>>
>>  1. Conflicting ukey.
>>  2. Mutex contention while trying to replace existing ukey.
>>  3. The same ukey already exists and active.
>>
>> Only the first case here signals an actual problem.  Third one is
>> a little odd for userspace datapath, but harmless.  Second is the
>> most common one that can easily happen during normal operation
>> since other threads like revalidators may be currently working on
>> this ukey preventing an immediate access.
>>
>> Since only the first case is actually worth logging and it already
>> has its own log message, removing the 'upcall installation fails'
>> warning from the upcall_cb().  This should fix most of the random
>> failures of userspace system tests in CI.
>>
>> While at it, also fixing coverage counters.  Mutex contention was
>> mistakenly counted as a duplicate upcall.  ukey contention for
>> revalidators was counted only in one of two places.
>>
>> New counter added for the ukey contention on replace.  We should
>> not re-use existing upcall_ukey_contention counter for this, since
>> it may lead to double counting.
>>
>> Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.")
>> Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.")
>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>> ---
> 
> Thanks for looking into this, and the patch looks good to me.
> 
> Maybe we should have another patch fixing some of the namings?
> 
>   upcall_ukey_replace -> ukey_replace
>   handler_duplicate_upcall -> duplicate_upcall
>   upcall_ukey_contention -> revalidate_ukey_contention -> ukey_contention

I had something similar in mind, but I didn't make any radical
renaming since I plan to backport this change.

One more thing is that we would likely want to distinguish
contention in handlers/pmds from contention in revalidators,
so these will need separate counters, I think.

> 
> Cheers,
> 
> Eelco
> 
> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>
Eelco Chaudron April 5, 2024, 1:46 p.m. UTC | #3
On 5 Apr 2024, at 15:04, Ilya Maximets wrote:

> On 4/5/24 14:33, Eelco Chaudron wrote:
>>
>>
>> On 4 Apr 2024, at 14:09, Ilya Maximets wrote:
>>
>>> ukey_install() returns boolean signaling if the ukey was installed
>>> or not.  Installation may fail for a few reasons:
>>>
>>>  1. Conflicting ukey.
>>>  2. Mutex contention while trying to replace existing ukey.
>>>  3. The same ukey already exists and active.
>>>
>>> Only the first case here signals an actual problem.  Third one is
>>> a little odd for userspace datapath, but harmless.  Second is the
>>> most common one that can easily happen during normal operation
>>> since other threads like revalidators may be currently working on
>>> this ukey preventing an immediate access.
>>>
>>> Since only the first case is actually worth logging and it already
>>> has its own log message, removing the 'upcall installation fails'
>>> warning from the upcall_cb().  This should fix most of the random
>>> failures of userspace system tests in CI.
>>>
>>> While at it, also fixing coverage counters.  Mutex contention was
>>> mistakenly counted as a duplicate upcall.  ukey contention for
>>> revalidators was counted only in one of two places.
>>>
>>> New counter added for the ukey contention on replace.  We should
>>> not re-use existing upcall_ukey_contention counter for this, since
>>> it may lead to double counting.
>>>
>>> Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.")
>>> Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.")
>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>> ---
>>
>> Thanks for looking into this, and the patch looks good to me.
>>
>> Maybe we should have another patch fixing some of the namings?
>>
>>   upcall_ukey_replace -> ukey_replace
>>   handler_duplicate_upcall -> duplicate_upcall
>>   upcall_ukey_contention -> revalidate_ukey_contention -> ukey_contention
>
> I had something similar in mind, but I didn't make any radical
> renaming since I plan to backport this change.
>
> One more thing is that we would likely want to distinguish
> contention in handlers/pmds from contention in revalidators,
> so these will need separate counters, I think.

Ack but all can call the update functions ;) We can follow this up after the backport…

>>
>> Cheers,
>>
>> Eelco
>>
>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>
Ilya Maximets April 5, 2024, 11:32 p.m. UTC | #4
On 4/5/24 15:46, Eelco Chaudron wrote:
> 
> 
> On 5 Apr 2024, at 15:04, Ilya Maximets wrote:
> 
>> On 4/5/24 14:33, Eelco Chaudron wrote:
>>>
>>>
>>> On 4 Apr 2024, at 14:09, Ilya Maximets wrote:
>>>
>>>> ukey_install() returns boolean signaling if the ukey was installed
>>>> or not.  Installation may fail for a few reasons:
>>>>
>>>>  1. Conflicting ukey.
>>>>  2. Mutex contention while trying to replace existing ukey.
>>>>  3. The same ukey already exists and active.
>>>>
>>>> Only the first case here signals an actual problem.  Third one is
>>>> a little odd for userspace datapath, but harmless.  Second is the
>>>> most common one that can easily happen during normal operation
>>>> since other threads like revalidators may be currently working on
>>>> this ukey preventing an immediate access.
>>>>
>>>> Since only the first case is actually worth logging and it already
>>>> has its own log message, removing the 'upcall installation fails'
>>>> warning from the upcall_cb().  This should fix most of the random
>>>> failures of userspace system tests in CI.
>>>>
>>>> While at it, also fixing coverage counters.  Mutex contention was
>>>> mistakenly counted as a duplicate upcall.  ukey contention for
>>>> revalidators was counted only in one of two places.
>>>>
>>>> New counter added for the ukey contention on replace.  We should
>>>> not re-use existing upcall_ukey_contention counter for this, since
>>>> it may lead to double counting.
>>>>
>>>> Fixes: 67f08985d769 ("upcall: Replace ukeys for deleted flows.")
>>>> Fixes: 9cec8274ed9a ("ofproto-dpif-upcall: Add VLOG_WARN_RL logs for upcall_cb() error.")
>>>> Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
>>>> ---
>>>
>>> Thanks for looking into this, and the patch looks good to me.
>>>
>>> Maybe we should have another patch fixing some of the namings?
>>>
>>>   upcall_ukey_replace -> ukey_replace
>>>   handler_duplicate_upcall -> duplicate_upcall
>>>   upcall_ukey_contention -> revalidate_ukey_contention -> ukey_contention
>>
>> I had something similar in mind, but I didn't make any radical
>> renaming since I plan to backport this change.
>>
>> One more thing is that we would likely want to distinguish
>> contention in handlers/pmds from contention in revalidators,
>> so these will need separate counters, I think.
> 
> Ack but all can call the update functions ;) We can follow this up after the backport…

Sure.

> 
>>>
>>> Cheers,
>>>
>>> Eelco
>>>
>>> Acked-by: Eelco Chaudron <echaudro@redhat.com>
>>>
> 

Thanks!  Applied and backported down to 2.17.

Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index d88195636..73901b651 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -59,6 +59,7 @@  COVERAGE_DEFINE(handler_duplicate_upcall);
 COVERAGE_DEFINE(revalidate_missed_dp_flow);
 COVERAGE_DEFINE(ukey_dp_change);
 COVERAGE_DEFINE(ukey_invalid_stat_reset);
+COVERAGE_DEFINE(ukey_replace_contention);
 COVERAGE_DEFINE(upcall_flow_limit_grew);
 COVERAGE_DEFINE(upcall_flow_limit_hit);
 COVERAGE_DEFINE(upcall_flow_limit_kill);
@@ -1449,8 +1450,6 @@  upcall_cb(const struct dp_packet *packet, const struct flow *flow, ovs_u128 *ufi
     }
 
     if (upcall.ukey && !ukey_install(udpif, upcall.ukey)) {
-        static struct vlog_rate_limit rll = VLOG_RATE_LIMIT_INIT(1, 1);
-        VLOG_WARN_RL(&rll, "upcall_cb failure: ukey installation fails");
         error = ENOSPC;
     }
 out:
@@ -1948,15 +1947,15 @@  try_ukey_replace(struct umap *umap, struct udpif_key *old_ukey,
             transition_ukey(old_ukey, UKEY_DELETED);
             transition_ukey(new_ukey, UKEY_VISIBLE);
             replaced = true;
+            COVERAGE_INC(upcall_ukey_replace);
+        } else {
+            COVERAGE_INC(handler_duplicate_upcall);
         }
         ovs_mutex_unlock(&old_ukey->mutex);
-    }
-
-    if (replaced) {
-        COVERAGE_INC(upcall_ukey_replace);
     } else {
-        COVERAGE_INC(handler_duplicate_upcall);
+        COVERAGE_INC(ukey_replace_contention);
     }
+
     return replaced;
 }
 
@@ -3009,6 +3008,7 @@  revalidator_sweep__(struct revalidator *revalidator, bool purge)
             /* Handler threads could be holding a ukey lock while it installs a
              * new flow, so don't hang around waiting for access to it. */
             if (ovs_mutex_trylock(&ukey->mutex)) {
+                COVERAGE_INC(upcall_ukey_contention);
                 continue;
             }
             ukey_state = ukey->state;