diff mbox series

[ovs-dev] netlink-socket: Make log extack messages as ERR rather than DBG

Message ID 20220124092523.58669-1-abhiramrn@gmail.com
State Rejected
Headers show
Series [ovs-dev] netlink-socket: Make log extack messages as ERR rather than DBG | expand

Commit Message

Abhiram RN Jan. 24, 2022, 9:25 a.m. UTC
Making the log extack messages as ERR rather than DBG will help debugging
when we dont know for sure where is the issue. If it is DBG these useful
log extack messages might get lost due to excessive logs. Also these
error messages will be printed only when we enable it.

netlink_socket(revalidator31)|DBG|Dropped 25 log messages in last 1 seconds
 (most recently, 1 seconds ago) due to excessive rate

By checking for the log extack err_msg we are printing error message only
when there is a valid log extack message present and other cases it will
be still kept as DBG so that there is no spamming.

Signed-off-by: Abhiram R N <abhiramrn@gmail.com>
---
 lib/netlink-socket.c  | 10 ++++++++--
 tests/ofproto-dpif.at |  2 +-
 tests/ovs-vswitchd.at |  5 +++--
 3 files changed, 12 insertions(+), 5 deletions(-)

Comments

Ilya Maximets Feb. 2, 2022, 4:42 p.m. UTC | #1
On 1/24/22 10:25, Abhiram R N wrote:
> Making the log extack messages as ERR rather than DBG will help debugging
> when we dont know for sure where is the issue. If it is DBG these useful
> log extack messages might get lost due to excessive logs. Also these
> error messages will be printed only when we enable it.
> 
> netlink_socket(revalidator31)|DBG|Dropped 25 log messages in last 1 seconds
>  (most recently, 1 seconds ago) due to excessive rate
> 
> By checking for the log extack err_msg we are printing error message only
> when there is a valid log extack message present and other cases it will
> be still kept as DBG so that there is no spamming.

Hi, Abhiram.  Thanks for the patch!

The problem with this implementation, however, is that we're getting
a lot of errors in logs during the normal run.  For example, all the
system tests (make check-kernel) are failing for me due to unexpected
erorrs in the log.

OVS does a lot of probing to figure out what features are supported
by the underlying datapath and these may/will produce all sorts of
errors.  I also think that we will have more and more extended error
messages inside the kernel, so errors will be reported even more
frequently.  This makes testing harder and will trigger the monitoring
tools in production environments.  So, I'm not sure how the good
solution should look like.

For the rate limiting though, there is a solution.  You may disable
it for this particular module with the vlog/disable-rate-limit appctl.

Best regards, Ilya Maximets.
Abhiram RN Feb. 3, 2022, 7 a.m. UTC | #2
Hi Ilya,

Thanks for your response.

I understand the problem you have mentioned.

Also for the rate limiting atleast we have solution as you have pointed
out which can be helpful.

Thanks & Regards,
Abhiram R N

On Wed, Feb 02, 2022 at 05:42:29PM +0100, Ilya Maximets wrote:
> On 1/24/22 10:25, Abhiram R N wrote:
> > Making the log extack messages as ERR rather than DBG will help debugging
> > when we dont know for sure where is the issue. If it is DBG these useful
> > log extack messages might get lost due to excessive logs. Also these
> > error messages will be printed only when we enable it.
> > 
> > netlink_socket(revalidator31)|DBG|Dropped 25 log messages in last 1 seconds
> >  (most recently, 1 seconds ago) due to excessive rate
> > 
> > By checking for the log extack err_msg we are printing error message only
> > when there is a valid log extack message present and other cases it will
> > be still kept as DBG so that there is no spamming.
> 
> Hi, Abhiram.  Thanks for the patch!
> 
> The problem with this implementation, however, is that we're getting
> a lot of errors in logs during the normal run.  For example, all the
> system tests (make check-kernel) are failing for me due to unexpected
> erorrs in the log.
> 
> OVS does a lot of probing to figure out what features are supported
> by the underlying datapath and these may/will produce all sorts of
> errors.  I also think that we will have more and more extended error
> messages inside the kernel, so errors will be reported even more
> frequently.  This makes testing harder and will trigger the monitoring
> tools in production environments.  So, I'm not sure how the good
> solution should look like.
> 
> For the rate limiting though, there is a solution.  You may disable
> it for this particular module with the vlog/disable-rate-limit appctl.
> 
> Best regards, Ilya Maximets.
diff mbox series

Patch

diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index 93c1fa561..435a5945f 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -920,9 +920,15 @@  nl_sock_transact_multiple__(struct nl_sock *sock,
         /* Fill in the results for 'txn'. */
         if (nl_msg_nlmsgerr(buf_txn->reply, &txn->error, &err_msg)) {
             if (txn->error) {
-                VLOG_DBG_RL(&rl, "received NAK error=%d - %s",
+                if (err_msg != NULL) {
+                    VLOG_ERR_RL(&rl, "received NAK error=%d - %s",
                             txn->error,
-                            err_msg ? err_msg : ovs_strerror(txn->error));
+                            err_msg);
+                } else {
+                    VLOG_DBG_RL(&rl, "received NAK error=%d - %s",
+                            txn->error,
+                            ovs_strerror(txn->error));
+                }
             }
             if (txn->reply) {
                 ofpbuf_clear(txn->reply);
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 1660b0856..5e7a4f125 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -6542,7 +6542,7 @@  m4_define([CHECK_SFLOW_SAMPLING_PACKET],
   dnl sleep long enough to get more than one counter sample
   dnl from each datasource so we can check sequence numbers
   ovs-appctl time/warp 2000 100
-  OVS_VSWITCHD_STOP
+  OVS_VSWITCHD_STOP("/Unknown device type/d")
   OVS_APP_EXIT_AND_WAIT([test-sflow])
 
   AT_CHECK_UNQUOTED([[sort sflow.log | $EGREP 'HEADER|ERROR' | sed 's/ /\
diff --git a/tests/ovs-vswitchd.at b/tests/ovs-vswitchd.at
index bba4fea2b..95a12f609 100644
--- a/tests/ovs-vswitchd.at
+++ b/tests/ovs-vswitchd.at
@@ -117,10 +117,11 @@  bridge br0: using datapath ID
 OVS_APP_EXIT_AND_WAIT_BY_TARGET(["`pwd`"/unixctl2], [ovs-vswitchd-2.pid])
 
 # check the fakelog, should not see WARN/ERR/EMER log other than the one
-# for reporting the existing ovs-vswitchd process and the one for killing
-# the process.
+# for reporting the existing ovs-vswitchd process , the one for killing
+# the process and a log extack message saying NAK error.
 AT_CHECK([sed -n "
 /|ERR|another ovs-vswitchd process is running/d
+/|ERR|received NAK error/d
 /|WARN|/p
 /|ERR|/p
 /|EMER|/p" fakelog