[net] net: sched: reduce amount of log messages in act_mirred

Message ID a59f92670c72db738d91b639ecc72ef8daf69300.1585866258.git.marcelo.leitner@gmail.com
State Changes Requested
Delegated to: David Miller

Commit Message

Marcelo Ricardo Leitner April 2, 2020, 10:26 p.m. UTC
The OVS bridge device is usually left down, so when OVS offload is in
use it is quite common to trigger this message. Some cards, for example,
can't offload broadcasts because they can't output to more than 2 ports.
As a result, act_mirred tries to output to the OVS bridge itself,
which is often down, and floods the log. (Yes, the ratelimit is not
enough.)

As act_mirred is already incrementing the overlimit counter for each
drop, there is no need to keep flooding the logs here. Let's log it once,
warn the sysadmin, and let the counters do the rest.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sched/act_mirred.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

David Miller April 3, 2020, 1:04 a.m. UTC | #1
From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Date: Thu,  2 Apr 2020 19:26:12 -0300

> @@ -245,8 +245,8 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>  	}
>  
>  	if (unlikely(!(dev->flags & IFF_UP))) {
> -		net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
> -				       dev->name);
> +		pr_notice_once("tc mirred: device %s is down\n",
> +			       dev->name);

This reduction is too extreme.

If someone causes this problem, reconfigures everything thinking that the
problem will be fixed, they won't see this message the second time and
mistakenly think it's working.
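
The objection above can be illustrated with a minimal userspace sketch. It is only a model under stated assumptions: the struct and function names are hypothetical, the window and burst values are arbitrary, and none of this is the kernel's printk or ratelimit implementation. It shows that a "once" logger stays silent for every later failure, while a rate-limited logger speaks again after its window expires.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; not the kernel's printk machinery. */
struct once_logger {
	bool printed;	/* set on the first message, never cleared */
};

struct ratelimit_logger {
	long interval;	/* window length, in arbitrary ticks */
	int burst;	/* messages allowed per window */
	long begin;	/* start of the current window */
	int printed;	/* messages emitted in the current window */
};

/* Emits the message only on the first call. Returns true if emitted. */
static bool log_once(struct once_logger *l, const char *dev)
{
	if (l->printed)
		return false;
	l->printed = true;
	printf("tc mirred: device %s is down\n", dev);
	return true;
}

/* Emits at most 'burst' messages per 'interval'. Returns true if emitted. */
static bool log_ratelimited(struct ratelimit_logger *l, long now,
			    const char *dev)
{
	if (now - l->begin >= l->interval) {
		/* A new window starts: the budget is available again. */
		l->begin = now;
		l->printed = 0;
	}
	if (l->printed >= l->burst)
		return false;
	l->printed++;
	printf("tc mirred to Houston: device %s is down\n", dev);
	return true;
}
```

In this model, an admin who hits the problem, reconfigures, and hits it again gets no second message from log_once(), whereas log_ratelimited() reports it again once a new window has started.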

Marcelo Ricardo Leitner April 3, 2020, 1:14 a.m. UTC | #2
On Thu, Apr 02, 2020 at 06:04:17PM -0700, David Miller wrote:
> From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> Date: Thu,  2 Apr 2020 19:26:12 -0300
> 
> > @@ -245,8 +245,8 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
> >  	}
> >  
> >  	if (unlikely(!(dev->flags & IFF_UP))) {
> > -		net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
> > -				       dev->name);
> > +		pr_notice_once("tc mirred: device %s is down\n",
> > +			       dev->name);
> 
> This reduction is too extreme.
> 
> If someone causes this problem, reconfigures everything thinking that the
> problem will be fixed, they won't see this message the second time and
> mistakenly think it's working.

Fair point. Then what about removing it entirely? printk's are not the
best way to debug packet drops anyway and the action already registers
the drops in its stats.

Or perhaps a marker in the message, stating that it is logged only
once per boot. I'm leaning to the one above, to just remove it.

Cong Wang April 3, 2020, 4:43 a.m. UTC | #3
On Thu, Apr 2, 2020 at 6:14 PM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> On Thu, Apr 02, 2020 at 06:04:17PM -0700, David Miller wrote:
> > From: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> > Date: Thu,  2 Apr 2020 19:26:12 -0300
> >
> > > @@ -245,8 +245,8 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
> > >     }
> > >
> > >     if (unlikely(!(dev->flags & IFF_UP))) {
> > > -           net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
> > > -                                  dev->name);
> > > +           pr_notice_once("tc mirred: device %s is down\n",
> > > +                          dev->name);
> >
> > This reduction is too extreme.
> >
> > If someone causes this problem, reconfigures everything thinking that the
> > problem will be fixed, they won't see this message the second time and
> > mistakenly think it's working.
>
> Fair point. Then what about removing it entirely? printk's are not the
> best way to debug packet drops anyway and the action already registers
> the drops in its stats.
>
> Or perhaps a marker in the message, stating that it is logged only
> once per boot. I'm leaning to the one above, to just remove it.

I think the reason why we print that is we do not handle
NETDEV_DOWN event in mirred_device_event() or check IFF_UP
in tcf_mirred_init(). I think if we can do both, we can remove
this message entirely. I am not sure whether the latter would break
existing expectations, as users may want to add a down device
as a target and bring it up afterward.

Thanks.
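
One possible reading of the suggestion above, sketched as a userspace model rather than kernel code: report the down state from the device-event path, which runs once per state change, and leave the per-packet path with only the drop counter. Every name below (model_dev, model_mirred, model_device_event, model_mirred_act) is hypothetical and does not match the kernel's structures or its notifier API.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; names are illustrative only. */
enum dev_event { DEV_EVENT_UP, DEV_EVENT_DOWN };

struct model_dev {
	const char *name;
	bool up;
};

struct model_mirred {
	struct model_dev *target;
	unsigned long overlimits;	/* per-packet drops are counted here */
	unsigned long down_notices;	/* messages logged from the event path */
};

/* Control path: runs once per state change, so logging here cannot flood. */
static void model_device_event(struct model_mirred *m, enum dev_event ev)
{
	bool up = (ev == DEV_EVENT_UP);

	if (m->target->up == up)
		return;		/* no transition, nothing to report */
	m->target->up = up;
	if (!up) {
		m->down_notices++;
		printf("mirred: target device %s went down\n",
		       m->target->name);
	}
}

/* Datapath: no logging, only the counter. Returns true if the packet went out. */
static bool model_mirred_act(struct model_mirred *m)
{
	if (!m->target->up) {
		m->overlimits++;
		return false;
	}
	return true;
}
```

The model does not cover a target that is already down when the action is created, which is the case the IFF_UP check at init time would address.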

Patch

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 83dd82fc9f40ce800b99eae5c0b279dce5b2c1c9..bd1e2c98aaaefc689e52840b9be53ef9de4dd86d 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -245,8 +245,8 @@  static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
 	}
 
 	if (unlikely(!(dev->flags & IFF_UP))) {
-		net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
-				       dev->name);
+		pr_notice_once("tc mirred: device %s is down\n",
+			       dev->name);
 		goto out;
 	}